From igor.veresov at oracle.com Wed Jul 1 02:06:41 2020
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Jun 2020 19:06:41 -0700
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To: <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com>
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com>
Message-ID:

I think you forgot to include changes to BoolNode in the webrev.

igor

> On Jun 30, 2020, at 11:04 AM, Boris Ulasevich wrote:
>
> Hi Claes,
>
> > Seems like the optimization is mostly effective, but not getting all the way.
>
> Good point about LHS, thanks! CmpL turned out not to be canonicalized at the moment.
> I moved the optimization to CmpLNode::Ideal and the transformations now work as follows:
> 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL
> 2. BoolNode::Ideal: Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert)
> 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI)
>
> I applied your test to the benchmark. The result is:
> Benchmark                            Mode  Cnt   Score   Error  Units
> SkipIntToLongCast.skipCastTestLeft   avgt    5  14.288 ± 0.052  ns/op
> SkipIntToLongCast.skipCastTestRight  avgt    5  14.338 ± 0.088  ns/op
>
> Updated webrev:
> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b
>
> thanks,
> Boris
>
> On 26.06.2020 21:31, Claes Redestad wrote:
>> Hi Boris,
>>
>> this looks like a nice improvement! I just have some comments about the
>> micro.
>>
>> I was curious whether the optimization works when the constant is on
>> the LHS and added a variant of the micro to try that[1]. Results are
>> interesting (Intel Xeon):
>>
>> Benchmark                           Mode  Cnt   Score   Error  Units
>> SkipIntToLongCast.skipCastTest      avgt    5  30.937 ± 0.056  ns/op
>> SkipIntToLongCast.skipCastTestLeft  avgt    5  30.937 ± 0.140  ns/op
>>
>> With your patch:
>> Benchmark                           Mode  Cnt   Score   Error  Units
>> SkipIntToLongCast.skipCastTest      avgt    5  14.123 ± 0.035  ns/op
>> SkipIntToLongCast.skipCastTestLeft  avgt    5  17.420 ± 0.044  ns/op
>>
>> Seems like the optimization is mostly effective, but not getting all
>> the way. I wouldn't worry about it for this RFE, but perhaps something
>> to investigate in a follow-up. Feel free to include such a variant in
>> your patch though (no attribution necessary).
>>
>> The micro also stabilizes very quickly, so you might want to provide
>> some default tuning to keep runtime in check, e.g., something like:
>>
>> @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
>> @Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
>> @Fork(3)
>>
>> Thanks!
>>
>> /Claes
>>
>> [1]
>> @Benchmark
>> public int skipCastTestLeft() {
>>     for (int i = 0; i < ARRAYSIZE_L; i++) {
>>         if (ARRAYSIZE_L == intValues[i]) {
>>             return i;
>>         }
>>     }
>>     return 0;
>> }
>>
>> On 2020-06-26 17:05, Boris Ulasevich wrote:
>>> Hi all,
>>>
>>> Please review the change to eliminate the unnecessary i2l conversion
>>> for expressions like this: "if (intValue == 1L)".
>>>
>>> http://bugs.openjdk.java.net/browse/JDK-8248043
>>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01
>>>
>>> The provided benchmark shows performance boost on all platforms:
>>> - Intel Xeon: 32.705 --> 14.234 ns/op
>>> - arm64: 42.060 --> 25.456 ns/op
>>> - arm32: 618.763 --> 314.040 ns/op
>>> - ppc8: 81.218 --> 63.026 ns/op
>>>
>>> Testing done: jtreg, jck.
>>>
>>> thanks,
>>> Boris
>

From vladimir.kozlov at oracle.com Wed Jul 1 03:12:00 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Jun 2020 20:12:00 -0700
Subject: [16] RFR(T) 8005088: remove unused NativeInstruction::test methods
Message-ID: <42cfbb51-4fe4-382f-6e8d-f740890df2db@oracle.com>

https://cr.openjdk.java.net/~kvn/8005088/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8005088

Only SPARC implemented NativeInstruction::test() methods [1]. And it was removed in JDK 15 with the SPARC port.

I think we can remove them on other platforms where they are not implemented.
If someone wants to recreate the test, they should do that as gtest test(s).

Thanks,
Vladimir

[1] https://hg.openjdk.java.net/jdk/jdk14/file/6c954123ee8d/src/hotspot/cpu/sparc/nativeInst_sparc.cpp#l356

From igor.ignatyev at oracle.com Wed Jul 1 03:15:11 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Tue, 30 Jun 2020 20:15:11 -0700
Subject: [16] RFR(T) 8005088: remove unused NativeInstruction::test methods
In-Reply-To: <42cfbb51-4fe4-382f-6e8d-f740890df2db@oracle.com>
References: <42cfbb51-4fe4-382f-6e8d-f740890df2db@oracle.com>
Message-ID: <6CBA6B2A-51A4-445E-A949-60132363E079@oracle.com>

Hi Vladimir,

the years in copyright notices should be updated, otherwise looks good to me, thanks for taking care of it.

-- Igor

From vladimir.kozlov at oracle.com Wed Jul 1 03:23:24 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Jun 2020 20:23:24 -0700
Subject: [16] RFR(T) 8005088: remove unused NativeInstruction::test methods
In-Reply-To: <6CBA6B2A-51A4-445E-A949-60132363E079@oracle.com>
References: <42cfbb51-4fe4-382f-6e8d-f740890df2db@oracle.com> <6CBA6B2A-51A4-445E-A949-60132363E079@oracle.com>
Message-ID:

Thank you, Igor

On 6/30/20 8:15 PM, Igor Ignatyev wrote:
> Hi Vladimir,
>
> the years in copyright notices should be updated, otherwise looks good to me, thanks for taking care of it.
Updated webrev.00 with correct years.

Vladimir

>
> -- Igor

From boris.ulasevich at bell-sw.com Wed Jul 1 04:33:04 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Wed, 1 Jul 2020 07:33:04 +0300
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To:
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com>
Message-ID:

Hi Igor,

By BoolNode I mean the canonicalization that is already in place:
https://hg.openjdk.java.net/jdk/jdk/file/de6ad5f86276/src/hotspot/share/opto/subnode.cpp#l1391

thanks,
Boris

On Wed, Jul 1, 2020 at 5:07 AM Igor Veresov wrote:

> I think you forgot to include changes to BoolNode in the webrev.
>
> igor

From boris.ulasevich at bell-sw.com Wed Jul 1 04:38:36 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Wed, 1 Jul 2020 07:38:36 +0300
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To: <15778367-9a55-8fd2-353b-21927650125d@oracle.com>
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> <13148df7-502b-bd1e-5aa0-fb7a9244cddc@oracle.com> <15778367-9a55-8fd2-353b-21927650125d@oracle.com>
Message-ID:

Hi Claes,

Ok. Thank you for your review!

Best regards,
Boris

On Wed, Jul 1, 2020 at 12:30 AM Claes Redestad wrote:

> +1
>
> Maybe add tests for reversed variants to TestSkipLongToIntCast too? No
> need for a new webrev if you do.
>
> /Claes
>
> On 2020-06-30 23:13, Vladimir Kozlov wrote:
> > Good optimization. Reviewed.
> >
> > Thanks,
> > Vladimir

From vladimir.kozlov at oracle.com Wed Jul 1 05:15:02 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 30 Jun 2020 22:15:02 -0700
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To:
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com>
Message-ID:

I think Igor said that you can't swap the arguments of a compare without changing the condition test. For example, if it was CC_LT it should be CC_GT after the swap.

It is not clear why you need swapping in CmpLNode::Ideal() if BoolNode::Ideal() should do it already. If it does not, you need to investigate why.

Also your list of steps 1.-3.
does not reflect changes in webrev.02b:
http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b/src/hotspot/share/opto/subnode.cpp.udiff.html

Regards,
Vladimir

On 6/30/20 9:33 PM, Boris Ulasevich wrote:
> Hi Igor,
>
> By BoolNode I mean the canonicalization that is already in place:
> https://hg.openjdk.java.net/jdk/jdk/file/de6ad5f86276/src/hotspot/share/opto/subnode.cpp#l1391
>
> thanks,
> Boris

From igor.veresov at oracle.com Wed Jul 1 05:28:48 2020
From: igor.veresov at oracle.com (Igor Veresov)
Date: Tue, 30 Jun 2020 22:28:48 -0700
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To:
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com>
Message-ID:

> On Jun 30, 2020, at 10:15 PM, Vladimir Kozlov wrote:
>
> I think Igor said that you can't swap arguments of compare without changing condition test. For example, if it was CC_LT it should be CC_GT after swap.

Yes, that's exactly what I had in mind. Condition must be inverted. Otherwise your transformation [3] is not valid for anything else but equality, so that's not going to work. Maybe if [3] didn't work, perhaps there is another user of the CmpLNode in addition to BoolNode?

igor

From boris.ulasevich at bell-sw.com Wed Jul 1 08:51:34 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Wed, 1 Jul 2020 11:51:34 +0300
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To:
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com>
Message-ID:

Hi,

I'm deeply sorry. Yes, webrev.02b is certainly wrong!
Correct link is webrev.02c:
http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02

c
- this is the change I described in my mail and wanted to review.

my apologies,
Boris

On Wednesday, July 1, 2020, Igor Veresov wrote:

> > On Jun 30, 2020, at 10:15 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
> >
> > I think Igor said that you can't swap arguments of compare without
> > changing condition test. For example, if it was CC_LT it should be CC_GT
> > after swap.
>
> Yes, that's exactly what I had in mind. Condition must be inverted.
> Otherwise your transformation [3] is not valid for anything else but
> equality, so that's not going to work.
> Maybe if [3] didn't work, perhaps there is another user of the CmpLNode
> in addition to BoolNode?
>
> igor

From boris.ulasevich at bell-sw.com Wed Jul 1 09:16:31 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Wed, 1 Jul 2020 12:16:31 +0300
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To:
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com>
Message-ID:

Hi,

It is the third attempt to send a correct link. Sorry for that ;)
http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02c

Thanks,
Boris

On Wednesday, July 1, 2020, Boris Ulasevich wrote:

> Hi,
>
> I'm deeply sorry. Yes, webrev.02b is certainly wrong!
> Correct link is webrev.02c:
> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02
>
> c
> - this is the change I described in my mail and wanted to review.
>
> my apologies,
> Boris
>
> On Wednesday, July 1, 2020, Igor Veresov wrote:
>
>> > On Jun 30, 2020, at 10:15 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>> >
>> > I think Igor said that you can't swap arguments of compare without
>> > changing condition test. For example, if it was CC_LT it should be CC_GT
>> > after swap.
>>
>> Yes, that's exactly what I had in mind. Condition must be inverted.
>> Otherwise your transformation [3] is not valid for anything else but
>> equality, so that's not going to work. Maybe if [3] didn't work, perhaps
>> there is another user of the CmpLNode in addition to BoolNode?
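The operand-swap point discussed in this thread can be illustrated outside the compiler. What follows is a minimal Java sketch under stated assumptions, not HotSpot code: the `Cond` enum and the `holds`, `commute`, `wide`, `narrow`, and `check` helpers are hypothetical names introduced only for this illustration. It checks two claims on a handful of sample values: that moving a constant from the left to the right side of a comparison needs the mirrored condition (LT becomes GT, while EQ and NE stay as they are), and that comparing a widened int against a long constant gives the same answer as a plain int comparison when the constant fits in the int range.

```java
public class CompareSwapSketch {
    // Hypothetical condition codes; they only loosely mirror names such as CC_LT and CC_GT.
    enum Cond { EQ, NE, LT, LE, GT, GE }

    // Interprets the sign of a compare result (negative, zero, positive) under a condition.
    static boolean holds(Cond c, int cmp) {
        switch (c) {
            case EQ: return cmp == 0;
            case NE: return cmp != 0;
            case LT: return cmp < 0;
            case LE: return cmp <= 0;
            case GT: return cmp > 0;
            default: return cmp >= 0; // GE
        }
    }

    // Condition to use once the two operands have traded places.
    static Cond commute(Cond c) {
        switch (c) {
            case LT: return Cond.GT;
            case LE: return Cond.GE;
            case GT: return Cond.LT;
            case GE: return Cond.LE;
            default: return c; // EQ and NE are symmetric
        }
    }

    // "(long) val <c> con": the comparison as written against a long constant.
    static boolean wide(Cond c, int val, long con) {
        return holds(c, Long.compare((long) val, con));
    }

    // "val <c> (int) con": only meaningful when con fits in an int.
    static boolean narrow(Cond c, int val, long con) {
        if (con != (int) con) {
            throw new IllegalArgumentException("constant does not fit in int: " + con);
        }
        return holds(c, Integer.compare(val, (int) con));
    }

    // Returns true if every sampled combination agrees with both claims.
    static boolean check() {
        int[] vals = { Integer.MIN_VALUE, -1, 0, 1, 42, Integer.MAX_VALUE };
        long[] cons = { Integer.MIN_VALUE, -1L, 0L, 1L, 42L, Integer.MAX_VALUE };
        for (Cond c : Cond.values()) {
            for (int val : vals) {
                for (long con : cons) {
                    // Constant on the left, e.g. "con < val".
                    boolean constOnLeft = holds(c, Long.compare(con, (long) val));
                    // Same question with the operands swapped and the condition mirrored.
                    if (constOnLeft != wide(commute(c), val, con)) return false;
                    // Dropping the int-to-long conversion keeps the answer.
                    if (wide(c, val, con) != narrow(c, val, con)) return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(check() ? "all checks passed" : "mismatch found");
    }
}
```

If the condition were left unchanged after the swap, only EQ and NE would still give the right answer, which is consistent with the remark that the transformation would otherwise be valid for equality only.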
>> >> igor >> >> > >> > It is not clear why you need swapping in CmpLNode::Ideal() if >> BoolNode::Ideal() should do it already. If it does not you need to >> investigate why. >> > >> > Also your list of steps 1.-3. does not reflect changes in webrev.02b: >> > http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b/ >> src/hotspot/share/opto/subnode.cpp.udiff.html >> > >> > Regards, >> > Vladimir >> > >> > On 6/30/20 9:33 PM, Boris Ulasevich wrote: >> >> Hi Igor, >> >> By BoolNode I mean the canonicalization that is already in place: >> >> https://hg.openjdk.java.net/jdk/jdk/file/de6ad5f86276/src/ho >> tspot/share/opto/subnode.cpp#l1391 >> >> thanks, >> >> Boris >> >> On Wed, Jul 1, 2020 at 5:07 AM Igor Veresov >> wrote: >> >>> I think you forgot to include changes to BoolNode in the webrev. >> >>> >> >>> igor >> >>> >> >>> >> >>> >> >>> On Jun 30, 2020, at 11:04 AM, Boris Ulasevich < >> boris.ulasevich at bell-sw.com> >> >>> wrote: >> >>> >> >>> Hi Claes, >> >>> >> >>>> Seems like the optimization is mostly effective, but not getting all >> the >> >>> way. >> >>> >> >>> Good point about LHS, thanks! CmpL turned to be not canonized on the >> >>> moment. >> >>> I moved the optimization to CmpLNode::Ideal and transformations now >> works >> >>> as follows: >> >>> 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL >> >>> 2. BoolNode::Ideal: >> >>> Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert) >> >>> 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI) >> >>> >> >>> I applied your test to the benchmark. The result is: >> >>> Benchmark Mode Cnt Score Error Units >> >>> SkipIntToLongCast.skipCastTestLeft avgt 5 14.288 ? 0.052 ns/op >> >>> SkipIntToLongCast.skipCastTestRight avgt 5 14.338 ? 0.088 ns/op >> >>> >> >>> Updated webrev: >> >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b >> >>> >> >>> thanks, >> >>> Boris >> >>> >> >>> On 26.06.2020 21:31, Claes Redestad wrote: >> >>> >> >>> Hi Boris, >> >>> >> >>> this looks like a nice improvement! 
I just have some comments about >> the >> >>> micro. >> >>> >> >>> I was curious whether the optimization works when the constant is on >> >>> the LHS and added a variant of the micro to try that[1]. Results are >> >>> interesting (Intel Xeon): >> >>> >> >>> Benchmark Mode Cnt Score Error Units >> >>> SkipIntToLongCast.skipCastTest avgt 5 30.937 ? 0.056 ns/op >> >>> SkipIntToLongCast.skipCastTestLeft avgt 5 30.937 ? 0.140 ns/op >> >>> >> >>> With your patch: >> >>> Benchmark Mode Cnt Score Error Units >> >>> SkipIntToLongCast.skipCastTest avgt 5 14.123 ? 0.035 ns/op >> >>> SkipIntToLongCast.skipCastTestLeft avgt 5 17.420 ? 0.044 ns/op >> >>> >> >>> Seems like the optimization is mostly effective, but not getting all >> >>> the way. I wouldn't worry about it for this RFE, but perhaps something >> >>> to investigate in a follow-up. Feel free to include such a variant in >> >>> your patch though (no attribution necessary). >> >>> >> >>> The micro also stabilizes very quickly, so you might want to provide >> >>> some default tuning to keep runtime in check, e.g., something like: >> >>> >> >>> @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS) >> >>> @Measurement(iterations = 5, time = 1000, timeUnit = >> TimeUnit.MILLISECONDS) >> >>> @Fork(3) >> >>> >> >>> Thanks! >> >>> >> >>> /Claes >> >>> >> >>> [1] >> >>> @Benchmark >> >>> public int skipCastTestLeft() { >> >>> for (int i = 0; i < ARRAYSIZE_L; i++) { >> >>> if (ARRAYSIZE_L == intValues[i]) { >> >>> return i; >> >>> } >> >>> } >> >>> return 0; >> >>> } >> >>> >> >>> On 2020-06-26 17:05, Boris Ulasevich wrote: >> >>> >> >>> Hi all, >> >>> >> >>> Please review the change to eliminate the unnecessary i2l conversion >> >>> for expressions like this: "if (intValue == 1L)". 
>> >>>
>> >>> http://bugs.openjdk.java.net/browse/JDK-8248043
>> >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01
>> >>>
>> >>> The provided benchmark shows performance boost on all platforms:
>> >>> - Intel Xeon: 32.705 --> 14.234 ns/op
>> >>> - arm64: 42.060 --> 25.456 ns/op
>> >>> - arm32: 618.763 --> 314.040 ns/op
>> >>> - ppc8: 81.218 --> 63.026 ns/op
>> >>>
>> >>> Testing done: jtreg, jck.
>> >>>
>> >>> thanks,
>> >>> Boris
>> >>>
>> >>>
>> >>>
>> >>>
>> >>

From Charlie.Gracie at microsoft.com Wed Jul 1 16:15:50 2020
From: Charlie.Gracie at microsoft.com (Charlie Gracie)
Date: Wed, 1 Jul 2020 16:15:50 +0000
Subject: Stack allocation prototype for C2
Message-ID: <2D33A1A7-A3EF-4103-BD0D-D466C0E7AA31@microsoft.com>

Hi Sergey,

We have some old data but we will gather new data for the benchmarks showing wins on overall allocation reduction. I attempted to gather this data yesterday but I ran into an issue with JFR. I have a work-around so we should be able to get the data in the next couple of days. With the holidays in Canada and the US this week it might take until Monday to get the data together.

One of the common places we see wins is with Scala iterators, in particular, when iterating over primitive arrays. Regularly the array elements get boxed to perform an operation. I believe in the Scala TMT benchmark the win is removing an allocation of boxed Double objects when iterating over an array.

Thanks for the question and we will get back to you with the data soon.

Charlie

On 2020-06-29, 11:34 PM, "hotspot-compiler-dev on behalf of Sergey Kuksenko" wrote:

I am just curious. For each benchmark you show allocation reduce size in general. Do you have statistics which stack allocated objects gives major impact? And which code patterns fail scalar replacement except well know Integer cache flow merge?
On 6/29/20 2:05 PM, Charlie Gracie wrote:
> Hi hotspot-compiler-dev community,
>
> Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback
> as we hope to move from a prototype to something that could be contributed. A change of this size is difficult to review so we
> understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others,
> if they wanted to, would also be appreciated (i.e., a repo somewhere).
>
> For a quick refresher here is a link to Nikola's talk at FOSDEM:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Ffosdem.org%2F2020%2Fschedule%2Fevent%2Freducing_gc_times%2F&data=02%7C01%7Ccharlie.gracie%40microsoft.com%7C9e9b56c23fde463bf6b808d81ca68bf4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637290848926541670&sdata=qB1c8l5mUVk%2BAt7W5178A9wQ3pauoxW6XTVCfOTOmHw%3D&reserved=0
>
> Here is a link to our initial webrev:
> https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openjdk.java.net%2F~adityam%2Fcharlie%2Fstack_alloc%2F&data=02%7C01%7Ccharlie.gracie%40microsoft.com%7C9e9b56c23fde463bf6b808d81ca68bf4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637290848926541670&sdata=46mF34J4XcMV58TJxvJ4%2FiDSxL41TSKgW0X2MX7HRV4%3D&reserved=0
>
> Expecting that a change like this will require a JEP, we have prepared a document describing our work based off of the JEP submission
> form. Our document has a few extra sections at the end discussing areas that we are looking for guidance on and some initial
> performance results.
This document can be found here: > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmicrosoft%2Fopenjdk-proposals%2Fblob%2Fmaster%2Fstack_allocation%2FStack_Allocation_JEP.md&data=02%7C01%7Ccharlie.gracie%40microsoft.com%7C9e9b56c23fde463bf6b808d81ca68bf4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637290848926541670&sdata=V%2BqKZ9QgCd%2BKDbFb9MqFDoxdtXm8fFmgh%2FLYxgiGqJA%3D&reserved=0 > > Thanks in advance for reviews, suggestions, concerns, comments and issues. > Charlie and Nikola > From igor.veresov at oracle.com Wed Jul 1 19:29:42 2020 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 1 Jul 2020 12:29:42 -0700 Subject: RFR 8248043: Need to eliminate excessive i2l conversions In-Reply-To: References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> Message-ID: <424D5809-A580-43BD-A00D-B49C470AF280@oracle.com> That looks good. igor > On Jul 1, 2020, at 2:16 AM, Boris Ulasevich wrote: > > Hi, > > It is the third attempt to send a correct link. Sorry for that ;) > http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02c > > Thanks, > Boris > > On Wednesday, July 1, 2020, Boris Ulasevich > wrote: > Hi, > > I'm deeply sorry. Yes, webrev.02b is certainly wrong! > Correct link is webrev.02c: > http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02 c > - this is the change I described in my mail and wanted to review. > > my apologies, > Boris > > On Wednesday, July 1, 2020, Igor Veresov > wrote: > > On Jun 30, 2020, at 10:15 PM, Vladimir Kozlov > wrote: > > > > I think Igor said that you can't swap arguments of compare without changing condition test. For example, if it was CC_LT it should be CC_GT after swap. > > Yes, that?s exactly what I had in mind. Condition must be inverted. Otherwise your transformation [3] is not valid for anything else but equality, so that?s not going to work. 
May be if [3] didn?t work, perhaps there is another user of the CmpLNode in addition to BoolNode ? > > igor > > > > > It is not clear why you need swapping in CmpLNode::Ideal() if BoolNode::Ideal() should do it already. If it does not you need to investigate why. > > > > Also your list of steps 1.-3. does not reflect changes in webrev.02b: > > http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b/src/hotspot/share/opto/subnode.cpp.udiff.html > > > > Regards, > > Vladimir > > > > On 6/30/20 9:33 PM, Boris Ulasevich wrote: > >> Hi Igor, > >> By BoolNode I mean the canonicalization that is already in place: > >> https://hg.openjdk.java.net/jdk/jdk/file/de6ad5f86276/src/hotspot/share/opto/subnode.cpp#l1391 > >> thanks, > >> Boris > >> On Wed, Jul 1, 2020 at 5:07 AM Igor Veresov > wrote: > >>> I think you forgot to include changes to BoolNode in the webrev. > >>> > >>> igor > >>> > >>> > >>> > >>> On Jun 30, 2020, at 11:04 AM, Boris Ulasevich > > >>> wrote: > >>> > >>> Hi Claes, > >>> > >>>> Seems like the optimization is mostly effective, but not getting all the > >>> way. > >>> > >>> Good point about LHS, thanks! CmpL turned to be not canonized on the > >>> moment. > >>> I moved the optimization to CmpLNode::Ideal and transformations now works > >>> as follows: > >>> 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL > >>> 2. BoolNode::Ideal: > >>> Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert) > >>> 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI) > >>> > >>> I applied your test to the benchmark. The result is: > >>> Benchmark Mode Cnt Score Error Units > >>> SkipIntToLongCast.skipCastTestLeft avgt 5 14.288 ? 0.052 ns/op > >>> SkipIntToLongCast.skipCastTestRight avgt 5 14.338 ? 0.088 ns/op > >>> > >>> Updated webrev: > >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b > >>> > >>> thanks, > >>> Boris > >>> > >>> On 26.06.2020 21:31, Claes Redestad wrote: > >>> > >>> Hi Boris, > >>> > >>> this looks like a nice improvement! 
I just have some comments about the > >>> micro. > >>> > >>> I was curious whether the optimization works when the constant is on > >>> the LHS and added a variant of the micro to try that[1]. Results are > >>> interesting (Intel Xeon): > >>> > >>> Benchmark Mode Cnt Score Error Units > >>> SkipIntToLongCast.skipCastTest avgt 5 30.937 ? 0.056 ns/op > >>> SkipIntToLongCast.skipCastTestLeft avgt 5 30.937 ? 0.140 ns/op > >>> > >>> With your patch: > >>> Benchmark Mode Cnt Score Error Units > >>> SkipIntToLongCast.skipCastTest avgt 5 14.123 ? 0.035 ns/op > >>> SkipIntToLongCast.skipCastTestLeft avgt 5 17.420 ? 0.044 ns/op > >>> > >>> Seems like the optimization is mostly effective, but not getting all > >>> the way. I wouldn't worry about it for this RFE, but perhaps something > >>> to investigate in a follow-up. Feel free to include such a variant in > >>> your patch though (no attribution necessary). > >>> > >>> The micro also stabilizes very quickly, so you might want to provide > >>> some default tuning to keep runtime in check, e.g., something like: > >>> > >>> @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS) > >>> @Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS) > >>> @Fork(3) > >>> > >>> Thanks! > >>> > >>> /Claes > >>> > >>> [1] > >>> @Benchmark > >>> public int skipCastTestLeft() { > >>> for (int i = 0; i < ARRAYSIZE_L; i++) { > >>> if (ARRAYSIZE_L == intValues[i]) { > >>> return i; > >>> } > >>> } > >>> return 0; > >>> } > >>> > >>> On 2020-06-26 17:05, Boris Ulasevich wrote: > >>> > >>> Hi all, > >>> > >>> Please review the change to eliminate the unnecessary i2l conversion > >>> for expressions like this: "if (intValue == 1L)". 
> >>> > >>> http://bugs.openjdk.java.net/browse/JDK-8248043 > >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01 > >>> > >>> The provided benchmark shows performance boost on all platforms: > >>> - Intel Xeon: 32.705 --> 14.234 ns/op > >>> - arm64: 42.060 --> 25.456 ns/op > >>> - arm32: 618.763 --> 314.040 ns/op > >>> - ppc8: 81.218 --> 63.026 ns/op > >>> > >>> Testing done: jtreg, jck. > >>> > >>> thanks, > >>> Boris > >>> > >>> > >>> > >>> > From vladimir.kozlov at oracle.com Wed Jul 1 19:44:48 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 1 Jul 2020 12:44:48 -0700 Subject: RFR 8248043: Need to eliminate excessive i2l conversions In-Reply-To: <424D5809-A580-43BD-A00D-B49C470AF280@oracle.com> References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> <424D5809-A580-43BD-A00D-B49C470AF280@oracle.com> Message-ID: <7616c777-cfa0-37c5-3f6b-0e03d471fe84@oracle.com> +1 Thanks, Vladimir On 7/1/20 12:29 PM, Igor Veresov wrote: > That looks good. > > igor > > > >> On Jul 1, 2020, at 2:16 AM, Boris Ulasevich wrote: >> >> Hi, >> >> It is the third attempt to send a correct link. Sorry for that ;) >> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02c >> >> Thanks, >> Boris >> >> On Wednesday, July 1, 2020, Boris Ulasevich > wrote: >> Hi, >> >> I'm deeply sorry. Yes, webrev.02b is certainly wrong! >> Correct link is webrev.02c: >> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02 c >> - this is the change I described in my mail and wanted to review. >> >> my apologies, >> Boris >> >> On Wednesday, July 1, 2020, Igor Veresov > wrote: >>> On Jun 30, 2020, at 10:15 PM, Vladimir Kozlov > wrote: >>> >>> I think Igor said that you can't swap arguments of compare without changing condition test. For example, if it was CC_LT it should be CC_GT after swap. >> >> Yes, that?s exactly what I had in mind. Condition must be inverted. 
Otherwise your transformation [3] is not valid for anything else but equality, so that?s not going to work. May be if [3] didn?t work, perhaps there is another user of the CmpLNode in addition to BoolNode ? >> >> igor >> >>> >>> It is not clear why you need swapping in CmpLNode::Ideal() if BoolNode::Ideal() should do it already. If it does not you need to investigate why. >>> >>> Also your list of steps 1.-3. does not reflect changes in webrev.02b: >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b/src/hotspot/share/opto/subnode.cpp.udiff.html >>> >>> Regards, >>> Vladimir >>> >>> On 6/30/20 9:33 PM, Boris Ulasevich wrote: >>>> Hi Igor, >>>> By BoolNode I mean the canonicalization that is already in place: >>>> https://hg.openjdk.java.net/jdk/jdk/file/de6ad5f86276/src/hotspot/share/opto/subnode.cpp#l1391 >>>> thanks, >>>> Boris >>>> On Wed, Jul 1, 2020 at 5:07 AM Igor Veresov > wrote: >>>>> I think you forgot to include changes to BoolNode in the webrev. >>>>> >>>>> igor >>>>> >>>>> >>>>> >>>>> On Jun 30, 2020, at 11:04 AM, Boris Ulasevich > >>>>> wrote: >>>>> >>>>> Hi Claes, >>>>> >>>>>> Seems like the optimization is mostly effective, but not getting all the >>>>> way. >>>>> >>>>> Good point about LHS, thanks! CmpL turned to be not canonized on the >>>>> moment. >>>>> I moved the optimization to CmpLNode::Ideal and transformations now works >>>>> as follows: >>>>> 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL >>>>> 2. BoolNode::Ideal: >>>>> Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert) >>>>> 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI) >>>>> >>>>> I applied your test to the benchmark. The result is: >>>>> Benchmark Mode Cnt Score Error Units >>>>> SkipIntToLongCast.skipCastTestLeft avgt 5 14.288 ? 0.052 ns/op >>>>> SkipIntToLongCast.skipCastTestRight avgt 5 14.338 ? 
0.088 ns/op >>>>> >>>>> Updated webrev: >>>>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b >>>>> >>>>> thanks, >>>>> Boris >>>>> >>>>> On 26.06.2020 21:31, Claes Redestad wrote: >>>>> >>>>> Hi Boris, >>>>> >>>>> this looks like a nice improvement! I just have some comments about the >>>>> micro. >>>>> >>>>> I was curious whether the optimization works when the constant is on >>>>> the LHS and added a variant of the micro to try that[1]. Results are >>>>> interesting (Intel Xeon): >>>>> >>>>> Benchmark Mode Cnt Score Error Units >>>>> SkipIntToLongCast.skipCastTest avgt 5 30.937 ? 0.056 ns/op >>>>> SkipIntToLongCast.skipCastTestLeft avgt 5 30.937 ? 0.140 ns/op >>>>> >>>>> With your patch: >>>>> Benchmark Mode Cnt Score Error Units >>>>> SkipIntToLongCast.skipCastTest avgt 5 14.123 ? 0.035 ns/op >>>>> SkipIntToLongCast.skipCastTestLeft avgt 5 17.420 ? 0.044 ns/op >>>>> >>>>> Seems like the optimization is mostly effective, but not getting all >>>>> the way. I wouldn't worry about it for this RFE, but perhaps something >>>>> to investigate in a follow-up. Feel free to include such a variant in >>>>> your patch though (no attribution necessary). >>>>> >>>>> The micro also stabilizes very quickly, so you might want to provide >>>>> some default tuning to keep runtime in check, e.g., something like: >>>>> >>>>> @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS) >>>>> @Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS) >>>>> @Fork(3) >>>>> >>>>> Thanks! >>>>> >>>>> /Claes >>>>> >>>>> [1] >>>>> @Benchmark >>>>> public int skipCastTestLeft() { >>>>> for (int i = 0; i < ARRAYSIZE_L; i++) { >>>>> if (ARRAYSIZE_L == intValues[i]) { >>>>> return i; >>>>> } >>>>> } >>>>> return 0; >>>>> } >>>>> >>>>> On 2020-06-26 17:05, Boris Ulasevich wrote: >>>>> >>>>> Hi all, >>>>> >>>>> Please review the change to eliminate the unnecessary i2l conversion >>>>> for expressions like this: "if (intValue == 1L)". 
>>>>>
>>>>> http://bugs.openjdk.java.net/browse/JDK-8248043
>>>>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01
>>>>>
>>>>> The provided benchmark shows performance boost on all platforms:
>>>>> - Intel Xeon: 32.705 --> 14.234 ns/op
>>>>> - arm64: 42.060 --> 25.456 ns/op
>>>>> - arm32: 618.763 --> 314.040 ns/op
>>>>> - ppc8: 81.218 --> 63.026 ns/op
>>>>>
>>>>> Testing done: jtreg, jck.
>>>>>
>>>>> thanks,
>>>>> Boris
>>>>>
>>>>>
>>>>>
>>>>>
>>
>

From joserz at linux.ibm.com Wed Jul 1 19:49:10 2020
From: joserz at linux.ibm.com (joserz at linux.ibm.com)
Date: Wed, 1 Jul 2020 16:49:10 -0300
Subject: RFR(M): 8248191: PPC: Implement Load/Store Vector with lxvl/stxvl in Power10
Message-ID: <20200701194910.GA141565@pacoca>

This patch introduces two instructions, lxvl/stxvl, which replace the current lxvd2x/stxvd2x for loading and storing vectors. Like lxvd2x/stxvd2x, lxvl/stxvl can access unaligned effective addresses with the advantage of *not* requiring xxswapd after lxvd2x (or before stxvd2x) to correct the lanes in little-endian mode.

Webrev: https://cr.openjdk.java.net/~mhorie/8248191/webrev.00/
Bug: https://bugs.openjdk.java.net/browse/JDK-8248191

Thanks for your review!

Jose R. Ziviani

From nils.eliasson at oracle.com Wed Jul 1 21:00:36 2020
From: nils.eliasson at oracle.com (Nils Eliasson)
Date: Wed, 1 Jul 2020 23:00:36 +0200
Subject: [15] RFR(S): 8248388: ZGC: Load barrier incorrectly elided in jdk/java/text/Format/DateFormat/SDFTCKZoneNamesTest.java
Message-ID: <31bc9579-55d2-8cdb-6ad1-58fb43f30c91@oracle.com>

Hi,

This issue was found on aarch64 but applies to all platforms. Stefan Karlsson tracked down the source of the issue and created a reproducer.

The bug is that the access API was not used in two places in macro.cpp where scalar replacement generates a load from the source of an arraycopy. This causes the creation of a LoadP without a barrier.

I fix this by reusing arraycopynode::load to create the loads.
The abstraction is a bit off, but I don't want to make a larger change this late in 15.

https://bugs.openjdk.java.net/browse/JDK-8248388
http://cr.openjdk.java.net/~neliasso/8248388/webrev.01/

Please review,
Nils Eliasson

From doug.simon at oracle.com Wed Jul 1 21:55:48 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 1 Jul 2020 23:55:48 +0200
Subject: RFR: 8248321: [JVMCI] improve libgraal logging and fatal error handling
Message-ID:

Please review this change that:

1. Sends log output from libgraal for options such as -Dlibgraal.PrintGC=true to HotSpot's tty stream.
2. Forwards a fatal error in libgraal to HotSpot's report_fatal function so that a proper hs_err_pid crash log is produced.
3. Adds coarse grained JVMCI events to the hs_err_pid crash log that can help diagnose libgraal crashes.

https://bugs.openjdk.java.net/browse/JDK-8248321
https://cr.openjdk.java.net/~dnsimon/8248321/webrev.00/

Testing: hs-tier1,hs-tier2,hs-tier3-graal,hs-tier4-graal

I've also tested this on a JDK 16 libgraal build (thanks to Bob's recent fixes) using the -Dlibgraal.CrashAtIsFatal=true option introduced for testing purposes.
Here are extracts from the resulting hs_err_pid log: Stack: [0x000070000404e000,0x000070000424e000], sp=0x000070000424cfa0, free space=2043k Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.dylib+0xa74706] _ZN7VMError14report_and_dieEiPKcS1_P13__va_list_tagP6ThreadPhPvS7_S1_im+0x696 V [libjvm.dylib+0xa74dcb] _ZN7VMError14report_and_dieEP6ThreadPvPKciS4_S4_P13__va_list_tag+0x3b V [libjvm.dylib+0x2ffeb6] _Z12report_fatalPKciS0_z+0xb6 V [libjvm.dylib+0x623b2e] _ZL6_fatalv+0x1e C [libjvmcicompiler.dylib+0x50c2e] FunctionPointerLogHandler_fatalError_45f632dec0d6a0795524f3a791e61bc3381552ca+0x5e C [libjvmcicompiler.dylib+0x6251d9] GraalCompiler_notifyCrash_6e5abb0717b70e82f6be0f6751e33644079f0e7c+0x199 C [libjvmcicompiler.dylib+0x622f36] GraalCompiler_checkForRequestedCrash_a1f0e6b1c079f96a46be20bd2ccc87fb7db83871+0x256 C [libjvmcicompiler.dylib+0x623929] GraalCompiler_compile_5fc27c66103532b8aadfba9a53a0cfc56727e415+0x209 C [libjvmcicompiler.dylib+0x623e80] GraalCompiler_compileGraph_7c727cf4f7ff3555660a81773d74fd53c28861a9+0x1e0 C [libjvmcicompiler.dylib+0x742259] HotSpotGraalCompiler_compileHelper_d3a966217707633929a5b5a4a7670fbd583caf11+0x419 C [libjvmcicompiler.dylib+0x741d95] HotSpotGraalCompiler_compile_80896636e2e15249ae0fc7c3c7f4cb060aca0523+0x165 JVMCI Events (8 events): Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime 0 (0x00007fa01af24040) Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime -1 (0x00007fa01af240a0) Event: 0.072 Thread 0x00007fa01b02bc00 loaded JVMCI shared library from /Users/dnsimon/hs/graal/sdk/mxbuild/darwin-amd64/GRAALVM_LIBGRAAL_JAVA16/graalvm-libgraal-java16-20.2.0-dev/lib/libjvmcicompiler.dylib Event: 0.073 Thread 0x00007fa01b02bc00 created JavaVM[1]@0x00000001409a3cb0 for JVMCI runtime 0 Event: 0.073 Thread 0x00007fa01b02bc00 initializing JVMCI runtime 0 Event: 0.074 Thread 0x00007fa01b02bc00 initialized JVMCI runtime 0 Event: 0.082 Thread 
0x00007fa01b02bc00 initializing JVMCI runtime -1 Event: 0.088 Thread 0x00007fa01b02bc00 initialized JVMCI runtime -1 -Doug From vladimir.kozlov at oracle.com Wed Jul 1 22:16:25 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 1 Jul 2020 15:16:25 -0700 Subject: [15] RFR(S): 8248388: ZGC: Load barrier incorrectly elided in jdk/java/text/Format/DateFormat/SDFTCKZoneNamesTest.java In-Reply-To: <31bc9579-55d2-8cdb-6ad1-58fb43f30c91@oracle.com> References: <31bc9579-55d2-8cdb-6ad1-58fb43f30c91@oracle.com> Message-ID: <2f5d5fce-eb28-bd97-9914-b2f081df0c3e@oracle.com> Looks good. Thanks, Vladimir K On 7/1/20 2:00 PM, Nils Eliasson wrote: > Hi, > > This issue was found on aarch64 but applies to all platforms. Stefan Karlsson tracked down the source of the issue and > created a reproducer. > > The bug is that the access API was not used in two places in macro.cpp where scalar replacement generate a load from the > source of an arraycopy. This causes the creation of a LoadP without a barrier. > > I fix this by reusing arraycopynode::load to create the loads. The abstraction is a bit of, but I don't want to make a > larger change this late in 15. > > https://bugs.openjdk.java.net/browse/JDK-8248388 > http://cr.openjdk.java.net/~neliasso/8248388/webrev.01/ > > Please review, > Nils Eliasson From vladimir.kozlov at oracle.com Wed Jul 1 22:26:17 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 1 Jul 2020 15:26:17 -0700 Subject: RFR: 8248321: [JVMCI] improve libgraal logging and fatal error handling In-Reply-To: References: Message-ID: <40d76f1c-d3cb-466f-ab14-b9cf9fdbd097@oracle.com> Looks good. Please, run usual testing before push. Thanks, Vladimir On 7/1/20 2:55 PM, Doug Simon wrote: > Please review this change that: > > 1. Sends log output from libgraal for options such as -Dlibgraal.PrintGC=true to HotSpot's tty stream. > 2. 
Forwards a fatal error in libgraal to HotSpot's report_fatal function so that a proper hs_err_pid crash log is produced. > 3. Adds coarse grained JVMCI events to the hs_eer_pid crash log that can help diagnose libgraal crashes. > > https://bugs.openjdk.java.net/browse/JDK-8248321 > https://cr.openjdk.java.net/~dnsimon/8248321/webrev.00/ > > Testing: hs-tier1,hs-tier2,hs-tier3-graal,hs-tier4-graal > > I?ve also tested this on a JDK 16 libgraal build (thanks to Bob?s recent fixes ) using the -Dlibgraal.CrashAtIsFatal=true option introduced for testing purposes. Here are extracts from the resulting hs_err_pid log: > > Stack: [0x000070000404e000,0x000070000424e000], sp=0x000070000424cfa0, free space=2043k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.dylib+0xa74706] _ZN7VMError14report_and_dieEiPKcS1_P13__va_list_tagP6ThreadPhPvS7_S1_im+0x696 > V [libjvm.dylib+0xa74dcb] _ZN7VMError14report_and_dieEP6ThreadPvPKciS4_S4_P13__va_list_tag+0x3b > V [libjvm.dylib+0x2ffeb6] _Z12report_fatalPKciS0_z+0xb6 > V [libjvm.dylib+0x623b2e] _ZL6_fatalv+0x1e > C [libjvmcicompiler.dylib+0x50c2e] FunctionPointerLogHandler_fatalError_45f632dec0d6a0795524f3a791e61bc3381552ca+0x5e > C [libjvmcicompiler.dylib+0x6251d9] GraalCompiler_notifyCrash_6e5abb0717b70e82f6be0f6751e33644079f0e7c+0x199 > C [libjvmcicompiler.dylib+0x622f36] GraalCompiler_checkForRequestedCrash_a1f0e6b1c079f96a46be20bd2ccc87fb7db83871+0x256 > C [libjvmcicompiler.dylib+0x623929] GraalCompiler_compile_5fc27c66103532b8aadfba9a53a0cfc56727e415+0x209 > C [libjvmcicompiler.dylib+0x623e80] GraalCompiler_compileGraph_7c727cf4f7ff3555660a81773d74fd53c28861a9+0x1e0 > C [libjvmcicompiler.dylib+0x742259] HotSpotGraalCompiler_compileHelper_d3a966217707633929a5b5a4a7670fbd583caf11+0x419 > C [libjvmcicompiler.dylib+0x741d95] HotSpotGraalCompiler_compile_80896636e2e15249ae0fc7c3c7f4cb060aca0523+0x165 > > > JVMCI Events (8 events): > Event: 0.015 Thread 
0x00007fa00b011600 created new JVMCI runtime 0 (0x00007fa01af24040) > Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime -1 (0x00007fa01af240a0) > Event: 0.072 Thread 0x00007fa01b02bc00 loaded JVMCI shared library from /Users/dnsimon/hs/graal/sdk/mxbuild/darwin-amd64/GRAALVM_LIBGRAAL_JAVA16/graalvm-libgraal-java16-20.2.0-dev/lib/libjvmcicompiler.dylib > Event: 0.073 Thread 0x00007fa01b02bc00 created JavaVM[1]@0x00000001409a3cb0 for JVMCI runtime 0 > Event: 0.073 Thread 0x00007fa01b02bc00 initializing JVMCI runtime 0 > Event: 0.074 Thread 0x00007fa01b02bc00 initialized JVMCI runtime 0 > Event: 0.082 Thread 0x00007fa01b02bc00 initializing JVMCI runtime -1 > Event: 0.088 Thread 0x00007fa01b02bc00 initialized JVMCI runtime -1 > > -Doug > > From doug.simon at oracle.com Wed Jul 1 22:41:14 2020 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 2 Jul 2020 00:41:14 +0200 Subject: RFR: 8248321: [JVMCI] improve libgraal logging and fatal error handling In-Reply-To: <40d76f1c-d3cb-466f-ab14-b9cf9fdbd097@oracle.com> References: <40d76f1c-d3cb-466f-ab14-b9cf9fdbd097@oracle.com> Message-ID: <0EC13073-ACCD-4E13-8E05-F25DF5162F24@oracle.com> Thanks. I will be sure to run testing. -Doug > On 2 Jul 2020, at 00:26, Vladimir Kozlov wrote: > > Looks good. > > Please, run usual testing before push. > > Thanks, > Vladimir > > On 7/1/20 2:55 PM, Doug Simon wrote: >> Please review this change that: >> 1. Sends log output from libgraal for options such as -Dlibgraal.PrintGC=true to HotSpot's tty stream. >> 2. Forwards a fatal error in libgraal to HotSpot's report_fatal function so that a proper hs_err_pid crash log is produced. >> 3. Adds coarse grained JVMCI events to the hs_eer_pid crash log that can help diagnose libgraal crashes. 
>> https://bugs.openjdk.java.net/browse/JDK-8248321 >> https://cr.openjdk.java.net/~dnsimon/8248321/webrev.00/ >> Testing: hs-tier1,hs-tier2,hs-tier3-graal,hs-tier4-graal >> I?ve also tested this on a JDK 16 libgraal build (thanks to Bob?s recent fixes ) using the -Dlibgraal.CrashAtIsFatal=true option introduced for testing purposes. Here are extracts from the resulting hs_err_pid log: >> Stack: [0x000070000404e000,0x000070000424e000], sp=0x000070000424cfa0, free space=2043k >> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) >> V [libjvm.dylib+0xa74706] _ZN7VMError14report_and_dieEiPKcS1_P13__va_list_tagP6ThreadPhPvS7_S1_im+0x696 >> V [libjvm.dylib+0xa74dcb] _ZN7VMError14report_and_dieEP6ThreadPvPKciS4_S4_P13__va_list_tag+0x3b >> V [libjvm.dylib+0x2ffeb6] _Z12report_fatalPKciS0_z+0xb6 >> V [libjvm.dylib+0x623b2e] _ZL6_fatalv+0x1e >> C [libjvmcicompiler.dylib+0x50c2e] FunctionPointerLogHandler_fatalError_45f632dec0d6a0795524f3a791e61bc3381552ca+0x5e >> C [libjvmcicompiler.dylib+0x6251d9] GraalCompiler_notifyCrash_6e5abb0717b70e82f6be0f6751e33644079f0e7c+0x199 >> C [libjvmcicompiler.dylib+0x622f36] GraalCompiler_checkForRequestedCrash_a1f0e6b1c079f96a46be20bd2ccc87fb7db83871+0x256 >> C [libjvmcicompiler.dylib+0x623929] GraalCompiler_compile_5fc27c66103532b8aadfba9a53a0cfc56727e415+0x209 >> C [libjvmcicompiler.dylib+0x623e80] GraalCompiler_compileGraph_7c727cf4f7ff3555660a81773d74fd53c28861a9+0x1e0 >> C [libjvmcicompiler.dylib+0x742259] HotSpotGraalCompiler_compileHelper_d3a966217707633929a5b5a4a7670fbd583caf11+0x419 >> C [libjvmcicompiler.dylib+0x741d95] HotSpotGraalCompiler_compile_80896636e2e15249ae0fc7c3c7f4cb060aca0523+0x165 >> JVMCI Events (8 events): >> Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime 0 (0x00007fa01af24040) >> Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime -1 (0x00007fa01af240a0) >> Event: 0.072 Thread 0x00007fa01b02bc00 loaded JVMCI shared library from 
/Users/dnsimon/hs/graal/sdk/mxbuild/darwin-amd64/GRAALVM_LIBGRAAL_JAVA16/graalvm-libgraal-java16-20.2.0-dev/lib/libjvmcicompiler.dylib
>> Event: 0.073 Thread 0x00007fa01b02bc00 created JavaVM[1]@0x00000001409a3cb0 for JVMCI runtime 0
>> Event: 0.073 Thread 0x00007fa01b02bc00 initializing JVMCI runtime 0
>> Event: 0.074 Thread 0x00007fa01b02bc00 initialized JVMCI runtime 0
>> Event: 0.082 Thread 0x00007fa01b02bc00 initializing JVMCI runtime -1
>> Event: 0.088 Thread 0x00007fa01b02bc00 initialized JVMCI runtime -1
>> -Doug

From dean.long at oracle.com Thu Jul 2 00:08:35 2020
From: dean.long at oracle.com (Dean Long)
Date: Wed, 1 Jul 2020 17:08:35 -0700
Subject: RFR(XL) 8247922: Update Graal
Message-ID:

https://bugs.openjdk.java.net/browse/JDK-8247922
http://cr.openjdk.java.net/~dlong/8247922/webrev/

This is a Graal update. Changes since the last update (JDK-8243380) are listed in the bug description.

dl

From vladimir.kozlov at oracle.com Thu Jul 2 00:23:56 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 1 Jul 2020 17:23:56 -0700
Subject: [16] RFR(XS) 8076985: Allocation path: biased locking + compressed oops code quality
Message-ID: <580c31f3-b86e-a6c3-ca61-2d6104a846f8@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8076985
https://cr.openjdk.java.net/~kvn/8076985/webrev.00/

First, this is about how C2 generates code for *constant* class pointers.

A little history here. When we implemented compressed oops and class pointers we had PermGen and classes were Java objects. We used the same decoding/encoding code for oops and classes - we used the same register containing the Heap Base address. It was profitable to decode a constant class and reuse it [1]. Also we greatly benefited on SPARC since decoding a 32-bit constant required 4 instructions instead of up to 7 instructions to load a 64-bit constant.

Now compressed class decoding is different and always takes 2 instructions on x86 [2] if either base or shift is not 0.
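
To make the decoding concrete, here is a minimal sketch (not taken from the webrev) of the arithmetic behind compressed class pointer decoding, written in Java purely for illustration. The class and method names are hypothetical; the constants are the ones used in the example in this message (base = 0, shift = 3, narrow value 0x200001d5).

```java
public class NarrowKlassDecode {
    // Hypothetical helper: full pointer = base + (narrow << shift).
    // The narrow value is treated as an unsigned 32-bit quantity.
    static long decode(long base, int shift, int narrow) {
        return base + (Integer.toUnsignedLong(narrow) << shift);
    }

    public static void main(String[] args) {
        long full = decode(0L, 3, 0x200001d5);
        // prints 0x100000EA8
        System.out.println("0x" + Long.toHexString(full).toUpperCase());
    }
}
```

The result is the 64-bit constant that a single move instruction can load directly, which is why decoding at compile time removes the need for separate base and shift instructions.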
As a result we generated 3 instructions to get the full class pointer from a compressed 32-bit constant (example for base = 0, shift = 3):

    movl   $0x200001d5,%r11d
    movabs $0x0,%r10
    lea    (%r10,%r11,8),%r10

Also, when we store a compressed class pointer into a new object header we no longer use a register on x86 - keeping it in a register does not help now:

    movl   $0x200001d5,0x8(%rax)

Aleksey suggested using only one instruction to load the full 64-bit class pointer:

    movq   $0x100000EA8,%r10

It frees one register and uses 10 bytes instead of up to 20 bytes of code on x86.

In JDK 9 SAP contributed a nice change [3] that allows choosing between 'compressed class pointer + decoding' and a full '64-bit constant class pointer'. It significantly simplified the changes for this RFE.

I ran performance testing but did not see a difference - we don't use biased locking now and as a result we don't need to load the prototype header from the class. But there are other places where we need to load from the class.

Thanks,
Vladimir K

[1] https://bugs.openjdk.java.net/browse/JDK-6709093
http://hg.openjdk.java.net/jdk8u/jdk8u-dev/hotspot/rev/44abbb0d4c18

Instead of generating this:

    movl R11, narrowoop: precise klass jnt/scimark2/Random: 0x000000000083b418:Constant:exact * # compressed ptr
    movq R10, precise klass jnt/scimark2/Random: 0x000000000083b418:Constant:exact * # ptr
    movq R10, [R10 + #176 (32-bit)] # ptr
    movq [RAX], R10 # ptr
    movl [RAX + #8 (8-bit)], R11 # compressed ptr

generate this:

    movl R11, narrowoop: precise klass Point: 0x00000000007ad518:Constant:exact * # compressed ptr
    movq R10, [R12 + R11 << 3 + #176] (compressed oop addressing) # ptr
    movq [R8], R10 # ptr
    movl [R8 + #8 (8-bit)], R11 # compressed ptr

[2] http://hg.openjdk.java.net/jdk/jdk/file/c5ed42533134/src/hotspot/cpu/x86/macroAssembler_x86.cpp#l4609
[3] https://bugs.openjdk.java.net/browse/JDK-8155729

From tobias.hartmann at oracle.com Thu Jul 2 06:43:23 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 2 Jul 2020 08:43:23 +0200
Subject:
RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> References: <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> Message-ID: <4d71bb09-2569-4d01-16cc-707ce61d23de@oracle.com> Hi Felix, On 30.06.20 19:06, Tobias Hartmann wrote: > I'll run some perf and correctness testing and report back once it finished. All passed. Best regards, Tobias From rwestrel at redhat.com Thu Jul 2 07:03:37 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 02 Jul 2020 09:03:37 +0200 Subject: [15] RFR(S): 8248388: ZGC: Load barrier incorrectly elided in jdk/java/text/Format/DateFormat/SDFTCKZoneNamesTest.java In-Reply-To: <31bc9579-55d2-8cdb-6ad1-58fb43f30c91@oracle.com> References: <31bc9579-55d2-8cdb-6ad1-58fb43f30c91@oracle.com> Message-ID: <87v9j69z5i.fsf@redhat.com> > http://cr.openjdk.java.net/~neliasso/8248388/webrev.01/ Looks good. Roland. From rwestrel at redhat.com Thu Jul 2 07:08:10 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 02 Jul 2020 09:08:10 +0200 Subject: [16] RFR(XS) 8076985: Allocation path: biased locking + compressed oops code quality In-Reply-To: <580c31f3-b86e-a6c3-ca61-2d6104a846f8@oracle.com> References: <580c31f3-b86e-a6c3-ca61-2d6104a846f8@oracle.com> Message-ID: <87sgea9yxx.fsf@redhat.com> > https://cr.openjdk.java.net/~kvn/8076985/webrev.00/ Looks good to me. Roland. 
From rwestrel at redhat.com Thu Jul 2 07:10:01 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 02 Jul 2020 09:10:01 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <4d051aec-56ef-b35e-f082-2f6305ec1694@oracle.com> <9146b58c-9353-dcb8-827e-7f92a85cecd2@oracle.com> <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> Message-ID: <87pn9e9yuu.fsf@redhat.com> > Updated webrev: http://cr.openjdk.java.net/~fyang/8243670/webrev.03/ Looks good to me. Roland. From christian.hagedorn at oracle.com Thu Jul 2 07:33:24 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 2 Jul 2020 09:33:24 +0200 Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs Message-ID: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8247743 http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ The testcase creates a deep graph with a lot of nodes on a chain. When running with the specified test flags, it recursively calls Node::find_recur() for each node discovered which eventually results in a segmentation fault due to a stack overflow (around 10000 calls due to such a long chain of nodes). The fix just converts the recursive algorithm into an iterative one to avoid a segmentation fault. This is similar to JDK-8246203 [1]. I additionally removed Node::find_ctrl() and its special handling in the algorithm since it is not used. There is actually another problem with the recursive version. When running the testcase without -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever inside [2] because there is a debug_orig node cycle and the loop does not break based on the debug_orig nodes being visited. This is also fixed in the patch. Thank you! 
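The recursive-to-iterative conversion described above can be illustrated with a minimal Java sketch. It is not the code from the webrev: GraphNode, its fields and IterativeFindSketch are hypothetical stand-ins for HotSpot's C++ Node class. It only shows the general technique, an explicit worklist instead of recursion plus a visited set so that cycles cannot cause endless looping.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified node type; it is not HotSpot's Node class.
final class GraphNode {
    final int idx;
    final List<GraphNode> inputs = new ArrayList<>();

    GraphNode(int idx) {
        this.idx = idx;
    }
}

final class IterativeFindSketch {
    // Searches the graph reachable from 'start' for a node with index 'idx'.
    // The explicit worklist keeps native stack usage constant for long chains,
    // and the visited set makes the loop terminate even if the graph has cycles.
    static GraphNode find(GraphNode start, int idx) {
        Deque<GraphNode> worklist = new ArrayDeque<>();
        Set<GraphNode> visited = new HashSet<>();
        worklist.push(start);
        visited.add(start);
        while (!worklist.isEmpty()) {
            GraphNode n = worklist.pop();
            if (n.idx == idx) {
                return n;
            }
            for (GraphNode in : n.inputs) {
                if (visited.add(in)) { // add() returns false for already seen nodes
                    worklist.push(in);
                }
            }
        }
        return null; // no node with that index is reachable
    }

    public static void main(String[] args) {
        // Build a long chain with a cycle back to the head.
        GraphNode head = new GraphNode(0);
        GraphNode cur = head;
        for (int i = 1; i < 100_000; i++) {
            GraphNode next = new GraphNode(i);
            cur.inputs.add(next);
            cur = next;
        }
        cur.inputs.add(head);
        System.out.println(find(head, 99_999) == cur);
        System.out.println(find(head, -1) == null);
    }
}
```

A chain of this length would be a risk for a naive recursive search, while the worklist version only grows heap-allocated data structures.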
Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8246203 [2] http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589 From adinn at redhat.com Thu Jul 2 08:04:12 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 2 Jul 2020 09:04:12 +0100 Subject: 8248336: AArch64: C2: offset overflow in BoxLockNode::emit In-Reply-To: <2db7b669-63b6-1dbd-6d7a-7bac55144167@redhat.com> References: <3fa560fa-c1fd-0131-10d2-040bac25b7f7@redhat.com> <2db7b669-63b6-1dbd-6d7a-7bac55144167@redhat.com> Message-ID: <3b578127-9a25-3bc9-c9e5-12bbc3f366ce@redhat.com> On 25/06/2020 17:48, Andrew Haley wrote: > On 25/06/2020 17:31, Andrew Haley wrote: >> BoxLockNode::emit only allows a 12-bit offset from register SP to the >> stack slot that contains the inflated lock. Rather amazingly we've >> never seen this fail in production, but in theory a BoxLockNode can be >> anywhere in the stack frame. >> >> I have once seen this fail in test code, but it is very hard to >> reproduce. > > http://cr.openjdk.java.net/~aph/8248336/ Sorry, I checked the patch when you posted the webrev and I thought I had posted an ack but clearly did not. This is fine. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From aph at redhat.com Thu Jul 2 08:15:42 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 2 Jul 2020 09:15:42 +0100 Subject: Stack allocation prototype for C2 In-Reply-To: References: Message-ID: <0f98b198-0769-08fc-f1ff-553eadcede22@redhat.com> On 29/06/2020 22:05, Charlie Gracie wrote: > Here is the prototype code for our work on adding stack allocation > to the HotSpot C2 compiler. We are looking for any and all feedback > as we hope to move from a prototype to something that could be > contributed. We certainly need a repo where it can go. 
It could either be adopted by an existing project or it could have a project of its own. The latter is perhaps a bad idea because it would be too isolated. > A change of this size is difficult to review so we understand the > process will be thorough and will take time to complete. Any > suggestions on how to allow for collaboration with others, if they > wanted to, would also be appreciated (i.e., a repo somewhere). Here's my concern. Java stacks are, in general, pretty small. This is good, and makes for economical memory usage. This is particularly useful for Project Loom, where there can be enormous numbers of "virtual" threads. These threads, while they are not active, are stored in the heap. As you might imagine, the idea of embedded objects (which, of course, cannot be collected) in these virtual threads does not delight me at all. Is this likely to be a real problem, do you think, or are all of the stack-allocated objects so small that I shouldn't be concerned? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From dalibor.topic at oracle.com Thu Jul 2 08:59:40 2020 From: dalibor.topic at oracle.com (Dalibor Topic) Date: Thu, 2 Jul 2020 10:59:40 +0200 Subject: Stack allocation prototype for C2 In-Reply-To: References: Message-ID: On 29.06.2020 23:05, Charlie Gracie wrote: > Hi hotspot-compiler-dev community, > > Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback > as we hope to move from a prototype to something that could be contributed. A change of this size is difficult to review so we > understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others, > if they wanted to, would also be appreciated (i.e., a repo somewhere). 
Hi Charlie,

You may want to take a look at https://cr.openjdk.java.net/~chegar/docs/sandbox.html

"The primary purpose of the JDK Sandbox Development Repository is to facilitate OpenJDK developers that are working on non-trivial changes, possibly JEP-scale effort, whose scope and duration make it necessary to collaborate with others in an open shared version control system, rather than just using privately shared patches."

cheers,
dalibor topic

--
Dalibor Topic
Consulting Product Manager
Phone: +494089091214, Mobile: +491737185961, Video: dalibor.topic at oracle.com
Oracle Global Services Germany GmbH
Hauptverwaltung: Riesstr. 25, D-80992 München
Registergericht: Amtsgericht München, HRB 246209
Geschäftsführer: Ralf Herrmann

From zhuoren.wz at alibaba-inc.com Thu Jul 2 07:36:24 2020
From: zhuoren.wz at alibaba-inc.com (Wang Zhuo(Zhuoren))
Date: Thu, 02 Jul 2020 15:36:24 +0800
Subject: [aarch64-port-dev ] RFR(XXS):8248570 Incorrect copyright header in TestUnsafeUnalignedSwap.java
Message-ID: <587101a8-7cb0-453b-aed5-4edca2cdda2d.zhuoren.wz@alibaba-inc.com>

Hi,
There's something wrong in the legal notice of the TestUnsafeUnalignedSwap.java file. It should be GPLv2 as in `make/templates/gpl-header`. This patch (from Vladimir Kozlov) fixes it.
BUG Link: https://bugs.openjdk.java.net/browse/JDK-8248570
CR: http://cr.openjdk.java.net/~wzhuo/8248570/webrev.00/

Regards,
Zhuoren

From vladimir.x.ivanov at oracle.com Thu Jul 2 12:01:16 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 2 Jul 2020 15:01:16 +0300
Subject: Stack allocation prototype for C2
In-Reply-To:
References:
Message-ID: <6a3c74b0-daf8-04c2-76b0-dc2ce3714314@oracle.com>

Hi Charlie and Nikola,

> Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback
> as we hope to move from a prototype to something that could be contributed.
A change of this size is difficult to review so we
> understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others,
> if they wanted to, would also be appreciated (i.e., a repo somewhere).
>
> For a quick refresher here is a link to Nikola's talk at FOSDEM:
> https://fosdem.org/2020/schedule/event/reducing_gc_times/
>
> Here is a link to our initial webrev:
> http://cr.openjdk.java.net/~adityam/charlie/stack_alloc/
>
> Expecting that a change like this will require a JEP, we have prepared a document describing our work based off of the JEP submission
> form. Our document has a few extra sections at the end discussing areas that we are looking for guidance on and some initial
> performance results. This document can be found here:
> https://github.com/microsoft/openjdk-proposals/blob/master/stack_allocation/Stack_Allocation_JEP.md

Very nice write-up and design overview!

"To implement stack allocation, we need to modify the C2 compiler, the GCs and some of the VM runtime interfaces"

From the design overview and the implementation, I'm concerned about far-reaching consequences of the chosen approach. It's not limited just to the existing set of JVM features, but as Andrew noted will affect the design of forthcoming functionality as well.

I think it's worth starting a broad discussion (HotSpot-wide) and deciding how much JVM design complexity budget is worth spending on such an optimization. Personally, I'm not convinced that supporting stack allocated objects in the JVM is justified.

As we discussed off-line (right after FOSDEM), I do see the benefits of in-memory representation for non-escaping objects: memory aliasing (either indeterminate base or indexed access) imposes inherent constraints on the escape analysis (both partial and conservative approaches suffer from it).
Nevertheless, some of the problematic cases can be addressed by improving the existing approach or introducing a more powerful analysis: covering more cases and making the analysis control-sensitive should improve the situation.

Also, the alternative approach (called zone-based heap allocation) looks very attractive to me. I haven't thought it through, but it looks like keeping the objects on the Java heap can save us a lot of complexity on the implementation side (more memory available for allocation - not necessarily a fixed amount, no need to migrate objects from stack to heap, GC barriers are unaffected, etc.).

For example, reserving a dedicated TLAB (or a stack of TLABs?) and doing nmethod-scoped allocations from C2 code looks attractive. It can simplify many aspects of the implementation: much more space available, free migration of non-escaping objects to heap on deoptimization.

Another idea:

"When dealing with stack allocated objects in loops we need a lifetime overlap check."

It doesn't look specific to stack-allocated objects. Non-overlapping live ranges can be coalesced the same way for on-heap freshly allocated objects. It should get a comparable reduction in allocation pressure (single allocation per loop vs allocation per iteration) and doesn't require stack allocation support at all (as an example [1]).

If such improvements are enabled for non-escaping on-heap objects, how much benefit will stack allocation bring on top of that? IMO the performance gap should become much narrower.

Best regards,
Vladimir Ivanov

[1]
  MyObject a = new MyObject(x1); // no aliasing: always accessed through "a"
  for (...) {
    ...
    a = new MyObject(x2);
    ...
  }
  return a.x;

can be turned into:

  MyObject a = new MyObject(x1);
  for (...) {
    ...
    a.x = x2; // plus, re-initialize other instance fields
    ...
  }
  return a.x;

It can even be extended for escaping objects in some cases (while the object is provably not escaped).
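The rewrite in [1] can be spelled out as runnable Java. This is only a source-level illustration under stated assumptions: MyObject here is a hypothetical class with a single int field, and the "transformed" method is written by hand to show what the compiler would be doing internally, not something C2 is known to do today.

```java
// Hypothetical stand-in for the MyObject class from [1]; only one field is modelled.
final class MyObject {
    int x;

    MyObject(int x) {
        this.x = x;
    }
}

final class LoopAllocationSketch {
    // Original shape: a fresh object is allocated on every iteration,
    // and the previous one becomes garbage immediately.
    static int allocatePerIteration(int x1, int[] values) {
        MyObject a = new MyObject(x1); // no aliasing: always accessed through "a"
        for (int x2 : values) {
            a = new MyObject(x2);
        }
        return a.x;
    }

    // Transformed shape: the live ranges of the per-iteration objects do not
    // overlap, so one instance can be re-initialized in place instead.
    static int reuseSingleInstance(int x1, int[] values) {
        MyObject a = new MyObject(x1);
        for (int x2 : values) {
            a.x = x2; // a class with more fields would re-initialize all of them here
        }
        return a.x;
    }

    public static void main(String[] args) {
        int[] values = {3, 5, 8};
        System.out.println(allocatePerIteration(1, values));
        System.out.println(reuseSingleInstance(1, values));
    }
}
```

Both methods return the same value for any input; the second performs a single allocation per call instead of one per iteration, which is the reduction in allocation pressure the mail refers to. The equivalence only holds while no other reference to the object exists.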
From richard.reingruber at sap.com Thu Jul 2 14:04:57 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Thu, 2 Jul 2020 14:04:57 +0000 Subject: RFR(XS) 8247695: [PPC,S390]: compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails Message-ID: Hi, could I please get reviews for this small bugfix which adds support for AbsL nodes to the C2 backends on PPC and S390? Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8247695/webrev.0/ Bug: https://bugs.openjdk.java.net/browse/JDK-8247695 The patch successfully passes regression testing @SAP which includes JCK and JTREG tests, also in Xcomp mode, SPECjvm2008, SPECjbb2015, Renaissance Suite, SAP specific tests with fastdebug and release builds. Thanks, Richard. From christian.hagedorn at oracle.com Thu Jul 2 14:31:51 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 2 Jul 2020 16:31:51 +0200 Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled Message-ID: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8248596 http://cr.openjdk.java.net/~chagedorn/8248596/webrev.00/ It excludes the execution of this C2 specific test with Graal since it has many methods and runs with CompileOnly, possibly letting it time out with Graal. Best regards, Christian From rwestrel at redhat.com Thu Jul 2 14:58:53 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 02 Jul 2020 16:58:53 +0200 Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled In-Reply-To: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> Message-ID: <87mu4i9d5e.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8248596/webrev.00/ That looks good to me but I've been wondering what the interaction of this and similar issues with libgraal is. 
Presumably, running this test with libgraal wouldn't time out. Does:

@requires !vm.graal.enabled

cover both graal and libgraal? If so, is the plan to reevaluate all those additions once libgraal becomes the standard way of running graal? Or is there another way to override the requirement so that if you run with libgraal, all tests are run?

Roland.

From tom.rodriguez at oracle.com Thu Jul 2 15:50:43 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Thu, 2 Jul 2020 08:50:43 -0700
Subject: RFR: 8248321: [JVMCI] improve libgraal logging and fatal error handling
In-Reply-To:
References:
Message-ID: <4d9b0f80-35be-6ccc-6bf0-ea2b80b485a1@oracle.com>

Looks good.

tom

Doug Simon wrote on 7/1/20 2:55 PM:
> Please review this change that:
>
> 1. Sends log output from libgraal for options such as -Dlibgraal.PrintGC=true to HotSpot's tty stream.
> 2. Forwards a fatal error in libgraal to HotSpot's report_fatal function so that a proper hs_err_pid crash log is produced.
> 3. Adds coarse grained JVMCI events to the hs_err_pid crash log that can help diagnose libgraal crashes.
>
> https://bugs.openjdk.java.net/browse/JDK-8248321
> https://cr.openjdk.java.net/~dnsimon/8248321/webrev.00/
>
> Testing: hs-tier1,hs-tier2,hs-tier3-graal,hs-tier4-graal
>
> I've also tested this on a JDK 16 libgraal build (thanks to Bob's recent fixes) using the -Dlibgraal.CrashAtIsFatal=true option introduced for testing purposes.
Here are extracts from the resulting hs_err_pid log: > > Stack: [0x000070000404e000,0x000070000424e000], sp=0x000070000424cfa0, free space=2043k > Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code) > V [libjvm.dylib+0xa74706] _ZN7VMError14report_and_dieEiPKcS1_P13__va_list_tagP6ThreadPhPvS7_S1_im+0x696 > V [libjvm.dylib+0xa74dcb] _ZN7VMError14report_and_dieEP6ThreadPvPKciS4_S4_P13__va_list_tag+0x3b > V [libjvm.dylib+0x2ffeb6] _Z12report_fatalPKciS0_z+0xb6 > V [libjvm.dylib+0x623b2e] _ZL6_fatalv+0x1e > C [libjvmcicompiler.dylib+0x50c2e] FunctionPointerLogHandler_fatalError_45f632dec0d6a0795524f3a791e61bc3381552ca+0x5e > C [libjvmcicompiler.dylib+0x6251d9] GraalCompiler_notifyCrash_6e5abb0717b70e82f6be0f6751e33644079f0e7c+0x199 > C [libjvmcicompiler.dylib+0x622f36] GraalCompiler_checkForRequestedCrash_a1f0e6b1c079f96a46be20bd2ccc87fb7db83871+0x256 > C [libjvmcicompiler.dylib+0x623929] GraalCompiler_compile_5fc27c66103532b8aadfba9a53a0cfc56727e415+0x209 > C [libjvmcicompiler.dylib+0x623e80] GraalCompiler_compileGraph_7c727cf4f7ff3555660a81773d74fd53c28861a9+0x1e0 > C [libjvmcicompiler.dylib+0x742259] HotSpotGraalCompiler_compileHelper_d3a966217707633929a5b5a4a7670fbd583caf11+0x419 > C [libjvmcicompiler.dylib+0x741d95] HotSpotGraalCompiler_compile_80896636e2e15249ae0fc7c3c7f4cb060aca0523+0x165 > > > JVMCI Events (8 events): > Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime 0 (0x00007fa01af24040) > Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime -1 (0x00007fa01af240a0) > Event: 0.072 Thread 0x00007fa01b02bc00 loaded JVMCI shared library from /Users/dnsimon/hs/graal/sdk/mxbuild/darwin-amd64/GRAALVM_LIBGRAAL_JAVA16/graalvm-libgraal-java16-20.2.0-dev/lib/libjvmcicompiler.dylib > Event: 0.073 Thread 0x00007fa01b02bc00 created JavaVM[1]@0x00000001409a3cb0 for JVMCI runtime 0 > Event: 0.073 Thread 0x00007fa01b02bc00 initializing JVMCI runtime 0 > Event: 0.074 Thread 0x00007fa01b02bc00 
initialized JVMCI runtime 0 > Event: 0.082 Thread 0x00007fa01b02bc00 initializing JVMCI runtime -1 > Event: 0.088 Thread 0x00007fa01b02bc00 initialized JVMCI runtime -1 > > -Doug > > From martin.doerr at sap.com Thu Jul 2 15:57:22 2020 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 2 Jul 2020 15:57:22 +0000 Subject: RFR(M): 8248191: PPC: Implement Load/Store Vector with lxvl/stxvl in Power10 In-Reply-To: <20200701194910.GA141565@pacoca> References: <20200701194910.GA141565@pacoca> Message-ID: Where do we save xxswapd instructions? I can't see it in the webrev. Best regards, Martin > -----Original Message----- > From: hotspot-compiler-dev retn at openjdk.java.net> On Behalf Of joserz at linux.ibm.com > Sent: Mittwoch, 1. Juli 2020 21:49 > To: hotspot-compiler-dev at openjdk.java.net > Cc: Michihiro Horie > Subject: RFR(M): 8248191: PPC: Implement Load/Store Vector with lxvl/stxvl > in Power10 > > This patch introduces two instructions lxvl/stvxl and replaces the current > lxvd2x/stxvd2x to load and store vectors. Like lxvd2x/stxvd2x, lxvl/stxvl can > access unaligned effective addresses with the advantage of *not* requiring > xxswapd after lxvd2x (or before stxvd2x) to correct the lanes in little-endian > mode. > > Webrev: https://cr.openjdk.java.net/~mhorie/8248191/webrev.00/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8248191 > > Thanks for your review! > > Jose R. Ziviani From doug.simon at oracle.com Thu Jul 2 16:00:23 2020 From: doug.simon at oracle.com (Doug Simon) Date: Thu, 2 Jul 2020 18:00:23 +0200 Subject: RFR: 8248321: [JVMCI] improve libgraal logging and fatal error handling In-Reply-To: <4d9b0f80-35be-6ccc-6bf0-ea2b80b485a1@oracle.com> References: <4d9b0f80-35be-6ccc-6bf0-ea2b80b485a1@oracle.com> Message-ID: Thanks Tom. -Doug > On 2 Jul 2020, at 17:50, Tom Rodriguez wrote: > > Looks good. > > tom > > Doug Simon wrote on 7/1/20 2:55 PM: >> Please review this change that: >> 1. 
Sends log output from libgraal for options such as -Dlibgraal.PrintGC=true to HotSpot's tty stream.
>> 2. Forwards a fatal error in libgraal to HotSpot's report_fatal function so that a proper hs_err_pid crash log is produced.
>> 3. Adds coarse grained JVMCI events to the hs_err_pid crash log that can help diagnose libgraal crashes.
>> https://bugs.openjdk.java.net/browse/JDK-8248321
>> https://cr.openjdk.java.net/~dnsimon/8248321/webrev.00/
>> Testing: hs-tier1,hs-tier2,hs-tier3-graal,hs-tier4-graal
>> I've also tested this on a JDK 16 libgraal build (thanks to Bob's recent fixes) using the -Dlibgraal.CrashAtIsFatal=true option introduced for testing purposes. Here are extracts from the resulting hs_err_pid log:
>> Stack: [0x000070000404e000,0x000070000424e000], sp=0x000070000424cfa0, free space=2043k
>> Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
>> V [libjvm.dylib+0xa74706] _ZN7VMError14report_and_dieEiPKcS1_P13__va_list_tagP6ThreadPhPvS7_S1_im+0x696
>> V [libjvm.dylib+0xa74dcb] _ZN7VMError14report_and_dieEP6ThreadPvPKciS4_S4_P13__va_list_tag+0x3b
>> V [libjvm.dylib+0x2ffeb6] _Z12report_fatalPKciS0_z+0xb6
>> V [libjvm.dylib+0x623b2e] _ZL6_fatalv+0x1e
>> C [libjvmcicompiler.dylib+0x50c2e] FunctionPointerLogHandler_fatalError_45f632dec0d6a0795524f3a791e61bc3381552ca+0x5e
>> C [libjvmcicompiler.dylib+0x6251d9] GraalCompiler_notifyCrash_6e5abb0717b70e82f6be0f6751e33644079f0e7c+0x199
>> C [libjvmcicompiler.dylib+0x622f36] GraalCompiler_checkForRequestedCrash_a1f0e6b1c079f96a46be20bd2ccc87fb7db83871+0x256
>> C [libjvmcicompiler.dylib+0x623929] GraalCompiler_compile_5fc27c66103532b8aadfba9a53a0cfc56727e415+0x209
>> C [libjvmcicompiler.dylib+0x623e80] GraalCompiler_compileGraph_7c727cf4f7ff3555660a81773d74fd53c28861a9+0x1e0
>> C [libjvmcicompiler.dylib+0x742259] HotSpotGraalCompiler_compileHelper_d3a966217707633929a5b5a4a7670fbd583caf11+0x419
>> C [libjvmcicompiler.dylib+0x741d95]
HotSpotGraalCompiler_compile_80896636e2e15249ae0fc7c3c7f4cb060aca0523+0x165 >> JVMCI Events (8 events): >> Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime 0 (0x00007fa01af24040) >> Event: 0.015 Thread 0x00007fa00b011600 created new JVMCI runtime -1 (0x00007fa01af240a0) >> Event: 0.072 Thread 0x00007fa01b02bc00 loaded JVMCI shared library from /Users/dnsimon/hs/graal/sdk/mxbuild/darwin-amd64/GRAALVM_LIBGRAAL_JAVA16/graalvm-libgraal-java16-20.2.0-dev/lib/libjvmcicompiler.dylib >> Event: 0.073 Thread 0x00007fa01b02bc00 created JavaVM[1]@0x00000001409a3cb0 for JVMCI runtime 0 >> Event: 0.073 Thread 0x00007fa01b02bc00 initializing JVMCI runtime 0 >> Event: 0.074 Thread 0x00007fa01b02bc00 initialized JVMCI runtime 0 >> Event: 0.082 Thread 0x00007fa01b02bc00 initializing JVMCI runtime -1 >> Event: 0.088 Thread 0x00007fa01b02bc00 initialized JVMCI runtime -1 >> -Doug From goetz.lindenmaier at sap.com Thu Jul 2 16:45:07 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 2 Jul 2020 16:45:07 +0000 Subject: [CAUTION] RFR(XS) 8247695: [PPC, S390]: compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails In-Reply-To: References: Message-ID: Hi Richard, I had a look at your change, looks good. Reviewed. Thanks for fixing this. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Reingruber, Richard > Sent: Thursday, July 2, 2020 4:05 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(XS) 8247695: [PPC, S390]: > compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails > > Hi, > > could I please get reviews for this small bugfix which adds support for AbsL > nodes to the C2 > backends on PPC and S390? 
>
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8247695/webrev.0/
> Bug: https://bugs.openjdk.java.net/browse/JDK-8247695
>
> The patch successfully passes regression testing @SAP which includes JCK and JTREG tests, also in
> Xcomp mode, SPECjvm2008, SPECjbb2015, Renaissance Suite, SAP specific tests with fastdebug and
> release builds.
>
> Thanks, Richard.

From vladimir.kozlov at oracle.com Thu Jul 2 17:05:14 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 2 Jul 2020 10:05:14 -0700
Subject: [16] RFR(XS) 8076985: Allocation path: biased locking + compressed oops code quality
In-Reply-To: <87sgea9yxx.fsf@redhat.com>
References: <580c31f3-b86e-a6c3-ca61-2d6104a846f8@oracle.com> <87sgea9yxx.fsf@redhat.com>
Message-ID:

Thank you, Roland

Vladimir K

On 7/2/20 12:08 AM, Roland Westrelin wrote:
>
>> https://cr.openjdk.java.net/~kvn/8076985/webrev.00/
>
> Looks good to me.
>
> Roland.
>

From vladimir.kozlov at oracle.com Thu Jul 2 17:12:00 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 2 Jul 2020 10:12:00 -0700
Subject: RFR(XL) 8247922: Update Graal
In-Reply-To:
References:
Message-ID:

Looks good. Thank you for linking failures to bugs.

Vladimir K.

On 7/1/20 5:08 PM, Dean Long wrote:
> https://bugs.openjdk.java.net/browse/JDK-8247922
> http://cr.openjdk.java.net/~dlong/8247922/webrev/
>
> This is a Graal update. Changes since the last update (JDK-8243380) are listed in the bug description.
>
> dl
>
>

From vladimir.kozlov at oracle.com Thu Jul 2 17:30:01 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 2 Jul 2020 10:30:01 -0700
Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled
In-Reply-To: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com>
References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com>
Message-ID: <660c7d05-520a-ae36-c608-36dfba5eebf2@oracle.com>

I think it should require vm.compiler2.enabled because this test is very C2 specific.

Note, Graal and C2 are mutually exclusive.

Thanks,
Vladimir K

On 7/2/20 7:31 AM, Christian Hagedorn wrote:
> Hi
>
> Please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8248596
> http://cr.openjdk.java.net/~chagedorn/8248596/webrev.00/
>
> It excludes the execution of this C2 specific test with Graal since it has many methods and runs with CompileOnly,
> possibly letting it time out with Graal.
>
> Best regards,
> Christian

From vladimir.kozlov at oracle.com Thu Jul 2 17:42:12 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 2 Jul 2020 10:42:12 -0700
Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled
In-Reply-To: <87mu4i9d5e.fsf@redhat.com>
References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> <87mu4i9d5e.fsf@redhat.com>
Message-ID: <905225a7-8e07-ba19-f9b4-d5fad89e68ce@oracle.com>

On 7/2/20 7:58 AM, Roland Westrelin wrote:
>
>> http://cr.openjdk.java.net/~chagedorn/8248596/webrev.00/
>
> That looks good to me but I've been wondering what the interaction of
> this and similar issues with libgraal is. Presumably, running this test
> with libgraal wouldn't time out. Does:
>
> @requires !vm.graal.enabled
>
> cover
> both graal and libgraal? If so is the plan to reevaluate all those
> additions once libgraal becomes the standard way of running graal?
Or
> is there an other way to override the requirement so that if you run
> with libgraal, all tests are run?

Yes, we have such a way: ProblemList-graal.txt. We list a test there and link it to the following bug:

https://bugs.openjdk.java.net/browse/JDK-8207267

But sometimes we have a test which only checks very C2 specific functionality. I think it is okay to run it only with C2.

Vladimir K

>
> Roland.
>

From vladimir.kozlov at oracle.com Thu Jul 2 17:49:55 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 2 Jul 2020 10:49:55 -0700
Subject: [aarch64-port-dev ] RFR(XXS):8248570 Incorrect copyright header in TestUnsafeUnalignedSwap.java
In-Reply-To: <587101a8-7cb0-453b-aed5-4edca2cdda2d.zhuoren.wz@alibaba-inc.com>
References: <587101a8-7cb0-453b-aed5-4edca2cdda2d.zhuoren.wz@alibaba-inc.com>
Message-ID:

Thank you, Zhuoren

Checks passed now.

Vladimir K

On 7/2/20 12:36 AM, Wang Zhuo(Zhuoren) wrote:
> Hi,
> There's something wrong in the legal notice of the TestUnsafeUnalignedSwap.java file. It should be GPLv2 as in `make/templates/gpl-header`. This patch (from Vladimir Kozlov) fixes it.
> BUG Link: https://bugs.openjdk.java.net/browse/JDK-8248570
> CR: http://cr.openjdk.java.net/~wzhuo/8248570/webrev.00/
>
>
> Regards,
> Zhuoren
>

From boris.ulasevich at bell-sw.com Thu Jul 2 18:02:49 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Thu, 2 Jul 2020 21:02:49 +0300
Subject: RFR 8248043: Need to eliminate excessive i2l conversions
In-Reply-To: <424D5809-A580-43BD-A00D-B49C470AF280@oracle.com>
References: <096e0df7-8208-2a07-975f-e2de8bc27e3a@bell-sw.com> <75920e44-518e-10e0-53b3-c2a6f85fd841@oracle.com> <0be466e7-057c-b029-3461-de21d9cd3910@bell-sw.com> <424D5809-A580-43BD-A00D-B49C470AF280@oracle.com>
Message-ID: <044d72f2-8895-3070-21fe-937af7fd2bc3@bell-sw.com>

Thank you, Igor and Vladimir.

Boris

On 01.07.2020 22:29, Igor Veresov wrote:
> That looks good.
>
> igor
>
>
>
>> On Jul 1, 2020, at 2:16 AM, Boris Ulasevich wrote:
>>
>> Hi,
>>
>> It is the third attempt to send a correct link. Sorry for that ;)
>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02c
>>
>> Thanks,
>> Boris
>>
>> On Wednesday, July 1, 2020, Boris Ulasevich wrote:
>>
>> Hi,
>>
>> I'm deeply sorry. Yes, webrev.02b is certainly wrong!
>> Correct link is webrev.02c:
>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02c
>> - this is the change I described in my mail and wanted to review.
>>
>> my apologies,
>> Boris
>>
>> On Wednesday, July 1, 2020, Igor Veresov wrote:
>>
>> > On Jun 30, 2020, at 10:15 PM, Vladimir Kozlov wrote:
>> >
>> > I think Igor said that you can't swap arguments of compare without changing condition test. For example, if it was CC_LT it should be CC_GT after swap.
>>
>> Yes, that's exactly what I had in mind. Condition must be inverted. Otherwise your transformation [3] is not valid for anything else but equality, so that's not going to work. May be if [3] didn't work, perhaps there is another user of the CmpLNode in addition to BoolNode ?
>>
>> igor
>>
>> >
>> > It is not clear why you need swapping in CmpLNode::Ideal() if BoolNode::Ideal() should do it already. If it does not you need to investigate why.
>> >
>> > Also your list of steps 1.-3. does not reflect changes in webrev.02b:
>> > http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b/src/hotspot/share/opto/subnode.cpp.udiff.html
>> >
>> > Regards,
>> > Vladimir
>> >
>> > On 6/30/20 9:33 PM, Boris Ulasevich wrote:
>> >> Hi Igor,
>> >> By BoolNode I mean the canonicalization that is already in place:
>> >> https://hg.openjdk.java.net/jdk/jdk/file/de6ad5f86276/src/hotspot/share/opto/subnode.cpp#l1391
>> >> thanks,
>> >> Boris
>> >> On Wed, Jul 1, 2020 at 5:07 AM Igor Veresov wrote:
>> >>> I think you forgot to include changes to BoolNode in the webrev.
>> >>>
>> >>> igor
>> >>>
>> >>>
>> >>>
>> >>> On Jun 30, 2020, at 11:04 AM, Boris Ulasevich wrote:
>> >>>
>> >>> Hi Claes,
>> >>>
>> >>>> Seems like the optimization is mostly effective, but not getting all the way.
>> >>>
>> >>> Good point about LHS, thanks! CmpL turned to be not canonized on the moment.
>> >>> I moved the optimization to CmpLNode::Ideal and transformations now works as follows:
>> >>> 1. CmpINode::Ideal: CmpI(CmpL3)->CmpL
>> >>> 2. BoolNode::Ideal: Bool(CmpL(const,val),test)->Bool(CmpL(val,const),test_invert)
>> >>> 3. CmpLNode::Ideal: CmpL(ConvI2L(val),ConL)->CmpI(val,ConI)
>> >>>
>> >>> I applied your test to the benchmark. The result is:
>> >>> Benchmark                            Mode  Cnt   Score   Error  Units
>> >>> SkipIntToLongCast.skipCastTestLeft   avgt    5  14.288 ± 0.052  ns/op
>> >>> SkipIntToLongCast.skipCastTestRight  avgt    5  14.338 ± 0.088  ns/op
>> >>>
>> >>> Updated webrev:
>> >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.02b
>> >>>
>> >>> thanks,
>> >>> Boris
>> >>>
>> >>> On 26.06.2020 21:31, Claes Redestad wrote:
>> >>>
>> >>> Hi Boris,
>> >>>
>> >>> this looks like a nice improvement! I just have some comments about the
>> >>> micro.
>> >>>
>> >>> I was curious whether the optimization works when the constant is on
>> >>> the LHS and added a variant of the micro to try that[1]. Results are
>> >>> interesting (Intel Xeon):
>> >>>
>> >>> Benchmark                            Mode  Cnt   Score   Error  Units
>> >>> SkipIntToLongCast.skipCastTest       avgt    5  30.937 ± 0.056  ns/op
>> >>> SkipIntToLongCast.skipCastTestLeft   avgt    5  30.937 ± 0.140  ns/op
>> >>>
>> >>> With your patch:
>> >>> Benchmark                            Mode  Cnt   Score   Error  Units
>> >>> SkipIntToLongCast.skipCastTest       avgt    5  14.123 ± 0.035  ns/op
>> >>> SkipIntToLongCast.skipCastTestLeft   avgt    5  17.420 ± 0.044  ns/op
>> >>>
>> >>> Seems like the optimization is mostly effective, but not getting all
>> >>> the way. I wouldn't worry about it for this RFE, but perhaps something
>> >>> to investigate in a follow-up. Feel free to include such a variant in
>> >>> your patch though (no attribution necessary).
>> >>>
>> >>> The micro also stabilizes very quickly, so you might want to provide
>> >>> some default tuning to keep runtime in check, e.g., something like:
>> >>>
>> >>> @Warmup(iterations = 10, time = 500, timeUnit = TimeUnit.MILLISECONDS)
>> >>> @Measurement(iterations = 5, time = 1000, timeUnit = TimeUnit.MILLISECONDS)
>> >>> @Fork(3)
>> >>>
>> >>> Thanks!
>> >>>
>> >>> /Claes
>> >>>
>> >>> [1]
>> >>>     @Benchmark
>> >>>     public int skipCastTestLeft() {
>> >>>         for (int i = 0; i < ARRAYSIZE_L; i++) {
>> >>>             if (ARRAYSIZE_L == intValues[i]) {
>> >>>                 return i;
>> >>>             }
>> >>>         }
>> >>>         return 0;
>> >>>     }
>> >>>
>> >>> On 2020-06-26 17:05, Boris Ulasevich wrote:
>> >>>
>> >>> Hi all,
>> >>>
>> >>> Please review the change to eliminate the unnecessary i2l conversion
>> >>> for expressions like this: "if (intValue == 1L)".
>> >>>
>> >>> http://bugs.openjdk.java.net/browse/JDK-8248043
>> >>> http://cr.openjdk.java.net/~bulasevich/8248043/webrev.01
>> >>>
>> >>> The provided benchmark shows performance boost on all platforms:
>> >>> - Intel Xeon: 32.705 --> 14.234 ns/op
>> >>> - arm64: 42.060 --> 25.456 ns/op
>> >>> - arm32: 618.763 --> 314.040 ns/op
>> >>> - ppc8: 81.218 --> 63.026 ns/op
>> >>>
>> >>> Testing done: jtreg, jck.
>> >>> >> >>> thanks, >> >>> Boris >> >>> >> >>> >> >>> >> >>> >> > From boris.ulasevich at bell-sw.com Thu Jul 2 18:13:44 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Thu, 2 Jul 2020 21:13:44 +0300 Subject: RFR(XS) 8248568: compiler/c2/TestBit.java failed: 'test' missing from stdout/stderr Message-ID: Hi, Please review a one-line change: adding -Xbatch option to recently introduced test to get a more predictable PrintOptoAssembly output. http://cr.openjdk.java.net/~bulasevich/8248568/webrev.00 http://bugs.openjdk.java.net/browse/JDK-8248568 thanks, Boris From vladimir.kozlov at oracle.com Thu Jul 2 18:54:27 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 2 Jul 2020 11:54:27 -0700 Subject: RFR(XS) 8248568: compiler/c2/TestBit.java failed: 'test' missing from stdout/stderr In-Reply-To: References: Message-ID: Good. You may also replace next requirements: vm.flavor == "server" & !vm.graal.enabled with one: vm.compiler2.enabled Graal and C2 are mutually exclusive. May be also run processes without C1 by switching off Tiered Compilation. And instead of: @run main/othervm compiler.c2.TestBit use: @run driver compiler.c2.TestBit Because you launching separate processes. Please, test changes with jtreg testing. Thanks, Vladimir K On 7/2/20 11:13 AM, Boris Ulasevich wrote: > Hi, > > Please review a one-line change: adding -Xbatch option to recently > introduced test to get a more predictable PrintOptoAssembly output. > > http://cr.openjdk.java.net/~bulasevich/8248568/webrev.00 > http://bugs.openjdk.java.net/browse/JDK-8248568 > > thanks, > Boris From richard.reingruber at sap.com Thu Jul 2 19:22:19 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Thu, 2 Jul 2020 19:22:19 +0000 Subject: [CAUTION] RFR(XS) 8247695: [PPC, S390]: compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails In-Reply-To: References: Message-ID: Thank you Goetz! Cheers, Richard. 
-----Original Message----- From: Lindenmaier, Goetz Sent: Thursday, 2 July 2020 18:45 To: Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: RE: [CAUTION] RFR(XS) 8247695: [PPC, S390]: compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails Hi Richard, I had a look at your change, looks good. Reviewed. Thanks for fixing this. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Reingruber, Richard > Sent: Thursday, July 2, 2020 4:05 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(XS) 8247695: [PPC, S390]: > compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails > > Hi, > > could I please get reviews for this small bugfix which adds support for AbsL > nodes to the C2 > backends on PPC and S390? > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8247695/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8247695 > > The patch successfully passes regression testing @SAP which includes JCK > and JTREG tests, also in > Xcomp mode, SPECjvm2008, SPECjbb2015, Renaissance Suite, SAP specific > tests with fastdebug and > release builds. > > Thanks, Richard. From dean.long at oracle.com Thu Jul 2 20:01:30 2020 From: dean.long at oracle.com (Dean Long) Date: Thu, 2 Jul 2020 13:01:30 -0700 Subject: RFR(XL) 8247922: Update Graal In-Reply-To: References: Message-ID: <8afc61d8-a542-0952-7854-13bfca718d2f@oracle.com> Thanks Vladimir. dl On 7/2/20 10:12 AM, Vladimir Kozlov wrote: > Looks good. Thank you for linking failures to bugs. > > Vladimir K. > > On 7/1/20 5:08 PM, Dean Long wrote: >> https://bugs.openjdk.java.net/browse/JDK-8247922 >> http://cr.openjdk.java.net/~dlong/8247922/webrev/ >> >> This is a Graal update. Changes since the last update (JDK-8243380) >> are listed in the bug description.
>> >> dl >> >> >> From boris.ulasevich at bell-sw.com Thu Jul 2 21:29:35 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 3 Jul 2020 00:29:35 +0300 Subject: RFR(XS) 8248568: compiler/c2/TestBit.java failed: 'test' missing from stdout/stderr In-Reply-To: References: Message-ID: Hi Vladimir, Thank you. I applied your suggestions. On our machines jtreg runs well. Update: http://cr.openjdk.java.net/~bulasevich/8248568/webrev.01 regards, Boris On Thu, Jul 2, 2020 at 9:54 PM Vladimir Kozlov wrote: > Good. > > You may also replace next requirements: > > vm.flavor == "server" & !vm.graal.enabled > > with one: > > vm.compiler2.enabled > > Graal and C2 are mutually exclusive. > > May be also run processes without C1 by switching off Tiered Compilation. > > And instead of: > @run main/othervm compiler.c2.TestBit > > use: > @run driver compiler.c2.TestBit > > Because you launching separate processes. > > Please, test changes with jtreg testing. > > Thanks, > Vladimir K > > On 7/2/20 11:13 AM, Boris Ulasevich wrote: > > Hi, > > > > Please review a one-line change: adding -Xbatch option to recently > > introduced test to get a more predictable PrintOptoAssembly output. > > > > http://cr.openjdk.java.net/~bulasevich/8248568/webrev.00 > > http://bugs.openjdk.java.net/browse/JDK-8248568 > > > > thanks, > > Boris > From vladimir.kozlov at oracle.com Thu Jul 2 21:45:49 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 2 Jul 2020 14:45:49 -0700 Subject: RFR(XS) 8248568: compiler/c2/TestBit.java failed: 'test' missing from stdout/stderr In-Reply-To: References: Message-ID: Looks good. Thanks, Vladimir On 7/2/20 2:29 PM, Boris Ulasevich wrote: > Hi Vladimir, > > Thank you. I applied your suggestions. On our machines jtreg runs well. > Update: http://cr.openjdk.java.net/~bulasevich/8248568/webrev.01 > > regards, > Boris > > On Thu, Jul 2, 2020 at 9:54 PM Vladimir Kozlov > wrote: > >> Good. 
>> >> You may also replace next requirements: >> >> vm.flavor == "server" & !vm.graal.enabled >> >> with one: >> >> vm.compiler2.enabled >> >> Graal and C2 are mutually exclusive. >> >> May be also run processes without C1 by switching off Tiered Compilation. >> >> And instead of: >> @run main/othervm compiler.c2.TestBit >> >> use: >> @run driver compiler.c2.TestBit >> >> Because you launching separate processes. >> >> Please, test changes with jtreg testing. >> >> Thanks, >> Vladimir K >> >> On 7/2/20 11:13 AM, Boris Ulasevich wrote: >>> Hi, >>> >>> Please review a one-line change: adding -Xbatch option to recently >>> introduced test to get a more predictable PrintOptoAssembly output. >>> >>> http://cr.openjdk.java.net/~bulasevich/8248568/webrev.00 >>> http://bugs.openjdk.java.net/browse/JDK-8248568 >>> >>> thanks, >>> Boris >> From vladimir.kozlov at oracle.com Fri Jul 3 02:02:06 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 2 Jul 2020 19:02:06 -0700 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC Message-ID: https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8247527 Test should have @requires which excludes running Graal with GC which it does not support. Testing: hs-tier1,hs-tier4-graal Thanks, Vladimir From Yang.Zhang at arm.com Fri Jul 3 02:15:27 2020 From: Yang.Zhang at arm.com (Yang Zhang) Date: Fri, 3 Jul 2020 02:15:27 +0000 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> Message-ID: Hi Sandhya Thanks very much for your help. 
Regards, Yang -----Original Message----- From: Viswanathan, Sandhya Sent: Wednesday, July 1, 2020 2:57 AM To: Yang Zhang ; Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes Hi Yang, I have merged vectorIntrinsics with changes from panama/default. Hope this helps. Best Regards, Sandhya -----Original Message----- From: Yang Zhang Sent: Monday, June 29, 2020 12:49 AM To: Viswanathan, Sandhya ; Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes Hi Andrew, 1. Instructions that can be matched with NEON instructions directly. MulVB, SqrtVF and AbsV have been merged into jdk master already. 2. Instructions that jdk master has middle end support for, but they cannot be matched with NEON instructions directly. Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. 3. Panama/Vector API specific instructions such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. These instructions cannot be moved into jdk master first because there isn't middle-end support. I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also update aarch64_asmtest.py and macroassemler.cpp. When the patch is ready, I will send it again. Hi Sandhya, Could you please help to manual merge panama vectorIntrinsics/vector-unstable to jdk master? So that I can update this patch based on latest jdk master. 
Regards Yang -----Original Message----- From: Viswanathan, Sandhya Sent: Thursday, June 25, 2020 3:04 AM To: Yang Zhang ; Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes Hi Andrew/Yang, We couldn't propose Vector API to target in time for JDK 15 and are hoping to do so early in JDK 16 timeframe. The implementation reviews on other components have made good progress. We have so far ok to PPT from (runtime, shared compiler changes, x86 backend). Java API implementation review is in progress. I wanted to check with you both if we have a go ahead from aarch64 backend point of view. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Yang Zhang Sent: Tuesday, May 26, 2020 7:59 PM To: Andrew Haley ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: RE: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes > But to my earlier question, please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not? The new instructions can be classified as: 1. Instructions that can be matched with NEON instructions directly. MulVB and SqrtVF have been merged into jdk master already. The patch of AbsV is in review [1]. 2. Instructions that Jdk master has middle end support for, but they cannot be matched with NEON instructions directly. Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. May I have a new patch for these? 3.
Panama/Vector API specific instructions Such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. These instructions cannot be moved into jdk master first because there isn't middle-end support. Regards Yang [1] https://mail.openjdk.java.net/pipermail/aarch64-port-dev/2020-May/008861.html -----Original Message----- From: Andrew Haley Sent: Tuesday, May 26, 2020 4:25 PM To: Yang Zhang ; Paul Sandoz Cc: hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; nd Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes On 25/05/2020 09:26, Yang Zhang wrote: > In jdk master, what we need to do is that writing m4 file for existing > vector instructions and placed them to a new file aarch64_neon.ad. > If no question, I will do it right away. I'm not entirely sure that such a change is necessary now. In particular, reorganizing the existing vector instructions is IMO excessive, but I admit that it might be an improvement. But to my earlier question, please: can the new instructions be moved into jdk head first, and then merged into the Panama branch, or not? It'd help if this was possible. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From igor.ignatyev at oracle.com Fri Jul 3 02:24:15 2020 From: igor.ignatyev at oracle.com (igor.ignatyev at oracle.com) Date: Thu, 2 Jul 2020 19:24:15 -0700 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: References: Message-ID: <5E33E613-882E-400A-886A-EA4FAD85F2EA@oracle.com> LGTM --
Igor > On Jul 2, 2020, at 7:03 PM, Vladimir Kozlov wrote: > > https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8247527 > > Test should have @requires which excludes running Graal with GC which it does not support. > > Testing: hs-tier1,hs-tier4-graal > > Thanks, > Vladimir From david.holmes at oracle.com Fri Jul 3 02:25:45 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 3 Jul 2020 12:25:45 +1000 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: References: Message-ID: Hi Vladimir, On 3/07/2020 12:02 pm, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8247527 > > Test should have @requires which excludes running Graal with GC which it > does not support. I find it somewhat disturbing that a generic test has to know about the limitations between GCs and Graal! I would have been more inclined to just exclude this test when running with Graal, even if that theoretically reduced the test coverage in a tiny way. If/When Graal supports these other GCs who will remember to re-enable these test cases?
Thanks, David > Testing: hs-tier1,hs-tier4-graal > > Thanks, > Vladimir From igor.ignatyev at oracle.com Fri Jul 3 02:59:35 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 2 Jul 2020 19:59:35 -0700 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: References: Message-ID: Hi David, it's in my todo list to improve this situation and have vm.gc.X to take selected JIT into account; and update existing (>200) occurrences of 'vm.gc.X & !vm.graal.enabled' -- Igor > On Jul 2, 2020, at 7:25 PM, David Holmes wrote: > > Hi Vladimir, > > On 3/07/2020 12:02 pm, Vladimir Kozlov wrote: >> https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8247527 >> Test should have @requires which excludes running Graal with GC which it does not support. > > I find it somewhat disturbing that a generic test has to know about the limitations between GCs and Graal! > > I would have been more inclined to just exclude this test when running with Graal, even if that theoretically reduced the test coverage in a ting way. > > If/When Graal supports these other GCs who will remember to re-enable these test cases? > > Thanks, > David > >> Testing: hs-tier1,hs-tier4-graal >> Thanks, >> Vladimir From david.holmes at oracle.com Fri Jul 3 05:09:28 2020 From: david.holmes at oracle.com (David Holmes) Date: Fri, 3 Jul 2020 15:09:28 +1000 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: References: Message-ID: Hi Igor, On 3/07/2020 12:59 pm, Igor Ignatyev wrote: > Hi David, > > it's in my todo list to improve this situation and have vm.gc.X to take selected JIT into account; and update existing (>200) occurrences of 'vm.gc.X & !vm.graal.enabled' 200+ ouch! :( I guess this fix doesn't make the situation any worse in a practical sense. 
Thanks, David ----- > -- Igor > >> On Jul 2, 2020, at 7:25 PM, David Holmes wrote: >> >> Hi Vladimir, >> >> On 3/07/2020 12:02 pm, Vladimir Kozlov wrote: >>> https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8247527 >>> Test should have @requires which excludes running Graal with GC which it does not support. >> >> I find it somewhat disturbing that a generic test has to know about the limitations between GCs and Graal! >> >> I would have been more inclined to just exclude this test when running with Graal, even if that theoretically reduced the test coverage in a tiny way. >> >> If/When Graal supports these other GCs who will remember to re-enable these test cases? >> >> Thanks, >> David >> >>> Testing: hs-tier1,hs-tier4-graal >>> Thanks, >>> Vladimir > From felix.yang at huawei.com Fri Jul 3 06:30:05 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 3 Jul 2020 06:30:05 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <4d71bb09-2569-4d01-16cc-707ce61d23de@oracle.com> References: <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> <4d71bb09-2569-4d01-16cc-707ce61d23de@oracle.com> Message-ID: Hi Tobias, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Thursday, July 2, 2020 2:43 PM > To: Yangfei (Felix) ; Roland Westrelin > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi Felix, > > On 30.06.20 19:06, Tobias Hartmann wrote: > > I'll run some perf and correctness testing and report back once it finished. > > All passed. Thanks for the effort. :-) I also submitted the latest patch to jdk/submit repo for testing.
First time submitted in branch JDK-8243670-3, but I haven't got any test result after about 24 hours. Then I closed this branch and resubmitted in a new branch JDK-8243670-4 about 8 hours ago. I guess maybe something is wrong with the submit repo? I am still waiting for the test result. http://hg.openjdk.java.net/jdk/submit/rev/798000e6da7f Thanks, Felix From christian.hagedorn at oracle.com Fri Jul 3 07:19:14 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 3 Jul 2020 09:19:14 +0200 Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled In-Reply-To: <905225a7-8e07-ba19-f9b4-d5fad89e68ce@oracle.com> References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> <87mu4i9d5e.fsf@redhat.com> <905225a7-8e07-ba19-f9b4-d5fad89e68ce@oracle.com> Message-ID: Hi Vladimir, hi Roland Thank you for your reviews! > I think it should requires vm.compiler2.enabled because this test very C2 specific. > Note, Graal and C2 are mutually exclusive. Sounds reasonable. I changed that in a new webrev: http://cr.openjdk.java.net/~chagedorn/8248596/webrev.01/ Best regards, Christian On 02.07.20 19:42, Vladimir Kozlov wrote: > On 7/2/20 7:58 AM, Roland Westrelin wrote: >> >>> http://cr.openjdk.java.net/~chagedorn/8248596/webrev.00/ >> >> That looks good to me but I've been wondering what the interaction of >> this and similar issues with libgraal is. Presumably, running this test >> with libgraal wouldn't time out. Does: >> @requires !vm.graal.enabled cover >> both graal and libgraal? If so is the plan to reevaluate all those >> additions once libgraal becomes the standard way of running graal? Or >> is there an other way to override the requirement so that if you run >> with libgraal, all tests are run?
> > Yes, we have such way: ProblemList-graal.txt, We list a test and link it > to next bug: > > https://bugs.openjdk.java.net/browse/JDK-8207267 > > But sometimes we have a test which only checks very C2 specific > functionality. I think it is okay to run it only with C2. > > Vladimir K > >> >> Roland. >> From tobias.hartmann at oracle.com Fri Jul 3 07:31:36 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 3 Jul 2020 09:31:36 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> <4d71bb09-2569-4d01-16cc-707ce61d23de@oracle.com> Message-ID: <06cc9b64-0b56-1ed1-ad3f-5d646e46c98a@oracle.com> Hi Felix, On 03.07.20 08:30, Yangfei (Felix) wrote: > Thanks for the effort. :-) > I also submitted the latest patch to jdk/submit repo for testing. The testing I did includes the jobs executed by the submit repo, so no need to submit again. You can push your patch to: http://hg.openjdk.java.net/jdk/jdk15 > First time submitted in branch JDK-8243670-3, but I haven?t got any test result after about 24 hours. > Then I closed this branch and resubmitted in a new branch JDK-8243670-4 about 8 hours ago. > I guess maybe something is wrong with the submit repo? I am still waiting for the test result. > > http://hg.openjdk.java.net/jdk/submit/rev/798000e6da7f Okay, there seems to be an issue with the submit repo. I'll report it. 
Best regards, Tobias From tobias.hartmann at oracle.com Fri Jul 3 07:37:03 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 3 Jul 2020 09:37:03 +0200 Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled In-Reply-To: References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> <87mu4i9d5e.fsf@redhat.com> <905225a7-8e07-ba19-f9b4-d5fad89e68ce@oracle.com> Message-ID: <8a88f3f6-fd14-7e93-2013-a0f37e6b7094@oracle.com> Hi Christian, On 03.07.20 09:19, Christian Hagedorn wrote: > Sounds reasonable. I changed that in a new webrev: > http://cr.openjdk.java.net/~chagedorn/8248596/webrev.01/ Looks good to me. Best regards, Tobias From nils.eliasson at oracle.com Fri Jul 3 07:39:02 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 3 Jul 2020 09:39:02 +0200 Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility In-Reply-To: <3b5be72a-8c6e-8d93-f48b-d37e6e7ef049@oracle.com> References: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> <3b5be72a-8c6e-8d93-f48b-d37e6e7ef049@oracle.com> Message-ID: <3ff8176b-9122-9fff-f628-5e001b21a1dd@oracle.com> Thank you Tobias, Claes and Vladimir! I got the suggestion from Patric to change to use unified logging instead of PrintCompilation. Are you ok with that? Best regards, Nils On 2020-06-29 08:33, Tobias Hartmann wrote: > Hi Nils, > > Looks good to me! > > In globals.hpp:543 there is an excess whitespace before "\". > > Best regards, > Tobias > > On 26.06.20 16:48, Nils Eliasson wrote: >> Hi, >> >> This is a diagnostic utility that was requested by Claes to enable better profiling of the compilers. >> >> This patch introduces the diagnostic flag RepeatCompilation. >> >> RepeatCompilation holds the number of times the compilation gets repeated without having the code >> installed. RepeatCompilation = 0 is the default and means that only the regular compilation is done.
>> RepeatCompilation = 100 means that an extra 100 compilations are done but without installing the code. >> >> I have tried keeping the change small and non-intrusive, contained to the CompileBroker (except the >> boolean for disabling code install that is passed to the compilers). >> >> RepeatCompilation works as a flag: "-XX:RepeatCompilation=100", a compile command: >> "-XX:CompileCommand=option,*::toString,intx,RepeatCompilation,100" >> and a compiler directive: "RepeatCompilation : 100". >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8248398 >> Webrev: http://cr.openjdk.java.net/~neliasso/8248398/webrev.04/ >> >> Please review! >> >> Best regards, >> Nils Eliasson >> From tobias.hartmann at oracle.com Fri Jul 3 07:51:59 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 3 Jul 2020 09:51:59 +0200 Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility In-Reply-To: <3ff8176b-9122-9fff-f628-5e001b21a1dd@oracle.com> References: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> <3b5be72a-8c6e-8d93-f48b-d37e6e7ef049@oracle.com> <3ff8176b-9122-9fff-f628-5e001b21a1dd@oracle.com> Message-ID: <3df4b83a-de08-3a1f-4daf-4ed48515217c@oracle.com> Hi Nils, On 03.07.20 09:39, Nils Eliasson wrote: > I got the suggestion from Patric to change to use unified logging instead of PrintCompilation. Are > you ok with that? For the task printing with "NO CODE INSTALLED" message? I think that should go with PrintCompilation to be consistent. Otherwise it would be weird that PrintCompilation would not print any compilations if RepeatCompilation is enabled.
Best regards, Tobias From christian.hagedorn at oracle.com Fri Jul 3 07:52:27 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 3 Jul 2020 09:52:27 +0200 Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled In-Reply-To: <8a88f3f6-fd14-7e93-2013-a0f37e6b7094@oracle.com> References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> <87mu4i9d5e.fsf@redhat.com> <905225a7-8e07-ba19-f9b4-d5fad89e68ce@oracle.com> <8a88f3f6-fd14-7e93-2013-a0f37e6b7094@oracle.com> Message-ID: Thank you Tobias for your review! Best regards, Christian On 03.07.20 09:37, Tobias Hartmann wrote: > Hi Christian, > > On 03.07.20 09:19, Christian Hagedorn wrote: >> Sounds reasonable. I changed that in a new webrev: >> http://cr.openjdk.java.net/~chagedorn/8248596/webrev.01/ > > Looks good to me. > > Best regards, > Tobias > From felix.yang at huawei.com Fri Jul 3 08:03:26 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 3 Jul 2020 08:03:26 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <06cc9b64-0b56-1ed1-ad3f-5d646e46c98a@oracle.com> References: <87k103w2o7.fsf@redhat.com> <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> <4d71bb09-2569-4d01-16cc-707ce61d23de@oracle.com> <06cc9b64-0b56-1ed1-ad3f-5d646e46c98a@oracle.com> Message-ID: Hi Tobias, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Friday, July 3, 2020 3:32 PM > To: Yangfei (Felix) ; Roland Westrelin > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi Felix, > > On 03.07.20 08:30, Yangfei (Felix) wrote: > > Thanks for the effort. :-) > > I also submitted the latest patch to jdk/submit repo for testing. 
> > The testing I did includes the jobs executed by the submit repo, so no need > to submit again. Great to know that :-) > You can push your patch to: > http://hg.openjdk.java.net/jdk/jdk15 Yes. Will push to jdk/jdk15 and to jdk/jdk after that. > > First time submitted in branch JDK-8243670-3, but I haven't got any test > result after about 24 hours. > > Then I closed this branch and resubmitted in a new branch JDK-8243670-4 > about 8 hours ago. > > I guess maybe something is wrong with the submit repo? I am still waiting > for the test result. > > > > http://hg.openjdk.java.net/jdk/submit/rev/798000e6da7f > > Okay, there seems to be an issue with the submit repo. I'll report it. Thanks for reporting that. Felix From tobias.hartmann at oracle.com Fri Jul 3 08:56:04 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 3 Jul 2020 10:56:04 +0200 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: References: <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> <4d71bb09-2569-4d01-16cc-707ce61d23de@oracle.com> <06cc9b64-0b56-1ed1-ad3f-5d646e46c98a@oracle.com> Message-ID: <556b5a00-0b89-b2cd-e243-e1f4a15201a4@oracle.com> Hi Felix, On 03.07.20 10:03, Yangfei (Felix) wrote: > Yes. Will push to jdk/jdk15 and to jdk/jdk after that. Pushing to jdk/jdk (JDK 16) is not required.
It will be merged automatically: https://mail.openjdk.java.net/pipermail/jdk-dev/2020-June/004372.html Best regards, Tobias From felix.yang at huawei.com Fri Jul 3 09:02:40 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Fri, 3 Jul 2020 09:02:40 +0000 Subject: RFR(S): 8243670: Unexpected test result caused by C2 MergeMemNode::Ideal In-Reply-To: <556b5a00-0b89-b2cd-e243-e1f4a15201a4@oracle.com> References: <87eeq7wmd2.fsf@redhat.com> <878sgfwbyc.fsf@redhat.com> <87wo3yupks.fsf@redhat.com> <87o8p1ours.fsf@redhat.com> <134e1fc1-8e5c-a1f2-d0ed-50784b807578@oracle.com> <4d71bb09-2569-4d01-16cc-707ce61d23de@oracle.com> <06cc9b64-0b56-1ed1-ad3f-5d646e46c98a@oracle.com> <556b5a00-0b89-b2cd-e243-e1f4a15201a4@oracle.com> Message-ID: Hi Tobias, > -----Original Message----- > From: Tobias Hartmann [mailto:tobias.hartmann at oracle.com] > Sent: Friday, July 3, 2020 4:56 PM > To: Yangfei (Felix) ; Roland Westrelin > ; hotspot-compiler-dev at openjdk.java.net > Cc: guoge (A) ; zhouyong (V) > > Subject: Re: RFR(S): 8243670: Unexpected test result caused by C2 > MergeMemNode::Ideal > > Hi Felix, > > On 03.07.20 10:03, Yangfei (Felix) wrote: > > Yes. Will push to jdk/jdk15 and to jdk/jdk after that. > > Pushing to jdk/jdk (JDK 16) is not required. It will be merged automatically: > https://mail.openjdk.java.net/pipermail/jdk-dev/2020-June/004372.html Thanks for reminding that. Will do. 
Felix From nils.eliasson at oracle.com Fri Jul 3 09:21:58 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 3 Jul 2020 11:21:58 +0200 Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility In-Reply-To: <3df4b83a-de08-3a1f-4daf-4ed48515217c@oracle.com> References: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> <3b5be72a-8c6e-8d93-f48b-d37e6e7ef049@oracle.com> <3ff8176b-9122-9fff-f628-5e001b21a1dd@oracle.com> <3df4b83a-de08-3a1f-4daf-4ed48515217c@oracle.com> Message-ID: <4860fa4b-f42f-61cd-56c7-b5682fae31a4@oracle.com> On 2020-07-03 09:51, Tobias Hartmann wrote: > Hi Nils, > > On 03.07.20 09:39, Nils Eliasson wrote: >> I got the suggestion from Patric to change to use unified logging instead of PrintCompilation. Are >> you ok with that? > For the task printing with "NO CODE INSTALLED" message? I think that should go with PrintCompilation > to be consistent. Otherwise it would be weird that PrintCompilation would not print any compilations > if RepeatCompilation is enabled. I would like to encourage moving to Xlog/UL and from PrintCompilation. The few people that will use RepeatCompilation will know how to use -Xlog. If you insist I can add both. // Nils > > Best regards, > Tobias From tobias.hartmann at oracle.com Fri Jul 3 09:28:38 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 3 Jul 2020 11:28:38 +0200 Subject: [16] RFR(S): 8248398: Add diagnostic RepeatCompilation utility In-Reply-To: <4860fa4b-f42f-61cd-56c7-b5682fae31a4@oracle.com> References: <3708c0f5-1c43-5adf-c817-2fb1ff6518c8@oracle.com> <3b5be72a-8c6e-8d93-f48b-d37e6e7ef049@oracle.com> <3ff8176b-9122-9fff-f628-5e001b21a1dd@oracle.com> <3df4b83a-de08-3a1f-4daf-4ed48515217c@oracle.com> <4860fa4b-f42f-61cd-56c7-b5682fae31a4@oracle.com> Message-ID: <7275e88d-6efc-45c5-6dc5-9b177b0a9fe4@oracle.com> On 03.07.20 11:21, Nils Eliasson wrote: > I would like to encourage moving to Xlog/UL and from PrintCompilation. 
The few people that will use > RepeatCompilation will know how to use -Xlog. Okay, fair enough. > If you insist I can add both. No, using only UL is fine with me. Best regards, Tobias From christian.hagedorn at oracle.com Fri Jul 3 11:42:04 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 3 Jul 2020 13:42:04 +0200 Subject: [16] RFR(S): 8248226: TestCloneAccessStressGCM fails with -XX:-ReduceBulkZeroing Message-ID: <88cd871e-b05a-5803-cc11-f082fc18f80b@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8248226 http://cr.openjdk.java.net/~chagedorn/8248226/webrev.00/ C2 erroneously folds the addition in the return statement in the method TestCloneAccessStressGCM::test() to 0 when ReduceInitialCardMarks and ReduceBulkZeroing are disabled. The problem in the testcase can be traced back to LoadNode::find_previous_arraycopy() called from LoadNode::Ideal() for the loads dest.i1, dest.i2 etc. where we do not take GC barriers into account (disabled ReduceInitialCardMarks) when trying to find an ArrayCopyNode which belongs to a clone. As a result, we conclude that there is no ArrayCopyNode and bailout of the ideal transformation. Afterwards, we call LoadNode::Value() and look for a stored value for the allocation belonging to the clone() call. Since we cannot find one (because the ArrayCopyNode is initializing the allocation) we conclude that the field is 0 and replace the LoadNode by a constant 0. This happens for all the LoadNodes in the addition in the return statement which is then folded to 0 and returned. This could have been prevented if ReduceBulkZeroing was enabled. Because in that case, the InitializationNode would have been marked as completed at [1] and the InitializationNode::find_captured_store() method returned NULL at [2] and eventually the entire LoadNode::Value() method returned _type (int) instead of the constant 0 because of the bailout at [3] for completed InitializationNodes. Thank you! 
Best regards, Christian [1] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/library_call.cpp#l4234 [2] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/memnode.cpp#l3775 [3] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/memnode.cpp#l3722 From aph at redhat.com Fri Jul 3 13:40:45 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 3 Jul 2020 14:40:45 +0100 Subject: Running IGV In-Reply-To: References: <09f19846-cd66-85ed-c491-c5348d8fe532@redhat.com> Message-ID: Hi, On 30/06/2020 08:11, Tobias Hartmann wrote: > > igv.sh writes into a log file (.igv.log). The problem might be that you need to run with JDK 8. Thanks. It's better with JDK 8, but although it does load saved XML Ideal Graphs, all it's possible to see is a tree with the names of the compilation passes. No graphs are displayed. I'm guessing IGV must have rotted, and there's no version that works with current HotSpot available. Thanks anyway, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Fri Jul 3 13:48:44 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 3 Jul 2020 15:48:44 +0200 Subject: Running IGV In-Reply-To: References: <09f19846-cd66-85ed-c491-c5348d8fe532@redhat.com> Message-ID: Hi Andrew, On 03.07.20 15:40, Andrew Haley wrote: > Thanks. It's better with JDK 8, but although it does load saved XML > Ideal Graphs, all it's possible to see is a tree with the names of > the compilation passes. No graphs are displayed. After double-clicking on the phase, it sometimes takes a while to load if the graph is huge. If there's an issue, you should at least get an error message (did you check the console?). > I'm guessing IGV must have rotted, and there's no version that works > with current HotSpot available. 
Well it does work fine for me and I'm using it on a regular basis. Best regards, Tobias From patric.hedlin at oracle.com Fri Jul 3 15:09:17 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Fri, 3 Jul 2020 17:09:17 +0200 Subject: RFR(S): 8245021: Add method 'remove_if_existing' to growableArray. In-Reply-To: <243790ff-6640-8f48-b345-b195efc46ede@oracle.com> References: <054bdcb1-9543-eefc-b814-60ad5ab641d3@oracle.com> <243790ff-6640-8f48-b345-b195efc46ede@oracle.com> Message-ID: Hi Tobias, On 2020-05-19 11:33, Tobias Hartmann wrote: > Hi Patric, > > Looks good to me but please add brackets around the for loop. > > Also, there are some more cases of this code pattern. For example, > JvmtiPendingMonitors::destroy/exit and > ShenandoahBarrierSetC2State::remove_enqueue_barrier/remove_load_reference_barrier. Fixed. I moved this to 16 (after JDK-8247755). Added some refactoring to new webrev (refreshed). /Patric > Best regards, > Tobias > > On 18.05.20 22:37, Patric Hedlin wrote: >> Dear all, >> >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8245021 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8245021/ >> >> >> 8245021: Add method 'remove_if_existing' to growableArray. >> >> Minor improvement to simplify the code pattern "if contains then remove" found in a few places (in >> "compile.hpp"). >> >> >> Testing: hs-tier1-3 >> >> >> Best regards, >> Patric From patric.hedlin at oracle.com Fri Jul 3 15:09:23 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Fri, 3 Jul 2020 17:09:23 +0200 Subject: RFR(S): 8245021: Add method 'remove_if_existing' to growableArray. In-Reply-To: <9c722439-2b3f-a94f-baa6-2ac9aef825c4@oracle.com> References: <054bdcb1-9543-eefc-b814-60ad5ab641d3@oracle.com> <243790ff-6640-8f48-b345-b195efc46ede@oracle.com> <9c722439-2b3f-a94f-baa6-2ac9aef825c4@oracle.com> Message-ID: <244903a6-c870-dc55-41ba-460679b7a779@oracle.com> Thanks for reviewing Nils.
Care to take another look? I moved this to 16 (after JDK-8247755). Added some refactoring to new webrev (refreshed). /Patric On 2020-06-02 09:51, Nils Eliasson wrote: > +1 > > Best regards, > Nils Eliasson > > On 2020-05-19 11:33, Tobias Hartmann wrote: >> Hi Patric, >> >> Looks good to me but please add brackets around the for loop. >> >> Also, there are some more cases of this code pattern. For example, >> JvmtiPendingMonitors::destroy/exit and >> ShenandoahBarrierSetC2State::remove_enqueue_barrier/remove_load_reference_barrier. >> >> >> Best regards, >> Tobias >> >> On 18.05.20 22:37, Patric Hedlin wrote: >>> Dear all, >>> >>> I would like to ask for help to review the following change/update: >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8245021 >>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8245021/ >>> >>> >>> 8245021: Add method 'remove_if_existing' to growableArray. >>> >>> Minor improvement to simplify the code pattern "if contains then >>> remove" found in a few places (in >>> "compile.hpp"). >>> >>> >>> Testing: hs-tier1-3 >>> >>> >>> Best regards, >>> Patric From patric.hedlin at oracle.com Fri Jul 3 15:10:30 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Fri, 3 Jul 2020 17:10:30 +0200 Subject: RFR(S): 8245021: Add method 'remove_if_existing' to growableArray. In-Reply-To: <1EEC80B7-9603-4B8C-A0D4-97D3DE51EBDB@amazon.com> References: <054bdcb1-9543-eefc-b814-60ad5ab641d3@oracle.com> <1EEC80B7-9603-4B8C-A0D4-97D3DE51EBDB@amazon.com> Message-ID: <843b3998-8246-d571-ee8f-9ac795306b8d@oracle.com> Hi, On 2020-05-19 21:03, Liu, Xin wrote: > Hi, Patric, > > I don't object to your change. I feel that the API 'remove' of GrowableArray was not good. Even though it's complexity is still linear, you scan all elements and write some of them. The interface (remove) is what it is I guess. There was no intention to change current behaviour. > The problem is it has to retain order. 
Actually, I didn't run into any problem when I replaced the removed element with the last one. > It suggests that probably nobody in hotspot makes use of a sorted GrowableArray. > > I found another interesting point. There's an API delete_at which ignores order, so I tried replacing your remove_if_existing with it. > bool delete_if_existing(const E& elem) { > int index = find(elem); > > if (index != -1) { > _data[index] = _data[--_len]; > return true; > } > > return false; > } > I didn't have any regression in jtreg:hotspot:tier1. Actually, CodeCache::unregister_old_nmethod uses the same trick. Indeed, in analogy with *remove*, you might argue that both "a delete" and delete_if_existing are missing in the interface (in cases when order is not required). Perhaps also delete_all/remove_all for multi entry usage. However, adding one or the other is perhaps another RFE (and might require more than a test-run to replace current uses of *remove*). At the same time, I have to assume that they have not been added for a reason. I moved this to 16 (after JDK-8247755). Added some refactoring to new webrev (refreshed). Best regards, Patric > Here is the current implementation of delete_at(). It checks if the index is the last element, and skips copying if so. I am not sure if an extra comparison is worth it here. > Users should use pop() instead in that scenario. > > // The order is changed. > void delete_at(int index) { > assert(0 <= index && index < _len, "illegal index"); > if (index < --_len) { > // Replace removed element with last one. > _data[index] = _data[_len]; > } > } > > Thanks, > --lx > > > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8245021 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8245021/ > > > 8245021: Add method 'remove_if_existing' to growableArray. > > Minor improvement to simplify the code pattern "if contains then remove" > found in a few places (in "compile.hpp").
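For readers following the thread, the two variants being compared can be sketched outside HotSpot. This is a minimal stand-alone illustration, assuming std::vector in place of GrowableArray. The free functions below borrow the names remove_if_existing and delete_if_existing from the discussion, but their bodies are assumptions and not the code in the webrev.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative stand-ins only; not the GrowableArray implementation.

// Order-preserving variant: erases the first match and shifts the tail left.
template <typename E>
bool remove_if_existing(std::vector<E>& v, const E& elem) {
  auto it = std::find(v.begin(), v.end(), elem);
  if (it == v.end()) {
    return false;          // nothing to remove
  }
  v.erase(it);             // linear: moves every element after the match
  return true;
}

// Order-changing variant: overwrites the match with the last element,
// then shrinks the vector by one.
template <typename E>
bool delete_if_existing(std::vector<E>& v, const E& elem) {
  auto it = std::find(v.begin(), v.end(), elem);
  if (it == v.end()) {
    return false;
  }
  *it = v.back();          // harmless self-assignment if the match is last
  v.pop_back();
  return true;
}
```

The first variant keeps the relative order of the remaining elements at the cost of shifting the tail. The second avoids the shift but reorders, so it only suits callers that do not rely on element order.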
> > > Testing: hs-tier1-3 > > > Best regards, > Patric From nils.eliasson at oracle.com Fri Jul 3 16:18:26 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Fri, 3 Jul 2020 18:18:26 +0200 Subject: RFR(S): 8245021: Add method 'remove_if_existing' to growableArray. In-Reply-To: <244903a6-c870-dc55-41ba-460679b7a779@oracle.com> References: <054bdcb1-9543-eefc-b814-60ad5ab641d3@oracle.com> <243790ff-6640-8f48-b345-b195efc46ede@oracle.com> <9c722439-2b3f-a94f-baa6-2ac9aef825c4@oracle.com> <244903a6-c870-dc55-41ba-460679b7a779@oracle.com> Message-ID: Still looking good! Best regards, Nil On 2020-07-03 17:09, Patric Hedlin wrote: > Thanks for reviewing Nils. > > Care to take another look? > > I moved this to 16 (after JDK-8247755). Added some refactoring to new > webrev (refreshed). > > /Patric > > On 2020-06-02 09:51, Nils Eliasson wrote: >> +1 >> >> Best regards, >> Nils Eliasson >> >> On 2020-05-19 11:33, Tobias Hartmann wrote: >>> Hi Patric, >>> >>> Looks good to me but please add brackets around the for loop. >>> >>> Also, there are some more cases of this code pattern. For example, >>> JvmtiPendingMonitors::destroy/exit and >>> ShenandoahBarrierSetC2State::remove_enqueue_barrier/remove_load_reference_barrier. >>> >>> >>> Best regards, >>> Tobias >>> >>> On 18.05.20 22:37, Patric Hedlin wrote: >>>> Dear all, >>>> >>>> I would like to ask for help to review the following change/update: >>>> >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8245021 >>>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8245021/ >>>> >>>> >>>> 8245021: Add method 'remove_if_existing' to growableArray. >>>> >>>> Minor improvement to simplify the code pattern "if contains then >>>> remove" found in a few places (in >>>> "compile.hpp"). 
>>>> >>>> >>>> Testing: hs-tier1-3 >>>> >>>> >>>> Best regards, >>>> Patric From joserz at linux.ibm.com Fri Jul 3 18:09:34 2020 From: joserz at linux.ibm.com (joserz at linux.ibm.com) Date: Fri, 3 Jul 2020 15:09:34 -0300 Subject: RFR(M): 8248191: PPC: Implement Load/Store Vector with lxvl/stxvl in Power10 In-Reply-To: References: <20200701194910.GA141565@pacoca> Message-ID: <20200703180934.GA14622@pacoca> Hello Martin, Actually, there isn't xxswapd, my bad. In fact we usually need xxswapd to fix the vector lanes after lxvd2x but, if I understand it correctly, that order makes no difference in Hotspot. Side note: GCC does a similar job when generating code at -O1 or higher and they're also avoiding lxvd2x on Power10. Do you want me to resend the e-mail without mentioning xxswapd? Thank you!! Jose On Thu, Jul 02, 2020 at 03:57:22PM +0000, Doerr, Martin wrote: > Where do we save xxswapd instructions? > I can't see it in the webrev. > > Best regards, > Martin > > > -----Original Message----- > > From: hotspot-compiler-dev > retn at openjdk.java.net> On Behalf Of joserz at linux.ibm.com > > Sent: Wednesday, 1 July 2020 21:49 > > To: hotspot-compiler-dev at openjdk.java.net > > Cc: Michihiro Horie > > Subject: RFR(M): 8248191: PPC: Implement Load/Store Vector with lxvl/stxvl > > in Power10 > > > > This patch introduces two instructions lxvl/stxvl and replaces the current > > lxvd2x/stxvd2x to load and store vectors. Like lxvd2x/stxvd2x, lxvl/stxvl can > > access unaligned effective addresses with the advantage of *not* requiring > > xxswapd after lxvd2x (or before stxvd2x) to correct the lanes in little-endian > > mode. > > > > Webrev: https://cr.openjdk.java.net/~mhorie/8248191/webrev.00/ > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248191 > > > > Thanks for your review! > > > > Jose R.
Ziviani From vladimir.kozlov at oracle.com Fri Jul 3 18:09:26 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 3 Jul 2020 11:09:26 -0700 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: <5E33E613-882E-400A-886A-EA4FAD85F2EA@oracle.com> References: <5E33E613-882E-400A-886A-EA4FAD85F2EA@oracle.com> Message-ID: <2bde8004-4ed8-8ca3-b387-05240f423e3f@oracle.com> Thank you, Igor Vladimir K On 7/2/20 7:24 PM, igor.ignatyev at oracle.com wrote: > LGTM > > -- Igor > >> On Jul 2, 2020, at 7:03 PM, Vladimir Kozlov wrote: >> >> https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8247527 >> >> Test should have @requires which excludes running Graal with GC which it does not support. >> >> Testing: hs-tier1,hs-tier4-graal >> >> Thanks, >> Vladimir > From vladimir.kozlov at oracle.com Fri Jul 3 18:30:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 3 Jul 2020 11:30:31 -0700 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: References: Message-ID: Thank you, David, for looking on changes. I will remember to update tests. I filed RFE 8248815 [1] for tracking. Can you approve this fix now? Thanks, Vladimir K [1] https://bugs.openjdk.java.net/browse/JDK-8248815 On 7/2/20 10:09 PM, David Holmes wrote: > Hi Igor, > > On 3/07/2020 12:59 pm, Igor Ignatyev wrote: >> Hi David, >> >> it's in my todo list to improve this situation and have vm.gc.X to take selected JIT into account; and update existing >> (>200) occurrences of 'vm.gc.X & !vm.graal.enabled' > > 200+ ouch! :( > > I guess this fix doesn't make the situation any worse in a practical sense.
> > Thanks, > David > ----- > >> -- Igor >> >>> On Jul 2, 2020, at 7:25 PM, David Holmes wrote: >>> >>> Hi Vladimir, >>> >>> On 3/07/2020 12:02 pm, Vladimir Kozlov wrote: >>>> https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8247527 >>>> Test should have @requires which excludes running Graal with GC which it does not support. >>> >>> I find it somewhat disturbing that a generic test has to know about the limitations between GCs and Graal! >>> >>> I would have been more inclined to just exclude this test when running with Graal, even if that theoretically reduced >>> the test coverage in a tiny way. >>> >>> If/When Graal supports these other GCs who will remember to re-enable these test cases? >>> >>> Thanks, >>> David >>> >>>> Testing: hs-tier1,hs-tier4-graal >>>> Thanks, >>>> Vladimir >> From vladimir.kozlov at oracle.com Fri Jul 3 18:36:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 3 Jul 2020 11:36:31 -0700 Subject: [aarch64-port-dev ] RFR(XXS):8248570 Incorrect copyright header in TestUnsafeUnalignedSwap.java In-Reply-To: References: <587101a8-7cb0-453b-aed5-4edca2cdda2d.zhuoren.wz@alibaba-inc.com> Message-ID: I forgot to ask to push the fix into jdk/jdk15 repository to fix it in JDK 15. It will be automatically forward ported into JDK 16 later. Thanks, Vladimir K On 7/2/20 10:49 AM, Vladimir Kozlov wrote: > Thank you, Zhuoren > > Checks passed now. > > Vladimir K > > On 7/2/20 12:36 AM, Wang Zhuo(Zhuoren) wrote: >> Hi, >> There's something wrong in the legal notice of TestUnsafeUnalignedSwap.java file. It should be GPLv2 as in >> `make/templates/gpl-header`. This patch (from Vladimir Kozlov) fixes it.
>> BUG Link:https://bugs.openjdk.java.net/browse/JDK-8248570 >> CR: http://cr.openjdk.java.net/~wzhuo/8248570/webrev.00/ >> >> >> Regards, >> Zhuoren >> From vladimir.kozlov at oracle.com Fri Jul 3 18:37:45 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 3 Jul 2020 11:37:45 -0700 Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled In-Reply-To: <8a88f3f6-fd14-7e93-2013-a0f37e6b7094@oracle.com> References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> <87mu4i9d5e.fsf@redhat.com> <905225a7-8e07-ba19-f9b4-d5fad89e68ce@oracle.com> <8a88f3f6-fd14-7e93-2013-a0f37e6b7094@oracle.com> Message-ID: <67de0043-b2b4-17c7-a6e2-df44954305e7@oracle.com> +1 Thanks, Vladimir K On 7/3/20 12:37 AM, Tobias Hartmann wrote: > Hi Christian, > > On 03.07.20 09:19, Christian Hagedorn wrote: >> Sounds reasonable. I changed that in a new webrev: >> http://cr.openjdk.java.net/~chagedorn/8248596/webrev.01/ > > Looks good to me. > > Best regards, > Tobias > From david.holmes at oracle.com Fri Jul 3 22:18:53 2020 From: david.holmes at oracle.com (David Holmes) Date: Sat, 4 Jul 2020 08:18:53 +1000 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: References: Message-ID: <030c19da-616e-3b05-da33-5add5e6da747@oracle.com> On 4/07/2020 4:30 am, Vladimir Kozlov wrote: > Thank you, David, for looking on changes. > > I will remember to update tests. I filed RFE 8248815 [1] for tracking. > > Can you approve this fix now? Yes - thanks. David > Thanks, > Vladimir K > > [1] https://bugs.openjdk.java.net/browse/JDK-8248815 > > On 7/2/20 10:09 PM, David Holmes wrote: >> Hi Igor, >> >> On 3/07/2020 12:59 pm, Igor Ignatyev wrote: >>> Hi David, >>> >>> it's in my todo list to improve this situation and have vm.gc.X to >>> take selected JIT into account; and update existing (>200) >>> occurrences of 'vm.gc.X & !vm.graal.enabled' >> >> 200+ ouch! 
:( >> >> I guess this fix doesn't make the situation any worse in a practical >> sense. >> >> Thanks, >> David >> ----- >> >>> -- Igor >>> >>>> On Jul 2, 2020, at 7:25 PM, David Holmes >>>> wrote: >>>> >>>> Hi Vladimir, >>>> >>>> On 3/07/2020 12:02 pm, Vladimir Kozlov wrote: >>>>> https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ >>>>> https://bugs.openjdk.java.net/browse/JDK-8247527 >>>>> Test should have @requires which excludes running Graal with GC >>>>> which it does not support. >>>> >>>> I find it somewhat disturbing that a generic test has to know about >>>> the limitations between GCs and Graal! >>>> >>>> I would have been more inclined to just exclude this test when >>>> running with Graal, even if that theoretically reduced the test >>>> coverage in a tiny way. >>>> >>>> If/When Graal supports these other GCs who will remember to >>>> re-enable these test cases? >>>> >>>> Thanks, >>>> David >>>> >>>>> Testing: hs-tier1,hs-tier4-graal >>>>> Thanks, >>>>> Vladimir >>> From vladimir.kozlov at oracle.com Fri Jul 3 22:47:24 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 3 Jul 2020 15:47:24 -0700 Subject: [15] RFR(T) 8247527: serviceability/dcmd/gc/HeapDumpCompressedTest.java fails with Graal + ZGC In-Reply-To: <030c19da-616e-3b05-da33-5add5e6da747@oracle.com> References: <030c19da-616e-3b05-da33-5add5e6da747@oracle.com> Message-ID: Thank you, David Vladimir K On 7/3/20 3:18 PM, David Holmes wrote: > On 4/07/2020 4:30 am, Vladimir Kozlov wrote: >> Thank you, David, for looking on changes. >> >> I will remember to update tests. I filed RFE 8248815 [1] for tracking. >> >> Can you approve this fix now? > > Yes - thanks.
> > David > >> Thanks, >> Vladimir K >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8248815 >> >> On 7/2/20 10:09 PM, David Holmes wrote: >>> Hi Igor, >>> >>> On 3/07/2020 12:59 pm, Igor Ignatyev wrote: >>>> Hi David, >>>> >>>> it's in my todo list to improve this situation and have vm.gc.X to take selected JIT into account; and update >>>> existing (>200) occurrences of 'vm.gc.X & !vm.graal.enabled' >>> >>> 200+ ouch! :( >>> >>> I guess this fix doesn't make the situation any worse in a practical sense. >>> >>> Thanks, >>> David >>> ----- >>> >>>> -- Igor >>>> >>>>> On Jul 2, 2020, at 7:25 PM, David Holmes wrote: >>>>> >>>>> Hi Vladimir, >>>>> >>>>> On 3/07/2020 12:02 pm, Vladimir Kozlov wrote: >>>>>> https://cr.openjdk.java.net/~kvn/8247527/webrev.00/ >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247527 >>>>>> Test should have @requires which excludes running Graal with GC which it does not support. >>>>> >>>>> I find it somewhat disturbing that a generic test has to know about the limitations between GCs and Graal! >>>>> >>>>> I would have been more inclined to just exclude this test when running with Graal, even if that theoretically >>>>> reduced the test coverage in a tiny way. >>>>> >>>>> If/When Graal supports these other GCs who will remember to re-enable these test cases?
>>>>> >>>>> Thanks, >>>>> David >>>>> >>>>>> Testing: hs-tier1,hs-tier4-graal >>>>>> Thanks, >>>>>> Vladimir >>>> From christian.hagedorn at oracle.com Mon Jul 6 06:49:49 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 6 Jul 2020 06:49:49 +0000 (UTC) Subject: [16] RFR(T): 8248596: [TESTBUG] compiler/loopopts/PartialPeelingUnswitch.java times out with Graal enabled In-Reply-To: <67de0043-b2b4-17c7-a6e2-df44954305e7@oracle.com> References: <19cf54c7-776d-63e4-6d40-bd84733a2f17@oracle.com> <87mu4i9d5e.fsf@redhat.com> <905225a7-8e07-ba19-f9b4-d5fad89e68ce@oracle.com> <8a88f3f6-fd14-7e93-2013-a0f37e6b7094@oracle.com> <67de0043-b2b4-17c7-a6e2-df44954305e7@oracle.com> Message-ID: <870d5242-aa28-ecf2-f78c-9b9dc07d7c54@oracle.com> Thank you Vladimir for your review again! Best regards, Christian On 03.07.20 20:37, Vladimir Kozlov wrote: > +1 > > Thanks, > Vladimir K > > On 7/3/20 12:37 AM, Tobias Hartmann wrote: >> Hi Christian, >> >> On 03.07.20 09:19, Christian Hagedorn wrote: >>> Sounds reasonable. I changed that in a new webrev: >>> http://cr.openjdk.java.net/~chagedorn/8248596/webrev.01/ >> >> Looks good to me. >> >> Best regards, >> Tobias >> From boris.ulasevich at bell-sw.com Mon Jul 6 09:17:54 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Mon, 6 Jul 2020 12:17:54 +0300 Subject: RFR(XS) 8248568: compiler/c2/TestBit.java failed: 'test' missing from stdout/stderr In-Reply-To: References: Message-ID: <10386b46-ada9-2ee9-2a53-9397faf23f87@bell-sw.com> Thank you Vladimir. May I consider the change trivial or should I ask for more reviews? regards, Boris On 03.07.2020 00:45, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/2/20 2:29 PM, Boris Ulasevich wrote: >> Hi Vladimir, >> >> Thank you. I applied your suggestions. On our machines jtreg runs well. 
>> Update: http://cr.openjdk.java.net/~bulasevich/8248568/webrev.01 >> >> regards, >> Boris >> >> On Thu, Jul 2, 2020 at 9:54 PM Vladimir Kozlov >> >> wrote: >> >>> Good. >>> >>> You may also replace next requirements: >>> >>> vm.flavor == "server" & !vm.graal.enabled >>> >>> with one: >>> >>> vm.compiler2.enabled >>> >>> Graal and C2 are mutually exclusive. >>> >>> Maybe also run processes without C1 by switching off Tiered >>> Compilation. >>> >>> And instead of: >>> @run main/othervm compiler.c2.TestBit >>> >>> use: >>> @run driver compiler.c2.TestBit >>> >>> Because you are launching separate processes. >>> >>> Please, test changes with jtreg testing. >>> >>> Thanks, >>> Vladimir K >>> >>> On 7/2/20 11:13 AM, Boris Ulasevich wrote: >>>> Hi, >>>> >>>> Please review a one-line change: adding -Xbatch option to recently >>>> introduced test to get a more predictable PrintOptoAssembly output. >>>> >>>> http://cr.openjdk.java.net/~bulasevich/8248568/webrev.00 >>>> http://bugs.openjdk.java.net/browse/JDK-8248568 >>>> >>>> thanks, >>>> Boris >>> From rwestrel at redhat.com Mon Jul 6 15:55:19 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Mon, 06 Jul 2020 17:55:19 +0200 Subject: RFR(M): 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: <3b720427-d718-5d1c-dbe9-6149a21883af@oracle.com> References: <3b720427-d718-5d1c-dbe9-6149a21883af@oracle.com> Message-ID: <87r1topriw.fsf@redhat.com> I took that bug over. Thanks to Patric for helping me understand the root cause of the bug. http://cr.openjdk.java.net/~roland/8229495/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8229495 > The approach to insert range-check guards (see, JDK-8193130, > JDK-8216135, JDK-8240335) between the pre- and the main-loop is somewhat > problematic.
The immediate problem here is due to an inherent dependency > between the additional (template) range-check guards introduced (during > RCE) and the state of the loop, such as the level of loop-unrolling. To > keep the range-check guards valid through the compilation, these > are re-generated when/if the main-loop is unrolled further. Here, the > error is introduced when a guard is generated with an illegal offset, > that will erroneously cut the path to the main-loop (resulting in a > 'Halt'). The reason for range-checks to be present in the main-loop to > begin with is due to a failing dominator search (this was also corrected > in JDK-8231412, for JDK14). For a range check: scale * i + offset References: <88cd871e-b05a-5803-cc11-f082fc18f80b@oracle.com> Message-ID: <515c1402-4c7a-9d3b-e4fa-ac2a6d43da4c@oracle.com> Looks good. Thanks, Vladimir On 7/3/20 4:42 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8248226 > http://cr.openjdk.java.net/~chagedorn/8248226/webrev.00/ > > C2 erroneously folds the addition in the return statement in the method TestCloneAccessStressGCM::test() to 0 when > ReduceInitialCardMarks and ReduceBulkZeroing are disabled. > > The problem in the testcase can be traced back to LoadNode::find_previous_arraycopy() called from LoadNode::Ideal() for > the loads dest.i1, dest.i2 etc. where we do not take GC barriers into account (disabled ReduceInitialCardMarks) when > trying to find an ArrayCopyNode which belongs to a clone. > > As a result, we conclude that there is no ArrayCopyNode and bailout of the ideal transformation. Afterwards, we call > LoadNode::Value() and look for a stored value for the allocation belonging to the clone() call. Since we cannot find one > (because the ArrayCopyNode is initializing the allocation) we conclude that the field is 0 and replace the LoadNode by a > constant 0.
This happens for all the LoadNodes in the addition in the return statement which is then folded to 0 and > returned. > > This could have been prevented if ReduceBulkZeroing was enabled. Because in that case, the InitializationNode would have > been marked as completed at [1] and the InitializationNode::find_captured_store() method returned NULL at [2] and > eventually the entire LoadNode::Value() method returned _type (int) instead of the constant 0 because of the bailout at > [3] for completed InitializationNodes. > > Thank you! > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/library_call.cpp#l4234 > [2] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/memnode.cpp#l3775 > [3] http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/memnode.cpp#l3722 From nick.gasson at arm.com Tue Jul 7 06:58:06 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Tue, 07 Jul 2020 14:58:06 +0800 Subject: [15] RFR(S): 8248845: AArch64: stack corruption after spilling vector register Message-ID: <857dvfrev5.fsf@arm.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8248845 Webrev: http://cr.openjdk.java.net/~ngasson/8248845/webrev.0/ This crash was seen on the Panama vectorIntrinsics branch and a minimal test case is attached to the JBS entry, but it should also affect vanilla jdk/jdk although I haven't found a reliable way to reproduce it. 0x0000ffffa173fd40: ldr x11, [sp, #120] 0x0000ffffa173fd44: ldr x10, [sp, #48] 0x0000ffffa173fd48: add x10, x10, x11 0x0000ffffa173fd4c: str q16, [x10, #16] ; <==== CRASH HERE X10 loaded from sp+48 contains a valid pointer but X11 from sp+120 contains a garbage value. 
Here's the relevant opto assembly that spills to sp+120:

6b4 + spill [sp, #80] -> [sp, #16] # vector spill size = 128
6bc + spill [sp, #24] -> [sp, #120] # spill size = 64

These two instructions have been scheduled in the wrong order: 6b4 writes 16 bytes at sp+16 which overwrites another live value at sp+24. Instruction 6bc spills the clobbered value at sp+24 to sp+120. If I dump out the instructions before scheduling or pass -XX:-OptoScheduling the order is correct.

It seems to be a known limitation that the scheduler doesn't correctly compute anti-dependencies when a vector occupies more than two slots, because PhaseOutput::ScheduleAndBundle() already has a check to skip scheduling if a too-wide vector was generated. Unfortunately the check is wrong as a pair of slots is 8 bytes not 16:

// Scheduling code works only with pairs (16 bytes) maximum.
if (C->max_vector_size() > 16)

Actually the test here used to be > 8, but was changed as part of JDK-8076276 which added AVX512 support [1]. I couldn't see any explanation of that change in the bug or mailing list thread, but it seems wrong and reverting it fixes this crash.

This affects AArch64 because OptoScheduling is enabled by default and NEON vectors are 16 bytes wide.

Tested hotspot_all_no_apps, jdk_core on AArch64 and x86_64.

[1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017579.html

--
Thanks,
Nick

From christian.hagedorn at oracle.com Tue Jul 7 07:19:25 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 7 Jul 2020 09:19:25 +0200 Subject: [16] RFR(S): 8248226: TestCloneAccessStressGCM fails with -XX:-ReduceBulkZeroing In-Reply-To: <515c1402-4c7a-9d3b-e4fa-ac2a6d43da4c@oracle.com> References: <88cd871e-b05a-5803-cc11-f082fc18f80b@oracle.com> <515c1402-4c7a-9d3b-e4fa-ac2a6d43da4c@oracle.com> Message-ID: <23ee0766-6316-e7bd-d3bd-e9345b082f91@oracle.com>

Thank you Vladimir for your review!
Best regards, Christian On 07.07.20 02:23, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/3/20 4:42 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8248226 >> http://cr.openjdk.java.net/~chagedorn/8248226/webrev.00/ >> >> C2 erroneously folds the addition in the return statement in the >> method TestCloneAccessStressGCM::test() to 0 when >> ReduceInitialCardMarks and ReduceBulkZeroing are disabled. >> >> The problem in the testcase can be traced back to >> LoadNode::find_previous_arraycopy() called from LoadNode::Ideal() for >> the loads dest.i1, dest.i2 etc. where we do not take GC barriers into >> account (disabled ReduceInitialCardMarks) when trying to find an >> ArrayCopyNode which belongs to a clone. >> >> As a result, we conclude that there is no ArrayCopyNode and bailout of >> the ideal transformation. Afterwards, we call LoadNode::Value() and >> look for a stored value for the allocation belonging to the clone() >> call. Since we cannot find one (because the ArrayCopyNode is >> initializing the allocation) we conclude that the field is 0 and >> replace the LoadNode by a constant 0. This happens for all the >> LoadNodes in the addition in the return statement which is then folded >> to 0 and returned. >> >> This could have been prevented if ReduceBulkZeroing was enabled. >> Because in that case, the InitializationNode would have been marked as >> completed at [1] and the InitializationNode::find_captured_store() >> method returned NULL at [2] and eventually the entire >> LoadNode::Value() method returned _type (int) instead of the constant >> 0 because of the bailout at [3] for completed InitializationNodes. >> >> Thank you! 
>> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/library_call.cpp#l4234 >> >> [2] >> http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/memnode.cpp#l3775 >> >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/a7c030723240/src/hotspot/share/opto/memnode.cpp#l3722 >> From rwestrel at redhat.com Tue Jul 7 07:27:16 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Tue, 07 Jul 2020 09:27:16 +0200 Subject: [16] RFR(S): 8248226: TestCloneAccessStressGCM fails with -XX:-ReduceBulkZeroing In-Reply-To: <88cd871e-b05a-5803-cc11-f082fc18f80b@oracle.com> References: <88cd871e-b05a-5803-cc11-f082fc18f80b@oracle.com> Message-ID: <87o8orpyy3.fsf@redhat.com> > http://cr.openjdk.java.net/~chagedorn/8248226/webrev.00/ Looks good. Roland. From christian.hagedorn at oracle.com Tue Jul 7 07:31:57 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 7 Jul 2020 09:31:57 +0200 Subject: [16] RFR(S): 8248226: TestCloneAccessStressGCM fails with -XX:-ReduceBulkZeroing In-Reply-To: <87o8orpyy3.fsf@redhat.com> References: <88cd871e-b05a-5803-cc11-f082fc18f80b@oracle.com> <87o8orpyy3.fsf@redhat.com> Message-ID: <2a59757e-3e51-309b-8ef3-73e8a0d357d0@oracle.com> Thank you Roland for your review! Best regards, Christian On 07.07.20 09:27, Roland Westrelin wrote: > >> http://cr.openjdk.java.net/~chagedorn/8248226/webrev.00/ > > Looks good. > > Roland. > From patric.hedlin at oracle.com Tue Jul 7 11:00:12 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Tue, 7 Jul 2020 13:00:12 +0200 Subject: RFR(S): 8248901: Signed immediate support in .../share/assembler.hpp is broken. Message-ID: <3df3dab6-aa2f-bbbc-d231-6cda8f2a0ff7@oracle.com> Dear all, I would like to ask for help to review the following change/update: Issue:? 
https://bugs.openjdk.java.net/browse/JDK-8248901
Webrev: http://cr.openjdk.java.net/~phedlin/tr8248901/

Current definition(s) of is_simm() and friends are not robust over inputs. Both min and max values are undefined for width > 32 (and width < 0). No is_uimm() is currently provided (added). Several definitions are not used (cleanup).

NOTE: Adding currently unused is_simm9() and is_uimm12(), required by JDK-8247766.

Testing: hs-tier1-3

Best regards,
Patric

From patric.hedlin at oracle.com Tue Jul 7 11:17:43 2020
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 7 Jul 2020 13:17:43 +0200
Subject: RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn
Message-ID: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com>

Dear all,

I would like to ask for help to review the following change/update:

Issue: https://bugs.openjdk.java.net/browse/JDK-8247766
Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/

C1 code generation for reading and writing stack-slots does not handle large immediate offsets on aarch64. This patch will ensure that immediate offsets are admissible for base+(immediate)offset encoding or, if this is not the case, will enforce an explicit address calculation to a scratch register. (Also correcting a small glitch in 9-bit signed immediate encoding check.)

NOTE: Current patch includes (local) definitions of is_simm/9 and is_uimm/12, for review purpose only. With JDK-8248901 these will move to Assembler, and will not be included in the change-set.
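
[Editorial illustration, not part of the original message.] The range checks discussed in the two requests above can be sketched as follows. This is only a rough model written in Java rather than HotSpot's C++, and it is not the code from either webrev: the class name ImmediateChecks and the methods isSimm and isUimm are hypothetical. The width is validated explicitly and the full-width case is handled separately, because a shift by the full operand width is not a reliable way to build the bounds (Java masks the shift count of a long to its low six bits, and the thread above describes the C++ bounds as undefined for large widths).

```java
// Hypothetical helper, for illustration only; not the HotSpot Assembler code.
final class ImmediateChecks {
    private ImmediateChecks() {}

    // True if x fits in a signed two's-complement field of 'width' bits (1..64).
    public static boolean isSimm(long x, int width) {
        checkWidth(width);
        if (width == 64) {
            return true; // every long fits in 64 bits
        }
        long min = -(1L << (width - 1));
        long max = (1L << (width - 1)) - 1;
        return min <= x && x <= max;
    }

    // True if x, read as a bit pattern, fits in an unsigned field of 'width' bits (1..64).
    public static boolean isUimm(long x, int width) {
        checkWidth(width);
        if (width == 64) {
            return true; // every 64-bit pattern fits
        }
        // No bits may be set above the field; this also rejects negative values.
        return (x >>> width) == 0;
    }

    private static void checkWidth(int width) {
        if (width < 1 || width > 64) {
            throw new IllegalArgumentException("width out of range: " + width);
        }
    }
}
```

Under these assumptions, isSimm(x, 9) accepts -256 through 255 and isUimm(x, 12) accepts 0 through 4095, which matches the 9-bit signed and 12-bit unsigned immediate fields named in the thread.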
Testing: tier1-3,6 Best regards, Patric From nils.eliasson at oracle.com Tue Jul 7 13:55:56 2020 From: nils.eliasson at oracle.com (Nils Eliasson) Date: Tue, 7 Jul 2020 15:55:56 +0200 Subject: RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> Message-ID: <1cedcefe-547c-80ad-854f-0a38e7a07639@oracle.com> Hi Patric, There are some minor typos in the comments. Otherwise it looks good. No re-review needed. Best regards, Nils src/hotspot/cpu/aarch64/assembler_aarch64.hpp: "// Scaled unsigned offset, ecoded in an unsigned imm12:_ field." ecoded -> encoded imm12:_ field.? -> imm12 field "// Unscaled signed offset, ecoded in a signed imm9 field." ecoded -> encoded "// Scaled unsigned offset, ecoded in an unsigned imm12:_ field." ecoded -> encoded imm12:_ field.? -> imm12 field On 2020-07-07 13:17, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue:? https://bugs.openjdk.java.net/browse/JDK-8247766 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/ > > > C1 code generation for reading and writing stack-slots does not handle > large immediate offsets on aarch64. This patch will ensure that > immediate offsets are admissible for base+(immediate)offset encoding > or, if this is not the case, will enforce an explicit address > calculation to a scratch register. (Also correcting a small glitch in > 9-bit signed immediate encoding check.) > > NOTE: Current patch includes (local) definitions of is_simm/9 and > is_uimm/12, for review purpose only. With JDK-8248901 these will move > to Assembler, and will not be included in the change-set. 
>
>
> Testing: tier1-3,6
>
>
> Best regards,
> Patric
>
>

From patric.hedlin at oracle.com Tue Jul 7 14:20:47 2020
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Tue, 7 Jul 2020 16:20:47 +0200
Subject: RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn
In-Reply-To: <1cedcefe-547c-80ad-854f-0a38e7a07639@oracle.com>
References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> <1cedcefe-547c-80ad-854f-0a38e7a07639@oracle.com>
Message-ID: <6217195b-fe7b-c28a-2ae8-292a2e319e95@oracle.com>

Thanks for reviewing Nils.

On 2020-07-07 15:55, Nils Eliasson wrote:
> Hi Patric,
>
> There are some minor typos in the comments. Otherwise it looks good.
>
> No re-review needed.
>
> Best regards,
> Nils
>
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp:
>
> "// Scaled unsigned offset, ecoded in an unsigned imm12:_ field."
>
> ecoded -> encoded
> imm12:_ field. -> imm12 field
>
> "// Unscaled signed offset, ecoded in a signed imm9 field."
>
> ecoded -> encoded
>
> "// Scaled unsigned offset, ecoded in an unsigned imm12:_ field."
>
> ecoded -> encoded
> imm12:_ field. -> imm12 field
>

Ok, that was my obviously failed attempt to illustrate that there is scaling...

/Patric

>
>
>
> On 2020-07-07 13:17, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8247766
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/
>>
>>
>> C1 code generation for reading and writing stack-slots does not
>> handle large immediate offsets on aarch64. This patch will ensure
>> that immediate offsets are admissible for base+(immediate)offset
>> encoding or, if this is not the case, will enforce an explicit
>> address calculation to a scratch register. (Also correcting a small
>> glitch in 9-bit signed immediate encoding check.)
>>
>> NOTE: Current patch includes (local) definitions of is_simm/9 and
>> is_uimm/12, for review purpose only.
With JDK-8248901 these will move >> to Assembler, and will not be included in the change-set. >> >> >> Testing: tier1-3,6 >> >> >> Best regards, >> Patric From fairoz.matte at oracle.com Tue Jul 7 14:49:11 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Tue, 7 Jul 2020 07:49:11 -0700 (PDT) Subject: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java fails with -Xcomp -XX:TieredStopAtLevel=1 Message-ID: <2abe9fba-e958-4b34-9f92-6bb8d8478f4e@default> Hi, Please review this small test change to consider the scenario when there is no "printmdo" output JBS - https://bugs.openjdk.java.net/browse/JDK-8236042 Webrev - http://cr.openjdk.java.net/~fmatte/8236042/webrev.00/ Thanks, Fairoz From boris.ulasevich at bell-sw.com Tue Jul 7 15:47:35 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Tue, 7 Jul 2020 18:47:35 +0300 Subject: RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values Message-ID: Hi, Please review the change to skip i2l conversion after the mask: http://cr.openjdk.java.net/~bulasevich/8248870/webrev.00 http://bugs.openjdk.java.net/browse/JDK-8248870 With the change the micro-benchmark gets 11.082->7.520 ns/op performance improvement. Tested with jtreg. thanks, Boris From hohensee at amazon.com Tue Jul 7 16:20:46 2020 From: hohensee at amazon.com (Hohensee, Paul) Date: Tue, 7 Jul 2020 16:20:46 +0000 Subject: RFR(XS) 8248568: compiler/c2/TestBit.java failed: 'test' missing from stdout/stderr Message-ID: In case it's not judged to be trivial, lgtm. Thanks, Paul ?On 7/6/20, 2:38 AM, "hotspot-compiler-dev on behalf of Boris Ulasevich" wrote: Thank you Vladimir. May I consider the change trivial or should I ask for more reviews? regards, Boris On 03.07.2020 00:45, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/2/20 2:29 PM, Boris Ulasevich wrote: >> Hi Vladimir, >> >> Thank you. I applied your suggestions. On our machines jtreg runs well. 
>> Update: http://cr.openjdk.java.net/~bulasevich/8248568/webrev.01 >> >> regards, >> Boris >> >> On Thu, Jul 2, 2020 at 9:54 PM Vladimir Kozlov >> >> wrote: >> >>> Good. >>> >>> You may also replace next requirements: >>> >>> vm.flavor == "server" & !vm.graal.enabled >>> >>> with one: >>> >>> vm.compiler2.enabled >>> >>> Graal and C2 are mutually exclusive. >>> >>> May be also run processes without C1 by switching off Tiered >>> Compilation. >>> >>> And instead of: >>> @run main/othervm compiler.c2.TestBit >>> >>> use: >>> @run driver compiler.c2.TestBit >>> >>> Because you launching separate processes. >>> >>> Please, test changes with jtreg testing. >>> >>> Thanks, >>> Vladimir K >>> >>> On 7/2/20 11:13 AM, Boris Ulasevich wrote: >>>> Hi, >>>> >>>> Please review a one-line change: adding -Xbatch option to recently >>>> introduced test to get a more predictable PrintOptoAssembly output. >>>> >>>> http://cr.openjdk.java.net/~bulasevich/8248568/webrev.00 >>>> http://bugs.openjdk.java.net/browse/JDK-8248568 >>>> >>>> thanks, >>>> Boris >>> From vladimir.kozlov at oracle.com Tue Jul 7 17:07:04 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 7 Jul 2020 10:07:04 -0700 Subject: [15] RFR(S): 8248845: AArch64: stack corruption after spilling vector register In-Reply-To: <857dvfrev5.fsf@arm.com> References: <857dvfrev5.fsf@arm.com> Message-ID: <0eeec297-f2e1-e326-5d3a-eb4a11e47934@oracle.com> Thank you, Nick You are absolutely right that it was mistake change in 8076276. And we don't do scheduling for x86. Do you need sponsorship for push? The fix is trivial and should be pushed into jdk/jdk15. 
Thanks, Vladimir On 7/6/20 11:58 PM, Nick Gasson wrote: > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8248845 > Webrev: http://cr.openjdk.java.net/~ngasson/8248845/webrev.0/ > > This crash was seen on the Panama vectorIntrinsics branch and a minimal > test case is attached to the JBS entry, but it should also affect > vanilla jdk/jdk although I haven't found a reliable way to reproduce it. > > 0x0000ffffa173fd40: ldr x11, [sp, #120] > 0x0000ffffa173fd44: ldr x10, [sp, #48] > 0x0000ffffa173fd48: add x10, x10, x11 > 0x0000ffffa173fd4c: str q16, [x10, #16] ; <==== CRASH HERE > > X10 loaded from sp+48 contains a valid pointer but X11 from sp+120 > contains a garbage value. Here's the relevant opto assembly that spills > to sp+120: > > 6b4 + spill [sp, #80] -> [sp, #16] # vector spill size = 128 > 6bc + spill [sp, #24] -> [sp, #120] # spill size = 64 > > These two instructions have been scheduled in the wrong order: 6b4 > writes 16 bytes at sp+16 which overwrites another live value at sp+24. > Instruction 6bc spills the clobbered value at sp+24 to sp+120. If I dump > out the instructions before scheduling or pass -XX:-OptoScheduling the > order is correct. > > It seems to be a known limitation that the scheduler doesn't correctly > compute anti-dependencies when a vector occupies more than two slots, > because PhaseOutput::ScheduleAndBundle() already has a check to skip > scheduling if a too-wide vector was generated. Unfortunately the check > is wrong as a pair of slots is 8 bytes not 16: > > // Scheduling code works only with pairs (16 bytes) maximum. > if (C->max_vector_size() > 16) > > Actually the test here used to be > 8, but was changed as part of > JDK-8076276 which added AVX512 support [1]. I couldn't see any > explanation of that change in the bug or mailing list thread, but it > seems wrong and reverting it fixes this crash. > > This affects AArch64 because OptoScheduling is enabled by default and > NEON vectors are 16 bytes wide. 
> > Tested hotspot_all_no_apps, jdk_core on AArch64 and x86_64. > > [1] http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017579.html > > -- > Thanks, > Nick > From chris.plummer at oracle.com Tue Jul 7 22:07:49 2020 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 7 Jul 2020 15:07:49 -0700 Subject: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java fails with -Xcomp -XX:TieredStopAtLevel=1 In-Reply-To: <2abe9fba-e958-4b34-9f92-6bb8d8478f4e@default> References: <2abe9fba-e958-4b34-9f92-6bb8d8478f4e@default> Message-ID: <70057c31-e535-f03a-391d-d181b2ec150b@oracle.com> Hi Fairoz, Looks good, except for the missing space in "if(testJavaOpts...". thanks, Chris On 7/7/20 7:49 AM, Fairoz Matte wrote: > Hi, > > Please review this small test change to consider the scenario when there is no "printmdo" output > > JBS - https://bugs.openjdk.java.net/browse/JDK-8236042 > Webrev - http://cr.openjdk.java.net/~fmatte/8236042/webrev.00/ > > Thanks, > Fairoz From igor.ignatyev at oracle.com Wed Jul 8 00:38:18 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 7 Jul 2020 17:38:18 -0700 Subject: RFR(S) [15] : 8249000 : vm.gc.X should take selected JIT into account Message-ID: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ > 241 lines changed: 34 ins; 5 del; 202 mod; Hi all, could you please review the patch which modifies requires/VMProps to set vm.gc.X to false if Graal is selected and X GC isn't supported by Graal? the patch also replaces @requires similar to `vm.gc.X & !vm.graal.enabled` w/ `vm.gc.X` where it's applicable. from JBS: > not all GCs are supported by Graal JIT, which leads to failures like JDK-8247527 and boilerplate fixes like replacing all `@requires vm.gc.Z` w/ `@requires vm.gc.Z & !vm.graal.enabled`. > > as vm.gc.X means that X GC can be selected, it would be more natural, less surprising, and much more clear to have it true if the selected JIT supports the said X GC. 
webrev: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8249000 testing: test/hotspot/jtreg/{gc,compiler,runtime,serviceability} on {linux,windows,macos}-x64 w/ and w/o Graal as JIT Thanks, -- Igor From vladimir.kozlov at oracle.com Wed Jul 8 03:00:12 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 7 Jul 2020 20:00:12 -0700 Subject: RFR(S) [15] : 8249000 : vm.gc.X should take selected JIT into account In-Reply-To: References: Message-ID: <6964ac32-e9ec-d700-0bdb-ea51f4610afe@oracle.com> Nice clean up, Igor test/hotspot/jtreg/gc/stress/TestReclaimStringsLeaksMemory.java Do we even can have vm.gc=="null" based on code in VMProps.java? At least some GC should be selected ergonomically even if non is specified on command line. - * @requires vm.gc=="null" & !vm.graal.enabled & !vm.debug + * @requires vm.gc == "null" + * @requires !vm.debug test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java Does next combination of @requires ever work? I thought such sequence means 'AND' operation on all such conditions. * @requires vm.gc.Z * @requires vm.gc.Serial * @requires vm.gc == null Thanks, Vladimir On 7/7/20 5:38 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >> 241 lines changed: 34 ins; 5 del; 202 mod; > > > Hi all, > > could you please review the patch which modifies requires/VMProps to set vm.gc.X to false if Graal is selected and X GC isn't supported by Graal? > > the patch also replaces @requires similar to `vm.gc.X & !vm.graal.enabled` w/ `vm.gc.X` where it's applicable. > > from JBS: >> not all GCs are supported by Graal JIT, which leads to failures like JDK-8247527 and boilerplate fixes like replacing all `@requires vm.gc.Z` w/ `@requires vm.gc.Z & !vm.graal.enabled`. >> >> as vm.gc.X means that X GC can be selected, it would be more natural, less surprising, and much more clear to have it true if the selected JIT supports the said X GC. 
> > webrev: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8249000 > testing: test/hotspot/jtreg/{gc,compiler,runtime,serviceability} on {linux,windows,macos}-x64 w/ and w/o Graal as JIT > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Wed Jul 8 03:30:38 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 7 Jul 2020 20:30:38 -0700 Subject: RFR(S) [15] : 8249000 : vm.gc.X should take selected JIT into account In-Reply-To: <6964ac32-e9ec-d700-0bdb-ea51f4610afe@oracle.com> References: <6964ac32-e9ec-d700-0bdb-ea51f4610afe@oracle.com> Message-ID: <7A1992A7-1493-4DF0-B621-195CE986D34F@oracle.com> Hi Vladimir, thanks for your review! `vm.gc` and `vm.gc.X`-s are different beasts (and admittedly, they confuse people a lot), `vm.gc` is set to "X", by jtreg itself, only if there is UseXGC in vm flags, otherwise it's "null". `vm.gc.X` are set by VMProps class, and you can have more than one vm.gc.X == true, as vm.gc.X means that X gc is supported by JVM and it can be selected; so if there are no Use.*GC in vm flags, vm.gc.X will yield true for all GCs which JVM was built with; if one of UseXGC is provided, only corresponding vm.gc.X is true, and all others are false. so to answer your questions, yes `vm.gc` can be "null" (if there are no Use.*GC) , and yes `vm.gc.Z & vm.gc.Serial & vm.gc == null` can be true (if there are no Use.*GC and JVM supports both Z and Serial GCs). Thanks, -- Igor > On Jul 7, 2020, at 8:00 PM, Vladimir Kozlov wrote: > > Nice clean up, Igor > > test/hotspot/jtreg/gc/stress/TestReclaimStringsLeaksMemory.java > > Do we even can have vm.gc=="null" based on code in VMProps.java? At least some GC should be selected ergonomically even if non is specified on command line. 
> > - * @requires vm.gc=="null" & !vm.graal.enabled & !vm.debug > + * @requires vm.gc == "null" > + * @requires !vm.debug > > > test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java > > Does next combination of @requires ever work? I thought such sequence means 'AND' operation on all such conditions. > > * @requires vm.gc.Z > * @requires vm.gc.Serial > * @requires vm.gc == null > > > Thanks, > Vladimir > > On 7/7/20 5:38 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >>> 241 lines changed: 34 ins; 5 del; 202 mod; >> Hi all, >> could you please review the patch which modifies requires/VMProps to set vm.gc.X to false if Graal is selected and X GC isn't supported by Graal? >> the patch also replaces @requires similar to `vm.gc.X & !vm.graal.enabled` w/ `vm.gc.X` where it's applicable. >> from JBS: >>> not all GCs are supported by Graal JIT, which leads to failures like JDK-8247527 and boilerplate fixes like replacing all `@requires vm.gc.Z` w/ `@requires vm.gc.Z & !vm.graal.enabled`. >>> >>> as vm.gc.X means that X GC can be selected, it would be more natural, less surprising, and much more clear to have it true if the selected JIT supports the said X GC. 
>> webrev: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8249000 >> testing: test/hotspot/jtreg/{gc,compiler,runtime,serviceability} on {linux,windows,macos}-x64 w/ and w/o Graal as JIT >> Thanks, >> -- Igor From fairoz.matte at oracle.com Wed Jul 8 03:47:48 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Tue, 7 Jul 2020 20:47:48 -0700 (PDT) Subject: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java fails with -Xcomp -XX:TieredStopAtLevel=1 In-Reply-To: <70057c31-e535-f03a-391d-d181b2ec150b@oracle.com> References: <2abe9fba-e958-4b34-9f92-6bb8d8478f4e@default> <70057c31-e535-f03a-391d-d181b2ec150b@oracle.com> Message-ID: <958fecdf-d7a1-4b22-835e-a75fadda0a84@default> Thanks Chris, for the review comments. I have updated the suggested change. Thanks, Fairoz > -----Original Message----- > From: Chris Plummer > Sent: Wednesday, July 8, 2020 3:38 AM > To: Fairoz Matte ; hotspot-compiler- > dev at openjdk.java.net; serviceability-dev at openjdk.java.net > Subject: Re: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java > fails with -Xcomp -XX:TieredStopAtLevel=1 > > Hi Fairoz, > > Looks good, except for the missing space in "if(testJavaOpts...". 
> > thanks, > > Chris > > On 7/7/20 7:49 AM, Fairoz Matte wrote: > > Hi, > > > > Please review this small test change to consider the scenario when there is no > "printmdo" output > > > > JBS - https://bugs.openjdk.java.net/browse/JDK-8236042 > > Webrev - http://cr.openjdk.java.net/~fmatte/8236042/webrev.00/ > > > > Thanks, > > Fairoz > From igor.ignatyev at oracle.com Wed Jul 8 06:56:27 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 7 Jul 2020 23:56:27 -0700 Subject: RFR [15] : 8249018 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_mlvm tests Message-ID: http://cr.openjdk.java.net/~iignatyev//8249018/webrev.00 > 116 lines changed: 0 ins; 64 del; 52 mod; Hi all, could you please review the patch which removes `FileInstaller . .` jtreg action from :vmTestbase_vm_mlvm tests? from the main issue(8204985): > all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests. 
testing: :vmTestbase_vm_mlvm on linux-x64 webrev: http://cr.openjdk.java.net/~iignatyev//8249018/webrev.00 JBS: https://bugs.openjdk.java.net/browse/JDK-8249018 Thanks, -- Igor From Yang.Zhang at arm.com Wed Jul 8 07:05:09 2020 From: Yang.Zhang at arm.com (Yang Zhang) Date: Wed, 8 Jul 2020 07:05:09 +0000 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com> References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com> Message-ID: Hi Andrew I have updated this patch. Could you please help to review it again? In this patch, the following changes are made: 1. Separate newly added NEON instructions to a new ad file aarch64_neon.ad 2. Add assembler tests for NEON instructions. Trailing spaces in the python script are also removed. http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/ Thanks, Yang -----Original Message----- From: Andrew Haley Sent: Tuesday, June 30, 2020 12:10 AM To: Yang Zhang ; Viswanathan, Sandhya ; Paul Sandoz Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes On 29/06/2020 08:48, Yang Zhang wrote: > 1. Instructions that can be matched with NEON instructions directly. > MulVB, SqrtVF and AbsV have been merged into jdk master already. > > 2. Instructions that jdk master has middle end support for, but they cannot be matched with NEON instructions directly. 
> Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. > > 3. Panama/Vector API specific instructions such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. > These instructions cannot be moved into jdk master first because there isn't middle-end support. > > I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also update aarch64_asmtest.py and macroassemler.cpp. When the patch is ready, I will send it again. Thank you *very* much for your hard work. Appreciated! -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From xxinliu at amazon.com Wed Jul 8 08:26:01 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 8 Jul 2020 08:26:01 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> Message-ID: <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> hi, Reviewers, Please allow me to ping this CR. It's the last left-over task for -XX:ControlIntrinsic=. it adds a sanity check for user-input. Thanks, --lx On 6/25/20 6:59 PM, Liu, Xin wrote: hi, Reviewers, Could you review this patch? bug: https://bugs.openjdk.java.net/browse/JDK-8247732 webrev: http://cr.openjdk.java.net/~xliu/8247732/00/webrev/ The core logic is class ControlIntrinsicValidator in compilerDirectives.hpp It iterates the ccstrlist option and makes sure user-input intrinsic ids are all valid. It stops and take a record when it meets the first unrecognized intrinsic. I used constraints to validate the global options ControlIntrinsic and DisableIntrinsic. ControlIntrinsic/DisableIntrinsic in compiler directives are more complex. 
The matched directive is only parsed when hotspot attempts to compile the corresponding method. I validate at that time and JVM will crash if it doesnot meet guarantee() statement. I added Method::external_name_short() which only returns the shorter method name in the form of "classname::method". Probably hotspot has had similar code, but I failed to discover. please let me know and I will remove it. Test: passed hotspot:tier1 and gtest:all manually tests with wrong inputs. https://bugs.openjdk.java.net/browse/JDK-8247732?focusedCommentId=14349960&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14349960 From nick.gasson at arm.com Wed Jul 8 09:28:27 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Wed, 08 Jul 2020 17:28:27 +0800 Subject: [15] RFR(S): 8248845: AArch64: stack corruption after spilling vector register In-Reply-To: <0eeec297-f2e1-e326-5d3a-eb4a11e47934@oracle.com> References: <857dvfrev5.fsf@arm.com> <0eeec297-f2e1-e326-5d3a-eb4a11e47934@oracle.com> Message-ID: <854kqiqrt0.fsf@arm.com> On 07/08/20 01:07 am, Vladimir Kozlov wrote: > > You are absolutely right that it was mistake change in 8076276. And we don't do scheduling for x86. I wonder whether we should only do scheduling on AArch64 for in-order CPUs? I tried SPECjvm with/without OptoScheduling on a few different AArch64 systems but couldn't get conclusive results either way. > Do you need sponsorship for push? The fix is trivial and should be pushed into jdk/jdk15. > I pushed it to jdk15, thanks. 
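
[Editorial illustration, not part of the original message.] The clobbering reported at the start of this thread comes down to two stack byte ranges overlapping: the 128-bit vector spill writes 16 bytes starting at sp+16, and the 64-bit value read by the next spill lives in the 8 bytes starting at sp+24. A minimal sketch of that interval check follows; the class SpillOverlap and the method overlaps are hypothetical names and have nothing to do with the actual scheduler code.

```java
// Hypothetical illustration of the overlap described in the 8248845 report.
final class SpillOverlap {
    private SpillOverlap() {}

    // True if the half-open byte ranges [offA, offA + sizeA) and
    // [offB, offB + sizeB) share at least one byte.
    public static boolean overlaps(int offA, int sizeA, int offB, int sizeB) {
        return offA < offB + sizeB && offB < offA + sizeA;
    }

    public static void main(String[] args) {
        // 6b4 writes a 16-byte vector at sp+16; 6bc reads 8 bytes at sp+24.
        System.out.println(overlaps(16, 16, 24, 8)); // prints "true"
        // An 8-byte pair of slots at sp+16 would stop short of sp+24.
        System.out.println(overlaps(16, 8, 24, 8));  // prints "false"
    }
}
```

The first call shows why the write in 6b4 must not be moved ahead of the read in 6bc; the second shows that an 8-byte pair at the same offset would not conflict, which is consistent with the observation that the scheduler only copes with pairs of slots.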
-- Nick From aph at redhat.com Wed Jul 8 09:46:53 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 8 Jul 2020 10:46:53 +0100 Subject: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values In-Reply-To: References: Message-ID: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com> On 07/07/2020 16:47, Boris Ulasevich wrote: > Please review the change to skip i2l conversion after the mask: > > http://cr.openjdk.java.net/~bulasevich/8248870/webrev.00 > http://bugs.openjdk.java.net/browse/JDK-8248870 You seem to have inserted this between the DO NOT EDIT THIS SECTION markers. Please hold off this change until I've committed the patch for 8248414. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Wed Jul 8 11:28:23 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 8 Jul 2020 12:28:23 +0100 Subject: Running IGV In-Reply-To: References: <09f19846-cd66-85ed-c491-c5348d8fe532@redhat.com> Message-ID: <4ce51de9-0cf3-8548-ca2b-bced67e3a561@redhat.com> Hi, On 03/07/2020 14:48, Tobias Hartmann wrote: > > On 03.07.20 15:40, Andrew Haley wrote: >> Thanks. It's better with JDK 8, but although it does load saved XML >> Ideal Graphs, all it's possible to see is a tree with the names of >> the compilation passes. No graphs are displayed. > > After double-clicking on the phase, it sometimes takes a while to load if the graph is huge. If > there's an issue, you should at least get an error message (did you check the console?). > >> I'm guessing IGV must have rotted, and there's no version that works >> with current HotSpot available. > > Well it does work fine for me and I'm using it on a regular basis. Thank you very much for your help. I persisted with my build and I've now got something that works. I am seeing these, but it doesn't seem to stop IGV from working. 
java.lang.AssertionError
	at org.netbeans.api.visual.graph.GraphScene.addEdge(GraphScene.java:154)
	at com.sun.hotspot.igv.controlflow.ControlFlowScene.setGraph(ControlFlowScene.java:113)
	at com.sun.hotspot.igv.controlflow.ControlFlowTopComponent$1.run(ControlFlowTopComponent.java:145)
	at java.awt.event.InvocationEvent.dispatch(InvocationEvent.java:311)
	at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:758)
	at java.awt.EventQueue.access$500(EventQueue.java:97)
	at java.awt.EventQueue$3.run(EventQueue.java:709)
	at java.awt.EventQueue$3.run(EventQueue.java:703)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
	at java.awt.EventQueue.dispatchEvent(EventQueue.java:728)
	at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:159)
[catch] at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
	at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
	at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
	at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
	at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From christian.hagedorn at oracle.com Wed Jul 8 13:41:30 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 8 Jul 2020 15:41:30 +0200
Subject: RFR(M): 8229495: SIGILL in C2 generated OSR compilation
In-Reply-To: <87r1topriw.fsf@redhat.com>
References: <3b720427-d718-5d1c-dbe9-6149a21883af@oracle.com> <87r1topriw.fsf@redhat.com>
Message-ID: <84b2c86d-c7e6-7945-dae5-db1d8efe6f25@oracle.com>

Hi Roland

That's a nice solution and looks reasonable to me.
Thanks for the detailed explanation! I submitted some testing. Some minor general comments: 1824 // Add back the predicate for the value at the beginning of the first entry 1825 prev_proj = clone_skeleton_predicate(iff, init, max_value, entry, proj, ctrl, outer_loop, prev_proj); This comment seems to be outdated as you now clone both skeleton predicates with the same function call in different loop iterations. - In loopopts.cpp: While fixing the spacing you could also add curly braces to the one-liner if statements like 955 if (n_op == Op_MergeMem) return n; > I implemented this by using 2 subclasses to Opaque1 to denotate init and > stride and facilitate pattern matching. I had to extend _class_id to > juint to make Opaque1 a new class. While at it, you might want to consider to update other uses of the pattern Opcode() == Op_Opaque1 by is_Opaque1() as well like in loopTransform.cpp: 1158 assert(iff->in(1)->in(1)->Opcode() == Op_Opaque1, "unexpected predicate shape"); > Finally, the "asserts" above used to be removed at the > end of compilation. I now leave them in debug builds so we can catch > similar bugs earlier. That's helpful! > The bug doesn't reproduce with the included test case anymore but after > I backed out a couple unrelated changes I could use the test case to > verify the bug is indeed fixed. I observed a Java Fuzzer crash ("fatal error: DEBUG MESSAGE: duplicated predicate failed which is impossible") this weekend which looked very similar to this bug and indeed it could be fixed with your patch. You could add it as additional testcase. Here is the simplified code and the command line I used to reproduce it. 
$ java -Xcomp -XX:-TieredCompilation -XX:CompileOnly=Test::test Test.java

public class Test {
    public static int iFld = 0;
    public static short sFld = 1;

    public static void main(String[] strArr) {
        test();
    }

    public static int test() {
        int x = 11;
        int y = 0;
        int j = 0;
        int iArr[] = new int[400];

        init(iArr);

        for (int i = 0; i < 2; i++) {
            doNothing();
            for (j = 10; j > 1; j -= 2) {
                sFld += (short)j;
                iArr = iArr;
                y += (j * 3);
                x = (iArr[j - 1]/ x);
                x = sFld;
            }
            int k = 1;
            while (++k < 8) {
                iFld += x;
            }
        }
        return Float.floatToIntBits(654) + x + j + y;
    }

    // Inlined
    public static void doNothing() {
    }

    // Inlined
    public static void init(int[] a) {
        for (int j = 0; j < a.length; j++) {
            a[j] = 0;
        }
    }
}

Best regards,
Christian

From vladimir.kozlov at oracle.com Wed Jul 8 17:58:59 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 8 Jul 2020 10:58:59 -0700
Subject: RFR [15] : 8249018 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_mlvm tests
In-Reply-To:
References:
Message-ID:

Good.

Thanks,
Vladimir

On 7/7/20 11:56 PM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8249018/webrev.00
>> 116 lines changed: 0 ins; 64 del; 52 mod;
>
> Hi all,
>
> could you please review the patch which removes `FileInstaller . .` jtreg action from :vmTestbase_vm_mlvm tests?
> from the main issue(8204985):
>> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests.
> > > testing: :vmTestbase_vm_mlvm on linux-x64 > webrev: http://cr.openjdk.java.net/~iignatyev//8249018/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8249018 > > Thanks, > -- Igor > From vladimir.kozlov at oracle.com Wed Jul 8 18:34:08 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 8 Jul 2020 11:34:08 -0700 Subject: RFR(S) [15] : 8249000 : vm.gc.X should take selected JIT into account In-Reply-To: <7A1992A7-1493-4DF0-B621-195CE986D34F@oracle.com> References: <6964ac32-e9ec-d700-0bdb-ea51f4610afe@oracle.com> <7A1992A7-1493-4DF0-B621-195CE986D34F@oracle.com> Message-ID: <2c92a9a5-77af-c100-fa9b-f765e9d23dce@oracle.com> Thank you, Igor I got the difference between `vm.gc` and `vm.gc.X`. In this case TestReclaimStringsLeaksMemory.java should be put into ProblemList-graal.txt with 8207267 to enable it with libgraal. Current usage of !vm.graal.enabled in test is to skip this test with Java Graal because its effect on Java heap. On 7/7/20 8:30 PM, Igor Ignatyev wrote: > Hi Vladimir, > > thanks for your review! > > `vm.gc` and `vm.gc.X`-s are different beasts (and admittedly, they confuse people a lot), `vm.gc` is set to "X", by jtreg itself, only if there is UseXGC in vm flags, otherwise it's "null". `vm.gc.X` are set by VMProps class, and you can have more than one vm.gc.X == true, as vm.gc.X means that X gc is supported by JVM and it can be selected; so if there are no Use.*GC in vm flags, vm.gc.X will yield true for all GCs which JVM was built with; if one of UseXGC is provided, only corresponding vm.gc.X is true, and all others are false. so to answer your questions, yes `vm.gc` can be "null" (if there are no Use.*GC) , and yes `vm.gc.Z & vm.gc.Serial & vm.gc == null` can be true (if there are no Use.*GC and JVM supports both Z and Serial GCs). Interesting. I thought vmGC will list only one selected GC. That explains requires in TestZGCWithCDS.java. You only need to add TestReclaimStringsLeaksMemory.java into ProblemList-graal.txt. 
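As a rough illustration of the `vm.gc` / `vm.gc.X` semantics described above, the following self-contained Java sketch models how the properties could be derived. It is not the actual VMProps or jtreg code: the class `VmGcPropsSketch`, the method `compute`, and its parameters are hypothetical names chosen for illustration, and the "JIT-supported GCs" set is only an assumed way to model the change proposed in 8249000.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the semantics described in this thread; this is
// NOT the real VMProps implementation.
public class VmGcPropsSketch {

    // builtGcs:        GCs the JVM was built with, e.g. Serial, G1, Z
    // selectedGc:      GC named by a Use*GC flag, or null if no such flag
    // jitSupportedGcs: GCs the selected JIT can work with (assumption)
    static Map<String, String> compute(Set<String> builtGcs,
                                       String selectedGc,
                                       Set<String> jitSupportedGcs) {
        Map<String, String> props = new LinkedHashMap<>();
        // vm.gc reflects only an explicit flag; otherwise it is "null".
        props.put("vm.gc", selectedGc == null ? "null" : selectedGc);
        for (String gc : builtGcs) {
            // vm.gc.X means "X can be selected": no conflicting flag was
            // given and the selected JIT supports X.
            boolean selectable = (selectedGc == null || selectedGc.equals(gc))
                    && jitSupportedGcs.contains(gc);
            props.put("vm.gc." + gc, String.valueOf(selectable));
        }
        return props;
    }

    public static void main(String[] args) {
        Set<String> built = new LinkedHashSet<>(List.of("Serial", "G1", "Z"));
        // No Use*GC flag, JIT supports everything: all vm.gc.X are true.
        System.out.println(compute(built, null, built));
        // UseZGC given: only vm.gc.Z is true.
        System.out.println(compute(built, "Z", built));
        // No flag, but the JIT in this model lacks Z support.
        System.out.println(compute(built, null, Set.of("Serial", "G1")));
    }
}
```

In the first case every built GC yields true while `vm.gc` stays "null", which is why a combination such as `vm.gc.Z & vm.gc.Serial & vm.gc == null` can hold.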
Thanks, Vladimir > > Thanks, > -- Igor > > >> On Jul 7, 2020, at 8:00 PM, Vladimir Kozlov wrote: >> >> Nice clean up, Igor >> >> test/hotspot/jtreg/gc/stress/TestReclaimStringsLeaksMemory.java >> >> Do we even can have vm.gc=="null" based on code in VMProps.java? At least some GC should be selected ergonomically even if non is specified on command line. >> >> - * @requires vm.gc=="null" & !vm.graal.enabled & !vm.debug >> + * @requires vm.gc == "null" >> + * @requires !vm.debug > >> >> >> test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java >> >> Does next combination of @requires ever work? I thought such sequence means 'AND' operation on all such conditions. >> >> * @requires vm.gc.Z >> * @requires vm.gc.Serial >> * @requires vm.gc == null >> >> >> Thanks, >> Vladimir >> >> On 7/7/20 5:38 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >>>> 241 lines changed: 34 ins; 5 del; 202 mod; >>> Hi all, >>> could you please review the patch which modifies requires/VMProps to set vm.gc.X to false if Graal is selected and X GC isn't supported by Graal? >>> the patch also replaces @requires similar to `vm.gc.X & !vm.graal.enabled` w/ `vm.gc.X` where it's applicable. >>> from JBS: >>>> not all GCs are supported by Graal JIT, which leads to failures like JDK-8247527 and boilerplate fixes like replacing all `@requires vm.gc.Z` w/ `@requires vm.gc.Z & !vm.graal.enabled`. >>>> >>>> as vm.gc.X means that X GC can be selected, it would be more natural, less surprising, and much more clear to have it true if the selected JIT supports the said X GC. 
>>> webrev: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249000 >>> testing: test/hotspot/jtreg/{gc,compiler,runtime,serviceability} on {linux,windows,macos}-x64 w/ and w/o Graal as JIT >>> Thanks, >>> -- Igor > From igor.ignatyev at oracle.com Wed Jul 8 18:40:14 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 8 Jul 2020 11:40:14 -0700 Subject: RFR(S) [15] : 8249000 : vm.gc.X should take selected JIT into account In-Reply-To: <2c92a9a5-77af-c100-fa9b-f765e9d23dce@oracle.com> References: <6964ac32-e9ec-d700-0bdb-ea51f4610afe@oracle.com> <7A1992A7-1493-4DF0-B621-195CE986D34F@oracle.com> <2c92a9a5-77af-c100-fa9b-f765e9d23dce@oracle.com> Message-ID: Thanks Vladimir. for the record, I've updated ProblemList-graal.txt w/ the following: > diff -r 14ffd658a23a test/hotspot/jtreg/ProblemList-graal.txt > --- a/test/hotspot/jtreg/ProblemList-graal.txt Wed Jul 08 11:35:30 2020 -0700 > +++ b/test/hotspot/jtreg/ProblemList-graal.txt Wed Jul 08 11:37:44 2020 -0700 > @@ -229,6 +229,7 @@ > compiler/loopopts/TestOverunrolling.java 8207267 generic-all > compiler/jsr292/NonInlinedCall/InvokeTest.java 8207267 generic-all > compiler/codegen/TestTrichotomyExpressions.java 8207267 generic-all > +gc/stress/TestReclaimStringsLeaksMemory.java 8207267 generic-all > > runtime/exceptionMsgs/AbstractMethodError/AbstractMethodErrorTest.java 8222582 generic-all -- Igor > On Jul 8, 2020, at 11:34 AM, Vladimir Kozlov wrote: > > Thank you, Igor > > I got the difference between `vm.gc` and `vm.gc.X`. > > In this case TestReclaimStringsLeaksMemory.java should be put into ProblemList-graal.txt with 8207267 to enable it with libgraal. Current usage of !vm.graal.enabled in test is to skip this test with Java Graal because its effect on Java heap. > > On 7/7/20 8:30 PM, Igor Ignatyev wrote: >> Hi Vladimir, >> thanks for your review! 
>> `vm.gc` and `vm.gc.X`-s are different beasts (and admittedly, they confuse people a lot), `vm.gc` is set to "X", by jtreg itself, only if there is UseXGC in vm flags, otherwise it's "null". `vm.gc.X` are set by VMProps class, and you can have more than one vm.gc.X == true, as vm.gc.X means that X gc is supported by JVM and it can be selected; so if there are no Use.*GC in vm flags, vm.gc.X will yield true for all GCs which JVM was built with; if one of UseXGC is provided, only corresponding vm.gc.X is true, and all others are false. so to answer your questions, yes `vm.gc` can be "null" (if there are no Use.*GC) , and yes `vm.gc.Z & vm.gc.Serial & vm.gc == null` can be true (if there are no Use.*GC and JVM supports both Z and Serial GCs). > > Interesting. I thought vmGC will list only one selected GC. That explains requires in TestZGCWithCDS.java. > > You only need to add TestReclaimStringsLeaksMemory.java into ProblemList-graal.txt. > > Thanks, > Vladimir > >> Thanks, >> -- Igor >>> On Jul 7, 2020, at 8:00 PM, Vladimir Kozlov wrote: >>> >>> Nice clean up, Igor >>> >>> test/hotspot/jtreg/gc/stress/TestReclaimStringsLeaksMemory.java >>> >>> Do we even can have vm.gc=="null" based on code in VMProps.java? At least some GC should be selected ergonomically even if non is specified on command line. >>> >>> - * @requires vm.gc=="null" & !vm.graal.enabled & !vm.debug >>> + * @requires vm.gc == "null" >>> + * @requires !vm.debug >>> >>> >>> test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java >>> >>> Does next combination of @requires ever work? I thought such sequence means 'AND' operation on all such conditions. 
>>> >>> * @requires vm.gc.Z >>> * @requires vm.gc.Serial >>> * @requires vm.gc == null >>> >>> >>> Thanks, >>> Vladimir >>> >>> On 7/7/20 5:38 PM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >>>>> 241 lines changed: 34 ins; 5 del; 202 mod; >>>> Hi all, >>>> could you please review the patch which modifies requires/VMProps to set vm.gc.X to false if Graal is selected and X GC isn't supported by Graal? >>>> the patch also replaces @requires similar to `vm.gc.X & !vm.graal.enabled` w/ `vm.gc.X` where it's applicable. >>>> from JBS: >>>>> not all GCs are supported by Graal JIT, which leads to failures like JDK-8247527 and boilerplate fixes like replacing all `@requires vm.gc.Z` w/ `@requires vm.gc.Z & !vm.graal.enabled`. >>>>> >>>>> as vm.gc.X means that X GC can be selected, it would be more natural, less surprising, and much more clear to have it true if the selected JIT supports the said X GC. >>>> webrev: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249000 >>>> testing: test/hotspot/jtreg/{gc,compiler,runtime,serviceability} on {linux,windows,macos}-x64 w/ and w/o Graal as JIT >>>> Thanks, >>>> -- Igor From igor.ignatyev at oracle.com Wed Jul 8 18:41:35 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 8 Jul 2020 11:41:35 -0700 Subject: RFR [15] : 8249018 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_mlvm tests In-Reply-To: References: Message-ID: thanks Vladimir, pushed. -- Igor > On Jul 8, 2020, at 10:58 AM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > > On 7/7/20 11:56 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8249018/webrev.00 >>> 116 lines changed: 0 ins; 64 del; 52 mod; >> Hi all, >> could you please review the patch which removes `FileInstaller . .` jtreg action from :vmTestbase_vm_mlvm tests? 
>> from the main issue(8204985):
>>> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests.
>> testing: :vmTestbase_vm_mlvm on linux-x64
>> webrev: http://cr.openjdk.java.net/~iignatyev//8249018/webrev.00
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249018
>> Thanks,
>> -- Igor

From vladimir.kozlov at oracle.com Wed Jul 8 20:14:01 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 8 Jul 2020 13:14:01 -0700
Subject: [16] RFR(S) 8248987: AOT's Linker.java seems to eagerly fail-fast on Windows
Message-ID:

https://cr.openjdk.java.net/~kvn/8248987/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8248987

Treat all problems in getVC141AndNewerLinker() as non-fatal to try find old version linker if newer one was not found. Print exception message from getVC141AndNewerLinker() with --verbose flag.

Thanks,
Vladimir

From igor.veresov at oracle.com Wed Jul 8 20:17:10 2020
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 8 Jul 2020 13:17:10 -0700
Subject: [15] RFR(S) 8248822: 8 vm/classfmt/atr_ann/atr_rtm_annot007/atr_rtm_annot00709 tests fail w/ AOT
Message-ID: <21B89D82-3C1A-4E20-A405-9962F721F8D5@oracle.com>

The root cause of this is that Graal has intrinsics for jdk.internal.reflect.ConstantPool.{getIntAt0, getLongAt0, getFloatAt0, getDoubleAt0}() that don't check the range of the cp index or tag validity, whereas the original native implementations do. Since the utility of these intrinsics is of dubious value I'd like to remove it. The same change is going upstream as well.
Webrev: http://cr.openjdk.java.net/~iveresov/8248822/webrev.00/
JBS: https://bugs.openjdk.java.net/browse/JDK-8248822

Thanks,
igor

From igor.veresov at oracle.com Wed Jul 8 20:35:37 2020
From: igor.veresov at oracle.com (Igor Veresov)
Date: Wed, 8 Jul 2020 13:35:37 -0700
Subject: [16] RFR(S) 8248987: AOT's Linker.java seems to eagerly fail-fast on Windows
In-Reply-To:
References:
Message-ID: <85A8134F-B6EF-436A-BC03-2F8CE1737460@oracle.com>

Looks good to me.

igor

> On Jul 8, 2020, at 1:14 PM, Vladimir Kozlov wrote:
>
> https://cr.openjdk.java.net/~kvn/8248987/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8248987
>
> Treat all problems in getVC141AndNewerLinker() as non-fatal to try find old version linker if newer one was not found. Print exception message from getVC141AndNewerLinker() with --verbose flag.
>
> Thanks,
> Vladimir

From Charlie.Gracie at microsoft.com Wed Jul 8 20:41:31 2020
From: Charlie.Gracie at microsoft.com (Charlie Gracie)
Date: Wed, 8 Jul 2020 20:41:31 +0000
Subject: Stack allocation prototype for C2
Message-ID: <97F7697A-7A47-456D-832C-5BC8746880E0@microsoft.com>

Hi Sergey,

To get an idea of the objects which are being stack allocated you can use a fastdebug build and gather the output from -XX:+PrintStackAllocation. This static view can be combined with inspecting the source code to find patterns where allocations can be stack allocated but fail to be scalar replaced. This information is not well suited to understanding which allocation sites are important, since it only describes where heap allocations were replaced with stack allocations, not how frequently those sites are used at runtime.

The common patterns we have recognized are:
1. Boxing objects, with caches, make up a significant portion of the wins we measured.
2. Iterators and transient data created during collection iteration.
3. Object chains of non-escaping objects. In these scenarios a lot of the time the root object gets scalar replaced (SCR) but the children objects do not. I think SCR might be able to be improved for some of these cases but I need to get more data to understand why it is failing.
4. Backing arrays for data structures. A lot of data structures have a default initial array length. Since the array may grow it is not eligible for SCR but it may be eligible for stack allocation. This is a common subcase of #3 but I separated it out since the reason why SCR fails is due to merge points.

To get a better understanding of the runtime wins we gathered JFR data with and without stack allocation enabled for some of the benchmarks showing large reductions in heap allocation. These workloads were all Scala based.

1. In TMT, almost 100% of the reduction in heap allocations is due to stack allocation of java.lang.Double objects created via scala.runtime.BoxesRunTime.boxToDouble(double). The reduction is due to 2 different call stacks where this method was inlined. Here are the 2 callers that generate the allocations which get stack allocated.
   a. scala.runtime.ScalaRunTime$.array_apply(Object, int)
   b. edu.stanford.nlp.tmt.model.SoftAssignmentModel$$anonfun$summary$1$$anonfun$apply$5.apply(Object).

2. In ALS, almost 100% of the reduction in heap allocations is due to stack allocation of java.lang.Integer objects created via scala.runtime.BoxesRunTime.boxToInteger(int). The reduction is due to 1 call stack containing the following caller.
   a. scala.runtime.ScalaRunTime$.array_apply(Object, int). When this function is used for primitive arrays it looks like stack allocation can regularly see big wins with the right amount of inlining.

3. In factorie, there are 5 object types that benefit from stack allocation to reduce overall heap allocations. Digging further into the call stacks for the 5 allocation sites, it appears that they are all related to iterating over data structures. Most of the objects are transient objects used for a single iteration and are not boxed primitives. The object types are:
   a. scala.Some which is allocated as the result of scala.collection.mutable.HashMap.get(Object)
   b. scala.collection.immutable.ListBuffer which is allocated by scala.collection.immutable.List$.newBuilder()
   c. cc.factorie.generative.Proportions[] which is allocated by cc.factorie.generative.DiscreteMixtureVar$class.chosenParents(DiscreteMixtureVar)
   d. cc.factorie.package$$anon$1 which is allocated by cc.factorie.package$.singleFactorIterable(Factor)
   e. cc.factorie.Domain$$anonfun$get$1 which is allocated by cc.factorie.Domain$.get(Class)

I hope this is the type of information you were looking for. If you have any other questions or would like to see more/different data please let us know. I can always make log files available via our GitHub project or similar if that helps.

Charlie

On 2020-06-29, 11:34 PM, "hotspot-compiler-dev on behalf of Sergey Kuksenko" wrote:

    I am just curious. For each benchmark you show the overall reduction in
    allocation size. Do you have statistics on which stack allocated objects
    have the major impact? And which code patterns fail scalar replacement,
    apart from the well-known Integer cache flow merge?

    On 6/29/20 2:05 PM, Charlie Gracie wrote:
    > Hi hotspot-compiler-dev community,
    >
    > Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback
    > as we hope to move from a prototype to something that could be contributed. A change of this size is difficult to review so we
    > understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others,
    > if they wanted to, would also be appreciated (i.e., a repo somewhere).
    >
    > For a quick refresher here is a link to Nikola's talk at FOSDEM:
    > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Ffosdem.org%2F2020%2Fschedule%2Fevent%2Freducing_gc_times%2F&data=02%7C01%7Ccharlie.gracie%40microsoft.com%7C9e9b56c23fde463bf6b808d81ca68bf4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637290848926541670&sdata=qB1c8l5mUVk%2BAt7W5178A9wQ3pauoxW6XTVCfOTOmHw%3D&reserved=0
    >
    > Here is a link to our initial webrev:
    > https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openjdk.java.net%2F~adityam%2Fcharlie%2Fstack_alloc%2F&data=02%7C01%7Ccharlie.gracie%40microsoft.com%7C9e9b56c23fde463bf6b808d81ca68bf4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637290848926541670&sdata=46mF34J4XcMV58TJxvJ4%2FiDSxL41TSKgW0X2MX7HRV4%3D&reserved=0
    >
    > Expecting that a change like this will require a JEP, we have prepared a document describing our work based off of the JEP submission
    > form. Our document has a few extra sections at the end discussing areas that we are looking for guidance on and some initial
    > performance results. This document can be found here:
    > https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmicrosoft%2Fopenjdk-proposals%2Fblob%2Fmaster%2Fstack_allocation%2FStack_Allocation_JEP.md&data=02%7C01%7Ccharlie.gracie%40microsoft.com%7C9e9b56c23fde463bf6b808d81ca68bf4%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637290848926541670&sdata=V%2BqKZ9QgCd%2BKDbFb9MqFDoxdtXm8fFmgh%2FLYxgiGqJA%3D&reserved=0
    >
    > Thanks in advance for reviews, suggestions, concerns, comments and issues.
    > Charlie and Nikola
    >

From Nikola.Grcevski at microsoft.com Wed Jul 8 21:18:06 2020
From: Nikola.Grcevski at microsoft.com (Nikola Grcevski)
Date: Wed, 8 Jul 2020 21:18:06 +0000
Subject: Stack allocation prototype for C2
In-Reply-To: <0f98b198-0769-08fc-f1ff-553eadcede22@redhat.com>
References: <0f98b198-0769-08fc-f1ff-553eadcede22@redhat.com>
Message-ID:

Hi Andrew,

>Here's my concern.
>
>Java stacks are, in general, pretty small. This is good, and makes for
>economical memory usage. This is particularly useful for Project Loom,
>where there can be enormous numbers of "virtual" threads. These threads,
>while they are not active, are stored in the heap.
>As you might imagine, the idea of embedded objects (which, of course,
>cannot be collected) in these virtual threads does not delight me at all.
>Is this likely to be a real problem, do you think, or are all of the
>stack-allocated objects so small that I shouldn't be concerned?

Your concern about increased memory consumption is very valid, especially in the context of Project Loom. We only stack allocate Java objects of size 256B or less and arrays of length less than 64 elements. There's also a C2 per-method limit on how many stack slots can be allocated. After the stack slot limit is reached, we stop stack allocating more objects. These checks limit the overall amount of stack space that will be consumed.

We see stack allocation as an addition to scalar replacement. Currently, scalar replacement will increase the stack size, and we expect stack allocation to grow the stack by a similar but larger amount per object. Scalar replacement does not preserve the header words or unused fields, whereas stack allocated objects do.

We have collected some static data to understand the amount of increase of the stack size, but perhaps we need to extend the measurement to scenarios that are closer to typical Project Loom use cases.

Out of all programs in the Renaissance benchmark suite, ALS is where we stack allocate the most.
There are about 2,500 methods compiled with C2 in ALS and the average method stack size can be found in the table below:

  No stack allocation, average per method stack size:    69.9 B
  With stack allocation, average per method stack size:  72.2 B
  Average stack allocated object size:                   25.7 B
  MAX stack allocated object size:                       96 B

That comes to an increase of less than 2.5 bytes on average (about 3%) in a program where we've seen the most opportunities so far. We observe similar numbers in the DaCapoScala benchmark suite in the benchmarks where we stack allocate a lot: TMT and FACTORIE.

FACTORIE (around 650 compiled methods)
  No stack allocation, average per method stack size:    63.9 B
  With stack allocation, average per method stack size:  66.6 B
  Average stack allocated object size:                   25.5 B
  MAX stack allocated object size:                       48 B

TMT (around 900 compiled methods)
  No stack allocation, average per method stack size:    67.4 B
  With stack allocation, average per method stack size:  70.1 B
  Average stack allocated object size:                   23.5 B
  MAX stack allocated object size:                       40 B

If there is data from other workloads you would like to see, in particular when using Loom, please let us know. Also, if there are any other metrics you would like to see we can add those to our must-gather list going forward.

If it turns out that the stack size increase is unacceptable, we can add further heuristics to do cost-benefit analysis while deciding whether to stack allocate a given allocation candidate. For example, we might stack allocate only smaller objects, objects used in loops or only those in code with high frequency.

Thanks for reviewing.

Nikola

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Andrew Haley
Sent: July 2, 2020 4:16 AM
To: Charlie Gracie ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: Stack allocation prototype for C2

On 29/06/2020 22:05, Charlie Gracie wrote:

> Here is the prototype code for our work on adding stack allocation to
> the HotSpot C2 compiler.
We are looking for any and all feedback as we > hope to move from a prototype to something that could be contributed. We certainly need a repo where it can go. It could either be adopted by an existing project or it could have a project of its own. The latter is perhaps a bad idea because it would be too isolated. > A change of this size is difficult to review so we understand the > process will be thorough and will take time to complete. Any > suggestions on how to allow for collaboration with others, if they > wanted to, would also be appreciated (i.e., a repo somewhere). Here's my concern. Java stacks are, in general, pretty small. This is good, and makes for economical memory usage. This is particularly useful for Project Loom, where there can be enormous numbers of "virtual" threads. These threads, while they are not active, are stored in the heap. As you might imagine, the idea of embedded objects (which, of course, cannot be collected) in these virtual threads does not delight me at all. Is this likely to be a real problem, do you think, or are all of the stack-allocated objects so small that I shouldn't be concerned? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fkeybase.io%2Fandrewhaley&data=02%7C01%7CNikola.Grcevski%40microsoft.com%7C10c6163e539749badcbb08d81e604677%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637292746135351851&sdata=k%2FlcpRETCDafZMvL%2B4P3abYrK4Eb83SOkoZcVBWoeS8%3D&reserved=0
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Wed Jul 8 22:16:08 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 8 Jul 2020 15:16:08 -0700
Subject: [15] RFR(S) 8248822: 8 vm/classfmt/atr_ann/atr_rtm_annot007/atr_rtm_annot00709 tests fail w/ AOT
In-Reply-To: <21B89D82-3C1A-4E20-A405-9962F721F8D5@oracle.com>
References: <21B89D82-3C1A-4E20-A405-9962F721F8D5@oracle.com>
Message-ID:

I see that Doug and Tom approved these changes. I am fine with fix too.

Thanks,
Vladimir

On 7/8/20 1:17 PM, Igor Veresov wrote:
> The root cause of this is that Graal has intrinsics for jdk.internal.reflect.ConstantPool.{getIntAt0, getLongAt0, getFloatAt0, getDoubleAt0}() that don't check the range of the cp index or tag validity, whereas the original native implementations do. Since the utility of these intrinsics is of dubious value I'd like to remove it. The same change is going upstream as well.
>
> Webrev: http://cr.openjdk.java.net/~iveresov/8248822/webrev.00/
> JBS: https://bugs.openjdk.java.net/browse/JDK-8248822
>
> Thanks,
> igor
>
>
>

From vladimir.kozlov at oracle.com Wed Jul 8 22:16:38 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 8 Jul 2020 15:16:38 -0700
Subject: [16] RFR(S) 8248987: AOT's Linker.java seems to eagerly fail-fast on Windows
In-Reply-To: <85A8134F-B6EF-436A-BC03-2F8CE1737460@oracle.com>
References: <85A8134F-B6EF-436A-BC03-2F8CE1737460@oracle.com>
Message-ID: <25c25b01-6914-6854-d7c7-42683e1d5e92@oracle.com>

Thank you, Igor

Vladimir K

On 7/8/20 1:35 PM, Igor Veresov wrote:
> Looks good to me.
> > igor > > > >> On Jul 8, 2020, at 1:14 PM, Vladimir Kozlov wrote: >> >> https://cr.openjdk.java.net/~kvn/8248987/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8248987 >> >> Treat all problems in getVC141AndNewerLinker() as non-fatal to try find old version linker if newer one was not found. Print exception message from getVC141AndNewerLinker() with --verbose flag. >> >> Thanks, >> Vladimir > From vladimir.kozlov at oracle.com Wed Jul 8 22:36:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 8 Jul 2020 15:36:20 -0700 Subject: RFR(S) [15] : 8249000 : vm.gc.X should take selected JIT into account In-Reply-To: References: <6964ac32-e9ec-d700-0bdb-ea51f4610afe@oracle.com> <7A1992A7-1493-4DF0-B621-195CE986D34F@oracle.com> <2c92a9a5-77af-c100-fa9b-f765e9d23dce@oracle.com> Message-ID: Good. Thanks, Vladimir On 7/8/20 11:40 AM, Igor Ignatyev wrote: > Thanks Vladimir. > > for the record, I've updated ProblemList-graal.txt w/ the following: > >> diff -r 14ffd658a23a test/hotspot/jtreg/ProblemList-graal.txt >> --- a/test/hotspot/jtreg/ProblemList-graal.txt Wed Jul 08 11:35:30 2020 -0700 >> +++ b/test/hotspot/jtreg/ProblemList-graal.txt Wed Jul 08 11:37:44 2020 -0700 >> @@ -229,6 +229,7 @@ >> compiler/loopopts/TestOverunrolling.java 8207267 generic-all >> compiler/jsr292/NonInlinedCall/InvokeTest.java 8207267 generic-all >> compiler/codegen/TestTrichotomyExpressions.java 8207267 generic-all >> +gc/stress/TestReclaimStringsLeaksMemory.java 8207267 generic-all >> >> runtime/exceptionMsgs/AbstractMethodError/AbstractMethodErrorTest.java 8222582 generic-all > > -- Igor > > >> On Jul 8, 2020, at 11:34 AM, Vladimir Kozlov wrote: >> >> Thank you, Igor >> >> I got the difference between `vm.gc` and `vm.gc.X`. >> >> In this case TestReclaimStringsLeaksMemory.java should be put into ProblemList-graal.txt with 8207267 to enable it with libgraal. 
Current usage of !vm.graal.enabled in test is to skip this test with Java Graal because its effect on Java heap. >> >> On 7/7/20 8:30 PM, Igor Ignatyev wrote: >>> Hi Vladimir, >>> thanks for your review! >>> `vm.gc` and `vm.gc.X`-s are different beasts (and admittedly, they confuse people a lot), `vm.gc` is set to "X", by jtreg itself, only if there is UseXGC in vm flags, otherwise it's "null". `vm.gc.X` are set by VMProps class, and you can have more than one vm.gc.X == true, as vm.gc.X means that X gc is supported by JVM and it can be selected; so if there are no Use.*GC in vm flags, vm.gc.X will yield true for all GCs which JVM was built with; if one of UseXGC is provided, only corresponding vm.gc.X is true, and all others are false. so to answer your questions, yes `vm.gc` can be "null" (if there are no Use.*GC) , and yes `vm.gc.Z & vm.gc.Serial & vm.gc == null` can be true (if there are no Use.*GC and JVM supports both Z and Serial GCs). >> >> Interesting. I thought vmGC will list only one selected GC. That explains requires in TestZGCWithCDS.java. >> >> You only need to add TestReclaimStringsLeaksMemory.java into ProblemList-graal.txt. >> >> Thanks, >> Vladimir >> >>> Thanks, >>> -- Igor >>>> On Jul 7, 2020, at 8:00 PM, Vladimir Kozlov wrote: >>>> >>>> Nice clean up, Igor >>>> >>>> test/hotspot/jtreg/gc/stress/TestReclaimStringsLeaksMemory.java >>>> >>>> Do we even can have vm.gc=="null" based on code in VMProps.java? At least some GC should be selected ergonomically even if non is specified on command line. >>>> >>>> - * @requires vm.gc=="null" & !vm.graal.enabled & !vm.debug >>>> + * @requires vm.gc == "null" >>>> + * @requires !vm.debug >>>> >>>> >>>> test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java >>>> >>>> Does next combination of @requires ever work? I thought such sequence means 'AND' operation on all such conditions. 
>>>>
>>>> * @requires vm.gc.Z
>>>> * @requires vm.gc.Serial
>>>> * @requires vm.gc == null
>>>>
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/7/20 5:38 PM, Igor Ignatyev wrote:
>>>>> http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/
>>>>>> 241 lines changed: 34 ins; 5 del; 202 mod;
>>>>> Hi all,
>>>>> could you please review the patch which modifies requires/VMProps to set vm.gc.X to false if Graal is selected and X GC isn't supported by Graal?
>>>>> the patch also replaces @requires similar to `vm.gc.X & !vm.graal.enabled` w/ `vm.gc.X` where it's applicable.
>>>>> from JBS:
>>>>>> not all GCs are supported by Graal JIT, which leads to failures like JDK-8247527 and boilerplate fixes like replacing all `@requires vm.gc.Z` w/ `@requires vm.gc.Z & !vm.graal.enabled`.
>>>>>>
>>>>>> as vm.gc.X means that X GC can be selected, it would be more natural, less surprising, and much more clear to have it true if the selected JIT supports the said X GC.
>>>>> webrev: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/
>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249000
>>>>> testing: test/hotspot/jtreg/{gc,compiler,runtime,serviceability} on {linux,windows,macos}-x64 w/ and w/o Graal as JIT
>>>>> Thanks,
>>>>> -- Igor
>

From igor.ignatyev at oracle.com Wed Jul 8 22:36:53 2020
From: igor.ignatyev at oracle.com (igor.ignatyev at oracle.com)
Date: Wed, 8 Jul 2020 15:36:53 -0700
Subject: [16] RFR(S) 8248987: AOT's Linker.java seems to eagerly fail-fast on Windows
In-Reply-To:
References:
Message-ID: <55500A48-5982-4AAD-90F6-FA941967B439@oracle.com>

LGTM

-- Igor

> On Jul 8, 2020, at 1:14 PM, Vladimir Kozlov wrote:
>
> https://cr.openjdk.java.net/~kvn/8248987/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8248987
>
> Treat all problems in getVC141AndNewerLinker() as non-fatal to try find old version linker if newer one was not found. Print exception message from getVC141AndNewerLinker() with --verbose flag.
> > Thanks, > Vladimir From vladimir.kozlov at oracle.com Wed Jul 8 22:39:42 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 8 Jul 2020 15:39:42 -0700 Subject: [16] RFR(S) 8248987: AOT's Linker.java seems to eagerly fail-fast on Windows In-Reply-To: <55500A48-5982-4AAD-90F6-FA941967B439@oracle.com> References: <55500A48-5982-4AAD-90F6-FA941967B439@oracle.com> Message-ID: <7c8110d2-87a3-cab8-2946-b83f86e83fe2@oracle.com> Thank you, Igor Vladimir K On 7/8/20 3:36 PM, igor.ignatyev at oracle.com wrote: > LGTM > > -- Igor > >> On Jul 8, 2020, at 1:14 PM, Vladimir Kozlov wrote: >> >> https://cr.openjdk.java.net/~kvn/8248987/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8248987 >> >> Treat all problems in getVC141AndNewerLinker() as non-fatal to try to find the old version linker if the newer one was not found. Print exception message from getVC141AndNewerLinker() with --verbose flag. >> >> Thanks, >> Vladimir > From igor.veresov at oracle.com Wed Jul 8 23:48:40 2020 From: igor.veresov at oracle.com (Igor Veresov) Date: Wed, 8 Jul 2020 16:48:40 -0700 Subject: [15] RFR(S) 8248822: 8 vm/classfmt/atr_ann/atr_rtm_annot007/atr_rtm_annot00709 tests fail w/ AOT In-Reply-To: References: <21B89D82-3C1A-4E20-A405-9962F721F8D5@oracle.com> Message-ID: <2A8297CE-C29E-48F4-B5ED-D09C365F6EDD@oracle.com> Thanks, Vladimir! igor > On Jul 8, 2020, at 3:16 PM, Vladimir Kozlov wrote: > > I see that Doug and Tom approved these changes. I am fine with the fix too. > > Thanks, > Vladimir > > On 7/8/20 1:17 PM, Igor Veresov wrote: >> The root cause of this is that Graal has intrinsics for jdk.internal.reflect.ConstantPool.{getIntAt0, getLongAt0, getFloatAt0, getDoubleAt0}() that don't check the range of the cp index or tag validity, whereas the original native implementations do. Since the utility of these intrinsics is of dubious value I'd like to remove them. The same change is going upstream as well.
>> Webrev: http://cr.openjdk.java.net/~iveresov/8248822/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8248822 >> Thanks, >> igor From jamsheed.c.m at oracle.com Thu Jul 9 07:31:11 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 9 Jul 2020 13:01:11 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 Message-ID: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> Hi all, JBS: https://bugs.openjdk.java.net/browse/JDK-8242895 Please review the changes made to offset computation and field write detection for init captured stores, needed due to phi addition between alloc and init. This happens if the init node is in a different outer loop wrt the alloc node and there is a loop opt. This was required as a result of enhancement [1]. Normally inits are not associated with multiple alloc nodes during the EA phase, but the changes done for [1] caused code shapes of the form [2] to generate inits associated with multiple alloc nodes. This had implications for offset computation and field write detection related to initializing stores. Attempt to fix in EA: webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ Alternate fix: Minimize the scenario in compiler generated code by throwing only j.l.Error from slowpath (all exceptions, async/sync, are handled in runtime exit). Stub epilog doesn't poll or throw any exceptions. Disable full loop opt before EA for detectable patterns and bail out of EA for late detected patterns. webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ Please advise. Testing : mach tier1-5 (logs in jbs) Best regards, Jamsheed [1] JDK-8231291 C2: loop opts before EA should maximally unroll loops [2] that have their init node in a different outer loop wrt the alloc node.
loop begin
  try{
    return new obj()/ throw new obj()/ uncommon trap after allocation, in a loop
  } catch(ex) {
  }
loop end

public static IntA test(int n) {
    for (int i=0; i<2; i++) {
        try {
            return new IntA(n + i);
        } catch (Exception e) {
        }
    }

From rwestrel at redhat.com Thu Jul 9 08:32:00 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 09 Jul 2020 10:32:00 +0200 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state Message-ID: <87v9ixnl6n.fsf@redhat.com> http://cr.openjdk.java.net/~roland/8248598/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8248598 It's the upstream graal fix from https://github.com/oracle/graal/pull/2651 unmodified. I wanted to verify that the test case, once part of the jdk source tree, does fail without the fix and runs fine with it, but couldn't figure out how. What are the steps for that? Roland. From aph at redhat.com Thu Jul 9 09:56:47 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 9 Jul 2020 10:56:47 +0100 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> Message-ID: On 07/07/2020 12:17, Patric Hedlin wrote: > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8247766 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/ Can we have a reproducer for this please? The test is in the open/ directory but I can't find it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Thu Jul 9 10:05:03 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 9 Jul 2020 11:05:03 +0100 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> Message-ID: <51ef5108-69dc-573e-ea6f-ddc05e00ab04@redhat.com> On 09/07/2020 10:56, Andrew Haley wrote: > On 07/07/2020 12:17, Patric Hedlin wrote: >> I would like to ask for help to review the following change/update: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8247766 >> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/ > > Can we have a reproducer for this please? The test is in the open/ directory > but I can't find it. And jtreg_test_jdk_java_lang_invoke_BigArityTest_java passes for me. If you are running with some "interesting" settings, please tell me. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Thu Jul 9 11:43:55 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 09 Jul 2020 13:43:55 +0200 Subject: RFR(M): 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: <84b2c86d-c7e6-7945-dae5-db1d8efe6f25@oracle.com> References: <3b720427-d718-5d1c-dbe9-6149a21883af@oracle.com> <87r1topriw.fsf@redhat.com> <84b2c86d-c7e6-7945-dae5-db1d8efe6f25@oracle.com> Message-ID: <87sge0oqv8.fsf@redhat.com> Hi Christian, new webrev: http://cr.openjdk.java.net/~roland/8229495/webrev.01/ > I submitted some testing. Thanks.
> 1824 // Add back the predicate for the value at the beginning of > the first entry > 1825 prev_proj = clone_skeleton_predicate(iff, init, max_value, > entry, proj, ctrl, outer_loop, prev_proj); > > This comment seems to be outdated as you now clone both skeleton > predicates with the same function call in different loop iterations. I tweaked the comment. > > - In loopopts.cpp: While fixing the spacing you could also add curly > braces to the one-liner if statements like > > 955 if (n_op == Op_MergeMem) return n; Ok. > While at it, you might want to consider to update other uses of the > pattern Opcode() == Op_Opaque1 by is_Opaque1() as well like in > loopTransform.cpp: > > 1158 assert(iff->in(1)->in(1)->Opcode() == Op_Opaque1, "unexpected > predicate shape"); Except in this case it really is an Opaque1 instead of a subclass so using is_Opaque1() would weaken the assert. > I observed a Java Fuzzer crash ("fatal error: DEBUG MESSAGE: duplicated > predicate failed which is impossible") this weekend which looked very > similar to this bug and indeed it could be fixed with your patch. You > could add it as additional testcase. Here is the simplified code and the > command line I used to reproduce it. Thanks for test case. I included it in the new webrev. Roland. From christian.hagedorn at oracle.com Thu Jul 9 12:16:12 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 9 Jul 2020 14:16:12 +0200 Subject: RFR(M): 8229495: SIGILL in C2 generated OSR compilation In-Reply-To: <87sge0oqv8.fsf@redhat.com> References: <3b720427-d718-5d1c-dbe9-6149a21883af@oracle.com> <87r1topriw.fsf@redhat.com> <84b2c86d-c7e6-7945-dae5-db1d8efe6f25@oracle.com> <87sge0oqv8.fsf@redhat.com> Message-ID: Hi Roland On 09.07.20 13:43, Roland Westrelin wrote: > new webrev: > http://cr.openjdk.java.net/~roland/8229495/webrev.01/ That looks good to me! >> I submitted some testing. > > Thanks. An extended testing was completed successfully (up to tier7). 
>> While at it, you might want to consider to update other uses of the >> pattern Opcode() == Op_Opaque1 by is_Opaque1() as well like in >> loopTransform.cpp: >> >> 1158 assert(iff->in(1)->in(1)->Opcode() == Op_Opaque1, "unexpected >> predicate shape"); > > Except in this case it really is an Opaque1 instead of a subclass so > using is_Opaque1() would weaken the assert. You're right, I have not thought about that - then better leave it as it is. >> I observed a Java Fuzzer crash ("fatal error: DEBUG MESSAGE: duplicated >> predicate failed which is impossible") this weekend which looked very >> similar to this bug and indeed it could be fixed with your patch. You >> could add it as additional testcase. Here is the simplified code and the >> command line I used to reproduce it. > > Thanks for test case. I included it in the new webrev. Great, thanks for adding it. Best regards, Christian From patric.hedlin at oracle.com Thu Jul 9 12:44:41 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Thu, 9 Jul 2020 14:44:41 +0200 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: <51ef5108-69dc-573e-ea6f-ddc05e00ab04@redhat.com> References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> <51ef5108-69dc-573e-ea6f-ddc05e00ab04@redhat.com> Message-ID: <92889b14-2e5f-d0de-c6d2-016468619368@oracle.com> I have updated the comment (in the report) on BigArityTest with the command to reproduce the failure. /Patric On 2020-07-09 12:05, Andrew Haley wrote: > On 09/07/2020 10:56, Andrew Haley wrote: >> On 07/07/2020 12:17, Patric Hedlin wrote: >>> I would like to ask for help to review the following change/update: >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8247766 >>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/ >> Can we have a reproducer for this please? The test is in the open/ directory >> but I can't find it.
> And jtreg_test_jdk_java_lang_invoke_BigArityTest_java passes for me. If you > are running with some "interesting" settings, please tell me. > From jamsheed.c.m at oracle.com Thu Jul 9 14:06:38 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 9 Jul 2020 19:36:38 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> Message-ID: <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> Hi, please hold the review. I need to change the code for dealing with unsafe access, as the current capture code spends more execution time analyzing things. Best regards, Jamsheed On 09/07/2020 13:01, Jamsheed C M wrote: > > Hi all, > > JBS: https://bugs.openjdk.java.net/browse/JDK-8242895 > > Please review the changes made to offset computation and field write > detection for init captured stores, needed due to phi addition between alloc > and init. This happens if the init node is in a different outer loop wrt the > alloc node and there is a loop opt. This was required as a result of > enhancement [1]. > > Normally inits are not associated with multiple alloc nodes during the EA > phase, but the changes done for [1] caused code shapes of the form > [2] to generate inits associated with multiple alloc nodes. > > This had implications for offset computation and field write detection > related to initializing stores. > > Attempt to fix in EA: > > webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ > > Alternate fix: > > Minimize the scenario in compiler generated code by throwing only > j.l.Error from slowpath (all exceptions, async/sync, are handled in > runtime exit). > > Stub epilog doesn't poll or throw any exceptions. Disable full > loop opt before EA for detectable patterns and bail out of EA for late > detected patterns. > > webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ > > Please advise.
> > Testing : mach tier1-5 (logs in jbs) > > Best regards, > > Jamsheed > > > [1] JDK-8231291 C2: > loop opts before EA should maximally unroll loops > > [2] that have their init node in a different outer loop wrt the alloc node.
> loop begin
>   try{
>     return new obj()/ throw new obj()/ uncommon trap after allocation, in a loop
>   } catch(ex) {
>   }
> loop end
>
> public static IntA test(int n) {
>     for (int i=0; i<2; i++) {
>         try {
>             return new IntA(n + i);
>         } catch (Exception e) {
>         }
>     }
>
From aph at redhat.com Thu Jul 9 14:26:36 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 9 Jul 2020 15:26:36 +0100 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> Message-ID: On 07/07/2020 12:17, Patric Hedlin wrote: > C1 code generation for reading and writing stack-slots does not handle > large immediate offsets on aarch64. This patch will ensure that > immediate offsets are admissible for base+(immediate)offset encoding or, > if this is not the case, will enforce an explicit address calculation to > a scratch register. (Also correcting a small glitch in 9-bit signed > immediate encoding check.) > > NOTE: Current patch includes (local) definitions of is_simm/9 and > is_uimm/12, for review purpose only. With JDK-8248901 these will move to > Assembler, and will not be included in the change-set. Umm, OK. These functions seem too complicated: all you have to do is

  int64_t chk = val >> (nbits - 1);
  guarantee (chk == -1 || chk == 0, "Field too big for insn");

but the AArch64 part of it looks fine. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From Charlie.Gracie at microsoft.com Thu Jul 9 15:15:15 2020 From: Charlie.Gracie at microsoft.com (Charlie Gracie) Date: Thu, 9 Jul 2020 15:15:15 +0000 Subject: Stack allocation prototype for C2 Message-ID: Hi Dalibor, Thanks for pointing us at the Sandbox Repo! It looks like a great place to host and collaborate on large changes. If the community decides we should move forward with this investigation, I believe the Sandbox repo would be a good fit. Thanks, Charlie Gracie On 2020-07-02, 5:02 AM, "hotspot-compiler-dev on behalf of Dalibor Topic" wrote: On 29.06.2020 23:05, Charlie Gracie wrote: > Hi hotspot-compiler-dev community, > > Here is the prototype code for our work on adding stack allocation to the HotSpot C2 compiler. We are looking for any and all feedback > as we hope to move from a prototype to something that could be contributed. A change of this size is difficult to review so we > understand the process will be thorough and will take time to complete. Any suggestions on how to allow for collaboration with others, > if they wanted to, would also be appreciated (i.e., a repo somewhere). Hi Charlie, You may want to take a look at https://cr.openjdk.java.net/~chegar/docs/sandbox.html "The primary purpose of the JDK Sandbox Development Repository is to facilitate OpenJDK developers that are working on non-trivial changes, possibly JEP-scale effort, whose scope and duration make it necessary to collaborate with others in an open shared version control system, rather than just using privately shared patches.
" cheers, dalibor topic -- Dalibor Topic Consulting Product Manager Phone: +494089091214 , Mobile: +491737185961 , Video: dalibor.topic at oracle.com Oracle Global Services Germany GmbH Hauptverwaltung: Riesstr. 25, D-80992 München Registergericht: Amtsgericht München, HRB 246209 Geschäftsführer: Ralf Herrmann From aph at redhat.com Thu Jul 9 15:48:59 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 9 Jul 2020 16:48:59 +0100 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> Message-ID: <2809ab8c-4a2e-c0c3-9b93-a0f5df41b992@redhat.com> On 07/07/2020 12:17, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8247766 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/ > > > C1 code generation for reading and writing stack-slots does not handle > large immediate offsets on aarch64. This patch will ensure that > immediate offsets are admissible for base+(immediate)offset encoding or, > if this is not the case, will enforce an explicit address calculation to > a scratch register. (Also correcting a small glitch in 9-bit signed > immediate encoding check.) This is all very complicated. So it seems to me that there is a better way to do this. We already have MacroAssembler::legitimize_address(), and you should use that.
Like so:

diff -r 7c59af4db158 src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
--- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp Thu Jul 09 11:01:29 2020 -0400
+++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp Thu Jul 09 11:36:02 2020 -0400
@@ -736,25 +736,32 @@
 void LIR_Assembler::reg2stack(LIR_Opr src, LIR_Opr dest, BasicType type, bool pop_fpu_stack) {
   if (src->is_single_cpu()) {
+    int index = dest->single_stack_ix();
     if (is_reference_type(type)) {
-      __ str(src->as_register(), frame_map()->address_for_slot(dest->single_stack_ix()));
+      __ str(src->as_register(),
+             __ legitimize_address(frame_map()->address_for_slot(index), BytesPerWord, rscratch1));
       __ verify_oop(src->as_register());
     } else if (type == T_METADATA || type == T_DOUBLE || type == T_ADDRESS) {
-      __ str(src->as_register(), frame_map()->address_for_slot(dest->single_stack_ix()));
+      __ str(src->as_register(),
+             __ legitimize_address(frame_map()->address_for_slot(index), BytesPerWord, rscratch1));
     } else {
-      __ strw(src->as_register(), frame_map()->address_for_slot(dest->single_stack_ix()));
+      __ strw(src->as_register(),
+              __ legitimize_address(frame_map()->address_for_slot(index), BytesPerInt, rscratch1));
     }
   } else if (src->is_double_cpu()) {
     Address dest_addr_LO = frame_map()->address_for_slot(dest->double_stack_ix(), lo_word_offset_in_bytes);
+    dest_addr_LO = __ legitimize_address(dest_addr_LO, BytesPerLong, rscratch1);
     __ str(src->as_register_lo(), dest_addr_LO);
   } else if (src->is_single_fpu()) {
     Address dest_addr = frame_map()->address_for_slot(dest->single_stack_ix());
+    dest_addr = __ legitimize_address(dest_addr, BytesPerInt, rscratch1);
     __ strs(src->as_float_reg(), dest_addr);
   } else if (src->is_double_fpu()) {
     Address dest_addr = frame_map()->address_for_slot(dest->double_stack_ix());
+    dest_addr = __ legitimize_address(dest_addr, BytesPerLong, rscratch1);
     __ strd(src->as_double_reg(), dest_addr);
   } else {

stack_offset_in_reach() seems to duplicate the functionality of
offset_ok_for_immed(), and it's only used in this one place. By all means please use the new is_uimm() and is_simm() in offset_ok_for_immed(). -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From boris.ulasevich at bell-sw.com Thu Jul 9 16:20:20 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Thu, 9 Jul 2020 19:20:20 +0300 Subject: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values In-Reply-To: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com> References: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com> Message-ID: Hi Andrew, Ok, let us proceed after 8248414. Meanwhile, I moved the change out of do-not-edit scope, thanks: http://cr.openjdk.java.net/~bulasevich/8248870/webrev.01 regards, Boris On 08.07.2020 12:46, Andrew Haley wrote: > On 07/07/2020 16:47, Boris Ulasevich wrote: >> Please review the change to skip i2l conversion after the mask: >> >> http://cr.openjdk.java.net/~bulasevich/8248870/webrev.00 >> http://bugs.openjdk.java.net/browse/JDK-8248870 > You seem to have inserted this between the DO NOT EDIT THIS SECTION > markers. > > Please hold off this change until I've committed the patch for > 8248414. 
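The subject of 8248870 says that I2L conversions can be skipped for small positive masked values. The arithmetic fact behind that can be sketched in plain Java: when the mask has its sign bit clear, the masked int is non-negative, so sign extension and zero extension give the same long. This is only an illustration of that fact, not the AArch64 matcher rule from the webrev; the class and method names below are made up for the example.

```java
public class MaskedI2LSketch {
    // Sign-extending conversion, which is what an i2l performs.
    static long signExtend(int v) {
        return (long) v;
    }

    // Zero-extending conversion: the upper 32 bits of the result are cleared.
    static long zeroExtend(int v) {
        return Integer.toUnsignedLong(v);
    }

    // True when sign- and zero-extension of (value & mask) produce the same long.
    static boolean extensionsAgree(int value, int mask) {
        int masked = value & mask;
        return signExtend(masked) == zeroExtend(masked);
    }

    public static void main(String[] args) {
        // Mask with the sign bit clear: the masked value is non-negative, so both agree.
        System.out.println(extensionsAgree(-1, 0x7FFF));
        // Mask with the sign bit set: a negative masked value makes them differ.
        System.out.println(extensionsAgree(-1, 0x80000000));
    }
}
```

Since 32-bit AArch64 operations clear the upper half of the destination register, a masked value of the first kind is already a valid 64-bit value, which is presumably what makes the explicit conversion redundant.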
> From igor.ignatyev at oracle.com Thu Jul 9 16:25:35 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 9 Jul 2020 09:25:35 -0700 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <87v9ixnl6n.fsf@redhat.com> References: <87v9ixnl6n.fsf@redhat.com> Message-ID: <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> Hi Roland, applications/jcstress tests are just jtreg wrappers around jcstress tests[1], so you can just run them as you would normally run a jcstress test: $ java -jar jcstress.jar --jvmArgs -XX:+UnlockExperimentalVMOptions --jvmArgs -XX:+EnableJVMCI " if you need information on how to get jcstress.jar, please refer to the jcstress wiki[1] (or ping Aleksey, he might have a place where he publishes jcstress-tests-all) [1] https://wiki.openjdk.java.net/display/CodeTools/jcstress -- Igor > On Jul 9, 2020, at 1:32 AM, Roland Westrelin wrote: > > > http://cr.openjdk.java.net/~roland/8248598/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8248598 > > It's the upstream graal fix from > https://github.com/oracle/graal/pull/2651 unmodified. > > I wanted to verify that the test case once part of the jdk source tree > does fail without the fix and runs fine with it but couldn't figure out > how. What are the steps for that? > > Roland.
> From rwestrel at redhat.com Thu Jul 9 18:03:05 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Thu, 09 Jul 2020 20:03:05 +0200 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> Message-ID: <87pn94o9ba.fsf@redhat.com> Hi Igor, Thanks for helping but my question was not about jcstress but about the graal regression test: http://cr.openjdk.java.net/~roland/8248598/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.core.test/src/org/graalvm/compiler/core/test/VolatileAccessReadEliminationTest.java.html I can run it fine in the graal repo with mx but I have no idea how to run it once it's pulled into the jdk repo. Roland. From igor.ignatyev at oracle.com Thu Jul 9 18:07:40 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 9 Jul 2020 11:07:40 -0700 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <87pn94o9ba.fsf@redhat.com> References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> <87pn94o9ba.fsf@redhat.com> Message-ID: oh, I see. I guess the easiest way would be to use jtreg wrappers (test/hotspot/jtreg/compiler/graalunit), there is README.md which explains where you can get dependencies and where you need to put them to make it work, after you finish that, you can run the test by run-test framework as `make test TEST=test/hotspot/jtreg/compiler/graalunit/CoreTest.java`. 
HTH -- Igor > On Jul 9, 2020, at 11:03 AM, Roland Westrelin wrote: > > > Hi Igor, > > Thanks for helping but my question was not about jcstress but about the > graal regression test: > > http://cr.openjdk.java.net/~roland/8248598/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.core.test/src/org/graalvm/compiler/core/test/VolatileAccessReadEliminationTest.java.html > > I can run it fine in the graal repo with mx but I have no idea how to > run it once it's pulled into the jdk repo. > > Roland. > From Charlie.Gracie at microsoft.com Thu Jul 9 19:28:01 2020 From: Charlie.Gracie at microsoft.com (Charlie Gracie) Date: Thu, 9 Jul 2020 19:28:01 +0000 Subject: Stack allocation prototype for C2 Message-ID: <4C6D4959-00E1-4300-BE30-BB6FC60A491F@microsoft.com> Hi Vladimir, Thanks for reviewing the document and providing your feedback. > From the design overview and the implementation, I'm concerned about > far-reaching consequences of the chosen approach. It's not limited just > to existing set of JVM features, but as Andrew noted will affect the > design of forthcoming functionality as well. > > I think it's worth to start a broad discussion (HotSpot-wide) and decide > how much JVM design complexity budget it is worth spending on such an > optimization. This is a great suggestion, where and how should we start this discussion to get feedback from the broader community? > As we discussed off-line (right after FOSDEM), I do see the benefits of > in-memory representation for non-escaping objects: memory aliasing > (either indeterminate base or indexed access) imposes inherent > constraints on the escape analysis (both partial and conservative > approaches suffer from it). Nevertheless, some of the problematic cases > can be addressed by improving existing approach or introducing a more > powerful analysis: covering more cases and making the analysis > control-sensitive should improve the situation.
We would like to work to improve escape analysis as per your suggestions above. If we can achieve the same allocation reductions with this solution, it would be a better long-term solution. We would like to continue reviewing stack allocation and start a sandbox project as Dalibor suggested, but work on improving escape analysis and measure against the sandbox for a baseline. > Also, the alternative approach (called zone-based heap allocation) looks > very attractive to me. I haven't thought it through, but it looks like > keeping the objects on the Java heap can save us a lot of complexity on > the implementation side (more memory available for allocation - not > necessarily fixed amount, no need to migrate objects from stack to heap, > GC barriers are unaffected, etc.). For example, reserving a dedicated > TLAB (or a stack of TLABs?) and do nmethod-scoped allocations from C2 > code looks attractive. It can simplify many aspects of the > implementation: much more space available, free migration of > non-escaping objects to heap on deoptimization. We have been thinking about this idea since FOSDEM and we completely agree with the pros of zone-based allocation. The biggest benefits are the removal of the restrictions in compressed oops mode and that barriers would not have to be modified. For this approach were you envisioning that objects allocated in a stack zone are pinned until the method returns? Also, while that zone memory is pinned the GC would not reclaim memory in that zone? That is what we were thinking, but we are worried about the complexity of the changes and restrictions it might add to the GC implementations. Another thought is about the added cost to method enter / exit. With the current on stack approach there is no added instructions for entering / exiting a method since the stack size is just larger. For the zone-based approach we would need to have a few more instructions on enter and exit to get the space from the zone TLAB and to return it. 
If the current zone TLAB is full we would need to do more work to get another one. Hopefully the common case of satisfying the space requirements from the current zone TLAB would on average be the same or less than the current TLAB checks for fast path allocations. A final consideration is the footprint cost for project Loom. In the zone-based approach would each virtual thread (fibre) have its own zone TLAB (or stack of TLABs)? If each virtual thread had a zone TLAB it may lead to more frequent GCs because a significant portion of the heap is reserved for zone-based allocations. We do not see any of these as showstoppers, but just want to be sure we have the full picture. > Another idea: > > "When dealing with stack allocated objects in loops we need a lifetime > overlap check." > > It doesn't look specific to stack-allocated objects. Non-overlapping > live ranges can be coalesced the same way for on-heap freshly allocated > objects. It should get comparable reduction in allocation pressure > (single allocation per loop vs allocation per iteration) and doesn't > require stack allocation support at all (as an example [1]). > > If such improvements are enabled for non-escaping on-heap objects, how > much benefit will stack allocation bring on top of that? IMO the > performance gap should become much narrower. We agree, it's one of the first things we wanted to try after we submitted the initial stack allocation code for review. Again, our approach would be to have the current stack allocation prototype as a baseline and work to see if we can shrink the gap with other approaches.
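As a source-level illustration of the loop case quoted above (a single allocation per loop instead of one per iteration when the live ranges do not overlap), here is a small hand-written Java sketch. It only shows the effect such an optimization would aim for; it says nothing about how C2 or the stack allocation prototype would implement it, and `Point`, `sumPerIteration` and `sumReused` are hypothetical names.

```java
public class LoopAllocationSketch {
    // Hypothetical value holder; it never escapes the methods below.
    static final class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Allocates one fresh, non-escaping Point per iteration.
    static long sumPerIteration(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    // Same computation with a single Point reused across iterations.
    // This is only valid because each iteration's Point is dead
    // before the next iteration starts (the live ranges do not overlap).
    static long sumReused(int n) {
        long sum = 0;
        Point p = new Point(0, 0);
        for (int i = 0; i < n; i++) {
            p.x = i;
            p.y = i + 1;
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumPerIteration(1000) == sumReused(1000));
    }
}
```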
Thanks again for providing valuable feedback and insight Charlie and Nikola From luhenry at microsoft.com Thu Jul 9 20:31:11 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Thu, 9 Jul 2020 20:31:11 +0000 Subject: RFR(S): 8248676: AArch64: Add workaround for LITable constructor Message-ID: Hello, JBS: https://bugs.openjdk.java.net/browse/JDK-8248676 Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.00/ Testing: jtreg:test/hotspot/jtreg:tier1, jtreg:test/jdk:tier1, jtreg:test/jdk:tier2, jtreg:test/langtools on Linux-AArch64, no regressions. This small fix is in the context of the larger support for Windows-AArch64. The attribute `__attribute__ ((constructor))` is not supported by MSVC, and the documented workaround is to allocate an empty static struct with a constructor. This patch only applies this workaround when compiling on Windows, and leaves other platforms unchanged. I am using Bernhard Urban's CR as I am currently not an author. Thank you, -- Ludovic From igor.ignatyev at oracle.com Thu Jul 9 20:34:00 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 9 Jul 2020 13:34:00 -0700 Subject: RFR [15] : 8249019 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_compiler tests Message-ID: <50F2024A-BF63-4298-AB44-137179383723@oracle.com> http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00 > 269 lines changed: 0 ins; 163 del; 106 mod Hi all, could you please review the patch which removes `FileInstaller . .` jtreg action from vmTestbase_vm_compiler tests? from the main issue(8204985): > all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests. 
some of vmTestbase_vm_compiler tests depend on FileInstaller, so they are left intact and will be updated separately. testing: :vmTestbase_vm_compiler on linux-x64 JBS: https://bugs.openjdk.java.net/browse/JDK-8249019 webrev: http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00 Thanks, -- Igor From ekaterina.pavlova at oracle.com Thu Jul 9 20:44:39 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 9 Jul 2020 13:44:39 -0700 Subject: RFR [15] : 8249019 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_compiler tests In-Reply-To: <50F2024A-BF63-4298-AB44-137179383723@oracle.com> References: <50F2024A-BF63-4298-AB44-137179383723@oracle.com> Message-ID: Looks good, -katya On 7/9/20 1:34 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00 >> 269 lines changed: 0 ins; 163 del; 106 mod > > Hi all, > > could you please review the patch which removes `FileInstaller . .` jtreg action from vmTestbase_vm_compiler tests? > from the main issue(8204985): >> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests. > > some of vmTestbase_vm_compiler tests depend on FileInstaller, so they are left intact and will be updated separately. 
> > testing: :vmTestbase_vm_compiler on linux-x64 > JBS: https://bugs.openjdk.java.net/browse/JDK-8249019 > webrev: http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00 > > Thanks, > -- Igor > From beurba at microsoft.com Thu Jul 9 21:08:48 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Thu, 9 Jul 2020 21:08:48 +0000 Subject: RFR(XS) 8248671: AArch64: Remove unused variables Message-ID: Hello everyone, please review this change: JBS: https://bugs.openjdk.java.net/browse/JDK-8248671 Webrev: http://cr.openjdk.java.net/~burban/8248671_unused-vars/ We found this issue while bringing up Windows+AArch64 support for HotSpot. The Microsoft toolchain (MSVC) seems to be slightly more pedantic than GCC. Thanks, -Bernhard From dean.long at oracle.com Fri Jul 10 01:48:46 2020 From: dean.long at oracle.com (Dean Long) Date: Thu, 9 Jul 2020 18:48:46 -0700 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <87pn94o9ba.fsf@redhat.com> References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> <87pn94o9ba.fsf@redhat.com> Message-ID: <3db3371d-ed71-2bad-6c67-9fb6906d719f@oracle.com> I confirmed that VolatileAccessReadEliminationTest fails without the patch and passed with it. dl On 7/9/20 11:03 AM, Roland Westrelin wrote: > Hi Igor, > > Thanks for helping but my question was not about jcstress but about the > graal regression test: > > http://cr.openjdk.java.net/~roland/8248598/webrev.00/src/jdk.internal.vm.compiler/share/classes/org.graalvm.compiler.core.test/src/org/graalvm/compiler/core/test/VolatileAccessReadEliminationTest.java.html > > I can run it fine in the graal repo with mx but I have no idea how to > run it once it's pulled into the jdk repo. > > Roland. 
>

From goetz.lindenmaier at sap.com Fri Jul 10 06:41:19 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 10 Jul 2020 06:41:19 +0000
Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout
Message-ID:

Hi Fairoz,

we also see this test timing out on mac, but only with jdk11u.

Do you mind sharing how you fixed this? Did you just increase the timeout, or did you figure out why this fails on mac in 11u?

Thanks,
Goetz

From christian.hagedorn at oracle.com Fri Jul 10 07:37:42 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Fri, 10 Jul 2020 09:37:42 +0200
Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
Message-ID: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com>

Hi

Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8248552
http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/

In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of the loop induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p and ends up after the AddNode which updates the induction variable p. In the last iteration, j equals 2 and is then updated to 0. The division/modulo node n is now executed before the loop limit check, which results in a SIGFPE.

The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed (i.e. control is NULL) and is split through a phi which has an input that could be zero. This should only happen for an induction variable phi of a trip-counted (integer) loop.
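
The ordering problem described above can be imitated at the source level. The following is a hypothetical sketch, not the test case from the webrev: the loop bounds, the constant 100, and the method names are assumptions chosen to match the description (j stays in [2, 50] inside the loop body and becomes 0 only after the final update). In plain Java the bad ordering surfaces as an `ArithmeticException`; in compiled code with the zero check removed, it would be the SIGFPE.

```java
public class SplitThruPhiSketch {
    // Source-level shape: the division only ever sees j in [2, 50],
    // so removing the zero check is justified here.
    static int sourceOrder() {
        int acc = 0;
        for (int j = 50; j > 0; j -= 2) {
            acc += 100 / j;
        }
        return acc;
    }

    // Hand-written imitation of the bad schedule: the division is moved
    // after the induction variable update and before the loop exit check.
    static int divisionAfterUpdate() {
        int acc = 0;
        int j = 50;
        int q = 100 / j;      // value used by the first iteration
        while (true) {
            acc += q;
            j -= 2;           // update of the induction variable
            q = 100 / j;      // runs on the updated j; j == 0 on the last trip
            if (j <= 0) {     // exit check comes too late
                break;
            }
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println("source order result: " + sourceOrder());
        try {
            divisionAfterUpdate();
            System.out.println("no exception");
        } catch (ArithmeticException e) {
            System.out.println("division by zero after update: " + e.getMessage());
        }
    }
}
```
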

Best regards,
Christian

From rwestrel at redhat.com Fri Jul 10 08:01:08 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Fri, 10 Jul 2020 10:01:08 +0200
Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state
In-Reply-To:
References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> <87pn94o9ba.fsf@redhat.com>
Message-ID: <87mu47ol2z.fsf@redhat.com>

> oh, I see. I guess the easiest way would be to use jtreg wrappers
> (test/hotspot/jtreg/compiler/graalunit), there is README.md which
> explains where you can get dependencies and where you need to put them
> to make it work, after you finish that, you can run the test by
> run-test framework as `make test
> TEST=test/hotspot/jtreg/compiler/graalunit/CoreTest.java`.

I gave it a try. I downloaded the dependencies with downloadLibs.sh, but running the test then fails. See the output below. Would that command line run all the core tests? Is there a way to run only one?

Roland.

[roland at ws jdk-jdk]$ make CONF=linux-x86_64-server-release run-test TEST="compiler/graalunit/CoreTest.java" TEST_VM_OPTS="-server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI" Building target 'run-test' in configuration 'linux-x86_64-server-release' *** failed to import extension defpath from ~/code-tools/defpath/defpath.py: [Errno 2] No such file or directory: '/home/roland/code-tools/defpath/defpath.py' *** failed to import extension jcheck from ~/code-tools/jcheck/jcheck.py: [Errno 2] No such file or directory: '/home/roland/code-tools/jcheck/jcheck.py' Running tests using TEST_OPTS control variable 'VM_OPTIONS=-server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' Test selection 'compiler/graalunit/CoreTest.java', will run: * jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java Running test 'jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java' -------------------------------------------------- TEST: compiler/graalunit/CoreTest.java TEST JDK: /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk ACTION: build -- Passed. Build successful REASON: User specified action: run build compiler.graalunit.common.GraalUnitTestLauncher TIME: 1.427 seconds messages: command: build compiler.graalunit.common.GraalUnitTestLauncher reason: User specified action: run build compiler.graalunit.common.GraalUnitTestLauncher Library /: compile: compiler.graalunit.common.GraalUnitTestLauncher elapsed time (seconds): 1.427 ACTION: compile -- Passed. 
Compilation successful REASON: .class file out of date or does not exist TIME: 1.423 seconds messages: command: compile /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/common/GraalUnitTestLauncher.java reason: .class file out of date or does not exist Additional options from @modules: --add-modules jdk.internal.vm.compiler Mode: agentvm Agent id: 1 elapsed time (seconds): 1.423 configuration: Boot Layer (javac runtime environment) class path: /home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar /home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar patch: java.base /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/patches/java.base javac compilation environment add modules: jdk.internal.vm.compiler source path: /home/roland/jdk-jdk/test/lib /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit /home/roland/jdk-jdk/test/hotspot/jtreg class path: /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 rerun: cd /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ HOME=/home/roland \ JDK8_HOME=/home/roland/jdk-14.0.1 \ LANG=en_US.UTF-8 \ LC_ALL=C \ PATH=/bin:/usr/bin:/usr/sbin \ TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ XMODIFIERS=@im=ibus \ /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/javac \ -J-XX:MaxRAMPercentage=3 \ 
-J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp \ -J-server \ -J-XX:+UnlockExperimentalVMOptions \ -J-XX:+EnableJVMCI \ -J-Djava.library.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ -J-Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ -J-Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ -J-Dtest.compiler.opts= \ -J-Dtest.java.opts= \ -J-Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -J-Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -J-Dtest.timeout.factor=4.0 \ -J-Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ -J-Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ -J-Dtest.name=compiler/graalunit/CoreTest.java \ -J-Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ -J-Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ -J-Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ -J-Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ 
-J-Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -J-Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -J-Dtest.modules=jdk.internal.vm.compiler \ --add-modules jdk.internal.vm.compiler \ -d /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -sourcepath /home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ -classpath /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 
/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/common/GraalUnitTestLauncher.java direct: Note: /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/common/GraalUnitTestLauncher.java uses unchecked or unsafe operations. Note: Recompile with -Xlint:unchecked for details. ACTION: build -- Passed. Build successful REASON: Named class compiled on demand TIME: 0.061 seconds messages: command: build jdk.test.lib.FileInstaller reason: Named class compiled on demand Library /test/lib: compile: jdk.test.lib.FileInstaller elapsed time (seconds): 0.061 ACTION: compile -- Passed. Compilation successful REASON: .class file out of date or does not exist TIME: 0.061 seconds messages: command: compile /home/roland/jdk-jdk/test/lib/jdk/test/lib/FileInstaller.java reason: .class file out of date or does not exist Additional options from @modules: --add-modules jdk.internal.vm.compiler Mode: agentvm Agent id: 1 elapsed time (seconds): 0.061 configuration: Boot Layer (javac runtime environment) class path: /home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar /home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar patch: java.base /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/patches/java.base javac compilation environment add modules: jdk.internal.vm.compiler source path: /home/roland/jdk-jdk/test/lib /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit /home/roland/jdk-jdk/test/hotspot/jtreg class path: /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 rerun: cd 
/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ HOME=/home/roland \ JDK8_HOME=/home/roland/jdk-14.0.1 \ LANG=en_US.UTF-8 \ LC_ALL=C \ PATH=/bin:/usr/bin:/usr/sbin \ TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ XMODIFIERS=@im=ibus \ /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/javac \ -J-XX:MaxRAMPercentage=3 \ -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp \ -J-server \ -J-XX:+UnlockExperimentalVMOptions \ -J-XX:+EnableJVMCI \ -J-Djava.library.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ -J-Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ -J-Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ -J-Dtest.compiler.opts= \ -J-Dtest.java.opts= \ -J-Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -J-Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -J-Dtest.timeout.factor=4.0 \ -J-Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ -J-Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ -J-Dtest.name=compiler/graalunit/CoreTest.java \ -J-Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ 
-J-Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ -J-Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ -J-Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ -J-Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -J-Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -J-Dtest.modules=jdk.internal.vm.compiler \ --add-modules jdk.internal.vm.compiler \ -d /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib \ -sourcepath /home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ 
-classpath /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 /home/roland/jdk-jdk/test/lib/jdk/test/lib/FileInstaller.java ACTION: driver -- Passed. Execution successful REASON: User specified action: run driver jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt TIME: 0.258 seconds messages: command: driver jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt reason: User specified action: run driver jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt Mode: agentvm Agent id: 2 elapsed time (seconds): 0.258 configuration: Boot Layer class path: /home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar /home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar /home/roland/tools/jtreg/build/images/jtreg/lib/junit.jar /home/roland/tools/jtreg/build/images/jtreg/lib/testng.jar /home/roland/tools/jtreg/build/images/jtreg/lib/jcommander.jar patch: java.base /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/patches/java.base Test Layer class path: /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib /home/roland/jdk-jdk/test/lib /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit 
/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 /home/roland/jdk-jdk/test/hotspot/jtreg rerun: cd /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ HOME=/home/roland \ JDK8_HOME=/home/roland/jdk-14.0.1 \ LANG=en_US.UTF-8 \ LC_ALL=C \ PATH=/bin:/usr/bin:/usr/sbin \ TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ XMODIFIERS=@im=ibus \ /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java \ -Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ -Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ -Dtest.compiler.opts= \ -Dtest.java.opts= \ -Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -Dtest.timeout.factor=4.0 \ -Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ -Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ -Dtest.name=compiler/graalunit/CoreTest.java \ -Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ -Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ 
-Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ -Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ -Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -Dtest.modules=jdk.internal.vm.compiler \ -classpath 
/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar \ jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt STDOUT: copying /home/roland/jdk-jdk/test/hotspot/jtreg/ProblemList-graal.txt to /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0/ExcludeList.txt STDERR: JavaTest Message: Test complete. ACTION: build -- Passed. All files up to date REASON: Named class compiled on demand TIME: 0.0 seconds messages: command: build compiler.graalunit.common.GraalUnitTestLauncher reason: Named class compiled on demand elapsed time (seconds): 0.0 ACTION: main -- Failed. 
Execution failed: `main' threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 REASON: User specified action: run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI compiler.graalunit.common.GraalUnitTestLauncher -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED TIME: 0.166 seconds messages: command: main -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI compiler.graalunit.common.GraalUnitTestLauncher -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED reason: User specified action: run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI compiler.graalunit.common.GraalUnitTestLauncher -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED Mode: othervm [/othervm specified] Additional options from @modules: --add-modules jdk.internal.vm.compiler elapsed time (seconds): 0.166 configuration: Boot Layer add modules: jdk.internal.vm.compiler STDOUT: INFO: graal libs dir is '/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal' INFO: use following pattern to find tests: org\.graalvm\.compiler\.core\.test.* Command line: [/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -cp 
/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar -cp /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar:/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/com.oracle.mxtool.junit.jar com.oracle.mxtool.junit.FindClassesByAnnotatedMethods /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/jdk.vm.compiler.tests.jar @Test ] INFO: run command /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -cp 
/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar -cp /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar:/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/com.oracle.mxtool.junit.jar com.oracle.mxtool.junit.FindClassesByAnnotatedMethods /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/jdk.vm.compiler.tests.jar @Test [2020-07-10T07:58:37.884779794Z] Gathering output for process 2096875 [2020-07-10T07:58:37.901243107Z] Waiting for completion for process 2096875 [2020-07-10T07:58:37.931042481Z] Waiting for completion finished for process 2096875 STDERR: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 at 
compiler.graalunit.common.GraalUnitTestLauncher.getListOfTestsByPrefix(GraalUnitTestLauncher.java:125) at compiler.graalunit.common.GraalUnitTestLauncher.main(GraalUnitTestLauncher.java:223) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.base/java.lang.reflect.Method.invoke(Method.java:564) at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) at java.base/java.lang.Thread.run(Thread.java:832) JavaTest Message: Test threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 JavaTest Message: shutting down test STATUS:Failed.`main' threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 rerun: cd /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ HOME=/home/roland \ JDK8_HOME=/home/roland/jdk-14.0.1 \ LANG=en_US.UTF-8 \ LC_ALL=C \ PATH=/bin:/usr/bin:/usr/sbin \ TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ XMODIFIERS=@im=ibus \ 
CLASSPATH=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar \ /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java \ -Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ -Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ -Dtest.compiler.opts= \ -Dtest.java.opts= \ -Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ -Dtest.timeout.factor=4.0 \ -Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ -Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ -Dtest.name=compiler/graalunit/CoreTest.java \ -Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ -Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ 
-Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ -Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ -Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ -Dtest.modules=jdk.internal.vm.compiler \ --add-modules jdk.internal.vm.compiler \ -XX:MaxRAMPercentage=3 \ -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp \ -server \ -XX:+UnlockExperimentalVMOptions \ -XX:+EnableJVMCI \ -Djava.library.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ -XX:+UnlockExperimentalVMOptions \ 
-XX:+EnableJVMCI \ com.sun.javatest.regtest.agent.MainWrapper /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/compiler/graalunit/CoreTest.d/main.0.jta -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 -------------------------------------------------- Test results: failed: 1 Report written to /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-results/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/html/report.html Results written to /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java Error: Some tests failed or other problems occurred. Finished running test 'jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java' Test report is stored in build/linux-x86_64-server-release/test-results/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java ============================== Test summary ============================== TEST TOTAL PASS FAIL ERROR jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java >> 1 0 1 0 << ============================== TEST FAILURE make[1]: *** [/home/roland/jdk-jdk/make/Init.gmk:319: main] Error 1 make: *** [/home/roland/jdk-jdk/make/Init.gmk:186: run-test] Error 2 From fairoz.matte at oracle.com Fri Jul 10 08:01:25 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Fri, 10 Jul 2020 01:01:25 -0700 (PDT) Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout In-Reply-To: References: Message-ID: Hi Goetz, This issue is only applicable to 11u. After the fix of JDK-8246203, which changed the algorithm for the verification used with VerifyIterativeGVN (takes more time) We have adjusted timeout to 1200 from 600. 
Thanks, Fairoz From: Lindenmaier, Goetz Sent: Friday, July 10, 2020 12:11 PM To: Fairoz Matte Cc: hotspot-compiler-dev at openjdk.java.net Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout Hi Fairoz, we also see this test timing out on mac. But only so with jdk11u. Do you mind sharing how you fixed this? Did you just increase the timeout, or did you figure out why this fails on mac in 11u? Thanks, Goetz From goetz.lindenmaier at sap.com Fri Jul 10 08:06:04 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 10 Jul 2020 08:06:04 +0000 Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout In-Reply-To: References: Message-ID: Hi Fairoz, Thanks for the info. It's still unclear to me why the algorithm takes longer in 11 than in 15 ... but no matter. Best regards, Goetz. From: Fairoz Matte Sent: Friday, July 10, 2020 10:01 AM To: Lindenmaier, Goetz Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout Hi Goetz, This issue is only applicable to 11u. After the fix of JDK-8246203, which changed the algorithm for the verification used with VerifyIterativeGVN (takes more time) We have adjusted timeout to 1200 from 600. Thanks, Fairoz From: Lindenmaier, Goetz > Sent: Friday, July 10, 2020 12:11 PM To: Fairoz Matte > Cc: hotspot-compiler-dev at openjdk.java.net Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout Hi Fairoz, we also see this test timing out on mac. But only so with jdk11u. Do you mind sharing how you fixed this? Did you just increase the timeout, or did you figure out why this fails on mac in 11u? 
Thanks, Goetz From aph at redhat.com Fri Jul 10 08:10:26 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 10 Jul 2020 09:10:26 +0100 Subject: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor In-Reply-To: References: Message-ID: On 09/07/2020 21:31, Ludovic Henry wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8248676 > Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.00/ > Testing: jtreg:test/hotspot/jtreg:tier1, jtreg:test/jdk:tier1, jtreg:test/jdk:tier2, jtreg:test/langtools on Linux-AArch64, no regressions. > > This small fix is in the context of the larger support for Windows-AArch64. The attribute `__attribute__ ((constructor))` is not supported by MSVC, and the documented workaround is to allocate an empty static struct with a constructor. This patch only applies this workaround when compiling on Windows, and leaves other platforms unchanged. Please take out the #ifdef WINDOWS: we can use portable C++ here on all platforms. Thanks, -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Jul 10 08:11:13 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 10 Jul 2020 09:11:13 +0100 Subject: [aarch64-port-dev ] RFR(XS) 8248671: AArch64: Remove unused variables In-Reply-To: References: Message-ID: <108fd979-c60c-11d1-f125-e8e67160d099@redhat.com> On 09/07/2020 22:08, Bernhard Urban-Forster wrote: > JBS: https://bugs.openjdk.java.net/browse/JDK-8248671 > Webrev: http://cr.openjdk.java.net/~burban/8248671_unused-vars/ > > We found this issue while bringing up Windows+AArch64 support for HotSpot. The Microsoft toolchain (MSVC) seems to be slightly more pedantic than GCC. OK, thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From rwestrel at redhat.com Fri Jul 10 08:27:34 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 10 Jul 2020 10:27:34 +0200 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <3db3371d-ed71-2bad-6c67-9fb6906d719f@oracle.com> References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> <87pn94o9ba.fsf@redhat.com> <3db3371d-ed71-2bad-6c67-9fb6906d719f@oracle.com> Message-ID: <87k0zbojux.fsf@redhat.com> > I confirmed that VolatileAccessReadEliminationTest fails without the > patch and passed with it. Thanks for checking. Can I push the change? Do I need to have it go through the submit repo? Roland. From christian.hagedorn at oracle.com Fri Jul 10 08:28:04 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 10 Jul 2020 10:28:04 +0200 Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout In-Reply-To: References: Message-ID: <5eb6bc2c-7690-9d69-a82d-4ceac3399b3f@oracle.com> Hi Goetz As Fairoz has mentioned, JDK-8246203 changed the algorithm slightly such that it now needs more time for the verification (we check more nodes than before). JDK-8246203 originally only happened in JDK-11 with -XX:+VerifyIterativeGVN where a stack overflow crash occurred with a more or less HelloWorld test (the old algorithm was a recursive one). It turned out that with JDK-11 it compiled a specific big method which generated quite a lot of nodes in a chain, which made it crash. However, with JDK-15 (and 16), this method was not compiled anymore as part of a HelloWorld test. It probably got changed since JDK-11 or is not called anymore when starting up. Therefore, we concluded that it must be an 11-only issue and just increased the timeout for the test, as we have not seen it time out in JDK-15 or 16.
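To illustrate the recursive-versus-iterative point above: a recursive walk over a long chain of nodes uses one native stack frame per node, which is the general shape of the stack overflow described here. The following is only a minimal sketch under stated assumptions. `Node` and `count_reachable` are hypothetical names invented for this example; this is not HotSpot's `Node` class and not the actual JDK-8246203 change. It shows the same kind of reachability walk done with an explicit worklist, so the depth of the native stack no longer depends on the length of the chain.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a compiler IR node; not HotSpot's Node class.
struct Node {
  std::vector<Node*> inputs;
};

// Counts the nodes reachable from root through input edges.
// An explicit worklist replaces recursion, so a very long chain of
// nodes consumes heap memory instead of native stack frames.
std::size_t count_reachable(Node* root) {
  std::unordered_set<Node*> visited;
  std::vector<Node*> worklist;
  if (root != nullptr) {
    worklist.push_back(root);
    visited.insert(root);
  }
  while (!worklist.empty()) {
    Node* n = worklist.back();
    worklist.pop_back();
    for (Node* in : n->inputs) {
      // insert() reports whether the node was newly added, which
      // both marks it as visited and guards against cycles.
      if (in != nullptr && visited.insert(in).second) {
        worklist.push_back(in);
      }
    }
  }
  return visited.size();
}
```

The trade-off is the usual one: the worklist version visits every reachable node exactly once and tolerates cycles, at the cost of keeping a visited set, which is consistent with the remark that the new verification checks more nodes and takes more time.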
Best regards, Christian On 10.07.20 10:06, Lindenmaier, Goetz wrote: > Hi Fairoz, > > Thanks for the info. > > It's still unclear to me why the algorithm takes longer in 11 > than in 15 ... but no matter. > > Best regards, > Goetz. > > From: Fairoz Matte > Sent: Friday, July 10, 2020 10:01 AM > To: Lindenmaier, Goetz > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout > > Hi Goetz, > > This issue is only applicable to 11u. > After the fix of JDK-8246203, which changed the algorithm for the verification used with VerifyIterativeGVN (takes more time) > We have adjusted timeout to 1200 from 600. > > Thanks, > Fairoz > > From: Lindenmaier, Goetz > > Sent: Friday, July 10, 2020 12:11 PM > To: Fairoz Matte > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout > > Hi Fairoz, > > we also see this test timing out on mac. But only so with > jdk11u. > Do you mind sharing how you fixed this? Did you just increase > the timeout, or did you figure out why this fails on mac in 11u? > > Thanks, > Goetz > From goetz.lindenmaier at sap.com Fri Jul 10 08:46:36 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 10 Jul 2020 08:46:36 +0000 Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout In-Reply-To: <5eb6bc2c-7690-9d69-a82d-4ceac3399b3f@oracle.com> References: <5eb6bc2c-7690-9d69-a82d-4ceac3399b3f@oracle.com> Message-ID: Hi Christian, Thanks for your explanation. I can confirm that it never timed out in 15 in our test infra. It might start again in case code changes again, but that is the risk with any test. (We never saw the stack overflow, though.) Best regards, Goetz. 
> -----Original Message----- > From: Christian Hagedorn > Sent: Friday, July 10, 2020 10:28 AM > To: Lindenmaier, Goetz ; 'Fairoz Matte' > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: Question regarding 8248521: TestVerifyIterativeGVN.java is > failing with timeout > > Hi Goetz > > As Fairoz has mentioned, JDK-8246203 changed the algorithm slightly such > that it needs now more time for the verification (we check more nodes > than before). > > JDK-8246203 originally only happened in JDK-11 with > -XX:+VerifyIterativeGVN where a stack overflow crash occurred with a > more or less HelloWorld test (the old algorithm as a recursive one). It > turned out that with JDK-11 it compiled a specific big method which > generated quite a lot of nodes in a chain which let it crash. However, > with JDK-15 (and 16), this method was not compiled anymore as part of a > HelloWorld test. It probably got changed since JDK-11 or is not called > anymore when starting up. Therefore, we concluded that it must be an 11 > only issue and just increased the timeout for the test as we have not > seen timing it out in JDK-15 or 16. > > Best regards, > Christian > > On 10.07.20 10:06, Lindenmaier, Goetz wrote: > > Hi Fairoz, > > > > Thanks for the info. > > > > It's still unclear to me why the algorithm takes longer in 11 > > than in 15 ... but no matter. > > > > Best regards, > > Goetz. > > > > From: Fairoz Matte > > Sent: Friday, July 10, 2020 10:01 AM > > To: Lindenmaier, Goetz > > Cc: hotspot-compiler-dev at openjdk.java.net > > Subject: RE: Question regarding 8248521: TestVerifyIterativeGVN.java is > failing with timeout > > > > Hi Goetz, > > > > This issue is only applicable to 11u. > > After the fix of JDK-8246203, which changed the algorithm for the > verification used with VerifyIterativeGVN (takes more time) > > We have adjusted timeout to 1200 from 600. 
> > > > Thanks, > > Fairoz > > > > From: Lindenmaier, Goetz > > > > Sent: Friday, July 10, 2020 12:11 PM > > To: Fairoz Matte > > > > Cc: hotspot-compiler-dev at openjdk.java.net dev at openjdk.java.net> > > Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing > with timeout > > > > Hi Fairoz, > > > > we also see this test timing out on mac. But only so with > > jdk11u. > > Do you mind sharing how you fixed this? Did you just increase > > the timeout, or did you figure out why this fails on mac in 11u? > > > > Thanks, > > Goetz > > From christian.hagedorn at oracle.com Fri Jul 10 09:00:46 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 10 Jul 2020 11:00:46 +0200 Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing with timeout In-Reply-To: References: <5eb6bc2c-7690-9d69-a82d-4ceac3399b3f@oracle.com> Message-ID: <09cd9ab9-3dd4-2613-d7b1-86d4474081b0@oracle.com> Hi Goetz You're welcome and thanks for confirming that you have not seen a timeout in 15 either in your testing. We only saw the stack overflow a few times on SPARC in 11. Best regards, Christian On 10.07.20 10:46, Lindenmaier, Goetz wrote: > Hi Christian, > > Thanks for your explanation. > I can confirm that it never timed out in 15 in our test infra. > It might start again in case code changes again, but > that is the risk with any test. > (We never saw the stack overflow, though.) > > Best regards, > Goetz. > > >> -----Original Message----- >> From: Christian Hagedorn >> Sent: Friday, July 10, 2020 10:28 AM >> To: Lindenmaier, Goetz ; 'Fairoz Matte' >> >> Cc: hotspot-compiler-dev at openjdk.java.net >> Subject: Re: Question regarding 8248521: TestVerifyIterativeGVN.java is >> failing with timeout >> >> Hi Goetz >> >> As Fairoz has mentioned, JDK-8246203 changed the algorithm slightly such >> that it needs now more time for the verification (we check more nodes >> than before). 
>> >> JDK-8246203 originally only happened in JDK-11 with >> -XX:+VerifyIterativeGVN where a stack overflow crash occurred with a >> more or less HelloWorld test (the old algorithm as a recursive one). It >> turned out that with JDK-11 it compiled a specific big method which >> generated quite a lot of nodes in a chain which let it crash. However, >> with JDK-15 (and 16), this method was not compiled anymore as part of a >> HelloWorld test. It probably got changed since JDK-11 or is not called >> anymore when starting up. Therefore, we concluded that it must be an 11 >> only issue and just increased the timeout for the test as we have not >> seen timing it out in JDK-15 or 16. >> >> Best regards, >> Christian >> >> On 10.07.20 10:06, Lindenmaier, Goetz wrote: >>> Hi Fairoz, >>> >>> Thanks for the info. >>> >>> It's still unclear to me why the algorithm takes longer in 11 >>> than in 15 ... but no matter. >>> >>> Best regards, >>> Goetz. >>> >>> From: Fairoz Matte >>> Sent: Friday, July 10, 2020 10:01 AM >>> To: Lindenmaier, Goetz >>> Cc: hotspot-compiler-dev at openjdk.java.net >>> Subject: RE: Question regarding 8248521: TestVerifyIterativeGVN.java is >> failing with timeout >>> >>> Hi Goetz, >>> >>> This issue is only applicable to 11u. >>> After the fix of JDK-8246203, which changed the algorithm for the >> verification used with VerifyIterativeGVN (takes more time) >>> We have adjusted timeout to 1200 from 600. >>> >>> Thanks, >>> Fairoz >>> >>> From: Lindenmaier, Goetz >> > >>> Sent: Friday, July 10, 2020 12:11 PM >>> To: Fairoz Matte >> > >>> Cc: hotspot-compiler-dev at openjdk.java.net> dev at openjdk.java.net> >>> Subject: Question regarding 8248521: TestVerifyIterativeGVN.java is failing >> with timeout >>> >>> Hi Fairoz, >>> >>> we also see this test timing out on mac. But only so with >>> jdk11u. >>> Do you mind sharing how you fixed this? Did you just increase >>> the timeout, or did you figure out why this fails on mac in 11u? 
>>> >>> Thanks, >>> Goetz >>> From dean.long at oracle.com Fri Jul 10 09:49:37 2020 From: dean.long at oracle.com (Dean Long) Date: Fri, 10 Jul 2020 02:49:37 -0700 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <87k0zbojux.fsf@redhat.com> References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> <87pn94o9ba.fsf@redhat.com> <3db3371d-ed71-2bad-6c67-9fb6906d719f@oracle.com> <87k0zbojux.fsf@redhat.com> Message-ID: <1e4e7462-5b59-b627-972b-03262c32bc64@oracle.com> On 7/10/20 1:27 AM, Roland Westrelin wrote: >> I confirmed that VolatileAccessReadEliminationTest fails without the >> patch and passed with it. > Thanks for checking. Can I push the change? Yes. You can list me as a reviewer. > Do I need to have it go > through the submit repo? Yes, I believe so. dl > Roland. > From lutz.schmidt at sap.com Fri Jul 10 10:13:28 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 10 Jul 2020 10:13:28 +0000 Subject: [CAUTION] RE: [CAUTION] RFR(XS) 8247695: [PPC, S390]: compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails Message-ID: <5FAB4241-ADCE-4E5B-80E0-04893D8AC2C5@sap.com> Hi Richard, your change looks good to me. Reviewed. We had the change active in our test landscape for quite a few days now. It solves the issue and shows no negative side effects. Thanks for fixing. Lutz On 02.07.20, 18:45, "hotspot-compiler-dev on behalf of Lindenmaier, Goetz" wrote: Hi Richard, I had a look at your change, looks good. Reviewed. Thanks for fixing this. Best regards, Goetz.
> -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Reingruber, Richard > Sent: Thursday, July 2, 2020 4:05 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(XS) 8247695: [PPC, S390]: > compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails > > Hi, > > could I please get reviews for this small bugfix which adds support for AbsL > nodes to the C2 > backends on PPC and S390? > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8247695/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8247695 > > The patch successfully passes regression testing @SAP which includes JCK > and JTREG tests, also in > Xcomp mode, SPECjvm2008, SPECjbb2015, Renaissance Suite, SAP specific > tests with fastdebug and > release builds. > > Thanks, Richard. From aph at redhat.com Fri Jul 10 11:21:21 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 10 Jul 2020 12:21:21 +0100 Subject: Stack allocation prototype for C2 In-Reply-To: <4C6D4959-00E1-4300-BE30-BB6FC60A491F@microsoft.com> References: <4C6D4959-00E1-4300-BE30-BB6FC60A491F@microsoft.com> Message-ID: <85cde128-9b75-c20b-6d17-3724c744392b@redhat.com> On 09/07/2020 20:28, Charlie Gracie wrote: > A final consideration is the footprint cost for project Loom. In the zone-based approach > would each virtual thread (fibre) have its own zone TLAB (or stack of TLABs)? I wouldn't have thought so. From the VM's point of view, it makes more sense for the zone TLAB to be owned by the carrier thread. There is the problem of what happens when we unmount a virtual thread, but that's quite solvable, I would have thought. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From richard.reingruber at sap.com Fri Jul 10 12:16:45 2020 From: richard.reingruber at sap.com (Reingruber, Richard) Date: Fri, 10 Jul 2020 12:16:45 +0000 Subject: RFR(XS) 8247695: [PPC, S390]: compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails Message-ID: Hi Lutz, thanks for your review. I'll push after the weekend. Cheers, Richard. -----Original Message----- From: Schmidt, Lutz Sent: Friday, July 10, 2020 12:13 To: Lindenmaier, Goetz ; Reingruber, Richard ; hotspot-compiler-dev at openjdk.java.net Subject: Re: [CAUTION] RE: [CAUTION] RFR(XS) 8247695: [PPC, S390]: compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails Hi Richard, your change looks good to me. Reviewed. We had the change active in our test landscape for quite a few days now. It solves the issue and shows no negative side effects. Thanks for fixing. Lutz On 02.07.20, 18:45, "hotspot-compiler-dev on behalf of Lindenmaier, Goetz" wrote: Hi Richard, I had a look at your change, looks good. Reviewed. Thanks for fixing this. Best regards, Goetz. > -----Original Message----- > From: hotspot-compiler-dev > On Behalf Of Reingruber, Richard > Sent: Thursday, July 2, 2020 4:05 PM > To: hotspot-compiler-dev at openjdk.java.net > Subject: [CAUTION] RFR(XS) 8247695: [PPC, S390]: > compiler/intrinsics/math/TestFpMinMaxIntrinsics.java fails > > Hi, > > could I please get reviews for this small bugfix which adds support for AbsL > nodes to the C2 > backends on PPC and S390? > > Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8247695/webrev.0/ > Bug: https://bugs.openjdk.java.net/browse/JDK-8247695 > > The patch successfully passes regression testing @SAP which includes JCK > and JTREG tests, also in > Xcomp mode, SPECjvm2008, SPECjbb2015, Renaissance Suite, SAP specific > tests with fastdebug and > release builds. > > Thanks, Richard.
From rwestrel at redhat.com Fri Jul 10 13:16:56 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 10 Jul 2020 15:16:56 +0200 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <1e4e7462-5b59-b627-972b-03262c32bc64@oracle.com> References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> <87pn94o9ba.fsf@redhat.com> <3db3371d-ed71-2bad-6c67-9fb6906d719f@oracle.com> <87k0zbojux.fsf@redhat.com> <1e4e7462-5b59-b627-972b-03262c32bc64@oracle.com> Message-ID: <87h7ufo6gn.fsf@redhat.com> > Yes. You can list me as a reviewer. Ok. Thanks. I had it go through the submit repo and pushed it. Roland. From igor.ignatyev at oracle.com Fri Jul 10 14:03:29 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 10 Jul 2020 07:03:29 -0700 Subject: RFR(S): 8248598: [Graal] Several testcases from applications/jcstress/acqrel.java fails with forbidden state In-Reply-To: <87mu47ol2z.fsf@redhat.com> References: <87v9ixnl6n.fsf@redhat.com> <87BD32EE-9FAE-4AA7-9861-583B499E39BF@oracle.com> <87pn94o9ba.fsf@redhat.com> <87mu47ol2z.fsf@redhat.com> Message-ID: <10867EC3-A199-490D-A70D-43FC95CA69DA@oracle.com> > On Jul 10, 2020, at 1:01 AM, Roland Westrelin wrote: > > >> oh, I see. I guess the easiest way would be to use jtreg wrappers >> (test/hotspot/jtreg/compiler/graalunit), there is README.md which >> explains where you can get dependencies and where you need to put them >> to make it work, after you finish that, you can run the test by >> run-test framework as `make test >> TEST=test/hotspot/jtreg/compiler/graalunit/CoreTest.java`. > > I gave it a try. I downloaded the dependencies with downloadLibs.sh. But > then running the test fail. See output below. that's weird... Katya, could you please take a look? > > The comment line would run all the core tests? Is there a way to run > only one?
AFAIK, the only way to do that is to temporarily modify an existing jtreg test (or create a new one) so that the concrete test name is passed as the `-prefix` option. -- Igor > > Roland. > > [roland at ws jdk-jdk]$ make CONF=linux-x86_64-server-release run-test TEST="compiler/graalunit/CoreTest.java" TEST_VM_OPTS="-server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI" > Building target 'run-test' in configuration 'linux-x86_64-server-release' > *** failed to import extension defpath from ~/code-tools/defpath/defpath.py: [Errno 2] No such file or directory: '/home/roland/code-tools/defpath/defpath.py' > *** failed to import extension jcheck from ~/code-tools/jcheck/jcheck.py: [Errno 2] No such file or directory: '/home/roland/code-tools/jcheck/jcheck.py' > Running tests using TEST_OPTS control variable 'VM_OPTIONS=-server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' > Test selection 'compiler/graalunit/CoreTest.java', will run: > * jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java > > Running test 'jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java' > -------------------------------------------------- > TEST: compiler/graalunit/CoreTest.java > TEST JDK: /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk > > ACTION: build -- Passed. Build successful > REASON: User specified action: run build compiler.graalunit.common.GraalUnitTestLauncher > TIME: 1.427 seconds > messages: > command: build compiler.graalunit.common.GraalUnitTestLauncher > reason: User specified action: run build compiler.graalunit.common.GraalUnitTestLauncher > Library /: > compile: compiler.graalunit.common.GraalUnitTestLauncher > elapsed time (seconds): 1.427 > > ACTION: compile -- Passed.
Compilation successful > REASON: .class file out of date or does not exist > TIME: 1.423 seconds > messages: > command: compile /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/common/GraalUnitTestLauncher.java > reason: .class file out of date or does not exist > Additional options from @modules: --add-modules jdk.internal.vm.compiler > Mode: agentvm > Agent id: 1 > elapsed time (seconds): 1.423 > configuration: > Boot Layer (javac runtime environment) > class path: /home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar > /home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar > patch: java.base /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/patches/java.base > > javac compilation environment > add modules: jdk.internal.vm.compiler > source path: /home/roland/jdk-jdk/test/lib > /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit > /home/roland/jdk-jdk/test/hotspot/jtreg > class path: /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib > /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit > /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 > > rerun: > cd /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ > HOME=/home/roland \ > JDK8_HOME=/home/roland/jdk-14.0.1 \ > LANG=en_US.UTF-8 \ > LC_ALL=C \ > PATH=/bin:/usr/bin:/usr/sbin \ > TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ > TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ > XMODIFIERS=@im=ibus \ > 
/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/javac \ > -J-XX:MaxRAMPercentage=3 \ > -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp \ > -J-server \ > -J-XX:+UnlockExperimentalVMOptions \ > -J-XX:+EnableJVMCI \ > -J-Djava.library.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ > -J-Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ > -J-Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ > -J-Dtest.compiler.opts= \ > -J-Dtest.java.opts= \ > -J-Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -J-Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -J-Dtest.timeout.factor=4.0 \ > -J-Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ > -J-Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ > -J-Dtest.name=compiler/graalunit/CoreTest.java \ > -J-Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ > -J-Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ > -J-Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ > 
-J-Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ > -J-Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -J-Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -J-Dtest.modules=jdk.internal.vm.compiler \ > --add-modules jdk.internal.vm.compiler \ > -d /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -sourcepath /home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ > -classpath 
/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/common/GraalUnitTestLauncher.java > direct: > Note: /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/common/GraalUnitTestLauncher.java uses unchecked or unsafe operations. > Note: Recompile with -Xlint:unchecked for details. > > ACTION: build -- Passed. Build successful > REASON: Named class compiled on demand > TIME: 0.061 seconds > messages: > command: build jdk.test.lib.FileInstaller > reason: Named class compiled on demand > Library /test/lib: > compile: jdk.test.lib.FileInstaller > elapsed time (seconds): 0.061 > > ACTION: compile -- Passed. 
Compilation successful > REASON: .class file out of date or does not exist > TIME: 0.061 seconds > messages: > command: compile /home/roland/jdk-jdk/test/lib/jdk/test/lib/FileInstaller.java > reason: .class file out of date or does not exist > Additional options from @modules: --add-modules jdk.internal.vm.compiler > Mode: agentvm > Agent id: 1 > elapsed time (seconds): 0.061 > configuration: > Boot Layer (javac runtime environment) > class path: /home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar > /home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar > patch: java.base /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/patches/java.base > > javac compilation environment > add modules: jdk.internal.vm.compiler > source path: /home/roland/jdk-jdk/test/lib > /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit > /home/roland/jdk-jdk/test/hotspot/jtreg > class path: /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib > /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit > /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 > > rerun: > cd /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ > HOME=/home/roland \ > JDK8_HOME=/home/roland/jdk-14.0.1 \ > LANG=en_US.UTF-8 \ > LC_ALL=C \ > PATH=/bin:/usr/bin:/usr/sbin \ > TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ > TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ > XMODIFIERS=@im=ibus \ > /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/javac \ > 
-J-XX:MaxRAMPercentage=3 \ > -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp \ > -J-server \ > -J-XX:+UnlockExperimentalVMOptions \ > -J-XX:+EnableJVMCI \ > -J-Djava.library.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ > -J-Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ > -J-Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ > -J-Dtest.compiler.opts= \ > -J-Dtest.java.opts= \ > -J-Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -J-Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -J-Dtest.timeout.factor=4.0 \ > -J-Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ > -J-Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ > -J-Dtest.name=compiler/graalunit/CoreTest.java \ > -J-Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ > -J-Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ > -J-Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ > -J-Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ > 
-J-Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -J-Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -J-Dtest.modules=jdk.internal.vm.compiler \ > --add-modules jdk.internal.vm.compiler \ > -d /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib \ > -sourcepath /home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ > -classpath /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 
/home/roland/jdk-jdk/test/lib/jdk/test/lib/FileInstaller.java > > ACTION: driver -- Passed. Execution successful > REASON: User specified action: run driver jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt > TIME: 0.258 seconds > messages: > command: driver jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt > reason: User specified action: run driver jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt > Mode: agentvm > Agent id: 2 > elapsed time (seconds): 0.258 > configuration: > Boot Layer > class path: /home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar > /home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar > /home/roland/tools/jtreg/build/images/jtreg/lib/junit.jar > /home/roland/tools/jtreg/build/images/jtreg/lib/testng.jar > /home/roland/tools/jtreg/build/images/jtreg/lib/jcommander.jar > patch: java.base /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/patches/java.base > > Test Layer > class path: /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit > /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib > /home/roland/jdk-jdk/test/lib > /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit > /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 > /home/roland/jdk-jdk/test/hotspot/jtreg > > rerun: > cd /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ > HOME=/home/roland \ > JDK8_HOME=/home/roland/jdk-14.0.1 \ > LANG=en_US.UTF-8 \ > LC_ALL=C \ > PATH=/bin:/usr/bin:/usr/sbin \ > 
TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ > TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ > XMODIFIERS=@im=ibus \ > /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java \ > -Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ > -Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ > -Dtest.compiler.opts= \ > -Dtest.java.opts= \ > -Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -Dtest.timeout.factor=4.0 \ > -Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ > -Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ > -Dtest.name=compiler/graalunit/CoreTest.java \ > -Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ > -Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ > -Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ > -Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ > 
-Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -Dtest.modules=jdk.internal.vm.compiler \ > -classpath /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar \ > jdk.test.lib.FileInstaller ../../ProblemList-graal.txt ExcludeList.txt > STDOUT: > copying 
/home/roland/jdk-jdk/test/hotspot/jtreg/ProblemList-graal.txt to /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0/ExcludeList.txt > STDERR: > > JavaTest Message: Test complete. > > > ACTION: build -- Passed. All files up to date > REASON: Named class compiled on demand > TIME: 0.0 seconds > messages: > command: build compiler.graalunit.common.GraalUnitTestLauncher > reason: Named class compiled on demand > elapsed time (seconds): 0.0 > > ACTION: main -- Failed. Execution failed: `main' threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 > REASON: User specified action: run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI compiler.graalunit.common.GraalUnitTestLauncher -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED > TIME: 0.166 seconds > messages: > command: main -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI compiler.graalunit.common.GraalUnitTestLauncher -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED > reason: User specified action: run main/othervm -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI compiler.graalunit.common.GraalUnitTestLauncher -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED > Mode: othervm [/othervm specified] > Additional options from @modules: --add-modules jdk.internal.vm.compiler > elapsed time (seconds): 0.166 > configuration: > Boot Layer > add modules: jdk.internal.vm.compiler > > STDOUT: > INFO: graal libs dir is '/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal' > INFO: use following pattern to find tests: org\.graalvm\.compiler\.core\.test.* > Command line: [/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -cp 
/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar -cp /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar:/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/com.oracle.mxtool.junit.jar com.oracle.mxtool.junit.FindClassesByAnnotatedMethods /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/jdk.vm.compiler.tests.jar @Test ] > INFO: run command /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java -cp 
/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar -cp /home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar:/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/com.oracle.mxtool.junit.jar com.oracle.mxtool.junit.FindClassesByAnnotatedMethods /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal/jdk.vm.compiler.tests.jar @Test > [2020-07-10T07:58:37.884779794Z] Gathering output for process 2096875 > [2020-07-10T07:58:37.901243107Z] Waiting for completion for process 2096875 > [2020-07-10T07:58:37.931042481Z] Waiting for completion finished for process 2096875 > STDERR: > java.lang.Exception: Failed to find tests, VM crashed with exit code 1 > at 
compiler.graalunit.common.GraalUnitTestLauncher.getListOfTestsByPrefix(GraalUnitTestLauncher.java:125) > at compiler.graalunit.common.GraalUnitTestLauncher.main(GraalUnitTestLauncher.java:223) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:64) > at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:564) > at com.sun.javatest.regtest.agent.MainWrapper$MainThread.run(MainWrapper.java:127) > at java.base/java.lang.Thread.run(Thread.java:832) > > JavaTest Message: Test threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 > JavaTest Message: shutting down test > > STATUS:Failed.`main' threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 > rerun: > cd /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/scratch/0 && \ > HOME=/home/roland \ > JDK8_HOME=/home/roland/jdk-14.0.1 \ > LANG=en_US.UTF-8 \ > LC_ALL=C \ > PATH=/bin:/usr/bin:/usr/sbin \ > TEST_IMAGE_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test \ > TEST_IMAGE_GRAAL_DIR=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/graal \ > XMODIFIERS=@im=ibus \ > 
CLASSPATH=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0:/home/roland/jdk-jdk/test/hotspot/jtreg:/home/roland/tools/jtreg/build/images/jtreg/lib/javatest.jar:/home/roland/tools/jtreg/build/images/jtreg/lib/jtreg.jar \ > /home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk/bin/java \ > -Dtest.vm.opts='-XX:MaxRAMPercentage=3 -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -server -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI' \ > -Dtest.tool.vm.opts='-J-XX:MaxRAMPercentage=3 -J-Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp -J-server -J-XX:+UnlockExperimentalVMOptions -J-XX:+EnableJVMCI' \ > -Dtest.compiler.opts= \ > -Dtest.java.opts= \ > -Dtest.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -Dcompile.jdk=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/jdk \ > -Dtest.timeout.factor=4.0 \ > -Dtest.nativepath=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ > -Dtest.root=/home/roland/jdk-jdk/test/hotspot/jtreg \ > -Dtest.name=compiler/graalunit/CoreTest.java \ > -Dtest.file=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit/CoreTest.java \ > -Dtest.src=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit \ > 
-Dtest.src.path=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/lib:/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/test/hotspot/jtreg \ > -Dtest.classes=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d \ > -Dtest.class.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit/CoreTest.d:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -Dtest.class.path.prefix=/home/roland/jdk-jdk/test/hotspot/jtreg/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/test/lib:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0/compiler/graalunit:/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/classes/0 \ > -Dtest.modules=jdk.internal.vm.compiler \ > --add-modules jdk.internal.vm.compiler \ > -XX:MaxRAMPercentage=3 \ > -Djava.io.tmpdir=/home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/tmp \ > -server \ > -XX:+UnlockExperimentalVMOptions \ > -XX:+EnableJVMCI \ > -Djava.library.path=/home/roland/jdk-jdk/build/linux-x86_64-server-release/images/test/hotspot/jtreg/native \ > 
-XX:+UnlockExperimentalVMOptions \ > -XX:+EnableJVMCI \ > com.sun.javatest.regtest.agent.MainWrapper /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/compiler/graalunit/CoreTest.d/main.0.jta -prefix org.graalvm.compiler.core.test -exclude ExcludeList.txt -vmargs --add-opens=java.base/java.lang=ALL-UNNAMED > > TEST RESULT: Failed. Execution failed: `main' threw exception: java.lang.Exception: Failed to find tests, VM crashed with exit code 1 > -------------------------------------------------- > Test results: failed: 1 > Report written to /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-results/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java/html/report.html > Results written to /home/roland/jdk-jdk/build/linux-x86_64-server-release/test-support/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java > Error: Some tests failed or other problems occurred. > Finished running test 'jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java' > Test report is stored in build/linux-x86_64-server-release/test-results/jtreg_test_hotspot_jtreg_compiler_graalunit_CoreTest_java > > ============================== > Test summary > ============================== > TEST TOTAL PASS FAIL ERROR > jtreg:test/hotspot/jtreg/compiler/graalunit/CoreTest.java >>> 1 0 1 0 << > ============================== > TEST FAILURE > > make[1]: *** [/home/roland/jdk-jdk/make/Init.gmk:319: main] Error 1 > make: *** [/home/roland/jdk-jdk/make/Init.gmk:186: run-test] Error 2 > > From jatin.bhateja at intel.com Fri Jul 10 14:04:30 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Fri, 10 Jul 2020 14:04:30 +0000 Subject: 8248830 : RFR[S] : C2 : Rotate API intrinsification for X86 Message-ID: Hi All, Following patch adds intrinsification and vectorization support for 4 java APIs:- * Integer.rotateLeft * Integer.rotateRight * Long.rotateLeft * Long.rotateRight JBS : 
https://bugs.openjdk.java.net/browse/JDK-8248830
WebRev: http://cr.openjdk.java.net/~jbhateja/8248830/webrev.01/

AVX512 offers 8 new vector rotate instructions [1], these can accept both immediate and variable rotate count
arguments. Patch exploits both these flavors of instructions.

Following are the benchmarks results

Before:
UseAVX=3
Benchmark                         (SHIFT)  (TESTSIZE)   Mode  Cnt      Score   Error   Units
RotateBenchmark.testRotateLeftI        20        1024  thrpt    2  13336.170          ops/ms
RotateBenchmark.testRotateLeftL        20        1024  thrpt    2   8897.930          ops/ms
RotateBenchmark.testRotateRightI       20        1024  thrpt    2  13447.273          ops/ms
RotateBenchmark.testRotateRightL       20        1024  thrpt    2   8783.535          ops/ms

After:
UseAVX=3
Benchmark                         (SHIFT)  (TESTSIZE)   Mode  Cnt      Score   Error   Units
RotateBenchmark.testRotateLeftI        20        1024  thrpt    2  20438.609          ops/ms
RotateBenchmark.testRotateLeftL        20        1024  thrpt    2  11238.110          ops/ms
RotateBenchmark.testRotateRightI       20        1024  thrpt    2  20306.805          ops/ms
RotateBenchmark.testRotateRightL       20        1024  thrpt    2  11190.639          ops/ms

Kindly review the patch.

Best Regards,
Jatin

[1] : https://www.felixcloutier.com/x86/vprold:vprolvd:vprolq:vprolvq
      https://www.felixcloutier.com/x86/vprord:vprorvd:vprorq:vprorvq

From igor.ignatyev at oracle.com Fri Jul 10 16:24:38 2020
From: igor.ignatyev at oracle.com (igor.ignatyev at oracle.com)
Date: Fri, 10 Jul 2020 09:24:38 -0700
Subject: RFR [15] : 8249019 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_compiler tests
In-Reply-To:
References:
Message-ID: <4DE9F13E-2B55-42DA-ABCB-4CF5F6EE422A@oracle.com>

Thanks Katya.

Can I get a (R)eview?

-- Igor

> On Jul 9, 2020, at 1:44 PM, Ekaterina Pavlova wrote:
>
> Looks good,
>
> -katya
>
>
>> On 7/9/20 1:34 PM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00
>>> 269 lines changed: 0 ins; 163 del; 106 mod
>> Hi all,
>> could you please review the patch which removes `FileInstaller . .` jtreg action from vmTestbase_vm_compiler tests?
>> from the main issue(8204985):
>>> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests.
>> some of vmTestbase_vm_compiler tests depend on FileInstaller, so they are left intact and will be updated separately.
>> testing: :vmTestbase_vm_compiler on linux-x64
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249019
>> webrev: http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00
>> Thanks,
>> -- Igor
>

From vladimir.x.ivanov at oracle.com Fri Jul 10 16:26:33 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 10 Jul 2020 19:26:33 +0300
Subject: [15] RFR (S): 8247502: PhaseStringOpts crashes while optimising effectively dead code
Message-ID: <9ee563ef-501b-bdaa-4e87-8e9e8aaf2dd7@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8247502
http://cr.openjdk.java.net/~vlivanov/8247502/webrev.00/

As Tobias discovered, PhaseStringOpts crashes when it encounters a String::append() argument that is TOP: TOP is a constant, but the code expects to see a String constant instead.

It happens while processing a call in an unreachable infinite loop. The code is effectively dead, but IGVN and PhaseRemoveUseless don't see that. It is discovered later, when loop opts kick in and clean it up.

The proposed fix tries to make the code more robust and just bails out of the optimization when TOP is encountered.

An alternative way to fix the problem would be to clean up the graph before PhaseStringOpts (e.g., by running PhaseIdealLoop(LoopOptsNone), since PhaseRemoveUseless is not enough), but a PhaseIdealLoop pass can be expensive. So, I'm in favor of the local fix in PhaseStringOpts.

Testing: crash reproducer, hs-precheckin-comp, hs-tier1, hs-tier2, tier1

Thanks!
PS: no regression test since I wasn't able to extract a simple reproducer from the crash log.

Best regards,
Vladimir Ivanov

From vladimir.x.ivanov at oracle.com Fri Jul 10 17:32:00 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 10 Jul 2020 20:32:00 +0300
Subject: 8248830 : RFR[S] : C2 : Rotate API intrinsification for X86
In-Reply-To:
References:
Message-ID: <9ad508ae-bf73-d4d7-ff2b-d1f4280adeff@oracle.com>

> WebRev: http://cr.openjdk.java.net/~jbhateja/8248830/webrev.01/

Nice work, Jatin!

High-level comment: so far, there was no pressing need to explicitly mark the methods as intrinsics. ROR/ROL instructions were selected during matching [1]. Now the patch introduces dedicated nodes (RotateLeft/RotateRight) specifically for intrinsics, which partly duplicates existing logic.

As a consequence, while ROL/ROR instructions can be utilized without using the dedicated API methods, auto-vectorization won't handle rotations unless the intrinsics are used.

It would be nice to unify the approaches and get rid of the duplication. (Either by folding scalar operations into Rotate nodes or by extending the auto-vectorizer to detect vector rotates in a similar way to how scalar rotates are handled.)

Otherwise, looks good. I'll submit it for testing.

Minor comments:

src/hotspot/share/opto/countbitsnode.hpp

Though the nodes look like they are in the right company, formally speaking, RotateLeft/RotateRight aren't subtypes of CountBitsNode. Maybe rename countbitsnode.hpp or move the RotateLeft/RotateRight declarations to src/hotspot/share/opto/intrinsicnode.hpp?

Best regards,
Vladimir Ivanov

[1] http://hg.openjdk.java.net/jdk/jdk/file/tip/src/hotspot/cpu/x86/x86_64.ad#l8970

>
> AVX512 offers 8 new vector rotate instructions [1], these can accept both immediate and variable rotate count
> arguments. Patch exploits both these flavors of instructions.
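
[Editorial sketch, not part of the original thread.] The review comment above contrasts two ways of writing a rotate in Java: the hand-written shift/or idiom, which according to the review was already selected as ROL/ROR during matching, and the dedicated API methods that the patch intrinsifies. A minimal illustration of the two source forms follows. The class and method names (RotateSketch, rotlManual, rotlApi, rotateAll) are hypothetical and do not come from the webrev or its benchmark; whether a given JIT build emits scalar or vector rotate instructions for either form is not shown here and depends on the VM.

```java
public class RotateSketch {
    // Hand-written rotate-left using the shift/or idiom. Java masks int shift
    // counts to their low 5 bits, so ">>> -s" shifts right by (32 - s) mod 32.
    public static int rotlManual(int x, int s) {
        return (x << s) | (x >>> -s);
    }

    // The same operation expressed through the API method named in the RFR.
    public static int rotlApi(int x, int s) {
        return Integer.rotateLeft(x, s);
    }

    // Loop form over arrays, i.e. the kind of code an auto-vectorizer could
    // consider (an assumption for illustration only).
    public static void rotateAll(int[] src, int[] dst, int s) {
        for (int i = 0; i < src.length; i++) {
            dst[i] = Integer.rotateLeft(src[i], s);
        }
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        // Both forms are expected to agree for every shift count.
        for (int s = 0; s < 64; s++) {
            if (rotlManual(x, s) != rotlApi(x, s)) {
                throw new AssertionError("mismatch at s=" + s);
            }
        }
        System.out.println(Integer.toHexString(rotlApi(x, 1)));
    }
}
```

The two scalar methods compute the same value; the difference discussed in the review is only in how the compiler recognizes them.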
>
> Following are the benchmarks results
>
> Before:
> UseAVX=3
> Benchmark                         (SHIFT)  (TESTSIZE)   Mode  Cnt      Score   Error   Units
> RotateBenchmark.testRotateLeftI        20        1024  thrpt    2  13336.170          ops/ms
> RotateBenchmark.testRotateLeftL        20        1024  thrpt    2   8897.930          ops/ms
> RotateBenchmark.testRotateRightI       20        1024  thrpt    2  13447.273          ops/ms
> RotateBenchmark.testRotateRightL       20        1024  thrpt    2   8783.535          ops/ms
>
> After:
> UseAVX=3
> Benchmark                         (SHIFT)  (TESTSIZE)   Mode  Cnt      Score   Error   Units
> RotateBenchmark.testRotateLeftI        20        1024  thrpt    2  20438.609          ops/ms
> RotateBenchmark.testRotateLeftL        20        1024  thrpt    2  11238.110          ops/ms
> RotateBenchmark.testRotateRightI       20        1024  thrpt    2  20306.805          ops/ms
> RotateBenchmark.testRotateRightL       20        1024  thrpt    2  11190.639          ops/ms
>
> Kindly review the patch.
>
> Best Regards,
> Jatin
>
> [1] : https://www.felixcloutier.com/x86/vprold:vprolvd:vprolq:vprolvq
>       https://www.felixcloutier.com/x86/vprord:vprorvd:vprorq:vprorvq
>
>

From luhenry at microsoft.com Fri Jul 10 17:58:42 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Fri, 10 Jul 2020 17:58:42 +0000
Subject: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor
In-Reply-To:
References: ,
Message-ID:

Hi Andrew,

I uploaded a new webrev following your review.
Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.01/
Testing: jtreg:test/hotspot/jtreg:tier1, jtreg:test/jdk:tier1, jtreg:test/jdk:tier2, jtreg:test/langtools on Linux-AArch64, no regressions

Thank you,

________________________________________
From: Andrew Haley
Sent: Friday, July 10, 2020 01:10
To: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Cc: openjdk-aarch64
Subject: Re: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor

On 09/07/2020 21:31, Ludovic Henry wrote:
> JBS: https://bugs.openjdk.java.net/browse/JDK-8248676
> Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.00/
> Testing: jtreg:test/hotspot/jtreg:tier1, jtreg:test/jdk:tier1, jtreg:test/jdk:tier2, jtreg:test/langtools on Linux-AArch64, no regressions.
>
> This small fix is in the context of the larger support for Windows-AArch64. The attribute `__attribute__ ((constructor))` is not supported by MSVC, and the documented workaround is to allocate an empty static struct with a constructor. This patch only applies this workaround when compiling on Windows, and leaves other platforms unchanged.

Please take out the #ifdef WINDOWS: we can use portable C++ here on all platforms.

Thanks,

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Fri Jul 10 18:09:33 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Jul 2020 11:09:33 -0700
Subject: RFR [15] : 8249019 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_compiler tests
In-Reply-To: <4DE9F13E-2B55-42DA-ABCB-4CF5F6EE422A@oracle.com>
References: <4DE9F13E-2B55-42DA-ABCB-4CF5F6EE422A@oracle.com>
Message-ID:

Reviewed.

Vladimir K

On 7/10/20 9:24 AM, igor.ignatyev at oracle.com wrote:
> Thanks Katya.
>
> Can I get a (R)eview?
>
> -- Igor
>
>> On Jul 9, 2020, at 1:44 PM, Ekaterina Pavlova wrote:
>>
>> Looks good,
>>
>> -katya
>>
>>
>>> On 7/9/20 1:34 PM, Igor Ignatyev wrote:
>>> http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00
>>>> 269 lines changed: 0 ins; 163 del; 106 mod
>>> Hi all,
>>> could you please review the patch which removes `FileInstaller . .` jtreg action from vmTestbase_vm_compiler tests?
>>> from the main issue(8204985):
>>>> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests.
>>> some of vmTestbase_vm_compiler tests depend on FileInstaller, so they are left intact and will be updated separately.
>>> testing: :vmTestbase_vm_compiler on linux-x64 >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249019 >>> webrev: http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00 >>> Thanks, >>> -- Igor >> > From vladimir.x.ivanov at oracle.com Fri Jul 10 18:42:25 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 10 Jul 2020 21:42:25 +0300 Subject: Stack allocation prototype for C2 In-Reply-To: <4C6D4959-00E1-4300-BE30-BB6FC60A491F@microsoft.com> References: <4C6D4959-00E1-4300-BE30-BB6FC60A491F@microsoft.com> Message-ID: Hi Charlie, > Thanks for reviewing the document and providing your feedback. One request about improving the document: please, elaborate more on interactions with EA implementation in C2. For example, stack allocation can be used for both non-scalarizable NoEscape and ArgEscape, but the latter requires GC barriers everywhere to check for stack allocated objects while in the former case it can be limited only to the current nmethod. >> From the design overview and the implementation, I'm concerned about >> far-reaching consequences of the chosen approach. It's not limited just >> to existing set of JVM features, but as Andrew noted will affect the >> design of forthcoming functionality as well. >> >> I think it's worth to start a broad discussion (HotSpot-wide) and decide >> how much JVM design complexity budged it is worth spending on such an >> optimization. > > This is a great suggestion, where and how should we start this discussion > to get feedback from the broader community? I suggest to initiate a new discussion on hotspot-dev at ojn and stress that it's not just about optimizations in JIT-compilers, but a proposal to enable object allocations on thread stack and discuss the effects on other JVM subsystems and features. 
>> As we discussed off-line (right after FOSDEM), I do see the benefits of >> in-memory representation for non-escaping objects: memory aliasing >> (either indeterminate base or indexed access) imposes inherent >> constraints on the escape analysis (both partial and conservative >> approaches suffer from it). Nevertheless, some of the problematic cases >> can be addressed by improving existing approach or introducing a more >> powerful analysis: covering more cases and making the analysis >> control-sensitive should improve the situation. > > We would like to work to improve escape analysis as per your suggestions above. > If we can achieve the same allocation reductions with this solution, it would be a > better long-term solution. We would like to continue reviewing stack allocation > and start a sandbox project as Dalibor suggested, but work on improving escape > analysis and measure against the sandbox for a baseline. Good idea! Keeping the up-to-date patches in a sandbox repository would be very convenient. >> Also, the alternative approach (called zone-based heap allocation) looks >> very attractive to me. I haven't thought it through, but it looks like >> keeping the objects on the Java heap can save us a lot of complexity on >> the implementation side (more memory available for allocation - not >> necessarily fixed amount, no need to migrate objects from stack to heap, >> GC barriers are unaffected, etc.). For example, reserving a dedicated >> TLAB (or a stack of TLABs?) and do nmethod-scoped allocations from C2 >> code looks attractive. It can simplify many aspects of the >> implementation: much more space available, free migration of >> non-escaping objects to heap on deoptimization. > > We have been thinking about this idea since FOSDEM and we completely agree > with the pros of zone-based allocation. The biggest benefits are the removal of > the restrictions in compressed oops mode and that barriers would not have to be > modified. 
> > For this approach were you envisioning that objects allocated in a stack zone are > pinned until the method returns? Also, while that zone memory is pinned the GC > would not reclaim memory in that zone? That is what we were thinking, but we > are worried about the complexity of the changes and restrictions it might add to > the GC implementations. Just want to reiterate that I haven't thought the idea through, but my educated guess is there should be a way to implement it in an optimistic way and mostly transparent to runtime and GCs. Just a sketch of the idea: (1) JIT can optimistically use a dedicated TLAB in some scope (e.g., nmethod-based: record a watermark at nmethod entry for future use); (2) when leaving the scope (e.g, on nmethod exit), JIT can try to free allocated space (up to some watermark), but has to verify that some per-thread invariant still holds; (3) runtime can break the invariant at any time, but has to ensure that all allocated objects end up in Java heap. For example (assuming all TLABs are allocated on-heap): using "the same zone TLAB is registered with the thread" as the invariant and de-registering zone TLAB with the thread (allocating new TLAB / resetting it to NULL) should do the job. Plus, there's an option to promote zone TLAB to ordinary TLAB may reduce heap waste. So far, I don't see any major problems, but it is pending some validation with an experiment to get an understanding how efficient proposed scheme is in reducing allocation rate. > Another thought is about the added cost to method enter / exit. With the current > on stack approach there is no added instructions for entering / exiting a method > since the stack size is just larger. For the zone-based approach we would need to > have a few more instructions on enter and exit to get the space from the zone TLAB > and to return it. If the current zone TLAB is full we would need to do more work to > get another one. 
> Hopefully the common case of satisfying the space requirements
> from the current zone TLAB would on average be the same or less than the current
> TLAB checks for fast path allocations.

Allocating a TLAB per method looks wasteful: TLABs are normally quite large (hence more heap waste for deep thread stacks and large number of threads) and their allocation is expensive (requires a CAS).

> A final consideration is the footprint cost for project Loom. In the zone-based approach
> would each virtual thread (fibre) have its own zone TLAB (or stack of TLABs)? If each
> virtual thread had a zone TLAB it may lead to more frequent GCs because a significant
> portion of the heap is reserved for zone-based allocations.

IMO having a TLAB per virtual thread may cause too much waste: TLAB size can easily outweigh the footprint of the virtual thread itself. Sharing a TLAB from a carrier thread may help, but it can't be used across possible freeze points. So, I don't have a clear picture what will be the best option there.

> We do not see any of these as showstoppers, but just be sure we have the full picture.

>> Another idea:
>>
>> "When dealing with stack allocated objects in loops we need a lifetime
>> overlap check."
>>
>> It doesn't look specific to stack-allocated objects. Non-overlapping
>> live ranges can be coalesced the same way for on-heap freshly allocated
>> objects. It should get comparable reduction in allocation pressure
>> (single allocation per loop vs allocation per iteration) and doesn't
>> require stack allocation support at all (as an example [1]).
>>
>> If such improvements are enabled for non-escaping on-heap objects, how
>> much benefit will stack allocation bring on top of that? IMO the
>> performance gap should become much narrower.
>
> We agree, it's one of the first things we wanted to try after we submitted the initial stack
> allocation code for review.
> Again, our approach would be to have the current stack allocation
> prototype as a baseline and work to see if we can shrink the gap with other approaches.

Sounds good!

Best regards,
Vladimir Ivanov

From igor.ignatyev at oracle.com Fri Jul 10 18:51:11 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 10 Jul 2020 11:51:11 -0700
Subject: RFR [15] : 8249019 : clean up FileInstaller $test.src $cwd in vmTestbase_vm_compiler tests
In-Reply-To:
References: <4DE9F13E-2B55-42DA-ABCB-4CF5F6EE422A@oracle.com>
Message-ID:

thanks Vladimir, pushed to jdk/jdk15.

-- Igor

> On Jul 10, 2020, at 11:09 AM, Vladimir Kozlov wrote:
>
> Reviewed.
>
> Vladimir K
>
> On 7/10/20 9:24 AM, igor.ignatyev at oracle.com wrote:
>> Thanks Katya.
>> Can I get a (R)eview?
>> -- Igor
>>> On Jul 9, 2020, at 1:44 PM, Ekaterina Pavlova wrote:
>>>
>>> Looks good,
>>>
>>> -katya
>>>
>>>
>>>> On 7/9/20 1:34 PM, Igor Ignatyev wrote:
>>>> http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00
>>>>> 269 lines changed: 0 ins; 163 del; 106 mod
>>>> Hi all,
>>>> could you please review the patch which removes `FileInstaller . .` jtreg action from vmTestbase_vm_compiler tests?
>>>> from the main issue(8204985):
>>>>> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests.
>>>> some of vmTestbase_vm_compiler tests depend on FileInstaller, so they are left intact and will be updated separately.
>>>> testing: :vmTestbase_vm_compiler on linux-x64 >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249019 >>>> webrev: http://cr.openjdk.java.net/~iignatyev//8249019/webrev.00 >>>> Thanks, >>>> -- Igor >>> From igor.ignatyev at oracle.com Fri Jul 10 19:07:12 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 10 Jul 2020 12:07:12 -0700 Subject: RFR(S) [15] : 8249000 : vm.gc.X should take selected JIT into account In-Reply-To: References: <6964ac32-e9ec-d700-0bdb-ea51f4610afe@oracle.com> <7A1992A7-1493-4DF0-B621-195CE986D34F@oracle.com> <2c92a9a5-77af-c100-fa9b-f765e9d23dce@oracle.com> Message-ID: <6409D2AD-173C-451A-814E-32C88860A5C5@oracle.com> thanks Vladimir, pushed. -- Igor > On Jul 8, 2020, at 3:36 PM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir > > On 7/8/20 11:40 AM, Igor Ignatyev wrote: >> Thanks Vladimir. >> for the record, I've updated ProblemList-graal.txt w/ the following: >>> diff -r 14ffd658a23a test/hotspot/jtreg/ProblemList-graal.txt >>> --- a/test/hotspot/jtreg/ProblemList-graal.txt Wed Jul 08 11:35:30 2020 -0700 >>> +++ b/test/hotspot/jtreg/ProblemList-graal.txt Wed Jul 08 11:37:44 2020 -0700 >>> @@ -229,6 +229,7 @@ >>> compiler/loopopts/TestOverunrolling.java 8207267 generic-all >>> compiler/jsr292/NonInlinedCall/InvokeTest.java 8207267 generic-all >>> compiler/codegen/TestTrichotomyExpressions.java 8207267 generic-all >>> +gc/stress/TestReclaimStringsLeaksMemory.java 8207267 generic-all >>> runtime/exceptionMsgs/AbstractMethodError/AbstractMethodErrorTest.java 8222582 generic-all >> -- Igor >>> On Jul 8, 2020, at 11:34 AM, Vladimir Kozlov wrote: >>> >>> Thank you, Igor >>> >>> I got the difference between `vm.gc` and `vm.gc.X`. >>> >>> In this case TestReclaimStringsLeaksMemory.java should be put into ProblemList-graal.txt with 8207267 to enable it with libgraal. Current usage of !vm.graal.enabled in test is to skip this test with Java Graal because its effect on Java heap. 
>>> >>> On 7/7/20 8:30 PM, Igor Ignatyev wrote: >>>> Hi Vladimir, >>>> thanks for your review! >>>> `vm.gc` and `vm.gc.X`-s are different beasts (and admittedly, they confuse people a lot), `vm.gc` is set to "X", by jtreg itself, only if there is UseXGC in vm flags, otherwise it's "null". `vm.gc.X` are set by VMProps class, and you can have more than one vm.gc.X == true, as vm.gc.X means that X gc is supported by JVM and it can be selected; so if there are no Use.*GC in vm flags, vm.gc.X will yield true for all GCs which JVM was built with; if one of UseXGC is provided, only corresponding vm.gc.X is true, and all others are false. so to answer your questions, yes `vm.gc` can be "null" (if there are no Use.*GC) , and yes `vm.gc.Z & vm.gc.Serial & vm.gc == null` can be true (if there are no Use.*GC and JVM supports both Z and Serial GCs). >>> >>> Interesting. I thought vmGC will list only one selected GC. That explains requires in TestZGCWithCDS.java. >>> >>> You only need to add TestReclaimStringsLeaksMemory.java into ProblemList-graal.txt. >>> >>> Thanks, >>> Vladimir >>> >>>> Thanks, >>>> -- Igor >>>>> On Jul 7, 2020, at 8:00 PM, Vladimir Kozlov wrote: >>>>> >>>>> Nice clean up, Igor >>>>> >>>>> test/hotspot/jtreg/gc/stress/TestReclaimStringsLeaksMemory.java >>>>> >>>>> Do we even can have vm.gc=="null" based on code in VMProps.java? At least some GC should be selected ergonomically even if non is specified on command line. >>>>> >>>>> - * @requires vm.gc=="null" & !vm.graal.enabled & !vm.debug >>>>> + * @requires vm.gc == "null" >>>>> + * @requires !vm.debug >>>>> >>>>> >>>>> test/hotspot/jtreg/runtime/cds/appcds/TestZGCWithCDS.java >>>>> >>>>> Does next combination of @requires ever work? I thought such sequence means 'AND' operation on all such conditions. 
>>>>> >>>>> * @requires vm.gc.Z >>>>> * @requires vm.gc.Serial >>>>> * @requires vm.gc == null >>>>> >>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 7/7/20 5:38 PM, Igor Ignatyev wrote: >>>>>> http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >>>>>>> 241 lines changed: 34 ins; 5 del; 202 mod; >>>>>> Hi all, >>>>>> could you please review the patch which modifies requires/VMProps to set vm.gc.X to false if Graal is selected and X GC isn't supported by Graal? >>>>>> the patch also replaces @requires similar to `vm.gc.X & !vm.graal.enabled` w/ `vm.gc.X` where it's applicable. >>>>>> from JBS: >>>>>>> not all GCs are supported by Graal JIT, which leads to failures like JDK-8247527 and boilerplate fixes like replacing all `@requires vm.gc.Z` w/ `@requires vm.gc.Z & !vm.graal.enabled`. >>>>>>> >>>>>>> as vm.gc.X means that X GC can be selected, it would be more natural, less surprising, and much more clear to have it true if the selected JIT supports the said X GC. >>>>>> webrev: http://cr.openjdk.java.net/~iignatyev/8249000/webrev.00/ >>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249000 >>>>>> testing: test/hotspot/jtreg/{gc,compiler,runtime,serviceability} on {linux,windows,macos}-x64 w/ and w/o Graal as JIT >>>>>> Thanks, >>>>>> -- Igor From vladimir.kozlov at oracle.com Fri Jul 10 22:56:54 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 10 Jul 2020 15:56:54 -0700 Subject: [16] RFFR(S) 8249165: Remove unneeded nops introduced by 8234160 changes Message-ID: <3baf8ea8-0ae3-1ce5-4d7a-0f524c53bb30@oracle.com> https://cr.openjdk.java.net/~kvn/8249165/webrev.00/ https://bugs.openjdk.java.net/browse/JDK-8249165 Check for branch instruction at the and of code block when generating NOPs to align it. I did not see significant difference in performance in our regular benchmarks (jvm2008, JBB) but I think it is still good to do. 
Thanks, Vladimir From vladimir.kozlov at oracle.com Fri Jul 10 23:19:32 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 10 Jul 2020 16:19:32 -0700 Subject: [15] RFR (S): 8247502: PhaseStringOpts crashes while optimising effectively dead code In-Reply-To: <9ee563ef-501b-bdaa-4e87-8e9e8aaf2dd7@oracle.com> References: <9ee563ef-501b-bdaa-4e87-8e9e8aaf2dd7@oracle.com> Message-ID: I agree with this small fix. Thanks, Vladimir On 7/10/20 9:26 AM, Vladimir Ivanov wrote: > https://bugs.openjdk.java.net/browse/JDK-8247502 > http://cr.openjdk.java.net/~vlivanov/8247502/webrev.00/ > > As Tobias discovered, PhaseStringOpts crashes when it encounters String::append() argument being TOP: TOP is a constant, > but the code expects to see a String constant instead. > > It happens while processing a call in unreachable infinite loop. The code is effectively dead, but IGVN and > PhaseRemoveUseless don't see that. It is discovered later when loop opts kick in which clean it up. > > Proposed fix tries to make the code more robust and just bails out the optimization when TOP is encountered. > > Alternative way to fix the problem would be to clean up the graph before PhaseStringOpts (e.g., by running > PhaseIdealLoop(LoopOptsNone) since PhaseRemoveUseless is not enough), but PhaseIdealLoop pass can be expensive. So, I'm > in favor of the local fix in PhaseStringOpts. > > Testing: crash reproducer, hs-precheckin-comp, hs-tier1, hs-tier2, tier1 > > Thanks! > > PS: no regression test since I wasn't able to extract a simple reproducer from the crash log. > > Best regards, > Vladimir Ivanov From vladimir.x.ivanov at oracle.com Fri Jul 10 23:23:33 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 11 Jul 2020 02:23:33 +0300 Subject: [16] RFFR(S) 8249165: Remove unneeded nops introduced by 8234160 changes In-Reply-To: <3baf8ea8-0ae3-1ce5-4d7a-0f524c53bb30@oracle.com> References: <3baf8ea8-0ae3-1ce5-4d7a-0f524c53bb30@oracle.com> Message-ID: Looks good. 
Best regards, Vladimir Ivanov On 11.07.2020 01:56, Vladimir Kozlov wrote: > https://cr.openjdk.java.net/~kvn/8249165/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8249165 > > Check for branch instruction at the and of code block when generating > NOPs to align it. > I did not see significant difference in performance in our regular > benchmarks (jvm2008, JBB) but I think it is still good to do. > > Thanks, > Vladimir From vladimir.kozlov at oracle.com Fri Jul 10 23:25:03 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 10 Jul 2020 16:25:03 -0700 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> Message-ID: <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> Looks good. Thanks, Vladimir On 7/10/20 12:37 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8248552 > http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ > > In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of the loop > induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p and ends up after > the AddNode which updates the induction variable p. In the last iteration j equals 2 and is then updated to 0. The > division/modulo node n is now executed before the loop limit check which results in a SIGFPE. > > The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed (i.e. > control in NULL) and is split through a phi which has an input that could be zero. This should only happen for an > induction variable phi of a trip-counted (integer) loop. 
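The failure mode described above can be illustrated in plain Java, outside of C2. The sketch below is only an analogy under stated assumptions: the class name, the loop bounds (50 down to 2 in steps of 2) and both method names are hypothetical and are not taken from the actual failing test case. It shows why a modulo that is safe while the induction variable is still in range can divide by zero once it is evaluated on the updated value, before the loop exit check.

```java
final class SplitThruPhiSketch {
    // Original order: the modulo sees j only while the loop condition holds,
    // so j is in [2, 50] and never 0.
    static int original(int x) {
        int acc = 0;
        for (int j = 50; j > 0; j -= 2) {
            acc += x % j;
        }
        return acc;
    }

    // Hypothetical shape after the modulo has been moved past the update of j:
    // it is now evaluated on the decremented value before the exit check runs.
    static int afterBadSplit(int x) {
        int acc = 0;
        int j = 50;
        int r = x % j;          // value flowing in from the loop entry
        while (true) {
            acc += r;
            j -= 2;             // update of the induction variable
            r = x % j;          // executes even when j has just become 0
            if (j <= 0) {       // exit check comes too late
                break;
            }
        }
        return acc;
    }
}
```

In Java the second variant fails with an ArithmeticException on the last iteration; in compiled code without the zero check, the same ordering corresponds to the SIGFPE reported in the bug.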
>
> Best regards,
> Christian

From vladimir.kozlov at oracle.com Fri Jul 10 23:26:09 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Jul 2020 16:26:09 -0700
Subject: [16] RFFR(S) 8249165: Remove unneeded nops introduced by 8234160 changes
In-Reply-To:
References: <3baf8ea8-0ae3-1ce5-4d7a-0f524c53bb30@oracle.com>
Message-ID:

Thank you, Vladimir

On 7/10/20 4:23 PM, Vladimir Ivanov wrote:
> Looks good.
>
> Best regards,
> Vladimir Ivanov
>
> On 11.07.2020 01:56, Vladimir Kozlov wrote:
>> https://cr.openjdk.java.net/~kvn/8249165/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8249165
>>
>> Check for branch instruction at the and of code block when generating NOPs to align it.
>> I did not see significant difference in performance in our regular benchmarks (jvm2008, JBB) but I think it is still
>> good to do.
>>
>> Thanks,
>> Vladimir

From vladimir.kozlov at oracle.com Fri Jul 10 23:32:36 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Jul 2020 16:32:36 -0700
Subject: RFR(M): 8229495: SIGILL in C2 generated OSR compilation
In-Reply-To:
References: <3b720427-d718-5d1c-dbe9-6149a21883af@oracle.com> <87r1topriw.fsf@redhat.com> <84b2c86d-c7e6-7945-dae5-db1d8efe6f25@oracle.com> <87sge0oqv8.fsf@redhat.com>
Message-ID:

On 7/9/20 5:16 AM, Christian Hagedorn wrote:
> Hi Roland
>
> On 09.07.20 13:43, Roland Westrelin wrote:
>> new webrev:
>> http://cr.openjdk.java.net/~roland/8229495/webrev.01/
>
> That looks good to me!

+1

Thanks,
Vladimir K

>
>>> I submitted some testing.
>>
>> Thanks.
>
> An extended testing was completed successfully (up to tier7).
>
>>> While at it, you might want to consider to update other uses of the
>>> pattern Opcode() == Op_Opaque1 by is_Opaque1() as well like in
>>> loopTransform.cpp:
>>>
>>> 1158     assert(iff->in(1)->in(1)->Opcode() == Op_Opaque1, "unexpected
>>> predicate shape");
>>
>> Except in this case it really is an Opaque1 instead of a subclass so
>> using is_Opaque1() would weaken the assert.
>
> You're right, I have not thought about that - then better leave it as it is.
>
>>> I observed a Java Fuzzer crash ("fatal error: DEBUG MESSAGE: duplicated
>>> predicate failed which is impossible") this weekend which looked very
>>> similar to this bug and indeed it could be fixed with your patch. You
>>> could add it as additional testcase. Here is the simplified code and the
>>> command line I used to reproduce it.
>>
>> Thanks for test case. I included it in the new webrev.
>
> Great, thanks for adding it.
>
> Best regards,
> Christian

From vladimir.kozlov at oracle.com Sat Jul 11 00:06:15 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 10 Jul 2020 17:06:15 -0700
Subject: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java fails with -Xcomp -XX:TieredStopAtLevel=1
In-Reply-To: <958fecdf-d7a1-4b22-835e-a75fadda0a84@default>
References: <2abe9fba-e958-4b34-9f92-6bb8d8478f4e@default> <70057c31-e535-f03a-391d-d181b2ec150b@oracle.com> <958fecdf-d7a1-4b22-835e-a75fadda0a84@default>
Message-ID: <37bb4585-21b5-a6e7-9ee1-88ccf9be0914@oracle.com>

Fix is good.

I think the following are the reasons you don't get an MDO in this scenario.

Tier1 (C1 compilation) does not generate profiling code and does not create an MDO. C1 requests an MDO only with tiers 2 and 3 [1][2].

With the -Xcomp flag, a Java method is not executed in the interpreter; instead, its compilation is requested and the caller waits until it is finished. As a result, no MDO is created in the interpreter either. Maybe later, if the method is deoptimized, it will be executed in the interpreter and an MDO will be created.
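As a rough illustration of how a test could account for this, here is a minimal sketch of a flag check that decides whether method-data output should be expected at all. It is not the change in the webrev: the class name, the method name and the idea of passing the VM options as one string are assumptions made for this example, and it only covers the flag combination named in the subject line.

```java
import java.util.Arrays;
import java.util.List;

final class PrintMdoExpectation {
    // Hypothetical helper: returns false for the combination discussed above
    // (-Xcomp together with -XX:TieredStopAtLevel=1), where neither the
    // interpreter nor C1 is expected to have created an MDO.
    static boolean expectMethodData(String javaOpts) {
        List<String> opts = Arrays.asList(javaOpts.trim().split("\\s+"));
        boolean xcomp = opts.contains("-Xcomp");
        boolean stopAtTier1 = opts.contains("-XX:TieredStopAtLevel=1");
        return !(xcomp && stopAtTier1);
    }
}
```

A test could then skip its checks on the "printmdo" output whenever this helper returns false, instead of failing on the missing output.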
Thanks, Vladimir [1] http://hg.openjdk.java.net/jdk/jdk/file/796c9fa50850/src/hotspot/share/c1/c1_Compilation.hpp#l226 [2] http://hg.openjdk.java.net/jdk/jdk/file/796c9fa50850/src/hotspot/share/c1/c1_Compilation.cpp#l381 On 7/7/20 8:47 PM, Fairoz Matte wrote: > Thanks Chris, for the review comments. > > I have updated the suggested change. > > Thanks, > Fairoz > >> -----Original Message----- >> From: Chris Plummer >> Sent: Wednesday, July 8, 2020 3:38 AM >> To: Fairoz Matte ; hotspot-compiler- >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net >> Subject: Re: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java >> fails with -Xcomp -XX:TieredStopAtLevel=1 >> >> Hi Fairoz, >> >> Looks good, except for the missing space in "if(testJavaOpts...". >> >> thanks, >> >> Chris >> >> On 7/7/20 7:49 AM, Fairoz Matte wrote: >>> Hi, >>> >>> Please review this small test change to consider the scenario when there is no >> "printmdo" output >>> >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8236042 >>> Webrev - http://cr.openjdk.java.net/~fmatte/8236042/webrev.00/ >>> >>> Thanks, >>> Fairoz >> From fairoz.matte at oracle.com Sat Jul 11 03:10:54 2020 From: fairoz.matte at oracle.com (Fairoz Matte) Date: Fri, 10 Jul 2020 20:10:54 -0700 (PDT) Subject: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java fails with -Xcomp -XX:TieredStopAtLevel=1 In-Reply-To: <37bb4585-21b5-a6e7-9ee1-88ccf9be0914@oracle.com> References: <2abe9fba-e958-4b34-9f92-6bb8d8478f4e@default> <70057c31-e535-f03a-391d-d181b2ec150b@oracle.com> <958fecdf-d7a1-4b22-835e-a75fadda0a84@default> <37bb4585-21b5-a6e7-9ee1-88ccf9be0914@oracle.com> Message-ID: Thanks Vladimir for the review. Thanks for mentioning the reasons for MDO's not being generated, I have added them as comment in bug for future reference. 
Thanks, Fairoz > -----Original Message----- > From: Vladimir Kozlov > Sent: Saturday, July 11, 2020 5:36 AM > To: Fairoz Matte ; Chris Plummer > ; hotspot-compiler-dev at openjdk.java.net; > serviceability-dev at openjdk.java.net > Subject: Re: RFR(s): 8236042: [TESTBUG] serviceability/sa/ClhsdbCDSCore.java > fails with -Xcomp -XX:TieredStopAtLevel=1 > > Fix is good. > > I think next are reasons you don't get MDO in this scenario. > > Tier1 (C1 compilation) does not generate profiling code and does not created > MDO. C1 request MDO only with tiers 2 and 3 [1][2]. > > With -Xcomp flag a Java method is not executed in Interpreter but requests its > compilation and waits when it is finished. As result MDO is not created in > Interpreter too. May be late if a method is deoptimized it will be executed in > Interpreter and MDO will be created. > > Thanks, > Vladimir > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/796c9fa50850/src/hotspot/share/c1/c1_ > Compilation.hpp#l226 > [2] > http://hg.openjdk.java.net/jdk/jdk/file/796c9fa50850/src/hotspot/share/c1/c1_ > Compilation.cpp#l381 > > On 7/7/20 8:47 PM, Fairoz Matte wrote: > > Thanks Chris, for the review comments. > > > > I have updated the suggested change. > > > > Thanks, > > Fairoz > > > >> -----Original Message----- > >> From: Chris Plummer > >> Sent: Wednesday, July 8, 2020 3:38 AM > >> To: Fairoz Matte ; hotspot-compiler- > >> dev at openjdk.java.net; serviceability-dev at openjdk.java.net > >> Subject: Re: RFR(s): 8236042: [TESTBUG] > >> serviceability/sa/ClhsdbCDSCore.java > >> fails with -Xcomp -XX:TieredStopAtLevel=1 > >> > >> Hi Fairoz, > >> > >> Looks good, except for the missing space in "if(testJavaOpts...". 
> >> > >> thanks, > >> > >> Chris > >> > >> On 7/7/20 7:49 AM, Fairoz Matte wrote: > >>> Hi, > >>> > >>> Please review this small test change to consider the scenario when > >>> there is no > >> "printmdo" output > >>> > >>> JBS - https://bugs.openjdk.java.net/browse/JDK-8236042 > >>> Webrev - http://cr.openjdk.java.net/~fmatte/8236042/webrev.00/ > >>> > >>> Thanks, > >>> Fairoz > >> From aph at redhat.com Sat Jul 11 08:54:07 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 11 Jul 2020 09:54:07 +0100 Subject: 8248830 : RFR[S] : C2 : Rotate API intrinsification for X86 In-Reply-To: <9ad508ae-bf73-d4d7-ff2b-d1f4280adeff@oracle.com> References: <9ad508ae-bf73-d4d7-ff2b-d1f4280adeff@oracle.com> Message-ID: On 10/07/2020 18:32, Vladimir Ivanov wrote: > High-level comment: so far, there were no pressing need in > explicitly marking the methods as intrinsics. ROR/ROL instructions > were selected during matching [1]. Now the patch introduces > dedicated nodes (RotateLeft/RotateRight) specifically for intrinsics > which partly duplicates existing logic. The lack of rotate nodes in the IR has always meant that AArch64 doesn't generate optimal code for e.g. (Set dst (XorL reg1 (RotateLeftL reg2 imm))) because, with the RotateLeft expanded to its full combination of ORs and shifts, it's to complicated to match. At the time I put this to one side because it wasn't urgent. This is a shame because although such combinations are unusual they are used in some crypto operations. If we can generate immediate-form rotate nodes early by pattern matching during parsing (rather than depending on intrinsics) we'll get more value than by depending on programmers calling intrinsics. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
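To make the two source-level forms in this discussion concrete, the sketch below puts the hand-written shift/or idiom next to the dedicated API method, plus an XOR-with-rotate expression of the kind mentioned above. The class and method names and the rotate distance 13 are chosen only for illustration; whether a given JIT maps each form to a rotate instruction depends on the platform and compiler version and is not something this example demonstrates.

```java
final class RotateForms {
    // Hand-written idiom: an OR of two shifts. Java masks int shift counts
    // to 5 bits, so ">>> -s" shifts right by (32 - s) mod 32.
    static int rotlIdiom(int x, int s) {
        return (x << s) | (x >>> -s);
    }

    // Dedicated API method with the same result.
    static int rotlApi(int x, int s) {
        return Integer.rotateLeft(x, s);
    }

    // Source-level counterpart of an XOR combined with an immediate rotate.
    static long xorRotl(long a, long b) {
        return a ^ Long.rotateLeft(b, 13);
    }
}
```

Both int variants return the same value for every input, which is why matching the idiom early would benefit code that never calls the API methods.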
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From david.holmes at oracle.com Mon Jul 13 04:07:35 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 13 Jul 2020 14:07:35 +1000 Subject: RFR(XS) 8248671: AArch64: Remove unused variables In-Reply-To: References: Message-ID: <1c652b56-2476-ede0-47f8-13c4e99639d0@oracle.com> Hi Bernhard, On 10/07/2020 7:08 am, Bernhard Urban-Forster wrote: > Hello everyone, > > > please review this change: > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248671 > Webrev: http://cr.openjdk.java.net/~burban/8248671_unused-vars/ > > We found this issue while bringing up Windows+AArch64 support for HotSpot. The Microsoft toolchain (MSVC) seems to be slightly more pedantic than GCC. Looks good and trivial. But could I request that webrevs/patches for mainline be generated against the mainline hg repository rather than the git mirror. Thanks, David > > Thanks, > -Bernhard > From jamsheed.c.m at oracle.com Mon Jul 13 05:44:02 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Mon, 13 Jul 2020 11:14:02 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> Message-ID: <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> Hi, I reworked the fix. I compute offset for all init captures stores, but treats this special init captured stores similar to unsafe(as these objects are usually GlobalEscape and doesn't have any perf implications). revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ testing: mach1-5( logs in jbs) Best regards, Jamsheed On 09/07/2020 19:36, Jamsheed C M wrote: > > Hi, > > request to hold the review. need to change the code for dealing with > unsafe access. as current capture code go for more execution time > analyzing things. 
>
> Best regards,
>
> Jamsheed
>
> On 09/07/2020 13:01, Jamsheed C M wrote:
>>
>> Hi all,
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8242895
>>
>> Request for review of changes made to offset computation and field write
>> detection for init captured stores due to phi addition between alloc
>> and init. This happens if the init node is in a different outer loop wrt
>> the alloc node and there is a loop opt. This was required as a result of
>> enhancement [1].
>>
>> Normally inits are not associated with multiple alloc nodes during the EA
>> phase, but changes done for [1] caused code shapes of the form
>> [2] to generate inits associated with multiple alloc nodes.
>>
>> This had implications for offset computation and field write detection
>> related to initializing stores.
>>
>> Attempt to fix in EA:
>>
>>     webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/
>>
>> Alternate fix:
>>
>>     Minimize the scenario in compiler generated code by throwing
>> only j.l.Error from slowpath (all exceptions async/sync are handled in
>> runtime exit).
>>
>>     Stub epilog doesn't poll or throw any exceptions. Disable full
>> loop opt before EA for detectable patterns and bail out of EA for late
>> detected patterns.
>>
>>     webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/
>>
>> Please advise.
>>
>> Testing : mach tier1-5 (logs in jbs)
>>
>> Best regards,
>>
>> Jamsheed
>>
>>
>> [1] JDK-8231291 C2: loop opts before EA should maximally unroll loops
>>
>> [2] that have their init node in a different outer loop wrt the alloc node.
>>
>>
>> loop begin
>>
>>   try{
>>
>>   return new obj() / throw new obj() / uncommon trap after
>> allocation, in a loop
>>
>>   } catch(ex) {
>>
>>   }
>>
>> loop end
>>
>> public static IntA test(int n) {
>>     for (int i=0; i<2; i++) {
>>         try {
>>             return new IntA(n + i);
>>         } catch (Exception e) {
>>         }
>>     }
>>
>>

From richard.reingruber at sap.com Mon Jul 13 06:42:13 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Mon, 13 Jul 2020 06:42:13 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com>
Message-ID:

Hi Goetz,

thanks for looking at this! And my apologies for taking that long...

So here is the new webrev.6

Webrev.6: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/
Delta: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.inc/

I spent most of the time running a microbenchmark [1] I wrote to answer questions from your
review. At first I had trouble with variance in the results until I found out it was due to the NUMA
architecture of the server I used. After that I noticed that there was a performance regression of
about 5% even at low agent activity. I finally found out that it was due to the implementation of
JavaThread::wait_for_object_deoptimization() which is called by the target of the JVMTI operation to
self suspend for object deoptimization. I fixed this by adding limited spinning before calling
wait() on the monitor.

The delta includes many changes in comments, renaming of names, etc. So I'd like to summarize
functional changes:

* Collected all the code for the testing feature DeoptimizeObjectsALot in compileBroker.cpp and reworked it.
  With DeoptimizeObjectsALot enabled internal threads are started that deoptimize frames and
  objects. The number of threads started is given by DeoptimizeObjectsALotThreadCountAll and
  DeoptimizeObjectsALotThreadCountSingle.
The former targets all existing threads whereas the latter operates on a single thread selected round robin. I removed the mode where deoptimizations were performed at every nth exit from the runtime. I never used it. * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and execute it always independently of is_thread_fully_suspended(). * Bugfix in EscapeBarrier::thread_added(): must not clear deopt flag. Found this testing with DeoptimizeObjectsALot. * Added EscapeBarrier::thread_removed(). * EscapeBarrier constructors: barriers can now be entirely disabled by disabling DoEscapeAnalysis. This effectively disables the enhancement. * JavaThread::wait_for_object_deoptimization(): - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the safepoint check! This caused issues with not walkable stacks with DeoptimizeObjectsALot. - Added limited spinning inspired by HandshakeSpinYield to fix regression in microbenchmark [1] I refer to some more changes answering your questions and comments inline below. Thanks, Richard. [1] Microbenchmark: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/ > Hi Richard, > > I had a look at your change. It's complex, but not that big. > A lot of code is just passing info through layers of abstraction. Also it leverages preexisting functionality like materialization of virtual objects in non-top frames (see materializeVirtualObjects). > Also, one can tell this went through some iterations by now, > I think it's very well engineered. > I had a look at webrev.05 > > Unfortunately > "8242425: JVMTI monitor operations should use Thread-Local Handshakes" > breaks webrev.05. > I updated to before that change and took that as base of my review. > > I see four parts of the change that can be looked at > rather individually. > > * Refactoring the scopeDesc constructors. Trivial. > * Persisting information about the optimizations done by the compilers. > Large and mostly trivial. > * Deoptimizing. 
The most complicated part. Really well abstracted, though. > * DeoptimizeObjectsALot for testing and the tests. > > Review of compiler changes: > > I understand you annotate at safepoints where the escape analysis > finds out that an object is "better" than global escape. > This are the cases where the analysis identifies optimization > opportunities. These annotations are then used to deoptimize > frames and the objects referenced by them. > Doesn't this overestimate the optimized > objects? E.g., eliminate_alloc_node has many cases where it bails > out. Yes, the implementation is conservative, but it is comparatively simple and the additional debug info is just 2 flags per safepoint. On the other hand, those JVMTI operations that really trigger deoptimizations are expected to be comparatively infrequent such that switching to the interpreter for a few microseconds will hardly have an effect. I've done microbenchmarking to check this. http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/ I found that in the worst case performance can be impacted by 10%. If the agent is extremely active and does relevant JVMTI calls like GetOwnedMonitorStackDepthInfo() every millisecond or more often, then the performance impact can be 30%. But I would think that this is not realistic. These calls are issued in interactive sessions to analyze deadlocks. We could get more precise deoptimizations by adding a third flag per safepoint for ea-local objects among the owned monitors. This would help improve the worst case in the benchmark. But I'm not convinced, if it is worth it. Refer to the README.txt of the microbenchmark for a more detailled discussion. > c1_IR.hpp > > OK, nothing to do for C1, just adapt to extended method signature. > > Break line once more so that it matches above line length. Done. > ciEnv.h|cpp > > Pass through another jvmti capability. Trivial & good. > > > debugInfoRec.hpp > > Pass through escape info that must be recorded. OK. 
> > pcDesc.hpp > > I would like to see some documentation of the methods. > > Maybe: > // There is an object in the scope that does not escape globally. > // It either does not escape at all or it escapes as arguemnt. > and > // One of the arguments is an object that is not globally visible > // but escapes to the callee. Done. I didn't take your text, though, because I only noticed it after writing my own. Let me know if you are not ok with it. > scopeDesc.cpp > > Besides refactoring copy escape info from pcDesc to scopeDesc > and add accessors. Trivial. > > In scopeDesc.hpp you talk about NoEscape and ArgEscape. > This are opto terms, but scopeDesc is a shared datastructure > that does not depend on a specific compiler. > Please explain what is going on without using these terms. Actually these are not too opto specific terms. They are used in the paper referenced in escape.hpp. Also you can easily google them. I'd rather keep the comments as they are. > jvmciCodeInstaller.cpp > > OK, nothing for JVMCI. Here support for Object Optimizations > for JVMCI compilers could be added. Leave this to graal people. > > callnode.hpp > > You add functionality to annotate callnodes with escape information > This is carried through code generation to final output where it is > added to the compiled methods meta information. > > At Safepoints in general jvmti can access > - Objects that were scalar replaced. They must be reallocated. > (Flag EliminateAllocations) > - Objects that should be locked but are not because they never > escape the thread. They need to be relocked. > > At calls, Objects where locks have been removed escape to callees. > We must persist this information so that if jvmti accesses the > object in a callee, we can determine by looking at the caller that > it needs to be relocked. Note that the ea-optimization must not be at the current location, it can also follow when control returns to the caller. Lock elimination isn't the only relevant optimization. 
Accesses to instance members or array elements can be optimized as well. > A side comment: > I think the flage handling in Opto is not very intuitive. > DoEscapeAnalysis depends on the jvmti capabilities. > This makes no sense. It is only an analysis. The optimizations > should depend on the jvmti capabilities. > The correct setup would be to handle this in > CompilerConfig::ergo_initialize(): > If the jvmti capabilities allow, enable the optimizations > EliminateAllocations or EliminateLocks/EliminateNestedLocks. > If one of these optimizations is on, enable EscapeAnalysis. > -- end side comment. > > So I would propose the following comments: > > // In the scope of this safepoints there are objects > // that do not globally escape. They are either NoEscape or > // ArgEscape. As such, they might be subject to optimizations. > // Persist this information here so that the frame an the > // Objects in scope can > // be deoptimized if jvmti accesses an object at this safepoint. > void set_not_global_escape_in_scope(bool b) { > > // This call passes objects that do not globally escape > // to its callee. The object might be subject to optimization, > // e.g. a lock might be omitted. Persist this information here > // so that on a jvmti access to the callee frame we can deoptimize > // the object and this frame. > void set_arg_escape(bool f) { _arg_escape = f; } I do not really like these comments. They are too verbose and do not match the comment style of the surrounding code. The names are descriptive enough IMO. Also the measures taken depending on the flags should be commented at the locations, where the flags are read. > Actuall I am not sure whether the name of these fields (and all > the others in the course of this change) should refer to > escape analysis. I think the term "Object deoptimization" > you also use is much better. You could call these properties > (througout the whole change) > set_optimized_objects_in_scope() > and > set_passes_optimized_objects(). 
> > I think this would make the whole matter much easier > to understand. I'd prefer the current names. They are closer to established terminology. And it is actually unknown, if optimizations based on their escape state exist. > Anyways, locks can already be removed without running > escape analysis at all. C2 recognizes some local patterns > that allow this. > > escape.h|cpp > > The code looks good. > > Line 325: The comment could be a bit more elaborate: > // Annotate at safepoints if they have <= ArgEscape objects in their > // scope. Additionally, if the safepoint is a java call, annotate > // whether it passes ArgEscape objects as parameters. > > And maybe add these comments?: > > // Returns true if an oop in the scope of sfn does not escape > // globally. > bool ConnectionGraph::has_not_global_escape_in_scope(SafePointNode* sfn) { > > // Returns true if at least one of the arguments to the call is an oop > // that does not escape globally. > bool ConnectionGraph::has_arg_escape(CallJavaNode* call) { IMHO the method names are descriptive and don't need the comments. But I give in :) (only replaced "oop" with "object") > General question: > You collect the information you want to annotate to the > method during escape analysis. > Don't you overestimate the optimized objects by this? > E.g. elimination of allocations does bail out for > various reasons. At the end, no optimization might > have happened, but then during runtime the frame is > deoptimized nevertheless. Please see statements and worst case microbenchmark above. > machnode.hpp: > > Extends MachSafePointNode similar to the ideal version. Good. > > matcher.cpp > > Copy info from ideal to mach node. good. > > output.cpp > > Now finally the information is written to the > debug info. Good. 
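As a reminder of what the NoEscape, ArgEscape and GlobalEscape states used throughout this review mean at the source level, here is a small hedged Java sketch. All names (`EscapeStates`, `Point`, `published`) are invented for illustration. The comments describe how escape analysis would typically classify each allocation; whether C2 actually eliminates an allocation or a lock in a particular compilation depends on inlining and other heuristics not shown here.

```java
// Illustrative only: all names are hypothetical and not part of the webrev.
class EscapeStates {
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Anything stored here becomes reachable by other threads.
    static Point published;

    // NoEscape: p is neither stored in the heap nor passed on, so it is a
    // candidate for scalar replacement, and the lock for elimination.
    static int noEscape(int a, int b) {
        Point p = new Point(a, b);
        synchronized (p) {
            return p.x + p.y;
        }
    }

    // ArgEscape: p is handed to a callee but not stored globally. If sum() is
    // not inlined, the object must exist, yet it stays thread-local.
    static int argEscape(int a, int b) {
        Point p = new Point(a, b);
        return sum(p);
    }

    static int sum(Point p) {
        synchronized (p) {
            return p.x + p.y;
        }
    }

    // GlobalEscape: p is published through a static field.
    static int globalEscape(int a, int b) {
        Point p = new Point(a, b);
        published = p;
        return p.x + p.y;
    }

    public static void main(String[] args) {
        System.out.println(noEscape(1, 2) + argEscape(3, 4) + globalEscape(5, 6));
    }
}
```

In terms of the two per-safepoint flags discussed above, a safepoint inside `noEscape` or `argEscape` would have an object in scope that does not escape globally, and the call to `sum` in `argEscape` would be the kind of call that passes such an object to its callee.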
> > --------------------------------------------------------- > > So now let's have a look at the runtime part (including > relaxing constraints to escape analysis): > > rootResolver.cpp > > Adapt to changed interface. good. > > c2compiler.cpp / macro.cpp > > Make EscpaeAnlysis independent of jvmti capabilities. Good. > > jvmtiEnv.cpp/jvmtiEnvBase.cpp > > You add deoptimization of objects where they are > accessed. good. > > jvmtiImpl.cpp > > In deoptimize_objects, you check for DoEscapeAnalysis. > This is correct given the current design of the flag > handling in the compiler. > It's not really nice to have a dependency to C2 here, > though. I understand it's an optimization, the code > could be run anyways, it would check but not find > anything. But actually I would excpect dependencies > on EliminateLocks and EliminateAllocations (if they > were set according to jvmti capabilitiers as I elaborated > above.) > Would it make sense to protect the ArgEscape > loop by if (EliminateLocks)? You are right, it is not correct how flags are checked. Especially if only running with the JVMCI compiler. I changed Deoptimization::deoptimize_objects_internal() to make reallocation and relocking dependent on similar checks as in Deoptimization::fetch_unroll_info_helper(). Furthermore EscapeBarriers are conditionally activated depending on the following (see EscapeBarrier ctors): JVMCI_ONLY(UseJVMCICompiler) NOT_JVMCI(false) COMPILER2_PRESENT(|| DoEscapeAnalysis) So the enhancement can be practically completely disabled by disabling DoEscapeAnalysis, which is what C2 currently does if JVMTI capabilities that allow access to local references are taken. > jvmtiTagMap.cpp > > Deoptimize for jvmti operations. Good. > > deoptimization.cpp > > I guess this is the core of your work. > > > You add a new mode that just deoptimizes objects but not frames. > Good idea. 
You have to use reallocated objects in upper frames, > or by jvmti accesses to inner frames, which can not easily be > replaced by interpreter frames. > This way you can wait with replacing the frame until just before > execution returns. > > eliminate_allocations(): > (Strange method name, should at least be in past tense, even > better reallocate_eliminated_allocations() or > allocate_scalarized_objects(). Confused me until > I groked the code. Legacy though, not your business.) I still don't grok the name... ;) but it's preexisting as you noted > It's not that nice to return whether you only deoptimized > objects by the boolean reference argument. After all, > it again depends on the mode you pass in. > A different design would be to clone the method and > have an eliminate_allocations_no_unpack() variant, but that would > not be better as some code would be duplicated. > Maybe a comment for argument eliminate_allocations: > // deoptimized_objects is set to true if objects were deoptimized > // but not the frame. It is unchanged if there are no objects to > // be deoptimized, or if the frame was deoptim I agree: duplicating the code would be really bad, but I don't think that having reference parameters is not nice. I think it is a common pattern, if you return an error code and additional result data. The variable is a minor detail. With the meaningful name it is not necessary to document it. In my eyes it should be set independently of the exec_mode. I didn't do it to make the change smaller. > Similar for eliminate_locks(): > // deoptimized_objects is set to true if objects were relocked, > // else it is left unchanged. > > You reuse and extend the existing realloc/relock_objects, but extended it. > > deoptimize_objects_internal() > > Simple version of fetch_unroll_info_helper for EscapeBarrier. > Good. > I attributed the comment "Then relock objects if synchronization on them was eliminated." > to the if() just below. 
Add an empty line to make clear the comment > refers to the next 10 lines. > Alternatively, replace the whole comment by > // At first, reallocate the non-escaping objects and restore their fields > // so they are available for relocking. > And add > // Now relock objects with eliminated locks. > befor the if ((DoEscape... below. I went for the latter. > In fetch_unroll_info_helper, I don't understand why you need > && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) { > for eliminated locks, but not for skalar replaced objects? In short reallocation is idempotent, relocking is not. Without the enhancement Deoptimization::realloc_objects() can already be called more than once for a frame: First call in materializeVirtualObjects() (also iterateFrames()). Second (indirect) call in fetch_unroll_info_helper(). The objects from the first call are saved as jvmti deferred updates when realloc_objects() returns. Note that there is no relationship to jvmti. The thing in common is that updates cannot be directely installed into a compiled frame, it is necessary to deoptimize the frame and defer the updates until the compiled frame gets replaced. Every time the vframes corresponding to the owner frame are iterated, they get the deferred updates. So in fetch_unroll_info_helper() the GrowableArray* chunk reference them too. All references to the objects created by the second (indirect) call to realloc_objects() are never used, because compiledVFrame accessors to locals, expressions, and monitors override them with the deferred updates. The objects become unreachable and get gc'ed. materializeVirtualObjects() does not bother with relocking. deoptimize_objects_internal(), which is introduced by the enhancement, does relock objects, after all the lock elimination becomes illegal with the change in escape state. Relocking twice does not work, so the enhancement avoids it by checking EscapeBarrier::objs_are_deoptimized(thread, deoptee.id()). 
Note that materializeVirtualObjects() can be called more than once and will always return the very same objects, even though it calls realloc_objects() again. > I would guess it is because the eliminated locks can be applied to > argEscape, but scalar replacement only to noescape objects? > I.e. it might have been done before? > > But why isn't this the case for eliminate_allocations? > deoptimize_objects_internal does both unconditionally, > so both can happen to inner frames, right? Sorry, I don't quite understand. Hope the explanation above helps. > relock_objects() > > Ok, you need to undo biased locking. Also, you remember the > lock nesting for later relocking if waiting for lock. > > revoke_for_object_deoptimization() > I like if boolean operators are at the beginning of broken lines, > but I think hotspot convention is to have them at the end. Ok, fixed. > Code will get much more simple if BiasedLocking is removed. > > EscapeBarrier:: ... > > (This class maybe would qualify for a file of its own.) > > deoptimize_objects() > I would mention escape analysis only as side remark. Also, as I understand, > there is only one frame at given depth? > // Deoptimize frames with optimized objects. This can be omitted locks and > // objects not allocated but replaced by scalars. In C2, these optimizations > // are based on escape analysis. > // Up to depth, deoptimize frames with any optimized objects. > // From depth to entry_frame, deoptimize only frames that > // pass optimized objects to their callees. > (First part similar for the comment above EscapeBarrier::deoptimize_objects_internal().) I've reworked the comment. Let me know if you still think it needs to be improved. > > What is the check (cur_depth <= depth) good for? Can you > ever walk past entry_frame? Yes (assuming you mean the outer while-statement), there are java frames beyond the entry frame if a native method calls java methods again. 
So we visit all frames up to the given depth and from there we
continue to the entry frame. It is not necessary to continue beyond that entry frame, because
escape analysis assumes that arguments to native functions escape globally.

Example: Let the java stack look like this:

  +---------+
  | Frame A |
  +---------+
  | Frame N |
  +---------+
  | Frame B |
  +---------+ <- top of stack

Where java method A calls native method N and N calls java method B.

Very simplified the native stack will look like this

  +-------------------------+
  | Frame of JIT Compiled A |
  +-------------------------+
  | Frame N                 |
  +-------------------------+
  | Entry Frame             |
  +-------------------------+
  | Frame B                 |
  +-------------------------+ <- top of stack

The entry frame is an activation of the call stub, which is a small assembler routine that
translates from the native calling convention to the java calling convention.

There cannot be any ArgEscape that is passed to B (see above), therefore we can stop the stackwalk
at the entry frame if depth is 1. If depth is 3 we have to continue to Frame A, as it is directly
accessed.

> Isn't vf->is_compiled_frame() prerequisite that "Move to next physical frame"
> is needed? You could move it into the other check.
> If so, similar for deoptimize_objects_all_threads().

Only compiledVFrame require moving to the /top/ frame. Fixed.

> Synchronization: looks good. I think others had a look at this before.
>
> EscapeBarrier::deoptimize_objects_internal()
> The method name is misleading, it is not used by
> deoptimize_objects().
> Also, method with the same name is in Deoptimization.
> Proposal: deoptimize_objects_thread() ?

Sorry, but I don't see why it would be misleading.
What would be the meaning of 'deoptimize_objects_thread'? I don't understand that name.

> C1 stubs: this really shows you tested all configurations, great!
>
>
> mutexLocker: ok.
> objectMonitor.cpp: ok
> stackValue.hpp Is this missing clearing a bug?

In short: that change is not needed anymore.
I'll remove it again. Details: it is not a real bug, but the assertion in vframeArrayElement::fill_in() was triggered: assert(!value->obj_is_scalar_replaced() || realloc_failures) failed: object should be reallocated already. But only with the first version of the enhancement (webrev.0), were objects were only reallocated when replacing a compiled frame with equivalent interpreter frames iff virtual objects where not reallocated before. I changed this after prexisting code was refactored (JDK-8226705), because practically never already reallocated objects exist and if there should be any, it does not harm to reallocate again, because the unnecessarily allocated objects become immediately garbage and last but not least no tricky synchronization is required. Also that's what happens with the preexisting code if virtual objects are materialized with materializeVirtualObjects(). > > thread.hpp > > I would remove "_ea" from the flag and method names. Done. > > Renaming deferred_locals to deferred_updates is good, as well as > adding a datastructure for it. > (Adding this data structure might be a breakout, too.) > > good. > > thread.cpp > > good. > > vframe.cpp > > Is this a bug in existing code? > Makes sense. Depends on your definition of bug. There are no references to vframe::is_entry_frame() in the existing code. I would think it is a bug. > > vframe_hp.hpp > (What stands _hp for? helper? The file should be named compiledVFrame ...) > > not_global_escape_in_scope() ... > Again, you mention escape analysis here. Comments above hold, too. I think it is the right name, because it is meaningful and simple. > You introduce JvmtiDeferredUpdates. Good. > > vframe_hp.cpp > > Changes for JvmtiDeferredUpdates, escape state accessors, > > line 422: > Would an assertion assert(!info->owner_is_scalar_replaced(), ...) hold here? > > > macros.hpp > Good. 
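The stack walk described earlier in this reply (visit every frame up to the requested depth, then continue only as far as the next entry frame) can be sketched in Java as follows. This is a deliberately simplified model under stated assumptions: `StackWalkSketch`, `Frame`, `Kind` and `framesToDeoptimize` are hypothetical names, the two boolean fields stand in for the per-safepoint flags, and entry frames are assumed not to count toward the depth. It is not the code from the webrev.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, hypothetical model; none of these types exist in HotSpot.
class StackWalkSketch {
    enum Kind { COMPILED, INTERPRETED, NATIVE, ENTRY }

    static final class Frame {
        final String name;
        final Kind kind;
        final boolean notGlobalEscapeInScope; // an object in scope is not GlobalEscape
        final boolean argEscape;              // such an object is passed to the callee

        Frame(String name, Kind kind, boolean notGlobalEscapeInScope, boolean argEscape) {
            this.name = name;
            this.kind = kind;
            this.notGlobalEscapeInScope = notGlobalEscapeInScope;
            this.argEscape = argEscape;
        }
    }

    // stack.get(0) is the top of the stack. Returns the names of the frames
    // that this model would deoptimize for an access at the given depth.
    static List<String> framesToDeoptimize(List<Frame> stack, int depth) {
        List<String> result = new ArrayList<>();
        int curDepth = 0;
        for (Frame f : stack) {
            if (f.kind == Kind.ENTRY) {
                if (curDepth >= depth) {
                    break;    // arguments of native callers escape globally: stop here
                }
                continue;     // assumption: entry frames do not count toward depth
            }
            curDepth++;
            if (f.kind != Kind.COMPILED) {
                continue;     // only compiled frames can hold optimized objects
            }
            // Within depth: any object that is not GlobalEscape matters.
            // Beyond depth: only objects passed down to callees matter.
            boolean shouldDeopt = curDepth <= depth ? f.notGlobalEscapeInScope : f.argEscape;
            if (shouldDeopt) {
                result.add(f.name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The example from the mail: A calls native N, N calls B (top of stack).
        List<Frame> stack = Arrays.asList(
            new Frame("B", Kind.COMPILED, true, false),
            new Frame("Entry", Kind.ENTRY, false, false),
            new Frame("N", Kind.NATIVE, false, false),
            new Frame("A", Kind.COMPILED, true, false));
        System.out.println(framesToDeoptimize(stack, 1));
        System.out.println(framesToDeoptimize(stack, 3));
    }
}
```

With depth 1 the walk in this model stops at the entry frame below B; with depth 3 it continues past the entry frame and reaches A, matching the example given in the mail.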
>
>
> Test coding
> ============
>
> compileBroker.h|cpp
>
> You introduce a third class of threads handled here and
> add a new flag to distinguish it. Before, the two kinds
> of threads were distinguished implicitly by passing in
> a compiler for compiler threads.
> The new thread kind is only used for testing in debug.
>
> make_thread:
> You could assert (comp != NULL...) to assure previous
> conditions.

I replaced the if-statements with a switch-statement, made sure all enum-elements are covered, and
added the assertion you suggested.

> line 989 indentation broken

You are referring to this block I assume:
(from http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/src/hotspot/share/compiler/compileBroker.cpp.frames.html)

 976   if (MethodFlushing) {
 977     // Initialize the sweeper thread
 978     Handle thread_oop = create_thread_oop("Sweeper thread", CHECK);
 979     jobject thread_handle = JNIHandles::make_local(THREAD, thread_oop());
 980     make_thread(sweeper_t, thread_handle, NULL, NULL, THREAD);
 981   }
 982
 983 #if defined(ASSERT) && COMPILER2_OR_JVMCI
 984   if (DeoptimizeObjectsALot == 2) {
 985     // Initialize and start the object deoptimizer threads
 986     for (int thread_count = 0; thread_count < DeoptimizeObjectsALotThreadCount; thread_count++) {
 987       Handle thread_oop = create_thread_oop("Deoptimize objects a lot thread", CHECK);
 988       jobject thread_handle = JNIHandles::make_local(THREAD, thread_oop());
 989       make_thread(deoptimizer_t, thread_handle, NULL, NULL, THREAD);
 990     }
 991   }
 992 #endif // defined(ASSERT) && COMPILER2_OR_JVMCI

I cannot really see broken indentation here. Am I looking at the wrong location?

> escape.cpp
>
> You enable the optimization in case of testruns. good.
>
> whitebox.cpp ok.
>
> deoptimization.cpp
>
> deoptimize_objects_alot_loop() Good.
>
> globals.hpp
>
> Nice docu of flags, but please mention "for testing purposes"
> or the like in DeoptimizeObjectsALot.
> I would place the flags next to each other.
>
> interfaceSupport.cpp: good.

Thanks!
:) -----Original Message----- From: Lindenmaier, Goetz Sent: Mittwoch, 6. Mai 2020 12:28 To: Reingruber, Richard ; Doerr, Martin ; 'Robbin Ehn' ; David Holmes ; Vladimir Kozlov (vladimir.kozlov at oracle.com) ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, I had a look at your change. It's complex, but not that big. A lot of code is just passing info through layers of abstraction. Also, one can tell this went through some iterations by now, I think it's very well engineered. I had a look at webrev.05 Unfortunately "8242425: JVMTI monitor operations should use Thread-Local Handshakes" breaks webrev.05. I updated to before that change and took that as base of my review. I see four parts of the change that can be looked at rather individually. * Refactoring the scopeDesc constructors. Trivial. * Persisting information about the optimizations done by the compilers. Large and mostly trivial. * Deoptimizing. The most complicated part. Really well abstracted, though. * DeoptimizeObjectsALot for testing and the tests. Review of compiler changes: I understand you annotate at safepoints where the escape analysis finds out that an object is "better" than global escape. This are the cases where the analysis identifies optimization opportunities. These annotations are then used to deoptimize frames and the objects referenced by them. Doesn't this overestimate the optimized objects? E.g., eliminate_alloc_node has many cases where it bails out. c1_IR.hpp OK, nothing to do for C1, just adapt to extended method signature. Break line once more so that it matches above line length. ciEnv.h|cpp Pass through another jvmti capability. Trivial & good. debugInfoRec.hpp Pass through escape info that must be recorded. OK. pcDesc.hpp I would like to see some documentation of the methods. 
Maybe: // There is an object in the scope that does not escape globally. // It either does not escape at all or it escapes as arguemnt. and // One of the arguments is an object that is not globally visible // but escapes to the callee. scopeDesc.cpp Besides refactoring copy escape info from pcDesc to scopeDesc and add accessors. Trivial. In scopeDesc.hpp you talk about NoEscape and ArgEscape. This are opto terms, but scopeDesc is a shared datastructure that does not depend on a specific compiler. Please explain what is going on without using these terms. jvmciCodeInstaller.cpp OK, nothing for JVMCI. Here support for Object Optimizations for JVMCI compilers could be added. Leave this to graal people. callnode.hpp You add functionality to annotate callnodes with escape information This is carried through code generation to final output where it is added to the compiled methods meta information. At Safepoints in general jvmti can access - Objects that were scalar replaced. They must be reallocated. (Flag EliminateAllocations) - Objects that should be locked but are not because they never escape the thread. They need to be relocked. At calls, Objects where locks have been removed escape to callees. We must persist this information so that if jvmti accesses the object in a callee, we can determine by looking at the caller that it needs to be relocked. A side comment: I think the flage handling in Opto is not very intuitive. DoEscapeAnalysis depends on the jvmti capabilities. This makes no sense. It is only an analysis. The optimizations should depend on the jvmti capabilities. The correct setup would be to handle this in CompilerConfig::ergo_initialize(): If the jvmti capabilities allow, enable the optimizations EliminateAllocations or EliminateLocks/EliminateNestedLocks. If one of these optimizations is on, enable EscapeAnalysis. -- end side comment. 
So I would propose the following comments: // In the scope of this safepoints there are objects // that do not globally escape. They are either NoEscape or // ArgEscape. As such, they might be subject to optimizations. // Persist this information here so that the frame an the // Objects in scope can // be deoptimized if jvmti accesses an object at this safepoint. void set_not_global_escape_in_scope(bool b) { // This call passes objects that do not globally escape // to its callee. The object might be subject to optimization, // e.g. a lock might be omitted. Persist this information here // so that on a jvmti access to the callee frame we can deoptimize // the object and this frame. void set_arg_escape(bool f) { _arg_escape = f; } Actuall I am not sure whether the name of these fields (and all the others in the course of this change) should refer to escape analysis. I think the term "Object deoptimization" you also use is much better. You could call these properties (througout the whole change) set_optimized_objects_in_scope() and set_passes_optimized_objects(). I think this would make the whole matter much easier to understand. Anyways, locks can already be removed without running escape analysis at all. C2 recognizes some local patterns that allow this. escape.h|cpp The code looks good. Line 325: The comment could be a bit more elaborate: // Annotate at safepoints if they have <= ArgEscape objects in their // scope. Additionally, if the safepoint is a java call, annotate // whether it passes ArgEscape objects as parameters. And maybe add these comments?: // Returns true if an oop in the scope of sfn does not escape // globally. bool ConnectionGraph::has_not_global_escape_in_scope(SafePointNode* sfn) { // Returns true if at least one of the arguments to the call is an oop // that does not escape globally. 
  bool ConnectionGraph::has_arg_escape(CallJavaNode* call) {

General question:
You collect the information you want to annotate to the method during escape analysis. Don't you overestimate the optimized objects by this? E.g., elimination of allocations does bail out for various reasons. In the end, no optimization might have happened, but at runtime the frame is deoptimized nevertheless.

machnode.hpp:
Extends MachSafePointNode similar to the ideal version. Good.

matcher.cpp
Copy info from ideal to mach node. Good.

output.cpp
Now finally the information is written to the debug info. Good.

---------------------------------------------------------

So now let's have a look at the runtime part (including relaxing constraints to escape analysis):

rootResolver.cpp
Adapt to changed interface. Good.

c2compiler.cpp / macro.cpp
Make EscapeAnalysis independent of jvmti capabilities. Good.

jvmtiEnv.cpp/jvmtiEnvBase.cpp
You add deoptimization of objects where they are accessed. Good.

jvmtiImpl.cpp
In deoptimize_objects, you check for DoEscapeAnalysis. This is correct given the current design of the flag handling in the compiler. It's not really nice to have a dependency on C2 here, though. I understand it's an optimization; the code could be run anyway, it would check but not find anything. But actually I would expect dependencies on EliminateLocks and EliminateAllocations (if they were set according to jvmti capabilities, as I elaborated above). Would it make sense to protect the ArgEscape loop by if (EliminateLocks)?

jvmtiTagMap.cpp
Deoptimize for jvmti operations. Good.

deoptimization.cpp
I guess this is the core of your work. You add a new mode that just deoptimizes objects but not frames. Good idea. You have to use reallocated objects in upper frames, or by jvmti accesses to inner frames, which cannot easily be replaced by interpreter frames. This way you can wait with replacing the frame until just before execution returns.

eliminate_allocations():
(Strange method name, should at least be in past tense, even better reallocate_eliminated_allocations() or allocate_scalarized_objects(). Confused me until I grokked the code. Legacy though, not your business.)
It's not that nice to return whether you only deoptimized objects via the boolean reference argument. After all, it again depends on the mode you pass in. A different design would be to clone the method and have an eliminate_allocations_no_unpack() variant, but that would not be better as some code would be duplicated.
Maybe a comment for the argument of eliminate_allocations():
  // deoptimized_objects is set to true if objects were deoptimized
  // but not the frame. It is unchanged if there are no objects to
  // be deoptimized, or if the frame was deoptimized.
Similar for eliminate_locks():
  // deoptimized_objects is set to true if objects were relocked,
  // else it is left unchanged.
You reuse and extend the existing realloc/relock_objects.

deoptimize_objects_internal()
Simple version of fetch_unroll_info_helper for EscapeBarrier. Good.

I attributed the comment "Then relock objects if synchronization on them was eliminated." to the if() just below. Add an empty line to make clear the comment refers to the next 10 lines.
Alternatively, replace the whole comment by
  // At first, reallocate the non-escaping objects and restore their fields
  // so they are available for relocking.
And add
  // Now relock objects with eliminated locks.
before the if ((DoEscape... below.

In fetch_unroll_info_helper, I don't understand why you need
  && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) {
for eliminated locks, but not for scalar replaced objects?
I would guess it is because the eliminated locks can be applied to ArgEscape, but scalar replacement only to NoEscape objects? I.e., it might have been done before?
But why isn't this the case for eliminate_allocations?
deoptimize_objects_internal does both unconditionally, so both can happen to inner frames, right?

relock_objects()
OK, you need to undo biased locking. Also, you remember the lock nesting for later relocking if waiting for the lock.

revoke_for_object_deoptimization()
I like it if boolean operators are at the beginning of broken lines, but I think the hotspot convention is to have them at the end.
The code will get much simpler if BiasedLocking is removed.

EscapeBarrier:: ... (This class maybe would qualify for a file of its own.)

deoptimize_objects()
I would mention escape analysis only as a side remark. Also, as I understand, there is only one frame at a given depth?
  // Deoptimize frames with optimized objects. This can be omitted locks and
  // objects not allocated but replaced by scalars. In C2, these optimizations
  // are based on escape analysis.
  // Up to depth, deoptimize frames with any optimized objects.
  // From depth to entry_frame, deoptimize only frames that
  // pass optimized objects to their callees.
(First part similar for the comment above EscapeBarrier::deoptimize_objects_internal().)

What is the check (cur_depth <= depth) good for? Can you ever walk past entry_frame?

Isn't vf->is_compiled_frame() a prerequisite for "Move to next physical frame" being needed? You could move it into the other check. If so, similar for deoptimize_objects_all_threads().

Synchronization: looks good. I think others had a look at this before.

EscapeBarrier::deoptimize_objects_internal()
The method name is misleading, it is not used by deoptimize_objects(). Also, a method with the same name is in Deoptimization. Proposal: deoptimize_objects_thread() ?

C1 stubs: this really shows you tested all configurations, great!

mutexLocker: ok.
objectMonitor.cpp: ok

stackValue.hpp
Is this missing clearing a bug?

thread.hpp
I would remove "_ea" from the flag and method names. Renaming deferred_locals to deferred_updates is good, as well as adding a data structure for it.
(Adding this data structure might be a breakout, too.) Good.

thread.cpp
Good.

vframe.cpp
Is this a bug in existing code? Makes sense.

vframe_hp.hpp
(What does _hp stand for? helper? The file should be named compiledVFrame ...)

not_global_escape_in_scope() ...
Again, you mention escape analysis here. Comments above hold, too.

You introduce JvmtiDeferredUpdates. Good.

vframe_hp.cpp
Changes for JvmtiDeferredUpdates, escape state accessors.
Line 422: Would an assertion assert(!info->owner_is_scalar_replaced(), ...) hold here?

macros.hpp
Good.

Test coding
============

compileBroker.h|cpp
You introduce a third class of threads handled here and add a new flag to distinguish it. Before, the two kinds of threads were distinguished implicitly by passing in a compiler for compiler threads. The new thread kind is only used for testing in debug builds.

make_thread:
You could assert (comp != NULL...) to ensure the previous conditions.

line 989: indentation broken

escape.cpp
You enable the optimization in case of test runs. Good.

whitebox.cpp
ok.

deoptimization.cpp
deoptimize_objects_alot_loop()
Good.

globals.hpp
Nice documentation of the flags, but please mention "for testing purposes" or the like in DeoptimizeObjectsALot. I would place the flags next to each other.

interfaceSupport.cpp:
Good.

I'll look at the tests themselves in an extra mail (learning from Martin).

Best regards,
  Goetz.

> -----Original Message-----
> From: Reingruber, Richard
> Sent: Wednesday, April 1, 2020 8:15 AM
> To: Doerr, Martin ; 'Robbin Ehn'
> ; Lindenmaier, Goetz
> ; David Holmes ;
> Vladimir Kozlov (vladimir.kozlov at oracle.com) ;
> serviceability-dev at openjdk.java.net; hotspot-compiler-
> dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
> Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance
> in the Presence of JVMTI Agents
>
> Hi Martin,
>
> > thanks for addressing all my points. I've looked over webrev.5 and I'm
> > satisfied with your changes.
>
> Thanks!
> > > I had also promised to review the tests. > > Thanks++ > I appreciate it very much, the tests are many lines of code. > > > test/jdk/com/sun/jdi/EATests.java > > This is a substantial amount of tests which is appropriate for a such a large > change. Skipping some subtests with UseJVMCICompiler makes sense > because it doesn't provide the necessary JVMTI functionality, yet. > > Nice work! > > I also like that you test with and without BiasedLocking. Your tests will still > be fine after BiasedLocking deprecation. > > Hope so :) > > > Very minor nits: > > - 2 typos in comment above EARelockingNestedInflatedTarget: "lockes are > ommitted" (sounds funny) > > - You sometimes write "graal" and sometimes "Graal". I guess the capital G > is better. (Also in EATestsJVMCI.java.) > > > test/jdk/com/sun/jdi/EATestsJVMCI.java > > EATests with Graal enabled. Nice that you support Graal to some extent. > Maybe Graal folks want to enhance them in the future. I think this is a good > starting point. > > Will change this in the next webrev. > > > Conclusion: Looks good and not trivial :-) > > Now, you have one full review. I'd be ok with covering 2nd review by partial > reviews. > > Compiler and JVMTI parts are not too complicated IMHO. > > Runtime part should get at least one additional careful review. > > Thanks a lot, > Richard. > > -----Original Message----- > From: Doerr, Martin > Sent: Dienstag, 31. M?rz 2020 16:01 > To: Reingruber, Richard ; 'Robbin Ehn' > ; Lindenmaier, Goetz > ; David Holmes ; > Vladimir Kozlov (vladimir.kozlov at oracle.com) ; > serviceability-dev at openjdk.java.net; hotspot-compiler- > dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > in the Presence of JVMTI Agents > > Hi Richard, > > thanks for addressing all my points. I've looked over webrev.5 and I'm > satisfied with your changes. > > > I had also promised to review the tests. 
> > test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysis > Enabled.java > Thanks for updating the @summary comment. Looks good in webrev.5. > > test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnaly > sisEnabled.c > JVMTI agent for object tagging and heap iteration. Good. > > test/jdk/com/sun/jdi/EATests.java > This is a substantial amount of tests which is appropriate for a such a large > change. Skipping some subtests with UseJVMCICompiler makes sense > because it doesn't provide the necessary JVMTI functionality, yet. > Nice work! > I also like that you test with and without BiasedLocking. Your tests will still be > fine after BiasedLocking deprecation. > > Very minor nits: > - 2 typos in comment above EARelockingNestedInflatedTarget: "lockes are > ommitted" (sounds funny) > - You sometimes write "graal" and sometimes "Graal". I guess the capital G is > better. (Also in EATestsJVMCI.java.) > > test/jdk/com/sun/jdi/EATestsJVMCI.java > EATests with Graal enabled. Nice that you support Graal to some extent. > Maybe Graal folks want to enhance them in the future. I think this is a good > starting point. > > > Conclusion: Looks good and not trivial :-) > Now, you have one full review. I'd be ok with covering 2nd review by partial > reviews. > Compiler and JVMTI parts are not too complicated IMHO. > Runtime part should get at least one additional careful review. > > Best regards, > Martin > > > > -----Original Message----- > > From: Reingruber, Richard > > Sent: Montag, 30. 
M?rz 2020 10:32 > > To: Doerr, Martin ; 'Robbin Ehn' > > ; Lindenmaier, Goetz > > ; David Holmes > ; > > Vladimir Kozlov (vladimir.kozlov at oracle.com) > > ; serviceability-dev at openjdk.java.net; > > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime- > > dev at openjdk.java.net > > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > > in the Presence of JVMTI Agents > > > > Hi, > > > > this is webrev.5 based on Robbin's feedback and Martin's review - thanks! :) > > > > The change affects jvmti, hotspot and c2. Partial reviews are very welcome > > too. > > > > Full: http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/ > > Delta: > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5.inc/ > > > > Robbin, Martin, please let me know, if anything shouldn't be quite as you > > wanted it. Also find my > > comments on your feedback below. > > > > Robbin, can I count you as Reviewer for the runtime part? > > > > Thanks, Richard. > > > > -- > > > > > DeoptimizeObjectsALotThread is only used in compileBroker.cpp. > > > You can move both declaration and definition to that file, no need to > > clobber > > > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry) > > > > Done. > > > > > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in > it's > > own > > > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp. > > > > I moved JvmtiDeferredUpdates to vframe_hp.hpp where preexisting > > jvmtiDeferredLocalVariableSet is > > declared. > > > > > src/hotspot/share/code/compiledMethod.cpp > > > Nice cleanup! > > > > Thanks :) > > > > > src/hotspot/share/code/debugInfoRec.cpp > > > src/hotspot/share/code/debugInfoRec.hpp > > > Additional parmeters. (Remark: I think "non_global_escape_in_scope" > > would read better than "not_global_escape_in_scope", but your version is > > consistent with existing code, so no change request from my side.) Ok. 
> > > > I've been thinking about this too and finally stayed with > > not_global_escape_in_scope. It's supposed > > to mean an object whose escape state is not GlobalEscape is in scope. > > > > > src/hotspot/share/compiler/compileBroker.cpp > > > src/hotspot/share/compiler/compileBroker.hpp > > > Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into > > a follow up change together with the test in order to make this webrev > > smaller, but since it is included, I'm reviewing everything at once. Not a big > > deal.) Ok. > > > > Yes the change would be a little smaller. And if it helps I'll split it off. In > > general I prefer > > patches that bring along a suitable amount of tests. > > > > > src/hotspot/share/opto/c2compiler.cpp > > > Make do_escape_analysis independent of JVMCI capabilities. Nice! > > > > It is the main goal of the enhancement. It is done for C2, but could be done > > for JVMCI compilers > > with just a small effort as well. > > > > > src/hotspot/share/opto/escape.cpp > > > Annotation for MachSafePointNodes. Your added functionality looks > > correct. > > > But I'd prefer to move the bulky code out of the large function. > > > I suggest to factor out something like has_not_global_escape and > > has_arg_escape. So the code could look like this: > > > SafePointNode* sfn = sfn_worklist.at(next); > > > sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn)); > > > if (sfn->is_CallJava()) { > > > CallJavaNode* call = sfn->as_CallJava(); > > > call->set_arg_escape(has_arg_escape(call)); > > > } > > > This would also allow us to get rid of the found_..._escape_in_args > > variables making the loops better readable. > > > > Done. > > > > > It's kind of ugly to use strcmp to recognize uncommon trap, but that > seems > > to be the way to do it (there are more such places). So it's ok. > > > > Yeah. I copied the snippet. 
> > > > > src/hotspot/share/prims/jvmtiImpl.cpp > > > src/hotspot/share/prims/jvmtiImpl.hpp > > > The sequence is pretty complex: > > > VM_GetOrSetLocal element initialization executes EscapeBarrier code > > which suspends the target thread (extra VM Operation). > > > > Note that the target threads have to be suspended already for > > VM_GetOrSetLocal*. So it's mainly the > > synchronization effect of EscapeBarrier::sync_and_suspend_one() that is > > required here. Also no extra > > _handshake_ is executed, since sync_and_suspend_one() will find the > > target threads already > > suspended. > > > > > VM_GetOrSetLocal::doit_prologue performs object deoptimization (by > VM > > Thread to prepare VM Operation with frame deoptimization). > > > VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor > > which resumes the target thread. > > > But I don't have any improvement proposal. Performance is probably not > a > > concern, here. So it's ok. > > > > > VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it > > has non-globally escaping objects and other frames if they have arg > escaping > > ones. Good. > > > > It's not specifically the top frame, but the frame that is accessed. > > > > > src/hotspot/share/runtime/deoptimization.cpp > > > Object deoptimization. I have more comments and proposals, here. > > > First of all, handling recursive and waiting locks in relock_objects is tricky, > > but looks correct. > > > Comments are sufficient to understand why things are done as they are > > implemented. > > > > > BiasedLocking related parts are complex, but we may get rid of them in > the > > future (with BiasedLocking removal). > > > Anyway, looks correct, too. > > > > > Typo in comment: "regularily" => "regularly" > > > > > Deoptimization::fetch_unroll_info_helper is the only place where > > _jvmti_deferred_updates get deallocated (except JavaThread destructor). 
> > But I think we always go through it, so I can't see a memory leak or such > kind > > of issues. > > > > That's correct. The compiled frame for which deferred updates are > allocated > > is always deoptimized > > before (see EscapeBarrier::deoptimize_objects()). This is also asserted in > > compiledVFrame::update_deferred_value(). I've added the same assertion > > to > > Deoptimization::relock_objects(). So we can be sure that > > _jvmti_deferred_updates are deallocated > > again in fetch_unroll_info_helper(). > > > > > EscapeBarrier::deoptimize_objects: ResourceMark should use > > calling_thread(). > > > > Sure, well spotted! > > > > > You can use MutexLocker and MonitorLocker with Thread* to save the > > Thread::current() call. > > > > Right, good hint. This was recently introduced with 8235678. I even had to > > resolve conflicts. Should > > have done this then. > > > > > I'd make set_objs_are_deoptimized static and remove it from the > > EscapeBarrier interface because I think it shouldn't be used outside of > > EscapeBarrier::deoptimize_objects. > > > > Done. > > > > > Typo in comment: "we must only deoptimize" => "we only have to > > deoptimize" > > > > Replaced with "[...] we deoptimize iff local objects are passed as args" > > > > > "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and > > barrier_active() is redundant. Implementation can get moved to hpp file. > > > > Ok. Done. > > > > > I'll get back to suspend flags, later. > > > > > There are weird cases regarding _self_deoptimization_in_progress. > > > Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. > > C can set _self_deoptimization_in_progress while A performs the > handshake > > for suspending C. I think this doesn't lead to errors, but it's probably not > > desired. > > > I think it would be better to use only one "wait" call in > > sync_and_suspend_one and sync_and_suspend_all. > > > > You're right. 
> > We've discussed that face-to-face, but couldn't find a real issue.
> > But now, thinking again, I reckon I found one:
> >
> > 2808   // Sync with other threads that might be doing deoptimizations
> > 2809   {
> > 2810     // Need to switch to _thread_blocked for the wait() call
> > 2811     ThreadBlockInVM tbivm(_calling_thread);
> > 2812     MonitorLocker ml(EscapeBarrier_lock, Mutex::_no_safepoint_check_flag);
> > 2813     while (_self_deoptimization_in_progress) {
> > 2814       ml.wait();
> > 2815     }
> > 2816
> > 2817     if (self_deopt()) {
> > 2818       _self_deoptimization_in_progress = true;
> > 2819     }
> > 2820
> > 2821     while (_deoptee_thread->is_ea_obj_deopt_suspend()) {
> > 2822       ml.wait();
> > 2823     }
> > 2824
> > 2825     if (self_deopt()) {
> > 2826       return;
> > 2827     }
> > 2828
> > 2829     // set suspend flag for target thread
> > 2830     _deoptee_thread->set_ea_obj_deopt_flag();
> > 2831   }
> >
> > - A waits in 2822
> > - C is suspended
> > - B notifies all in resume_one()
> > - A and C wake up
> > - C wins over A and sets _self_deoptimization_in_progress = true in 2818
> > - C does the self deoptimization
> > - A executes 2830 _deoptee_thread->set_ea_obj_deopt_flag()
> >
> > C will self suspend at some undefined point. The resulting state is illegal.
> >
> > > I first thought it'd be better to move ThreadBlockInVM before wait() to
> > > reduce thread state transitions, but that seems to be problematic because
> > > ThreadBlockInVM destructor contains a safepoint check which we shouldn't
> > > do while holding EscapeBarrier_lock. So no change request.
> >
> > Yes, it would be nice to have the state change only if needed, but for the
> > reason you mentioned it is not quite as easy as it seems to be. I
> > experimented as well with a second lock, but did not succeed.
> >
> > > Change in thread_added:
> > > I think the sequence would be more comprehensive if we waited for
> > > deopt_all_threads in Thread::start and all other places where a new thread
> > > can run into Java code (e.g. JVMTI attach).
> > > Your version makes new threads come up with suspend flag set. That > looks > > correct, too. Advantage is that you only have to change one place > > (thread_added). It'll be interesting to see how it will look like when we use > > async handshakes instead of suspend flags. > > > For now, I'm ok with your version. > > > > I had a version that did what you are suggesting. The current version also > has > > the advantage, that > > there are fewer places where a thread has to wait for ongoing object > > deoptimization. This means > > viewer places where you have to worry about correct thread state > > transitions, possible deadlocks, > > and if all oops are properly Handle'ed. > > > > > I'd only move MutexLocker ml(EscapeBarrier_lock...) after if (!jt- > > >is_hidden_from_external_view()). > > > > Done. > > > > > Having 4 different deoptimize_objects functions makes it a little hard to > > keep an overview of which one is used for what. > > > Maybe adding suffixes would help a little bit, but I can also live with what > > you have. > > > Implementation looks correct to me. > > > > 2 are internal. I added the suffix _internal to them. This leaves 2 to choose > > from. > > > > > src/hotspot/share/runtime/deoptimization.hpp > > > Escape barriers and object deoptimization functions. > > > Typo in comment: "helt" => "held" > > > > Done in place already. > > > > > src/hotspot/share/runtime/interfaceSupport.cpp > > > InterfaceSupport::deoptimizeAllObjects() is only used for > > DeoptimizeObjectsALot = 1. > > > I think DeoptimizeObjectsALot = 2 is more important, but I think it's not > bad > > to have DeoptimizeObjectsALot = 1 in addition. Ok. > > > > I never used DeoptimizeObjectsALot = 1 that much. It could be more > > deterministic in single threaded > > scenarios. I wouldn't object to get rid of it though. > > > > > src/hotspot/share/runtime/stackValue.hpp > > > Better reinitilization in StackValue. Good. 
> > > > StackValue::obj_is_scalar_replaced() should not return true after calling > > set_obj(). > > > > > src/hotspot/share/runtime/thread.cpp > > > src/hotspot/share/runtime/thread.hpp > > > src/hotspot/share/runtime/thread.inline.hpp > > > wait_for_object_deoptimization, suspend flag, deferred updates and test > > feature to deoptimize objects. > > > > > In the long term, we want to get rid of suspend flags, so it's not so nice to > > introduce a new one. But I agree with G?tz that it should be acceptable as > > temporary solution until async handshakes are available (which takes more > > time). So I'm ok with your change. > > > > I'm keen to build the feature on async handshakes when the arive. > > > > > You can use MutexLocker with Thread*. > > > > Done. > > > > > JVMTIDeferredUpdates: I agree with Robin. It'd be nice to move the class > > out of thread.hpp. > > > > Done. > > > > > src/hotspot/share/runtime/vframe.cpp > > > Added support for entry frame to new_vframe. Ok. > > > > > > > src/hotspot/share/runtime/vframe_hp.cpp > > > src/hotspot/share/runtime/vframe_hp.hpp > > > > > I think code()->as_nmethod() in not_global_escape_in_scope() and > > arg_escape() should better be under #ifdef ASSERT or inside the assert > > statement (no need for code cache walking in product build). > > > > Done. > > > > > jvmtiDeferredLocalVariableSet::update_monitors: > > > Please add a comment explaining that owner referenced by original info > > may be scalar replaced, but it is deoptimized in the vframe. > > > > Done. > > > > -----Original Message----- > > From: Doerr, Martin > > Sent: Donnerstag, 12. 
M?rz 2020 17:28 > > To: Reingruber, Richard ; 'Robbin Ehn' > > ; Lindenmaier, Goetz > > ; David Holmes > ; > > Vladimir Kozlov (vladimir.kozlov at oracle.com) > > ; serviceability-dev at openjdk.java.net; > > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime- > > dev at openjdk.java.net > > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance > > in the Presence of JVMTI Agents > > > > Hi Richard, > > > > > > I managed to find time for a (almost) complete review of webrev.4. (I'll > > review the tests separately.) > > > > First of all, the change seems to be in pretty good quality for its significant > > complexity. I couldn't find any real bugs. But I'd like to propose minor > > improvements. > > I'm convinced that it's mature because we did substantial testing. > > > > I like the new functionality for object deoptimization. It can possibly be > > reused for future escape analysis based optimizations. So I appreciate > having > > it available in the code base. > > In addition to that, your change makes the JVMTI implementation better > > integrated into the VM. > > > > > > Now to the details: > > > > > > src/hotspot/share/c1/c1_IR.hpp > > describe_scope parameters. Ok. > > > > > > src/hotspot/share/ci/ciEnv.cpp > > src/hotspot/share/ci/ciEnv.hpp > > Fix for JvmtiExport::can_walk_any_space() capability. Ok. > > > > > > src/hotspot/share/code/compiledMethod.cpp > > Nice cleanup! > > > > > > src/hotspot/share/code/debugInfoRec.cpp > > src/hotspot/share/code/debugInfoRec.hpp > > Additional parmeters. (Remark: I think "non_global_escape_in_scope" > > would read better than "not_global_escape_in_scope", but your version is > > consistent with existing code, so no change request from my side.) Ok. > > > > > > src/hotspot/share/code/nmethod.cpp > > Nice cleanup! > > > > > > src/hotspot/share/code/pcDesc.hpp > > Additional parameters. Ok. 
> > > > > > src/hotspot/share/code/scopeDesc.cpp > > src/hotspot/share/code/scopeDesc.hpp > > Improved implementation + additional parameters. Ok. > > > > > > src/hotspot/share/compiler/compileBroker.cpp > > src/hotspot/share/compiler/compileBroker.hpp > > Extra thread for DeoptimizeObjectsALot. (Remark: I would have put it into a > > follow up change together with the test in order to make this webrev > > smaller, but since it is included, I'm reviewing everything at once. Not a big > > deal.) Ok. > > > > > > src/hotspot/share/jvmci/jvmciCodeInstaller.cpp > > Additional parameters. Ok. > > > > > > src/hotspot/share/opto/c2compiler.cpp > > Make do_escape_analysis independent of JVMCI capabilities. Nice! > > > > > > src/hotspot/share/opto/callnode.hpp > > Additional fields for MachSafePointNodes. Ok. > > > > > > src/hotspot/share/opto/escape.cpp > > Annotation for MachSafePointNodes. Your added functionality looks > correct. > > But I'd prefer to move the bulky code out of the large function. > > I suggest to factor out something like has_not_global_escape and > > has_arg_escape. So the code could look like this: > > SafePointNode* sfn = sfn_worklist.at(next); > > sfn->set_not_global_escape_in_scope(has_not_global_escape(sfn)); > > if (sfn->is_CallJava()) { > > CallJavaNode* call = sfn->as_CallJava(); > > call->set_arg_escape(has_arg_escape(call)); > > } > > This would also allow us to get rid of the found_..._escape_in_args > variables > > making the loops better readable. > > > > It's kind of ugly to use strcmp to recognize uncommon trap, but that seems > > to be the way to do it (there are more such places). So it's ok. > > > > > > src/hotspot/share/opto/machnode.hpp > > Additional fields for MachSafePointNodes. Ok. > > > > > > src/hotspot/share/opto/macro.cpp > > Allow elimination of non-escaping allocations. Ok. > > > > > > src/hotspot/share/opto/matcher.cpp > > src/hotspot/share/opto/output.cpp > > Copy attribute / pass parameters. Ok. 
> > > > > > src/hotspot/share/prims/jvmtiCodeBlobEvents.cpp > > Nice cleanup! > > > > > > src/hotspot/share/prims/jvmtiEnv.cpp > > src/hotspot/share/prims/jvmtiEnvBase.cpp > > Escape barriers + deoptimize objects for target thread. Good. > > > > > > src/hotspot/share/prims/jvmtiImpl.cpp > > src/hotspot/share/prims/jvmtiImpl.hpp > > The sequence is pretty complex: > > VM_GetOrSetLocal element initialization executes EscapeBarrier code > which > > suspends the target thread (extra VM Operation). > > VM_GetOrSetLocal::doit_prologue performs object deoptimization (by VM > > Thread to prepare VM Operation with frame deoptimization). > > VM_GetOrSetLocal destructor implicitly calls EscapeBarrier destructor > which > > resumes the target thread. > > But I don't have any improvement proposal. Performance is probably not a > > concern, here. So it's ok. > > > > VM_GetOrSetLocal::deoptimize_objects deoptimizes the top frame if it has > > non-globally escaping objects and other frames if they have arg escaping > > ones. Good. > > > > > > src/hotspot/share/prims/jvmtiTagMap.cpp > > Escape barriers + deoptimize objects for all threads. Ok. > > > > > > src/hotspot/share/prims/whitebox.cpp > > Added WB_IsFrameDeoptimized to API. Ok. > > > > > > src/hotspot/share/runtime/deoptimization.cpp > > Object deoptimization. I have more comments and proposals, here. > > First of all, handling recursive and waiting locks in relock_objects is tricky, > but > > looks correct. > > Comments are sufficient to understand why things are done as they are > > implemented. > > > > BiasedLocking related parts are complex, but we may get rid of them in the > > future (with BiasedLocking removal). > > Anyway, looks correct, too. > > > > Typo in comment: "regularily" => "regularly" > > > > Deoptimization::fetch_unroll_info_helper is the only place where > > _jvmti_deferred_updates get deallocated (except JavaThread destructor). 
> > But I think we always go through it, so I can't see a memory leak or such > kind > > of issues. > > > > EscapeBarrier::deoptimize_objects: ResourceMark should use > > calling_thread(). > > > > You can use MutexLocker and MonitorLocker with Thread* to save the > > Thread::current() call. > > > > I'd make set_objs_are_deoptimized static and remove it from the > > EscapeBarrier interface because I think it shouldn't be used outside of > > EscapeBarrier::deoptimize_objects. > > > > Typo in comment: "we must only deoptimize" => "we only have to > > deoptimize" > > > > "bool EscapeBarrier::deoptimize_objects(intptr_t* fr_id)" is trivial and > > barrier_active() is redundant. Implementation can get moved to hpp file. > > > > I'll get back to suspend flags, later. > > > > There are weird cases regarding _self_deoptimization_in_progress. > > Assume we have 3 threads A, B and C. A deopts C, B deopts C, C deopts C. > C > > can set _self_deoptimization_in_progress while A performs the handshake > > for suspending C. I think this doesn't lead to errors, but it's probably not > > desired. > > I think it would be better to use only one "wait" call in > > sync_and_suspend_one and sync_and_suspend_all. > > > > I first thought it'd be better to move ThreadBlockInVM before wait() to > > reduce thread state transitions, but that seems to be problematic because > > ThreadBlockInVM destructor contains a safepoint check which we > shouldn't > > do while holding EscapeBarrier_lock. So no change request. > > > > Change in thred_added: > > I think the sequence would be more comprehensive if we waited for > > deopt_all_threads in Thread::start and all other places where a new thread > > can run into Java code (e.g. JVMTI attach). > > Your version makes new threads come up with suspend flag set. That looks > > correct, too. Advantage is that you only have to change one place > > (thread_added). 
It'll be interesting to see what it will look like when we use > > async handshakes instead of suspend flags. > > For now, I'm ok with your version. > > > > I'd only move MutexLocker ml(EscapeBarrier_lock...) after if (!jt->is_hidden_from_external_view()). > > > > Having 4 different deoptimize_objects functions makes it a little hard to > keep > > an overview of which one is used for what. > > Maybe adding suffixes would help a little bit, but I can also live with what > you > > have. > > Implementation looks correct to me. > > > > > > src/hotspot/share/runtime/deoptimization.hpp > > Escape barriers and object deoptimization functions. > > Typo in comment: "helt" => "held" > > > > > > src/hotspot/share/runtime/globals.hpp > > Addition of develop flag DeoptimizeObjectsALotInterval. Ok. > > > > > > src/hotspot/share/runtime/interfaceSupport.cpp > > InterfaceSupport::deoptimizeAllObjects() is only used for > > DeoptimizeObjectsALot = 1. > > I think DeoptimizeObjectsALot = 2 is more important, but I think it's not bad > > to have DeoptimizeObjectsALot = 1 in addition. Ok. > > > > > > src/hotspot/share/runtime/interfaceSupport.inline.hpp > > Addition of deoptimizeAllObjects. Ok. > > > > > > src/hotspot/share/runtime/mutexLocker.cpp > > src/hotspot/share/runtime/mutexLocker.hpp > > Addition of EscapeBarrier_lock. Ok. > > > > > > src/hotspot/share/runtime/objectMonitor.cpp > > Make recursion count relock aware. Ok. > > > > > > src/hotspot/share/runtime/stackValue.hpp > > Better reinitialization in StackValue. Good. > > > > > > src/hotspot/share/runtime/thread.cpp > > src/hotspot/share/runtime/thread.hpp > > src/hotspot/share/runtime/thread.inline.hpp > > wait_for_object_deoptimization, suspend flag, deferred updates and test > > feature to deoptimize objects. > > > > In the long term, we want to get rid of suspend flags, so it's not so nice to > > introduce a new one.
But I agree with Götz that it should be acceptable as > > temporary solution until async handshakes are available (which takes more > > time). So I'm ok with your change. > > > > You can use MutexLocker with Thread*. > > > > JVMTIDeferredUpdates: I agree with Robbin. It'd be nice to move the class > out > > of thread.hpp. > > > > > > src/hotspot/share/runtime/vframe.cpp > > Added support for entry frame to new_vframe. Ok. > > > > > > src/hotspot/share/runtime/vframe_hp.cpp > > src/hotspot/share/runtime/vframe_hp.hpp > > > > I think code()->as_nmethod() in not_global_escape_in_scope() and > > arg_escape() should better be under #ifdef ASSERT or inside the assert > > statement (no need for code cache walking in product build). > > > > jvmtiDeferredLocalVariableSet::update_monitors: > > Please add a comment explaining that owner referenced by original info > may > > be scalar replaced, but it is deoptimized in the vframe. > > > > > > src/hotspot/share/utilities/macros.hpp > > Addition of NOT_COMPILER2_OR_JVMCI_RETURN macros. Ok. > > > > > > test/hotspot/jtreg/serviceability/jvmti/Heap/IterateHeapWithEscapeAnalysisEnabled.java > > test/hotspot/jtreg/serviceability/jvmti/Heap/libIterateHeapWithEscapeAnalysisEnabled.c > > New test. Will review separately. > > > > > > test/jdk/TEST.ROOT > > Addition of vm.jvmci as required property. Ok. > > > > > > test/jdk/com/sun/jdi/EATests.java > > test/jdk/com/sun/jdi/EATestsJVMCI.java > > New test. Will review separately. > > > > > > test/lib/sun/hotspot/WhiteBox.java > > Added isFrameDeoptimized to API. Ok. > > > > > > That was it. Best regards, > > Martin > > > > > > > -----Original Message----- > > > From: hotspot-compiler-dev > > bounces at openjdk.java.net> On Behalf Of Reingruber, Richard > > > Sent: Dienstag, 3.
März 2020 21:23 > > > To: 'Robbin Ehn' ; Lindenmaier, Goetz > > > ; David Holmes > > ; > > > Vladimir Kozlov (vladimir.kozlov at oracle.com) > > > ; serviceability-dev at openjdk.java.net; > > > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime- > > > dev at openjdk.java.net > > > Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better > > > Performance in the Presence of JVMTI Agents > > > > > > Hi Robbin, > > > > > > > > I understand that Robbin proposed to replace the usage of > > > > > _suspend_flag with handshakes. Apparently, async handshakes > > > > > are needed to do so. We have been waiting a while for removal > > > > > of the _suspend_flag / introduction of async handshakes [2]. > > > > > What is the status here? > > > > > > > I have an old prototype which I would like to continue to work on. > > > > So do not assume asynch handshakes will make 15. > > > > Even if it would, I think there are a lot more investigate work to remove > > > > _suspend_flag. > > > > > > Let us know, if we can be of any help to you and be it only testing. > > > > > > > >> Full: > > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/ > > > > > > > DeoptimizeObjectsALotThread is only used in compileBroker.cpp. > > > > You can move both declaration and definition to that file, no need to > > > clobber > > > > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry) > > > > > > Will do. > > > > > > > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in > > it's > > > own > > > > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp. > > > > > > You are right. It shouldn't be declared in thread.hpp. I will look into that. > > > > > > > Note that we also think we may have a bug in deopt: > > > > https://bugs.openjdk.java.net/browse/JDK-8238237 > > > > > > > I think it would be best, if possible, to push after that is resolved. > > > > > > Sure.
> > > > > > > Not even nearly a full review :) > > > > > > I know :) > > > > > > Anyways, thanks a lot, > > > Richard. > > > > > > > > > -----Original Message----- > > > From: Robbin Ehn > > > Sent: Monday, March 2, 2020 11:17 AM > > > To: Lindenmaier, Goetz ; Reingruber, > > Richard > > > ; David Holmes > > ; > > > Vladimir Kozlov (vladimir.kozlov at oracle.com) > > > ; serviceability-dev at openjdk.java.net; > > > hotspot-compiler-dev at openjdk.java.net; hotspot-runtime- > > > dev at openjdk.java.net > > > Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better > Performance > > > in the Presence of JVMTI Agents > > > > > > Hi, > > > > > > On 2/24/20 5:39 PM, Lindenmaier, Goetz wrote: > > > > Hi, > > > > > > > > I had a look at the progress of this change. Nothing > > > > happened since Richard posted his update using more > > > > handshakes [1]. > > > > But we (SAP) would appreciate a lot if this change could > > > > be successfully reviewed and pushed. > > > > > > > > I think there is basic understanding that this > > > > change is helpful. It fixes a number of issues with JVMTI, > > > > and will deliver the same performance benefits as EA > > > > does in current production mode for debugging scenarios. > > > > > > > > This is important for us as we run our VMs prepared > > > > for debugging in production mode. > > > > > > > > I understand that Robbin proposed to replace the usage of > > > > _suspend_flag with handshakes. Apparently, async handshakes > > > > are needed to do so. We have been waiting a while for removal > > > > of the _suspend_flag / introduction of async handshakes [2]. > > > > What is the status here? > > > > > > I have an old prototype which I would like to continue to work on. > > > So do not assume asynch handshakes will make 15. > > > Even if it would, I think there are a lot more investigate work to remove > > > _suspend_flag. > > > > > > > > > > > I think we should no longer wait, but proceed with > > > > this change. 
We will look into removing the usage of > > > > suspend_flag introduced here once it is possible to implement > > > > it with handshakes. > > > > > > Yes, sure. > > > > > > >> Full: > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4/ > > > > > > DeoptimizeObjectsALotThread is only used in compileBroker.cpp. > > > You can move both declaration and definition to that file, no need to > > clobber > > > thread.[c|h]pp. (and the static function deopt_objs_alot_thread_entry) > > > > > > Does JvmtiDeferredUpdates really need to be in thread.hpp, can't be in > it's > > > own > > > hpp file? It doesn't seem right to add JVM TI classes into thread.hpp. > > > > > > Note that we also think we may have a bug in deopt: > > > https://bugs.openjdk.java.net/browse/JDK-8238237 > > > > > > I think it would be best, if possible, to push after that is resolved. > > > > > > Not even nearly a full review :) > > > > > > Thanks, Robbin > > > > > > > > > >> Incremental: > > > >> > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.4.inc/ > > > >> > > > >> I was not able to eliminate the additional suspend flag now. I'll take > care > > > of this > > > >> as soon as the > > > >> existing suspend-resume-mechanism is reworked. > > > >> > > > >> Testing: > > > >> > > > >> Nightly tests @SAP: > > > >> > > > >> JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015, > > > Renaissance > > > >> Suite, SAP specific tests > > > >> with fastdebug and release builds on all platforms > > > >> > > > >> Stress testing with DeoptimizeObjectsALot running SPECjvm2008 40x > > > parallel > > > >> for 24h > > > >> > > > >> Thanks, Richard. > > > >> > > > >> > > > >> More details on the changes: > > > >> > > > >> * Hide DeoptimizeObjectsALotThread from external view. > > > >> > > > >> * Changed EscapeBarrier_lock to be a _safepoint_check_never lock. > > > >> It used to be _safepoint_check_sometimes, which will be eliminated > > > sooner or > > > >> later. 
> > > >> I added explicit thread state changes with ThreadBlockInVM to code > > > paths > > > >> where we can wait() > > > >> on EscapeBarrier_lock to become safepoint safe. > > > >> > > > >> * Use handshake EscapeBarrierSuspendHandshake to suspend target > > > threads > > > >> instead of vm operation > > > >> VM_ThreadSuspendAllForObjDeopt. > > > >> > > > >> * Removed uses of Threads_lock. When adding a new thread we > > suspend > > > it iff > > > >> EA optimizations are > > > >> being reverted. In the previous version we were waiting on > > > Threads_lock > > > >> while EA optimizations > > > >> were reverted. See EscapeBarrier::thread_added(). > > > >> > > > >> * Made tests require Xmixed compilation mode. > > > >> > > > >> * Made tests agnostic regarding tiered compilation. > > > >> I.e. tc isn't disabled anymore, and the tests can be run with tc > enabled > > or > > > >> disabled. > > > >> > > > >> * Exercising EATests.java as well with stress test options > > > >> DeoptimizeObjectsALot* > > > >> Due to the non-deterministic deoptimizations some tests need to be > > > skipped. > > > >> We do this to prevent bit-rot of the stress test code. > > > >> > > > >> * Executing EATests.java as well with graal if available. Driver for this is > > > >> EATestsJVMCI.java. Graal cannot pass all tests, because it does not > > > provide all > > > >> the new debug info > > > >> (namely not_global_escape_in_scope and arg_escape in > > > scopeDesc.hpp). > > > >> And graal does not yet support the JVMTI operations force early > > return > > > and > > > >> pop frame. > > > >> > > > >> * Removed tracing from new jdi tests in EATests.java. Too much trace > > > output > > > >> before the debugging > > > >> connection is established can cause deadlock because output buffers > > fill > > > up. 
> > > >> (See https://bugs.openjdk.java.net/browse/JDK-8173304) > > > >> > > > >> * Many copyright year changes and smaller clean-up changes of > testing > > > code > > > >> (trailing white-space and > > > >> the like). > > > >> > > > >> > > > >> -----Original Message----- > > > >> From: David Holmes > > > >> Sent: Donnerstag, 19. Dezember 2019 03:12 > > > >> To: Reingruber, Richard ; serviceability- > > > >> dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; > > > hotspot- > > > >> runtime-dev at openjdk.java.net; Vladimir Kozlov > > > (vladimir.kozlov at oracle.com) > > > >> > > > >> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better > > > Performance in > > > >> the Presence of JVMTI Agents > > > >> > > > >> Hi Richard, > > > >> > > > >> I think my issue is with the way EliminateNestedLocks works so I'm > going > > > >> to look into that more deeply. > > > >> > > > >> Thanks for the explanations. > > > >> > > > >> David > > > >> > > > >> On 18/12/2019 12:47 am, Reingruber, Richard wrote: > > > >>> Hi David, > > > >>> > > > >>> > > > Some further queries/concerns: > > > >>> > > > > > > >>> > > > src/hotspot/share/runtime/objectMonitor.cpp > > > >>> > > > > > > >>> > > > Can you please explain the changes to ObjectMonitor::wait: > > > >>> > > > > > > >>> > > > ! _recursions = save // restore the old recursion count > > > >>> > > > ! + jt->get_and_reset_relock_count_after_wait(); // > > > >>> > > > increased by the deferred relock count > > > >>> > > > > > > >>> > > > what is the "deferred relock count"? I gather it relates to > > > >>> > > > > > > >>> > > > "The code was extended to be able to deoptimize objects of > a > > > >>> > > frame that > > > >>> > > > is not the top frame and to let another thread than the > > owning > > > >>> > > thread do > > > >>> > > > it." > > > >>> > > > > > >>> > > Yes, these relate. 
Currently EA based optimizations are reverted, > > > when a > > > >> compiled frame is > > > >>> > > replaced with corresponding interpreter frames. Part of this is > > > relocking > > > >> objects with eliminated > > > >>> > > locking. New with the enhancement is that we do this also just > > > before > > > >> object references are > > > >>> > > acquired through JVMTI. In this case we deoptimize also the > > > owning > > > >> compiled frame C and we > > > >>> > > register deoptimized objects as deferred updates. When control > > > returns > > > >> to C it gets deoptimized, > > > >>> > > we notice that objects are already deoptimized (reallocated and > > > >> relocked), so we don't do it again > > > >>> > > (relocking twice would be incorrect of course). Deferred > updates > > > are > > > >> copied into the new > > > >>> > > interpreter frames. > > > >>> > > > > > >>> > > Problem: relocking is not possible if the target thread T is > waiting > > > on the > > > >> monitor that needs to > > > >>> > > be relocked. This happens only with non-local objects with > > > >> EliminateNestedLocks. Instead relocking > > > >>> > > is deferred until T owns the monitor again. This is what the > piece > > of > > > >> code above does. > > > >>> > > > > >>> > Sorry I need some more detail here. How can you wait() on an > > > object > > > >>> > monitor if the object allocation and/or locking was optimised > > away? > > > And > > > >>> > what is a "non-local object" in this context? Isn't EA restricted to > > > >>> > thread-confined objects? > > > >>> > > > >>> "Non-local object" is an object that escapes its thread. The issue I'm > > > >> addressing with the changes > > > >>> in ObjectMonitor::wait are almost unrelated to EA. They are caused > by > > > >> EliminateNestedLocks, where C2 > > > >>> eliminates recursive locking of an already owned lock. The lock > owning > > > object > > > >> exists on the heap, it > > > >>> is locked and you can call wait() on it. 
> > > >>> > > > >>> EliminateLocks is the C2 option that controls lock elimination based > on > > > EA. > > > >> Both optimizations have > > > >>> in common that objects with eliminated locking need to be relocked > > > when > > > >> deoptimizing a frame, > > > >>> i.e. when replacing a compiled frame with equivalent interpreter > > > >>> frames. Deoptimization::relock_objects does that job for /all/ > > eliminated > > > >> locks in scope. /All/ can > > > >>> be a mix of eliminated nested locks and locks of not-escaping objects. > > > >>> > > > >>> New with the enhancement: I call relock_objects earlier, just before > > > objects > > > >> potentially > > > >>> escape. But then later when the owning compiled frame gets > > > deoptimized, I > > > >> must not do it again: > > > >>> > > > >>> See call to EscapeBarrier::objs_are_deoptimized in > > deoptimization.cpp: > > > >>> > > > >>> 373 if ((jvmci_enabled || ((DoEscapeAnalysis || > > > EliminateNestedLocks) && > > > >> EliminateLocks)) > > > >>> 374 && !EscapeBarrier::objs_are_deoptimized(thread, > > > deoptee.id())) { > > > >>> 375 bool unused; > > > >>> 376 eliminate_locks(thread, chunk, realloc_failures, deoptee, > > > exec_mode, > > > >> unused); > > > >>> 377 } > > > >>> > > > >>> Now when calling relock_objects early it is quite possible that I have > to > > > relock > > > >> an object the > > > >>> target thread currently waits for. Obviously I cannot relock in this > case, > > > >> instead I chose to > > > >>> introduce relock_count_after_wait to JavaThread. > > > >>> > > > >>> > Is it just that some of the locking gets optimized away e.g.
> > > >>> > > > > >>> > synchronised(obj) { > > > >>> > synchronised(obj) { > > > >>> > synchronised(obj) { > > > >>> > obj.wait(); > > > >>> > } > > > >>> > } > > > >>> > } > > > >>> > > > > >>> > If this is reduced to a form as-if it were a single lock of the > monitor > > > >>> > (due to EA) and the wait() triggers a JVM TI event which leads to > > the > > > >>> > escape of "obj" then we need to reconstruct the true lock state, > > and > > > so > > > >>> > when the wait() internally unblocks and reacquires the monitor it > > > has to > > > >>> > set the true recursion count to 3, not the 1 that it appeared to be > > > when > > > >>> > wait() was initially called. Is that the scenario? > > > >>> > > > >>> Kind of... except that the locking is not eliminated due to EA and > there > > is > > > no > > > >> JVM TI event > > > >>> triggered by wait. > > > >>> > > > >>> Add > > > >>> > > > >>> LocalObject l1 = new LocalObject(); > > > >>> > > > >>> in front of the synchronized blocks and assume a JVM TI agent > acquires > > l1. > > > This > > > >> triggers the code in > > > >>> question. > > > >>> > > > >>> See that relocking/reallocating is transactional. If it is done then for > > /all/ > > > >> objects in scope and it is > > > >>> done at most once. It wouldn't be quite so easy to split this in > relocking > > > of > > > >> nested/EA-based > > > >>> eliminated locks. > > > >>> > > > >>> > If so I find this truly awful. Anyone using wait() in a realistic form > > > >>> > requires a notification and so the object cannot be thread > > confined. > > > In > > > >>> > > > >>> It is not thread confined. > > > >>> > > > >>> > which case I would strongly argue that upon hitting the wait() the > > > deopt > > > >>> > should occur unconditionally and so the lock state is correct > before > > > we > > > >>> > wait and so we don't need to mess with the recursion count > > > internally > > > >>> > when we reacquire the monitor.
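The nested-lock scenario quoted above can be reproduced in plain Java. The following is only a minimal sketch of the language-level behaviour, not HotSpot code: it does not show lock elimination, deoptimization or JVMTI, and the class name NestedWaitDemo and its helper run() are made up for illustration. It shows the invariant that any adjustment of the recursion count in ObjectMonitor::wait has to preserve: wait() releases the monitor completely, and when it returns the thread owns the monitor again with its full nesting depth, so that leaving the three blocks releases it exactly.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class, not part of the webrev under review.
class NestedWaitDemo {

    // Returns { holdsLockAfterWait, holdsLockAfterLeavingAllBlocks }.
    static boolean[] run() {
        final Object obj = new Object();
        final boolean[] notified = {false};   // guarded by obj
        final boolean[] result = new boolean[2];
        final CountDownLatch entered = new CountDownLatch(1);

        Thread waiter = new Thread(() -> {
            synchronized (obj) {
                synchronized (obj) {
                    synchronized (obj) {
                        // The monitor is now held with recursion depth 3.
                        entered.countDown();
                        while (!notified[0]) {
                            try {
                                obj.wait();   // releases all three levels
                            } catch (InterruptedException e) {
                                Thread.currentThread().interrupt();
                                return;
                            }
                        }
                        // wait() has returned: the monitor is owned again.
                        result[0] = Thread.holdsLock(obj);
                    }
                }
            }
            // Three exits matched three entries, so the monitor is free.
            result[1] = Thread.holdsLock(obj);
        });
        waiter.start();

        try {
            entered.await();
            // This block can only be entered once the waiter is inside
            // wait(), i.e. once it has given up the monitor entirely.
            synchronized (obj) {
                notified[0] = true;
                obj.notifyAll();
            }
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("holds lock after wait(): " + r[0]);
        System.out.println("holds lock after leaving all blocks: " + r[1]);
    }
}
```

If the depth restored after wait() were too small, one of the block exits would fail with IllegalMonitorStateException; if it were too large, the thread would still own the monitor after the outermost block. That is the state the deferred relock count discussed in this thread is meant to keep consistent.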
> > > >>> > > > > >>> > > > > > >>> > > > which I don't like the sound of at all when it comes to > > > ObjectMonitor > > > >>> > > > state. So I'd like to understand in detail exactly what is going > > on > > > here > > > >>> > > > and why. This is a very intrusive change that seems to badly > > > break > > > >>> > > > encapsulation and impacts future changes to ObjectMonitor > > > that are > > > >> under > > > >>> > > > investigation. > > > >>> > > > > > >>> > > I would not regard this as breaking encapsulation. Certainly not > > > badly. > > > >>> > > > > > >>> > > I've added a property relock_count_after_wait to JavaThread. > > The > > > >> property is well > > > >>> > > encapsulated. Future ObjectMonitor implementations have to > > deal > > > with > > > >> recursion too. They are free > > > >>> > > in choosing a way to do that as long as that property is taken > into > > > >> account. This is hardly a > > > >>> > > limitation. > > > >>> > > > > >>> > I do think this badly breaks encapsulation as you have to add a > > > callout > > > >>> > from the guts of the ObjectMonitor code to reach into the thread > > to > > > get > > > >>> > this lock count adjustment. I understand why you have had to do > > > this but > > > >>> > I would much rather see a change to the EA optimisation strategy > > so > > > that > > > >>> > this is not needed. > > > >>> > > > > >>> > > Note also that the property is a straight forward extension of > the > > > >> existing concept of deferred > > > >>> > > local updates. It is embedded into the structure holding them. > So > > > not > > > >> even the footprint of a > > > >>> > > JavaThread is enlarged if no deferred updates are generated. > > > >>> > > > > >>> > [...] > > > >>> > > > > >>> > > > > > >>> > > I'm actually duplicating the existing external suspend > mechanism, > > > >> because a thread can be > > > >>> > > suspended at most once. And hey, and don't like that either! 
> But > > it > > > >> seems not unlikely that the > > > >>> > > duplicate can be removed together with the original and the > new > > > type > > > >> of handshakes that will be > > > >>> > > used for thread suspend can be used for object deoptimization > > > too. See > > > >> today's discussion in > > > >>> > > JDK-8227745 [2]. > > > >>> > > > > >>> > I hope that discussion bears some fruit, at the moment it seems > > not > > > to > > > >>> > be possible to use handshakes here. :( > > > >>> > > > > >>> > The external suspend mechanism is a royal pain in the proverbial > > > that we > > > >>> > have to carefully live with. The idea that we're duplicating that > for > > > >>> > use in another fringe area of functionality does not thrill me at all. > > > >>> > > > > >>> > To be clear, I understand the problem that exists and that you > > wish > > > to > > > >>> > solve, but for the runtime parts I balk at the complexity cost of > > > >>> > solving it. > > > >>> > > > >>> I know it's complex, but by far no rocket science. > > > >>> > > > >>> Also I find it hard to imagine another fix for JDK-8233915 besides > > > changing > > > >> the JVM TI specification. > > > >>> > > > >>> Thanks, Richard. > > > >>> > > > >>> -----Original Message----- > > > >>> From: David Holmes > > > >>> Sent: Dienstag, 17. Dezember 2019 08:03 > > > >>> To: Reingruber, Richard ; > serviceability- > > > >> dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; > > > hotspot- > > > >> runtime-dev at openjdk.java.net; Vladimir Kozlov > > > (vladimir.kozlov at oracle.com) > > > >> > > > >>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better > > > Performance > > > >> in the Presence of JVMTI Agents > > > >>> > > > >>> > > > >>> > > > >>> David > > > >>> > > > >>> On 17/12/2019 4:57 pm, David Holmes wrote: > > > >>>> Hi Richard, > > > >>>> > > > >>>> On 14/12/2019 5:01 am, Reingruber, Richard wrote: > > > >>>>> Hi David, > > > >>>>> > > > >>>>>
> Some further queries/concerns: > > > >>>>> > > > > >>>>> > src/hotspot/share/runtime/objectMonitor.cpp > > > >>>>> > > > > >>>>> > Can you please explain the changes to ObjectMonitor::wait: > > > >>>>> > > > > >>>>> > ! _recursions = save // restore the old recursion count > > > >>>>> > ! + jt->get_and_reset_relock_count_after_wait(); // > > > >>>>> > increased by the deferred relock count > > > >>>>> > > > > >>>>> > what is the "deferred relock count"? I gather it relates to > > > >>>>> > > > > >>>>> > "The code was extended to be able to deoptimize objects of a > > > >>>>> frame that > > > >>>>> > is not the top frame and to let another thread than the owning > > > >>>>> thread do > > > >>>>> > it." > > > >>>>> > > > >>>>> Yes, these relate. Currently EA based optimizations are reverted, > > > when > > > >>>>> a compiled frame is replaced > > > >>>>> with corresponding interpreter frames. Part of this is relocking > > > >>>>> objects with eliminated > > > >>>>> locking. New with the enhancement is that we do this also just > > before > > > >>>>> object references are acquired > > > >>>>> through JVMTI. In this case we deoptimize also the owning > compiled > > > >>>>> frame C and we register > > > >>>>> deoptimized objects as deferred updates. When control returns to > > C > > > it > > > >>>>> gets deoptimized, we notice > > > >>>>> that objects are already deoptimized (reallocated and relocked), so > > > we > > > >>>>> don't do it again (relocking > > > >>>>> twice would be incorrect of course). Deferred updates are copied > > into > > > >>>>> the new interpreter frames. > > > >>>>> > > > >>>>> Problem: relocking is not possible if the target thread T is waiting > > > >>>>> on the monitor that needs to be > > > >>>>> relocked. This happens only with non-local objects with > > > >>>>> EliminateNestedLocks. Instead relocking is > > > >>>>> deferred until T owns the monitor again.
This is what the piece of > > > >>>>> code above does. > > > >>>> > > > >>>> Sorry I need some more detail here. How can you wait() on an > object > > > >>>> monitor if the object allocation and/or locking was optimised away? > > > And > > > >>>> what is a "non-local object" in this context? Isn't EA restricted to > > > >>>> thread-confined objects? > > > >>>> > > > >>>> Is it just that some of the locking gets optimized away e.g. > > > >>>> > > > >>>> synchronised(obj) { > > > >>>> synchronised(obj) { > > > >>>> synchronised(obj) { > > > >>>> obj.wait(); > > > >>>> } > > > >>>> } > > > >>>> } > > > >>>> > > > >>>> If this is reduced to a form as-if it were a single lock of the monitor > > > >>>> (due to EA) and the wait() triggers a JVM TI event which leads to the > > > >>>> escape of "obj" then we need to reconstruct the true lock state, and > > so > > > >>>> when the wait() internally unblocks and reacquires the monitor it > has > > to > > > >>>> set the true recursion count to 3, not the 1 that it appeared to be > > when > > > >>>> wait() was initially called. Is that the scenario? > > > >>>> > > > >>>> If so I find this truly awful. Anyone using wait() in a realistic form > > > >>>> requires a notification and so the object cannot be thread confined. > > In > > > >>>> which case I would strongly argue that upon hitting the wait() the > > > deopt > > > >>>> should occur unconditionally and so the lock state is correct before > > we > > > >>>> wait and so we don't need to mess with the recursion count > internally > > > >>>> when we reacquire the monitor. > > > >>>> > > > >>>>> > > > >>>>> > which I don't like the sound of at all when it comes to > > > >>>>> ObjectMonitor > > > >>>>> > state. So I'd like to understand in detail exactly what is going > > > >>>>> on here > > > >>>>> > and why. This is a very intrusive change that seems to badly > > > break > > > >>>>>
> encapsulation and impacts future changes to ObjectMonitor > > that > > > >>>>> are under > > > >>>>> > investigation. > > > >>>>> > > > >>>>> I would not regard this as breaking encapsulation. Certainly not > > badly. > > > >>>>> > > > >>>>> I've added a property relock_count_after_wait to JavaThread. The > > > >>>>> property is well > > > >>>>> encapsulated. Future ObjectMonitor implementations have to deal > > > with > > > >>>>> recursion too. They are free in > > > >>>>> choosing a way to do that as long as that property is taken into > > > >>>>> account. This is hardly a > > > >>>>> limitation. > > > >>>> > > > >>>> I do think this badly breaks encapsulation as you have to add a > callout > > > >>>> from the guts of the ObjectMonitor code to reach into the thread to > > > get > > > >>>> this lock count adjustment. I understand why you have had to do > this > > > but > > > >>>> I would much rather see a change to the EA optimisation strategy so > > > that > > > >>>> this is not needed. > > > >>>> > > > >>>>> Note also that the property is a straight forward extension of the > > > >>>>> existing concept of deferred > > > >>>>> local updates. It is embedded into the structure holding them. So > > not > > > >>>>> even the footprint of a > > > >>>>> JavaThread is enlarged if no deferred updates are generated. > > > >>>>> > > > >>>>> > --- > > > >>>>> > > > > >>>>> > src/hotspot/share/runtime/thread.cpp > > > >>>>> > > > > >>>>> > Can you please explain why > > > >>>>> JavaThread::wait_for_object_deoptimization > > > >>>>> > has to be handcrafted in this way rather than using proper > > > >>>>> transitions. > > > >>>>> > > > > >>>>> > > > >>>>> I wrote wait_for_object_deoptimization taking > > > >>>>> JavaThread::java_suspend_self_with_safepoint_check > > > >>>>> as template.
So in short: for the same reasons :) > > > >>>>> > > > >>>>> Threads reach both methods as part of thread state transitions, > > > >>>>> therefore special handling is > > > >>>>> required to change thread state on top of ongoing transitions. > > > >>>>> > > > >>>>> > We got rid of "deopt suspend" some time ago and it is > > disturbing > > > >>>>> to see > > > >>>>> > it being added back (effectively). This seems like it may be > > > >>>>> something > > > >>>>> > that handshakes could be used for. > > > >>>>> > > > >>>>> Deopt suspend used to be something rather different with a > similar > > > >>>>> name[1]. It is not being added back. > > > >>>> > > > >>>> I stand corrected. Despite comments in the code to the contrary > > > >>>> deopt_suspend didn't actually cause a self-suspend. I was doing a > lot > > of > > > >>>> cleanup in this area 13 years ago :) > > > >>>> > > > >>>>> > > > >>>>> I'm actually duplicating the existing external suspend mechanism, > > > >>>>> because a thread can be suspended > > > >>>>> at most once. And hey, I don't like that either! But it seems not > > > >>>>> unlikely that the duplicate can > > > >>>>> be removed together with the original and the new type of > > > handshakes > > > >>>>> that will be used for > > > >>>>> thread suspend can be used for object deoptimization too. See > > > today's > > > >>>>> discussion in JDK-8227745 [2]. > > > >>>> > > > >>>> I hope that discussion bears some fruit, at the moment it seems not > > to > > > >>>> be possible to use handshakes here. :( > > > >>>> > > > >>>> The external suspend mechanism is a royal pain in the proverbial > that > > > we > > > >>>> have to carefully live with. The idea that we're duplicating that for > > > >>>> use in another fringe area of functionality does not thrill me at all.
> > > >>>> > > > >>>> To be clear, I understand the problem that exists and that you wish > to > > > >>>> solve, but for the runtime parts I balk at the complexity cost of > > > >>>> solving it. > > > >>>> > > > >>>> Thanks, > > > >>>> David > > > >>>> ----- > > > >>>> > > > >>>>> Thanks, Richard. > > > >>>>> > > > >>>>> [1] Deopt suspend was something like an async. handshake for > > > >>>>> architectures with register windows, > > > >>>>> where patching the return pc for deoptimization of a compiled > > > >>>>> frame was racy if the owner thread > > > >>>>> was in native code. Instead a "deopt" suspend flag was set on > > > >>>>> which the thread patched its own > > > >>>>> frame upon return from native. So no thread was suspended. > It > > > got > > > >>>>> its name only from the name of > > > >>>>> the flags. > > > >>>>> > > > >>>>> [2] Discussion about using handshakes to sync. with the target > > thread: > > > >>>>> > > > >>>>> https://bugs.openjdk.java.net/browse/JDK-8227745?focusedCommentId=14306727&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14306727 > > > >>>>> > > > >>>>> > > > >>>>> -----Original Message----- > > > >>>>> From: David Holmes > > > >>>>> Sent: Freitag, 13. Dezember 2019 00:56 > > > >>>>> To: Reingruber, Richard ; > > > >>>>> serviceability-dev at openjdk.java.net; > > > >>>>> hotspot-compiler-dev at openjdk.java.net; > > > >>>>> hotspot-runtime-dev at openjdk.java.net > > > >>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better > > > >>>>> Performance in the Presence of JVMTI Agents > > > >>>>> > > > >>>>> Hi Richard, > > > >>>>> > > > >>>>> Some further queries/concerns: > > > >>>>> > > > >>>>> src/hotspot/share/runtime/objectMonitor.cpp > > > >>>>> > > > >>>>> Can you please explain the changes to ObjectMonitor::wait: > > > >>>>> > > > >>>>> ! _recursions = save // restore the old recursion count > > > >>>>> !
+ jt->get_and_reset_relock_count_after_wait(); // > > > >>>>> increased by the deferred relock count > > > >>>>> > > > >>>>> what is the "deferred relock count"? I gather it relates to > > > >>>>> > > > >>>>> "The code was extended to be able to deoptimize objects of a > > frame > > > that > > > >>>>> is not the top frame and to let another thread than the owning > > thread > > > do > > > >>>>> it." > > > >>>>> > > > >>>>> which I don't like the sound of at all when it comes to > ObjectMonitor > > > >>>>> state. So I'd like to understand in detail exactly what is going on > here > > > >>>>> and why.? This is a very intrusive change that seems to badly break > > > >>>>> encapsulation and impacts future changes to ObjectMonitor that > > are > > > under > > > >>>>> investigation. > > > >>>>> > > > >>>>> --- > > > >>>>> > > > >>>>> src/hotspot/share/runtime/thread.cpp > > > >>>>> > > > >>>>> Can you please explain why > > > JavaThread::wait_for_object_deoptimization > > > >>>>> has to be handcrafted in this way rather than using proper > > transitions. > > > >>>>> > > > >>>>> We got rid of "deopt suspend" some time ago and it is disturbing > to > > > see > > > >>>>> it being added back (effectively). This seems like it may be > > something > > > >>>>> that handshakes could be used for. > > > >>>>> > > > >>>>> Thanks, > > > >>>>> David > > > >>>>> ----- > > > >>>>> > > > >>>>> On 12/12/2019 7:02 am, David Holmes wrote: > > > >>>>>> On 12/12/2019 1:07 am, Reingruber, Richard wrote: > > > >>>>>>> Hi David, > > > >>>>>>> > > > >>>>>>> ??? > Most of the details here are in areas I can comment on in > > > detail, > > > >>>>>>> but I > > > >>>>>>> ??? > did take an initial general look at things. > > > >>>>>>> > > > >>>>>>> Thanks for taking the time! > > > >>>>>> > > > >>>>>> Apologies the above should read: > > > >>>>>> > > > >>>>>> "Most of the details here are in areas I *can't* comment on in > > detail > > > >>>>>> ..." 
> > > >>>>>> > > > >>>>>> David > > > >>>>>> > > > >>>>>>> ??? > The only thing that jumped out at me is that I think the > > > >>>>>>> ??? > DeoptimizeObjectsALotThread should be a hidden thread. > > > >>>>>>> ??? > > > > >>>>>>> ??? > +? bool is_hidden_from_external_view() const { return true; > > } > > > >>>>>>> > > > >>>>>>> Yes, it should. Will add the method like above. > > > >>>>>>> > > > >>>>>>> ??? > Also I don't see any testing of the > > > DeoptimizeObjectsALotThread. > > > >>>>>>> Without > > > >>>>>>> ??? > active testing this will just bit-rot. > > > >>>>>>> > > > >>>>>>> DeoptimizeObjectsALot is meant for stress testing with a larger > > > >>>>>>> workload. I will add a minimal test > > > >>>>>>> to keep it fresh. > > > >>>>>>> > > > >>>>>>> ??? > Also on the tests I don't understand your @requires clause: > > > >>>>>>> ??? > > > > >>>>>>> ??? >?? @requires ((vm.compMode != "Xcomp") & > > > vm.compiler2.enabled > > > >> & > > > >>>>>>> ??? > (vm.opt.TieredCompilation != true)) > > > >>>>>>> ??? > > > > >>>>>>> ??? > This seems to require that TieredCompilation is disabled, > but > > > >>>>>>> tiered is > > > >>>>>>> ??? > our normal mode of operation. ?? > > > >>>>>>> ??? > > > > >>>>>>> > > > >>>>>>> I removed the clause. I guess I wanted to target the tests > towards > > > the > > > >>>>>>> code they are supposed to > > > >>>>>>> test, and it's easier to analyze failures w/o tiered compilation > and > > > >>>>>>> with just one compiler thread. > > > >>>>>>> > > > >>>>>>> Additionally I will make use of > > > >>>>>>> compiler.whitebox.CompilerWhiteBoxTest.THRESHOLD in the > > tests. > > > >>>>>>> > > > >>>>>>> Thanks, > > > >>>>>>> Richard. > > > >>>>>>> > > > >>>>>>> -----Original Message----- > > > >>>>>>> From: David Holmes > > > >>>>>>> Sent: Mittwoch, 11. 
Dezember 2019 08:03 > > > >>>>>>> To: Reingruber, Richard ; > > > >>>>>>> serviceability-dev at openjdk.java.net; > > > >>>>>>> hotspot-compiler-dev at openjdk.java.net; > > > >>>>>>> hotspot-runtime-dev at openjdk.java.net > > > >>>>>>> Subject: Re: RFR(L) 8227745: Enable Escape Analysis for Better > > > >>>>>>> Performance in the Presence of JVMTI Agents > > > >>>>>>> > > > >>>>>>> Hi Richard, > > > >>>>>>> > > > >>>>>>> On 11/12/2019 7:45 am, Reingruber, Richard wrote: > > > >>>>>>>> Hi, > > > >>>>>>>> > > > >>>>>>>> I would like to get reviews please for > > > >>>>>>>> > > > >>>>>>>> > > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.3/ > > > >>>>>>>> > > > >>>>>>>> Corresponding RFE: > > > >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8227745 > > > >>>>>>>> > > > >>>>>>>> Fixes also https://bugs.openjdk.java.net/browse/JDK-8233915 > > > >>>>>>>> And potentially https://bugs.openjdk.java.net/browse/JDK- > > > 8214584 [1] > > > >>>>>>>> > > > >>>>>>>> Vladimir Kozlov kindly put webrev.3 through tier1-8 testing > > > without > > > >>>>>>>> issues (thanks!). In addition the > > > >>>>>>>> change is being tested at SAP since I posted the first RFR some > > > >>>>>>>> months ago. > > > >>>>>>>> > > > >>>>>>>> The intention of this enhancement is to benefit performance > > wise > > > from > > > >>>>>>>> escape analysis even if JVMTI > > > >>>>>>>> agents request capabilities that allow them to access local > > variable > > > >>>>>>>> values. E.g. if you start-up > > > >>>>>>>> with -agentlib:jdwp=transport=dt_socket,server=y,suspend=n, > > > then > > > >>>>>>>> escape analysis is disabled right > > > >>>>>>>> from the beginning, well before a debugger attaches -- if ever > > one > > > >>>>>>>> should do so. With the > > > >>>>>>>> enhancement, escape analysis will remain enabled until and > > after > > > a > > > >>>>>>>> debugger attaches. 
EA based > > > >>>>>>>> optimizations are reverted just before an agent acquires the > > > >>>>>>>> reference to an object. In the JBS item > > > >>>>>>>> you'll find more details. > > > >>>>>>> > > > >>>>>>> Most of the details here are in areas I can comment on in detail, > > but > > > I > > > >>>>>>> did take an initial general look at things. > > > >>>>>>> > > > >>>>>>> The only thing that jumped out at me is that I think the > > > >>>>>>> DeoptimizeObjectsALotThread should be a hidden thread. > > > >>>>>>> > > > >>>>>>> +? bool is_hidden_from_external_view() const { return true; } > > > >>>>>>> > > > >>>>>>> Also I don't see any testing of the DeoptimizeObjectsALotThread. > > > >>>>>>> Without > > > >>>>>>> active testing this will just bit-rot. > > > >>>>>>> > > > >>>>>>> Also on the tests I don't understand your @requires clause: > > > >>>>>>> > > > >>>>>>> ??? @requires ((vm.compMode != "Xcomp") & > > > vm.compiler2.enabled & > > > >>>>>>> (vm.opt.TieredCompilation != true)) > > > >>>>>>> > > > >>>>>>> This seems to require that TieredCompilation is disabled, but > > tiered > > > is > > > >>>>>>> our normal mode of operation. ?? > > > >>>>>>> > > > >>>>>>> Thanks, > > > >>>>>>> David > > > >>>>>>> > > > >>>>>>>> Thanks, > > > >>>>>>>> Richard. 
> > > >>>>>>>> > > > >>>>>>>> [1] Experimental fix for JDK-8214584 based on JDK-8227745 > > > >>>>>>>> > > > >> > > > > > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8214584/experiment_v1.pa > > > tc > > > >> h > > > >>>>>>>> > > > >>>>>>>> > > > >>>>>>>> From christian.hagedorn at oracle.com Mon Jul 13 07:19:37 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 13 Jul 2020 09:19:37 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> Message-ID: <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> Thank you Vladimir for your review! Best regards, Christian On 11.07.20 01:25, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/10/20 12:37 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8248552 >> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >> >> In the failing testcase, C2 removes a zero check for a division/modulo >> node n based on the type information of the loop induction variable >> phi p (always between 1 and 50 and never 0). However, n is later split >> through p and ends up after the AddNode which updates the induction >> variable p. In the last iteration j equals 2 and is then updated to 0. >> The division/modulo node n is now executed before the loop limit check >> which results in a SIGFPE. >> >> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or >> modulo node has its zero check removed (i.e. control in NULL) and is >> split through a phi which has an input that could be zero. This should >> only happen for an induction variable phi of a trip-counted (integer) >> loop. 
>> >> Best regards, >> Christian From christian.hagedorn at oracle.com Mon Jul 13 09:06:35 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 13 Jul 2020 11:06:35 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> Message-ID: Testing in a later tier revealed that the assertion code is actually too strong. There can be a Div/Mod node whose zero check was removed but that is then split through a non-induction-variable phi whose inputs have zero in their type range (which is fine, this happens in some loop opts after partial peeling was applied earlier). This happened, for example, for a phi which merged two nodes from the original and a cloned loop. I think we just need to remove the additional assertion code. New webrev: http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ Best regards, Christian On 13.07.20 09:19, Christian Hagedorn wrote: > Thank you Vladimir for your review! > > Best regards, > Christian > > On 11.07.20 01:25, Vladimir Kozlov wrote: >> Looks good. >> >> Thanks, >> Vladimir >> >> On 7/10/20 12:37 AM, Christian Hagedorn wrote: >>> Hi >>> >>> Please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8248552 >>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >>> >>> In the failing testcase, C2 removes a zero check for a >>> division/modulo node n based on the type information of the loop >>> induction variable phi p (always between 1 and 50 and never 0). >>> However, n is later split through p and ends up after the AddNode >>> which updates the induction variable p. In the last iteration j >>> equals 2 and is then updated to 0. The division/modulo node n is now >>> executed before the loop limit check which results in a SIGFPE. 
>>> >>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or >>> modulo node has its zero check removed (i.e. control in NULL) and is >>> split through a phi which has an input that could be zero. This >>> should only happen for an induction variable phi of a trip-counted >>> (integer) loop. >>> >>> Best regards, >>> Christian From aph at redhat.com Mon Jul 13 09:16:11 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 13 Jul 2020 10:16:11 +0100 Subject: [15] RFR(S): 8248845: AArch64: stack corruption after spilling vector register In-Reply-To: <854kqiqrt0.fsf@arm.com> References: <857dvfrev5.fsf@arm.com> <0eeec297-f2e1-e326-5d3a-eb4a11e47934@oracle.com> <854kqiqrt0.fsf@arm.com> Message-ID: <7acb43fb-abd2-d895-8f1b-2ab4aff140a2@redhat.com> On 08/07/2020 10:28, Nick Gasson wrote: > I wonder whether we should only do scheduling on AArch64 for in-order > CPUs? I tried SPECjvm with/without OptoScheduling on a few different > AArch64 systems but couldn't get conclusive results either way. Arm has always been difficult to performance tune because it's an architecture, not a processor. I didn't test on Arm's own designs at all for the first few years of AArch64. We schedule based on a conversation I had with Arm architects, which basically amounted to "Schedule for in-order cores and the out-of-order cores will look after themselves." I'd prefer not to disable scheduling simply because it's buggy; that feels wrong to me. But if it's making things worse, then we can do so. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From tobias.hartmann at oracle.com Mon Jul 13 10:07:10 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 13 Jul 2020 12:07:10 +0200 Subject: [15] RFR (S): 8247502: PhaseStringOpts crashes while optimising effectively dead code In-Reply-To: References: <9ee563ef-501b-bdaa-4e87-8e9e8aaf2dd7@oracle.com> Message-ID: <3beee001-0e4f-c92f-3746-74c8ed6bb043@oracle.com> +1 Thanks for taking care of this while I'm on vacation! Best regards, Tobias On 11.07.20 01:19, Vladimir Kozlov wrote: > I agree with this small fix. > > Thanks, > Vladimir > > On 7/10/20 9:26 AM, Vladimir Ivanov wrote: >> https://bugs.openjdk.java.net/browse/JDK-8247502 >> http://cr.openjdk.java.net/~vlivanov/8247502/webrev.00/ >> >> As Tobias discovered, PhaseStringOpts crashes when it encounters String::append() argument being >> TOP: TOP is a constant, but the code expects to see a String constant instead. >> >> It happens while processing a call in unreachable infinite loop. The code is effectively dead, but >> IGVN and PhaseRemoveUseless don't see that. It is discovered later when loop opts kick in which >> clean it up. >> >> Proposed fix tries to make the code more robust and just bails out the optimization when TOP is >> encountered. >> >> Alternative way to fix the problem would be to clean up the graph before PhaseStringOpts (e.g., by >> running PhaseIdealLoop(LoopOptsNone) since PhaseRemoveUseless is not enough), but PhaseIdealLoop >> pass can be expensive. So, I'm in favor of the local fix in PhaseStringOpts. >> >> Testing: crash reproducer, hs-precheckin-comp, hs-tier1, hs-tier2, tier1 >> >> Thanks! >> >> PS: no regression test since I wasn't able to extract a simple reproducer from the crash log. 
>> >> Best regards, >> Vladimir Ivanov From christian.hagedorn at oracle.com Mon Jul 13 10:09:51 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 13 Jul 2020 12:09:51 +0200 Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs In-Reply-To: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> Message-ID: <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> Ping - could anyone review it, please? Thanks! Best regards, Christian On 02.07.20 09:33, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8247743 > http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ > > The testcase creates a deep graph with a lot of nodes on a chain. When > running with the specified test flags, it recursively calls > Node::find_recur() for each node discovered which eventually results in > a segmentation fault due to a stack overflow (around 10000 calls due to > such a long chain of nodes). The fix just converts the recursive > algorithm into an iterative one to avoid a segmentation fault. This is > similar to JDK-8246203 [1]. > > I additionally removed Node::find_ctrl() and its special handling in the > algorithm since it is not used. > > There is actually another problem with the recursive version. When > running the testcase without -XX:CompileOnly=compiler/c2/TestFindNode, > it will spin forever inside [2] because there is a debug_orig node cycle > and the loop does not break based on the debug_orig nodes being visited. > This is also fixed in the patch. > > Thank you! 
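For illustration only, the recursive-to-iterative conversion described above can be sketched roughly as follows. This is not the code from the webrev: GraphNode and find_node are hypothetical stand-ins for HotSpot's Node and its find routine, and the visited set is simply one assumed way to handle the kind of node cycle mentioned above.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a compiler graph node; not HotSpot's Node class.
struct GraphNode {
  int idx = 0;
  std::vector<GraphNode*> inputs;
};

// Search for the node with index `idx` that is reachable from `start`.
// An explicit worklist replaces recursion, so the native stack does not
// grow with the length of a node chain. The visited set guarantees
// termination even if the graph contains cycles.
const GraphNode* find_node(const GraphNode* start, int idx) {
  if (start == nullptr) {
    return nullptr;
  }
  std::vector<const GraphNode*> worklist{start};
  std::unordered_set<const GraphNode*> visited{start};
  while (!worklist.empty()) {
    const GraphNode* n = worklist.back();
    worklist.pop_back();
    if (n->idx == idx) {
      return n;
    }
    for (const GraphNode* in : n->inputs) {
      // insert() reports whether the node was seen for the first time.
      if (in != nullptr && visited.insert(in).second) {
        worklist.push_back(in);
      }
    }
  }
  return nullptr;  // no reachable node has this index
}
```

The trade-off in this sketch is heap memory for the worklist and visited set in exchange for a bounded native stack depth.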
> > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/browse/JDK-8246203 > [2] > http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589 > From vladimir.x.ivanov at oracle.com Mon Jul 13 13:40:36 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Mon, 13 Jul 2020 16:40:36 +0300 Subject: [15] RFR (S): 8247502: PhaseStringOpts crashes while optimising effectively dead code In-Reply-To: <3beee001-0e4f-c92f-3746-74c8ed6bb043@oracle.com> References: <9ee563ef-501b-bdaa-4e87-8e9e8aaf2dd7@oracle.com> <3beee001-0e4f-c92f-3746-74c8ed6bb043@oracle.com> Message-ID: <949b0726-d36a-c103-3632-2f31390cefdb@oracle.com> Thanks for the reviews, Vladimir & Tobias. Best regards, Vladimir Ivanov On 13.07.2020 13:07, Tobias Hartmann wrote: > +1 > > Thanks for taking care of this while I'm on vacation! > > Best regards, > Tobias > > > On 11.07.20 01:19, Vladimir Kozlov wrote: >> I agree with this small fix. >> >> Thanks, >> Vladimir >> >> On 7/10/20 9:26 AM, Vladimir Ivanov wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8247502 >>> http://cr.openjdk.java.net/~vlivanov/8247502/webrev.00/ >>> >>> As Tobias discovered, PhaseStringOpts crashes when it encounters String::append() argument being >>> TOP: TOP is a constant, but the code expects to see a String constant instead. >>> >>> It happens while processing a call in unreachable infinite loop. The code is effectively dead, but >>> IGVN and PhaseRemoveUseless don't see that. It is discovered later when loop opts kick in which >>> clean it up. >>> >>> Proposed fix tries to make the code more robust and just bails out the optimization when TOP is >>> encountered. >>> >>> Alternative way to fix the problem would be to clean up the graph before PhaseStringOpts (e.g., by >>> running PhaseIdealLoop(LoopOptsNone) since PhaseRemoveUseless is not enough), but PhaseIdealLoop >>> pass can be expensive. So, I'm in favor of the local fix in PhaseStringOpts. 
>>> >>> Testing: crash reproducer, hs-precheckin-comp, hs-tier1, hs-tier2, tier1 >>> >>> Thanks! >>> >>> PS: no regression test since I wasn't able to extract a simple reproducer from the crash log. >>> >>> Best regards, >>> Vladimir Ivanov From beurba at microsoft.com Mon Jul 13 14:03:57 2020 From: beurba at microsoft.com (Bernhard Urban-Forster) Date: Mon, 13 Jul 2020 14:03:57 +0000 Subject: RFR(XS) 8248671: AArch64: Remove unused variables In-Reply-To: <1c652b56-2476-ede0-47f8-13c4e99639d0@oracle.com> References: <1c652b56-2476-ede0-47f8-13c4e99639d0@oracle.com> Message-ID: Thank you for your review Andrew and David. Here is the webrev based on https://hg.openjdk.java.net/jdk/jdk/: http://cr.openjdk.java.net/~burban/8248671_hg/ Thanks, -Bernhard > -----Original Message----- > From: David Holmes > Sent: Monday, July 13, 2020 6:08 AM > To: Bernhard Urban-Forster ; aarch64-port-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(XS) 8248671: AArch64: Remove unused variables > > Hi Bernhard, > > On 10/07/2020 7:08 am, Bernhard Urban-Forster wrote: > > Hello everyone, > > > > > > please review this change: > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8248671 > > Webrev: http://cr.openjdk.java.net/~burban/8248671_unused-vars/ > > > > We found this issue while bringing up Windows+AArch64 support 
for > HotSpot. The Microsoft toolchain (MSVC) seems to be slightly more pedantic > than GCC. > > Looks good and trivial. > > But could I request that webrevs/patches for mainline be generated against the > mainline hg repository rather than the git mirror. > > Thanks, > David > > > > > Thanks, > > -Bernhard > > From patric.hedlin at oracle.com Mon Jul 13 15:33:50 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 13 Jul 2020 17:33:50 +0200 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> Message-ID: Hi Andrew, On 2020-07-09 16:26, Andrew Haley wrote: > On 07/07/2020 12:17, Patric Hedlin wrote: >> C1 code generation for reading and writing stack-slots does not handle >> large immediate offsets on aarch64. This patch will ensure that >> immediate offsets are admissible for base+(immediate)offset encoding or, >> if this is not the case, will enforce an explicit address calculation to >> a scratch register. (Also correcting a small glitch in 9-bit signed >> immediate encoding check.) >> >> NOTE: Current patch includes (local) definitions of is_simm/9 and >> is_uimm/12, for review purpose only. With JDK-8248901 these will move to >> Assembler, and will not be included in the change-set. > Umm, OK. These functions seem too complicated: all you have to do is > > int64_t chk = val >> (nbits - 1); > guarantee (chk == -1 || chk == 0, "Field too big for insn"); The 'guarantee' of course works poorly as a predicate and the 'chk' calculation is based on implementation-dependent behaviour. > but the AArch64 part of it looks fine. Having seen your second answer, I guess you had a change of heart. What a pity. 
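As a rough illustration of the point about predicates, range checks of this kind can be written as plain comparisons that return a bool and do not depend on how a negative value is right-shifted. This is only a sketch: fits_simm and fits_uimm are hypothetical names, not the is_simm/is_uimm definitions from the webrev or from JDK-8248901.

```cpp
#include <cstdint>

// Hypothetical helper, not HotSpot code.
// True if `val` fits in an nbits-wide two's-complement signed field,
// i.e. -2^(nbits-1) <= val < 2^(nbits-1).
inline bool fits_simm(int64_t val, unsigned nbits) {
  if (nbits == 0) return false;   // an empty field holds nothing
  if (nbits >= 64) return true;   // every int64_t fits
  const int64_t limit = int64_t(1) << (nbits - 1);  // 2^(nbits-1), no overflow for nbits <= 63
  return -limit <= val && val < limit;
}

// Hypothetical helper, not HotSpot code.
// True if `val` fits in an nbits-wide unsigned field, i.e. val < 2^nbits.
inline bool fits_uimm(uint64_t val, unsigned nbits) {
  // Right-shifting an unsigned value is well defined for shift counts below 64.
  return nbits >= 64 || (val >> nbits) == 0;
}
```

With these definitions a 9-bit signed field accepts -256 through 255 and a 12-bit unsigned field accepts 0 through 4095, so a caller can test an offset first and fall back to an explicit address calculation when the test fails.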
/Patric From aph at redhat.com Mon Jul 13 17:06:34 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 13 Jul 2020 18:06:34 +0100 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> Message-ID: <6656d2d2-053b-63c1-e3b8-878600313bc3@redhat.com> Hi, On 13/07/2020 16:33, Patric Hedlin wrote: > > On 2020-07-09 16:26, Andrew Haley wrote: >> On 07/07/2020 12:17, Patric Hedlin wrote: >>> C1 code generation for reading and writing stack-slots does not handle >>> large immediate offsets on aarch64. This patch will ensure that >>> immediate offsets are admissible for base+(immediate)offset encoding or, >>> if this is not the case, will enforce an explicit address calculation to >>> a scratch register. (Also correcting a small glitch in 9-bit signed >>> immediate encoding check.) >>> >>> NOTE: Current patch includes (local) definitions of is_simm/9 and >>> is_uimm/12, for review purpose only. With JDK-8248901 these will move to >>> Assembler, and will not be included in the change-set. >> Umm, OK. These functions seem too complicated: all you have to do is >> >> int64_t chk = val >> (nbits - 1); >> guarantee (chk == -1 || chk == 0, "Field too big for insn"); > The 'guarantee' of course works poorly as a predicate and the 'chk' > calculation is based on implementation-dependent behaviour. Hmm. Signed >> does require a 2's complement C implementation, but we assume many implementation-defined things in HotSpot. I know, if we can remove such things perhaps we should, all other things being equal. >> but the AArch64 part of it looks fine. > Having seen your second answer, I guess you had a change of > heart. What a pity. Of course I want this bug fixed, and I'm grateful for this patch. 
However, we already have an equivalent overflow test in a couple of (a few, probably) different places; and your patch adds another one. We shouldn't be doing that. I'm quite open to doing it in another way, thereby replacing the existing logic, but not to duplicating code. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.kozlov at oracle.com Mon Jul 13 17:16:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 13 Jul 2020 10:16:31 -0700 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> Message-ID: This raises a question: why was the zero check removed if one of the merged types includes 0? Should we be more careful when we remove a zero check? Thanks, Vladimir On 7/13/20 2:06 AM, Christian Hagedorn wrote: > A test in some later tier testing revealed that the assertion code is actually too strong. There can be a Div/Mod node > whose zero check was removed but that is then split through a non-induction-variable phi whose inputs have zero in their > type range (which is fine, this happens in some loop opts after partial peeling was applied earlier). This happened, for > example, for a phi which merged two nodes from the original and a cloned loop. I think we just need to remove the > additional assertion code. > > New webrev: > http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ > > Best regards, > Christian > > On 13.07.20 09:19, Christian Hagedorn wrote: >> Thank you Vladimir for your review! >> >> Best regards, >> Christian >> >> On 11.07.20 01:25, Vladimir Kozlov wrote: >>> Looks good. 
>>> >>> Thanks, >>> Vladimir >>> >>> On 7/10/20 12:37 AM, Christian Hagedorn wrote: >>>> Hi >>>> >>>> Please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8248552 >>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >>>> >>>> In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of the >>>> loop induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p and ends up >>>> after the AddNode which updates the induction variable p. In the last iteration j equals 2 and is then updated to 0. >>>> The division/modulo node n is now executed before the loop limit check which results in a SIGFPE. >>>> >>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed (i.e. >>>> control in NULL) and is split through a phi which has an input that could be zero. This should only happen for an >>>> induction variable phi of a trip-counted (integer) loop. >>>> >>>> Best regards, >>>> Christian From vladimir.kozlov at oracle.com Mon Jul 13 17:43:36 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 13 Jul 2020 10:43:36 -0700 Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs In-Reply-To: <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> Message-ID: <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com> Node::find_ctrl() is used during debugging when you want to print and look at only control nodes. We have several such methods which are only used in the debugger. I suggest storing old_arena() in a local variable and passing it into add_to_worklist(). You can make add_to_worklist() static since you pass the node as an argument. Thanks, Vladimir On 7/13/20 3:09 AM, Christian Hagedorn wrote: > Ping - could anyone review it, please? Thanks! 
> > Best regards, > Christian > > On 02.07.20 09:33, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8247743 >> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ >> >> The testcase creates a deep graph with a lot of nodes on a chain. When running with the specified test flags, it >> recursively calls Node::find_recur() for each node discovered which eventually results in a segmentation fault due to >> a stack overflow (around 10000 calls due to such a long chain of nodes). The fix just converts the recursive algorithm >> into an iterative one to avoid a segmentation fault. This is similar to JDK-8246203 [1]. >> >> I additionally removed Node::find_ctrl() and its special handling in the algorithm since it is not used. >> >> There is actually another problem with the recursive version. When running the testcase without >> -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever inside [2] because there is a debug_orig node cycle and >> the loop does not break based on the debug_orig nodes being visited. This is also fixed in the patch. >> >> Thank you! >> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8246203 >> [2] http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589 From ekaterina.pavlova at oracle.com Mon Jul 13 19:38:10 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Mon, 13 Jul 2020 12:38:10 -0700 Subject: RFR[15] (T/XS): 8236809 [Graal] java/lang/Class/getDeclaredField/FieldSetAccessibleTest.java timeouts Message-ID: Hi all, please review this small change which adds the test into ProblemList-graal.txt till we have libgraal. 
JBS: https://bugs.openjdk.java.net/browse/JDK-8236809 webrev: http://cr.openjdk.java.net/~epavlova//8236809/webrev.00/index.html regards, -katya From vladimir.kozlov at oracle.com Mon Jul 13 21:25:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 13 Jul 2020 14:25:40 -0700 Subject: RFR[15] (T/XS): 8236809 [Graal] java/lang/Class/getDeclaredField/FieldSetAccessibleTest.java timeouts In-Reply-To: References: Message-ID: <0816f8ed-4122-39ea-3806-0fba1a51d34a@oracle.com> Good. Thanks, Vladimir K On 7/13/20 12:38 PM, Ekaterina Pavlova wrote: > Hi all, > > please review this small change which adds the test into ProblemList-graal.txt > till we have libgraal. > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8236809 > webrev: http://cr.openjdk.java.net/~epavlova//8236809/webrev.00/index.html > > > regards, > -katya From igor.ignatyev at oracle.com Mon Jul 13 21:29:18 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 13 Jul 2020 14:29:18 -0700 Subject: RFR [15] : 8249036 : clean up FileInstaller $test.src $cwd in vmTestbase_nsk_stress tests Message-ID: <5E2ED18E-9CD6-44D6-95D0-E13D1AFC1BC3@oracle.com> http://cr.openjdk.java.net/~iignatyev//8249036/webrev.00/ > 44 lines changed: 0 ins; 23 del; 21 mod; Hi all, could you please review this clean-up which removes unnecessary `FileInstaller` actions for :vmTestbase_nsk_stress tests? from the main issue(8204985): > all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need to first identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests. none of the vmTestbase_nsk_stress tests need FileInstaller, hence the patch is just `ag -l '@run driver jdk.test.lib.FileInstaller . .' vmTestbase/nsk/stress | xargs -I{} gsed -i '/@run driver jdk.test.lib.FileInstaller \. \./d' {}`. 
JBS: https://bugs.openjdk.java.net/browse/JDK-8249036 webrev: http://cr.openjdk.java.net/~iignatyev//8249036/webrev.00/ testing: :vmTestbase_nsk_stress on linux-x64 Thanks, -- Igor From vladimir.kozlov at oracle.com Mon Jul 13 21:34:06 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 13 Jul 2020 14:34:06 -0700 Subject: RFR [15] : 8249036 : clean up FileInstaller $test.src $cwd in vmTestbase_nsk_stress tests In-Reply-To: <5E2ED18E-9CD6-44D6-95D0-E13D1AFC1BC3@oracle.com> References: <5E2ED18E-9CD6-44D6-95D0-E13D1AFC1BC3@oracle.com> Message-ID: <5720357e-c3c6-6f7f-7993-535561fd84e2@oracle.com> Good. Thanks, Vladimir K On 7/13/20 2:29 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8249036/webrev.00/ >> 44 lines changed: 0 ins; 23 del; 21 mod; > > Hi all, > > could you please review this clean-up which removes unnecessary `FileInstaller` actions for :vmTestbase_nsk_stress tests? > from the main issue(8204985): >> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need to first identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests. > > none of the vmTestbase_nsk_stress tests need FileInstaller, hence the patch is just `ag -l '@run driver jdk.test.lib.FileInstaller . .' vmTestbase/nsk/stress | xargs -I{} gsed -i '/@run driver jdk.test.lib.FileInstaller \. \./d' {}`. 
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8249036
> webrev: http://cr.openjdk.java.net/~iignatyev//8249036/webrev.00/
> testing: :vmTestbase_nsk_stress on linux-x64
>
> Thanks,
> -- Igor
>

From jamsheed.c.m at oracle.com Tue Jul 14 08:28:33 2020
From: jamsheed.c.m at oracle.com (Jamsheed C M)
Date: Tue, 14 Jul 2020 13:58:33 +0530
Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361
In-Reply-To: <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com>
References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com>
Message-ID: <46144d6d-5714-05ad-a263-01507db937cc@oracle.com>

Hi all,

I had incorrectly added an extra check to the assert after the offset computation in address_offset. For addps with non-constant offsets (like [1])

Not changing the old assert even though I am not expecting the first addp/second addp (for array addressing) case for an init captured store.

http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/

Best regards,

Jamsheed

[1]

      assert(offs != Type::OffsetBot ||
-            adr->in(AddPNode::Address)->in(0)->is_AllocateArray(),
+            adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || is_captured_store(adr),
             "offset must be a constant or it is initialization of array");

On 13/07/2020 11:14, Jamsheed C M wrote:
>
> Hi,
>
> I reworked the fix. I compute offset for all init captures stores, but
> treats this special init captured stores similar to unsafe(as these
> objects are usually GlobalEscape and doesn't have any perf implications).
>
> revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/
>
> testing: mach1-5( logs in jbs)
>
> Best regards,
>
> Jamsheed
>
> On 09/07/2020 19:36, Jamsheed C M wrote:
>>
>> Hi,
>>
>> request to hold the review. need to change the code for dealing with
>> unsafe access. as current capture code go for more execution time
>> analyzing things.
>>
>> Best regards,
>>
>> Jamsheed
>>
>> On 09/07/2020 13:01, Jamsheed C M wrote:
>>>
>>> Hi all,
>>>
>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895
>>>
>>> Request for review changes made to offset computation and field
>>> write detection for init captured stores due to phis addition
>>> between alloc and init. This happen if init node in different outer
>>> loop wrt to alloc node and there is a loop opt. This was required
>>> as a result of enhancement [1].
>>>
>>> Normally init are not associated with multiple alloc node during EA
>>> phase, but changes done for [1] caused the code shapes of the form
>>> [2] to generate inits associated with multiple alloc node.
>>>
>>> This had implication in offset computation and field write detection
>>> related to initializing stores.
>>>
>>> Attempt to fix in EA:
>>>
>>>      webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/
>>>
>>> Alternate fix:
>>>
>>>      Minimize the scenario in compiler generated code by throwing
>>> only j.l.Error from slowpath(all exception async/sync are handled in
>>> runtime exit).
>>>
>>>      Stub epilog doesn't poll or throw any exceptions. Disable full
>>> loop opt before EA for detectable patterns and bailout EA for late
>>> detected patterns.
>>>
>>>      webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/
>>>
>>> Please advice.
>>>
>>> Testing : mach tier1-5 (logs in jbs)
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>>
>>> [1] JDK-8231291 C2: loop opts before EA should maximally unroll loops
>>>
>>> [2] that have its init node in different outer loop wrt to alloc node.
>>>
>>>
>>> loop begin
>>>
>>>    try{
>>>
>>>    return new obj()/ throw new obj()/ uncommon trap after
>>> allocation, in a loop
>>>
>>>    } catch(ex) {
>>>
>>>
   }
>>>
>>> loop end
>>>
>>>     public static IntA test(int n) {
>>>         for (int i=0; i<2; i++) {
>>>             try {
>>>                 return new IntA(n + i);
>>>             } catch (Exception e) {
>>>             }
>>>         }
>>>
>>>

From christian.hagedorn at oracle.com Tue Jul 14 09:54:01 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Tue, 14 Jul 2020 11:54:01 +0200
Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs
In-Reply-To: <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com>
References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com>
Message-ID: <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com>

Hi Vladimir

On 13.07.20 19:43, Vladimir Kozlov wrote:
> Node::find_ctrl() is used during debugging when you want to print and
> look on only control nodes.
> We have several such methods which are only used in debugger.

I see, I restored this method and changed Node::find() accordingly. I additionally added two find_ctrl() methods to make it easier to call it from a debugger (as already present for find_node()).

> I suggest to store old_arena() in local var and pass into
> add_to_worklist().
>
> You can make add_to_worklist() static since you pass node as argument.

Okay. I updated this and the change above in a new webrev:
http://cr.openjdk.java.net/~chagedorn/8247743/webrev.01/

Best regards,
Christian

> Thanks,
> Vladimir
>
> On 7/13/20 3:09 AM, Christian Hagedorn wrote:
>> Ping - could anyone review it, please? Thanks!
>>
>> Best regards,
>> Christian
>>
>> On 02.07.20 09:33, Christian Hagedorn wrote:
>>> Hi
>>>
>>> Please review the following patch:
>>> https://bugs.openjdk.java.net/browse/JDK-8247743
>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/
>>>
>>> The testcase creates a deep graph with a lot of nodes on a chain.
>>> When running with the specified test flags, it recursively calls >>> Node::find_recur() for each node discovered which eventually results >>> in a segmentation fault due to a stack overflow (around 10000 calls >>> due to such a long chain of nodes). The fix just converts the >>> recursive algorithm into an iterative one to avoid a segmentation >>> fault. This is similar to JDK-8246203 [1]. >>> >>> I additionally removed Node::find_ctrl() and its special handling in >>> the algorithm since it is not used. >>> >>> There is actually another problem with the recursive version. When >>> running the testcase without >>> -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever inside >>> [2] because there is a debug_orig node cycle and the loop does not >>> break based on the debug_orig nodes being visited. This is also fixed >>> in the patch. >>> >>> Thank you! >>> >>> Best regards, >>> Christian >>> >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8246203 >>> [2] >>> http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589 >>> From christian.hagedorn at oracle.com Tue Jul 14 12:32:19 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 14 Jul 2020 14:32:19 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> Message-ID: Hi Vladimir I had a closer look at the failing testcase with webrev.00. The original DivNode has its zero check removed based on correct type information. Afterwards its split through an induction variable phi for which both inputs have non-zero types. So, the DivNode end up after an AddINode (which adds a positive constant) which has a non-zero type. All good so far. 

Now we add pre/main/post loops and the induction variable phi for the pre-loop gets type int>=1 since the limit for the pre-loop is hidden behind an Opaque1 node which just returns int as type. The AddINode belonging to the loop induction variable phi in the pre-loop is therefore updated to have the type int as well (int>=1 + positive_int could overflow). This type information propagates to the main-loop and its AddINode belonging to the loop induction variable phi (which is an input to the DivNode) also gets its type set to int.

Later, we add a vector post loop where we clone the main loop and add a phi p for the AddINode node and its new clone. Since the DivINode has a control outside of the main loop, it is not cloned and gets the phi p as an input. At a later point in time, we want to split through p. But then we detect zero as a possible value due to the type range of both AddINodes being int.

Even though the type information is not accurate enough, the divisor of the DivINode is never zero and we could safely apply the split through the phi. We could think about doing a bail out for all kinds of phis but I think it should only be an actual problem for loop induction variable phis.

Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that cause other problems? I guess there probably must be a reason why we don't do it like that.

Best regards,
Christian

On 13.07.20 19:16, Vladimir Kozlov wrote:
> This rise question: why zero check was removed if one of merged types
> has 0?
> Should we be more careful when we remove zero check?
>
> Thanks,
> Vladimir
>
> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>> A test in some later tier testing revealed that the assertion code is
>> actually too strong.
There can be a Div/Mod node whose zero check was >> removed but that is then spilt through a non-induction-variable phi >> whose inputs have zero in their type range (which is fine, this >> happens in some loop opts after partial peeling was applied earlier). >> This happened, for example, for a phi which merged two nodes from the >> original and a cloned loop. I think we just need to remove the >> additional assertion code. >> >> New webrev: >> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ >> >> Best regards, >> Christian >> >> On 13.07.20 09:19, Christian Hagedorn wrote: >>> Thank you Vladimir for your review! >>> >>> Best regards, >>> Christian >>> >>> On 11.07.20 01:25, Vladimir Kozlov wrote: >>>> Looks good. >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote: >>>>> Hi >>>>> >>>>> Please review the following patch: >>>>> https://bugs.openjdk.java.net/browse/JDK-8248552 >>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >>>>> >>>>> In the failing testcase, C2 removes a zero check for a >>>>> division/modulo node n based on the type information of the loop >>>>> induction variable phi p (always between 1 and 50 and never 0). >>>>> However, n is later split through p and ends up after the AddNode >>>>> which updates the induction variable p. In the last iteration j >>>>> equals 2 and is then updated to 0. The division/modulo node n is >>>>> now executed before the loop limit check which results in a SIGFPE. >>>>> >>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division >>>>> or modulo node has its zero check removed (i.e. control in NULL) >>>>> and is split through a phi which has an input that could be zero. >>>>> This should only happen for an induction variable phi of a >>>>> trip-counted (integer) loop. 
>>>>> >>>>> Best regards, >>>>> Christian From christian.hagedorn at oracle.com Tue Jul 14 12:39:32 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Tue, 14 Jul 2020 14:39:32 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> Message-ID: > [..] Since the DivINode has a control outside of the main loop [..] Edit: I actually meant that get_ctrl() returns a node outside of the main-loop (i.e. the DivINode is not part of the main-loop body). The DivINode still has NULL as control input. Best regards, Christian On 14.07.20 14:32, Christian Hagedorn wrote: > Hi Vladimir > > I had a closer look at the failing testcase with webrev.00. The original > DivNode has its zero check removed based on correct type information. > Afterwards its split through an induction variable phi for which both > inputs have non-zero types. So, the DivNode end up after an AddINode > (which adds a positive constant) which has a non-zero type. All good so > far. > > Now we add pre/main/post loops and the induction variable phi for the > pre-loop gets type int>=1 since the limit for the pre-loop is hidden > behind an Opaque1 node which just returns int as type. The AddINode > belonging to the loop induction variable phi in the pre-loop is > therefore updated to have the type int as well (int>=1 + positive_int > could overflow). This type information propagates to the main-loop and > its AddINode belonging to the loop induction variable phi (which is an > input to the DivNode) also gets its type set to int. > > Later, we add a vector post loop where we clone the main loop and add a > phi p for the the AddINode node and its new clone. Since the DivINode > has a control outside of the main loop, it is not cloned and gets the > phi p as an input. 
At a later point in time, we want to split through p. > But then we detect zero as possible value due to the type range of both > AddINodes being int. > > Even though the type information is not accurate enough, the DivINode is > never zero and we could safely apply the split through the phi. We could > think about doing a bail out for all kinds of phis but I think it should > only be an actual problem for loop induction variable phis. > > Thinking about this type propagation problem, couldn't we somehow set > the type of the Opaque1 node hiding the pre-loop limit to the same type > as the pre-loop limit to allow this information to flow to the pre and > main loop? Or would that cause other problems? I guess there probably > must be a reason why we don't do it like that. > > Best regards, > Christian > > On 13.07.20 19:16, Vladimir Kozlov wrote: >> This rise question: why zero check was removed if one of merged types >> has 0? >> Should we be more careful when we remove zero check? >> >> Thanks, >> Vladimir >> >> On 7/13/20 2:06 AM, Christian Hagedorn wrote: >>> A test in some later tier testing revealed that the assertion code is >>> actually too strong. There can be a Div/Mod node whose zero check was >>> removed but that is then spilt through a non-induction-variable phi >>> whose inputs have zero in their type range (which is fine, this >>> happens in some loop opts after partial peeling was applied earlier). >>> This happened, for example, for a phi which merged two nodes from the >>> original and a cloned loop. I think we just need to remove the >>> additional assertion code. >>> >>> New webrev: >>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ >>> >>> Best regards, >>> Christian >>> >>> On 13.07.20 09:19, Christian Hagedorn wrote: >>>> Thank you Vladimir for your review! >>>> >>>> Best regards, >>>> Christian >>>> >>>> On 11.07.20 01:25, Vladimir Kozlov wrote: >>>>> Looks good. 
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote: >>>>>> Hi >>>>>> >>>>>> Please review the following patch: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552 >>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >>>>>> >>>>>> In the failing testcase, C2 removes a zero check for a >>>>>> division/modulo node n based on the type information of the loop >>>>>> induction variable phi p (always between 1 and 50 and never 0). >>>>>> However, n is later split through p and ends up after the AddNode >>>>>> which updates the induction variable p. In the last iteration j >>>>>> equals 2 and is then updated to 0. The division/modulo node n is >>>>>> now executed before the loop limit check which results in a SIGFPE. >>>>>> >>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division >>>>>> or modulo node has its zero check removed (i.e. control in NULL) >>>>>> and is split through a phi which has an input that could be zero. >>>>>> This should only happen for an induction variable phi of a >>>>>> trip-counted (integer) loop. >>>>>> >>>>>> Best regards, >>>>>> Christian From igor.ignatyev at oracle.com Tue Jul 14 18:25:44 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 14 Jul 2020 11:25:44 -0700 Subject: RFR [15] : 8249036 : clean up FileInstaller $test.src $cwd in vmTestbase_nsk_stress tests In-Reply-To: <5720357e-c3c6-6f7f-7993-535561fd84e2@oracle.com> References: <5E2ED18E-9CD6-44D6-95D0-E13D1AFC1BC3@oracle.com> <5720357e-c3c6-6f7f-7993-535561fd84e2@oracle.com> Message-ID: Thanks Vladimir, pushed to jdk15. -- Igor > On Jul 13, 2020, at 2:34 PM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir K > > On 7/13/20 2:29 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8249036/webrev.00/ >>> 44 lines changed: 0 ins; 23 del; 21 mod; >> Hi all, >> could you please review this clean-up which removes unnecessary `FileInstaller` actions for :vmTestbase_nsk_stress tests? 
>> from the main issue(8204985):
>>> all vmTestbase tests have '@run driver jdk.test.lib.FileInstaller . .' to mimic old test harness behavior and copy all files from a test source directory to a current work directory. some tests depend on this step, so we need 1st identify such tests and then either rewrite them not to have this dependency or leave FileInstaller only in these tests.
>> none of vmTestbase_nsk_stress tests need FileInstaller, hence the patch is just `ag -l '@run driver jdk.test.lib.FileInstaller . .' vmTestbase/nsk/stress | xargs -I{} gsed -i '/@run driver jdk.test.lib.FileInstaller \. \./d' {}`.
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249036
>> webrev: http://cr.openjdk.java.net/~iignatyev//8249036/webrev.00/
>> testing: :vmTestbase_nsk_stress on linux-x64
>> Thanks,
>> -- Igor

From vladimir.kozlov at oracle.com Tue Jul 14 18:46:32 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Jul 2020 11:46:32 -0700
Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs
In-Reply-To: <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com>
References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com> <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com>
Message-ID: <7cfafcb9-6232-5738-6cad-508127fd31e8@oracle.com>

Can you move the following up to where the other small find*() methods are defined?:

+Node* Node::find_ctrl(int idx) {
+  return find(idx, true);
 }

Also add a '// not PRODUCT' comment to the #endif for #ifndef PRODUCT. It is hard to find where this non-product code ends.

Looks good otherwise.

Thanks,
Vladimir

On 7/14/20 2:54 AM, Christian Hagedorn wrote:
> Hi Vladimir
>
> On 13.07.20 19:43, Vladimir Kozlov wrote:
>> Node::find_ctrl() is used during debugging when you want to print and look on only control nodes.
>> We have several such methods which are only used in debugger.
> > I see, I restored this method and changed Node::find() accordingly. I additionally added two find_ctrl() methods to make > it easier to call it from a debugger (as already present for find_node()). > >> I suggest to store old_arena() in local var and pass into add_to_worklist(). >> >> You can make add_to_worklist() static since you pass node as argument. > > Okay. I updated this and the change above in a new webrev: > http://cr.openjdk.java.net/~chagedorn/8247743/webrev.01/ > > Best regards, > Christian > >> Thanks, >> Vladimir >> >> On 7/13/20 3:09 AM, Christian Hagedorn wrote: >>> Ping - could anyone review it, please? Thanks! >>> >>> Best regards, >>> Christian >>> >>> On 02.07.20 09:33, Christian Hagedorn wrote: >>>> Hi >>>> >>>> Please review the following patch: >>>> https://bugs.openjdk.java.net/browse/JDK-8247743 >>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ >>>> >>>> The testcase creates a deep graph with a lot of nodes on a chain. When running with the specified test flags, it >>>> recursively calls Node::find_recur() for each node discovered which eventually results in a segmentation fault due >>>> to a stack overflow (around 10000 calls due to such a long chain of nodes). The fix just converts the recursive >>>> algorithm into an iterative one to avoid a segmentation fault. This is similar to JDK-8246203 [1]. >>>> >>>> I additionally removed Node::find_ctrl() and its special handling in the algorithm since it is not used. >>>> >>>> There is actually another problem with the recursive version. When running the testcase without >>>> -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever inside [2] because there is a debug_orig node cycle >>>> and the loop does not break based on the debug_orig nodes being visited. This is also fixed in the patch. >>>> >>>> Thank you! 
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8246203
>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589

From vladimir.kozlov at oracle.com Tue Jul 14 19:07:31 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Jul 2020 12:07:31 -0700
Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
In-Reply-To:
References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com>
Message-ID:

> Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop
> limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that
> cause other problems? I guess there probably must be a reason why we don't do it like that.

It has a wide type to prevent premature optimizations before the loop is fully transformed. That is the reason we add it in the first place.

But it would be interesting to see if we can use a narrower type: TypeInt::POS1, for example, for positive limits (> 0) (and the opposite for negative limits < 0). I may be missing some nuances and it may not work, but we should try.

Regards,
Vladimir

On 7/14/20 5:39 AM, Christian Hagedorn wrote:
> >> [..] Since the DivINode has a control outside of the main loop [..]
>
> Edit: I actually meant that get_ctrl() returns a node outside of the main-loop (i.e. the DivINode is not part of the
> main-loop body). The DivINode still has NULL as control input.
>
> Best regards,
> Christian
>
> On 14.07.20 14:32, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> I had a closer look at the failing testcase with webrev.00. The original DivNode has its zero check removed based on
>> correct type information. Afterwards it is split through an induction variable phi for which both inputs have non-zero
>> types.
So, the DivNode end up after an AddINode (which adds a positive constant) which has a non-zero type. All good >> so far. >> >> Now we add pre/main/post loops and the induction variable phi for the pre-loop gets type int>=1 since the limit for >> the pre-loop is hidden behind an Opaque1 node which just returns int as type. The AddINode belonging to the loop >> induction variable phi in the pre-loop is therefore updated to have the type int as well (int>=1 + positive_int could >> overflow). This type information propagates to the main-loop and its AddINode belonging to the loop induction variable >> phi (which is an input to the DivNode) also gets its type set to int. >> >> Later, we add a vector post loop where we clone the main loop and add a phi p for the the AddINode node and its new >> clone. Since the DivINode has a control outside of the main loop, it is not cloned and gets the phi p as an input. At >> a later point in time, we want to split through p. But then we detect zero as possible value due to the type range of >> both AddINodes being int. >> >> Even though the type information is not accurate enough, the DivINode is never zero and we could safely apply the >> split through the phi. We could think about doing a bail out for all kinds of phis but I think it should only be an >> actual problem for loop induction variable phis. >> >> Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop >> limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that >> cause other problems? I guess there probably must be a reason why we don't do it like that. >> >> Best regards, >> Christian >> >> On 13.07.20 19:16, Vladimir Kozlov wrote: >>> This rise question: why zero check was removed if one of merged types has 0? >>> Should we be more careful when we remove zero check? 
>>> >>> Thanks, >>> Vladimir >>> >>> On 7/13/20 2:06 AM, Christian Hagedorn wrote: >>>> A test in some later tier testing revealed that the assertion code is actually too strong. There can be a Div/Mod >>>> node whose zero check was removed but that is then spilt through a non-induction-variable phi whose inputs have zero >>>> in their type range (which is fine, this happens in some loop opts after partial peeling was applied earlier). This >>>> happened, for example, for a phi which merged two nodes from the original and a cloned loop. I think we just need to >>>> remove the additional assertion code. >>>> >>>> New webrev: >>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ >>>> >>>> Best regards, >>>> Christian >>>> >>>> On 13.07.20 09:19, Christian Hagedorn wrote: >>>>> Thank you Vladimir for your review! >>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> On 11.07.20 01:25, Vladimir Kozlov wrote: >>>>>> Looks good. >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote: >>>>>>> Hi >>>>>>> >>>>>>> Please review the following patch: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552 >>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >>>>>>> >>>>>>> In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of >>>>>>> the loop induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p and >>>>>>> ends up after the AddNode which updates the induction variable p. In the last iteration j equals 2 and is then >>>>>>> updated to 0. The division/modulo node n is now executed before the loop limit check which results in a SIGFPE. >>>>>>> >>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed (i.e. >>>>>>> control in NULL) and is split through a phi which has an input that could be zero. 
This should only happen for an
>>>>>>> induction variable phi of a trip-counted (integer) loop.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian

From ekaterina.pavlova at oracle.com Wed Jul 15 00:25:30 2020
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Tue, 14 Jul 2020 17:25:30 -0700
Subject: RFR [15] (T/XS): 8242388 compiler/graalunit/CoreTest.java timed out
Message-ID: <39811448-6cf5-c329-de66-27233854cb62@oracle.com>

Hi all,

compiler/graalunit/CoreTest.java fails by timeout from time to time.
The most time expensive subtest is org.graalvm.compiler.core.test.CountedLoopTest.
The fix splits the test into two tests to reduce total execution time.
Please review.

JBS: https://bugs.openjdk.java.net/browse/JDK-8242388
webrev: http://cr.openjdk.java.net/~epavlova//8242388/webrev.00/index.html
testing: graalunit tests as part of tier3

Thanks,
-katya

From vladimir.kozlov at oracle.com Wed Jul 15 01:20:18 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Tue, 14 Jul 2020 18:20:18 -0700
Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361
In-Reply-To: <46144d6d-5714-05ad-a263-01507db937cc@oracle.com>
References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com>
Message-ID: <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com>

I looked more on this. EA already does not scalarize allocations when Phi nodes merged them - it should handle this case. I did small experiment and relaxed assert for this new (10.
needs comment update) case for AddP's base and test passed:

src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700
@@ -2357,6 +2357,7 @@
       int opcode = uncast_base->Opcode();
       assert(opcode == Op_ConP || opcode == Op_ThreadLocal ||
              opcode == Op_CastX2P || uncast_base->is_DecodeNarrowPtr() ||
+             (uncast_base->is_Phi() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
              (uncast_base->is_Mem() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
              (uncast_base->is_Proj() && uncast_base->in(0)->is_Allocate()), "sanity");
     }

Did you hit a case when this may not work?

And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation (-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations):

======== Connection graph for Test::test
JavaObject NoEscape(NoEscape) [ 158F [ 107 ]] 95 Allocate === 242 76 230 8 1 ( 93 92 21 1 78 1 78 ) [[ 96 97 98 105 106 107 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8
LocalVar [ 95P [ 158b ]] 107 Proj === 95 [[ 108 158 ]] #5 !jvms: Test::test1 @ bci:0 Test::test @ bci:8

Scalar 95 Allocate === 242 76 230 8 1 ( 93 92 21 1 78 1 78 ) [[ 96 97 98 105 106 107 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8
++++ Eliminated: 95 Allocate

Thanks,
Vladimir K

On 7/14/20 1:28 AM, Jamsheed C M wrote:
> Hi all,
>
> I had incorrectly added extra check in assert after offset computation in address_offset . For addps with non constant
> offsets (like [1])
>
> Not changing the old assert even though I am not expecting first addp/second addp(for array addressing) case for init
> captured store.
>
> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/
>
> Best regards,
>
> Jamsheed
>
> [1]
>
>       assert(offs != Type::OffsetBot ||
> -            adr->in(AddPNode::Address)->in(0)->is_AllocateArray(),
> +            adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || is_captured_store(adr),
>              "offset must be a constant or it is initialization of array");
>
> On 13/07/2020 11:14, Jamsheed C M wrote:
>>
>> Hi,
>>
>> I reworked the fix. I compute offset for all init captures stores, but treats this special init captured stores
>> similar to unsafe(as these objects are usually GlobalEscape and doesn't have any perf implications).
>>
>> revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/
>>
>> testing: mach1-5( logs in jbs)
>>
>> Best regards,
>>
>> Jamsheed
>>
>> On 09/07/2020 19:36, Jamsheed C M wrote:
>>>
>>> Hi,
>>>
>>> request to hold the review. need to change the code for dealing with unsafe access. as current capture code go for
>>> more execution time analyzing things.
>>>
>>> Best regards,
>>>
>>> Jamsheed
>>>
>>> On 09/07/2020 13:01, Jamsheed C M wrote:
>>>>
>>>> Hi all,
>>>>
>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895
>>>>
>>>> Request for review changes made to offset computation and field write detection for init captured stores due to phis
>>>> addition between alloc and init. This happen if init node in different outer loop wrt to alloc node and there is a
>>>> loop opt. This was required as a result of enhancement [1].
>>>>
>>>> Normally init are not associated with multiple alloc node during EA phase, but changes done for [1] caused the code
>>>> shapes of the form [2] to generate inits associated with multiple alloc node.
>>>>
>>>> This had implication in offset computation and field write detection related to initializing stores.
>>>>
>>>> Attempt to fix in EA:
>>>>
>>>>      webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/
>>>>
>>>> Alternate fix:
>>>>
>>>>
Minimize the scenario in compiler generated code by throwing only j.l.Error from slowpath(all exception >>>> async/sync are handled in runtime exit). >>>> >>>> ???? Stub epilog doesn't poll or throw any exceptions. Disable full loop opt before EA for detectable patterns and >>>> bailout EA for late detected patterns. >>>> >>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>> >>>> Please advice. >>>> >>>> Testing : mach tier1-5 (logs in jbs) >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>> >>>> [1] JDK-8231291 C2: loop opts before EA should maximally unroll loops >>>> >>>> [2] that have its init node in different outer loop wrt to alloc node. >>>> >>>> >>>> loop begin >>>> >>>> ?? try{ >>>> >>>> ?? return new obj()/? throw new obj()/ uncommon trap after allocation, in a loop >>>> >>>> ?? } catch(ex) { >>>> >>>> ?? } >>>> >>>> loop end >>>> >>>> ? 42???? public static IntA test(int n) { >>>> ?? 43???????? for (int i=0; i<2; i++) { >>>> ?? 44???????????? try { >>>> ?? 45?????????????????? return new IntA(n + i); >>>> ?? 46???????????? } catch (Exception e) { >>>> ?? 47???????????? } >>>> ?? 48???????? } >>>> ?? 49 >>>> From vladimir.kozlov at oracle.com Wed Jul 15 01:24:11 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 14 Jul 2020 18:24:11 -0700 Subject: RFR [15] (T/XS): 8242388 compiler/graalunit/CoreTest.java timed out In-Reply-To: <39811448-6cf5-c329-de66-27233854cb62@oracle.com> References: <39811448-6cf5-c329-de66-27233854cb62@oracle.com> Message-ID: <64ccd3d7-dc82-e243-a63e-db49d61503ef@oracle.com> Good. Thanks, Vladimir K On 7/14/20 5:25 PM, Ekaterina Pavlova wrote: > Hi all, > > compiler/graalunit/CoreTest.java fails by timeout from time to time. > The most time expensive subtest is org.graalvm.compiler.core.test.CountedLoopTest. > The fix spits the test into two tests to reduce total execution time. > Please review. > > ??? 
JBS: https://bugs.openjdk.java.net/browse/JDK-8242388
> webrev: http://cr.openjdk.java.net/~epavlova//8242388/webrev.00/index.html
> testing: graalunit tests as part of tier3
>
>
> Thanks,
> -katya
>
>

From jamsheed.c.m at oracle.com Wed Jul 15 02:51:28 2020
From: jamsheed.c.m at oracle.com (Jamsheed C M)
Date: Wed, 15 Jul 2020 08:21:28 +0530
Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361
In-Reply-To: <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com>
References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com>
Message-ID: <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com>

Hi Vladimir,

On 15/07/2020 06:50, Vladimir Kozlov wrote:
> I looked more on this. EA already does not scalarize allocations when
> Phi nodes merged them - it should handle this case. I did small
> experiment and relaxed assert for this new (10. needs comment update)
> case for AddP's base and test passed:
>
> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700
> @@ -2357,6 +2357,7 @@
>        int opcode = uncast_base->Opcode();
>        assert(opcode == Op_ConP || opcode == Op_ThreadLocal ||
>               opcode == Op_CastX2P || uncast_base->is_DecodeNarrowPtr() ||
> +             (uncast_base->is_Phi() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
>               (uncast_base->is_Mem() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
>               (uncast_base->is_Proj() && uncast_base->in(0)->is_Allocate()), "sanity");
>      }
>
> Did you hit a case when this may not work?

Yes, right: it already doesn't mark the allocation as scalarizable if the
base count is more than one (I think an is_oop check is missing there).

EA CG adds edges only for oop fields, leaving stores to them undetected.
This makes these stored objects NoEscape, and if the compiled method
continues execution with such a NoEscape object, that can have undesired
results (i.e. synchronization removed).

A probable case would be (not verified):

try {
  LOOP BEGIN
    try {throw new Obj()} catch {}
  LOOP END
} catch (Obj e) {
}

[1] https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L2256

>
>
> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation
> (-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations):
>
> ======== Connection graph for Test::test
> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]] 95 Allocate === 242 76 230 8 1 ( 93 92 21 1 78 1 78 ) [[ 96 97 98 105 106 107 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8
> LocalVar [ 95P [ 158b ]] 107 Proj === 95 [[ 108 158 ]] #5 !jvms: Test::test1 @ bci:0 Test::test @ bci:8
>
> Scalar 95 Allocate === 242 76 230 8 1 ( 93 92 21 1 78 1 78 ) [[ 96 97 98 105 106 107 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8
> ++++ Eliminated: 95 Allocate
>
>
> Thanks,
> Vladimir K
>
> On 7/14/20 1:28 AM, Jamsheed C M wrote:
>> Hi all,
>>
>> I had incorrectly added extra check in assert after offset
>> computation in address_offset . For addps with non constant offsets
>> (like [1])
>>
>> Not changing the old assert even though I am not expecting first
>> addp/second addp(for array addressing) case for init captured store.
>>
>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/
>>
>> Best regards,
>>
>> Jamsheed
>>
>> [1]
>>
>> assert(offs != Type::OffsetBot ||
>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(),
>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || is_captured_store(adr),
"offset must be a constant or it is initialization of >> array"); >> >> On 13/07/2020 11:14, Jamsheed C M wrote: >>> >>> Hi, >>> >>> I reworked the fix. I compute offset for all init captures stores, >>> but treats this special init captured stores similar to unsafe(as >>> these objects are usually GlobalEscape and doesn't have any perf >>> implications). >>> >>> revised webrev: >>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ >>> >>> testing: mach1-5( logs in jbs) >>> >>> Best regards, >>> >>> Jamsheed >>> >>> On 09/07/2020 19:36, Jamsheed C M wrote: >>>> >>>> Hi, >>>> >>>> request to hold the review. need to change the code for dealing >>>> with unsafe access. as current capture code go for more execution >>>> time analyzing things. >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>> On 09/07/2020 13:01, Jamsheed C M wrote: >>>>> >>>>> Hi all, >>>>> >>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895 >>>>> >>>>> Request for review changes made to offset computation and field >>>>> write detection for init captured stores due to phis addition >>>>> between alloc and init. This happen if init node in different >>>>> outer loop wrt to alloc node and there is a loop opt.? This was >>>>> required as a result of enhancement [1]. >>>>> >>>>> Normally init are not associated with multiple alloc node during >>>>> EA phase, but changes done for [1] caused the code shapes of the >>>>> form [2]? to generate inits associated with multiple alloc node. >>>>> >>>>> This had implication in offset computation and field write >>>>> detection related to initializing stores. >>>>> >>>>> Attempt to fix in EA: >>>>> >>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>>>> >>>>> Alternate fix: >>>>> >>>>> ???? Minimize the scenario in compiler generated code by throwing >>>>> only j.l.Error from slowpath(all exception async/sync are handled >>>>> in runtime exit). >>>>> >>>>> ???? Stub epilog doesn't poll or throw any exceptions. 
Disable >>>>> full loop opt before EA for detectable patterns and bailout EA for >>>>> late detected patterns. >>>>> >>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>>> >>>>> Please advice. >>>>> >>>>> Testing : mach tier1-5 (logs in jbs) >>>>> >>>>> Best regards, >>>>> >>>>> Jamsheed >>>>> >>>>> >>>>> [1] JDK-8231291 >>>>> C2: loop opts >>>>> before EA should maximally unroll loops >>>>> >>>>> [2] that have its init node in different outer loop wrt to alloc >>>>> node. >>>>> >>>>> >>>>> loop begin >>>>> >>>>> ?? try{ >>>>> >>>>> ?? return new obj()/? throw new obj()/ uncommon trap after >>>>> allocation, in a loop >>>>> >>>>> ?? } catch(ex) { >>>>> >>>>> ?? } >>>>> >>>>> loop end >>>>> >>>>> ? 42???? public static IntA test(int n) { >>>>> ?? 43???????? for (int i=0; i<2; i++) { >>>>> ?? 44???????????? try { >>>>> ?? 45?????????????????? return new IntA(n + i); >>>>> ?? 46???????????? } catch (Exception e) { >>>>> ?? 47???????????? } >>>>> ?? 48???????? } >>>>> ?? 49 >>>>> From jamsheed.c.m at oracle.com Wed Jul 15 03:08:10 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Wed, 15 Jul 2020 08:38:10 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> Message-ID: <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> (unfinished mail got sent, so completing it) On 15/07/2020 08:21, Jamsheed C M wrote: > Hi Vladimir, > > On 15/07/2020 06:50, Vladimir Kozlov wrote: >> I looked more on this. EA already does not secularize allocations >> when Phi nodes merged them - it should handle this case. 
I did small >> experiment and relaxed assert for this new (10. needs comment update) >> case for AddP's base and test passed: >> >> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700 >> @@ -2357,6 +2357,7 @@ >> ?????? int opcode = uncast_base->Opcode(); >> ?????? assert(opcode == Op_ConP || opcode == Op_ThreadLocal || >> ????????????? opcode == Op_CastX2P || >> uncast_base->is_DecodeNarrowPtr() || >> +???????????? (uncast_base->is_Phi() && >> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >> ????????????? (uncast_base->is_Mem() && >> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >> ????????????? (uncast_base->is_Proj() && >> uncast_base->in(0)->is_Allocate()), "sanity"); >> ???? } >> >> Did you hit a case when this may not work? > > Yes, right it already doesn't mark it as scalarizable if base count is > more than one(I think it missed a is_oop check there)[1]. > > EA CG adds edges only for oop field making stores to them undetected. > This makes these stored objects to NoEscape and if compiled method > continues execution with this NoEscape object can have undesired > results(i.e synchronization removed). > > Probable case would be(didn't verify) > > try { > > LOOP BEGIN > > ? try {throw new Obj()} catch {} > > LOOP END > > } catch (Obj e) { > > } Best Regards, Jamsheed [1]https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L1770 >> >> >> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation >> (-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations): >> >> ======== Connection graph for? Test::test >> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]]?? 95??? Allocate ===? >> 242? 76? 230? 8? 1 ( 93? 92? 21? 1? 78? 1? 78 ) [[ 96 97 98 105? 106? >> 107 ]]? rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, >> top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: Test::test1 @ >> bci:0 Test::test @ bci:8 >> LocalVar [ 95P [ 158b ]]?? 107??? Proj??? ===? 95? [[ 108? 
158 ]] #5 >> !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >> >> Scalar? 95??? Allocate??? ===? 242? 76? 230? 8? 1 ( 93? 92? 21 1 78? >> 1? 78 ) [[ 96? 97? 98? 105? 106? 107 ]]? rawptr:NotNull ( int:>=0, >> java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 >> Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >> ++++ Eliminated: 95 Allocate >> >> >> t\Thanks, >> Vladimir K >> >> On 7/14/20 1:28 AM, Jamsheed C M wrote: >>> Hi all, >>> >>> I had incorrectly added extra check in assert after offset >>> computation in address_offset . For addps with non constant offsets >>> (like [1]) >>> >>> Not changing the old assert even though I am not expecting first >>> addp/second addp(for array addressing) case for init captured store. >>> >>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/ >>> >>> >>> Best regards, >>> >>> Jamsheed >>> >>> [1] >>> >>> assert(offs != Type::OffsetBot || >>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(), >>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || >>> is_captured_store(adr), >>> ???????????? "offset must be a constant or it is initialization of >>> array"); >>> >>> On 13/07/2020 11:14, Jamsheed C M wrote: >>>> >>>> Hi, >>>> >>>> I reworked the fix. I compute offset for all init captures stores, >>>> but treats this special init captured stores similar to unsafe(as >>>> these objects are usually GlobalEscape and doesn't have any perf >>>> implications). >>>> >>>> revised webrev: >>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ >>>> >>>> testing: mach1-5( logs in jbs) >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>> On 09/07/2020 19:36, Jamsheed C M wrote: >>>>> >>>>> Hi, >>>>> >>>>> request to hold the review. need to change the code for dealing >>>>> with unsafe access. as current capture code go for more execution >>>>> time analyzing things. 
>>>>> >>>>> Best regards, >>>>> >>>>> Jamsheed >>>>> >>>>> On 09/07/2020 13:01, Jamsheed C M wrote: >>>>>> >>>>>> Hi all, >>>>>> >>>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895 >>>>>> >>>>>> Request for review changes made to offset computation and field >>>>>> write detection for init captured stores due to phis addition >>>>>> between alloc and init. This happen if init node in different >>>>>> outer loop wrt to alloc node and there is a loop opt.? This was >>>>>> required as a result of enhancement [1]. >>>>>> >>>>>> Normally init are not associated with multiple alloc node during >>>>>> EA phase, but changes done for [1] caused the code shapes of the >>>>>> form [2]? to generate inits associated with multiple alloc node. >>>>>> >>>>>> This had implication in offset computation and field write >>>>>> detection related to initializing stores. >>>>>> >>>>>> Attempt to fix in EA: >>>>>> >>>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>>>>> >>>>>> Alternate fix: >>>>>> >>>>>> ???? Minimize the scenario in compiler generated code by throwing >>>>>> only j.l.Error from slowpath(all exception async/sync are handled >>>>>> in runtime exit). >>>>>> >>>>>> ???? Stub epilog doesn't poll or throw any exceptions. Disable >>>>>> full loop opt before EA for detectable patterns and bailout EA >>>>>> for late detected patterns. >>>>>> >>>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>>>> >>>>>> Please advice. >>>>>> >>>>>> Testing : mach tier1-5 (logs in jbs) >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Jamsheed >>>>>> >>>>>> >>>>>> [1] JDK-8231291 >>>>>> C2: loop opts >>>>>> before EA should maximally unroll loops >>>>>> >>>>>> [2] that have its init node in different outer loop wrt to alloc >>>>>> node. >>>>>> >>>>>> >>>>>> loop begin >>>>>> >>>>>> ?? try{ >>>>>> >>>>>> ?? return new obj()/? throw new obj()/ uncommon trap after >>>>>> allocation, in a loop >>>>>> >>>>>> ?? } catch(ex) { >>>>>> >>>>>> ?? 
}
>>>>>>
>>>>>> loop end
>>>>>>
>>>>>>     public static IntA test(int n) {
>>>>>>         for (int i=0; i<2; i++) {
>>>>>>             try {
>>>>>>                 return new IntA(n + i);
>>>>>>             } catch (Exception e) {
>>>>>>             }
>>>>>>         }
>>>>>>

From christian.hagedorn at oracle.com Wed Jul 15 07:58:17 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 15 Jul 2020 09:58:17 +0200
Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs
In-Reply-To: <7cfafcb9-6232-5738-6cad-508127fd31e8@oracle.com>
References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com> <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com> <7cfafcb9-6232-5738-6cad-508127fd31e8@oracle.com>
Message-ID: <53d1eebe-e85f-58cb-7fba-0baf2ecf8701@oracle.com>

Hi Vladimir

On 14.07.20 20:46, Vladimir Kozlov wrote:
> Can you move next up to where other small find*() methods are defined?:
>
> +Node* Node::find_ctrl(int idx) {
> +  return find(idx, true);
>  }
>
> Also add '// not PRODUCT' comment to #endif for #ifndef PRODUCT. It is
> hard to find where this not product code ends.
>
> Looks good otherwise.

Thanks, I added these changes in a new webrev:
http://cr.openjdk.java.net/~chagedorn/8247743/webrev.02/

Best regards,
Christian

> Thanks,
> Vladimir
>
> On 7/14/20 2:54 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 13.07.20 19:43, Vladimir Kozlov wrote:
>>> Node::find_ctrl() is used during debugging when you want to print and
>>> look on only control nodes.
>>> We have several such methods which are only used in debugger.
>>
>> I see, I restored this method and changed Node::find() accordingly. I
>> additionally added two find_ctrl() methods to make it easier to call
>> it from a debugger (as already present for find_node()).
>>
>>> I suggest to store old_arena() in local var and pass into
>>> add_to_worklist().
>>>
>>> You can make add_to_worklist() static since you pass node as argument.
>>
>> Okay. I updated this and the change above in a new webrev:
>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.01/
>>
>> Best regards,
>> Christian
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/13/20 3:09 AM, Christian Hagedorn wrote:
>>>> Ping - could anyone review it, please? Thanks!
>>>>
>>>> Best regards,
>>>> Christian
>>>>
>>>> On 02.07.20 09:33, Christian Hagedorn wrote:
>>>>> Hi
>>>>>
>>>>> Please review the following patch:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8247743
>>>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/
>>>>>
>>>>> The testcase creates a deep graph with a lot of nodes on a chain.
>>>>> When running with the specified test flags, it recursively calls
>>>>> Node::find_recur() for each node discovered which eventually
>>>>> results in a segmentation fault due to a stack overflow (around
>>>>> 10000 calls due to such a long chain of nodes). The fix just
>>>>> converts the recursive algorithm into an iterative one to avoid a
>>>>> segmentation fault. This is similar to JDK-8246203 [1].
>>>>>
>>>>> I additionally removed Node::find_ctrl() and its special handling
>>>>> in the algorithm since it is not used.
>>>>>
>>>>> There is actually another problem with the recursive version. When
>>>>> running the testcase without
>>>>> -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever
>>>>> inside [2] because there is a debug_orig node cycle and the loop
>>>>> does not break based on the debug_orig nodes being visited. This is
>>>>> also fixed in the patch.
>>>>>
>>>>> Thank you!
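For readers who want to see the general shape of the recursive-to-iterative conversion described above, here is a minimal sketch. It is not the code from the webrev: GraphNode, find_by_idx and make_cyclic_chain are hypothetical names, and the real Node::find() does more (for example, it also follows outputs and debug_orig links). The sketch only shows the two ideas mentioned in the review request: an explicit worklist instead of the call stack, and a visited set so that cycles cannot cause an endless walk.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a compiler IR node; not HotSpot's Node class.
struct GraphNode {
  int idx = 0;
  std::vector<GraphNode*> inputs;  // edges to follow; may form cycles
};

// Iterative search for the node with index `target`, starting at `start`.
// The explicit worklist replaces the call stack of a recursive walk, so the
// depth of the graph no longer matters. The visited set guarantees
// termination even if the edges form a cycle.
const GraphNode* find_by_idx(const GraphNode* start, int target) {
  std::vector<const GraphNode*> worklist;
  std::unordered_set<const GraphNode*> visited;
  worklist.push_back(start);
  while (!worklist.empty()) {
    const GraphNode* n = worklist.back();
    worklist.pop_back();
    if (n == nullptr || !visited.insert(n).second) {
      continue;  // null edge, or node was already processed
    }
    if (n->idx == target) {
      return n;
    }
    for (const GraphNode* in : n->inputs) {
      worklist.push_back(in);
    }
  }
  return nullptr;  // not reachable from `start`
}

// Builds a chain of `length` nodes (idx 0 .. length-1) in which every node
// points at the next one and the last node points back at the first.
std::vector<GraphNode> make_cyclic_chain(std::size_t length) {
  assert(length > 0);
  std::vector<GraphNode> nodes(length);
  for (std::size_t i = 0; i < length; i++) {
    nodes[i].idx = static_cast<int>(i);
    nodes[i].inputs.push_back(&nodes[(i + 1) % length]);
  }
  return nodes;
}
```

With a chain of 100000 nodes, a recursive walk would need one stack frame per node, while this version only grows a heap-allocated vector; searching for an index that does not exist still terminates because of the visited set.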
>>>>>
>>>>> Best regards,
>>>>> Christian
>>>>>
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8246203
>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589
>>>>>

From rwestrel at redhat.com Wed Jul 15 09:59:53 2020
From: rwestrel at redhat.com (Roland Westrelin)
Date: Wed, 15 Jul 2020 11:59:53 +0200
Subject: RFR(M): 8229495: SIGILL in C2 generated OSR compilation
In-Reply-To:
References: <3b720427-d718-5d1c-dbe9-6149a21883af@oracle.com> <87r1topriw.fsf@redhat.com> <84b2c86d-c7e6-7945-dae5-db1d8efe6f25@oracle.com> <87sge0oqv8.fsf@redhat.com>
Message-ID: <878sflnlnq.fsf@redhat.com>

Thanks for the reviews Christian & Vladimir.

Roland.

From christian.hagedorn at oracle.com Wed Jul 15 13:08:33 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 15 Jul 2020 15:08:33 +0200
Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
In-Reply-To:
References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com>
Message-ID: <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com>

Hi Vladimir

On 14.07.20 21:07, Vladimir Kozlov wrote:
> > Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop
> > limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that
> > cause other problems? I guess there probably must be a reason why we don't do it like that.
>
> It has wide type to prevent premature optimizations before loop is fully
> transformed. That is the reason we add it in first place.
>
> But it would be interesting to see if we can use more narrow type:
> TypeInt::POS1 for example for positive limits (>0) (and opposite for
> negative limits < 0). I may be missing some nuances and it may not work
> but we should try.

I had an additional discussion about this with Roland. He made a good
point that the problem is not the Opaque1 nodes themselves but rather the
type of the iv phi, or more specifically the PhiNode::Value() function.
Before creating pre/main/post loops, the iv phi already has a narrow type
1..300 set by PhiNode::Value(). However, when creating the pre-loop (and
post-loop), we actually widen the type of the iv phi of the pre-loop to
int>=1 (based on the pre-loop limit, which is an Opaque1 node with type
int).

Roland suggested that we should not do that but instead filter the
returned type with the already existing type so that it is not widened. I
think that makes sense. We are already doing that for the other cases in
PhiNode::Value() [1][2]. It looks like it is just missing for the special
handling of iv phis of trip-counted loops. This also fixes the assertion
failure that occurred before with webrev.00.

I created a new webrev based on webrev.00 with this change in
PhiNode::Value():
http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/

I'm currently running some testing with it again.

Best regards,
Christian

[1] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097
[2] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157

> Regards,
> Vladimir
>
> On 7/14/20 5:39 AM, Christian Hagedorn wrote:
>>
>>> [..] Since the DivINode has a control outside of the main loop [..]
>>
>> Edit: I actually meant that get_ctrl() returns a node outside of the
>> main-loop (i.e. the DivINode is not part of the main-loop body). The
>> DivINode still has NULL as control input.
>>
>> Best regards,
>> Christian
>>
>> On 14.07.20 14:32, Christian Hagedorn wrote:
>>> Hi Vladimir
>>>
>>> I had a closer look at the failing testcase with webrev.00. The
>>> original DivNode has its zero check removed based on correct type
>>> information.
>>> Afterwards it is split through an induction variable phi
>>> for which both inputs have non-zero types. So, the DivNode ends up
>>> after an AddINode (which adds a positive constant) which has a
>>> non-zero type. All good so far.
>>>
>>> Now we add pre/main/post loops and the induction variable phi for the
>>> pre-loop gets type int>=1 since the limit for the pre-loop is hidden
>>> behind an Opaque1 node which just returns int as type. The AddINode
>>> belonging to the loop induction variable phi in the pre-loop is
>>> therefore updated to have the type int as well (int>=1 + positive_int
>>> could overflow). This type information propagates to the main-loop
>>> and its AddINode belonging to the loop induction variable phi (which
>>> is an input to the DivNode) also gets its type set to int.
>>>
>>> Later, we add a vector post loop where we clone the main loop and add
>>> a phi p for the AddINode node and its new clone. Since the
>>> DivINode has a control outside of the main loop, it is not cloned and
>>> gets the phi p as an input. At a later point in time, we want to
>>> split through p. But then we detect zero as a possible value due to the
>>> type range of both AddINodes being int.
>>>
>>> Even though the type information is not accurate enough, the DivINode
>>> is never zero and we could safely apply the split through the phi. We
>>> could think about doing a bail out for all kinds of phis but I think
>>> it should only be an actual problem for loop induction variable phis.
>>>
>>> Thinking about this type propagation problem, couldn't we somehow set
>>> the type of the Opaque1 node hiding the pre-loop limit to the same
>>> type as the pre-loop limit to allow this information to flow to the
>>> pre and main loop? Or would that cause other problems? I guess there
>>> probably must be a reason why we don't do it like that.
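To make the type widening described above concrete, here is a small interval-arithmetic sketch. It is a toy model only: IntRange, add_constant and filter are hypothetical names and are not C2's TypeInt API, and the range 1..300 is the iv phi type mentioned elsewhere in this thread. It shows that adding a positive constant to int>=1 can overflow and therefore widens to the full int range (which contains 0), while intersecting the widened range with the previously known narrow range keeps 0 out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical closed integer range [lo, hi]; a toy model, not C2's TypeInt.
struct IntRange {
  std::int32_t lo;
  std::int32_t hi;
  bool contains(std::int32_t v) const { return lo <= v && v <= hi; }
};

const IntRange kFullInt = {std::numeric_limits<std::int32_t>::min(),
                           std::numeric_limits<std::int32_t>::max()};

// Range of (r + c) for a constant c >= 0 under wrapping 32-bit addition.
// If the upper bound could overflow, nothing useful is known about the
// result any more, so the range is widened to all of int.
IntRange add_constant(IntRange r, std::int32_t c) {
  assert(c >= 0);
  std::int64_t hi = static_cast<std::int64_t>(r.hi) + c;
  if (hi > std::numeric_limits<std::int32_t>::max()) {
    return kFullInt;
  }
  return {r.lo + c, static_cast<std::int32_t>(hi)};
}

// Intersects a newly computed range with the range that was already known,
// so that recomputing a type can narrow it but never widen it.
IntRange filter(IntRange computed, IntRange known) {
  return {std::max(computed.lo, known.lo), std::min(computed.hi, known.hi)};
}
```

In this model, add_constant applied to {1, 300} never yields a range containing 0, whereas applied to {1, INT_MAX} it does. Filtering {1, INT_MAX} against the known {1, 300} first restores the narrow range, which is the effect the filtering suggestion in this thread aims for.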
>>>
>>> Best regards,
>>> Christian
>>>
>>> On 13.07.20 19:16, Vladimir Kozlov wrote:
>>>> This raises a question: why was the zero check removed if one of the
>>>> merged types has 0?
>>>> Should we be more careful when we remove zero check?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote:
>>>>> A test in some later tier testing revealed that the assertion code
>>>>> is actually too strong. There can be a Div/Mod node whose zero
>>>>> check was removed but that is then split through a
>>>>> non-induction-variable phi whose inputs have zero in their type
>>>>> range (which is fine, this happens in some loop opts after partial
>>>>> peeling was applied earlier). This happened, for example, for a phi
>>>>> which merged two nodes from the original and a cloned loop. I think
>>>>> we just need to remove the additional assertion code.
>>>>>
>>>>> New webrev:
>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/
>>>>>
>>>>> Best regards,
>>>>> Christian
>>>>>
>>>>> On 13.07.20 09:19, Christian Hagedorn wrote:
>>>>>> Thank you Vladimir for your review!
>>>>>>
>>>>>> Best regards,
>>>>>> Christian
>>>>>>
>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote:
>>>>>>> Looks good.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote:
>>>>>>>> Hi
>>>>>>>>
>>>>>>>> Please review the following patch:
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552
>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/
>>>>>>>>
>>>>>>>> In the failing testcase, C2 removes a zero check for a
>>>>>>>> division/modulo node n based on the type information of the loop
>>>>>>>> induction variable phi p (always between 1 and 50 and never 0).
>>>>>>>> However, n is later split through p and ends up after the
>>>>>>>> AddNode which updates the induction variable p. In the last
>>>>>>>> iteration j equals 2 and is then updated to 0.
>>>>>>>> The division/modulo node n is now executed before the loop limit
>>>>>>>> check which results in a SIGFPE.
>>>>>>>>
>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a
>>>>>>>> division or modulo node has its zero check removed (i.e. control
>>>>>>>> is NULL) and is split through a phi which has an input that
>>>>>>>> could be zero. This should only happen for an induction variable
>>>>>>>> phi of a trip-counted (integer) loop.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Christian

From luhenry at microsoft.com Wed Jul 15 13:27:15 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Wed, 15 Jul 2020 13:27:15 +0000
Subject: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor
In-Reply-To:
References: , ,
Message-ID:

Hi Andrew,

A quick follow-up on that patch. Is there anything you would like to see
done differently?

Thank you,

--
Ludovic

________________________________________
From: Ludovic Henry
Sent: Friday, July 10, 2020 10:58
To: Andrew Haley; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Cc: openjdk-aarch64
Subject: Re: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor

Hi Andrew,

I uploaded a new webrev following your review.

Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.01/
Testing: jtreg:test/hotspot/jtreg:tier1, jtreg:test/jdk:tier1, jtreg:test/jdk:tier2, jtreg:test/langtools on Linux-AArch64, no regressions

Thank you,

________________________________________
From: Andrew Haley
Sent: Friday, July 10, 2020 01:10
To: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net
Cc: openjdk-aarch64
Subject: Re: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor

On 09/07/2020 21:31, Ludovic Henry wrote:
> JBS: https://bugs.openjdk.java.net/browse/JDK-8248676
> Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.00/
> Testing: jtreg:test/hotspot/jtreg:tier1, jtreg:test/jdk:tier1, jtreg:test/jdk:tier2, jtreg:test/langtools on Linux-AArch64, no regressions.
>
> This small fix is in the context of the larger support for Windows-AArch64. The attribute `__attribute__ ((constructor))` is not supported by MSVC, and the documented workaround is to allocate an empty static struct with a constructor.
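As a minimal sketch of the idiom mentioned above (a static object whose constructor runs the setup code during static initialization), assuming nothing about the actual patch: lookup_table, init_lookup_table and LookupTableInitializer are hypothetical names chosen for illustration and are not taken from the webrev or from the HotSpot sources.

```cpp
#include <cassert>

// Hypothetical table that has to be filled in before main() runs.
int lookup_table[4];
bool table_ready = false;

// The setup code itself; plain C++ with no compiler-specific attributes.
void init_lookup_table() {
  for (int i = 0; i < 4; i++) {
    lookup_table[i] = i * i;
  }
  table_ready = true;
}

// GCC/Clang-only spelling (shown as a comment, since MSVC rejects it):
//   void __attribute__((constructor)) init_lookup_table();

// Portable spelling: an empty struct whose constructor calls the setup
// code, plus one static instance. The instance is dynamically initialized
// during program startup, after the zero-initialization of the globals
// defined above it in this translation unit.
struct LookupTableInitializer {
  LookupTableInitializer() { init_lookup_table(); }
};
static LookupTableInitializer lookup_table_initializer;
```

Because this form is standard C++, it compiles unchanged with GCC, Clang and MSVC, so no platform-specific conditional compilation is needed around it.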
> This patch only applies this workaround when compiling on Windows, and leaves other platforms unchanged.

Please take out the #ifdef WINDOWS: we can use portable C++ here on all
platforms.

Thanks,

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From christian.hagedorn at oracle.com Wed Jul 15 15:04:58 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 15 Jul 2020 17:04:58 +0200
Subject: [16] RFR(XS): 8248467: C2: compiler/intrinsics/object/TestClone fails with -XX:+VerifyGraphEdges
Message-ID: <60c17f38-6cb2-d380-252f-15f8d5151b29@oracle.com>

Hi

Please review the following patch:
https://bugs.openjdk.java.net/browse/JDK-8248467
http://cr.openjdk.java.net/~chagedorn/8248467/webrev.00/

The assertion is hit due to a MemBarNode whose precedence edge was set to
NULL at [1] (result_phi_rawoop is NULL and _resproj is the precedence edge
to a MemBarStoreStore). This has been possible since JDK-8237581 [2], which
can remove some allocations. The fix just adds this additional case in the
assert.

Best regards,
Christian

[1] http://hg.openjdk.java.net/jdk/jdk/file/4a8fd81d64ba/src/hotspot/share/opto/macro.cpp#l1566
[2] https://bugs.openjdk.java.net/browse/JDK-8237581

From xxinliu at amazon.com Wed Jul 15 15:31:56 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Wed, 15 Jul 2020 15:31:56 +0000
Subject: question about PrintOptoStatistics atomicity
Message-ID: <1594827116846.89704@amazon.com>

Hi,

I have a question about -XX:+PrintOptoStatistics in c2_globals.hpp. It
dumps many internal counters in different C2 phases. I found that those
counters are all static fields, e.g.
http://hg.openjdk.java.net/jdk/jdk/file/4b9ced2b948c/src/hotspot/share/opto/chaitin.cpp#l2297
http://hg.openjdk.java.net/jdk/jdk/file/4b9ced2b948c/src/hotspot/share/opto/phaseX.hpp#l599

I notice that all setters of those fields are not atomic. IMHO, hotspot may have more than one c2-compiler-thread running at the same time.

How does hotspot guarantee those fields are thread-safe? Or is the flag intended to do statistics in single-thread mode by design?

If those counters are not atomic, shall we connect this flag to CICompilerCount? I think we can constrain the number of c2-compiler-threads to 1 if the user sets PrintOptoStatistics. Does it make sense?

thanks,
--lx

From jamsheed.c.m at oracle.com Wed Jul 15 15:55:44 2020
From: jamsheed.c.m at oracle.com (Jamsheed C M)
Date: Wed, 15 Jul 2020 21:25:44 +0530
Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark"
Message-ID: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com>

Hi,

Async handling at method entry requires it to be aware of synchronization (i.e. whether it is doing async handling before or after lock acquire). This is required as the exception handler relies on this info for unlocking. Async handling code never handled this special condition, and it worked most of the time because we were using biased locking, which got disabled by [1].

There was one other issue reported around the same time [2]. This issue got triggered in the test case by [3]: back-to-back extra safepoint after suspend, and TLH for ThreadDeath. So in this setup both the PopFrame request and the Thread.Stop request happened together for the test scenario, and it reached java method entry with pending_exception set.

I have done a partial fix for the issue, mainly to handle production mode crash failures (the "do not unlock" flag related ones).

Fix detail:
1) I save and restore the "do not unlock" flag in async handling.
2) Return for floating pending exception for some cases (PopFrame, Early return related).
This is a debug (JVMTI) feature, and a floating exception can get cleaned just like that in the present compiler request and deopt code.

webrev: http://cr.openjdk.java.net/~jcm/8246381/webrev.02/

There are more problems in these code areas, like we clear all exceptions in the compilation request path (interpreter, c1), as well as in the deoptimization path. All these unhandled cases will be separately handled by https://bugs.openjdk.java.net/browse/JDK-8249451

Request for review.

Best regards,
Jamsheed

[1] https://bugs.openjdk.java.net/browse/JDK-8231264
[2] https://bugs.openjdk.java.net/browse/JDK-8246727
[3] https://bugs.openjdk.java.net/browse/JDK-8221207

From vladimir.kozlov at oracle.com Wed Jul 15 17:26:17 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2020 10:26:17 -0700
Subject: [16] RFR(XS): 8248467: C2: compiler/intrinsics/object/TestClone fails with -XX:+VerifyGraphEdges
In-Reply-To: <60c17f38-6cb2-d380-252f-15f8d5151b29@oracle.com>
References: <60c17f38-6cb2-d380-252f-15f8d5151b29@oracle.com>
Message-ID: <6a458143-aeee-486b-2bc5-a210779c26dc@oracle.com>

Good.

Thanks,
Vladimir

On 7/15/20 8:04 AM, Christian Hagedorn wrote:
> Hi
>
> Please review the following patch:
> https://bugs.openjdk.java.net/browse/JDK-8248467
> http://cr.openjdk.java.net/~chagedorn/8248467/webrev.00/
>
> The assertion is hit due to a MemBarNode whose precedence edge was set to NULL at [1] (result_phi_rawoop is NULL and
> _resproj is the precedence edge to a MemBarStoreStore). This is possible since JDK-8237581 [2] which can remove some
> allocations. The fix just adds this additional case in the assert.
>
> Best regards,
> Christian
>
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/4a8fd81d64ba/src/hotspot/share/opto/macro.cpp#l1566
> [2] https://bugs.openjdk.java.net/browse/JDK-8237581

From vladimir.kozlov at oracle.com Wed Jul 15 17:37:50 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2020 10:37:50 -0700
Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs
In-Reply-To: <53d1eebe-e85f-58cb-7fba-0baf2ecf8701@oracle.com>
References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com> <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com> <7cfafcb9-6232-5738-6cad-508127fd31e8@oracle.com> <53d1eebe-e85f-58cb-7fba-0baf2ecf8701@oracle.com>
Message-ID:

Looks good.

Thanks,
Vladimir K

On 7/15/20 12:58 AM, Christian Hagedorn wrote:
> Hi Vladimir
>
> On 14.07.20 20:46, Vladimir Kozlov wrote:
>> Can you move next up to where other small find*() methods are defined?:
>>
>> +Node* Node::find_ctrl(int idx) {
>> +  return find(idx, true);
>>   }
>>
>> Also add '// not PRODUCT' comment to #endif for #ifndef PRODUCT. It is hard to find where this not product code ends.
>>
>> Looks good otherwise.
>
> Thanks, I added these changes in a new webrev:
> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.02/
>
> Best regards,
> Christian
>
>
>> Thanks,
>> Vladimir
>>
>> On 7/14/20 2:54 AM, Christian Hagedorn wrote:
>>> Hi Vladimir
>>>
>>> On 13.07.20 19:43, Vladimir Kozlov wrote:
>>>> Node::find_ctrl() is used during debugging when you want to print and look on only control nodes.
>>>> We have several such methods which are only used in debugger.
>>>
>>> I see, I restored this method and changed Node::find() accordingly. I additionally added two find_ctrl() methods to
>>> make it easier to call it from a debugger (as already present for find_node()).
>>> >>>> I suggest to store old_arena() in local var and pass into add_to_worklist(). >>>> >>>> You can make add_to_worklist() static since you pass node as argument. >>> >>> Okay. I updated this and the change above in a new webrev: >>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.01/ >>> >>> Best regards, >>> Christian >>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 7/13/20 3:09 AM, Christian Hagedorn wrote: >>>>> Ping - could anyone review it, please? Thanks! >>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> On 02.07.20 09:33, Christian Hagedorn wrote: >>>>>> Hi >>>>>> >>>>>> Please review the following patch: >>>>>> https://bugs.openjdk.java.net/browse/JDK-8247743 >>>>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ >>>>>> >>>>>> The testcase creates a deep graph with a lot of nodes on a chain. When running with the specified test flags, it >>>>>> recursively calls Node::find_recur() for each node discovered which eventually results in a segmentation fault due >>>>>> to a stack overflow (around 10000 calls due to such a long chain of nodes). The fix just converts the recursive >>>>>> algorithm into an iterative one to avoid a segmentation fault. This is similar to JDK-8246203 [1]. >>>>>> >>>>>> I additionally removed Node::find_ctrl() and its special handling in the algorithm since it is not used. >>>>>> >>>>>> There is actually another problem with the recursive version. When running the testcase without >>>>>> -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever inside [2] because there is a debug_orig node cycle >>>>>> and the loop does not break based on the debug_orig nodes being visited. This is also fixed in the patch. >>>>>> >>>>>> Thank you! 
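The recursive-to-iterative conversion described in the quoted message above can be sketched roughly as follows. This is a simplified stand-in and not the HotSpot implementation: `SimpleNode`, `find_by_idx`, and the use of `std::vector` and `std::unordered_set` are assumptions made purely for illustration. An explicit worklist replaces the call stack, so a long chain of nodes cannot overflow the native stack, and a visited set makes the search terminate even when the graph contains a cycle.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical, minimal node type: an index plus input edges.
struct SimpleNode {
  int idx;
  std::vector<SimpleNode*> inputs;
};

// Iterative depth-first search for the node with the given idx.
// Returns nullptr if no reachable node has that idx.
const SimpleNode* find_by_idx(const SimpleNode* start, int idx) {
  if (start == nullptr) {
    return nullptr;
  }
  std::vector<const SimpleNode*> worklist{start};
  std::unordered_set<const SimpleNode*> visited{start};
  while (!worklist.empty()) {
    const SimpleNode* n = worklist.back();
    worklist.pop_back();
    if (n->idx == idx) {
      return n;
    }
    for (const SimpleNode* in : n->inputs) {
      // insert() reports whether the node was new; only new nodes are queued,
      // which is what breaks cycles.
      if (in != nullptr && visited.insert(in).second) {
        worklist.push_back(in);
      }
    }
  }
  return nullptr;
}
```

The memory used by the worklist grows on the heap with the graph size instead of consuming one stack frame per node.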
>>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> >>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8246203 >>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589 From christian.hagedorn at oracle.com Wed Jul 15 17:42:36 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 15 Jul 2020 19:42:36 +0200 Subject: [16] RFR(XS): 8248467: C2: compiler/intrinsics/object/TestClone fails with -XX:+VerifyGraphEdges In-Reply-To: <6a458143-aeee-486b-2bc5-a210779c26dc@oracle.com> References: <60c17f38-6cb2-d380-252f-15f8d5151b29@oracle.com> <6a458143-aeee-486b-2bc5-a210779c26dc@oracle.com> Message-ID: Thank you Vladimir for your review! Best regards, Christian On 15.07.20 19:26, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 7/15/20 8:04 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8248467 >> http://cr.openjdk.java.net/~chagedorn/8248467/webrev.00/ >> >> The assertion is hit due to a MemBarNode whose precedence edge was set >> to NULL at [1] (result_phi_rawoop is NULL and _resproj is the >> precedence edge to a MemBarStoreStore). This is possible since >> JDK-8237581 [2] which can remove some allocations. The fix just adds >> this additional case in the assert. 
>>
>> Best regards,
>> Christian
>>
>>
>> [1]
>> http://hg.openjdk.java.net/jdk/jdk/file/4a8fd81d64ba/src/hotspot/share/opto/macro.cpp#l1566
>>
>> [2] https://bugs.openjdk.java.net/browse/JDK-8237581

From vladimir.kozlov at oracle.com Wed Jul 15 17:43:03 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 15 Jul 2020 10:43:03 -0700
Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
In-Reply-To: <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com>
References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com>
Message-ID: <150a1de1-86bb-22a4-6b9c-b868cb686cea@oracle.com>

Looks good.

Thanks,
Vladimir

On 7/15/20 6:08 AM, Christian Hagedorn wrote:
> Hi Vladimir
>
> On 14.07.20 21:07, Vladimir Kozlov wrote:
>> > Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the pre-loop
>> > limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or would that
>> > cause other problems? I guess there probably must be a reason why we don't do it like that.
>>
>> It has wide type to prevent premature optimizations before loop is fully transformed. That is the reason we add it in
>> first place.
>>
>> But it would be interesting to see if we can use more narrow type: TypeInt::POS1 for example for positive limits (>0)
>> (and opposite for negative limits < 0). I may be missing some nuances and it may not work but we should try.
>
> I had an additional discussion about this with Roland. He made a good point that not the Opaque1 nodes themselves are
> the problem but rather the type of the iv phi, or more specifically the PhiNode::Value() function.
>
> Before creating pre/main/post loops, the iv phi has already a narrow type 1..300 set by PhiNode::Value().
However, when > creating the pre (and post loop), we actually widen the type of the iv phi of the pre-loop to int>=1 (based on the > pre-loop limit which is an Opaque1 node with type int). Roland suggested that we should not do that but instead filter > the returned type with the already existing type to not widen it. I think that makes sense. We are already doing that > for the other cases in PhiNode::Value() [1][2]. It looks like we just miss it for the special handling of iv phis of > trip-counted loops. This also fixes the assertion failure that occurred before with webrev.00. > > I created a new webrev based on webrev.00 with this change in PhiNode::Value(): > http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/ > > I'm currently running some testing with it again. > > Best regards, > Christian > > > [1] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097 > [2] http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157 > >> Regards, >> Vladimir >> >> On 7/14/20 5:39 AM, Christian Hagedorn wrote: >>> >>>> [..] Since the DivINode has a control outside of the main loop [..] >>> >>> Edit: I actually meant that get_ctrl() returns a node outside of the main-loop (i.e. the DivINode is not part of the >>> main-loop body). The DivINode still has NULL as control input. >>> >>> Best regards, >>> Christian >>> >>> On 14.07.20 14:32, Christian Hagedorn wrote: >>>> Hi Vladimir >>>> >>>> I had a closer look at the failing testcase with webrev.00. The original DivNode has its zero check removed based on >>>> correct type information. Afterwards its split through an induction variable phi for which both inputs have non-zero >>>> types. So, the DivNode end up after an AddINode (which adds a positive constant) which has a non-zero type. All good >>>> so far. 
>>>> >>>> Now we add pre/main/post loops and the induction variable phi for the pre-loop gets type int>=1 since the limit for >>>> the pre-loop is hidden behind an Opaque1 node which just returns int as type. The AddINode belonging to the loop >>>> induction variable phi in the pre-loop is therefore updated to have the type int as well (int>=1 + positive_int >>>> could overflow). This type information propagates to the main-loop and its AddINode belonging to the loop induction >>>> variable phi (which is an input to the DivNode) also gets its type set to int. >>>> >>>> Later, we add a vector post loop where we clone the main loop and add a phi p for the the AddINode node and its new >>>> clone. Since the DivINode has a control outside of the main loop, it is not cloned and gets the phi p as an input. >>>> At a later point in time, we want to split through p. But then we detect zero as possible value due to the type >>>> range of both AddINodes being int. >>>> >>>> Even though the type information is not accurate enough, the DivINode is never zero and we could safely apply the >>>> split through the phi. We could think about doing a bail out for all kinds of phis but I think it should only be an >>>> actual problem for loop induction variable phis. >>>> >>>> Thinking about this type propagation problem, couldn't we somehow set the type of the Opaque1 node hiding the >>>> pre-loop limit to the same type as the pre-loop limit to allow this information to flow to the pre and main loop? Or >>>> would that cause other problems? I guess there probably must be a reason why we don't do it like that. >>>> >>>> Best regards, >>>> Christian >>>> >>>> On 13.07.20 19:16, Vladimir Kozlov wrote: >>>>> This rise question: why zero check was removed if one of merged types has 0? >>>>> Should we be more careful when we remove zero check? 
>>>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote: >>>>>> A test in some later tier testing revealed that the assertion code is actually too strong. There can be a Div/Mod >>>>>> node whose zero check was removed but that is then spilt through a non-induction-variable phi whose inputs have >>>>>> zero in their type range (which is fine, this happens in some loop opts after partial peeling was applied >>>>>> earlier). This happened, for example, for a phi which merged two nodes from the original and a cloned loop. I >>>>>> think we just need to remove the additional assertion code. >>>>>> >>>>>> New webrev: >>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ >>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> On 13.07.20 09:19, Christian Hagedorn wrote: >>>>>>> Thank you Vladimir for your review! >>>>>>> >>>>>>> Best regards, >>>>>>> Christian >>>>>>> >>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote: >>>>>>>> Looks good. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> Vladimir >>>>>>>> >>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote: >>>>>>>>> Hi >>>>>>>>> >>>>>>>>> Please review the following patch: >>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552 >>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >>>>>>>>> >>>>>>>>> In the failing testcase, C2 removes a zero check for a division/modulo node n based on the type information of >>>>>>>>> the loop induction variable phi p (always between 1 and 50 and never 0). However, n is later split through p >>>>>>>>> and ends up after the AddNode which updates the induction variable p. In the last iteration j equals 2 and is >>>>>>>>> then updated to 0. The division/modulo node n is now executed before the loop limit check which results in a >>>>>>>>> SIGFPE. >>>>>>>>> >>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a division or modulo node has its zero check removed >>>>>>>>> (i.e. 
control is NULL) and is split through a phi which has an input that could be zero. This should only
>>>>>>>>> happen for an induction variable phi of a trip-counted (integer) loop.
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Christian

From christian.hagedorn at oracle.com Wed Jul 15 17:43:48 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 15 Jul 2020 19:43:48 +0200
Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs
In-Reply-To:
References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com> <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com> <7cfafcb9-6232-5738-6cad-508127fd31e8@oracle.com> <53d1eebe-e85f-58cb-7fba-0baf2ecf8701@oracle.com>
Message-ID:

Thank you Vladimir for your review!

Best regards,
Christian

On 15.07.20 19:37, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir K
>
> On 7/15/20 12:58 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 14.07.20 20:46, Vladimir Kozlov wrote:
>>> Can you move next up to where other small find*() methods are defined?:
>>>
>>> +Node* Node::find_ctrl(int idx) {
>>> +  return find(idx, true);
>>>   }
>>>
>>> Also add '// not PRODUCT' comment to #endif for #ifndef PRODUCT. It
>>> is hard to find where this not product code ends.
>>>
>>> Looks good otherwise.
>>
>> Thanks, I added these changes in a new webrev:
>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.02/
>>
>> Best regards,
>> Christian
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/14/20 2:54 AM, Christian Hagedorn wrote:
>>>> Hi Vladimir
>>>>
>>>> On 13.07.20 19:43, Vladimir Kozlov wrote:
>>>>> Node::find_ctrl() is used during debugging when you want to print
>>>>> and look on only control nodes.
>>>>> We have several such methods which are only used in debugger.
>>>>
>>>> I see, I restored this method and changed Node::find() accordingly.
>>>> I additionally added two find_ctrl() methods to make it easier to >>>> call it from a debugger (as already present for find_node()). >>>> >>>>> I suggest to store old_arena() in local var and pass into >>>>> add_to_worklist(). >>>>> >>>>> You can make add_to_worklist() static since you pass node as argument. >>>> >>>> Okay. I updated this and the change above in a new webrev: >>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.01/ >>>> >>>> Best regards, >>>> Christian >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 7/13/20 3:09 AM, Christian Hagedorn wrote: >>>>>> Ping - could anyone review it, please? Thanks! >>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> On 02.07.20 09:33, Christian Hagedorn wrote: >>>>>>> Hi >>>>>>> >>>>>>> Please review the following patch: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247743 >>>>>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ >>>>>>> >>>>>>> The testcase creates a deep graph with a lot of nodes on a chain. >>>>>>> When running with the specified test flags, it recursively calls >>>>>>> Node::find_recur() for each node discovered which eventually >>>>>>> results in a segmentation fault due to a stack overflow (around >>>>>>> 10000 calls due to such a long chain of nodes). The fix just >>>>>>> converts the recursive algorithm into an iterative one to avoid a >>>>>>> segmentation fault. This is similar to JDK-8246203 [1]. >>>>>>> >>>>>>> I additionally removed Node::find_ctrl() and its special handling >>>>>>> in the algorithm since it is not used. >>>>>>> >>>>>>> There is actually another problem with the recursive version. >>>>>>> When running the testcase without >>>>>>> -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever >>>>>>> inside [2] because there is a debug_orig node cycle and the loop >>>>>>> does not break based on the debug_orig nodes being visited. This >>>>>>> is also fixed in the patch. >>>>>>> >>>>>>> Thank you! 
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Christian
>>>>>>>
>>>>>>>
>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8246203
>>>>>>> [2]
>>>>>>> http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589
>>>>>>>

From christian.hagedorn at oracle.com Wed Jul 15 17:44:27 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 15 Jul 2020 19:44:27 +0200
Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero
In-Reply-To: <150a1de1-86bb-22a4-6b9c-b868cb686cea@oracle.com>
References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com> <150a1de1-86bb-22a4-6b9c-b868cb686cea@oracle.com>
Message-ID:

Thank you Vladimir for your review!

Best regards,
Christian

On 15.07.20 19:43, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir
>
> On 7/15/20 6:08 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 14.07.20 21:07, Vladimir Kozlov wrote:
>>> > Thinking about this type propagation problem, couldn't we somehow
>>> set the type of the Opaque1 node hiding the pre-loop
>>> > limit to the same type as the pre-loop limit to allow this
>>> information to flow to the pre and main loop? Or would that
>>> > cause other problems? I guess there probably must be a reason why
>>> we don't do it like that.
>>>
>>> It has wide type to prevent premature optimizations before loop is
>>> fully transformed. That is the reason we add it in first place.
>>>
>>> But it would be interesting to see if we can use more narrow type:
>>> TypeInt::POS1 for example for positive limits (>0) (and opposite for
>>> negative limits < 0). I may be missing some nuances and it may not
>>> work but we should try.
>>
>> I had an additional discussion about this with Roland.
He made a good >> point that not the Opaque1 nodes themselves are the problem but rather >> the type of the iv phi, or more specifically the PhiNode::Value() >> function. >> >> Before creating pre/main/post loops, the iv phi has already a narrow >> type 1..300 set by PhiNode::Value(). However, when creating the pre >> (and post loop), we actually widen the type of the iv phi of the >> pre-loop to int>=1 (based on the pre-loop limit which is an Opaque1 >> node with type int). Roland suggested that we should not do that but >> instead filter the returned type with the already existing type to not >> widen it. I think that makes sense. We are already doing that for the >> other cases in PhiNode::Value() [1][2]. It looks like we just miss it >> for the special handling of iv phis of trip-counted loops. This also >> fixes the assertion failure that occurred before with webrev.00. >> >> I created a new webrev based on webrev.00 with this change in >> PhiNode::Value(): >> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/ >> >> I'm currently running some testing with it again. >> >> Best regards, >> Christian >> >> >> [1] >> http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1097 >> >> [2] >> http://hg.openjdk.java.net/jdk/jdk/file/9ea3344c6445/src/hotspot/share/opto/cfgnode.cpp#l1157 >> >> >>> Regards, >>> Vladimir >>> >>> On 7/14/20 5:39 AM, Christian Hagedorn wrote: >>>> >>>>> [..] Since the DivINode has a control outside of the main loop [..] >>>> >>>> Edit: I actually meant that get_ctrl() returns a node outside of the >>>> main-loop (i.e. the DivINode is not part of the main-loop body). The >>>> DivINode still has NULL as control input. >>>> >>>> Best regards, >>>> Christian >>>> >>>> On 14.07.20 14:32, Christian Hagedorn wrote: >>>>> Hi Vladimir >>>>> >>>>> I had a closer look at the failing testcase with webrev.00. The >>>>> original DivNode has its zero check removed based on correct type >>>>> information. 
Afterwards its split through an induction variable phi >>>>> for which both inputs have non-zero types. So, the DivNode end up >>>>> after an AddINode (which adds a positive constant) which has a >>>>> non-zero type. All good so far. >>>>> >>>>> Now we add pre/main/post loops and the induction variable phi for >>>>> the pre-loop gets type int>=1 since the limit for the pre-loop is >>>>> hidden behind an Opaque1 node which just returns int as type. The >>>>> AddINode belonging to the loop induction variable phi in the >>>>> pre-loop is therefore updated to have the type int as well (int>=1 >>>>> + positive_int could overflow). This type information propagates to >>>>> the main-loop and its AddINode belonging to the loop induction >>>>> variable phi (which is an input to the DivNode) also gets its type >>>>> set to int. >>>>> >>>>> Later, we add a vector post loop where we clone the main loop and >>>>> add a phi p for the the AddINode node and its new clone. Since the >>>>> DivINode has a control outside of the main loop, it is not cloned >>>>> and gets the phi p as an input. At a later point in time, we want >>>>> to split through p. But then we detect zero as possible value due >>>>> to the type range of both AddINodes being int. >>>>> >>>>> Even though the type information is not accurate enough, the >>>>> DivINode is never zero and we could safely apply the split through >>>>> the phi. We could think about doing a bail out for all kinds of >>>>> phis but I think it should only be an actual problem for loop >>>>> induction variable phis. >>>>> >>>>> Thinking about this type propagation problem, couldn't we somehow >>>>> set the type of the Opaque1 node hiding the pre-loop limit to the >>>>> same type as the pre-loop limit to allow this information to flow >>>>> to the pre and main loop? Or would that cause other problems? I >>>>> guess there probably must be a reason why we don't do it like that. 
>>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>> On 13.07.20 19:16, Vladimir Kozlov wrote: >>>>>> This rise question: why zero check was removed if one of merged >>>>>> types has 0? >>>>>> Should we be more careful when we remove zero check? >>>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 7/13/20 2:06 AM, Christian Hagedorn wrote: >>>>>>> A test in some later tier testing revealed that the assertion >>>>>>> code is actually too strong. There can be a Div/Mod node whose >>>>>>> zero check was removed but that is then spilt through a >>>>>>> non-induction-variable phi whose inputs have zero in their type >>>>>>> range (which is fine, this happens in some loop opts after >>>>>>> partial peeling was applied earlier). This happened, for example, >>>>>>> for a phi which merged two nodes from the original and a cloned >>>>>>> loop. I think we just need to remove the additional assertion code. >>>>>>> >>>>>>> New webrev: >>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ >>>>>>> >>>>>>> Best regards, >>>>>>> Christian >>>>>>> >>>>>>> On 13.07.20 09:19, Christian Hagedorn wrote: >>>>>>>> Thank you Vladimir for your review! >>>>>>>> >>>>>>>> Best regards, >>>>>>>> Christian >>>>>>>> >>>>>>>> On 11.07.20 01:25, Vladimir Kozlov wrote: >>>>>>>>> Looks good. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Vladimir >>>>>>>>> >>>>>>>>> On 7/10/20 12:37 AM, Christian Hagedorn wrote: >>>>>>>>>> Hi >>>>>>>>>> >>>>>>>>>> Please review the following patch: >>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8248552 >>>>>>>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.00/ >>>>>>>>>> >>>>>>>>>> In the failing testcase, C2 removes a zero check for a >>>>>>>>>> division/modulo node n based on the type information of the >>>>>>>>>> loop induction variable phi p (always between 1 and 50 and >>>>>>>>>> never 0). However, n is later split through p and ends up >>>>>>>>>> after the AddNode which updates the induction variable p. 
In >>>>>>>>>> the last iteration j equals 2 and is then updated to 0. The >>>>>>>>>> division/modulo node n is now executed before the loop limit >>>>>>>>>> check which results in a SIGFPE. >>>>>>>>>> >>>>>>>>>> The fix bails out of PhaseIdealLoop::split_thru_phi if a >>>>>>>>>> division or modulo node has its zero check removed (i.e. >>>>>>>>>> control in NULL) and is split through a phi which has an input >>>>>>>>>> that could be zero. This should only happen for an induction >>>>>>>>>> variable phi of a trip-counted (integer) loop. >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> Christian From vladimir.kozlov at oracle.com Wed Jul 15 17:50:56 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 15 Jul 2020 10:50:56 -0700 Subject: question about PrintOptoStatistics atomicity In-Reply-To: <1594827116846.89704@amazon.com> References: <1594827116846.89704@amazon.com> Message-ID: <6b3d2637-e01c-8ab9-e32c-2404c7b2a40a@oracle.com> It was done intentionally because when that code was implemented atomic operations were expensive. We never intended these counters to be precise - they were used mostly for debugging purpose. It is up to user how he want to use them - for example using only one C2 thread. When you collect data for general application you want to execute it with the same parameters as in production. I don't think we should enforce any restrictions in VM when PrintOptoStatistics is used. Regards, Vladimir K On 7/15/20 8:31 AM, Liu, Xin wrote: > Hi, > > > I have a question about -XX:+PrintOptoStatistics in c2_globals.hpp. > > It dumps many internal counters in different C2 phases. I found those counters are all static fields. > > eg. > > http://hg.openjdk.java.net/jdk/jdk/file/4b9ced2b948c/src/hotspot/share/opto/chaitin.cpp#l2297 > > http://hg.openjdk.java.net/jdk/jdk/file/4b9ced2b948c/src/hotspot/share/opto/phaseX.hpp#l599 > > > I notice that all setters of those fields are not atomic. 
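To make the trade-off in this exchange concrete, here is a minimal sketch of a statistics counter that stays exact when several threads bump it. It uses standard C++ `std::atomic` as a stand-in rather than HotSpot's own primitives, and the names `g_stat_counter`, `count_event` and `run_counting_threads` are hypothetical. A plain `static` counter incremented from several compiler threads can lose updates, which is acceptable for debugging output but not for precise figures; a relaxed atomic increment avoids the lost updates without imposing any ordering on surrounding code.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical statistics counter shared by several worker threads.
static std::atomic<long> g_stat_counter{0};

// Relaxed ordering is enough for a pure event count: only the final total
// matters, not the ordering relative to other memory operations.
inline void count_event() {
  g_stat_counter.fetch_add(1, std::memory_order_relaxed);
}

// Runs num_threads workers that each record events_per_thread events and
// returns the total observed after all workers have been joined.
long run_counting_threads(int num_threads, int events_per_thread) {
  g_stat_counter.store(0, std::memory_order_relaxed);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; t++) {
    threads.emplace_back([events_per_thread] {
      for (int i = 0; i < events_per_thread; i++) {
        count_event();
      }
    });
  }
  for (std::thread& th : threads) {
    th.join();
  }
  return g_stat_counter.load(std::memory_order_relaxed);
}
```

Whether the extra cost of the atomic increment is worth it depends on how the numbers are used, which is the point made in the reply above.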
> IMHO, hotspot may have more than one c2-compiler-thread running at the same time.
>
> How does hotspot guarantee those fields are thread-safe? Or is the flag intended to do statistics in single-thread mode by design?
>
>
> If those counters are not atomic, shall we connect this flag to CICompilerCount?
>
> I think we can constrain the number of c2-compiler-threads to 1 if the user sets PrintOptoStatistics. Does it make sense?
>
>
> thanks,
>
> --lx
>

From ekaterina.pavlova at oracle.com Wed Jul 15 17:54:32 2020
From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova)
Date: Wed, 15 Jul 2020 10:54:32 -0700
Subject: RFR [15] (T/XS): 8242388 compiler/graalunit/CoreTest.java timed out
In-Reply-To: <64ccd3d7-dc82-e243-a63e-db49d61503ef@oracle.com>
References: <39811448-6cf5-c329-de66-27233854cb62@oracle.com> <64ccd3d7-dc82-e243-a63e-db49d61503ef@oracle.com>
Message-ID:

Thanks Vladimir, pushed.

-katya

On 7/14/20 6:24 PM, Vladimir Kozlov wrote:
> Good.
>
> Thanks,
> Vladimir K
>
> On 7/14/20 5:25 PM, Ekaterina Pavlova wrote:
>> Hi all,
>>
>> compiler/graalunit/CoreTest.java fails by timeout from time to time.
>> The most time expensive subtest is org.graalvm.compiler.core.test.CountedLoopTest.
>> The fix splits the test into two tests to reduce total execution time.
>> Please review.
>>
>>     JBS: https://bugs.openjdk.java.net/browse/JDK-8242388
>>  webrev: http://cr.openjdk.java.net/~epavlova//8242388/webrev.00/index.html
>> testing: graalunit tests as part of tier3
>>
>>
>> Thanks,
>> -katya
>>
>>
>>

From jamsheed.c.m at oracle.com Wed Jul 15 17:54:56 2020
From: jamsheed.c.m at oracle.com (Jamsheed C M)
Date: Wed, 15 Jul 2020 23:24:56 +0530
Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361
In-Reply-To: <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com>
References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com>
Message-ID:

Hi Vladimir,

With unrolling, I understand that many cases will just have phis everywhere to outside the loop, as the uses are outside the loop. And this is not restricted to escaping objects alone as I depicted; it can be escaping as well as non-escaping. So marking stores to them as global escape doesn't seem to be a nice idea.

I will rework this fix and get back again.

Thank you

Best regards
Jamsheed

On 15/07/2020 08:38, Jamsheed C M wrote:
> (unfinished mail got sent, so completing it)
> On 15/07/2020 08:21, Jamsheed C M wrote:
>> Hi Vladimir,
>>
>> On 15/07/2020 06:50, Vladimir Kozlov wrote:
>>> I looked more on this. EA already does not scalarize allocations
>>> when Phi nodes merged them - it should handle this case. I did small
>>> experiment and relaxed assert for this new (10. needs comment
>>> update) case for AddP's base and test passed:
>>>
>>> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700
>>> @@ -2357,6 +2357,7 @@
>>>        int opcode = uncast_base->Opcode();
>>>        assert(opcode == Op_ConP || opcode == Op_ThreadLocal ||
opcode == Op_CastX2P || >>> uncast_base->is_DecodeNarrowPtr() || >>> +???????????? (uncast_base->is_Phi() && >>> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>> ????????????? (uncast_base->is_Mem() && >>> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>> ????????????? (uncast_base->is_Proj() && >>> uncast_base->in(0)->is_Allocate()), "sanity"); >>> ???? } >>> >>> Did you hit a case when this may not work? >> >> Yes, right it already doesn't mark it as scalarizable if base count >> is more than one(I think it missed a is_oop check there)[1]. >> >> EA CG adds edges only for oop field making stores to them undetected. >> This makes these stored objects to NoEscape and if compiled method >> continues execution with this NoEscape object can have undesired >> results(i.e synchronization removed). >> >> Probable case would be(didn't verify) >> >> try { >> >> LOOP BEGIN >> >> ? try {throw new Obj()} catch {} >> >> LOOP END >> >> } catch (Obj e) { >> >> } > > Best Regards, > > Jamsheed > > [1]https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L1770 > > > >>> >>> >>> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation >>> (-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations): >>> >>> ======== Connection graph for? Test::test >>> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]]?? 95??? Allocate ===? >>> 242? 76? 230? 8? 1 ( 93? 92? 21? 1? 78? 1? 78 ) [[ 96 97 98 105? >>> 106? 107 ]]? rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, >>> bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: >>> Test::test1 @ bci:0 Test::test @ bci:8 >>> LocalVar [ 95P [ 158b ]]?? 107??? Proj??? ===? 95? [[ 108? 158 ]] #5 >>> !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>> >>> Scalar? 95??? Allocate??? ===? 242? 76? 230? 8? 1 ( 93? 92? 21 1 78? >>> 1? 78 ) [[ 96? 97? 98? 105? 106? 107 ]]? 
rawptr:NotNull ( int:>=0, >>> java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 >>> Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>> ++++ Eliminated: 95 Allocate >>> >>> >>> t\Thanks, >>> Vladimir K >>> >>> On 7/14/20 1:28 AM, Jamsheed C M wrote: >>>> Hi all, >>>> >>>> I had incorrectly added extra check in assert after offset >>>> computation in address_offset . For addps with non constant offsets >>>> (like [1]) >>>> >>>> Not changing the old assert even though I am not expecting first >>>> addp/second addp(for array addressing) case for init captured store. >>>> >>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/ >>>> >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>> [1] >>>> >>>> assert(offs != Type::OffsetBot || >>>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(), >>>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || >>>> is_captured_store(adr), >>>> ???????????? "offset must be a constant or it is initialization of >>>> array"); >>>> >>>> On 13/07/2020 11:14, Jamsheed C M wrote: >>>>> >>>>> Hi, >>>>> >>>>> I reworked the fix. I compute offset for all init captures stores, >>>>> but treats this special init captured stores similar to unsafe(as >>>>> these objects are usually GlobalEscape and doesn't have any perf >>>>> implications). >>>>> >>>>> revised webrev: >>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ >>>>> >>>>> testing: mach1-5( logs in jbs) >>>>> >>>>> Best regards, >>>>> >>>>> Jamsheed >>>>> >>>>> On 09/07/2020 19:36, Jamsheed C M wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> request to hold the review. need to change the code for dealing >>>>>> with unsafe access. as current capture code go for more execution >>>>>> time analyzing things. 
>>>>>> >>>>>> Best regards, >>>>>> >>>>>> Jamsheed >>>>>> >>>>>> On 09/07/2020 13:01, Jamsheed C M wrote: >>>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895 >>>>>>> >>>>>>> Request for review changes made to offset computation and field >>>>>>> write detection for init captured stores due to phis addition >>>>>>> between alloc and init. This happen if init node in different >>>>>>> outer loop wrt to alloc node and there is a loop opt.? This was >>>>>>> required as a result of enhancement [1]. >>>>>>> >>>>>>> Normally init are not associated with multiple alloc node during >>>>>>> EA phase, but changes done for [1] caused the code shapes of the >>>>>>> form [2]? to generate inits associated with multiple alloc node. >>>>>>> >>>>>>> This had implication in offset computation and field write >>>>>>> detection related to initializing stores. >>>>>>> >>>>>>> Attempt to fix in EA: >>>>>>> >>>>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>>>>>> >>>>>>> Alternate fix: >>>>>>> >>>>>>> ???? Minimize the scenario in compiler generated code by >>>>>>> throwing only j.l.Error from slowpath(all exception async/sync >>>>>>> are handled in runtime exit). >>>>>>> >>>>>>> ???? Stub epilog doesn't poll or throw any exceptions. Disable >>>>>>> full loop opt before EA for detectable patterns and bailout EA >>>>>>> for late detected patterns. >>>>>>> >>>>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>>>>> >>>>>>> Please advice. >>>>>>> >>>>>>> Testing : mach tier1-5 (logs in jbs) >>>>>>> >>>>>>> Best regards, >>>>>>> >>>>>>> Jamsheed >>>>>>> >>>>>>> >>>>>>> [1] JDK-8231291 >>>>>>> C2: loop opts >>>>>>> before EA should maximally unroll loops >>>>>>> >>>>>>> [2] that have its init node in different outer loop wrt to alloc >>>>>>> node. >>>>>>> >>>>>>> >>>>>>> loop begin >>>>>>> >>>>>>> ?? try{ >>>>>>> >>>>>>> ?? return new obj()/? 
throw new obj()/ uncommon trap after >>>>>>> allocation, in a loop >>>>>>> >>>>>>> ?? } catch(ex) { >>>>>>> >>>>>>> ?? } >>>>>>> >>>>>>> loop end >>>>>>> >>>>>>> ? 42???? public static IntA test(int n) { >>>>>>> ?? 43???????? for (int i=0; i<2; i++) { >>>>>>> ?? 44???????????? try { >>>>>>> ?? 45?????????????????? return new IntA(n + i); >>>>>>> ?? 46???????????? } catch (Exception e) { >>>>>>> ?? 47???????????? } >>>>>>> ?? 48???????? } >>>>>>> ?? 49 >>>>>>> From vladimir.kozlov at oracle.com Wed Jul 15 18:59:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 15 Jul 2020 11:59:40 -0700 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> Message-ID: <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> As I said before I agree with your additional checks for StoreN and StoreNKlass. But I have concerns about new is_init_captured_store code. EA is mostly looking only on inputs to see Allocation. And in several places it expecting only to see Allocation because other cases should be filtered out before. Thanks, Vladimir On 7/15/20 10:54 AM, Jamsheed C M wrote: > Hi Vladimir, > > with unrolling i understand that many cases will just have phis everywhere to outside the loop as the uses are outside > the loop. > > and this is not restricted to escaping objects alone as i depicted. it can be escaping as well as non-escaping. > > so marking store to them as global escape doesn't seems to be nice idea. i will rework on this fix and get back again. 
> > Thank you > > Best regards > > Jamsheed > > On 15/07/2020 08:38, Jamsheed C M wrote: >> (unfinished mail got sent, so completing it) >> On 15/07/2020 08:21, Jamsheed C M wrote: >>> Hi Vladimir, >>> >>> On 15/07/2020 06:50, Vladimir Kozlov wrote: >>>> I looked more on this. EA already does not secularize allocations when Phi nodes merged them - it should handle this >>>> case. I did small experiment and relaxed assert for this new (10. needs comment update) case for AddP's base and >>>> test passed: >>>> >>>> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700 >>>> @@ -2357,6 +2357,7 @@ >>>> ?????? int opcode = uncast_base->Opcode(); >>>> ?????? assert(opcode == Op_ConP || opcode == Op_ThreadLocal || >>>> ????????????? opcode == Op_CastX2P || uncast_base->is_DecodeNarrowPtr() || >>>> +???????????? (uncast_base->is_Phi() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>>> ????????????? (uncast_base->is_Mem() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>>> ????????????? (uncast_base->is_Proj() && uncast_base->in(0)->is_Allocate()), "sanity"); >>>> ???? } >>>> >>>> Did you hit a case when this may not work? >>> >>> Yes, right it already doesn't mark it as scalarizable if base count is more than one(I think it missed a is_oop check >>> there)[1]. >>> >>> EA CG adds edges only for oop field making stores to them undetected. This makes these stored objects to NoEscape and >>> if compiled method continues execution with this NoEscape object can have undesired results(i.e synchronization >>> removed). >>> >>> Probable case would be(didn't verify) >>> >>> try { >>> >>> LOOP BEGIN >>> >>> ? 
try {throw new Obj()} catch {} >>> >>> LOOP END >>> >>> } catch (Obj e) { >>> >>> } >> >> Best Regards, >> >> Jamsheed >> >> [1]https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L1770 >> >> >>>> >>>> >>>> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation (-XX:+PrintEscapeAnalysis >>>> -XX:+PrintEliminateAllocations): >>>> >>>> ======== Connection graph for? Test::test >>>> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]]?? 95??? Allocate === 242? 76? 230? 8? 1 ( 93? 92? 21? 1? 78? 1? 78 ) >>>> [[ 96 97 98 105 106? 107 ]]? rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 >>>> Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>>> LocalVar [ 95P [ 158b ]]?? 107??? Proj??? ===? 95? [[ 108? 158 ]] #5 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>>> >>>> Scalar? 95??? Allocate??? ===? 242? 76? 230? 8? 1 ( 93? 92? 21 1 78 1? 78 ) [[ 96? 97? 98? 105? 106? 107 ]] >>>> rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: >>>> Test::test1 @ bci:0 Test::test @ bci:8 >>>> ++++ Eliminated: 95 Allocate >>>> >>>> >>>> t\Thanks, >>>> Vladimir K >>>> >>>> On 7/14/20 1:28 AM, Jamsheed C M wrote: >>>>> Hi all, >>>>> >>>>> I had incorrectly added extra check in assert after offset computation in address_offset . For addps with non >>>>> constant offsets (like [1]) >>>>> >>>>> Not changing the old assert even though I am not expecting first addp/second addp(for array addressing) case for >>>>> init captured store. >>>>> >>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/ >>>>> >>>>> Best regards, >>>>> >>>>> Jamsheed >>>>> >>>>> [1] >>>>> >>>>> assert(offs != Type::OffsetBot || >>>>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(), >>>>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || is_captured_store(adr), >>>>> ???????????? 
"offset must be a constant or it is initialization of array"); >>>>> >>>>> On 13/07/2020 11:14, Jamsheed C M wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> I reworked the fix. I compute offset for all init captures stores, but treats this special init captured stores >>>>>> similar to unsafe(as these objects are usually GlobalEscape and doesn't have any perf implications). >>>>>> >>>>>> revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ >>>>>> >>>>>> testing: mach1-5( logs in jbs) >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Jamsheed >>>>>> >>>>>> On 09/07/2020 19:36, Jamsheed C M wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> request to hold the review. need to change the code for dealing with unsafe access. as current capture code go >>>>>>> for more execution time analyzing things. >>>>>>> >>>>>>> Best regards, >>>>>>> >>>>>>> Jamsheed >>>>>>> >>>>>>> On 09/07/2020 13:01, Jamsheed C M wrote: >>>>>>>> >>>>>>>> Hi all, >>>>>>>> >>>>>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895 >>>>>>>> >>>>>>>> Request for review changes made to offset computation and field write detection for init captured stores due to >>>>>>>> phis addition between alloc and init. This happen if init node in different outer loop wrt to alloc node and >>>>>>>> there is a loop opt.? This was required as a result of enhancement [1]. >>>>>>>> >>>>>>>> Normally init are not associated with multiple alloc node during EA phase, but changes done for [1] caused the >>>>>>>> code shapes of the form [2]? to generate inits associated with multiple alloc node. >>>>>>>> >>>>>>>> This had implication in offset computation and field write detection related to initializing stores. >>>>>>>> >>>>>>>> Attempt to fix in EA: >>>>>>>> >>>>>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>>>>>>> >>>>>>>> Alternate fix: >>>>>>>> >>>>>>>> ???? 
Minimize the scenario in compiler generated code by throwing only j.l.Error from slowpath(all exception >>>>>>>> async/sync are handled in runtime exit). >>>>>>>> >>>>>>>> ???? Stub epilog doesn't poll or throw any exceptions. Disable full loop opt before EA for detectable patterns >>>>>>>> and bailout EA for late detected patterns. >>>>>>>> >>>>>>>> ???? webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>>>>>> >>>>>>>> Please advice. >>>>>>>> >>>>>>>> Testing : mach tier1-5 (logs in jbs) >>>>>>>> >>>>>>>> Best regards, >>>>>>>> >>>>>>>> Jamsheed >>>>>>>> >>>>>>>> >>>>>>>> [1] JDK-8231291 C2: loop opts before EA should maximally >>>>>>>> unroll loops >>>>>>>> >>>>>>>> [2] that have its init node in different outer loop wrt to alloc node. >>>>>>>> >>>>>>>> >>>>>>>> loop begin >>>>>>>> >>>>>>>> ?? try{ >>>>>>>> >>>>>>>> ?? return new obj()/? throw new obj()/ uncommon trap after allocation, in a loop >>>>>>>> >>>>>>>> ?? } catch(ex) { >>>>>>>> >>>>>>>> ?? } >>>>>>>> >>>>>>>> loop end >>>>>>>> >>>>>>>> ? 42???? public static IntA test(int n) { >>>>>>>> ?? 43???????? for (int i=0; i<2; i++) { >>>>>>>> ?? 44???????????? try { >>>>>>>> ?? 45?????????????????? return new IntA(n + i); >>>>>>>> ?? 46???????????? } catch (Exception e) { >>>>>>>> ?? 47???????????? } >>>>>>>> ?? 48???????? } >>>>>>>> ?? 
49 >>>>>>>> From jamsheed.c.m at oracle.com Wed Jul 15 22:16:11 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 03:46:11 +0530 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> Message-ID: <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> (Thank you Dean, adding serviceability team as this issue involves JVMTI features PopFrame, EarlyReturn features) JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381 (testing: mach5, tier1-5 links in JBS) Best regards, Jamsheed On 15/07/2020 21:25, Jamsheed C M wrote: > > Hi, > > Async handling at method entry requires it to be aware of > synchronization(like whether it is doing async handling before lock > acquire or after) > > This is required as exception handler rely on this info for > unlocking.? Async handling code never had this special condition > handled and it worked most of the time as we were using biased locking > which got disabled by [1] > > There was one other issue reported in similar time[2]. This issue got > triggered in test case by [3], back to back extra safepoint after > suspend and TLH for ThreadDeath. So in this setup both PopFrame > request and Thread.Stop request happened together for the test > scenario and it reached java method entry with pending_exception set. > > I have done a partial fix for the issue, mainly to handle production > mode crash failures(do not unlock flag related ones) > > Fix detail: > > 1) I save restore the "do not unlock" flag in async handling. > > 2) Return for floating pending exception for some cases(PopFrame, > Early return related). This is debug(JVMTI) feature and floating > exception can get cleaned just like that in present compiler request > and deopt code. 
> > webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ > > There are more problems in these code areas, like we clear all > exceptions in compilation request path(interpreter,c1), as well as > deoptimization path. > > All these un-handled cases will be separately handled by > https://bugs.openjdk.java.net/browse/JDK-8249451 > > Request for review. > > Best regards, > > Jamsheed > > [1]https://bugs.openjdk.java.net/browse/JDK-8231264 > > > [2] https://bugs.openjdk.java.net/browse/JDK-8246727 > > [3] https://bugs.openjdk.java.net/browse/JDK-8221207 > From david.holmes at oracle.com Wed Jul 15 23:50:35 2020 From: david.holmes at oracle.com (David Holmes) Date: Thu, 16 Jul 2020 09:50:35 +1000 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> Message-ID: <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> Hi Jamsheed, On 16/07/2020 8:16 am, Jamsheed C M wrote: > (Thank you Dean, adding serviceability team as this issue involves JVMTI > features PopFrame, EarlyReturn features) It is not at all obvious how your proposed fix impacts the JVM TI features. > JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381 > > (testing: mach5, tier1-5 links in JBS) > > Best regards, > > Jamsheed > > On 15/07/2020 21:25, Jamsheed C M wrote: >> >> Hi, >> >> Async handling at method entry requires it to be aware of >> synchronization(like whether it is doing async handling before lock >> acquire or after) >> >> This is required as exception handler rely on this info for >> unlocking.? Async handling code never had this special condition >> handled and it worked most of the time as we were using biased locking >> which got disabled by [1] >> >> There was one other issue reported in similar time[2]. 
This issue got >> triggered in test case by [3], back to back extra safepoint after >> suspend and TLH for ThreadDeath. So in this setup both PopFrame >> request and Thread.Stop request happened together for the test >> scenario and it reached java method entry with pending_exception set. >> >> I have done a partial fix for the issue, mainly to handle production >> mode crash failures(do not unlock flag related ones) >> >> Fix detail: >> >> 1) I save restore the "do not unlock" flag in async handling. Sorry but you completely changed the fix compared to what we discussed and what I pre-reviewed! What happened to changing from JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of time and effort to determine that this save/restore of the "do not unlock flag" is actually correct and valid! >> >> 2) Return for floating pending exception for some cases(PopFrame, >> Early return related). This is debug(JVMTI) feature and floating >> exception can get cleaned just like that in present compiler request >> and deopt code. What part of the change addresses this? Thanks, David ----- >> >> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ >> >> There are more problems in these code areas, like we clear all >> exceptions in compilation request path(interpreter,c1), as well as >> deoptimization path. >> >> All these un-handled cases will be separately handled by >> https://bugs.openjdk.java.net/browse/JDK-8249451 >> >> Request for review. 
>> >> Best regards, >> >> Jamsheed >> >> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 >> >> >> [2] https://bugs.openjdk.java.net/browse/JDK-8246727 >> >> [3] https://bugs.openjdk.java.net/browse/JDK-8221207 >> From jamsheed.c.m at oracle.com Thu Jul 16 00:01:21 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 05:31:21 +0530 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> Message-ID: <973c4e4c-ed0e-7152-8387-28243a3ac275@oracle.com> Hi David, On 16/07/2020 05:20, David Holmes wrote: > Hi Jamsheed, > > On 16/07/2020 8:16 am, Jamsheed C M wrote: >> (Thank you Dean, adding serviceability team as this issue involves >> JVMTI features PopFrame, EarlyReturn features) > > It is not at all obvious how your proposed fix impacts the JVM TI > features. Yes, proposed fix doesn't. Fix doesn't plan to address JVMTI feature related issues. Added just to keep everyone in the loop. Best regards, Jamsheed > >> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381 >> >> (testing: mach5, tier1-5 links in JBS) >> >> Best regards, >> >> Jamsheed >> >> On 15/07/2020 21:25, Jamsheed C M wrote: >>> >>> Hi, >>> >>> Async handling at method entry requires it to be aware of >>> synchronization(like whether it is doing async handling before lock >>> acquire or after) >>> >>> This is required as exception handler rely on this info for >>> unlocking.? Async handling code never had this special condition >>> handled and it worked most of the time as we were using biased >>> locking which got disabled by [1] >>> >>> There was one other issue reported in similar time[2]. This issue >>> got triggered in test case by [3], back to back extra safepoint >>> after suspend and TLH for ThreadDeath. 
So in this setup both >>> PopFrame request and Thread.Stop request happened together for the >>> test scenario and it reached java method entry with >>> pending_exception set. >>> >>> I have done a partial fix for the issue, mainly to handle production >>> mode crash failures(do not unlock flag related ones) >>> >>> Fix detail: >>> >>> 1) I save restore the "do not unlock" flag in async handling. > > Sorry but you completely changed the fix compared to what we discussed > and what I pre-reviewed! What happened to changing from JRT_ENTRY to > JRT_ENTRY_NOASYNC? It is going to take me a lot of time and effort to > determine that this save/restore of the "do not unlock flag" is > actually correct and valid! > >>> >>> 2) Return for floating pending exception for some cases(PopFrame, >>> Early return related). This is debug(JVMTI) feature and floating >>> exception can get cleaned just like that in present compiler request >>> and deopt code. > > What part of the change addresses this? > > Thanks, > David > ----- > >>> >>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ >>> >>> There are more problems in these code areas, like we clear all >>> exceptions in compilation request path(interpreter,c1), as well as >>> deoptimization path. >>> >>> All these un-handled cases will be separately handled by >>> https://bugs.openjdk.java.net/browse/JDK-8249451 >>> >>> Request for review. 
>>> >>> Best regards, >>> >>> Jamsheed >>> >>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 >>> >>> >>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727 >>> >>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207 >>> From jamsheed.c.m at oracle.com Thu Jul 16 00:37:25 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 06:07:25 +0530 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> Message-ID: <122f8079-958c-acdf-bb60-3934729a313a@oracle.com> Hi David, On 16/07/2020 05:20, David Holmes wrote: >>> >>> Hi, >>> >>> Async handling at method entry requires it to be aware of >>> synchronization(like whether it is doing async handling before lock >>> acquire or after) >>> >>> This is required as exception handler rely on this info for >>> unlocking.? Async handling code never had this special condition >>> handled and it worked most of the time as we were using biased >>> locking which got disabled by [1] >>> >>> There was one other issue reported in similar time[2]. This issue >>> got triggered in test case by [3], back to back extra safepoint >>> after suspend and TLH for ThreadDeath. So in this setup both >>> PopFrame request and Thread.Stop request happened together for the >>> test scenario and it reached java method entry with >>> pending_exception set. >>> >>> I have done a partial fix for the issue, mainly to handle production >>> mode crash failures(do not unlock flag related ones) >>> >>> Fix detail: >>> >>> 1) I save restore the "do not unlock" flag in async handling. > > Sorry but you completely changed the fix compared to what we discussed > and what I pre-reviewed! What happened to changing from JRT_ENTRY to > JRT_ENTRY_NOASYNC? 
It is going to take me a lot of time and effort to
> determine that this save/restore of the "do not unlock flag" is
> actually correct and valid!

I tried changing JRT_ENTRY to JRT_ENTRY_NOASYNC, but unfortunately that made some tests fail (logs in JBS). I didn't investigate it in detail, but I presume that pending_async_exception is set in those failing scenarios and, because we have disabled async handling in some prominent code paths, the exception is never delivered.

>>>
>>> 2) Return for floating pending exception for some cases(PopFrame,
>>> Early return related). This is debug(JVMTI) feature and floating
>>> exception can get cleaned just like that in present compiler request
>>> and deopt code.
>
> What part of the change addresses this?

It doesn't address this issue completely, as that requires other changes in the compilation request path (c1, interpreter) and deopt. I just made changes to the interpreter part (the compilation request part), which fixes the interpreter part partially.

JRT_ENTRY(nmethod*, InterpreterRuntime::frequency_counter_overflow_inner(JavaThread* thread, address branch_bcp))
+ if (HAS_PENDING_EXCEPTION) {
+   return NULL;
+ }

JRT_ENTRY(void, InterpreterRuntime::profile_method(JavaThread* thread))
+ if (HAS_PENDING_EXCEPTION) {
+   return;
+ }

Best regards
Jamsheed

>
> Thanks,
> David
> -----

From david.holmes at oracle.com Thu Jul 16 01:07:33 2020
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 16 Jul 2020 11:07:33 +1000
Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark"
In-Reply-To: <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com>
References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com>
Message-ID: <1af60254-a239-c21f-68df-be9b65534e7f@oracle.com>

Hi Jamsheed,

tl;dr version: fix looks good. Thanks for working through things with me on this one.

Long version ...
for the sake of other reviewers (and myself) I'm going to walk through the problem scenario and how the fix addresses it, because the bug report is long and confusing and touches on a number of different issues with async exception handling.

We are dealing with the code generated for Java method entry, and in particular for a synchronized Java method. We do a lot of things in the entry code before we actually lock the monitor and jump to the Java method. Some of those things include method profiling and the counter overflow check for the JIT. If an exception is thrown at this point, the logic to remove the activation would unlock the monitor - which we haven't actually locked yet! So we have the do_not_unlock_if_synchronized flag which is stored in the current JavaThread. We set that flag true so that if any exceptions result in activation removal, the removal logic won't try to unlock the monitor. Once we're ready to lock the monitor we set the flag back to false (note there is an implicit assumption here that monitor locking can never raise an exception).

The problem arises with async exceptions, or more specifically the async exception that is raised due to an "unsafe access error". This is where a memory-mapped ByteBuffer causes an access violation (SEGV) due to a bad pointer. The signal handler simply sets a flag to indicate we encountered an "unsafe access error", adjusts the BCI to the next instruction and allows execution to proceed at the next instruction. It is then expected that the runtime will "soon" notice this pending unsafe access error and create and throw the InternalError instance that indicates the ByteBuffer operation failed. This requires executing Java code.

One of the places that checks for that pending unsafe access error is in the destructor of the JRT_ENTRY wrapper that is used for the method profiling and counter overflow checking.
This occurs whilst the do_not_unlock_if_synchronized flag is true, so the resulting InternalError won't result in an attempt to unlock the not-locked monitor.

The problem is that creating the InternalError executes Java code - it calls constructors, which call methods etc. And some of those methods are synchronized. So the method entry logic for such a call will set do_not_unlock_if_synchronized to true, perform all the preamble related to the call, then set do_not_unlock_if_synchronized to false, lock the monitor and make the call. When construction completes the InternalError is thrown and we remove the activation for the method we had originally started to call. But now the do_not_unlock_if_synchronized flag has been reset to false by the nested Java method call, so we do in fact try to unlock a monitor that was never locked, and things break.

This nesting problem is well known and we have a mechanism for dealing with it - the UnlockFlagSaver. The actual logic executed for profiling methods and doing the counter overflow check contains the requisite UnlockFlagSaver to avoid the problem just outlined. Unfortunately the async exception is processed in the JRT_ENTRY wrapper, which is outside the scope of those UnlockFlagSaver helpers and so they don't help in this case.

So the fix is to "simply" move the UnlockFlagSaver deeper into the call stack to the code that actually does the async exception processing:

void JavaThread::check_and_handle_async_exceptions(bool check_unsafe_error) {
+ // May be we are at method entry and requires to save do not unlock flag.
+ UnlockFlagSaver fs(this);

so now after the InternalError has been created and thrown we will restore the original value of the do_not_unlock_if_synchronized flag (true, as set by the interrupted method entry) and so the InternalError will not cause activation removal to attempt to unlock the not-locked monitor.
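[Editorial illustration, not part of the original message.] The save/restore idea described above can be sketched as a small RAII helper. This is only a toy model under stated assumptions: ToyThread, ToyUnlockFlagSaver, toy_nested_synchronized_call and toy_handle_async_exception are hypothetical names invented for this sketch, and none of it is HotSpot's actual UnlockFlagSaver or JavaThread code.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread state; NOT HotSpot's JavaThread.
struct ToyThread {
  bool do_not_unlock_if_synchronized = false;
};

// Hypothetical RAII helper modelled on the idea of a flag saver:
// remember the flag on construction, put it back on destruction.
class ToyUnlockFlagSaver {
 public:
  explicit ToyUnlockFlagSaver(ToyThread* t)
      : _thread(t), _saved(t->do_not_unlock_if_synchronized) {}
  ~ToyUnlockFlagSaver() { _thread->do_not_unlock_if_synchronized = _saved; }
  ToyUnlockFlagSaver(const ToyUnlockFlagSaver&) = delete;
  ToyUnlockFlagSaver& operator=(const ToyUnlockFlagSaver&) = delete;

 private:
  ToyThread* _thread;
  bool _saved;
};

// Models entry to a nested synchronized method (e.g. one called while
// constructing the error object): the flag is set during the preamble
// and cleared again just before the monitor would be locked.
void toy_nested_synchronized_call(ToyThread* t) {
  assert(t != nullptr);
  t->do_not_unlock_if_synchronized = true;   // preamble of the nested entry
  t->do_not_unlock_if_synchronized = false;  // nested entry is about to lock
}

// Models async exception processing that runs nested code.
// Returns the flag value the interrupted outer method entry sees afterwards.
bool toy_handle_async_exception(ToyThread* t, bool use_saver) {
  if (use_saver) {
    ToyUnlockFlagSaver fs(t);         // saves the outer value
    toy_nested_synchronized_call(t);  // nested call clobbers the flag
  } else {
    toy_nested_synchronized_call(t);  // flag stays clobbered
  }
  return t->do_not_unlock_if_synchronized;
}
```

With the outer flag set to true beforehand, the variant without the saver leaves the flag false (the failure mode described above), while the variant with the saver leaves it true.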
The scope of the UnlockFlagSaver could be narrowed to the actual logic for processing the unsafe access error, but it seems fine at method scope.

A second fix is that the overflow counter check had an assertion that it was not executed with any pending exceptions. But that turned out to be false for reasons I can't fully explain, but it again appears to relate to a pending async exception being installed prior to the method call - and seems related to the two referenced JVM TI functions. The simple solution here is to delete the assertion and to check for pending exceptions on entry to the code and just return immediately. The JRT_ENTRY destructor will see the pending exception and propagate it.

Cheers,
David

On 16/07/2020 9:50 am, David Holmes wrote:
> Hi Jamsheed,
>
> On 16/07/2020 8:16 am, Jamsheed C M wrote:
>> (Thank you Dean, adding serviceability team as this issue involves
>> JVMTI features PopFrame, EarlyReturn features)
>
> It is not at all obvious how your proposed fix impacts the JVM TI features.
>
>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381
>>
>> (testing: mach5, tier1-5 links in JBS)
>>
>> Best regards,
>>
>> Jamsheed
>>
>> On 15/07/2020 21:25, Jamsheed C M wrote:
>>>
>>> Hi,
>>>
>>> Async handling at method entry requires it to be aware of
>>> synchronization(like whether it is doing async handling before lock
>>> acquire or after)
>>>
>>> This is required as exception handler rely on this info for
>>> unlocking. Async handling code never had this special condition
>>> handled and it worked most of the time as we were using biased
>>> locking which got disabled by [1]
>>>
>>> There was one other issue reported in similar time[2]. This issue got
>>> triggered in test case by [3], back to back extra safepoint after
>>> suspend and TLH for ThreadDeath. So in this setup both PopFrame
>>> request and Thread.Stop request happened together for the test
>>> scenario and it reached java method entry with pending_exception set.
>>> >>> I have done a partial fix for the issue, mainly to handle production >>> mode crash failures(do not unlock flag related ones) >>> >>> Fix detail: >>> >>> 1) I save restore the "do not unlock" flag in async handling. > > Sorry but you completely changed the fix compared to what we discussed > and what I pre-reviewed! What happened to changing from JRT_ENTRY to > JRT_ENTRY_NOASYNC? It is going to take me a lot of time and effort to > determine that this save/restore of the "do not unlock flag" is actually > correct and valid! > >>> >>> 2) Return for floating pending exception for some cases(PopFrame, >>> Early return related). This is debug(JVMTI) feature and floating >>> exception can get cleaned just like that in present compiler request >>> and deopt code. > > What part of the change addresses this? > > Thanks, > David > ----- > >>> >>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ >>> >>> There are more problems in these code areas, like we clear all >>> exceptions in compilation request path(interpreter,c1), as well as >>> deoptimization path. >>> >>> All these un-handled cases will be separately handled by >>> https://bugs.openjdk.java.net/browse/JDK-8249451 >>> >>> Request for review. 
>>> >>> Best regards, >>> >>> Jamsheed >>> >>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 >>> >>> >>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727 >>> >>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207 >>> From jamsheed.c.m at oracle.com Thu Jul 16 01:55:30 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 07:25:30 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> Message-ID: Hi Vladimir, On 16/07/2020 00:29, Vladimir Kozlov wrote: > As I said before I agree with your additional checks for StoreN and > StoreNKlass. > > But I have concerns about new is_init_captured_store code. EA is > mostly looking only on inputs to see Allocation. And in several places > it expecting only to see Allocation because other cases should be > filtered out before. If that is the case, I would like to go with my first webrev for this fix as it nicely propagate es and there in no unnecessary promotion to global escape state. http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ Best regards, Jamsheed > > Thanks, > Vladimir > > On 7/15/20 10:54 AM, Jamsheed C M wrote: >> Hi Vladimir, >> >> with unrolling i understand that many cases will just have phis >> everywhere to outside the loop as the uses are outside the loop. >> >> and this is not restricted to escaping objects alone as i depicted. >> it can be escaping as well as non-escaping. >> >> so marking store to them as global escape doesn't seems to be nice >> idea. 
i will rework on this fix and get back again.
>>
>> Thank you
>>
>> Best regards
>>
>> Jamsheed
>>
>> On 15/07/2020 08:38, Jamsheed C M wrote:
>>> (unfinished mail got sent, so completing it)
>>> On 15/07/2020 08:21, Jamsheed C M wrote:
>>>> Hi Vladimir,
>>>>
>>>> On 15/07/2020 06:50, Vladimir Kozlov wrote:
>>>>> I looked more on this. EA already does not scalarize allocations
>>>>> when Phi nodes merged them - it should handle this case. I did
>>>>> small experiment and relaxed assert for this new (10. needs
>>>>> comment update) case for AddP's base and test passed:
>>>>>
>>>>> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700
>>>>> @@ -2357,6 +2357,7 @@
>>>>>        int opcode = uncast_base->Opcode();
>>>>>        assert(opcode == Op_ConP || opcode == Op_ThreadLocal ||
>>>>>               opcode == Op_CastX2P || uncast_base->is_DecodeNarrowPtr() ||
>>>>> +             (uncast_base->is_Phi() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
>>>>>               (uncast_base->is_Mem() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
>>>>>               (uncast_base->is_Proj() && uncast_base->in(0)->is_Allocate()), "sanity");
>>>>>      }
>>>>>
>>>>> Did you hit a case when this may not work?
>>>>
>>>> Yes, right, it already doesn't mark it as scalarizable if base count
>>>> is more than one (I think it missed an is_oop check there) [1].
>>>>
>>>> EA CG adds edges only for oop field making stores to them
>>>> undetected. This leaves these stored objects as NoEscape, and if the
>>>> compiled method continues execution with such a NoEscape object it can
>>>> have undesired results (i.e. synchronization removed).
>>>>
>>>> Probable case would be (didn't verify)
>>>>
>>>> try {
>>>>
>>>> LOOP BEGIN
>>>>
>>>>
try {throw new Obj()} catch {} >>>> >>>> LOOP END >>>> >>>> } catch (Obj e) { >>>> >>>> } >>> >>> Best Regards, >>> >>> Jamsheed >>> >>> [1]https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L1770 >>> >>> >>> >>>>> >>>>> >>>>> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation >>>>> (-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations): >>>>> >>>>> ======== Connection graph for? Test::test >>>>> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]]?? 95 Allocate === >>>>> 242? 76? 230? 8? 1 ( 93? 92? 21? 1? 78? 1? 78 ) [[ 96 97 98 105 >>>>> 106? 107 ]]? rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, >>>>> bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: >>>>> Test::test1 @ bci:0 Test::test @ bci:8 >>>>> LocalVar [ 95P [ 158b ]]?? 107??? Proj??? ===? 95? [[ 108 158 ]] >>>>> #5 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>>>> >>>>> Scalar? 95??? Allocate??? ===? 242? 76? 230? 8? 1 ( 93 92? 21 1 78 >>>>> 1? 78 ) [[ 96? 97? 98? 105? 106? 107 ]] rawptr:NotNull ( int:>=0, >>>>> java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 >>>>> Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>>>> ++++ Eliminated: 95 Allocate >>>>> >>>>> >>>>> t\Thanks, >>>>> Vladimir K >>>>> >>>>> On 7/14/20 1:28 AM, Jamsheed C M wrote: >>>>>> Hi all, >>>>>> >>>>>> I had incorrectly added extra check in assert after offset >>>>>> computation in address_offset . For addps with non constant >>>>>> offsets (like [1]) >>>>>> >>>>>> Not changing the old assert even though I am not expecting first >>>>>> addp/second addp(for array addressing) case for init captured store. 
>>>>>> >>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/ >>>>>> >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Jamsheed >>>>>> >>>>>> [1] >>>>>> >>>>>> assert(offs != Type::OffsetBot || >>>>>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(), >>>>>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || >>>>>> is_captured_store(adr), >>>>>> ???????????? "offset must be a constant or it is initialization >>>>>> of array"); >>>>>> >>>>>> On 13/07/2020 11:14, Jamsheed C M wrote: >>>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> I reworked the fix. I compute offset for all init captures >>>>>>> stores, but treats this special init captured stores similar to >>>>>>> unsafe(as these objects are usually GlobalEscape and doesn't >>>>>>> have any perf implications). >>>>>>> >>>>>>> revised webrev: >>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ >>>>>>> >>>>>>> testing: mach1-5( logs in jbs) >>>>>>> >>>>>>> Best regards, >>>>>>> >>>>>>> Jamsheed >>>>>>> >>>>>>> On 09/07/2020 19:36, Jamsheed C M wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> request to hold the review. need to change the code for dealing >>>>>>>> with unsafe access. as current capture code go for more >>>>>>>> execution time analyzing things. >>>>>>>> >>>>>>>> Best regards, >>>>>>>> >>>>>>>> Jamsheed >>>>>>>> >>>>>>>> On 09/07/2020 13:01, Jamsheed C M wrote: >>>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895 >>>>>>>>> >>>>>>>>> Request for review changes made to offset computation and >>>>>>>>> field write detection for init captured stores due to phis >>>>>>>>> addition between alloc and init. This happen if init node in >>>>>>>>> different outer loop wrt to alloc node and there is a loop >>>>>>>>> opt.? This was required as a result of enhancement [1]. 
>>>>>>>>> >>>>>>>>> Normally init are not associated with multiple alloc node >>>>>>>>> during EA phase, but changes done for [1] caused the code >>>>>>>>> shapes of the form [2]? to generate inits associated with >>>>>>>>> multiple alloc node. >>>>>>>>> >>>>>>>>> This had implication in offset computation and field write >>>>>>>>> detection related to initializing stores. >>>>>>>>> >>>>>>>>> Attempt to fix in EA: >>>>>>>>> >>>>>>>>> ???? webrev: >>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>>>>>>>> >>>>>>>>> Alternate fix: >>>>>>>>> >>>>>>>>> ???? Minimize the scenario in compiler generated code by >>>>>>>>> throwing only j.l.Error from slowpath(all exception async/sync >>>>>>>>> are handled in runtime exit). >>>>>>>>> >>>>>>>>> ???? Stub epilog doesn't poll or throw any exceptions. Disable >>>>>>>>> full loop opt before EA for detectable patterns and bailout EA >>>>>>>>> for late detected patterns. >>>>>>>>> >>>>>>>>> ???? webrev: >>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>>>>>>> >>>>>>>>> Please advice. >>>>>>>>> >>>>>>>>> Testing : mach tier1-5 (logs in jbs) >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> >>>>>>>>> Jamsheed >>>>>>>>> >>>>>>>>> >>>>>>>>> [1] JDK-8231291 >>>>>>>>> C2: loop >>>>>>>>> opts before EA should maximally unroll loops >>>>>>>>> >>>>>>>>> [2] that have its init node in different outer loop wrt to >>>>>>>>> alloc node. >>>>>>>>> >>>>>>>>> >>>>>>>>> loop begin >>>>>>>>> >>>>>>>>> ?? try{ >>>>>>>>> >>>>>>>>> ?? return new obj()/? throw new obj()/ uncommon trap after >>>>>>>>> allocation, in a loop >>>>>>>>> >>>>>>>>> ?? } catch(ex) { >>>>>>>>> >>>>>>>>> ?? } >>>>>>>>> >>>>>>>>> loop end >>>>>>>>> >>>>>>>>> ? 42???? public static IntA test(int n) { >>>>>>>>> ?? 43???????? for (int i=0; i<2; i++) { >>>>>>>>> ?? 44???????????? try { >>>>>>>>> ?? 45?????????????????? return new IntA(n + i); >>>>>>>>> ?? 46???????????? } catch (Exception e) { >>>>>>>>> ?? 47???????????? } >>>>>>>>> ?? 48???????? 
} >>>>>>>>> ?? 49 >>>>>>>>> From jiefu at tencent.com Thu Jul 16 01:59:32 2020 From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=) Date: Thu, 16 Jul 2020 01:59:32 +0000 Subject: RFR: 8246805: Incorrect copyright header in TestInvalidTieredStopAtLevel.java Message-ID: Hi all, May I get reviews for this tiny fix, which just updates the license to be GPLv2 only (not GPLv2+CPE)? JBS: https://bugs.openjdk.java.net/browse/JDK-8246805 Webrev: http://cr.openjdk.java.net/~jiefu/8246805/webrev.00/ Thanks a lot. Best regards, Jie From jamsheed.c.m at oracle.com Thu Jul 16 02:03:31 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 07:33:31 +0530 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <1af60254-a239-c21f-68df-be9b65534e7f@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> <1af60254-a239-c21f-68df-be9b65534e7f@oracle.com> Message-ID: Hi David, On 16/07/2020 06:37, David Holmes wrote: > Hi Jamsheed, > > tl;dr version: fix looks good. Thanks for working through things with > me on this one. > > Long version ... for the sake of other reviewers (and myself) I'm > going to walk through the problem scenario and how the fix addresses > it, because the bug report is long and confusing and touches on a > number of different issues with async exception handling. > > We are dealing with the code generated for Java method entry, and in > particular for a synchronized Java method. We do a lot of things in > the entry code before we actually lock the monitor and jump to the > Java method. Some of those things include method profiling and the > counter overflow check for the JIT. If an exception is thrown at this > point, the logic to remove the activation would unlock the monitor - > which we haven't actually locked yet! 
So we have the > do_not_unlock_if_synchronized flag which is stored in the current > JavaThread. We set that flag true so that if any exceptions result in > activation removal, the removal logic won't try to unlock the monitor. > Once we're ready to lock the monitor we set the flag back to false > (note there is an implicit assumption here that monitor locking can > never raise an exception). > > The problem arises with async exceptions, or more specifically the > async exception that is raised due to an "unsafe access error". This > is where a memory-mapped ByteBuffer causes an access violation (SEGV) > due to a bad pointer. The signal handler simply sets a flag to > indicate we encountered an "unsafe access error", adjusts the BCI to > the next instruction and allows execution to proceed at the next > instruction. It is then expected that the runtime will "soon" notice > this pending unsafe access error and create and throw the > InternalError instance that indicates the ByteBuffer operation failed. > This requires executing Java code. > > One of the places that checks for that pending unsafe access error is > in the destructor of the JRT_ENTRY wrapper that is used for the method > profiling and counter overflow checking. This occurs whilst the > do_not_unlock_if_synchronized flag is true, so the resulting > InternalError won't result in an attempt to unlock the not-locked > monitor. > > The problem is that creating the InternalError executes Java code - it > calls constructors, which call methods etc. And some of those methods > are synchronized. So the method entry logic for such a call will set > do_not_unlock_if_synchronized to true, perform all the preamble > related to the call, then set do_not_unlock_if_synchronized to false, > lock the monitor and make the call. When construction completes the > InternalError is thrown and we remove the activation for the method we > had originally started to call. 
But now the
> do_not_unlock_if_synchronized flag has been reset to false by the
> nested Java method call, so we do in fact try to unlock a monitor that
> was never locked, and things break.
>
> This nesting problem is well known and we have a mechanism for dealing
> with it - the UnlockFlagSaver. The actual logic executed for profiling
> methods and doing the counter overflow check contains the requisite
> UnlockFlagSaver to avoid the problem just outlined. Unfortunately the
> async exception is processed in the JRT_ENTRY wrapper, which is
> outside the scope of those UnlockFlagSaver helpers and so they don't
> help in this case.
>
> So the fix is to "simply" move the UnlockFlagSaver deeper into the
> call stack to the code that actually does the async exception processing:
>
>   void JavaThread::check_and_handle_async_exceptions(bool check_unsafe_error) {
> +   // May be we are at method entry and requires to save do not unlock flag.
> +   UnlockFlagSaver fs(this);
>
> so now after the InternalError has been created and thrown we will
> restore the original value of the do_not_unlock_if_synchronized flag
> (true) and so the InternalError will not cause activation removal to
> attempt to unlock the not-locked monitor.
>
> The scope of the UnlockFlagSaver could be narrowed to the actual logic
> for processing the unsafe access error, but it seems fine at method
> scope.
>
> A second fix is that the overflow counter check had an assertion that
> it was not executed with any pending exceptions. But that turned out
> to be false for reasons I can't fully explain, but it again appears to
> relate to a pending async exception being installed prior to the
> method call - and seems related to the two referenced JVM TI
> functions. The simple solution here is to delete the assertion and to
> check for pending exceptions on entry to the code and just return
> immediately. The JRT_ENTRY destructor will see the pending exception
> and propagate it.
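The "second fix" quoted above (drop the assertion, bail out early if an exception is already pending) can be sketched roughly as follows. This is only an illustration under stated assumptions: PendingThread and on_counter_overflow are hypothetical names, and the counter increment merely stands in for whatever the real overflow handler does.

```cpp
#include <cassert>

// Hypothetical per-thread state for illustration; not HotSpot's JavaThread.
struct PendingThread {
  bool has_pending_exception = false;
  int overflow_events_handled = 0;
};

// Earlier shape (per the description): assert that nothing is pending.
// Sketched replacement: tolerate a pending exception and return at once,
// leaving it in place so the enclosing wrapper can propagate it.
// Returns true if the overflow was processed, false if we bailed out.
bool on_counter_overflow(PendingThread* t) {
  if (t->has_pending_exception) {
    return false;  // do no work; the pending exception stays untouched
  }
  ++t->overflow_events_handled;  // stand-in for the real overflow handling
  return true;
}
```

The design point is that the handler neither clears nor masks the pending exception; it just declines to run, which matches the stated expectation that the wrapper's teardown will see the exception and propagate it.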
Thanks a lot for the opportunity, for all the help, and for putting detailed description of the problem here. Best regards, Jamsheed > > Cheers, > David > > On 16/07/2020 9:50 am, David Holmes wrote: >> Hi Jamsheed, >> >> On 16/07/2020 8:16 am, Jamsheed C M wrote: >>> (Thank you Dean, adding serviceability team as this issue involves >>> JVMTI features PopFrame, EarlyReturn features) >> >> It is not at all obvious how your proposed fix impacts the JVM TI >> features. >> >>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381 >>> >>> (testing: mach5, tier1-5 links in JBS) >>> >>> Best regards, >>> >>> Jamsheed >>> >>> On 15/07/2020 21:25, Jamsheed C M wrote: >>>> >>>> Hi, >>>> >>>> Async handling at method entry requires it to be aware of >>>> synchronization(like whether it is doing async handling before lock >>>> acquire or after) >>>> >>>> This is required as exception handler rely on this info for >>>> unlocking.? Async handling code never had this special condition >>>> handled and it worked most of the time as we were using biased >>>> locking which got disabled by [1] >>>> >>>> There was one other issue reported in similar time[2]. This issue >>>> got triggered in test case by [3], back to back extra safepoint >>>> after suspend and TLH for ThreadDeath. So in this setup both >>>> PopFrame request and Thread.Stop request happened together for the >>>> test scenario and it reached java method entry with >>>> pending_exception set. >>>> >>>> I have done a partial fix for the issue, mainly to handle >>>> production mode crash failures(do not unlock flag related ones) >>>> >>>> Fix detail: >>>> >>>> 1) I save restore the "do not unlock" flag in async handling. >> >> Sorry but you completely changed the fix compared to what we >> discussed and what I pre-reviewed! What happened to changing from >> JRT_ENTRY to JRT_ENTRY_NOASYNC? 
It is going to take me a lot of time >> and effort to determine that this save/restore of the "do not unlock >> flag" is actually correct and valid! >> >>>> >>>> 2) Return for floating pending exception for some cases(PopFrame, >>>> Early return related). This is debug(JVMTI) feature and floating >>>> exception can get cleaned just like that in present compiler >>>> request and deopt code. >> >> What part of the change addresses this? >> >> Thanks, >> David >> ----- >> >>>> >>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ >>>> >>>> There are more problems in these code areas, like we clear all >>>> exceptions in compilation request path(interpreter,c1), as well as >>>> deoptimization path. >>>> >>>> All these un-handled cases will be separately handled by >>>> https://bugs.openjdk.java.net/browse/JDK-8249451 >>>> >>>> Request for review. >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 >>>> >>>> >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727 >>>> >>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207 >>>> From mikael.vidstedt at oracle.com Thu Jul 16 02:10:20 2020 From: mikael.vidstedt at oracle.com (Mikael Vidstedt) Date: Wed, 15 Jul 2020 19:10:20 -0700 Subject: RFR: 8246805: Incorrect copyright header in TestInvalidTieredStopAtLevel.java In-Reply-To: References: Message-ID: <95C84F92-6E0E-4995-AA01-CBD4BB81CC8B@oracle.com> Thanks for doing this. Can you please use the same exact license header found in make/templates/gpl-header and/or the surrounding files in that test directory? Cheers, Mikael > On Jul 15, 2020, at 6:59 PM, jiefu(??) wrote: > > Hi all, > > May I get reviews for this tiny fix, which just updates the license to be GPLv2 only (not GPLv2+CPE)? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8246805 > Webrev: http://cr.openjdk.java.net/~jiefu/8246805/webrev.00/ > > Thanks a lot. 
> Best regards,
> Jie

From jiefu at tencent.com Thu Jul 16 02:55:04 2020
From: jiefu at tencent.com (jiefu(傅杰))
Date: Thu, 16 Jul 2020 02:55:04 +0000
Subject: RFR: 8246805: Incorrect copyright header in TestInvalidTieredStopAtLevel.java
Message-ID: <48C11F71-A4AD-44EB-A3EF-C3F920176605@tencent.com>

Hi Mikael,

Thanks for your review.
Updated: http://cr.openjdk.java.net/~jiefu/8246805/webrev.01/

Thanks.
Best regards,
Jie

On 2020/7/16, 10:11 AM, "Mikael Vidstedt" wrote:

Thanks for doing this. Can you please use the same exact license header found in make/templates/gpl-header and/or the surrounding files in that test directory?

Cheers,
Mikael

> On Jul 15, 2020, at 6:59 PM, jiefu(傅杰) wrote:
>
> Hi all,
>
> May I get reviews for this tiny fix, which just updates the license to be GPLv2 only (not GPLv2+CPE)?
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8246805
> Webrev: http://cr.openjdk.java.net/~jiefu/8246805/webrev.00/
>
> Thanks a lot.
> Best regards,
> Jie

From igor.ignatyev at oracle.com Thu Jul 16 03:31:01 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 15 Jul 2020 20:31:01 -0700
Subject: RFR: 8246805: Incorrect copyright header in TestInvalidTieredStopAtLevel.java
In-Reply-To: <48C11F71-A4AD-44EB-A3EF-C3F920176605@tencent.com>
References: <48C11F71-A4AD-44EB-A3EF-C3F920176605@tencent.com>
Message-ID: <5E44E4D8-1478-4604-A472-8065E09276E1@oracle.com>

Hi Jie,

LGTM

-- Igor

> On Jul 15, 2020, at 7:55 PM, jiefu(傅杰) wrote:
>
> Hi Mikael,
>
> Thanks for your review.
> Updated: http://cr.openjdk.java.net/~jiefu/8246805/webrev.01/
>
> Thanks.
> Best regards,
> Jie
>
> On 2020/7/16, 10:11 AM, "Mikael Vidstedt" wrote:
>
>
> Thanks for doing this. Can you please use the same exact license header found in make/templates/gpl-header and/or the surrounding files in that test directory?
>
> Cheers,
> Mikael
>
>> On Jul 15, 2020, at 6:59 PM, jiefu(傅杰)
wrote:
>>
>> Hi all,
>>
>> May I get reviews for this tiny fix, which just updates the license to be GPLv2 only (not GPLv2+CPE)?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8246805
>> Webrev: http://cr.openjdk.java.net/~jiefu/8246805/webrev.00/
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>
>
>
>

From mikael.vidstedt at oracle.com Thu Jul 16 03:33:55 2020
From: mikael.vidstedt at oracle.com (Mikael Vidstedt)
Date: Wed, 15 Jul 2020 20:33:55 -0700
Subject: RFR: 8246805: Incorrect copyright header in TestInvalidTieredStopAtLevel.java
In-Reply-To: <48C11F71-A4AD-44EB-A3EF-C3F920176605@tencent.com>
References: <48C11F71-A4AD-44EB-A3EF-C3F920176605@tencent.com>
Message-ID: <37DFCE79-0F1F-45A0-A0CC-13A915D34936@oracle.com>

Looks good, thank you!

Cheers,
Mikael

> On Jul 15, 2020, at 7:55 PM, jiefu(傅杰) wrote:
>
> Hi Mikael,
>
> Thanks for your review.
> Updated: http://cr.openjdk.java.net/~jiefu/8246805/webrev.01/
>
> Thanks.
> Best regards,
> Jie
>
> On 2020/7/16, 10:11 AM, "Mikael Vidstedt" wrote:
>
>
> Thanks for doing this. Can you please use the same exact license header found in make/templates/gpl-header and/or the surrounding files in that test directory?
>
> Cheers,
> Mikael
>
>> On Jul 15, 2020, at 6:59 PM, jiefu(傅杰) wrote:
>>
>> Hi all,
>>
>> May I get reviews for this tiny fix, which just updates the license to be GPLv2 only (not GPLv2+CPE)?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8246805
>> Webrev: http://cr.openjdk.java.net/~jiefu/8246805/webrev.00/
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>
>
>
>

From jiefu at tencent.com Thu Jul 16 03:58:06 2020
From: jiefu at tencent.com (jiefu(傅杰))
Date: Thu, 16 Jul 2020 03:58:06 +0000
Subject: RFR: 8246805: Incorrect copyright header in TestInvalidTieredStopAtLevel.java
Message-ID:

Thanks Igor and Mikael for your review.
Pushed.

Best regards,
Jie

On 2020/7/16, 11:34 AM, "Mikael Vidstedt" wrote:

Looks good, thank you!

Cheers,
Mikael

> On Jul 15, 2020, at 7:55 PM, jiefu(傅杰)
wrote:
>
> Hi Mikael,
>
> Thanks for your review.
> Updated: http://cr.openjdk.java.net/~jiefu/8246805/webrev.01/
>
> Thanks.
> Best regards,
> Jie
>
> On 2020/7/16, 10:11 AM, "Mikael Vidstedt" wrote:
>
>
> Thanks for doing this. Can you please use the same exact license header found in make/templates/gpl-header and/or the surrounding files in that test directory?
>
> Cheers,
> Mikael
>
>> On Jul 15, 2020, at 6:59 PM, jiefu(傅杰) wrote:
>>
>> Hi all,
>>
>> May I get reviews for this tiny fix, which just updates the license to be GPLv2 only (not GPLv2+CPE)?
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8246805
>> Webrev: http://cr.openjdk.java.net/~jiefu/8246805/webrev.00/
>>
>> Thanks a lot.
>> Best regards,
>> Jie
>
>
>
>

From xxinliu at amazon.com Thu Jul 16 04:01:49 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Thu, 16 Jul 2020 04:01:49 +0000
Subject: question about PrintOptoStatistics atomicity
In-Reply-To: <6b3d2637-e01c-8ab9-e32c-2404c7b2a40a@oracle.com>
References: <1594827116846.89704@amazon.com>, <6b3d2637-e01c-8ab9-e32c-2404c7b2a40a@oracle.com>
Message-ID: <1594872109123.1250@amazon.com>

Hi, Vladimir,

Thank you for your information. I understand. I can use -XX:CICompilerCount=2 if I need to have precise counters.

thanks,
--lx

________________________________________
From: hotspot-compiler-dev on behalf of Vladimir Kozlov
Sent: Wednesday, July 15, 2020 10:50 AM
To: hotspot-compiler-dev at openjdk.java.net
Subject: RE: [EXTERNAL] question about PrintOptoStatistics atomicity

It was done intentionally because when that code was implemented atomic operations were expensive. We never intended these counters to be precise - they were used mostly for debugging purposes. It is up to the user how to use them - for example using only one C2 thread.
When you collect data for general application you want to execute it with the same parameters as in production. I don't think we should enforce any restrictions in VM when PrintOptoStatistics is used. Regards, Vladimir K On 7/15/20 8:31 AM, Liu, Xin wrote: > Hi, > > > I have a question about -XX:+PrintOptoStatistics in c2_globals.hpp. > > It dumps many internal counters in different C2 phases. I found those counters are all static fields. > > eg. > > http://hg.openjdk.java.net/jdk/jdk/file/4b9ced2b948c/src/hotspot/share/opto/chaitin.cpp#l2297 > > http://hg.openjdk.java.net/jdk/jdk/file/4b9ced2b948c/src/hotspot/share/opto/phaseX.hpp#l599 > > > I notice that all setters of those fields are not atomic. IMHO, hotspot may has more than one c2-compiler-threads running at the same time. > > How does hotspot guarantee those fields are thread-safe? or the flag intends to do statistics in single-thread mode by design? > > > If those counters are not atomic, shall we connect this flag to CICompilerCount? > > I think we can constrain the number of c2-compiler-thread to 1 if user set PrintOptoStatistics. Does it make sense? > > > thanks, > > --lx > From jamsheed.c.m at oracle.com Thu Jul 16 07:00:18 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 12:30:18 +0530 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <1af60254-a239-c21f-68df-be9b65534e7f@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> <1af60254-a239-c21f-68df-be9b65534e7f@oracle.com> Message-ID: <55b4473d-8aa4-77e0-1145-2a94a0a5f62e@oracle.com> Hi all, could i get another review? Best regards, Jamsheed On 16/07/2020 06:37, David Holmes wrote: > Hi Jamsheed, > > tl;dr version: fix looks good. Thanks for working through things with > me on this one. > > Long version ... 
for the sake of other reviewers (and myself) I'm > going to walk through the problem scenario and how the fix addresses > it, because the bug report is long and confusing and touches on a > number of different issues with async exception handling. > > We are dealing with the code generated for Java method entry, and in > particular for a synchronized Java method. We do a lot of things in > the entry code before we actually lock the monitor and jump to the > Java method. Some of those things include method profiling and the > counter overflow check for the JIT. If an exception is thrown at this > point, the logic to remove the activation would unlock the monitor - > which we haven't actually locked yet! So we have the > do_not_unlock_if_synchronized flag which is stored in the current > JavaThread. We set that flag true so that if any exceptions result in > activation removal, the removal logic won't try to unlock the monitor. > Once we're ready to lock the monitor we set the flag back to false > (note there is an implicit assumption here that monitor locking can > never raise an exception). > > The problem arises with async exceptions, or more specifically the > async exception that is raised due to an "unsafe access error". This > is where a memory-mapped ByteBuffer causes an access violation (SEGV) > due to a bad pointer. The signal handler simply sets a flag to > indicate we encountered an "unsafe access error", adjusts the BCI to > the next instruction and allows execution to proceed at the next > instruction. It is then expected that the runtime will "soon" notice > this pending unsafe access error and create and throw the > InternalError instance that indicates the ByteBuffer operation failed. > This requires executing Java code. > > One of the places that checks for that pending unsafe access error is > in the destructor of the JRT_ENTRY wrapper that is used for the method > profiling and counter overflow checking. 
This occurs whilst the
> do_not_unlock_if_synchronized flag is true, so the resulting
> InternalError won't result in an attempt to unlock the not-locked
> monitor.
>
> The problem is that creating the InternalError executes Java code - it
> calls constructors, which call methods etc. And some of those methods
> are synchronized. So the method entry logic for such a call will set
> do_not_unlock_if_synchronized to true, perform all the preamble
> related to the call, then set do_not_unlock_if_synchronized to false,
> lock the monitor and make the call. When construction completes the
> InternalError is thrown and we remove the activation for the method we
> had originally started to call. But now the
> do_not_unlock_if_synchronized flag has been reset to false by the
> nested Java method call, so we do in fact try to unlock a monitor that
> was never locked, and things break.
>
> This nesting problem is well known and we have a mechanism for dealing
> with it - the UnlockFlagSaver. The actual logic executed for profiling
> methods and doing the counter overflow check contains the requisite
> UnlockFlagSaver to avoid the problem just outlined. Unfortunately the
> async exception is processed in the JRT_ENTRY wrapper, which is
> outside the scope of those UnlockFlagSaver helpers and so they don't
> help in this case.
>
> So the fix is to "simply" move the UnlockFlagSaver deeper into the
> call stack to the code that actually does the async exception processing:
>
>   void JavaThread::check_and_handle_async_exceptions(bool check_unsafe_error) {
> +   // May be we are at method entry and requires to save do not unlock flag.
> +   UnlockFlagSaver fs(this);
>
> so now after the InternalError has been created and thrown we will
> restore the original value of the do_not_unlock_if_synchronized flag
> (true) and so the InternalError will not cause activation removal to
> attempt to unlock the not-locked monitor.
> > The scope of the UnlockFlagSaver could be narrowed to the actual logic > for processing the unsafe access error, but it seems fine at method > scope. > > A second fix is that the overflow counter check had an assertion that > it was not executed with any pending exceptions. But that turned out > to be false for reasons I can't fully explain, but it again appears to > relate to a pending async exception being installed prior to the > method call - and seems related to the two referenced JVM TI > functions. The simple solution here is to delete the assertion and to > check for pending exceptions on entry to the code and just return > immediately. The JRT_ENTRY destructor will see the pending exception > and propagate it. > > Cheers, > David > > On 16/07/2020 9:50 am, David Holmes wrote: >> Hi Jamsheed, >> >> On 16/07/2020 8:16 am, Jamsheed C M wrote: >>> (Thank you Dean, adding serviceability team as this issue involves >>> JVMTI features PopFrame, EarlyReturn features) >> >> It is not at all obvious how your proposed fix impacts the JVM TI >> features. >> >>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381 >>> >>> (testing: mach5, tier1-5 links in JBS) >>> >>> Best regards, >>> >>> Jamsheed >>> >>> On 15/07/2020 21:25, Jamsheed C M wrote: >>>> >>>> Hi, >>>> >>>> Async handling at method entry requires it to be aware of >>>> synchronization(like whether it is doing async handling before lock >>>> acquire or after) >>>> >>>> This is required as exception handler rely on this info for >>>> unlocking.? Async handling code never had this special condition >>>> handled and it worked most of the time as we were using biased >>>> locking which got disabled by [1] >>>> >>>> There was one other issue reported in similar time[2]. This issue >>>> got triggered in test case by [3], back to back extra safepoint >>>> after suspend and TLH for ThreadDeath. 
So in this setup both >>>> PopFrame request and Thread.Stop request happened together for the >>>> test scenario and it reached java method entry with >>>> pending_exception set. >>>> >>>> I have done a partial fix for the issue, mainly to handle >>>> production mode crash failures(do not unlock flag related ones) >>>> >>>> Fix detail: >>>> >>>> 1) I save restore the "do not unlock" flag in async handling. >> >> Sorry but you completely changed the fix compared to what we >> discussed and what I pre-reviewed! What happened to changing from >> JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of time >> and effort to determine that this save/restore of the "do not unlock >> flag" is actually correct and valid! >> >>>> >>>> 2) Return for floating pending exception for some cases(PopFrame, >>>> Early return related). This is debug(JVMTI) feature and floating >>>> exception can get cleaned just like that in present compiler >>>> request and deopt code. >> >> What part of the change addresses this? >> >> Thanks, >> David >> ----- >> >>>> >>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ >>>> >>>> There are more problems in these code areas, like we clear all >>>> exceptions in compilation request path(interpreter,c1), as well as >>>> deoptimization path. >>>> >>>> All these un-handled cases will be separately handled by >>>> https://bugs.openjdk.java.net/browse/JDK-8249451 >>>> >>>> Request for review. 
>>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 >>>> >>>> >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727 >>>> >>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207 >>>> From aph at redhat.com Thu Jul 16 08:44:16 2020 From: aph at redhat.com (Andrew Haley) Date: Thu, 16 Jul 2020 09:44:16 +0100 Subject: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor In-Reply-To: References: Message-ID: <0aed0646-c770-03e6-4e0b-5108919b7203@redhat.com> On 15/07/2020 14:27, Ludovic Henry wrote: > A quick follow-up on that patch. Is there anything you would like to see done differently? It's fine, but (as discussed) it should go into http://hg.openjdk.java.net/aarch64-port/jdk-windows/ We'll need to do a regular pull from jdk/jdk into that tree. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jamsheed.c.m at oracle.com Thu Jul 16 11:40:46 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 17:10:46 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> Message-ID: <03e49aa9-167f-8b8f-a744-408febff5bf6@oracle.com> Hi Vladimir, On 16/07/2020 07:25, Jamsheed C M wrote: > But I have concerns about new is_init_captured_store code. EA is > mostly looking only on inputs to see Allocation. And in several places > it expecting only to see Allocation because other cases should be > filtered out before. 
I understand the concern here. If I am using the newer webrevs, I will ensure I don't filter out the inputs(basically check uncast i/p) for the stores I don't want to re-compute and find if it is initializing store, but this info is actually already available in Field/ can be made available. As I don't want EA taking more time analyzing stuffs due to my change, and in-turn have a perf impact. Best regards, Jamsheed From jamsheed.c.m at oracle.com Thu Jul 16 14:36:16 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 20:06:16 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> Message-ID: <83c48e9f-b247-bfd2-18b5-eea9ea5ae23a@oracle.com> Hi Vladimir, On 16/07/2020 00:29, Vladimir Kozlov wrote: > But I have concerns about new is_init_captured_store code. EA is > mostly looking only on inputs to see Allocation. And in several places > it expecting only to see Allocation because other cases should be > filtered out before. In all the cases we analyze inputs of addp(field), if it is a raw(uncasted) and if its input points to alloc projection we are sure they are init captured stores or intrinsic initialization. when i searched for present intrinsic code i see all its uses are casted address(before macro expansion). so only remaining case that is left out was init captured stores. case #3. so i used is_captured_store for finding all the raw stores that need to be analyzed(is a oop field store). all init captured store has base as top. 
and get_addp_base code i added will always detect them and direct address_offset to compute offset. Best regards, Jamsheed From coleen.phillimore at oracle.com Thu Jul 16 14:43:17 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 16 Jul 2020 10:43:17 -0400 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <55b4473d-8aa4-77e0-1145-2a94a0a5f62e@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> <1af60254-a239-c21f-68df-be9b65534e7f@oracle.com> <55b4473d-8aa4-77e0-1145-2a94a0a5f62e@oracle.com> Message-ID: <38336861-a8eb-fdb0-7860-9cbc8eb820b6@oracle.com> Thanks to David's description of the problem and the fix, this makes sense to me now. I don't like it and we should revisit async exceptions for all the other problems it causes, but this change looks safe and good. thanks, Coleen On 7/16/20 3:00 AM, Jamsheed C M wrote: > Hi all, > > could i get another review? > > Best regards, > > Jamsheed > > On 16/07/2020 06:37, David Holmes wrote: >> Hi Jamsheed, >> >> tl;dr version: fix looks good. Thanks for working through things with >> me on this one. >> >> Long version ... for the sake of other reviewers (and myself) I'm >> going to walk through the problem scenario and how the fix addresses >> it, because the bug report is long and confusing and touches on a >> number of different issues with async exception handling. >> >> We are dealing with the code generated for Java method entry, and in >> particular for a synchronized Java method. We do a lot of things in >> the entry code before we actually lock the monitor and jump to the >> Java method. Some of those things include method profiling and the >> counter overflow check for the JIT. 
If an exception is thrown at this >> point, the logic to remove the activation would unlock the monitor - >> which we haven't actually locked yet! So we have the >> do_not_unlock_if_synchronized flag which is stored in the current >> JavaThread. We set that flag true so that if any exceptions result in >> activation removal, the removal logic won't try to unlock the >> monitor. Once we're ready to lock the monitor we set the flag back to >> false (note there is an implicit assumption here that monitor locking >> can never raise an exception). >> >> The problem arises with async exceptions, or more specifically the >> async exception that is raised due to an "unsafe access error". This >> is where a memory-mapped ByteBuffer causes an access violation (SEGV) >> due to a bad pointer. The signal handler simply sets a flag to >> indicate we encountered an "unsafe access error", adjusts the BCI to >> the next instruction and allows execution to proceed at the next >> instruction. It is then expected that the runtime will "soon" notice >> this pending unsafe access error and create and throw the >> InternalError instance that indicates the ByteBuffer operation >> failed. This requires executing Java code. >> >> One of the places that checks for that pending unsafe access error is >> in the destructor of the JRT_ENTRY wrapper that is used for the >> method profiling and counter overflow checking. This occurs whilst >> the do_not_unlock_if_synchronized flag is true, so the resulting >> InternalError won't result in an attempt to unlock the not-locked >> monitor. >> >> The problem is that creating the InternalError executes Java code - >> it calls constructors, which call methods etc. And some of those >> methods are synchronized. So the method entry logic for such a call >> will set do_not_unlock_if_synchronized to true, perform all the >> preamble related to the call, then set do_not_unlock_if_synchronized >> to false, lock the monitor and make the call. 
When construction >> completes the InternalError is thrown and we remove the activation >> for the method we had originally started to call. But now the >> do_not_unlock_if_synchronized flag has been reset to false by the >> nested Java method call, so we do in fact try to unlock a monitor >> that was never locked, and things break. >> >> This nesting problem is well known and we have a mechanism for >> dealing with - the UnlockFlagSaver. The actual logic executed for >> profiling methods and doing the counter overflow check contains the >> requisite UnlockFlagSaver to avoid the problem just outlined. >> Unfortunately the async exception is processed in the JRT_ENTRY >> wrapper, which is outside the scope of those UnlockFlagSaver helpers >> and so they don't help in this case. >> >> So the fix is to "simply" move the UnlockFlagSaver deeper into the >> call stack to the code that actually does the async exception >> processing: >> >> void JavaThread::check_and_handle_async_exceptions(bool >> check_unsafe_error) { >> + // May be we are at method entry and requires to save do not >> unlock flag. >> + UnlockFlagSaver fs(this); >> >> so now after the InternalError has been created and thrown we will >> restore the original value of the do_not_unlock_if_synchronized flag >> (false) and so the InternalError will not cause activation removal to >> attempt to unlock the not-locked monitor. >> >> The scope of the UnlockFlagSaver could be narrowed to the actual >> logic for processing the unsafe access error, but it seems fine at >> method scope. >> >> A second fix is that the overflow counter check had an assertion that >> it was not executed with any pending exceptions. But that turned out >> to be false for reasons I can't fully explain, but it again appears >> to relate to a pending async exception being installed prior to the >> method call - and seems related to the two referenced JVM TI >> functions.
The simple solution here is to delete the assertion and to >> check for pending exceptions on entry to the code and just return >> immediately. The JRT_ENTRY destructor will see the pending exception >> and propagate it. >> >> Cheers, >> David >> >> On 16/07/2020 9:50 am, David Holmes wrote: >>> Hi Jamsheed, >>> >>> On 16/07/2020 8:16 am, Jamsheed C M wrote: >>>> (Thank you Dean, adding serviceability team as this issue involves >>>> JVMTI features PopFrame, EarlyReturn features) >>> >>> It is not at all obvious how your proposed fix impacts the JVM TI >>> features. >>> >>>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381 >>>> >>>> (testing: mach5, tier1-5 links in JBS) >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>> On 15/07/2020 21:25, Jamsheed C M wrote: >>>>> >>>>> Hi, >>>>> >>>>> Async handling at method entry requires it to be aware of >>>>> synchronization(like whether it is doing async handling before >>>>> lock acquire or after) >>>>> >>>>> This is required as exception handler rely on this info for >>>>> unlocking. Async handling code never had this special condition >>>>> handled and it worked most of the time as we were using biased >>>>> locking which got disabled by [1] >>>>> >>>>> There was one other issue reported in similar time[2]. This issue >>>>> got triggered in test case by [3], back to back extra safepoint >>>>> after suspend and TLH for ThreadDeath. So in this setup both >>>>> PopFrame request and Thread.Stop request happened together for the >>>>> test scenario and it reached java method entry with >>>>> pending_exception set. >>>>> >>>>> I have done a partial fix for the issue, mainly to handle >>>>> production mode crash failures(do not unlock flag related ones) >>>>> >>>>> Fix detail: >>>>> >>>>> 1) I save restore the "do not unlock" flag in async handling. >>> >>> Sorry but you completely changed the fix compared to what we >>> discussed and what I pre-reviewed!
What happened to changing from >>> JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of time >>> and effort to determine that this save/restore of the "do not unlock >>> flag" is actually correct and valid! >>> >>>>> >>>>> 2) Return for floating pending exception for some cases(PopFrame, >>>>> Early return related). This is debug(JVMTI) feature and floating >>>>> exception can get cleaned just like that in present compiler >>>>> request and deopt code. >>> >>> What part of the change addresses this? >>> >>> Thanks, >>> David >>> ----- >>> >>>>> >>>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ >>>>> >>>>> There are more problems in these code areas, like we clear all >>>>> exceptions in compilation request path(interpreter,c1), as well as >>>>> deoptimization path. >>>>> >>>>> All these un-handled cases will be separately handled by >>>>> https://bugs.openjdk.java.net/browse/JDK-8249451 >>>>> >>>>> Request for review. >>>>> >>>>> Best regards, >>>>> >>>>> Jamsheed >>>>> >>>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 >>>>> >>>>> >>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727 >>>>> >>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207 >>>>> From jamsheed.c.m at oracle.com Thu Jul 16 14:49:48 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 20:19:48 +0530 Subject: [15] RFR: 8246381: VM crashes with "Current BasicObjectLock* below than low_mark" In-Reply-To: <38336861-a8eb-fdb0-7860-9cbc8eb820b6@oracle.com> References: <7a802330-e836-1ff3-af0a-ede587e049ff@oracle.com> <30bd811e-c890-5bb1-8c78-4cf944fd5a42@oracle.com> <5d43f963-b931-3b69-4b5c-188c45b57de8@oracle.com> <1af60254-a239-c21f-68df-be9b65534e7f@oracle.com> <55b4473d-8aa4-77e0-1145-2a94a0a5f62e@oracle.com> <38336861-a8eb-fdb0-7860-9cbc8eb820b6@oracle.com> Message-ID: <24043cec-b3f2-bfa0-fd66-f2fcedc4be27@oracle.com> Hi Coleen, Thank you for the review. 
Best regards, Jamsheed On 16/07/2020 20:13, coleen.phillimore at oracle.com wrote: > > Thanks to David's description of the problem and the fix, this makes > sense to me now. > I don't like it and we should revisit async exceptions for all the > other problems it causes, but this change looks safe and good. > > thanks, > Coleen > > On 7/16/20 3:00 AM, Jamsheed C M wrote: >> Hi all, >> >> could i get another review? >> >> Best regards, >> >> Jamsheed >> >> On 16/07/2020 06:37, David Holmes wrote: >>> Hi Jamsheed, >>> >>> tl;dr version: fix looks good. Thanks for working through things >>> with me on this one. >>> >>> Long version ... for the sake of other reviewers (and myself) I'm >>> going to walk through the problem scenario and how the fix addresses >>> it, because the bug report is long and confusing and touches on a >>> number of different issues with async exception handling. >>> >>> We are dealing with the code generated for Java method entry, and in >>> particular for a synchronized Java method. We do a lot of things in >>> the entry code before we actually lock the monitor and jump to the >>> Java method. Some of those things include method profiling and the >>> counter overflow check for the JIT. If an exception is thrown at >>> this point, the logic to remove the activation would unlock the >>> monitor - which we haven't actually locked yet! So we have the >>> do_not_unlock_if_synchronized flag which is stored in the current >>> JavaThread. We set that flag true so that if any exceptions result >>> in activation removal, the removal logic won't try to unlock the >>> monitor. Once we're ready to lock the monitor we set the flag back >>> to false (note there is an implicit assumption here that monitor >>> locking can never raise an exception). >>> >>> The problem arises with async exceptions, or more specifically the >>> async exception that is raised due to an "unsafe access error". 
This >>> is where a memory-mapped ByteBuffer causes an access violation >>> (SEGV) due to a bad pointer. The signal handler simply sets a flag >>> to indicate we encountered an "unsafe access error", adjusts the BCI >>> to the next instruction and allows execution to proceed at the next >>> instruction. It is then expected that the runtime will "soon" notice >>> this pending unsafe access error and create and throw the >>> InternalError instance that indicates the ByteBuffer operation >>> failed. This requires executing Java code. >>> >>> One of the places that checks for that pending unsafe access error >>> is in the destructor of the JRT_ENTRY wrapper that is used for the >>> method profiling and counter overflow checking. This occurs whilst >>> the do_not_unlock_if_synchronized flag is true, so the resulting >>> InternalError won't result in an attempt to unlock the not-locked >>> monitor. >>> >>> The problem is that creating the InternalError executes Java code - >>> it calls constructors, which call methods etc. And some of those >>> methods are synchronized. So the method entry logic for such a call >>> will set do_not_unlock_if_synchronized to true, perform all the >>> preamble related to the call, then set do_not_unlock_if_synchronized >>> to false, lock the monitor and make the call. When construction >>> completes the InternalError is thrown and we remove the activation >>> for the method we had originally started to call. But now the >>> do_not_unlock_if_synchronized flag has been reset to false by the >>> nested Java method call, so we do in fact try to unlock a monitor >>> that was never locked, and things break. >>> >>> This nesting problem is well known and we have a mechanism for >>> dealing with - the UnlockFlagSaver. The actual logic executed for >>> profiling methods and doing the counter overflow check contains the >>> requisite UnlockFlagSaver to avoid the problem just outlined. 
>>> Unfortunately the async exception is processed in the JRT_ENTRY >>> wrapper, which is outside the scope of those UnlockFlagSaver helpers >>> and so they don't help in this case. >>> >>> So the fix is to "simply" move the UnlockFlagSaver deeper into the >>> call stack to the code that actually does the async exception >>> processing: >>> >>> void JavaThread::check_and_handle_async_exceptions(bool >>> check_unsafe_error) { >>> + // May be we are at method entry and requires to save do not >>> unlock flag. >>> + UnlockFlagSaver fs(this); >>> >>> so now after the InternalError has been created and thrown we will >>> restore the original value of the do_not_unlock_if_synchronized flag >>> (false) and so the InternalError will not cause activation removal >>> to attempt to unlock the not-locked monitor. >>> >>> The scope of the UnlockFlagSaver could be narrowed to the actual >>> logic for processing the unsafe access error, but it seems fine at >>> method scope. >>> >>> A second fix is that the overflow counter check had an assertion >>> that it was not executed with any pending exceptions. But that >>> turned out to be false for reasons I can't fully explain, but it >>> again appears to relate to a pending async exception being installed >>> prior to the method call - and seems related to the two referenced >>> JVM TI functions. The simple solution here is to delete the >>> assertion and to check for pending exceptions on entry to the code >>> and just return immediately. The JRT_ENTRY destructor will see the >>> pending exception and propagate it. >>> >>> Cheers, >>> David >>> >>> On 16/07/2020 9:50 am, David Holmes wrote: >>>> Hi Jamsheed, >>>> >>>> On 16/07/2020 8:16 am, Jamsheed C M wrote: >>>>> (Thank you Dean, adding serviceability team as this issue involves >>>>> JVMTI features PopFrame, EarlyReturn features) >>>> >>>> It is not at all obvious how your proposed fix impacts the JVM TI >>>> features.
>>>> >>>>> JBS entry: https://bugs.openjdk.java.net/browse/JDK-8246381 >>>>> >>>>> (testing: mach5, tier1-5 links in JBS) >>>>> >>>>> Best regards, >>>>> >>>>> Jamsheed >>>>> >>>>> On 15/07/2020 21:25, Jamsheed C M wrote: >>>>>> >>>>>> Hi, >>>>>> >>>>>> Async handling at method entry requires it to be aware of >>>>>> synchronization(like whether it is doing async handling before >>>>>> lock acquire or after) >>>>>> >>>>>> This is required as exception handler rely on this info for >>>>>> unlocking. Async handling code never had this special condition >>>>>> handled and it worked most of the time as we were using biased >>>>>> locking which got disabled by [1] >>>>>> >>>>>> There was one other issue reported in similar time[2]. This issue >>>>>> got triggered in test case by [3], back to back extra safepoint >>>>>> after suspend and TLH for ThreadDeath. So in this setup both >>>>>> PopFrame request and Thread.Stop request happened together for >>>>>> the test scenario and it reached java method entry with >>>>>> pending_exception set. >>>>>> >>>>>> I have done a partial fix for the issue, mainly to handle >>>>>> production mode crash failures(do not unlock flag related ones) >>>>>> >>>>>> Fix detail: >>>>>> >>>>>> 1) I save restore the "do not unlock" flag in async handling. >>>> >>>> Sorry but you completely changed the fix compared to what we >>>> discussed and what I pre-reviewed! What happened to changing from >>>> JRT_ENTRY to JRT_ENTRY_NOASYNC? It is going to take me a lot of >>>> time and effort to determine that this save/restore of the "do not >>>> unlock flag" is actually correct and valid! >>>> >>>>>> >>>>>> 2) Return for floating pending exception for some cases(PopFrame, >>>>>> Early return related). This is debug(JVMTI) feature and floating >>>>>> exception can get cleaned just like that in present compiler >>>>>> request and deopt code. >>>> >>>> What part of the change addresses this?
>>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>>> >>>>>> webrev :http://cr.openjdk.java.net/~jcm/8246381/webrev.02/ >>>>>> >>>>>> There are more problems in these code areas, like we clear all >>>>>> exceptions in compilation request path(interpreter,c1), as well >>>>>> as deoptimization path. >>>>>> >>>>>> All these un-handled cases will be separately handled by >>>>>> https://bugs.openjdk.java.net/browse/JDK-8249451 >>>>>> >>>>>> Request for review. >>>>>> >>>>>> Best regards, >>>>>> >>>>>> Jamsheed >>>>>> >>>>>> [1]https://bugs.openjdk.java.net/browse/JDK-8231264 >>>>>> >>>>>> >>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8246727 >>>>>> >>>>>> [3] https://bugs.openjdk.java.net/browse/JDK-8221207 >>>>>> > From jatin.bhateja at intel.com Thu Jul 16 14:52:13 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Thu, 16 Jul 2020 14:52:13 +0000 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 Message-ID: Hi Vladimir, Andrew, Thanks for your comments. I have placed updated patch at following location. http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ Summary of changes: 1) Optimization is specifically targeted to exploit vector rotation instruction added for X86 AVX512. A single rotate instruction encapsulates entire vector OR/SHIFTs pattern thus offers better latency at reduced instruction count. 2) There were two approaches to implement this: a) Let everything remain the same and add new wide complex instruction patterns in the matcher for e.g. set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI shift)) (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate shift)) It would have been an overoptimistic assumption to expect that graph shape would be preserved till the matcher for correct inferencing. In addition we would have required multiple such bulky patterns. 
b) Create new RotateLeft/RotateRight scalar nodes. These get generated during intrinsification as well as by additional pattern matching during node idealization. Later, SLP consumes these nodes in valid vectorization scenarios to emit their vector counterparts, which eventually emit vector rotates. 3) I chose approach 2b) since it is cleaner. The only problem was that in non-EVEX mode (UseAVX < 3) the new scalar Rotate nodes must either be dismantled back into the OR/SHIFT pattern, or we penalize vectorization, which would be very costly. The other option would have been to add an additional vector rotate pattern for UseAVX=3 in the matcher which emits vector OR/SHIFT instructions, but then it would lose the efficient instruction sequence that node sharing (OrV/LShiftV/URShift) offers in the current implementation. It would therefore not benefit non-AVX512 targets: the only saving would be the cleanup of a few existing scalar rotate matcher patterns, and older targets do not offer this powerful rotate instruction. Therefore the new scalar nodes are created only for AVX512 targets. As suggested, constant folding scenarios are now covered during idealization of the newly added scalar nodes. Please review the latest version and share your feedback and test results. Best Regards, Jatin > -----Original Message----- > From: Andrew Haley > Sent: Saturday, July 11, 2020 2:24 PM > To: Vladimir Ivanov ; Bhateja, Jatin > ; hotspot-compiler-dev at openjdk.java.net > Cc: Viswanathan, Sandhya > Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification for X86 > > On 10/07/2020 18:32, Vladimir Ivanov wrote: > > > High-level comment: so far, there were no pressing need in > explicitly > marking the methods as intrinsics. ROR/ROL instructions > were selected > during matching [1]. Now the patch introduces > dedicated nodes > (RotateLeft/RotateRight) specifically for intrinsics > which partly > duplicates existing logic.
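For reference, the OR/SHIFT shape that this thread talks about matching or replacing with a dedicated rotate node can be sketched in scalar form as below. This is an illustrative sketch only: `rotl_or_shift` and `rotr_or_shift` are hypothetical helper names, not C2 nodes or JDK methods, and the explicit zero-distance case is an assumption needed in C++ because shifting a 32-bit value by 32 is undefined there.

```cpp
#include <cstdint>

// Rotate left by s bits, written as the OR of a left shift and a
// complementary right shift. The distance is reduced modulo 32 first.
constexpr std::uint32_t rotl_or_shift(std::uint32_t x, unsigned s) {
  return (s & 31u) == 0u
             ? x
             : (x << (s & 31u)) | (x >> (32u - (s & 31u)));
}

// Rotate right by s bits, the mirror image of the pattern above.
constexpr std::uint32_t rotr_or_shift(std::uint32_t x, unsigned s) {
  return (s & 31u) == 0u
             ? x
             : (x >> (s & 31u)) | (x << (32u - (s & 31u)));
}

// Compile-time checks: the bit shifted out on the left re-enters on the
// right, and a rotate right undoes a rotate left by the same distance.
static_assert(rotl_or_shift(0x80000001u, 1) == 0x00000003u, "wraps top bit");
static_assert(rotr_or_shift(rotl_or_shift(0x12345678u, 7), 7) == 0x12345678u,
              "rotr undoes rotl");
```

A single rotate instruction computes the same result as the two shifts plus the OR, which is why recognising this shape early, or representing it as one node, matters for the instruction selection being discussed.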
> > The lack of rotate nodes in the IR has always meant that AArch64 doesn't > generate optimal code for e.g. > > (Set dst (XorL reg1 (RotateLeftL reg2 imm))) > > because, with the RotateLeft expanded to its full combination of ORs and > shifts, it's too complicated to match. At the time I put this to one side > because it wasn't urgent. This is a shame because although such > combinations are unusual they are used in some crypto operations. > > If we can generate immediate-form rotate nodes early by pattern matching > during parsing (rather than depending on intrinsics) we'll get more value > than by depending on programmers calling intrinsics. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jamsheed.c.m at oracle.com Thu Jul 16 16:19:55 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Thu, 16 Jul 2020 21:49:55 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> Message-ID: Hi Vladimir, I ran a performance run for http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ (links in JBS) I don't see any issues, so I would like to go with webrev_fix_EA if it fixes all the reported issues. Best regards, Jamsheed On 16/07/2020 07:25, Jamsheed C M wrote: > Hi Vladimir, > > On 16/07/2020 00:29, Vladimir Kozlov wrote: >> As I said before I agree with your additional checks for StoreN and >> StoreNKlass. >> >> But I have concerns about new is_init_captured_store code.
EA is >> mostly looking only on inputs to see Allocation. And in several >> places it expecting only to see Allocation because other cases should >> be filtered out before. > If that is the case, I would like to go with my first webrev for this > fix as it nicely propagate es and there is no unnecessary promotion to > global escape state. > > http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ > > Best regards, > > Jamsheed > >> >> Thanks, >> Vladimir >> >> On 7/15/20 10:54 AM, Jamsheed C M wrote: >>> Hi Vladimir, >>> >>> with unrolling i understand that many cases will just have phis >>> everywhere to outside the loop as the uses are outside the loop. >>> >>> and this is not restricted to escaping objects alone as i depicted. >>> it can be escaping as well as non-escaping. >>> >>> so marking store to them as global escape doesn't seems to be nice >>> idea. i will rework on this fix and get back again. >>> >>> Thank you >>> >>> Best regards >>> >>> Jamsheed >>> >>> On 15/07/2020 08:38, Jamsheed C M wrote: >>>> (unfinished mail got sent, so completing it) >>>> On 15/07/2020 08:21, Jamsheed C M wrote: >>>>> Hi Vladimir, >>>>> >>>>> On 15/07/2020 06:50, Vladimir Kozlov wrote: >>>>>> I looked more on this. EA already does not scalarize allocations >>>>>> when Phi nodes merged them - it should handle this case. I did >>>>>> small experiment and relaxed assert for this new (10. needs >>>>>> comment update) case for AddP's base and test passed: >>>>>> >>>>>> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700 >>>>>> @@ -2357,6 +2357,7 @@ >>>>>> int opcode = uncast_base->Opcode(); >>>>>> assert(opcode == Op_ConP || opcode == Op_ThreadLocal || >>>>>> opcode == Op_CastX2P || >>>>>> uncast_base->is_DecodeNarrowPtr() || >>>>>> + (uncast_base->is_Phi() && >>>>>> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>>>>>
(uncast_base->is_Mem() && >>>>>> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>>>>> (uncast_base->is_Proj() && >>>>>> uncast_base->in(0)->is_Allocate()), "sanity"); >>>>>> } >>>>>> >>>>>> Did you hit a case when this may not work? >>>>> >>>>> Yes, right it already doesn't mark it as scalarizable if base >>>>> count is more than one(I think it missed a is_oop check there)[1]. >>>>> >>>>> EA CG adds edges only for oop field making stores to them >>>>> undetected. This makes these stored objects to NoEscape and if >>>>> compiled method continues execution with this NoEscape object can >>>>> have undesired results(i.e synchronization removed). >>>>> >>>>> Probable case would be(didn't verify) >>>>> >>>>> try { >>>>> >>>>> LOOP BEGIN >>>>> >>>>> try {throw new Obj()} catch {} >>>>> >>>>> LOOP END >>>>> >>>>> } catch (Obj e) { >>>>> >>>>> } >>>> >>>> Best Regards, >>>> >>>> Jamsheed >>>> >>>> [1]https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L1770 >>>> >>>> >>>> >>>>>> >>>>>> >>>>>> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation >>>>>> (-XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations): >>>>>> >>>>>> ======== Connection graph for Test::test >>>>>> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]] 95 Allocate === >>>>>> 242 76 230 8 1 ( 93 92 21 1 78 1 78 ) [[ 96 97 98 105 >>>>>> 106 107 ]] rawptr:NotNull ( int:>=0, java/lang/Object:NotNull >>>>>> *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: >>>>>> Test::test1 @ bci:0 Test::test @ bci:8 >>>>>> LocalVar [ 95P [ 158b ]] 107 Proj === 95 [[ 108 158 ]] >>>>>> #5 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>>>>> >>>>>> Scalar 95 Allocate === 242 76 230 8 1 ( 93 92 21 1 >>>>>> 78 1 78 ) [[ 96 97 98 105 106
107 ]] rawptr:NotNull ( >>>>>> int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ >>>>>> bci:0 Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ >>>>>> bci:8 >>>>>> ++++ Eliminated: 95 Allocate >>>>>> >>>>>> >>>>>> Thanks, >>>>>> Vladimir K >>>>>> >>>>>> On 7/14/20 1:28 AM, Jamsheed C M wrote: >>>>>>> Hi all, >>>>>>> >>>>>>> I had incorrectly added an extra check in the assert after the offset >>>>>>> computation in address_offset, for addps with non-constant >>>>>>> offsets (like [1]). >>>>>>> >>>>>>> Not changing the old assert even though I am not expecting the first >>>>>>> addp/second addp (for array addressing) case for init captured >>>>>>> store. >>>>>>> >>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/ >>>>>>> >>>>>>> >>>>>>> Best regards, >>>>>>> >>>>>>> Jamsheed >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>> assert(offs != Type::OffsetBot || >>>>>>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(), >>>>>>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || >>>>>>> is_captured_store(adr), >>>>>>> "offset must be a constant or it is initialization >>>>>>> of array"); >>>>>>> >>>>>>> On 13/07/2020 11:14, Jamsheed C M wrote: >>>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I reworked the fix. I compute the offset for all init captured >>>>>>>> stores, but treat these special init captured stores similar to >>>>>>>> unsafe (as these objects are usually GlobalEscape and it doesn't >>>>>>>> have any perf implications). >>>>>>>> >>>>>>>> revised webrev: >>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ >>>>>>>> >>>>>>>> testing: mach1-5 (logs in jbs) >>>>>>>> >>>>>>>> Best regards, >>>>>>>> >>>>>>>> Jamsheed >>>>>>>> >>>>>>>> On 09/07/2020 19:36, Jamsheed C M wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Request to hold the review. I need to change the code for >>>>>>>>> dealing with unsafe access, as the current capture code spends >>>>>>>>> more execution time analyzing things. 
>>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> >>>>>>>>> Jamsheed >>>>>>>>> >>>>>>>>> On 09/07/2020 13:01, Jamsheed C M wrote: >>>>>>>>>> >>>>>>>>>> Hi all, >>>>>>>>>> >>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8242895 >>>>>>>>>> >>>>>>>>>> Request for review of changes made to offset computation and >>>>>>>>>> field write detection for init captured stores due to phis >>>>>>>>>> added between alloc and init. This happens if the init node is in a >>>>>>>>>> different outer loop wrt the alloc node and there is a loop >>>>>>>>>> opt. This was required as a result of enhancement [1]. >>>>>>>>>> >>>>>>>>>> Normally inits are not associated with multiple alloc nodes >>>>>>>>>> during the EA phase, but changes done for [1] caused the code >>>>>>>>>> shapes of the form [2] to generate inits associated with >>>>>>>>>> multiple alloc nodes. >>>>>>>>>> >>>>>>>>>> This had implications for offset computation and field write >>>>>>>>>> detection related to initializing stores. >>>>>>>>>> >>>>>>>>>> Attempt to fix in EA: >>>>>>>>>> >>>>>>>>>> webrev: >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>>>>>>>>> >>>>>>>>>> Alternate fix: >>>>>>>>>> >>>>>>>>>> Minimize the scenario in compiler generated code by >>>>>>>>>> throwing only j.l.Error from slowpath (all exceptions >>>>>>>>>> async/sync are handled in runtime exit). >>>>>>>>>> >>>>>>>>>> Stub epilog doesn't poll or throw any exceptions. >>>>>>>>>> Disable full loop opt before EA for detectable patterns and >>>>>>>>>> bail out of EA for late detected patterns. >>>>>>>>>> >>>>>>>>>> webrev: >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>>>>>>>> >>>>>>>>>> Please advise. 
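For readers who want to try the code shape [2] mentioned above, here is a minimal standalone sketch. It is not the reproducer from the bug report: the class names AllocInLoopShape and IntA, the fall-through "return null" and the iteration count in main are assumptions made only so that the snippet compiles and runs. Whether C2 actually separates the allocation from its initializing stores for this program depends on the JDK build and flags.

```java
// Sketch of the "allocation inside try inside a small loop" code shape.
// All names here are hypothetical; only the loop/try/return structure
// follows the listing quoted in this thread.
final class AllocInLoopShape {

    // Returns a freshly allocated object from inside a try block that sits
    // in a two-iteration loop, the shape described as [2] in the mail.
    static IntA test(int n) {
        for (int i = 0; i < 2; i++) {
            try {
                return new IntA(n + i);
            } catch (Exception e) {
                // swallowed on purpose, the loop would retry with the next i
            }
        }
        return null; // assumed fall-through, the quoted listing is cut off before this point
    }

    public static void main(String[] args) {
        IntA last = null;
        // Call the method often enough that the JIT is likely to compile it
        // (the count is an assumption, not a documented threshold).
        for (int i = 0; i < 20_000; i++) {
            last = test(i);
        }
        System.out.println(last.value);
    }
}

// Hypothetical stand-in for the IntA class used in the report.
final class IntA {
    final int value;

    IntA(int value) {
        this.value = value;
    }
}
```

Running it with -XX:+PrintEscapeAnalysis -XX:+PrintEliminateAllocations, the flags used elsewhere in this thread, requires a debug build of the VM.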
>>>>>>>>>> >>>>>>>>>> Testing : mach tier1-5 (logs in jbs) >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> >>>>>>>>>> Jamsheed >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> [1] JDK-8231291 >>>>>>>>>> C2: loop >>>>>>>>>> opts before EA should maximally unroll loops >>>>>>>>>> >>>>>>>>>> [2] that have its init node in different outer loop wrt to >>>>>>>>>> alloc node. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> loop begin >>>>>>>>>> >>>>>>>>>> try{ >>>>>>>>>> >>>>>>>>>> return new obj()/ throw new obj()/ uncommon trap after >>>>>>>>>> allocation, in a loop >>>>>>>>>> >>>>>>>>>> } catch(ex) { >>>>>>>>>> >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> loop end >>>>>>>>>> >>>>>>>>>> public static IntA test(int n) { >>>>>>>>>> for (int i=0; i<2; i++) { >>>>>>>>>> try { >>>>>>>>>> return new IntA(n + i); >>>>>>>>>> } catch (Exception e) { >>>>>>>>>> } >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> From goetz.lindenmaier at sap.com Thu Jul 16 16:30:23 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Thu, 16 Jul 2020 16:30:23 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Richard, I'll answer to the obvious things in this mail now. I'll go through the code thoroughly again and write a review of my findings thereafter. > So here is the new webrev.6 > > Webrev.6: > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/ > Delta: > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.inc/ Thanks for the incremental webrev, it's helpful! > I spent most of the time running a microbenchmark [1] I wrote to answer > questions from your > review. 
At first I had trouble with variance in the results until I found out it > was due to the NUMA > architecture of the server I used. After that I noticed that there was a > performance regression of > about 5% even at low agent activity. I finally found out that it was due to the > implementation of > JavaThread::wait_for_object_deoptimization() which is called by the target > of the JVMTI operation to > self suspend for object deoptimization. I fixed this by adding limited spinning > before calling > wait() on the monitor. > > The delta includes many changes in comments, renaming of names, etc. So > I'd like to summarize > functional changes: > > * Collected all the code for the testing feature DeoptimizeObjectsALot in > compileBroker.cpp and reworked it. Thanks, this makes it much more compact. > With DeoptimizeObjectsALot enabled internal threads are started that > deoptimize frames and > objects. The number of threads started is given with > DeoptimizeObjectsALotThreadCountAll and > DeoptimizeObjectsALotThreadCountSingle. The former targets all existing > threads whereas the > latter operates on a single thread selected round robin. > > I removed the mode where deoptimizations were performed at every nth > exit from the runtime. I never used it. Do I get it right? You have an n:1 and an n:all test scenario. n:1: n threads deoptimize 1 Java thread where n = DOALThreadCountSingle n:all: n threads deoptimize all Java threads where n = DOALThreadCountAll? > * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and > execute it always independently > of is_thread_fully_suspended(). Is this also a performance optimization? > * Bugfix in EscapeBarrier::thread_added(): must not clear deopt flag. Found > this testing with DeoptimizeObjectsALot. Ok. > * Added EscapeBarrier::thread_removed(). Ok. > * EscapeBarrier constructors: barriers can now be entirely disabled by > disabling DoEscapeAnalysis. > This effectively disables the enhancement. Good! 
> * JavaThread::wait_for_object_deoptimization(): > - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the > safepoint check! This > caused issues with not walkable stacks with DeoptimizeObjectsALot. OK. As I understand, there was one safepoint check in the old version, now there is one in each iteration. I assume this is intended, right? > - Added limited spinning inspired by HandshakeSpinYield to fix regression in > microbenchmark [1] Ok. Nice improvement, nice catch! > > I refer to some more changes answering your questions and comments inline > below. > > Thanks, > Richard. > > [1] Microbenchmark: > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/ > > > I understand you annotate at safepoints where the escape analysis > > finds out that an object is "better" than global escape. > > These are the cases where the analysis identifies optimization > > opportunities. These annotations are then used to deoptimize > > frames and the objects referenced by them. > > Doesn't this overestimate the optimized > > objects? E.g., eliminate_alloc_node has many cases where it bails > > out. > > Yes, the implementation is conservative, but it is comparatively simple and > the additional debug > info is just 2 flags per safepoint. Thanks. It also helped that you explained to me offline that there are more optimizations than only lock elimination and scalar replacement done based on the ea information. The ea refines the IR graph, which allows follow-up optimizations that cannot easily be tracked back to the escaping objects or the call sites where they do not escape. Thus, if there are non-global escaping objects, you have to deoptimize the frame. Did I repeat that correctly? With this understanding, a number of my proposed renamings/comments are obsolete. 
> On the other hand, those JVMTI operations > that really trigger > deoptimizations are expected to be comparatively infrequent such that > switching to the interpreter > for a few microseconds will hardly have an effect. That sounds reasonable. > I've done microbenchmarking to check this. > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/ > > I found that in the worst case performance can be impacted by 10%. If the > agent is extremely active > and does relevant JVMTI calls like GetOwnedMonitorStackDepthInfo() every > millisecond or more often, > then the performance impact can be 30%. But I would think that this is not > realistic. These calls > are issued in interactive sessions to analyze deadlocks. Ok. > We could get more precise deoptimizations by adding a third flag per > safepoint for ea-local objects > among the owned monitors. This would help improve the worst case in the > benchmark. But I'm not > convinced it is worth it. > > Refer to the README.txt of the microbenchmark for a more detailed > discussion. > > pcDesc.hpp > > > > I would like to see some documentation of the methods. > Done. I didn't take your text, though, because I only noticed it after writing > my own. Let me know if you are not ok with it. That's fine. My texts were only proposals, you as author know better what goes on anyways. > > scopeDesc.cpp > > > > Besides refactoring copy escape info from pcDesc to scopeDesc > > and add accessors. Trivial. > > > > In scopeDesc.hpp you talk about NoEscape and ArgEscape. > > These are opto terms, but scopeDesc is a shared data structure > > that does not depend on a specific compiler. > > Please explain what is going on without using these terms. > > Actually these are not too opto specific terms. They are used in the paper > referenced in > escape.hpp. Also you can easily google them. I'd rather keep the comments > as they are. 
Hmm, I'm not really happy with this, as the papers, too, are for the compiler community, and probably not familiar to others that work with HotSpot. But stay with your terms if you think it makes it clearer. Anyways, now that I understand why you use conservative information (see above), the descriptions I had in mind are not precise. > > callnode.hpp > > > > You add functionality to annotate callnodes with escape information. > > This is carried through code generation to final output where it is > > added to the compiled method's meta information. > > > > At Safepoints in general jvmti can access > > - Objects that were scalar replaced. They must be reallocated. > > (Flag EliminateAllocations) > > - Objects that should be locked but are not because they never > > escape the thread. They need to be relocked. > > > > At calls, Objects where locks have been removed escape to callees. > > We must persist this information so that if jvmti accesses the > > object in a callee, we can determine by looking at the caller that > > it needs to be relocked. > > Note that the ea-optimization need not be at the current location, it can also > follow when control > returns to the caller. Lock elimination isn't the only relevant optimization. Yes, I understand now, see above. Thanks for explaining. > Accesses to instance > members or array elements can be optimized as well. You mean the compiler can/will ignore volatile or memory ordering requirements for non-escaping objects? Sounds reasonable to do. > > // Returns true if at least one of the arguments to the call is an oop > > // that does not escape globally. > > bool ConnectionGraph::has_arg_escape(CallJavaNode* call) { > > IMHO the method names are descriptive and don't need the comments. But I > give in :) (only replaced > "oop" with "object") Thanks. Yes, object is better than oop. > You are right, it is not correct how flags are checked. Especially if only > running with the JVMCI compiler. 
> > I changed Deoptimization::deoptimize_objects_internal() to make > reallocation and relocking dependent > on similar checks as in Deoptimization::fetch_unroll_info_helper(). > Furthermore EscapeBarriers are > conditionally activated depending on the following (see EscapeBarrier ctors): > > JVMCI_ONLY(UseJVMCICompiler) NOT_JVMCI(false) > COMPILER2_PRESENT(|| DoEscapeAnalysis) > > So the enhancement can be practically completely disabled by disabling > DoEscapeAnalysis, which is > what C2 currently does if JVMTI capabilities that allow access to local > references are taken. Thanks for fixing. > I went for the latter. > > > In fetch_unroll_info_helper, I don't understand why you need > > && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) { > > for eliminated locks, but not for scalar replaced objects? > > In short, reallocation is idempotent, relocking is not. > > Without the enhancement Deoptimization::realloc_objects() can already be > called more than once for a frame: > > First call in materializeVirtualObjects() (also iterateFrames()). > > Second (indirect) call in fetch_unroll_info_helper(). > > The objects from the first call are saved as jvmti deferred updates when > realloc_objects() > returns. Note that there is no relationship to jvmti. The thing in common is > that updates cannot be > directly installed into a compiled frame, it is necessary to deoptimize the > frame and defer the > updates until the compiled frame gets replaced. Every time the vframes > corresponding to the owner > frame are iterated, they get the deferred updates. So in > fetch_unroll_info_helper() the > GrowableArray<compiledVFrame*>* chunk reference them too. All > references to the objects created by > the second (indirect) call to realloc_objects() are never used, because > compiledVFrame accessors to > locals, expressions, and monitors override them with the deferred updates. > The objects become > unreachable and get gc'ed. 
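The statement that reallocation is idempotent while relocking is not can be illustrated with a small model. This is only a sketch in plain Java, not HotSpot code: the class DeoptOnceSketch and its members are hypothetical, the cached field stands in for the deferred updates that make repeated materialization return the same objects, the boolean stands in for the check done with EscapeBarrier::objs_are_deoptimized(), and a ReentrantLock plays the role of the monitor.

```java
import java.util.concurrent.locks.ReentrantLock;

// Toy model: materializing twice yields the same object, locking twice
// changes the state of the monitor unless a "done" flag guards it.
final class DeoptOnceSketch {
    private Object materialized;          // stands in for the deferred updates
    private boolean objsAreDeoptimized;   // stands in for the objs_are_deoptimized() check
    private final ReentrantLock monitor = new ReentrantLock();

    // Idempotent from the caller's point of view: always the same object.
    Object realloc() {
        if (materialized == null) {
            materialized = new Object();
        }
        return materialized;
    }

    // Not idempotent: every call adds one more hold on the monitor.
    void relockUnguarded() {
        monitor.lock();
    }

    // Guarded variant: relocks only the first time it is called.
    void relockGuarded() {
        if (!objsAreDeoptimized) {
            monitor.lock();
            objsAreDeoptimized = true;
        }
    }

    // Number of holds the current thread has on the monitor.
    int holdCount() {
        return monitor.getHoldCount();
    }

    public static void main(String[] args) {
        DeoptOnceSketch unguarded = new DeoptOnceSketch();
        System.out.println(unguarded.realloc() == unguarded.realloc()); // prints true
        unguarded.relockUnguarded();
        unguarded.relockUnguarded();
        System.out.println(unguarded.holdCount()); // prints 2

        DeoptOnceSketch guarded = new DeoptOnceSketch();
        guarded.relockGuarded();
        guarded.relockGuarded();
        System.out.println(guarded.holdCount()); // prints 1
    }
}
```

The model only shows why a one-shot guard is needed for the locking half. In the VM the second call to realloc_objects() does allocate again and the duplicates are simply dropped, as described above.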
OK, so repeatedly computed vFrames always have the first version of reallocated objects by construction, so it need not be handled here. But also due to construction, objects might be allocated just to be discarded. > materializeVirtualObjects() does not bother with relocking. > deoptimize_objects_internal(), which is > introduced by the enhancement, does relock objects; after all, the lock > elimination becomes illegal > with the change in escape state. Relocking twice does not work, so the > enhancement avoids it by > checking EscapeBarrier::objs_are_deoptimized(thread, deoptee.id()). > > Note that materializeVirtualObjects() can be called more than once and will > always return the very > same objects, even though it calls realloc_objects() again. Ok. > > I would guess it is because the eliminated locks can be applied to > > argEscape, but scalar replacement only to noescape objects? > > I.e. it might have been done before? > > > > But why isn't this the case for eliminate_allocations? > > deoptimize_objects_internal does both unconditionally, > > so both can happen to inner frames, right? > > Sorry, I don't quite understand. Hope the explanation above helps. Yes. I was guessing wrong :) > > I like it if boolean operators are at the beginning of broken lines, > > but I think hotspot convention is to have them at the end. > Ok, fixed. Thanks. > > > Code will get much simpler if BiasedLocking is removed. > > > > EscapeBarrier:: ... > > > > (This class maybe would qualify for a file of its own.) > > > > deoptimize_objects() > > I would mention escape analysis only as side remark. Also, as I understand, > > there is only one frame at given depth? > > // Deoptimize frames with optimized objects. This can be omitted locks and > > // objects not allocated but replaced by scalars. In C2, these optimizations > > // are based on escape analysis. > > // Up to depth, deoptimize frames with any optimized objects. 
> > // From depth to entry_frame, deoptimize only frames that > > // pass optimized objects to their callees. > > (First part similar for the comment above > EscapeBarrier::deoptimize_objects_internal().) > > I've reworked the comment. Let me know if you still think it needs to be > improved. Good now, thanks (maybe break the long line ...) > > What is the check (cur_depth <= depth) good for? Can you > > ever walk past entry_frame? > > Yes (assuming you mean the outer while-statement), there are java frames > beyond the entry frame if a > native method calls java methods again. So we visit all frames up to the given > depth and from there > we continue to the entry frame. It is not necessary to continue beyond that > entry frame, because > escape analysis assumes that arguments to native functions escape globally. > > Example: Let the java stack look like this:
>
> +---------+
> | Frame A |
> +---------+
> | Frame N |
> +---------+
> | Frame B |
> +---------+ <- top of stack
>
> Where java method A calls native method N and N calls java method B. > > Very simplified the native stack will look like this
>
> +-------------------------+
> | Frame of JIT Compiled A |
> +-------------------------+
> | Frame N                 |
> +-------------------------+
> | Entry Frame             |
> +-------------------------+
> | Frame B                 |
> +-------------------------+ <- top of stack
>
> The entry frame is an activation of the call stub, which is a small assembler > routine that > translates from the native calling convention to the java calling convention. > > There cannot be any ArgEscape that is passed to B (see above), therefore we > can stop the stackwalk > at the entry frame if depth is 1. If depth is 3 we have to continue to Frame A, > as it is directly > accessed. Ok, thanks, nice explanation!! > > Isn't vf->is_compiled_frame() prerequisite that "Move to next physical > frame" > > is needed? You could move it into the other check. > > If so, similar for deoptimize_objects_all_threads(). 
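The stack walk described above for frames A, N and B (visit everything up to the requested depth, then continue only until the next entry frame) can be modelled in a few lines. This is a sketch under assumptions, not the EscapeBarrier implementation: the class StackWalkSketch, the Frame and Kind types and the way depth is counted over Java-level frames are all made up for illustration, and the model only records which frames are looked at, not which of them would actually be deoptimized.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the walk: frames are listed from the top of the stack down.
final class StackWalkSketch {
    enum Kind { JAVA, NATIVE, ENTRY }

    static final class Frame {
        final String name;
        final Kind kind;

        Frame(String name, Kind kind) {
            this.name = name;
            this.kind = kind;
        }
    }

    // Returns the names of the frames the walk looks at for a given depth.
    static List<String> framesToVisit(List<Frame> topDown, int depth) {
        List<String> visited = new ArrayList<>();
        int curDepth = 0;
        for (Frame f : topDown) {
            if (f.kind == Kind.ENTRY) {
                if (curDepth >= depth) {
                    break;      // requested depth reached: the walk ends at the entry frame
                }
                continue;       // entry frames are not part of the java stack
            }
            curDepth++;
            visited.add(f.name);
        }
        return visited;
    }

    // The example from the mail: A calls native N, N calls B (top of stack).
    static List<Frame> exampleStack() {
        return Arrays.asList(
            new Frame("B", Kind.JAVA),
            new Frame("entry", Kind.ENTRY),
            new Frame("N", Kind.NATIVE),
            new Frame("A", Kind.JAVA));
    }

    public static void main(String[] args) {
        System.out.println(framesToVisit(exampleStack(), 1)); // prints [B]
        System.out.println(framesToVisit(exampleStack(), 3)); // prints [B, N, A]
    }
}
```

With depth 1 the walk stops at the entry frame below B, with depth 3 it has to go on to frame A, which matches the two cases given in the explanation.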
> > Only compiledVFrames require moving to the /top/ frame. Fixed. Thanks, this looks better. > > Synchronization: looks good. I think others had a look at this before. > > > > EscapeBarrier::deoptimize_objects_internal() > > The method name is misleading, it is not used by > > deoptimize_objects(). > > Also, a method with the same name is in Deoptimization. > > Proposal: deoptimize_objects_thread() ? > > Sorry, but I don't see why it would be misleading. > What would be the meaning of 'deoptimize_objects_thread'? I don't > understand that name. 1. I have no idea why it's called "_internal". Because it is private? By the name, I would expect that EscapeBarrier::deoptimize_objects() calls it for some internal tasks. But it does not. 2. My proposal: deoptimize_objects_all_threads() iterates all threads and calls deoptimize_objects(_one)_thread(thread) for each of these. That's how I would have named it. But no bike shedding, if you don't see what I mean it's not obvious. > > C1 stubs: this really shows you tested all configurations, great! > > > > > > mutexLocker: ok. > > objectMonitor.cpp: ok > > stackValue.hpp Is this missing clearing a bug? > > In short: that change is not needed anymore. I'll remove it again. Good. Thanks for the details. > > Renaming deferred_locals to deferred_updates is good, as well as > > adding a data structure for it. > > (Adding this data structure might be a breakout, too.) > > > > good. > > > > thread.cpp > > > > good. > > > > vframe.cpp > > > > Is this a bug in existing code? > > Makes sense. > > Depends on your definition of bug. There are no references to > vframe::is_entry_frame() in the > existing code. I would think it is a bug. So it is :) > > > > > vframe_hp.hpp > > (What does _hp stand for? helper? The file should be named > compiledVFrame ...) > > > > not_global_escape_in_scope() ... > > Again, you mention escape analysis here. Comments above hold, too. > > I think it is the right name, because it is meaningful and simple. 
Ok, accepted ... given my understanding from above. > > > You introduce JvmtiDeferredUpdates. Good. > > > > vframe_hp.cpp > > > > Changes for JvmtiDeferredUpdates, escape state accessors, > > > > line 422: > > Would an assertion assert(!info->owner_is_scalar_replaced(), ...) hold here? > > > > > > macros.hpp > > Good. > > > > > > Test coding > > ============ > > > > compileBroker.h|cpp > > > > You introduce a third class of threads handled here and > > add a new flag to distinguish it. Before, the two kinds > > of threads were distinguished implicitly by passing in > > a compiler for compiler threads. > > The new thread kind is only used for testing in debug. > > > > make_thread: > > You could assert (comp != NULL...) to assure previous > > conditions. > > I replaced the if-statements with a switch-statement, made sure all enum > elements are covered, and > added the assertion you suggested. > > > line 989 indentation broken > > You are referring to this block I assume: > (from > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/src/hotspot/share/compiler/compileBroker.cpp.frames.html) > > 976 if (MethodFlushing) { > 977 // Initialize the sweeper thread > 978 Handle thread_oop = create_thread_oop("Sweeper thread", CHECK); > 979 jobject thread_handle = JNIHandles::make_local(THREAD, thread_oop()); > 980 make_thread(sweeper_t, thread_handle, NULL, NULL, THREAD); > 981 } > 982 > 983 #if defined(ASSERT) && COMPILER2_OR_JVMCI > 984 if (DeoptimizeObjectsALot == 2) { > 985 // Initialize and start the object deoptimizer threads > 986 for (int thread_count = 0; thread_count < DeoptimizeObjectsALotThreadCount; thread_count++) { > 987 Handle thread_oop = create_thread_oop("Deoptimize objects a lot thread", CHECK); > 988 jobject thread_handle = JNIHandles::make_local(THREAD, thread_oop()); > 989 make_thread(deoptimizer_t, thread_handle, NULL, NULL, THREAD); > 990 } > 991 } > 992 #endif // defined(ASSERT) && COMPILER2_OR_JVMCI > > I cannot really see 
broken indentation here. Am I looking at the wrong > location? I don't have the source version I reviewed last time any more, so I can't check. But maybe an artefact from patching ... if there were tabs jcheck would have told you, so that's not it. No problem. Best regards, Goetz. From igor.ignatyev at oracle.com Thu Jul 16 17:05:17 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 16 Jul 2020 10:05:17 -0700 Subject: [15] RFR(T) : 8249622 : use 8249621 to ignore 8 jvmci tests Message-ID: http://cr.openjdk.java.net/~iignatyev//8249622/webrev.00/ > 2 lines changed: 0 ins; 0 del; 12 mod; Hi all, could you please review this trivial patch which updates @ignore tag in 8 jvmci to follow common practice and have a bug id? from JBS: > JDK-8220623 added @ignore to 8 jvmci tests but didn't provide any bug id, JDK-8249621 has been created to address the problem w/ the tests, this issue is to change @ignore to be followed by 8249621. JBS: https://bugs.openjdk.java.net/browse/JDK-8249622 webrev: http://cr.openjdk.java.net/~iignatyev//8249622/webrev.00/ Thanks, -- Igor 8220623: https://bugs.openjdk.java.net/browse/JDK-8220623 8249621: https://bugs.openjdk.java.net/browse/JDK-8249621 From ekaterina.pavlova at oracle.com Thu Jul 16 17:28:05 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Thu, 16 Jul 2020 10:28:05 -0700 Subject: [15] RFR(T) : 8249622 : use 8249621 to ignore 8 jvmci tests In-Reply-To: References: Message-ID: <5a2c1162-ed38-bbe6-4192-36539243800b@oracle.com> Looks good, -katya 
From vladimir.kozlov at oracle.com Thu Jul 16 19:56:19 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 16 Jul 2020 12:56:19 -0700 Subject: [15] RFR(T) : 8249622 : use 8249621 to ignore 8 jvmci tests In-Reply-To: References: Message-ID: <8f0c6d12-7783-5b4e-f79b-554d31337258@oracle.com> Good. Thanks, Vladimir K 
From igor.ignatyev at oracle.com Fri Jul 17 03:04:13 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Thu, 16 Jul 2020 20:04:13 -0700 Subject: [15] RFR(T) : 8249622 : use 8249621 to ignore 8 jvmci tests In-Reply-To: <8f0c6d12-7783-5b4e-f79b-554d31337258@oracle.com> References: <8f0c6d12-7783-5b4e-f79b-554d31337258@oracle.com> Message-ID: <1C0D0178-80EA-402A-B114-80230B0BB663@oracle.com> Katya, Vladimir, thank you for your review, pushed to jdk15. -- Igor > On Jul 16, 2020, at 12:56 PM, Vladimir Kozlov wrote: > > Good. > > Thanks, > Vladimir K > > On Jul 16, 2020, at 10:28 AM, Ekaterina Pavlova wrote: > > Looks good, > > -katya > On 7/16/20 10:05 AM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8249622/webrev.00/ >>> 2 lines changed: 0 ins; 0 del; 12 mod; >> Hi all, >> could you please review this trivial patch which updates @ignore tag in 8 jvmci to follow common practice and have a bug id? >> from JBS: >>> JDK-8220623 added @ignore to 8 jvmci tests but didn't provide any bug id, JDK-8249621 has been created to address the problem w/ the tests, this issue is to change @ignore to be followed by 8249621. 
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249622 >> webrev: http://cr.openjdk.java.net/~iignatyev//8249622/webrev.00/ >> Thanks, >> -- Igor >> 8220623: https://bugs.openjdk.java.net/browse/JDK-8220623 >> 8249621: https://bugs.openjdk.java.net/browse/JDK-8249621 From goetz.lindenmaier at sap.com Fri Jul 17 12:30:40 2020 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 17 Jul 2020 12:30:40 +0000 Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents In-Reply-To: References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com> Message-ID: Hi Richard, > I'll answer to the obvious things in this mail now. > I'll go through the code thoroughly again and write > a review of my findings thereafter. As promised a detailed walk-through, but without any major findings: c1_IR.hpp: ok ci_Env.h|cpp: ok compiledMethod.cpp, nmethod.cpp: ok debugInfoRec.h|cpp: ok scopeDesc.h|cpp ok compileBroker.h|cpp: Maybe a bit of documentation how and why you start the threads? I had expected there are two test scenarios run after each other, but now I understand 'Single' and 'All' run simultaneously. Well, this really is a stress test! Also good the two variants of deoptimization are stressed against each other. Besides that really nice it's all in one place. rootResolver.cpp: ok jvmciCodeInstaller.cpp: ok c2compiler.cpp: The essence of this change! Just one line :) Great! callnode.hpp ok escape.h|cpp ok macro.cpp I was not that happy with the names saying not_global_escape and similar. I now agree you have to use the terms of the escape analysis (NoEscape, ArgEscape) throughout the runtime code. I'm still not happy with the 'not' in the term, I always try to expand the name to some sentence with a negated verb, but it makes no sense. 
For example, "has_not_global_escape_in_scope" expands to "Hasn't a global escape in its scope." in my thinking, which makes no sense. You probably mean "Has not-global escape in its scope." or "Has {ArgEscape|NoEscape} in its scope." C2 is using the word "non" in this context, e.g., here alloc->is_non_escaping. non obviously negates the adjective 'global', non-global or nonglobal even is an English term I find on the net. So what about "has_non_global_escape_in_scope"? matcher.cpp ok output.cpp:1071 Please break the long line. jvmtiCodeBlobEvents.cpp ok jvmtiEnv.cpp MaxJavaStackTraceDepth is only documented to affect the exception stack trace depth, not to limit jvmti operations. Therefore I wondered why it is used here. Not your concern, but the flag should document this in globals.hpp, too. Does jvmti specify that the same limits are used ...? ok on your side. jvmtiEnvBase.cpp ok jvmtiImpl.h|cpp ok jvmtiTagMap.cpp ok whitebox.cpp ok deoptimization.cpp line 177: Please break line line 246, 281: Please break line 1578, 1583, 1589, 1632, 1649, 1651 Break line 1651: You use 'non'-terms, too: non-escaping :) 2805, 2929, 2946ff, break lines deoptimization.hpp 158, 174, 176 ... I would break lines too, but here you are in good company :) globals.hpp ok mutexLocker.h|cpp ok objectMonitor.cpp ok thread.cpp 2631 typo: sapfepont --> safepoint thread.hpp ok thread.inline.hpp ok vframe.cpp ok vframe_hp.cpp 458ff break lines vframe_hp.hpp ok macros.hpp ok TEST.ROOT ok WhiteBox.java ok IterateHeapWithEscapeAnalysisEnabled.java line 415: msg("wait until target thread has set testMethod_result"); while (testMethod_result == 0) { Thread.sleep(50); } Might the test run into timeouts at this place? The field is volatile, i.e. it will be reloaded in each iteration. But will dontinline_testMethod write it back to main memory in time? libIterateHeapWithEscapeAnalysisEnabled.c ok EATests.java This is a very elaborate test. 
I found a number of test cases illustrating issues we talked about before. Really helpful! 1311: Typo: materialize -> materialized 1640: setting local variable i triggers always deoptimization --> setting local variable i always triggers deoptimization 2176: dontinline_calee --> dontinline_callee 2510: poping --> popping ... but I'm not sure here. https://www.urbandictionary.com/define.php?term=poping poping Drinking large amounts of Dextromethorphan Hydrobromide (DXM) based cough syrup, and then embarking on an adventure while wandering around neighborhoods or parks all night. This is usually done while listening to Punk rock music from a portable jambox. ;) Don't do it! EATestsJVMTI.java I think you can just copy this test description into the other test. You can have two @test comments, they will be treated as separate tests. The @requires will be evaluated accordingly. For an example see test/hotspot/jtreg/runtime/exceptionMsgs/NullPointerException/NullPointerExceptionTest.java which has two different compile setups for the test class (-g). So, that's it for reading code ... Some general remarks, maybe a bit picky ...: I think you could use fewer commas ',' in comments. As I understand, you need a comma if the subordinate clause is at the beginning, but not if it is at the end: If Corona is over, I go to the office. but I go to the office if Corona is over. I think the same holds for 'because', 'while' etc. E.g., jvmtiEnvBase.cpp:1313, jvmtiImpl.cpp:646ff, vframe_hp.hpp 104ff Also, I like full sentences in comments. Especially for me as a non-native speaker, this makes things much clearer. I.e., I try to make it a real sentence with articles, capitalized and a dot at the end if there is a subject and a verb in the first place. E.g., jvmtiEnvBase.cpp:1327 In many places, your comments read really well but some are quite abbreviated I think. E.g. thread.cpp:2601 is an example where a simple 'a' helps a lot. "Single deoptimization is typically very short." 
I would add 'A': "A single deoptimization is typically very short (fast?)."
Another meaning of the comment I first considered is this:
"Single deoptimization is typically very short, all_threads deoptimization takes longer"
having in mind the functions EscapeBarries::deoptimize_objects_all_threads() and EscapeBarries::deoptimize_objects() doing a single thread.
German with its compound nouns is helpful here :)
  Einzeldeoptimierung <--> eine einzelne Deoptimierung

Best regards,
  Goetz.

From igor.ignatyev at oracle.com Fri Jul 17 17:22:04 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 17 Jul 2020 10:22:04 -0700
Subject: [15] RFR(T) : 8249673 : cleanup graal problem lists
Message-ID: <2564EBA5-2F22-4105-B5AE-984018F7D8C2@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8249673/webrev.00
> 21 lines changed: 0 ins; 5 del; 16 mod;

Hi all,

could you please review this clean up of ProblemList-graal.txt in hotspot and jdk test suites?

from JBS:
> graal problem-lists list several already closed bugs:
> - JDK-8193210 fixed in jdk15-b17
> - JDK-8244656, JDK-8204347, JDK-8230419, JDK-8181833 closed as dup of JDK-8207267

JBS: https://bugs.openjdk.java.net/browse/JDK-8249673
webrev: http://cr.openjdk.java.net/~iignatyev//8249673/webrev.00
testing:
- jdk/jfr/event/compiler/ tests w/ Graal as JIT
- grep-ed for bug ids

Thanks,
-- Igor

JDK-8193210 : https://bugs.openjdk.java.net/browse/JDK-8193210
JDK-8244656 : https://bugs.openjdk.java.net/browse/JDK-8244656
JDK-8204347 : https://bugs.openjdk.java.net/browse/JDK-8204347
JDK-8230419 : https://bugs.openjdk.java.net/browse/JDK-8230419
JDK-8207267 : https://bugs.openjdk.java.net/browse/JDK-8207267

From vladimir.kozlov at oracle.com Fri Jul 17 17:29:42 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2020 10:29:42 -0700
Subject: [15] RFR(T) : 8249673 : cleanup graal problem lists
In-Reply-To: <2564EBA5-2F22-4105-B5AE-984018F7D8C2@oracle.com>
References: <2564EBA5-2F22-4105-B5AE-984018F7D8C2@oracle.com>
Message-ID: <4d6fe5f2-b947-50cf-0f51-6f8f218e1fad@oracle.com>

LGTM

Thanks,
Vladimir K

On 7/17/20 10:22 AM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8249673/webrev.00
>> 21 lines changed: 0 ins; 5 del; 16 mod;
>
>
> Hi all,
>
> could you please review this clean up of ProblemList-graal.txt in hotspot and jdk test suites?
>
> from JBS:
>> graal problem-lists list several already closed bugs:
>> - JDK-8193210 fixed in jdk15-b17
>> - JDK-8244656, JDK-8204347, JDK-8230419, JDK-8181833 closed as dup of JDK-8207267
>
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8249673
> webrev: http://cr.openjdk.java.net/~iignatyev//8249673/webrev.00
> testing:
> - jdk/jfr/event/compiler/ tests w/ Graal as JIT
> - grep-ed for bug ids
>
> Thanks,
> -- Igor
>
> JDK-8193210 : https://bugs.openjdk.java.net/browse/JDK-8193210
> JDK-8244656 : https://bugs.openjdk.java.net/browse/JDK-8244656
> JDK-8204347 : https://bugs.openjdk.java.net/browse/JDK-8204347
> JDK-8230419 : https://bugs.openjdk.java.net/browse/JDK-8230419
>
> JDK-8207267 : https://bugs.openjdk.java.net/browse/JDK-8207267
>

From igor.ignatyev at oracle.com Fri Jul 17 17:51:26 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 17 Jul 2020 10:51:26 -0700
Subject: [15] RFR(T) : 8249678 : @ignore should be used instead of ProblemList for 8158860, 8163894, 8193479, 8194310
Message-ID: <1BB411DA-3695-4CA4-B77D-9B834D03BEF4@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8249678/webrev.00
> 10 lines changed: 4 ins; 5 del; 1 mod;

Hi all,

could you please review this trivial clean up which replaces ProblemList entries w/ @ignore tag in tests which aren't runnable?
- compiler/jvmci/compilerToVM/GetResolvedJavaTypeTest.java isn't runnable due to 8158860
- compiler/jvmci/compilerToVM/InvalidateInstalledCodeTest.java isn't runnable due to 8163894
- compiler/codegen/Test6896617.java isn't runnable due to 8193479
- compiler/c2/Test6852078.java isn't runnable due to 8194310

from main bug(8249618):
> although ProblemList and @ignore achieve the same end result (test exclusion), they serve different goals and have slightly different meanings. simplified: @ignore should be used to exclude useless or harmful tests, and ProblemList in all other cases (see yet-not-integrated `ProblemListing or `@ignore`-ing a Test` section of dev guide, PR -- https://github.com/openjdk/guide/pull/21 for more details).
>
> due to different reasons, this hasn't always been followed and some currently @ignore-d tests should rather be ProblemList-ed, and some of ProblemList-ed should be @ignore-d. this issue is to clean up the current state in the hope that this will reduce further confusion.

JBS: https://bugs.openjdk.java.net/browse/JDK-8249678
webrev: http://cr.openjdk.java.net/~iignatyev//8249678/webrev.00

Thanks,
-- Igor

8249618 : https://bugs.openjdk.java.net/browse/JDK-8249618

8158860 : https://bugs.openjdk.java.net/browse/JDK-8158860
8163894 : https://bugs.openjdk.java.net/browse/JDK-8163894
8193479 : https://bugs.openjdk.java.net/browse/JDK-8193479
8194310 : https://bugs.openjdk.java.net/browse/JDK-8194310

From sandhya.viswanathan at intel.com Fri Jul 17 18:32:04 2020
From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya)
Date: Fri, 17 Jul 2020 18:32:04 +0000
Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): Hotspot and x86 backend changes
Message-ID:

Hi Vladimir and Coleen,

We are getting ready to propose to target Vector API to JDK 16.
Please find below the updated hotspot and x86 backend changes:

Shared Hotspot:
Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/hs_webrev/webrev.01/
Incremental: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/hs_webrev/webrev.00-webrev.01/

X86:
Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.01/
Incremental: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00-webrev.01/

Older webrev links for your reference:
Shared Hotspot: http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/
X86 backend: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00/

To get incremental webrev, I had to do some adjustments to these to be able to apply it to the jdk tip.

Please let us know your feedback and if we have ok from you to propose to target to JDK 16.

Best Regards,
Sandhya

-----Original Message-----
From: Vladimir Kozlov
Sent: Friday, May 01, 2020 6:05 PM
To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; hotspot-dev
Subject: Re: RFR (XXL): 8223347: Integration of Vector API (Incubator): x86 backend changes

On 5/1/20 5:55 PM, Viswanathan, Sandhya wrote:
> Hi Vladimir,
>
> Thanks a lot for the feedback.
>
> We used an old existing separate branch to share the code for review and to track changes.
> We didn't know how to change the name of the branch from vector-unstable to vector-stable.

Good to know that it does not mean that code is "unstable" ;)

Katya filed a new bug today [1]. Please look.

Regards,
Vladimir

[1] https://bugs.openjdk.java.net/browse/JDK-8244269

>
> Best Regards,
> Sandhya
>
> -----Original Message-----
> From: Vladimir Kozlov
> Sent: Friday, May 01, 2020 5:32 PM
> To: Viswanathan, Sandhya; hotspot-compiler-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; hotspot-dev
> Subject: Re: RFR (XXL): 8223347: Integration of Vector API (Incubator): x86 backend changes
>
> Changes seem fine. Nice work.
>
> Why is it called "vector-unstable branch"?
>
> Thanks,
> Vladimir K
>
> On 4/3/20 5:16 PM, Viswanathan, Sandhya wrote:
>> Hi,
>>
>> Following up on review requests of API [0], Java implementation [1] and
>> General Hotspot changes [2] for Vector API, here's a request for review
>> of x86 backend changes required for supporting the API:
>>
>> JEP: https://openjdk.java.net/jeps/338
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8223347
>> Webrev: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00/
>>
>> Complete implementation resides in vector-unstable branch of
>> panama/dev repository [3].
>>
>> Looking forward to your feedback.
>>
>> Best Regards,
>> Sandhya
>>
>> [0] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html
>> [1] https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-April/065587.html
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-April/037798.html
>> [3] https://openjdk.java.net/projects/panama/
>>     $ hg clone http://hg.openjdk.java.net/panama/dev/ -b vector-unstable
>>

From vladimir.kozlov at oracle.com Fri Jul 17 18:39:57 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2020 11:39:57 -0700
Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361
In-Reply-To:
References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com>
Message-ID:

Yes, I agree with webrev_fix_EA version.

I would suggest to modify TestIdealAllocShape.java test to add new method with synchronization from your example in JBS comment.
Or add it as separate test.

Thanks,
Vladimir

On 7/16/20 9:19 AM, Jamsheed C M wrote:
> Hi Vladimir,
> I ran performance run for http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ (links in JBS)
> I don't see any issues, so i would like to go with webrev_fix_EA if it fixes all the reported issues.
> Best regards,
> Jamsheed
>
> On 16/07/2020 07:25, Jamsheed C M wrote:
>> Hi Vladimir,
>>
>> On 16/07/2020 00:29, Vladimir Kozlov wrote:
>>> As I said before I agree with your additional checks for StoreN and StoreNKlass.
>>>
>>> But I have concerns about new is_init_captured_store code. EA is mostly looking only on inputs to see Allocation. And
>>> in several places it is expecting only to see Allocation because other cases should be filtered out before.
>> If that is the case, I would like to go with my first webrev for this fix as it nicely propagates es and there is no
>> unnecessary promotion to global escape state.
>>
>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/
>>
>> Best regards,
>>
>> Jamsheed
>>
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/15/20 10:54 AM, Jamsheed C M wrote:
>>>> Hi Vladimir,
>>>>
>>>> with unrolling i understand that many cases will just have phis everywhere to outside the loop as the uses are
>>>> outside the loop.
>>>>
>>>> and this is not restricted to escaping objects alone as i depicted. it can be escaping as well as non-escaping.
>>>>
>>>> so marking store to them as global escape doesn't seem to be a nice idea. i will rework this fix and get back again.
>>>>
>>>> Thank you
>>>>
>>>> Best regards
>>>>
>>>> Jamsheed
>>>>
>>>> On 15/07/2020 08:38, Jamsheed C M wrote:
>>>>> (unfinished mail got sent, so completing it)
>>>>> On 15/07/2020 08:21, Jamsheed C M wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> On 15/07/2020 06:50, Vladimir Kozlov wrote:
>>>>>>> I looked more on this. EA already does not scalarize allocations when Phi nodes merged them - it should handle
>>>>>>> this case.
I did small experiment and relaxed assert for this new (10. needs comment update) case for AddP's base
>>>>>>> and test passed:
>>>>>>>
>>>>>>> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700
>>>>>>> @@ -2357,6 +2357,7 @@
>>>>>>>        int opcode = uncast_base->Opcode();
>>>>>>>        assert(opcode == Op_ConP || opcode == Op_ThreadLocal ||
>>>>>>>               opcode == Op_CastX2P || uncast_base->is_DecodeNarrowPtr() ||
>>>>>>> +             (uncast_base->is_Phi() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
>>>>>>>               (uncast_base->is_Mem() && (uncast_base->bottom_type()->isa_rawptr() != NULL)) ||
>>>>>>>               (uncast_base->is_Proj() && uncast_base->in(0)->is_Allocate()), "sanity");
>>>>>>>      }
>>>>>>>
>>>>>>> Did you hit a case when this may not work?
>>>>>>
>>>>>> Yes, right it already doesn't mark it as scalarizable if base count is more than one(I think it missed a is_oop
>>>>>> check there)[1].
>>>>>>
>>>>>> EA CG adds edges only for oop field making stores to them undetected. This makes these stored objects to NoEscape
>>>>>> and if compiled method continues execution with this NoEscape object can have undesired results(i.e
>>>>>> synchronization removed).
>>>>>>
>>>>>> Probable case would be(didn't verify)
>>>>>>
>>>>>> try {
>>>>>>
>>>>>> LOOP BEGIN
>>>>>>
>>>>>>   try {throw new Obj()} catch {}
>>>>>>
>>>>>> LOOP END
>>>>>>
>>>>>> } catch (Obj e) {
>>>>>>
>>>>>> }
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> Jamsheed
>>>>>
>>>>> [1]https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L1770
>>>>>
>>>>>
>>>>>>>
>>>>>>>
>>>>>>> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed allocation (-XX:+PrintEscapeAnalysis
>>>>>>> -XX:+PrintEliminateAllocations):
>>>>>>>
>>>>>>> ======== Connection graph for  Test::test
>>>>>>> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]]   95 Allocate === 242  76  230  8  1 ( 93  92  21  1  78  1 78 ) [[
>>>>>>> 96 97 98 105 106  107 ]]  rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0
>>>>>>> Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test @ bci:8
>>>>>>> LocalVar [ 95P [ 158b ]]   107    Proj    ===  95  [[ 108 158 ]] #5 !jvms: Test::test1 @ bci:0 Test::test @ bci:8
>>>>>>>
>>>>>>> Scalar  95    Allocate    ===  242  76  230  8  1 ( 93 92  21 1 78 1  78 ) [[ 96  97  98  105  106  107 ]]
>>>>>>> rawptr:NotNull ( int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms:
>>>>>>> Test::test1 @ bci:0 Test::test @ bci:8
>>>>>>> ++++ Eliminated: 95 Allocate
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir K
>>>>>>>
>>>>>>> On 7/14/20 1:28 AM, Jamsheed C M wrote:
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> I had incorrectly added extra check in assert after offset computation in address_offset. For addps with non
>>>>>>>> constant offsets (like [1])
>>>>>>>>
>>>>>>>> Not changing the old assert even though I am not expecting first addp/second addp(for array addressing) case for
>>>>>>>> init captured store.
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Jamsheed
>>>>>>>>
>>>>>>>> [1]
>>>>>>>>
>>>>>>>> assert(offs != Type::OffsetBot ||
>>>>>>>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(),
>>>>>>>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || is_captured_store(adr),
>>>>>>>>              "offset must be a constant or it is initialization of array");
>>>>>>>>
>>>>>>>> On 13/07/2020 11:14, Jamsheed C M wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I reworked the fix. I compute offset for all init captures stores, but treats this special init captured stores
>>>>>>>>> similar to unsafe(as these objects are usually GlobalEscape and doesn't have any perf implications).
>>>>>>>>>
>>>>>>>>> revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/
>>>>>>>>>
>>>>>>>>> testing: mach1-5 (logs in jbs)
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Jamsheed
>>>>>>>>>
>>>>>>>>> On 09/07/2020 19:36, Jamsheed C M wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> request to hold the review. need to change the code for dealing with unsafe access. as current capture code go
>>>>>>>>>> for more execution time analyzing things.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Jamsheed
>>>>>>>>>>
>>>>>>>>>> On 09/07/2020 13:01, Jamsheed C M wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8242895
>>>>>>>>>>>
>>>>>>>>>>> Request for review changes made to offset computation and field write detection for init captured stores due
>>>>>>>>>>> to phis addition between alloc and init. This happen if init node in different outer loop wrt to alloc node
>>>>>>>>>>> and there is a loop opt. This was required as a result of enhancement [1].
>>>>>>>>>>>
>>>>>>>>>>> Normally init are not associated with multiple alloc node during EA phase, but changes done for [1] caused
>>>>>>>>>>> the code shapes of the form [2] to generate inits associated with multiple alloc node.
>>>>>>>>>>>
>>>>>>>>>>> This had implication in offset computation and field write detection related to initializing stores.
>>>>>>>>>>>
>>>>>>>>>>> Attempt to fix in EA:
>>>>>>>>>>>
>>>>>>>>>>>      webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/
>>>>>>>>>>>
>>>>>>>>>>> Alternate fix:
>>>>>>>>>>>
>>>>>>>>>>>      Minimize the scenario in compiler generated code by throwing only j.l.Error from slowpath(all exception
>>>>>>>>>>> async/sync are handled in runtime exit).
>>>>>>>>>>>
>>>>>>>>>>>      Stub epilog doesn't poll or throw any exceptions. Disable full loop opt before EA for detectable
>>>>>>>>>>> patterns and bailout EA for late detected patterns.
>>>>>>>>>>>
>>>>>>>>>>>      webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/
>>>>>>>>>>>
>>>>>>>>>>> Please advise.
>>>>>>>>>>>
>>>>>>>>>>> Testing : mach tier1-5 (logs in jbs)
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>>
>>>>>>>>>>> Jamsheed
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> [1] JDK-8231291 C2: loop opts before EA should maximally
>>>>>>>>>>> unroll loops
>>>>>>>>>>>
>>>>>>>>>>> [2] that have its init node in different outer loop wrt to alloc node.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> loop begin
>>>>>>>>>>>
>>>>>>>>>>>    try{
>>>>>>>>>>>
>>>>>>>>>>>    return new obj()/ throw new obj()/ uncommon trap after allocation, in a loop
>>>>>>>>>>>
>>>>>>>>>>>    } catch(ex) {
>>>>>>>>>>>
>>>>>>>>>>>    }
>>>>>>>>>>>
>>>>>>>>>>> loop end
>>>>>>>>>>>
>>>>>>>>>>>    42     public static IntA test(int n) {
>>>>>>>>>>>    43         for (int i=0; i<2; i++) {
>>>>>>>>>>>    44             try {
>>>>>>>>>>>    45                   return new IntA(n + i);
>>>>>>>>>>>    46             } catch (Exception e) {
>>>>>>>>>>>    47             }
>>>>>>>>>>>    48         }
>>>>>>>>>>>    49
>>>>>>>>>>>

From vladimir.kozlov at oracle.com Fri Jul 17 18:40:49 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2020 11:40:49 -0700
Subject: [15] RFR(T) : 8249678 : @ignore should be used instead of ProblemList for 8158860, 8163894, 8193479, 8194310
In-Reply-To: <1BB411DA-3695-4CA4-B77D-9B834D03BEF4@oracle.com>
References: <1BB411DA-3695-4CA4-B77D-9B834D03BEF4@oracle.com>
Message-ID:

Good.

Thanks,
Vladimir

On 7/17/20 10:51 AM, Igor Ignatyev wrote:
> http://cr.openjdk.java.net/~iignatyev//8249678/webrev.00
>> 10 lines changed: 4 ins; 5 del; 1 mod;
>
> Hi all,
>
> could you please review this trivial clean up which replaces ProblemList entries w/ @ignore tag in tests which aren't runnable?
>
> - compiler/jvmci/compilerToVM/GetResolvedJavaTypeTest.java isn't runnable due to 8158860
> - compiler/jvmci/compilerToVM/InvalidateInstalledCodeTest.java isn't runnable due to 8163894
> - compiler/codegen/Test6896617.java isn't runnable due to 8193479
> - compiler/c2/Test6852078.java isn't runnable due to 8194310
>
> from main bug(8249618):
>> although ProblemList and @ignore achieve the same end result (test exclusion), they serve different goals and have slightly different meanings. simplified: @ignore should be used to exclude useless or harmful tests, and ProblemList in all other cases (see yet-not-integrated `ProblemListing or `@ignore`-ing a Test` section of dev guide, PR -- https://github.com/openjdk/guide/pull/21 for more details).
>>
>> due to different reasons, this hasn't always been followed and some currently @ignore-d tests should rather be ProblemList-ed, and some of ProblemList-ed should be @ignore-d. this issue is to clean up the current state in the hope that this will reduce further confusion.
>
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8249678
> webrev: http://cr.openjdk.java.net/~iignatyev//8249678/webrev.00
>
> Thanks,
> -- Igor
>
> 8249618 : https://bugs.openjdk.java.net/browse/JDK-8249618
>
> 8158860 : https://bugs.openjdk.java.net/browse/JDK-8158860
> 8163894 : https://bugs.openjdk.java.net/browse/JDK-8163894
> 8193479 : https://bugs.openjdk.java.net/browse/JDK-8193479
> 8194310 : https://bugs.openjdk.java.net/browse/JDK-8194310
>
>
>

From vladimir.x.ivanov at oracle.com Fri Jul 17 18:54:33 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Fri, 17 Jul 2020 21:54:33 +0300
Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
In-Reply-To:
References:
Message-ID: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com>

Hi Jatin,

> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/

It definitely looks better, but IMO it hasn't reached the sweet spot yet.
It feels like the focus is on auto-vectorizer while the burden is put on scalar cases.

First of all, considering GVN folds relevant operation patterns into a single Rotate node now, what's the motivation to introduce intrinsics?

Another point is there's still significant duplication for scalar cases.

I'd prefer to see the legacy cases which rely on pattern matching to go away and be substituted with instructions which match Rotate instructions (migrating ).

I understand that it will penalize the vectorization implementation, but IMO reducing overall complexity is worth it. On auto-vectorizer side, I see 2 ways to fix it:

  (1) introduce additional AD instructions for RotateLeftV/RotateRightV specifically for pre-AVX512 hardware;

  (2) in SuperWord::output(), when matcher doesn't support RotateLeftV/RotateRightV nodes (Matcher::match_rule_supported()), generate vectorized version of the original pattern.

Overall, it looks like more and more focus is made on scalar part. Considering the main goal of the patch is to enable vectorization, I'm fine with separating cleanup of scalar part. As an interim solution, it seems that leaving the scalar part as it is now and matching scalar bit rotate pattern in VectorNode::is_rotate() should be enough to keep the vectorization part functioning. Then scalar Rotate nodes and relevant cleanups can be integrated later. (Or vice versa: clean up scalar part first and then follow up with vectorization.)

Some other comments:

* There's a lot of duplication between OrINode::Ideal and OrLNode::Ideal. What do you think about introducing a super type (OrNode) and put a unified version (OrNode::Ideal) there?
* src/hotspot/cpu/x86/x86.ad

+instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{
+  predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT ||
+            n->bottom_type()->is_vect()->element_basic_type() == T_LONG);

+instruct vprorate(vec dst, vec src, vec shift) %{
+  predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT ||
+            n->bottom_type()->is_vect()->element_basic_type() == T_LONG);

The predicates are redundant here.

* src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp

+void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, XMMRegister dst, XMMRegister src,
+                                     int shift, int vector_len) {
+  if (opcode == Op_RotateLeftV) {
+    if (etype == T_INT) {
+      evprold(dst, src, shift, vector_len);
+    } else {
+      evprolq(dst, src, shift, vector_len);
+    }

Please, put an assert for the false case (assert(etype == T_LONG, "...")).

* On testing (with previous version of the patch): -XX:UseAVX is x86-specific flag, so new/adjusted tests now fail on non-x86 platforms. Either omitting the flag or adding -XX:+IgnoreUnrecognizedVMOptions will solve the issue.

Best regards,
Vladimir Ivanov

>
>
> Summary of changes:
> 1) Optimization is specifically targeted to exploit vector rotation instruction added for X86 AVX512. A single rotate instruction encapsulates entire vector OR/SHIFTs pattern thus offers better latency at reduced instruction count.
>
> 2) There were two approaches to implement this:
> a) Let everything remain the same and add new wide complex instruction patterns in the matcher for e.g.
> set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI shift)) (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate shift))
> It would have been an overoptimistic assumption to expect that graph shape would be preserved till the matcher for correct inferencing.
> In addition we would have required multiple such bulky patterns.
> b) Create new RotateLeft/RotateRight scalar nodes, these get generated during intrinsification as well as during additional pattern
> matching during node Idealization, later on these nodes are consumed by SLP for valid vectorization scenarios to emit their vector
> counterparts which eventually emits vector rotates.
>
> 3) I choose approach 2b) since it's cleaner, only problem here was that in non-evex mode (UseAVX < 3) new scalar Rotate nodes should either
> be dismantled back to OR/SHIFT pattern or we penalize the vectorization which would be very costly, other option would have been to add additional vector rotate pattern for UseAVX=3 in the matcher which emit vector OR-SHIFTs instruction but then it will lose on emitting efficient instruction sequence which node sharing (OrV/LShiftV/URShift) offer in current implementation - thus it will not be beneficial for non-AVX512 targets, only saving will be in terms of cleanup of few existing scalar rotate matcher patterns, also old targets does not offer this powerful rotate instruction. Therefore new scalar nodes are created only for AVX512 targets.
>
> As per suggestions constant folding scenarios have been covered during Idealizations of newly added scalar nodes.
>
> Please review the latest version and share your feedback and test results.
>
> Best Regards,
> Jatin
>
>
>> -----Original Message-----
>> From: Andrew Haley
>> Sent: Saturday, July 11, 2020 2:24 PM
>> To: Vladimir Ivanov; Bhateja, Jatin; hotspot-compiler-dev at openjdk.java.net
>> Cc: Viswanathan, Sandhya
>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification for X86
>>
>> On 10/07/2020 18:32, Vladimir Ivanov wrote:
>>
>> > High-level comment: so far, there were no pressing need in explicitly
>> > marking the methods as intrinsics. ROR/ROL instructions were selected
>> > during matching [1]. Now the patch introduces dedicated nodes
>> > (RotateLeft/RotateRight) specifically for intrinsics which partly
>> > duplicates existing logic.
>>
>> The lack of rotate nodes in the IR has always meant that AArch64 doesn't
>> generate optimal code for e.g.
>>
>> (Set dst (XorL reg1 (RotateLeftL reg2 imm)))
>>
>> because, with the RotateLeft expanded to its full combination of ORs and
>> shifts, it's too complicated to match. At the time I put this to one side
>> because it wasn't urgent. This is a shame because although such
>> combinations are unusual they are used in some crypto operations.
>>
>> If we can generate immediate-form rotate nodes early by pattern matching
>> during parsing (rather than depending on intrinsics) we'll get more value
>> than by depending on programmers calling intrinsics.
>>
>> --
>> Andrew Haley (he/him)
>> Java Platform Lead Engineer
>> Red Hat UK Ltd. https://keybase.io/andrewhaley
>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>

From igor.ignatyev at oracle.com Fri Jul 17 18:57:25 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 17 Jul 2020 11:57:25 -0700
Subject: [15] RFR(T) : 8249678 : @ignore should be used instead of ProblemList for 8158860, 8163894, 8193479, 8194310
In-Reply-To:
References: <1BB411DA-3695-4CA4-B77D-9B834D03BEF4@oracle.com>
Message-ID:

thanks Vladimir, pushed.

-- Igor

> On Jul 17, 2020, at 11:40 AM, Vladimir Kozlov wrote:
>
> Good.
>
> Thanks,
> Vladimir
>
> On 7/17/20 10:51 AM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8249678/webrev.00
>>> 10 lines changed: 4 ins; 5 del; 1 mod;
>> Hi all,
>> could you please review this trivial clean up which replaces ProblemList entries w/ @ignore tag in tests which aren't runnable?
>> - compiler/jvmci/compilerToVM/GetResolvedJavaTypeTest.java isn't runnable due to 8158860
>> - compiler/jvmci/compilerToVM/InvalidateInstalledCodeTest.java isn't runnable due to 8163894
>> - compiler/codegen/Test6896617.java isn't runnable due to 8193479
>> - compiler/c2/Test6852078.java isn't runnable due to 8194310
>> from main bug(8249618):
>>> although ProblemList and @ignore achieve the same end result (test exclusion), they serve different goals and have slightly different meanings. simplified: @ignore should be used to exclude useless or harmful tests, and ProblemList in all other cases (see yet-not-integrated `ProblemListing or `@ignore`-ing a Test` section of dev guide, PR -- https://github.com/openjdk/guide/pull/21 for more details).
>>>
>>> due to different reasons, this hasn't always been followed and some currently @ignore-d tests should rather be ProblemList-ed, and some of ProblemList-ed should be @ignore-d. this issue is to clean up the current state in the hope that this will reduce further confusion.
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249678
>> webrev: http://cr.openjdk.java.net/~iignatyev//8249678/webrev.00
>> Thanks,
>> -- Igor
>> 8249618 : https://bugs.openjdk.java.net/browse/JDK-8249618
>> 8158860 : https://bugs.openjdk.java.net/browse/JDK-8158860
>> 8163894 : https://bugs.openjdk.java.net/browse/JDK-8163894
>> 8193479 : https://bugs.openjdk.java.net/browse/JDK-8193479
>> 8194310 : https://bugs.openjdk.java.net/browse/JDK-8194310

From igor.ignatyev at oracle.com Fri Jul 17 18:57:36 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 17 Jul 2020 11:57:36 -0700
Subject: [15] RFR(T) : 8249673 : cleanup graal problem lists
In-Reply-To: <4d6fe5f2-b947-50cf-0f51-6f8f218e1fad@oracle.com>
References: <2564EBA5-2F22-4105-B5AE-984018F7D8C2@oracle.com> <4d6fe5f2-b947-50cf-0f51-6f8f218e1fad@oracle.com>
Message-ID: <0650C344-3216-4C1B-A2D7-5404671998A7@oracle.com>

thanks Vladimir, pushed.
-- Igor

> On Jul 17, 2020, at 10:29 AM, Vladimir Kozlov wrote:
>
> LGTM
>
> Thanks,
> Vladimir K
>
> On 7/17/20 10:22 AM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8249673/webrev.00
>>> 21 lines changed: 0 ins; 5 del; 16 mod;
>> Hi all,
>> could you please review this clean up of ProblemList-graal.txt in hotspot and jdk test suites?
>> from JBS:
>>> graal problem-lists list several already closed bugs:
>>> - JDK-8193210 fixed in jdk15-b17
>>> - JDK-8244656, JDK-8204347, JDK-8230419, JDK-8181833 closed as dup of JDK-8207267
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249673
>> webrev: http://cr.openjdk.java.net/~iignatyev//8249673/webrev.00
>> testing:
>> - jdk/jfr/event/compiler/ tests w/ Graal as JIT
>> - grep-ed for bug ids
>> Thanks,
>> -- Igor
>> JDK-8193210 : https://bugs.openjdk.java.net/browse/JDK-8193210
>> JDK-8244656 : https://bugs.openjdk.java.net/browse/JDK-8244656
>> JDK-8204347 : https://bugs.openjdk.java.net/browse/JDK-8204347
>> JDK-8230419 : https://bugs.openjdk.java.net/browse/JDK-8230419
>> JDK-8207267 : https://bugs.openjdk.java.net/browse/JDK-8207267

From vladimir.a.ivanov at intel.com Fri Jul 17 19:57:42 2020
From: vladimir.a.ivanov at intel.com (Ivanov, Vladimir A)
Date: Fri, 17 Jul 2020 19:57:42 +0000
Subject: add microcode version to the hs_err files
Message-ID:

Hello,

could you please review the patch
http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/

This patch adds the microcode version for different OSes that may be useful in the issue resolution process.
The reported microcode version for different OSes looks as follows:

Linux (RHEL7.7):
# cat hs_err_pid251046.log |grep microc
CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb

Windows (Win10, v1809):
CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt

MacOS (Darwin):
$ cat hs_err_pid95187.log |grep microc
CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, fma, clflush, clflushopt

Thanks, Vladimir

From thomas.stuefe at gmail.com Fri Jul 17 21:19:43 2020
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Fri, 17 Jul 2020 23:19:43 +0200
Subject: add microcode version to the hs_err files
In-Reply-To:
References:
Message-ID:

Hi Vladimir,

I think this would be more suited to hotspot-runtime.

http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html

+#if defined(IA32) || defined(AMD64)

Is that not synonymous with x86?

+  while ((read = getline(&line, &len, fp)) != -1) {
+    if (len > 10 && strstr(line, "microcode") != NULL) {
+      char* rev = strchr(line, ':');
+      if (rev != NULL) sscanf(rev + 1, "%x", &result);
+      break;
+    }
+  }
+  free(line);

Not sure this works as intended.
At the first call to getline() it will allocate a line buffer for you and return it. That buffer will be as large as the first line you happen to read. You then pass that same buffer into getline to fetch the next lines, but what if those are longer than the first? But anyway it would be better to pass a simple caller provided buffer in - stack allocated. Since this function is called at crash time and the C heap could be corrupted. Cheers, Thomas On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < vladimir.a.ivanov at intel.com> wrote: > Hello, > > could you please review the patch > http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ > > This patch add the microcode version for different OSes that may be useful > in the issue resolution process. > > > > The reported microcode version for different OSes loos as: > > > > Linux (RHEL7.7): > > # cat hs_err_pid251046.log |grep microc > > CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) > family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, > sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, > clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, > fma, clflush, clflushopt, clwb > > > > Windows (Win10, v1809): > > CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) > family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, > sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, > clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, > fma, clflush, clflushopt > > > > MacOS (Darwin): > > $ cat hs_err_pid95187.log |grep microc > > CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) > family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, > sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, > clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, > fma, clflush, clflushopt 
> > > > Thanks, Vladimir > > > Thanks, Vladimir > > From thomas.stuefe at gmail.com Fri Jul 17 21:26:16 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Fri, 17 Jul 2020 23:26:16 +0200 Subject: add microcode version to the hs_err files In-Reply-To: References: Message-ID: On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe wrote: > Hi Vladimir, > > I think this would be more suited to hotspot-runtime. > > > http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html > > +#if defined(IA32) || defined(AMD64) > > Is that not synonymous with x86? > > + while ((read = getline(&line, &len, fp)) != -1) { > + if (len > 10 && strstr(line, "microcode") != NULL) { > + char* rev = strchr(line, ':'); > + if (rev != NULL) sscanf(rev + 1, "%x", &result); > + break; > + } > + } > + free(line); > > Not sure this works as intended. At the first call to getline() it will > allocate a line buffer for you and return it. That buffer will be as large > as the first line you happen to read. You then pass that same buffer into > getline to fetch the next lines, but what if those are longer than the > first? > > Forget that point, getline calls realloc() on the line buffer to resize it, so this should be okay. Thanks, Thomas > But anyway it would be better to pass a simple caller provided buffer in - > stack allocated. Since this function is called at crash time and the C heap > could be corrupted. > > Cheers, Thomas > > > On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < > vladimir.a.ivanov at intel.com> wrote: > >> Hello, >> >> could you please review the patch >> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >> >> This patch add the microcode version for different OSes that may be >> useful in the issue resolution process. 
>> >> >> >> The reported microcode version for different OSes loos as: >> >> >> >> Linux (RHEL7.7): >> >> # cat hs_err_pid251046.log |grep microc >> >> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per >> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, >> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, >> aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, >> adx, fma, clflush, clflushopt, clwb >> >> >> >> Windows (Win10, v1809): >> >> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) >> family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, >> sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >> clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, >> fma, clflush, clflushopt >> >> >> >> MacOS (Darwin): >> >> $ cat hs_err_pid95187.log |grep microc >> >> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) >> family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, >> sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >> clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, >> fma, clflush, clflushopt >> >> >> >> Thanks, Vladimir >> >> >> Thanks, Vladimir >> >>

From vladimir.a.ivanov at intel.com Fri Jul 17 21:57:37 2020
From: vladimir.a.ivanov at intel.com (Ivanov, Vladimir A)
Date: Fri, 17 Jul 2020 21:57:37 +0000
Subject: add microcode version to the hs_err files
In-Reply-To: References: Message-ID:

> +#if defined(IA32) || defined(AMD64)
>
> Is that not synonymous with x86?

This pattern was copied from the method 'print_model_name_and_flags' (file os/linux/os_linux.cpp). That method also reads the '/proc/cpuinfo' file, and I reused it as a template for the new method. It is better to use one pattern when working with exactly the same file, but in general you are right.
X86 is defined in the file ./share/utilities/macros.hpp as:

#if defined(IA32) || defined(AMD64)
#define X86
#define X86_ONLY(code) code
#define NOT_X86(code)

The question here: can I delete these #ifdefs, given that this method should work on x86 only?

Thanks, Vladimir

From: Thomas Stüfe Sent: Friday, July 17, 2020 2:26 PM To: Ivanov, Vladimir A ; Hotspot dev runtime Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: add microcode version to the hs_err files On Fri, Jul 17, 2020 at 11:19 PM Thomas Stüfe wrote: Hi Vladimir, I think this would be more suited to hotspot-runtime. http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html +#if defined(IA32) || defined(AMD64) Is that not synonymous with x86? + while ((read = getline(&line, &len, fp)) != -1) { + if (len > 10 && strstr(line, "microcode") != NULL) { + char* rev = strchr(line, ':'); + if (rev != NULL) sscanf(rev + 1, "%x", &result); + break; + } + } + free(line); Not sure this works as intended. At the first call to getline() it will allocate a line buffer for you and return it. That buffer will be as large as the first line you happen to read. You then pass that same buffer into getline to fetch the next lines, but what if those are longer than the first? Forget that point, getline calls realloc() on the line buffer to resize it, so this should be okay. Thanks, Thomas But anyway it would be better to pass a simple caller provided buffer in - stack allocated. Since this function is called at crash time and the C heap could be corrupted. Cheers, Thomas On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A wrote: Hello, could you please review the patch http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ This patch add the microcode version for different OSes that may be useful in the issue resolution process.
The reported microcode version for different OSes loos as: Linux (RHEL7.7): # cat hs_err_pid251046.log |grep microc CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb Windows (Win10, v1809): CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt MacOS (Darwin): $ cat hs_err_pid95187.log |grep microc CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, fma, clflush, clflushopt Thanks, Vladimir Thanks, Vladimir From thomas.stuefe at gmail.com Fri Jul 17 22:02:29 2020 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Sat, 18 Jul 2020 00:02:29 +0200 Subject: add microcode version to the hs_err files In-Reply-To: References: Message-ID: Hi Vladimir, On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < vladimir.a.ivanov at intel.com> wrote: > > +#if defined(IA32) || defined(AMD64) > > > > Is that not synonymous with x86? > > This patter was copied from the method ?print_model_name_and_flags? (file > os/linux/os_linux.cpp). > > This method also read the ?/proc/cpuinfo? file and I reuse it as > ?template? for the new method. > > It is better to use one pattern to work with exactly same file but in > general you are right. 
> > The X86 is defined in the file ./share/utilities/macros.hpp as: > > #if defined(IA32) || defined(AMD64) > > #define X86 > > #define X86_ONLY(code) code > > #define NOT_X86(code) > > > > The question here: could I delete this ?ifdefs? while this method should > work on x86 only? > > > os_linux_x86.cpp is compiled for x86 platforms only, whereas os_linux.cpp is shared among all architectures. So, in the former you do not need to exclude non-x86 architectures. Cheers, Thomas > Thanks, Vladimir > > > > *From:* Thomas St?fe > *Sent:* Friday, July 17, 2020 2:26 PM > *To:* Ivanov, Vladimir A ; Hotspot dev > runtime > *Cc:* hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: add microcode version to the hs_err files > > > > > > > > On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe > wrote: > > Hi Vladimir, > > > > I think this would be more suited to hotspot-runtime. > > > > > http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html > > > +#if defined(IA32) || defined(AMD64) > > Is that not synonymous with x86? > > > > + while ((read = getline(&line, &len, fp)) != -1) { > + if (len > 10 && strstr(line, "microcode") != NULL) { > + char* rev = strchr(line, ':'); > + if (rev != NULL) sscanf(rev + 1, "%x", &result); > + break; > + } > + } > + free(line); > > > > Not sure this works as intended. At the first call to getline() it will > allocate a line buffer for you and return it. That buffer will be as large > as the first line you happen to read. You then pass that same buffer into > getline to fetch the next lines, but what if those are longer than the > first? > > > > > > Forget that point, getline calls realloc() on the line buffer to resize > it, so this should be okay. > > > > Thanks, Thomas > > > > But anyway it would be better to pass a simple caller provided buffer in - > stack allocated. Since this function is called at crash time and the C heap > could be corrupted. 
> > > > Cheers, Thomas > > > > > > On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < > vladimir.a.ivanov at intel.com> wrote: > > Hello, > > could you please review the patch > http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ > > This patch add the microcode version for different OSes that may be useful > in the issue resolution process. > > > > The reported microcode version for different OSes loos as: > > > > Linux (RHEL7.7): > > # cat hs_err_pid251046.log |grep microc > > CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) > family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, > sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, > clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, > fma, clflush, clflushopt, clwb > > > > Windows (Win10, v1809): > > CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) > family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, > sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, > clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, > fma, clflush, clflushopt > > > > MacOS (Darwin): > > $ cat hs_err_pid95187.log |grep microc > > CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) > family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, > sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, > clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, > fma, clflush, clflushopt > > > > Thanks, Vladimir > > > Thanks, Vladimir > > From vladimir.a.ivanov at intel.com Fri Jul 17 22:52:42 2020 From: vladimir.a.ivanov at intel.com (Ivanov, Vladimir A) Date: Fri, 17 Jul 2020 22:52:42 +0000 Subject: add microcode version to the hs_err files In-Reply-To: References: Message-ID: Thanks for your comment. 
The updated patch is available at http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.01/

Thanks, Vladimir

From: Thomas Stüfe Sent: Friday, July 17, 2020 3:02 PM To: Ivanov, Vladimir A Cc: Hotspot dev runtime ; hotspot-compiler-dev at openjdk.java.net Subject: Re: add microcode version to the hs_err files Hi Vladimir, On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A wrote: > +#if defined(IA32) || defined(AMD64) > > Is that not synonymous with x86? This patter was copied from the method 'print_model_name_and_flags' (file os/linux/os_linux.cpp). This method also read the '/proc/cpuinfo' file and I reuse it as 'template' for the new method. It is better to use one pattern to work with exactly same file but in general you are right. The X86 is defined in the file ./share/utilities/macros.hpp as: #if defined(IA32) || defined(AMD64) #define X86 #define X86_ONLY(code) code #define NOT_X86(code) The question here: could I delete this 'ifdefs' while this method should work on x86 only? os_linux_x86.cpp is compiled for x86 platforms only, whereas os_linux.cpp is shared among all architectures. So, in the former you do not need to exclude non-x86 architectures. Cheers, Thomas Thanks, Vladimir From: Thomas Stüfe Sent: Friday, July 17, 2020 2:26 PM To: Ivanov, Vladimir A ; Hotspot dev runtime Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: add microcode version to the hs_err files On Fri, Jul 17, 2020 at 11:19 PM Thomas Stüfe wrote: Hi Vladimir, I think this would be more suited to hotspot-runtime. http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html +#if defined(IA32) || defined(AMD64) Is that not synonymous with x86? + while ((read = getline(&line, &len, fp)) != -1) { + if (len > 10 && strstr(line, "microcode") != NULL) { + char* rev = strchr(line, ':'); + if (rev != NULL) sscanf(rev + 1, "%x", &result); + break; + } + } + free(line); Not sure this works as intended.
At the first call to getline() it will allocate a line buffer for you and return it. That buffer will be as large as the first line you happen to read. You then pass that same buffer into getline to fetch the next lines, but what if those are longer than the first? Forget that point, getline calls realloc() on the line buffer to resize it, so this should be okay. Thanks, Thomas But anyway it would be better to pass a simple caller provided buffer in - stack allocated. Since this function is called at crash time and the C heap could be corrupted. Cheers, Thomas On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A > wrote: Hello, could you please review the patch http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ This patch add the microcode version for different OSes that may be useful in the issue resolution process. The reported microcode version for different OSes loos as: Linux (RHEL7.7): # cat hs_err_pid251046.log |grep microc CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb Windows (Win10, v1809): CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt MacOS (Darwin): $ cat hs_err_pid95187.log |grep microc CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, fma, clflush, clflushopt Thanks, 
Vladimir Thanks, Vladimir From vladimir.kozlov at oracle.com Fri Jul 17 23:03:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 17 Jul 2020 16:03:20 -0700 Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 In-Reply-To: References: Message-ID: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com> I updated subject to our formal review request format (JDK version, RFE's id and subject). I moved RFE to runtime group as Thomas said: https://bugs.openjdk.java.net/browse/JDK-8249672 Submitted tier1 testing to build on all our supported platforms. And debug builds on linux failed: # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718 # V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a V [libjvm.so+0x13cd30b] os::free(void*)+0x5b V [libjvm.so+0x13e5598] os::cpu_microcode_revision()+0xc8 V [libjvm.so+0x17d314c] VM_Version::get_processor_features()+0x76c V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d V [libjvm.so+0x17ce6c6] VM_Version_init()+0x26 V [libjvm.so+0xcb2895] init_globals()+0x55 V [libjvm.so+0x16dde63] Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3 Regards, Vladimir K On 7/17/20 3:02 PM, Thomas St?fe wrote: > Hi Vladimir, > > On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < > vladimir.a.ivanov at intel.com> wrote: > >>> +#if defined(IA32) || defined(AMD64) >>> >>> Is that not synonymous with x86? >> >> This patter was copied from the method ?print_model_name_and_flags? (file >> os/linux/os_linux.cpp). >> >> This method also read the ?/proc/cpuinfo? file and I reuse it as >> ?template? for the new method. >> >> It is better to use one pattern to work with exactly same file but in >> general you are right. 
>> >> The X86 is defined in the file ./share/utilities/macros.hpp as: >> >> #if defined(IA32) || defined(AMD64) >> >> #define X86 >> >> #define X86_ONLY(code) code >> >> #define NOT_X86(code) >> >> >> >> The question here: could I delete this ?ifdefs? while this method should >> work on x86 only? >> >> >> > > os_linux_x86.cpp is compiled for x86 platforms only, whereas os_linux.cpp > is shared among all architectures. > > So, in the former you do not need to exclude non-x86 architectures. > > Cheers, Thomas > > >> Thanks, Vladimir >> >> >> >> *From:* Thomas St?fe >> *Sent:* Friday, July 17, 2020 2:26 PM >> *To:* Ivanov, Vladimir A ; Hotspot dev >> runtime >> *Cc:* hotspot-compiler-dev at openjdk.java.net >> *Subject:* Re: add microcode version to the hs_err files >> >> >> >> >> >> >> >> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe >> wrote: >> >> Hi Vladimir, >> >> >> >> I think this would be more suited to hotspot-runtime. >> >> >> >> >> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html >> >> >> +#if defined(IA32) || defined(AMD64) >> >> Is that not synonymous with x86? >> >> >> >> + while ((read = getline(&line, &len, fp)) != -1) { >> + if (len > 10 && strstr(line, "microcode") != NULL) { >> + char* rev = strchr(line, ':'); >> + if (rev != NULL) sscanf(rev + 1, "%x", &result); >> + break; >> + } >> + } >> + free(line); >> >> >> >> Not sure this works as intended. At the first call to getline() it will >> allocate a line buffer for you and return it. That buffer will be as large >> as the first line you happen to read. You then pass that same buffer into >> getline to fetch the next lines, but what if those are longer than the >> first? >> >> >> >> >> >> Forget that point, getline calls realloc() on the line buffer to resize >> it, so this should be okay. >> >> >> >> Thanks, Thomas >> >> >> >> But anyway it would be better to pass a simple caller provided buffer in - >> stack allocated. 
Since this function is called at crash time and the C heap >> could be corrupted. >> >> >> >> Cheers, Thomas >> >> >> >> >> >> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < >> vladimir.a.ivanov at intel.com> wrote: >> >> Hello, >> >> could you please review the patch >> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >> >> This patch add the microcode version for different OSes that may be useful >> in the issue resolution process. >> >> >> >> The reported microcode version for different OSes loos as: >> >> >> >> Linux (RHEL7.7): >> >> # cat hs_err_pid251046.log |grep microc >> >> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) >> family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, >> sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >> clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, >> fma, clflush, clflushopt, clwb >> >> >> >> Windows (Win10, v1809): >> >> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) >> family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, >> sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >> clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, >> fma, clflush, clflushopt >> >> >> >> MacOS (Darwin): >> >> $ cat hs_err_pid95187.log |grep microc >> >> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) >> family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, >> sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >> clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, >> fma, clflush, clflushopt >> >> >> >> Thanks, Vladimir >> >> >> Thanks, Vladimir >> >> From vladimir.kozlov at oracle.com Fri Jul 17 23:17:00 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 17 Jul 2020 16:17:00 -0700 Subject: [16] RFR(S) 8249672: Include microcode revision in 
features_string on x86 In-Reply-To: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com> References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com> Message-ID:

I think the issue is that the 'line' buffer is allocated by libc's getline(), and os::free(), which is a HotSpot function [1], does not know about it. You need to either release it with C's ::free() or allocate the 'line' buffer with HotSpot's os::malloc(). Someone from Runtime may suggest what is best in this case.

Thanks, Vladimir K

[1] http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792

On 7/17/20 4:03 PM, Vladimir Kozlov wrote: > I updated subject to our formal review request format (JDK version, RFE's id and subject). > > I moved RFE to runtime group as Thomas said: > > https://bugs.openjdk.java.net/browse/JDK-8249672 > > Submitted tier1 testing to build on all our supported platforms. And debug builds on linux failed: > > # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718 > # V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb > > V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb > V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a > V [libjvm.so+0x13cd30b] os::free(void*)+0x5b > V [libjvm.so+0x13e5598] os::cpu_microcode_revision()+0xc8 > V [libjvm.so+0x17d314c] VM_Version::get_processor_features()+0x76c > V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d > V [libjvm.so+0x17ce6c6] VM_Version_init()+0x26 > V [libjvm.so+0xcb2895] init_globals()+0x55 > V [libjvm.so+0x16dde63] Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3 > > > Regards, > Vladimir K > > On 7/17/20 3:02 PM, Thomas Stüfe wrote: >> Hi Vladimir, >> >> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < >> vladimir.a.ivanov at intel.com> wrote: >> >>>> +#if defined(IA32) || defined(AMD64) >>>> >>>> Is that not synonymous with x86? >>> >>> This patter was copied from the method 'print_model_name_and_flags' (file >>> os/linux/os_linux.cpp).
>>> >>> This method also read the ?/proc/cpuinfo? file and I reuse it as >>> ?template? for the new method. >>> >>> It is better to use one pattern to work with exactly same file but in >>> general you are right. >>> >>> The X86 is defined in the file ./share/utilities/macros.hpp as: >>> >>> #if defined(IA32) || defined(AMD64) >>> >>> #define X86 >>> >>> #define X86_ONLY(code) code >>> >>> #define NOT_X86(code) >>> >>> >>> >>> The question here: could I delete this ?ifdefs? while this method should >>> work on x86 only? >>> >>> >>> >> >> os_linux_x86.cpp is compiled for x86 platforms only, whereas os_linux.cpp >> is shared among all architectures. >> >> So, in the former you do not need to exclude non-x86 architectures. >> >> Cheers, Thomas >> >> >>> Thanks, Vladimir >>> >>> >>> >>> *From:* Thomas St?fe >>> *Sent:* Friday, July 17, 2020 2:26 PM >>> *To:* Ivanov, Vladimir A ; Hotspot dev >>> runtime >>> *Cc:* hotspot-compiler-dev at openjdk.java.net >>> *Subject:* Re: add microcode version to the hs_err files >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe >>> wrote: >>> >>> Hi Vladimir, >>> >>> >>> >>> I think this would be more suited to hotspot-runtime. >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html >>> >>> >>> >>> +#if defined(IA32) || defined(AMD64) >>> >>> Is that not synonymous with x86? >>> >>> >>> >>> +??? while ((read = getline(&line, &len, fp)) != -1) { >>> +????? if (len > 10 && strstr(line, "microcode") != NULL) { >>> +??????? char* rev = strchr(line, ':'); >>> +??????? if (rev != NULL) sscanf(rev + 1, "%x", &result); >>> +??????? break; >>> +????? } >>> +??? } >>> +??? free(line); >>> >>> >>> >>> Not sure this works as intended. At the first call to getline() it will >>> allocate a line buffer for you and return it. That buffer will be as large >>> as the first line you happen to read. 
You then pass that same buffer into >>> getline to fetch the next lines, but what if those are longer than the >>> first? >>> >>> >>> >>> >>> >>> Forget that point, getline calls realloc() on the line buffer to resize >>> it, so this should be okay. >>> >>> >>> >>> Thanks, Thomas >>> >>> >>> >>> But anyway it would be better to pass a simple caller provided buffer in - >>> stack allocated. Since this function is called at crash time and the C heap >>> could be corrupted. >>> >>> >>> >>> Cheers, Thomas >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < >>> vladimir.a.ivanov at intel.com> wrote: >>> >>> Hello, >>> >>> could you please review the patch >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>> >>> This patch add the microcode version for different OSes that may be useful >>> in the issue resolution process. >>> >>> >>> >>> The reported microcode version for different OSes loos as: >>> >>> >>> >>> Linux (RHEL7.7): >>> >>> # cat hs_err_pid251046.log |grep microc >>> >>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) >>> family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, >>> sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >>> clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, >>> fma, clflush, clflushopt, clwb >>> >>> >>> >>> Windows (Win10, v1809): >>> >>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) >>> family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, >>> sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >>> clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, >>> fma, clflush, clflushopt >>> >>> >>> >>> MacOS (Darwin): >>> >>> $ cat hs_err_pid95187.log |grep microc >>> >>> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) >>> family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, >>> 
sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, >>> clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, >>> fma, clflush, clflushopt >>> >>> >>> >>> Thanks, Vladimir >>> >>> >>> Thanks, Vladimir >>> >>>

From vladimir.kozlov at oracle.com Fri Jul 17 23:24:07 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 17 Jul 2020 16:24:07 -0700
Subject: add microcode version to the hs_err files
In-Reply-To: References: Message-ID:

I forked a new e-mail thread with the correct subject line:

[16] RFR(S) 8249672: Include microcode revision in features_string on x86

Let's continue the discussion there. There is an issue with the changes in os_linux_x86.cpp.

Regards, Vladimir K

On 7/17/20 3:52 PM, Ivanov, Vladimir A wrote: > Thanks for your comment. > The updated patch available as http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.01/ > > Thanks, Vladimir > > From: Thomas Stüfe > Sent: Friday, July 17, 2020 3:02 PM > To: Ivanov, Vladimir A > Cc: Hotspot dev runtime ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: add microcode version to the hs_err files > > Hi Vladimir, > > On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A > wrote: >> +#if defined(IA32) || defined(AMD64) >> >> Is that not synonymous with x86? > This patter was copied from the method 'print_model_name_and_flags' (file os/linux/os_linux.cpp). > This method also read the '/proc/cpuinfo' file and I reuse it as 'template' for the new method. > It is better to use one pattern to work with exactly same file but in general you are right. > The X86 is defined in the file ./share/utilities/macros.hpp as: > #if defined(IA32) || defined(AMD64) > #define X86 > #define X86_ONLY(code) code > #define NOT_X86(code) > > The question here: could I delete this 'ifdefs' while this method should work on x86 only? > > > os_linux_x86.cpp is compiled for x86 platforms only, whereas os_linux.cpp is shared among all architectures.
> > So, in the former you do not need to exclude non-x86 architectures. > > Cheers, Thomas > > Thanks, Vladimir > > From: Thomas St?fe > > Sent: Friday, July 17, 2020 2:26 PM > To: Ivanov, Vladimir A >; Hotspot dev runtime > > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: Re: add microcode version to the hs_err files > > > > On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe > wrote: > Hi Vladimir, > > I think this would be more suited to hotspot-runtime. > > http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html > > +#if defined(IA32) || defined(AMD64) > > Is that not synonymous with x86? > > + while ((read = getline(&line, &len, fp)) != -1) { > + if (len > 10 && strstr(line, "microcode") != NULL) { > + char* rev = strchr(line, ':'); > + if (rev != NULL) sscanf(rev + 1, "%x", &result); > + break; > + } > + } > + free(line); > > Not sure this works as intended. At the first call to getline() it will allocate a line buffer for you and return it. That buffer will be as large as the first line you happen to read. You then pass that same buffer into getline to fetch the next lines, but what if those are longer than the first? > > > Forget that point, getline calls realloc() on the line buffer to resize it, so this should be okay. > > Thanks, Thomas > > But anyway it would be better to pass a simple caller provided buffer in - stack allocated. Since this function is called at crash time and the C heap could be corrupted. > > Cheers, Thomas > > > On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A > wrote: > Hello, > > could you please review the patch http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ > > This patch add the microcode version for different OSes that may be useful in the issue resolution process. 
>
>
>
> The reported microcode version for different OSes looks as:
>
>
>
> Linux (RHEL7.7):
>
> # cat hs_err_pid251046.log |grep microc
>
> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb
>
>
>
> Windows (Win10, v1809):
>
> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt
>
>
>
> MacOS (Darwin):
>
> $ cat hs_err_pid95187.log |grep microc
>
> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, bmi1, bmi2, adx, sha, fma, clflush, clflushopt
>
>
>
> Thanks, Vladimir
>
>
> Thanks, Vladimir
>

From vladimir.a.ivanov at intel.com Fri Jul 17 23:24:32 2020
From: vladimir.a.ivanov at intel.com (Ivanov, Vladimir A)
Date: Fri, 17 Jul 2020 23:24:32 +0000
Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86
In-Reply-To:
References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com>
Message-ID:

Thanks, I expected the C functions here. Let's wait a bit for the Runtime team and then update the buffer handling.
Thanks, Vladimir

-----Original Message-----
From: Vladimir Kozlov
Sent: Friday, July 17, 2020 4:17 PM
To: Thomas Stüfe; Ivanov, Vladimir A
Cc: Hotspot dev runtime; hotspot-compiler-dev at openjdk.java.net
Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86

I think the issue is that the 'line' buffer is allocated by libc getline(), and os::free(), which is a HotSpot function [1], does not know about it. You need C's ::free() or use HS's os::malloc() to allocate the 'line' buffer.

Someone from Runtime may suggest what is the best for this case.

Thanks,
Vladimir K

[1] http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792

On 7/17/20 4:03 PM, Vladimir Kozlov wrote:
> I updated subject to our formal review request format (JDK version, RFE's id and subject).
>
> I moved RFE to runtime group as Thomas said:
>
> https://bugs.openjdk.java.net/browse/JDK-8249672
>
> Submitted tier1 testing to build on all our supported platforms. And debug builds on linux failed:
>
> # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718
> # V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb
>
> V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb
> V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a
> V [libjvm.so+0x13cd30b] os::free(void*)+0x5b
> V [libjvm.so+0x13e5598] os::cpu_microcode_revision()+0xc8
> V [libjvm.so+0x17d314c] VM_Version::get_processor_features()+0x76c
> V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d
> V [libjvm.so+0x17ce6c6] VM_Version_init()+0x26
> V [libjvm.so+0xcb2895] init_globals()+0x55
> V [libjvm.so+0x16dde63] Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3
>
>
> Regards,
> Vladimir K
>
> On 7/17/20 3:02 PM, Thomas Stüfe wrote:
>> Hi Vladimir,
>>
>> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A <vladimir.a.ivanov at intel.com> wrote:
>>
>>>> +#if defined(IA32) || defined(AMD64)
>>>>
>>>> Is that not synonymous with x86?
>>> >>> This patter was copied from the method ?print_model_name_and_flags? >>> (file os/linux/os_linux.cpp). >>> >>> This method also read the ?/proc/cpuinfo? file and I reuse it as >>> ?template? for the new method. >>> >>> It is better to use one pattern to work with exactly same file but >>> in general you are right. >>> >>> The X86 is defined in the file ./share/utilities/macros.hpp as: >>> >>> #if defined(IA32) || defined(AMD64) >>> >>> #define X86 >>> >>> #define X86_ONLY(code) code >>> >>> #define NOT_X86(code) >>> >>> >>> >>> The question here: could I delete this ?ifdefs? while this method >>> should work on x86 only? >>> >>> >>> >> >> os_linux_x86.cpp is compiled for x86 platforms only, whereas >> os_linux.cpp is shared among all architectures. >> >> So, in the former you do not need to exclude non-x86 architectures. >> >> Cheers, Thomas >> >> >>> Thanks, Vladimir >>> >>> >>> >>> *From:* Thomas St?fe >>> *Sent:* Friday, July 17, 2020 2:26 PM >>> *To:* Ivanov, Vladimir A ; Hotspot dev >>> runtime >>> *Cc:* hotspot-compiler-dev at openjdk.java.net >>> *Subject:* Re: add microcode version to the hs_err files >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe >>> >>> wrote: >>> >>> Hi Vladimir, >>> >>> >>> >>> I think this would be more suited to hotspot-runtime. >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html >>> >>> >>> >>> +#if defined(IA32) || defined(AMD64) >>> >>> Is that not synonymous with x86? >>> >>> >>> >>> +??? while ((read = getline(&line, &len, fp)) != -1) { >>> +????? if (len > 10 && strstr(line, "microcode") != NULL) { >>> +??????? char* rev = strchr(line, ':'); >>> +??????? if (rev != NULL) sscanf(rev + 1, "%x", &result); >>> +??????? break; >>> +????? } >>> +??? } >>> +??? free(line); >>> >>> >>> >>> Not sure this works as intended. 
At the first call to getline() it >>> will allocate a line buffer for you and return it. That buffer will >>> be as large as the first line you happen to read. You then pass that >>> same buffer into getline to fetch the next lines, but what if those >>> are longer than the first? >>> >>> >>> >>> >>> >>> Forget that point, getline calls realloc() on the line buffer to >>> resize it, so this should be okay. >>> >>> >>> >>> Thanks, Thomas >>> >>> >>> >>> But anyway it would be better to pass a simple caller provided >>> buffer in - stack allocated. Since this function is called at crash >>> time and the C heap could be corrupted. >>> >>> >>> >>> Cheers, Thomas >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < >>> vladimir.a.ivanov at intel.com> wrote: >>> >>> Hello, >>> >>> could you please review the patch >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>> >>> This patch add the microcode version for different OSes that may be >>> useful in the issue resolution process. 
>>> >>> >>> >>> The reported microcode version for different OSes loos as: >>> >>> >>> >>> Linux (RHEL7.7): >>> >>> # cat hs_err_pid251046.log |grep microc >>> >>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per >>> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, >>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, >>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, >>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb >>> >>> >>> >>> Windows (Win10, v1809): >>> >>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per >>> core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, >>> avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, >>> tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt >>> >>> >>> >>> MacOS (Darwin): >>> >>> $ cat hs_err_pid95187.log |grep microc >>> >>> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per >>> core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, >>> avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, >>> bmi1, bmi2, adx, sha, fma, clflush, clflushopt >>> >>> >>> >>> Thanks, Vladimir >>> >>> >>> ?? Thanks, Vladimir >>> >>> From igor.ignatyev at oracle.com Sat Jul 18 03:54:12 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 17 Jul 2020 20:54:12 -0700 Subject: [15] RFR(T) : 8249697 : java/lang/invoke/RicochetTest.java should use @requires instead of @ignore Message-ID: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ > 7 lines changed: 4 ins; 0 del; 3 mod; Hi all, could you please review this small and trivial patch for java/lang/invoke/RicochetTest.java test? 
from JBS:
> a run of java/lang/invoke/RicochetTest.java w/ MAX_ARITY=255 was removed from all configurations by JDK-7049122, yet the problem manifests itself only w/ Xcomp. as now we have @requires to filter out tests from certain configurations, the test can be updated to run MAX_ARITY=255 in all configs but Xcomp.

the patch splits the test into two subtests, each one w/ one @run, and uses @requires to exclude the one w/ MAX_ARITY=255 from execution if the Xcomp flag is used.

JBS: https://bugs.openjdk.java.net/browse/JDK-8249697
webrev: http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/
testing: java/lang/invoke/RicochetTest.java on {linux,windows,macos}-x64 w/ and w/o -Xcomp; Xcomp runs, as expected, had only 1 test run

Thanks,
-- Igor

JDK-7049122 : https://bugs.openjdk.java.net/browse/JDK-7049122

From igor.ignatyev at oracle.com Sat Jul 18 03:57:43 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Fri, 17 Jul 2020 20:57:43 -0700
Subject: [15] RFR(T) : 8249698 : java/lang/invoke/LFCaching/LFGarbageCollectedTest.java should be ProblemList-ed and not @ignored
Message-ID: <61EBB792-FAF4-4DFD-A674-4BE7153F20AA@oracle.com>

http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00
> 3 lines changed: 1 ins; 1 del; 1 mod;

Hi all,

could you please review this trivial patch which removes @ignore from LFGarbageCollectedTest and adds it into problem-list instead?

from 8249698:
> java/lang/invoke/LFCaching/LFGarbageCollectedTest.java is excluded from execution due to JDK-8078602. although the test might indeed fail due to JDK-8078602, it still can be useful and isn't harmful to run, therefore this test should be put in ProblemList.txt and @ignore is to be removed.
from main issue(8249618):
> although ProblemList and @ignore achieve the same end result (test exclusion), they serve different goals and have slightly different meanings. simplified: @ignore should be used to exclude useless or harmful tests, and ProblemList in all other cases (see yet-not-integrated `ProblemListing or `@ignore`-ing a Test` section of dev guide, PR -- https://github.com/openjdk/guide/pull/21 for more details).
>
> due to different reasons, this hasn't been always followed and some currently @ignore-d tests should rather be ProblemList-ed, and some of ProblemList-ed should be @ignore-d, this issue is to clean up the current state in a hope that this will reduce further confusion.

webrev: http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00
JBS: https://bugs.openjdk.java.net/browse/JDK-8249698

Thanks,
-- Igor

8078602: https://bugs.openjdk.java.net/browse/JDK-8078602
8249618: https://bugs.openjdk.java.net/browse/JDK-8249618

From thomas.stuefe at gmail.com Sat Jul 18 04:41:33 2020
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Sat, 18 Jul 2020 06:41:33 +0200
Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86
In-Reply-To:
References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com>
Message-ID:

Hi,

yes, you must use the raw free here (for the same reason we cannot pass in an os::malloc() allocated buffer to getline, since if it were to resize it would use raw ::realloc() internally and crash the same way).

But as I wrote in my first mail to the original thread, I would not use c-heap memory at all, since this function is used during crash reporting in the signal handler and the c-heap may be corrupted.

If the max line length of /proc/cpuinfo can be reliably predicted (so that getline won't realloc()) I would pass a stack allocated buffer into getline. If not, I would not use getline() at all but rewrite this, probably using fgets().
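
A minimal sketch of the fgets() variant suggested above, for illustration only; it is not the code from the webrev. The helper names parse_microcode_revision and cpu_microcode_revision_sketch are hypothetical, the 2048-byte buffer size is an assumption, and the parser assumes a Linux /proc/cpuinfo layout with a line of the form "microcode : 0x...". The line buffer lives on the stack, so the parsing loop itself never calls getline() or free(); fopen() may still allocate internally.

```cpp
#include <cstdio>
#include <cstring>
#include <cstdint>

// Hypothetical helper (not from the webrev): scans a cpuinfo-style stream for
// the first line starting with "microcode" and parses the hex value after ':'.
// Returns 0 if no such line is found.
static uint32_t parse_microcode_revision(FILE* fp) {
  char line[2048];             // stack buffer; the size is an assumption
  unsigned int result = 0;
  bool at_line_start = true;   // false while inside an over-long line
  while (fgets(line, sizeof(line), fp) != NULL) {
    size_t n = strlen(line);
    bool complete = (n > 0 && line[n - 1] == '\n');
    // Only match at a real line start, so a chunk from the middle of a very
    // long line (for example "flags") cannot be mistaken for the field.
    if (at_line_start && strncmp(line, "microcode", 9) == 0) {
      const char* colon = strchr(line, ':');
      if (colon != NULL) {
        sscanf(colon + 1, "%x", &result);  // %x accepts an optional 0x prefix
      }
      break;
    }
    at_line_start = complete;
  }
  return result;
}

// Hypothetical wrapper: reads the revision from /proc/cpuinfo, 0 on failure.
static uint32_t cpu_microcode_revision_sketch() {
  FILE* fp = fopen("/proc/cpuinfo", "r");
  if (fp == NULL) {
    return 0;
  }
  uint32_t rev = parse_microcode_revision(fp);
  fclose(fp);
  return rev;
}
```

Splitting the parser from the file open keeps the parsing logic testable on any stream; whether a fixed buffer is preferable to getline() plus ::free() depends on when the function is actually called.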
Cheers, Thomas On Sat, Jul 18, 2020 at 1:24 AM Ivanov, Vladimir A < vladimir.a.ivanov at intel.com> wrote: > Thanks, I expected the C's functions here. Let's wait a little bit for > Runtime team and update work with buffer. > > Thanks, Vladimir > > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, July 17, 2020 4:17 PM > To: Thomas St?fe ; Ivanov, Vladimir A < > vladimir.a.ivanov at intel.com> > Cc: Hotspot dev runtime ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in > features_string on x86 > > I think the issue is 'line' buffer is allocated by libc getline() and > os:free() which is HotSpot function [1] does not know about it. You need > C's ::free() or use HS's os::malloc() to allocate 'line' buffer. > > Someone from Runtime may suggest what is the best for this case. > > Thanks, > Vladimir K > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792 > > On 7/17/20 4:03 PM, Vladimir Kozlov wrote: > > I updated subject to our formal review request format (JDK version, > RFE's id and subject). > > > > I moved RFE to runtime group as Thomas said: > > > > https://bugs.openjdk.java.net/browse/JDK-8249672 > > > > Submitted tier1 testing to build on all our supported platforms. 
And > debug builds on linux failed: > > > > # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718 # V > > [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > > const+0xeb > > > > V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > > const+0xeb V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a V > > [libjvm.so+0x13cd30b] os::free(void*)+0x5b V [libjvm.so+0x13e5598] > > os::cpu_microcode_revision()+0xc8 V [libjvm.so+0x17d314c] > > VM_Version::get_processor_features()+0x76c > > V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d V > > [libjvm.so+0x17ce6c6] VM_Version_init()+0x26 V [libjvm.so+0xcb2895] > > init_globals()+0x55 V [libjvm.so+0x16dde63] > > Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3 > > > > > > Regards, > > Vladimir K > > > > On 7/17/20 3:02 PM, Thomas St?fe wrote: > >> Hi Vladimir, > >> > >> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < > >> vladimir.a.ivanov at intel.com> wrote: > >> > >>>> +#if defined(IA32) || defined(AMD64) > >>>> > >>>> Is that not synonymous with x86? > >>> > >>> This patter was copied from the method ?print_model_name_and_flags? > >>> (file os/linux/os_linux.cpp). > >>> > >>> This method also read the ?/proc/cpuinfo? file and I reuse it as > >>> ?template? for the new method. > >>> > >>> It is better to use one pattern to work with exactly same file but > >>> in general you are right. > >>> > >>> The X86 is defined in the file ./share/utilities/macros.hpp as: > >>> > >>> #if defined(IA32) || defined(AMD64) > >>> > >>> #define X86 > >>> > >>> #define X86_ONLY(code) code > >>> > >>> #define NOT_X86(code) > >>> > >>> > >>> > >>> The question here: could I delete this ?ifdefs? while this method > >>> should work on x86 only? > >>> > >>> > >>> > >> > >> os_linux_x86.cpp is compiled for x86 platforms only, whereas > >> os_linux.cpp is shared among all architectures. > >> > >> So, in the former you do not need to exclude non-x86 architectures. 
> >> > >> Cheers, Thomas > >> > >> > >>> Thanks, Vladimir > >>> > >>> > >>> > >>> *From:* Thomas St?fe > >>> *Sent:* Friday, July 17, 2020 2:26 PM > >>> *To:* Ivanov, Vladimir A ; Hotspot dev > >>> runtime > >>> *Cc:* hotspot-compiler-dev at openjdk.java.net > >>> *Subject:* Re: add microcode version to the hs_err files > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe > >>> > >>> wrote: > >>> > >>> Hi Vladimir, > >>> > >>> > >>> > >>> I think this would be more suited to hotspot-runtime. > >>> > >>> > >>> > >>> > >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ > >>> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html > >>> > >>> > >>> > >>> +#if defined(IA32) || defined(AMD64) > >>> > >>> Is that not synonymous with x86? > >>> > >>> > >>> > >>> + while ((read = getline(&line, &len, fp)) != -1) { > >>> + if (len > 10 && strstr(line, "microcode") != NULL) { > >>> + char* rev = strchr(line, ':'); > >>> + if (rev != NULL) sscanf(rev + 1, "%x", &result); > >>> + break; > >>> + } > >>> + } > >>> + free(line); > >>> > >>> > >>> > >>> Not sure this works as intended. At the first call to getline() it > >>> will allocate a line buffer for you and return it. That buffer will > >>> be as large as the first line you happen to read. You then pass that > >>> same buffer into getline to fetch the next lines, but what if those > >>> are longer than the first? > >>> > >>> > >>> > >>> > >>> > >>> Forget that point, getline calls realloc() on the line buffer to > >>> resize it, so this should be okay. > >>> > >>> > >>> > >>> Thanks, Thomas > >>> > >>> > >>> > >>> But anyway it would be better to pass a simple caller provided > >>> buffer in - stack allocated. Since this function is called at crash > >>> time and the C heap could be corrupted. 
> >>> > >>> > >>> > >>> Cheers, Thomas > >>> > >>> > >>> > >>> > >>> > >>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < > >>> vladimir.a.ivanov at intel.com> wrote: > >>> > >>> Hello, > >>> > >>> could you please review the patch > >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ > >>> > >>> This patch add the microcode version for different OSes that may be > >>> useful in the issue resolution process. > >>> > >>> > >>> > >>> The reported microcode version for different OSes loos as: > >>> > >>> > >>> > >>> Linux (RHEL7.7): > >>> > >>> # cat hs_err_pid251046.log |grep microc > >>> > >>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per > >>> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, > >>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, > >>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, > >>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb > >>> > >>> > >>> > >>> Windows (Win10, v1809): > >>> > >>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per > >>> core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, > >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, > >>> avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, > >>> tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt > >>> > >>> > >>> > >>> MacOS (Darwin): > >>> > >>> $ cat hs_err_pid95187.log |grep microc > >>> > >>> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per > >>> core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, > >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, > >>> avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, > >>> bmi1, bmi2, adx, sha, fma, clflush, clflushopt > >>> > >>> > >>> > >>> Thanks, Vladimir > >>> > >>> > >>> Thanks, Vladimir > >>> > >>> > From vladimir.a.ivanov at intel.com Sat Jul 18 05:07:59 2020 From: vladimir.a.ivanov at 
intel.com (Ivanov, Vladimir A)
Date: Sat, 18 Jul 2020 05:07:59 +0000
Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86
In-Reply-To:
References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com>
Message-ID:

Hi,

it seems this info is created during the initialization phase. Is that correct? Collecting or parsing common info at the crash point is usually not a good idea. During initialization, using the c-heap is not a problem.

The '::free' works OK here. At least the tier1 tests produce the same results for patched and non-patched builds. But these tests do not produce a real case with hs_err files.

It looks like a 2k byte array is enough for one CPU record from the cpuinfo file. Will update the code to use a local buffer.

Thanks, Vladimir

From: Thomas Stüfe
Sent: Friday, July 17, 2020 9:42 PM
To: Ivanov, Vladimir A
Cc: Vladimir Kozlov; Hotspot dev runtime; hotspot-compiler-dev at openjdk.java.net
Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86

Hi,

yes, you must use the raw free here (for the same reason we cannot pass in an os::malloc() allocated buffer to getline, since if it were to resize it would use raw ::realloc() internally and crash the same way).

But as I wrote in my first mail to the original thread, I would not use c-heap memory at all, since this function is used during crash reporting in the signal handler and the c-heap may be corrupted.

If the max line length of /proc/cpuinfo can be reliably predicted (so that getline won't realloc()) I would pass a stack allocated buffer into getline. If not, I would not use getline() at all but rewrite this, probably using fgets().

Cheers, Thomas

On Sat, Jul 18, 2020 at 1:24 AM Ivanov, Vladimir A wrote:
Thanks, I expected the C's functions here. Let's wait a little bit for Runtime team and update work with buffer.
Thanks, Vladimir -----Original Message----- From: Vladimir Kozlov > Sent: Friday, July 17, 2020 4:17 PM To: Thomas St?fe >; Ivanov, Vladimir A > Cc: Hotspot dev runtime >; hotspot-compiler-dev at openjdk.java.net Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 I think the issue is 'line' buffer is allocated by libc getline() and os:free() which is HotSpot function [1] does not know about it. You need C's ::free() or use HS's os::malloc() to allocate 'line' buffer. Someone from Runtime may suggest what is the best for this case. Thanks, Vladimir K [1] http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792 On 7/17/20 4:03 PM, Vladimir Kozlov wrote: > I updated subject to our formal review request format (JDK version, RFE's id and subject). > > I moved RFE to runtime group as Thomas said: > > https://bugs.openjdk.java.net/browse/JDK-8249672 > > Submitted tier1 testing to build on all our supported platforms. And debug builds on linux failed: > > # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718 # V > [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > const+0xeb > > V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > const+0xeb V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a V > [libjvm.so+0x13cd30b] os::free(void*)+0x5b V [libjvm.so+0x13e5598] > os::cpu_microcode_revision()+0xc8 V [libjvm.so+0x17d314c] > VM_Version::get_processor_features()+0x76c > V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d V > [libjvm.so+0x17ce6c6] VM_Version_init()+0x26 V [libjvm.so+0xcb2895] > init_globals()+0x55 V [libjvm.so+0x16dde63] > Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3 > > > Regards, > Vladimir K > > On 7/17/20 3:02 PM, Thomas St?fe wrote: >> Hi Vladimir, >> >> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < >> vladimir.a.ivanov at intel.com> wrote: >> >>>> +#if defined(IA32) || defined(AMD64) >>>> >>>> Is that not synonymous with x86? 
>>> >>> This patter was copied from the method ?print_model_name_and_flags? >>> (file os/linux/os_linux.cpp). >>> >>> This method also read the ?/proc/cpuinfo? file and I reuse it as >>> ?template? for the new method. >>> >>> It is better to use one pattern to work with exactly same file but >>> in general you are right. >>> >>> The X86 is defined in the file ./share/utilities/macros.hpp as: >>> >>> #if defined(IA32) || defined(AMD64) >>> >>> #define X86 >>> >>> #define X86_ONLY(code) code >>> >>> #define NOT_X86(code) >>> >>> >>> >>> The question here: could I delete this ?ifdefs? while this method >>> should work on x86 only? >>> >>> >>> >> >> os_linux_x86.cpp is compiled for x86 platforms only, whereas >> os_linux.cpp is shared among all architectures. >> >> So, in the former you do not need to exclude non-x86 architectures. >> >> Cheers, Thomas >> >> >>> Thanks, Vladimir >>> >>> >>> >>> *From:* Thomas St?fe > >>> *Sent:* Friday, July 17, 2020 2:26 PM >>> *To:* Ivanov, Vladimir A >; Hotspot dev >>> runtime > >>> *Cc:* hotspot-compiler-dev at openjdk.java.net >>> *Subject:* Re: add microcode version to the hs_err files >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe >>> > >>> wrote: >>> >>> Hi Vladimir, >>> >>> >>> >>> I think this would be more suited to hotspot-runtime. >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html >>> >>> >>> >>> +#if defined(IA32) || defined(AMD64) >>> >>> Is that not synonymous with x86? >>> >>> >>> >>> + while ((read = getline(&line, &len, fp)) != -1) { >>> + if (len > 10 && strstr(line, "microcode") != NULL) { >>> + char* rev = strchr(line, ':'); >>> + if (rev != NULL) sscanf(rev + 1, "%x", &result); >>> + break; >>> + } >>> + } >>> + free(line); >>> >>> >>> >>> Not sure this works as intended. At the first call to getline() it >>> will allocate a line buffer for you and return it. 
That buffer will >>> be as large as the first line you happen to read. You then pass that >>> same buffer into getline to fetch the next lines, but what if those >>> are longer than the first? >>> >>> >>> >>> >>> >>> Forget that point, getline calls realloc() on the line buffer to >>> resize it, so this should be okay. >>> >>> >>> >>> Thanks, Thomas >>> >>> >>> >>> But anyway it would be better to pass a simple caller provided >>> buffer in - stack allocated. Since this function is called at crash >>> time and the C heap could be corrupted. >>> >>> >>> >>> Cheers, Thomas >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < >>> vladimir.a.ivanov at intel.com> wrote: >>> >>> Hello, >>> >>> could you please review the patch >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>> >>> This patch add the microcode version for different OSes that may be >>> useful in the issue resolution process. >>> >>> >>> >>> The reported microcode version for different OSes loos as: >>> >>> >>> >>> Linux (RHEL7.7): >>> >>> # cat hs_err_pid251046.log |grep microc >>> >>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per >>> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, >>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, >>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, >>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb >>> >>> >>> >>> Windows (Win10, v1809): >>> >>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per >>> core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, >>> avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, >>> tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt >>> >>> >>> >>> MacOS (Darwin): >>> >>> $ cat hs_err_pid95187.log |grep microc >>> >>> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per >>> core) family 
6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, >>> avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, >>> bmi1, bmi2, adx, sha, fma, clflush, clflushopt >>> >>> >>> >>> Thanks, Vladimir >>> >>> >>> Thanks, Vladimir >>> >>>

From thomas.stuefe at gmail.com Sat Jul 18 05:24:45 2020
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Sat, 18 Jul 2020 07:24:45 +0200
Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86
In-Reply-To:
References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com>
Message-ID:

Oh, sorry, you are right :(

I was under the assumption you wanted to call os::cpu_microcode_revision() directly from within VMError::report(). During initialization using c-heap like this should not be a problem and you can forget about 9/10ths of what I wrote, sorry.

In that case your original variant is fine, my only suggestion would be to clearly mark the free as ::free() with a comment to prevent someone from correcting it to os::free.

Thank you, Thomas

On Sat, Jul 18, 2020 at 7:08 AM Ivanov, Vladimir A <vladimir.a.ivanov at intel.com> wrote:
> Hi,
>
> seems, this info created during initialization phase. Is it correct?
> Collect or parse common info at the crash point usually not a good idea.
> During initialization usage of the c-heap not a problem.
>
> The '::free' work OK here. At least tier1 test produce same results for
> patched and non-patched builds. But these tests not generates real case for
> hs_err files.
>
> It looks like 2k byte array enough for the one record for CPU from cpuinfo
> file. Will update code to use local buffer.
> > > > Thanks, Vladimir > > > > *From:* Thomas St?fe > *Sent:* Friday, July 17, 2020 9:42 PM > *To:* Ivanov, Vladimir A > *Cc:* Vladimir Kozlov ; Hotspot dev runtime < > hotspot-runtime-dev at openjdk.java.net>; > hotspot-compiler-dev at openjdk.java.net > *Subject:* Re: [16] RFR(S) 8249672: Include microcode revision in > features_string on x86 > > > > Hi, > > > > yes, you must use the raw free here (for the same reason we cannot pass in > an os::malloc() allocated buffer to getline, since if it were to resize it > would use raw ::realloc() internally and crash the same way). > > > > But as I wrote in my first mail to the original thread, I would not use > c-heap memory at all, since this function is used during crash reporting in > the signal handler and the c-heap may be corrupted. > > > > It the max line length of /proc/cpu can be reliably predicted (so that > getline wont realloc()) I would pass a stack allocated buffer into getline. > If not, I would not use getline() at all but rewrite this, probably using > fgets(). > > > > Cheers, Thomas > > > > > > > > > > On Sat, Jul 18, 2020 at 1:24 AM Ivanov, Vladimir A < > vladimir.a.ivanov at intel.com> wrote: > > Thanks, I expected the C's functions here. Let's wait a little bit for > Runtime team and update work with buffer. > > Thanks, Vladimir > > -----Original Message----- > From: Vladimir Kozlov > Sent: Friday, July 17, 2020 4:17 PM > To: Thomas St?fe ; Ivanov, Vladimir A < > vladimir.a.ivanov at intel.com> > Cc: Hotspot dev runtime ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in > features_string on x86 > > I think the issue is 'line' buffer is allocated by libc getline() and > os:free() which is HotSpot function [1] does not know about it. You need > C's ::free() or use HS's os::malloc() to allocate 'line' buffer. > > Someone from Runtime may suggest what is the best for this case. 
> > Thanks, > Vladimir K > > [1] > http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792 > > On 7/17/20 4:03 PM, Vladimir Kozlov wrote: > > I updated subject to our formal review request format (JDK version, > RFE's id and subject). > > > > I moved RFE to runtime group as Thomas said: > > > > https://bugs.openjdk.java.net/browse/JDK-8249672 > > > > Submitted tier1 testing to build on all our supported platforms. And > debug builds on linux failed: > > > > # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718 # V > > [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > > const+0xeb > > > > V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > > const+0xeb V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a V > > [libjvm.so+0x13cd30b] os::free(void*)+0x5b V [libjvm.so+0x13e5598] > > os::cpu_microcode_revision()+0xc8 V [libjvm.so+0x17d314c] > > VM_Version::get_processor_features()+0x76c > > V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d V > > [libjvm.so+0x17ce6c6] VM_Version_init()+0x26 V [libjvm.so+0xcb2895] > > init_globals()+0x55 V [libjvm.so+0x16dde63] > > Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3 > > > > > > Regards, > > Vladimir K > > > > On 7/17/20 3:02 PM, Thomas St?fe wrote: > >> Hi Vladimir, > >> > >> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < > >> vladimir.a.ivanov at intel.com> wrote: > >> > >>>> +#if defined(IA32) || defined(AMD64) > >>>> > >>>> Is that not synonymous with x86? > >>> > >>> This patter was copied from the method ?print_model_name_and_flags? > >>> (file os/linux/os_linux.cpp). > >>> > >>> This method also read the ?/proc/cpuinfo? file and I reuse it as > >>> ?template? for the new method. > >>> > >>> It is better to use one pattern to work with exactly same file but > >>> in general you are right. 
> >>> > >>> The X86 is defined in the file ./share/utilities/macros.hpp as: > >>> > >>> #if defined(IA32) || defined(AMD64) > >>> > >>> #define X86 > >>> > >>> #define X86_ONLY(code) code > >>> > >>> #define NOT_X86(code) > >>> > >>> > >>> > >>> The question here: could I delete this ?ifdefs? while this method > >>> should work on x86 only? > >>> > >>> > >>> > >> > >> os_linux_x86.cpp is compiled for x86 platforms only, whereas > >> os_linux.cpp is shared among all architectures. > >> > >> So, in the former you do not need to exclude non-x86 architectures. > >> > >> Cheers, Thomas > >> > >> > >>> Thanks, Vladimir > >>> > >>> > >>> > >>> *From:* Thomas St?fe > >>> *Sent:* Friday, July 17, 2020 2:26 PM > >>> *To:* Ivanov, Vladimir A ; Hotspot dev > >>> runtime > >>> *Cc:* hotspot-compiler-dev at openjdk.java.net > >>> *Subject:* Re: add microcode version to the hs_err files > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe > >>> > >>> wrote: > >>> > >>> Hi Vladimir, > >>> > >>> > >>> > >>> I think this would be more suited to hotspot-runtime. > >>> > >>> > >>> > >>> > >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ > >>> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html > >>> > >>> > >>> > >>> +#if defined(IA32) || defined(AMD64) > >>> > >>> Is that not synonymous with x86? > >>> > >>> > >>> > >>> + while ((read = getline(&line, &len, fp)) != -1) { > >>> + if (len > 10 && strstr(line, "microcode") != NULL) { > >>> + char* rev = strchr(line, ':'); > >>> + if (rev != NULL) sscanf(rev + 1, "%x", &result); > >>> + break; > >>> + } > >>> + } > >>> + free(line); > >>> > >>> > >>> > >>> Not sure this works as intended. At the first call to getline() it > >>> will allocate a line buffer for you and return it. That buffer will > >>> be as large as the first line you happen to read. 
You then pass that > >>> same buffer into getline to fetch the next lines, but what if those > >>> are longer than the first? > >>> > >>> > >>> > >>> > >>> > >>> Forget that point, getline calls realloc() on the line buffer to > >>> resize it, so this should be okay. > >>> > >>> > >>> > >>> Thanks, Thomas > >>> > >>> > >>> > >>> But anyway it would be better to pass a simple caller provided > >>> buffer in - stack allocated. Since this function is called at crash > >>> time and the C heap could be corrupted. > >>> > >>> > >>> > >>> Cheers, Thomas > >>> > >>> > >>> > >>> > >>> > >>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < > >>> vladimir.a.ivanov at intel.com> wrote: > >>> > >>> Hello, > >>> > >>> could you please review the patch > >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ > >>> > >>> This patch add the microcode version for different OSes that may be > >>> useful in the issue resolution process. > >>> > >>> > >>> > >>> The reported microcode version for different OSes loos as: > >>> > >>> > >>> > >>> Linux (RHEL7.7): > >>> > >>> # cat hs_err_pid251046.log |grep microc > >>> > >>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per > >>> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, > >>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, > >>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, > >>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb > >>> > >>> > >>> > >>> Windows (Win10, v1809): > >>> > >>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per > >>> core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, > >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, > >>> avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, > >>> tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt > >>> > >>> > >>> > >>> MacOS (Darwin): > >>> > >>> $ cat hs_err_pid95187.log |grep microc > >>> > >>> CPU: 
total 8 (initial active 8) (4 cores per cpu, 2 threads per > >>> core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, > >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, > >>> avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, > >>> bmi1, bmi2, adx, sha, fma, clflush, clflushopt > >>> > >>> > >>> > >>> Thanks, Vladimir > >>> > >>> > >>> Thanks, Vladimir > >>> > >>> > > From vladimir.kozlov at oracle.com Sat Jul 18 17:09:40 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 18 Jul 2020 10:09:40 -0700 Subject: [15] RFR(T) : 8249697 : java/lang/invoke/RicochetTest.java should use @requires instead of @ignore In-Reply-To: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> References: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> Message-ID: <1539aec8-c8ad-0acb-b7a3-20d4e839a3cd@oracle.com> Good. Thanks, Vladimir On 7/17/20 8:54 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >> 7 lines changed: 4 ins; 0 del; 3 mod; > > > Hi all, > > could you please review this small and trivial patch for java/lang/invoke/RicochetTest.java test? > from JBS: >> a run of java/lang/invoke/RicochetTest.java w/ MAX_ARITY=255 was removed from all configurations by JDK-7049122, yet the problem manifests itself only w/ Xcomp. as now we have @requires to filter out tests from certain configurations, the test can be updated to run MAX_ARITY=255 in all configs but Xcomp. > > the patch splits the test into two subtests, each one w/ one @run, and use @requires to exclude one w/ MAX_ARITY=255 from execution if Xcomp flag is used. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8249697 > webrev: http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ > testing: java/lang/invoke/RicochetTest.java on {linux,windows,macos}-x64 w/ and w/o -Xcomp; Xcomp runs, as expected, had only 1 test run > > Thanks, > -- Igor > > JDK-7049122 : https://bugs.openjdk.java.net/browse/JDK-7049122 > From vladimir.kozlov at oracle.com Sat Jul 18 17:10:26 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Sat, 18 Jul 2020 10:10:26 -0700 Subject: [15] RFR(T) : 8249698 : java/lang/invoke/LFCaching/LFGarbageCollectedTest.java should be ProblemList-ed and not @ignored In-Reply-To: <61EBB792-FAF4-4DFD-A674-4BE7153F20AA@oracle.com> References: <61EBB792-FAF4-4DFD-A674-4BE7153F20AA@oracle.com> Message-ID: <42b76420-205b-e72a-558d-8659242e1c06@oracle.com> Good. Thanks, Vladimir On 7/17/20 8:57 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00 >> 3 lines changed: 1 ins; 1 del; 1 mod; > > > Hi all, > > could you please review this trivial patch which removes @ignore from LFGarbageCollectedTest and adds it into problem-list instead? > > from 8249698: >> java/lang/invoke/LFCaching/LFGarbageCollectedTest.java is excluded from execution due to JDK-8078602. although the test might indeed fail due to JDK-8078602, it still can be useful and isn't harmful to run, therefore this test should be put in ProblemList.txt and @ignore is to be removed. > from main issue(8249618): >> although ProblemList and @ignore achieve the same end result (test exclusion), their server different goals and have slightly different meanings, simplified @ignore should be used to exclude useless or harmful tests, and ProblemList in all other cases (see yet-not-integrated `ProblemListing or `@ignore`-ing a Test` section of dev guide, PR -- https://github.com/openjdk/guide/pull/21 for more details). 
>> >> due to different reasons, this hasn't been always followed and some currently @ignore-d tests should rather be ProblemList-ed, and some of ProblemList-ed should be @ignore-d, this issue is to clean up the current state in a hope that this will reduce further confusion. > > > webrev: http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8249698 > > Thanks, > -- Igor > > 8078602: https://bugs.openjdk.java.net/browse/JDK-8078602 > 8249618: https://bugs.openjdk.java.net/browse/JDK-8249618 > From mandy.chung at oracle.com Sun Jul 19 04:32:32 2020 From: mandy.chung at oracle.com (Mandy Chung) Date: Sat, 18 Jul 2020 21:32:32 -0700 Subject: [15] RFR(T) : 8249697 : java/lang/invoke/RicochetTest.java should use @requires instead of @ignore In-Reply-To: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> References: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> Message-ID: <60806519-8e57-d126-8a2e-800053b4ee9a@oracle.com> On 7/17/20 8:54 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ > I suggest to change this: 32 * @comment The following test creates an unreasonable number of adapters in -Xcomp mode (7049122) To: @bug 8249697 @summary verify very high number of adapters in -Xcomp mode Otherwise, looks fine. Mandy > Hi all, > > could you please review this small and trivial patch for java/lang/invoke/RicochetTest.java test? > from JBS: >> a run of java/lang/invoke/RicochetTest.java w/ MAX_ARITY=255 was removed from all configurations by JDK-7049122, yet the problem manifests itself only w/ Xcomp. as now we have @requires to filter out tests from certain configurations, the test can be updated to run MAX_ARITY=255 in all configs but Xcomp. > the patch splits the test into two subtests, each one w/ one @run, and use @requires to exclude one w/ MAX_ARITY=255 from execution if Xcomp flag is used.
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8249697 > webrev: http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ > testing: java/lang/invoke/RicochetTest.java on {linux,windows,macos}-x64 w/ and w/o -Xcomp; Xcomp runs, as expected, had only 1 test run > > Thanks, > -- Igor > > JDK-7049122 : https://bugs.openjdk.java.net/browse/JDK-7049122 From mandy.chung at oracle.com Sun Jul 19 04:33:21 2020 From: mandy.chung at oracle.com (Mandy Chung) Date: Sat, 18 Jul 2020 21:33:21 -0700 Subject: [15] RFR(T) : 8249698 : java/lang/invoke/LFCaching/LFGarbageCollectedTest.java should be ProblemList-ed and not @ignored In-Reply-To: <61EBB792-FAF4-4DFD-A674-4BE7153F20AA@oracle.com> References: <61EBB792-FAF4-4DFD-A674-4BE7153F20AA@oracle.com> Message-ID: +1 Mandy On 7/17/20 8:57 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00 >> 3 lines changed: 1 ins; 1 del; 1 mod; > > Hi all, > > could you please review this trivial patch which removes @ignore from LFGarbageCollectedTest and adds it into problem-list instead? > > from 8249698: >> java/lang/invoke/LFCaching/LFGarbageCollectedTest.java is excluded from execution due to JDK-8078602. although the test might indeed fail due to JDK-8078602, it still can be useful and isn't harmful to run, therefore this test should be put in ProblemList.txt and @ignore is to be removed. > from main issue(8249618): >> although ProblemList and @ignore achieve the same end result (test exclusion), their server different goals and have slightly different meanings, simplified @ignore should be used to exclude useless or harmful tests, and ProblemList in all other cases (see yet-not-integrated `ProblemListing or `@ignore`-ing a Test` section of dev guide, PR -- https://github.com/openjdk/guide/pull/21 for more details). 
>> >> due to different reasons, this hasn't been always followed and some currently @ignore-d tests should rather be ProblemList-ed, and some of ProblemList-ed should be @ignore-d, this issue is to clean up the current state in a hope that this will reduce further confusion. > > webrev: http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8249698 > > Thanks, > -- Igor > > 8078602: https://bugs.openjdk.java.net/browse/JDK-8078602 > 8249618: https://bugs.openjdk.java.net/browse/JDK-8249618 From david.holmes at oracle.com Mon Jul 20 01:06:44 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 20 Jul 2020 11:06:44 +1000 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable Message-ID: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650 webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/ This is a simple cleanup that touches files across a number of VM areas - hence the cross-post. 
Whilst working on a different JNI fix I noticed that in most cases in jni.cpp we were using the following form of make_local: JNIHandles::make_local(env, obj); and what that form does is first extract the thread from the JNIEnv: JavaThread* thread = JavaThread::thread_from_jni_environment(env); return thread->active_handles()->allocate_handle(obj); but there is also another, faster, variant for when you already have the "thread": jobject JNIHandles::make_local(Thread* thread, oop obj) { return thread->active_handles()->allocate_handle(obj); } When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY, WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread from the JNIEnv: JavaThread* thread=JavaThread::thread_from_jni_environment(env); and further defined: Thread* THREAD = thread; so we always already have direct access to the "thread" available (or indirect via TRAPS), and in fact we can end up removing the make_local(JNIEnv* env, oop obj) variant altogether. Along the way I spotted some related issues with unnecessary use of Thread::current() when it is already available from TRAPS, and some other cases where we extracted the JNIEnv from a thread only to later extract the thread from the JNIEnv. 
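To make the difference between the two make_local forms concrete, here is a small standalone model. It is a sketch only: MockThread, MockJNIEnv, MockHandleBlock, the env_lookups counter and entry_point are invented stand-ins for illustration and do not match the real HotSpot definitions. It merely shows that an entry point which has already obtained the thread can allocate a local handle without a second lookup through the JNIEnv.

```cpp
#include <deque>

// Simplified stand-ins for HotSpot types; none of these are the real definitions.
typedef void* oop;
typedef void* jobject;

struct MockHandleBlock {
  std::deque<oop> slots;  // deque: earlier slot addresses stay valid on push_back
  jobject allocate_handle(oop obj) {
    slots.push_back(obj);
    return &slots.back();  // a handle is modelled as the address of its slot
  }
};

struct MockThread {
  MockHandleBlock handles;
  MockHandleBlock* active_handles() { return &handles; }
};

struct MockJNIEnv {
  MockThread* owner;
};

static int env_lookups = 0;  // counts env -> thread conversions

static MockThread* thread_from_jni_environment(MockJNIEnv* env) {
  ++env_lookups;
  return env->owner;
}

// env-based form: has to recover the thread from the env on every call.
static jobject make_local(MockJNIEnv* env, oop obj) {
  MockThread* thread = thread_from_jni_environment(env);
  return thread->active_handles()->allocate_handle(obj);
}

// thread-based form: uses the thread the caller already has.
static jobject make_local(MockThread* thread, oop obj) {
  return thread->active_handles()->allocate_handle(obj);
}

// Models the body of an entry wrapper that has already extracted the thread.
static jobject entry_point(MockJNIEnv* env, oop obj) {
  MockThread* thread = thread_from_jni_environment(env);  // done once up front
  return make_local(thread, obj);                          // no second lookup
}
```

In this model, going through entry_point costs one env-to-thread conversion, whereas extracting the thread and then calling the env-based overload costs two.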
Testing: tiers 1 - 3 Thanks, David ----- From ningsheng.jian at arm.com Mon Jul 20 03:51:25 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 20 Jul 2020 11:51:25 +0800 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com> Message-ID: <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com> Hi Andrew and all, Since we are getting ready to propose Vector API target to JDK 16 [1]. I have regenerated webrev of aarch64 backend parts from panama repo, which has been rebased to jdk/jdk very recently, by: $ hg update vector-unstable && hg diff -r default > all.patch $ grep "diff -r" all.patch | grep -e "src/hotspot/cpu/aarch64" | awk '{print $4}' > aarch64_list $ ksh ./webrev.ksh -r default -o aarch64_webrev aarch64_list The new webrev: http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ Could you please help to take a look? Yang's previous webrevs can still be found at [2], with review comments addressed in the latest webrev above. [1] http://mail.openjdk.java.net/pipermail/hotspot-dev/2020-July/042427.html [2] http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/ Thanks, Ningsheng On 7/8/20 3:05 PM, Yang Zhang wrote: > Hi Andrew > > I have updated this patch. Could you please help to review it again? > In this patch, the following changes are made: > 1. Separate newly added NEON instructions to a new ad file > aarch64_neon.ad > 2. Add assembler tests for NEON instructions. Trailing spaces > in the python script are also removed. 
> > http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/ > > Thanks, > Yang > > > -----Original Message----- > From: Andrew Haley > Sent: Tuesday, June 30, 2020 12:10 AM > To: Yang Zhang ; Viswanathan, Sandhya ; Paul Sandoz > Cc: nd ; hotspot-compiler-dev at openjdk.java.net; hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes > > On 29/06/2020 08:48, Yang Zhang wrote: >> 1. Instructions that can be matched with NEON instructions directly. >> MulVB, SqrtVF and AbsV have been merged into jdk master already. >> >> 2. Instructions that jdk master has middle end support for, but they cannot be matched with NEON instructions directly. >> Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These new instructions can be moved into jdk master first, but for auto-vectorization, the performance might not get improved. >> >> 3. Panama/Vector API specific instructions such as Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend etc. >> These instructions cannot be moved into jdk master first because there isn't middle-end support. >> >> I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also update aarch64_asmtest.py and macroassemler.cpp. When the patch is ready, I will send it again. > > Thank you *very* much for your hard work. Appreciated! > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. 
https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > From david.holmes at oracle.com Mon Jul 20 04:16:49 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 20 Jul 2020 14:16:49 +1000 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage In-Reply-To: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> Message-ID: Subject line got truncated by accident ... On 20/07/2020 11:06 am, David Holmes wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8249650 > webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/ > > This is a simple cleanup that touches files across a number of VM areas > - hence the cross-post. > > Whilst working on a different JNI fix I noticed that in most cases in > jni.cpp we were using the following form of make_local: > > JNIHandles::make_local(env, obj); > > and what that form does is first extract the thread from the JNIEnv: > > JavaThread* thread = JavaThread::thread_from_jni_environment(env); > return thread->active_handles()->allocate_handle(obj); > > but there is also another, faster, variant for when you already have the > "thread": > > jobject JNIHandles::make_local(Thread* thread, oop obj) { > return thread->active_handles()->allocate_handle(obj); > } > > When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY, WB_ENTRY, > UNSAFE_ENTRY etc) it has already extracted the thread from the JNIEnv: > > JavaThread* thread=JavaThread::thread_from_jni_environment(env); > > and further defined: > > Thread* THREAD = thread; > > so we always already have direct access to the "thread" available (or > indirect via TRAPS), and in fact we can end up removing the > make_local(JNIEnv* env, oop obj) variant altogether.
> > Along the way I spotted some related issues with unnecessary use of > Thread::current() when it is already available from TRAPS, and some > other cases where we extracted the JNIEnv from a thread only to later > extract the thread from the JNIEnv. > > Testing: tiers 1 - 3 > > Thanks, > David > ----- From kim.barrett at oracle.com Mon Jul 20 05:22:49 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 20 Jul 2020 01:22:49 -0400 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage In-Reply-To: References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> Message-ID: <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> > On Jul 20, 2020, at 12:16 AM, David Holmes wrote: > > Subject line got truncated by accident ... > > On 20/07/2020 11:06 am, David Holmes wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650 >> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/ >> This is a simple cleanup that touches files across a number of VM areas - hence the cross-post. 
>> Whilst working on a different JNI fix I noticed that in most cases in jni.cpp we were using the following form of make_local: >> JNIHandles::make_local(env, obj); >> and what that form does is first extract the thread from the JNIEnv: >> JavaThread* thread = JavaThread::thread_from_jni_environment(env); >> return thread->active_handles()->allocate_handle(obj); >> but there is also another, faster, variant for when you already have the "thread": >> jobject JNIHandles::make_local(Thread* thread, oop obj) { >> return thread->active_handles()->allocate_handle(obj); >> } >> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY, WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread from the JNIEnv: >> JavaThread* thread=JavaThread::thread_from_jni_environment(env); >> and further defined: >> Thread* THREAD = thread; >> so we always already have direct access to the "thread" available (or indirect via TRAPS), and in fact we can end up removing the make_local(JNIEnv* env, oop obj) variant altogether. >> Along the way I spotted some related issues with unnecessary use of Thread::current() when it is already available from TRAPS, and some other cases where we extracted the JNIEnv from a thread only to later extract the thread from the JNIEnv. >> Testing: tiers 1 - 3 >> Thanks, >> David >> ----- ------------------------------------------------------------------------------ src/hotspot/share/classfile/javaClasses.cpp 439 JNIEnv *env = thread->jni_environment(); Since env is no longer used on the next line, move this down to where it is used, at line 444. ------------------------------------------------------------------------------ src/hotspot/share/classfile/verifier.cpp 299 JNIEnv *env = thread->jni_environment(); env now seems to only be used at line 320. Move this closer. 
------------------------------------------------------------------------------ src/hotspot/share/prims/jni.cpp 743 result = JNIHandles::make_local(THREAD, result_handle()); jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where previously it just used "thread". Maybe this change shouldn't be made? Or can the other uses be changed to THREAD for consistency? ------------------------------------------------------------------------------ src/hotspot/share/prims/jvm.cpp The calls to JvmtiExport::post_vm_object_alloc have to use "thread" instead of "THREAD", even though other places nearby are using "THREAD". That inconsistency is kind of unfortunate, but doesn't seem easily avoidable. ------------------------------------------------------------------------------ From david.holmes at oracle.com Mon Jul 20 05:53:37 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 20 Jul 2020 15:53:37 +1000 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage In-Reply-To: <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> Message-ID: <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> Hi Kim, Thanks for looking at this. Updated webrev at: http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/ On 20/07/2020 3:22 pm, Kim Barrett wrote: >> On Jul 20, 2020, at 12:16 AM, David Holmes wrote: >> >> Subject line got truncated by accident ... >> >> On 20/07/2020 11:06 am, David Holmes wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650 >>> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/ >>> This is a simple cleanup that touches files across a number of VM areas - hence the cross-post. 
>>> Whilst working on a different JNI fix I noticed that in most cases in jni.cpp we were using the following form of make_local: >>> JNIHandles::make_local(env, obj); >>> and what that form does is first extract the thread from the JNIEnv: >>> JavaThread* thread = JavaThread::thread_from_jni_environment(env); >>> return thread->active_handles()->allocate_handle(obj); >>> but there is also another, faster, variant for when you already have the "thread": >>> jobject JNIHandles::make_local(Thread* thread, oop obj) { >>> return thread->active_handles()->allocate_handle(obj); >>> } >>> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY, WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread from the JNIEnv: >>> JavaThread* thread=JavaThread::thread_from_jni_environment(env); >>> and further defined: >>> Thread* THREAD = thread; >>> so we always already have direct access to the "thread" available (or indirect via TRAPS), and in fact we can end up removing the make_local(JNIEnv* env, oop obj) variant altogether. >>> Along the way I spotted some related issues with unnecessary use of Thread::current() when it is already available from TRAPS, and some other cases where we extracted the JNIEnv from a thread only to later extract the thread from the JNIEnv. >>> Testing: tiers 1 - 3 >>> Thanks, >>> David >>> ----- > > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/javaClasses.cpp > 439 JNIEnv *env = thread->jni_environment(); > > Since env is no longer used on the next line, move this down to where > it is used, at line 444. Fixed. > ------------------------------------------------------------------------------ > src/hotspot/share/classfile/verifier.cpp > 299 JNIEnv *env = thread->jni_environment(); > > env now seems to only be used at line 320. Move this closer. Fixed. 
> ------------------------------------------------------------------------------ > src/hotspot/share/prims/jni.cpp > 743 result = JNIHandles::make_local(THREAD, result_handle()); > > jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where > previously it just used "thread". Maybe this change shouldn't be made? > Or can the other uses be changed to THREAD for consistency? "thread" and "THREAD" are interchangeable for anything expecting a "Thread*" (and somewhat surprisingly a number of API's that only work for JavaThreads actually take a Thread*. :( ). I had choice between trying to be file-wide consistent with the make_local calls, versus local-code consistent, and used THREAD as it is available in both JNI_ENTRY and via TRAPS. But I can certainly make a local change to "thread" for local consistency. > ------------------------------------------------------------------------------ > src/hotspot/share/prims/jvm.cpp > > The calls to JvmtiExport::post_vm_object_alloc have to use "thread" > instead of "THREAD", even though other places nearby are using > "THREAD". That inconsistency is kind of unfortunate, but doesn't seem > easily avoidable. Everything that uses THREAD in a JVM_ENTRY method can be changed to use "thread" instead. But I'm not sure it's a consistency worth pursuing at least as part of these changes (there are likely similar issues with most of the touched files). 
Thanks, David > ------------------------------------------------------------------------------ > From kim.barrett at oracle.com Mon Jul 20 06:15:13 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 20 Jul 2020 02:15:13 -0400 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage In-Reply-To: <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> Message-ID: > On Jul 20, 2020, at 1:53 AM, David Holmes wrote: > > Hi Kim, > > Thanks for looking at this. > > Updated webrev at: > > http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/ Looks good. > > On 20/07/2020 3:22 pm, Kim Barrett wrote: >>> On Jul 20, 2020, at 12:16 AM, David Holmes wrote: >> src/hotspot/share/prims/jni.cpp >> 743 result = JNIHandles::make_local(THREAD, result_handle()); >> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where >> previously it just used "thread". Maybe this change shouldn't be made? >> Or can the other uses be changed to THREAD for consistency? > > "thread" and "THREAD" are interchangeable for anything expecting a "Thread*" (and somewhat surprisingly a number of API's that only work for JavaThreads actually take a Thread*. :( ). I had a choice between trying to be file-wide consistent with the make_local calls, versus local-code consistent, and used THREAD as it is available in both JNI_ENTRY and via TRAPS. But I can certainly make a local change to "thread" for local consistency. I don't feel strongly either way. It just struck me as a little odd to have the mix in close proximity, especially since I think consistently using either one might work in this function. But being consistent about make_local usage has something to be said for it too.
>> src/hotspot/share/prims/jvm.cpp >> The calls to JvmtiExport::post_vm_object_alloc have to use "thread" >> instead of "THREAD", even though other places nearby are using >> "THREAD". That inconsistency is kind of unfortunate, but doesn't seem >> easily avoidable. > > Everything that uses THREAD in a JVM_ENTRY method can be changed to use "thread" instead. But I'm not sure it's a consistency worth pursuing at least as part of these changes (there are likely similar issues with most of the touched files). Yeah, it's not really obvious whether to use THREAD or thread in some cases. But I agree that addressing any inconsistencies there is mostly out of scope for this change. From tobias.hartmann at oracle.com Mon Jul 20 07:46:34 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 20 Jul 2020 09:46:34 +0200 Subject: RFR(S): 8248901: Signed immediate support in .../share/assembler.hpp is broken. In-Reply-To: <3df3dab6-aa2f-bbbc-d231-6cda8f2a0ff7@oracle.com> References: <3df3dab6-aa2f-bbbc-d231-6cda8f2a0ff7@oracle.com> Message-ID: <91ddfdac-2ce4-637c-b68c-7e042d67483f@oracle.com> Hi Patric, looks good to me. Best regards, Tobias On 07.07.20 13:00, Patric Hedlin wrote: > Dear all, > > I would like to ask for help to review the following change/update: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8248901 > Webrev: http://cr.openjdk.java.net/~phedlin/tr8248901/ > > > Current definition(s) of is_simm() and friends are not robust over inputs. Both min and max values > are undefined for width > 32 (and width < 0). > No is_uimm() is currently provided (added). Several definitions are not used (cleanup). > > NOTE: Adding currently unused is_simm9() and is_uimm12(), required by JDK-8247766.
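For illustration, a width check that stays well defined for every input could look roughly like the sketch below. This is not the code from the webrev; the signatures and the decision to reject widths outside 1..64 are assumptions made for the example. One common source of the kind of problem described is a shift whose count is not smaller than the bit width of the shifted type, which is undefined behaviour in C++; whether that is exactly what the old definitions did is also an assumption here.

```cpp
#include <cstdint>

// Hypothetical sketch: does x fit into a signed immediate of 'width' bits?
// Widths outside 1..64 are rejected instead of being fed into a shift.
static bool is_simm(int64_t x, int width) {
  if (width < 1 || width > 64) {
    return false;
  }
  if (width == 64) {
    return true;  // every int64_t value fits
  }
  const int64_t limit = int64_t(1) << (width - 1);  // shift count is 0..62
  return -limit <= x && x < limit;
}

// Hypothetical sketch: does x fit into an unsigned immediate of 'width' bits?
static bool is_uimm(uint64_t x, int width) {
  if (width < 1 || width > 64) {
    return false;
  }
  if (width == 64) {
    return true;  // every uint64_t value fits
  }
  return x < (uint64_t(1) << width);  // shift count is 1..63
}

// Fixed-width conveniences in the spirit of the is_simm9()/is_uimm12() helpers
// mentioned above.
static bool is_simm9(int64_t x)   { return is_simm(x, 9); }
static bool is_uimm12(uint64_t x) { return is_uimm(x, 12); }
```

With these definitions a 9-bit signed immediate covers -256 through 255 and a 12-bit unsigned immediate covers 0 through 4095.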
> > > Testing: hs-tier1-3 > > > Best regards, > Patric From jamsheed.c.m at oracle.com Mon Jul 20 07:52:21 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Mon, 20 Jul 2020 13:22:21 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> Message-ID: <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> Hi Vladimir, Thank you for the review, I have updated the test http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.02/ Hi all, Could I get another review ? Best regards, Jamsheed On 18/07/2020 00:09, Vladimir Kozlov wrote: > Yes, I agree with webrev_fix_EA version. > > I would suggest to modify TestIdealAllocShape.java test to add new > method with synchronization from your example in JBS comment. Or add > it as separate test. > > Thanks, > Vladimir > > On 7/16/20 9:19 AM, Jamsheed C M wrote: >> Hi Vladimir, >> I ran performance run for >> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/? (links in JBS) >> I don't see any issues, so i would like to go with webrev_fix_EA if >> it fixes all the reported issues. >> Best regards, >> Jamsheed >> >> On 16/07/2020 07:25, Jamsheed C M wrote: >>> Hi Vladimir, >>> >>> On 16/07/2020 00:29, Vladimir Kozlov wrote: >>>> As I said before I agree with your additional checks for StoreN and >>>> StoreNKlass. >>>> >>>> But I have concerns about new is_init_captured_store code. EA is >>>> mostly looking only on inputs to see Allocation. And in several >>>> places it expecting only to see Allocation because other cases >>>> should be filtered out before. 
>>> If that is the case, I would like to go with my first webrev for >>> this fix as it nicely propagate es and there in no unnecessary >>> promotion to global escape state. >>> >>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>> >>> Best regards, >>> >>> Jamsheed >>> >>>> >>>> Thanks, >>>> Vladimir >>>> >>>> On 7/15/20 10:54 AM, Jamsheed C M wrote: >>>>> Hi Vladimir, >>>>> >>>>> with unrolling i understand that many cases will just have phis >>>>> everywhere to outside the loop as the uses are outside the loop. >>>>> >>>>> and this is not restricted to escaping objects alone as i >>>>> depicted. it can be escaping as well as non-escaping. >>>>> >>>>> so marking store to them as global escape doesn't seems to be nice >>>>> idea. i will rework on this fix and get back again. >>>>> >>>>> Thank you >>>>> >>>>> Best regards >>>>> >>>>> Jamsheed >>>>> >>>>> On 15/07/2020 08:38, Jamsheed C M wrote: >>>>>> (unfinished mail got sent, so completing it) >>>>>> On 15/07/2020 08:21, Jamsheed C M wrote: >>>>>>> Hi Vladimir, >>>>>>> >>>>>>> On 15/07/2020 06:50, Vladimir Kozlov wrote: >>>>>>>> I looked more on this. EA already does not secularize >>>>>>>> allocations when Phi nodes merged them - it should handle this >>>>>>>> case. I did small experiment and relaxed assert for this new >>>>>>>> (10. needs comment update) case for AddP's base and test passed: >>>>>>>> >>>>>>>> src/hotspot/share/opto/escape.cpp Tue Jul 14 18:11:27 2020 -0700 >>>>>>>> @@ -2357,6 +2357,7 @@ >>>>>>>> ?????? int opcode = uncast_base->Opcode(); >>>>>>>> ?????? assert(opcode == Op_ConP || opcode == Op_ThreadLocal || >>>>>>>> ????????????? opcode == Op_CastX2P || >>>>>>>> uncast_base->is_DecodeNarrowPtr() || >>>>>>>> +???????????? (uncast_base->is_Phi() && >>>>>>>> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>>>>>>> ????????????? (uncast_base->is_Mem() && >>>>>>>> (uncast_base->bottom_type()->isa_rawptr() != NULL)) || >>>>>>>> ????????????? 
(uncast_base->is_Proj() && >>>>>>>> uncast_base->in(0)->is_Allocate()), "sanity"); >>>>>>>> ???? } >>>>>>>> >>>>>>>> Did you hit a case when this may not work? >>>>>>> >>>>>>> Yes, right it already doesn't mark it as scalarizable if base >>>>>>> count is more than one(I think it missed a is_oop check there)[1]. >>>>>>> >>>>>>> EA CG adds edges only for oop field making stores to them >>>>>>> undetected. This makes these stored objects to NoEscape and if >>>>>>> compiled method continues execution with this NoEscape object >>>>>>> can have undesired results(i.e synchronization removed). >>>>>>> >>>>>>> Probable case would be(didn't verify) >>>>>>> >>>>>>> try { >>>>>>> >>>>>>> LOOP BEGIN >>>>>>> >>>>>>> ? try {throw new Obj()} catch {} >>>>>>> >>>>>>> LOOP END >>>>>>> >>>>>>> } catch (Obj e) { >>>>>>> >>>>>>> } >>>>>> >>>>>> Best Regards, >>>>>> >>>>>> Jamsheed >>>>>> >>>>>> [1]https://github.com/openjdk/jdk/blob/master/src/hotspot/share/opto/escape.cpp#L1770 >>>>>> >>>>>> >>>>>> >>>>>>>> >>>>>>>> >>>>>>>> And with LoopOpts off -XX:LoopUnrollLimit=0 it removed >>>>>>>> allocation (-XX:+PrintEscapeAnalysis >>>>>>>> -XX:+PrintEliminateAllocations): >>>>>>>> >>>>>>>> ======== Connection graph for? Test::test >>>>>>>> JavaObject NoEscape(NoEscape) [ 158F [ 107 ]]?? 95 Allocate === >>>>>>>> 242? 76? 230? 8? 1 ( 93? 92? 21? 1? 78 1 78 ) [[ 96 97 98 105 >>>>>>>> 106? 107 ]]? rawptr:NotNull ( int:>=0, java/lang/Object:NotNull >>>>>>>> *, bool, top ) Test::test1 @ bci:0 Test::test @ bci:8 !jvms: >>>>>>>> Test::test1 @ bci:0 Test::test @ bci:8 >>>>>>>> LocalVar [ 95P [ 158b ]]?? 107??? Proj??? ===? 95 [[ 108 158 ]] >>>>>>>> #5 !jvms: Test::test1 @ bci:0 Test::test @ bci:8 >>>>>>>> >>>>>>>> Scalar? 95??? Allocate??? ===? 242? 76? 230? 8? 1 ( 93 92? 21 1 >>>>>>>> 78 1? 78 ) [[ 96? 97? 98? 105? 106? 
107 ]] rawptr:NotNull ( >>>>>>>> int:>=0, java/lang/Object:NotNull *, bool, top ) Test::test1 @ >>>>>>>> bci:0 Test::test @ bci:8 !jvms: Test::test1 @ bci:0 Test::test >>>>>>>> @ bci:8 >>>>>>>> ++++ Eliminated: 95 Allocate >>>>>>>> >>>>>>>> >>>>>>>> t\Thanks, >>>>>>>> Vladimir K >>>>>>>> >>>>>>>> On 7/14/20 1:28 AM, Jamsheed C M wrote: >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I had incorrectly added extra check in assert after offset >>>>>>>>> computation in address_offset . For addps with non constant >>>>>>>>> offsets (like [1]) >>>>>>>>> >>>>>>>>> Not changing the old assert even though I am not expecting >>>>>>>>> first addp/second addp(for array addressing) case for init >>>>>>>>> captured store. >>>>>>>>> >>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA_asserts_corrected/ >>>>>>>>> >>>>>>>>> >>>>>>>>> Best regards, >>>>>>>>> >>>>>>>>> Jamsheed >>>>>>>>> >>>>>>>>> [1] >>>>>>>>> >>>>>>>>> assert(offs != Type::OffsetBot || >>>>>>>>> - adr->in(AddPNode::Address)->in(0)->is_AllocateArray(), >>>>>>>>> + adr->in(AddPNode::Address)->in(0)->is_AllocateArray() || >>>>>>>>> is_captured_store(adr), >>>>>>>>> ???????????? "offset must be a constant or it is >>>>>>>>> initialization of array"); >>>>>>>>> >>>>>>>>> On 13/07/2020 11:14, Jamsheed C M wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I reworked the fix. I compute offset for all init captures >>>>>>>>>> stores, but treats this special init captured stores similar >>>>>>>>>> to unsafe(as these objects are usually GlobalEscape and >>>>>>>>>> doesn't have any perf implications). >>>>>>>>>> >>>>>>>>>> revised webrev: >>>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.01/ >>>>>>>>>> >>>>>>>>>> testing: mach1-5( logs in jbs) >>>>>>>>>> >>>>>>>>>> Best regards, >>>>>>>>>> >>>>>>>>>> Jamsheed >>>>>>>>>> >>>>>>>>>> On 09/07/2020 19:36, Jamsheed C M wrote: >>>>>>>>>>> >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> request to hold the review. 
need to change the code for >>>>>>>>>>> dealing with unsafe access. as current capture code go for >>>>>>>>>>> more execution time analyzing things. >>>>>>>>>>> >>>>>>>>>>> Best regards, >>>>>>>>>>> >>>>>>>>>>> Jamsheed >>>>>>>>>>> >>>>>>>>>>> On 09/07/2020 13:01, Jamsheed C M wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8242895 >>>>>>>>>>>> >>>>>>>>>>>> Request for review changes made to offset computation and >>>>>>>>>>>> field write detection for init captured stores due to phis >>>>>>>>>>>> addition between alloc and init. This happen if init node >>>>>>>>>>>> in different outer loop wrt to alloc node and there is a >>>>>>>>>>>> loop opt.? This was required as a result of enhancement [1]. >>>>>>>>>>>> >>>>>>>>>>>> Normally init are not associated with multiple alloc node >>>>>>>>>>>> during EA phase, but changes done for [1] caused the code >>>>>>>>>>>> shapes of the form [2]? to generate inits associated with >>>>>>>>>>>> multiple alloc node. >>>>>>>>>>>> >>>>>>>>>>>> This had implication in offset computation and field write >>>>>>>>>>>> detection related to initializing stores. >>>>>>>>>>>> >>>>>>>>>>>> Attempt to fix in EA: >>>>>>>>>>>> >>>>>>>>>>>> ???? webrev: >>>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA/ >>>>>>>>>>>> >>>>>>>>>>>> Alternate fix: >>>>>>>>>>>> >>>>>>>>>>>> ???? Minimize the scenario in compiler generated code by >>>>>>>>>>>> throwing only j.l.Error from slowpath(all exception >>>>>>>>>>>> async/sync are handled in runtime exit). >>>>>>>>>>>> >>>>>>>>>>>> ???? Stub epilog doesn't poll or throw any exceptions. >>>>>>>>>>>> Disable full loop opt before EA for detectable patterns and >>>>>>>>>>>> bailout EA for late detected patterns. >>>>>>>>>>>> >>>>>>>>>>>> ???? webrev: >>>>>>>>>>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_deopt/ >>>>>>>>>>>> >>>>>>>>>>>> Please advice. 
>>>>>>>>>>>> >>>>>>>>>>>> Testing : mach tier1-5 (logs in jbs) >>>>>>>>>>>> >>>>>>>>>>>> Best regards, >>>>>>>>>>>> >>>>>>>>>>>> Jamsheed >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> [1] JDK-8231291 >>>>>>>>>>>> C2: loop >>>>>>>>>>>> opts before EA should maximally unroll loops >>>>>>>>>>>> >>>>>>>>>>>> [2] that have its init node in different outer loop wrt to >>>>>>>>>>>> alloc node. >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> loop begin >>>>>>>>>>>> >>>>>>>>>>>> ?? try{ >>>>>>>>>>>> >>>>>>>>>>>> ?? return new obj()/? throw new obj()/ uncommon trap after >>>>>>>>>>>> allocation, in a loop >>>>>>>>>>>> >>>>>>>>>>>> ?? } catch(ex) { >>>>>>>>>>>> >>>>>>>>>>>> ?? } >>>>>>>>>>>> >>>>>>>>>>>> loop end >>>>>>>>>>>> >>>>>>>>>>>> ? 42???? public static IntA test(int n) { >>>>>>>>>>>> ?? 43???????? for (int i=0; i<2; i++) { >>>>>>>>>>>> ?? 44???????????? try { >>>>>>>>>>>> ?? 45?????????????????? return new IntA(n + i); >>>>>>>>>>>> ?? 46???????????? } catch (Exception e) { >>>>>>>>>>>> ?? 47???????????? } >>>>>>>>>>>> ?? 48???????? } >>>>>>>>>>>> ?? 49 >>>>>>>>>>>> From david.holmes at oracle.com Mon Jul 20 07:53:48 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 20 Jul 2020 17:53:48 +1000 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage In-Reply-To: References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> Message-ID: <6e0d9af0-92f0-1eba-fc0a-22eebf008fe0@oracle.com> Thanks Kim! David On 20/07/2020 4:15 pm, Kim Barrett wrote: >> On Jul 20, 2020, at 1:53 AM, David Holmes wrote: >> >> Hi Kim, >> >> Thanks for looking at this. >> >> Updated webrev at: >> >> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/ > > Looks good. 
>
>>
>> On 20/07/2020 3:22 pm, Kim Barrett wrote:
>>>> On Jul 20, 2020, at 12:16 AM, David Holmes wrote:
>>> src/hotspot/share/prims/jni.cpp
>>> 743 result = JNIHandles::make_local(THREAD, result_handle());
>>> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where
>>> previously it just used "thread". Maybe this change shouldn't be made?
>>> Or can the other uses be changed to THREAD for consistency?
>>
>> "thread" and "THREAD" are interchangeable for anything expecting a "Thread*" (and somewhat surprisingly a number of API's that only work for JavaThreads actually take a Thread*. :( ). I had choice between trying to be file-wide consistent with the make_local calls, versus local-code consistent, and used THREAD as it is available in both JNI_ENTRY and via TRAPS. But I can certainly make a local change to "thread" for local consistency.
>
> I don't feel strongly either way. It just struck me as a little odd to have the mix in close proximity,
> especially since I think consistently using either one might work in this function. But being consistent
> about make_local usage has something to be said for it too.
>
>>> src/hotspot/share/prims/jvm.cpp
>>> The calls to JvmtiExport::post_vm_object_alloc have to use "thread"
>>> instead of "THREAD", even though other places nearby are using
>>> "THREAD". That inconsistency is kind of unfortunate, but doesn't seem
>>> easily avoidable.
>>
>> Everything that uses THREAD in a JVM_ENTRY method can be changed to use "thread" instead. But I'm not sure it's a consistency worth pursuing at least as part of these changes (there are likely similar issues with most of the touched files).
>
> Yeah, it's not really obvious whether to use THREAD or thread in some cases.
> But I agree that addressing any inconsistencies there is mostly out of scope for
> this change.
> From tobias.hartmann at oracle.com Mon Jul 20 08:23:58 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 20 Jul 2020 10:23:58 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> Message-ID: <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> Hi, On 08.07.20 10:26, Liu, Xin wrote: > ControlIntrinsic/DisableIntrinsic in compiler directives are more complex. The matched directive is only parsed when hotspot attempts to compile the corresponding method. > > I validate at that time and JVM will crash if it doesnot meet guarantee() statement. I don't think a guarantee should be used here, i.e. the VM shouldn't crash but we should exit gracefully with an error message. Isn't it possible to piggy-back on the error mechanism in DirectivesParser? > I added Method::external_name_short() which only returns the shorter method name in the form of "classname::method". > > Probably hotspot has had similar code, but I failed to discover. please let me know and I will remove it. I would just use name_and_sig_as_C_string(). 
jvmFlagConstraintList.cpp:180/181 - Wrong indentation jvmFlagConstraintsCompiler.cpp:388/400 - Maybe change the error message to "Unrecognized intrinsic detected in DisableIntrinsic [...]" Best regards, Tobias From tobias.hartmann at oracle.com Mon Jul 20 08:29:10 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 20 Jul 2020 10:29:10 +0200 Subject: [16] RFR(XS): 8248467: C2: compiler/intrinsics/object/TestClone fails with -XX:+VerifyGraphEdges In-Reply-To: <6a458143-aeee-486b-2bc5-a210779c26dc@oracle.com> References: <60c17f38-6cb2-d380-252f-15f8d5151b29@oracle.com> <6a458143-aeee-486b-2bc5-a210779c26dc@oracle.com> Message-ID: <2ac39054-e9bf-d7a8-2dcc-a954d1a94abf@oracle.com> +1 Best regards, Tobias On 15.07.20 19:26, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir > > On 7/15/20 8:04 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8248467 >> http://cr.openjdk.java.net/~chagedorn/8248467/webrev.00/ >> >> The assertion is hit due to a MemBarNode whose precedence edge was set to NULL at [1] >> (result_phi_rawoop is NULL and _resproj is the precedence edge to a MemBarStoreStore). This is >> possible since JDK-8237581 [2] which can remove some allocations. The fix just adds this >> additional case in the assert. 
>>
>> Best regards,
>> Christian
>>
>>
>> [1] http://hg.openjdk.java.net/jdk/jdk/file/4a8fd81d64ba/src/hotspot/share/opto/macro.cpp#l1566
>> [2] https://bugs.openjdk.java.net/browse/JDK-8237581

From tobias.hartmann at oracle.com Mon Jul 20 08:32:47 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Mon, 20 Jul 2020 10:32:47 +0200
Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs
In-Reply-To:
References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com> <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com> <7cfafcb9-6232-5738-6cad-508127fd31e8@oracle.com> <53d1eebe-e85f-58cb-7fba-0baf2ecf8701@oracle.com>
Message-ID:

+1

Best regards,
Tobias

On 15.07.20 19:37, Vladimir Kozlov wrote:
> Looks good.
>
> Thanks,
> Vladimir K
>
> On 7/15/20 12:58 AM, Christian Hagedorn wrote:
>> Hi Vladimir
>>
>> On 14.07.20 20:46, Vladimir Kozlov wrote:
>>> Can you move next up to where other small find*() methods are defined?:
>>>
>>> +Node* Node::find_ctrl(int idx) {
>>> +  return find(idx, true);
>>>   }
>>>
>>> Also add '// not PRODUCT' comment to #endif for #ifndef PRODUCT. It is hard to find where this
>>> not product code ends.
>>>
>>> Looks good otherwise.
>>
>> Thanks, I added these changes in a new webrev:
>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.02/
>>
>> Best regards,
>> Christian
>>
>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/14/20 2:54 AM, Christian Hagedorn wrote:
>>>> Hi Vladimir
>>>>
>>>> On 13.07.20 19:43, Vladimir Kozlov wrote:
>>>>> Node::find_ctrl() is used during debugging when you want to print and look on only control nodes.
>>>>> We have several such methods which are only used in debugger.
>>>>
>>>> I see, I restored this method and changed Node::find() accordingly.
I additionally added two >>>> find_ctrl() methods to make it easier to call it from a debugger (as already present for >>>> find_node()). >>>> >>>>> I suggest to store old_arena() in local var and pass into add_to_worklist(). >>>>> >>>>> You can make add_to_worklist() static since you pass node as argument. >>>> >>>> Okay. I updated this and the change above in a new webrev: >>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.01/ >>>> >>>> Best regards, >>>> Christian >>>> >>>>> Thanks, >>>>> Vladimir >>>>> >>>>> On 7/13/20 3:09 AM, Christian Hagedorn wrote: >>>>>> Ping - could anyone review it, please? Thanks! >>>>>> >>>>>> Best regards, >>>>>> Christian >>>>>> >>>>>> On 02.07.20 09:33, Christian Hagedorn wrote: >>>>>>> Hi >>>>>>> >>>>>>> Please review the following patch: >>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247743 >>>>>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ >>>>>>> >>>>>>> The testcase creates a deep graph with a lot of nodes on a chain. When running with the >>>>>>> specified test flags, it recursively calls Node::find_recur() for each node discovered which >>>>>>> eventually results in a segmentation fault due to a stack overflow (around 10000 calls due to >>>>>>> such a long chain of nodes). The fix just converts the recursive algorithm into an iterative >>>>>>> one to avoid a segmentation fault. This is similar to JDK-8246203 [1]. >>>>>>> >>>>>>> I additionally removed Node::find_ctrl() and its special handling in the algorithm since it >>>>>>> is not used. >>>>>>> >>>>>>> There is actually another problem with the recursive version. When running the testcase >>>>>>> without -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever inside [2] because >>>>>>> there is a debug_orig node cycle and the loop does not break based on the debug_orig nodes >>>>>>> being visited. This is also fixed in the patch. >>>>>>> >>>>>>> Thank you! 
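The general idea described above (replacing a recursive graph search with an iterative one, and tracking visited nodes so that a cycle cannot make the search spin forever) can be sketched as follows. ToyNode and find_iterative() are hypothetical stand-ins written for this illustration and are not the Node::find() code from the webrev; the sketch assumes a node is just an index plus a list of input edges.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a graph node: an index plus input edges.
struct ToyNode {
  int idx;
  std::vector<ToyNode*> inputs;
};

// Iteratively search the nodes reachable from `start` for index `idx`.
// An explicit worklist replaces the call stack, so the depth of the graph
// no longer limits the search; the visited set makes cycles harmless.
inline ToyNode* find_iterative(ToyNode* start, int idx) {
  if (start == nullptr) return nullptr;
  std::vector<ToyNode*> worklist;
  std::unordered_set<ToyNode*> visited;
  worklist.push_back(start);
  visited.insert(start);
  while (!worklist.empty()) {
    ToyNode* n = worklist.back();
    worklist.pop_back();
    if (n->idx == idx) return n;
    for (ToyNode* in : n->inputs) {
      // insert() reports whether the node was newly added to the set
      if (in != nullptr && visited.insert(in).second) {
        worklist.push_back(in);
      }
    }
  }
  return nullptr;  // not reachable from start
}
```

With a chain of a few hundred thousand nodes closed into a cycle, this version still terminates and uses only heap memory for the worklist, which is the property the recursive form lacked.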
>>>>>>> >>>>>>> Best regards, >>>>>>> Christian >>>>>>> >>>>>>> >>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8246203 >>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589 From tobias.hartmann at oracle.com Mon Jul 20 09:14:01 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 20 Jul 2020 11:14:01 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com> References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com> Message-ID: <518cd022-73e1-cb5c-499d-86853ae679c3@oracle.com> Hi Christian, On 15.07.20 15:08, Christian Hagedorn wrote: > http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/ Looks good to me. Some code style comments: cfgnode.cpp:1083 - There's an extra whitespace before "," loopopts.cpp:84/86 - No need for extra brackets Please make sure to run performance testing. 
Best regards, Tobias From tobias.hartmann at oracle.com Mon Jul 20 09:50:45 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 20 Jul 2020 11:50:45 +0200 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> Message-ID: <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> Hi Jamsheed, On 20.07.20 09:52, Jamsheed C M wrote: > http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.02/ Looks good to me too. Some style comments: escape.cpp: - line 2250: Maybe rename to "is_captured_store_address" or something similar - line 2254: just move _igvn->type into the assert - line 2257: wrong indentation - line 2996: "assocaited" -> "associated" Best regards, Tobias From jamsheed.c.m at oracle.com Mon Jul 20 13:30:32 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Mon, 20 Jul 2020 19:00:32 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> 
<0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> Message-ID: <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> Hi Tobias, Thank you for the review and the feedback. Revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ Best regards, Jamsheed On 20/07/2020 15:20, Tobias Hartmann wrote: > Hi Jamsheed, > > On 20.07.20 09:52, Jamsheed C M wrote: >> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.02/ > Looks good to me too. Some style comments: > > escape.cpp: > - line 2250: Maybe rename to "is_captured_store_address" or something similar > - line 2254: just move _igvn->type into the assert > - line 2257: wrong indentation > - line 2996: "assocaited" -> "associated" > > Best regards, > Tobias From tobias.hartmann at oracle.com Mon Jul 20 13:35:01 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 20 Jul 2020 15:35:01 +0200 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> Message-ID: <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> Hi Jamsheed, On 20.07.20 15:30, Jamsheed C M wrote: > Revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ You don't need #ifdef ASSERT in escape.cpp:2252. Otherwise looks good to me! No new webrev required. 
Best regards, Tobias From jamsheed.c.m at oracle.com Mon Jul 20 13:48:04 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Mon, 20 Jul 2020 19:18:04 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> Message-ID: <3d5fc552-d3e2-494b-e921-c65967af8207@oracle.com> Hi Tobias, On 20/07/2020 19:05, Tobias Hartmann wrote: > Hi Jamsheed, > > On 20.07.20 15:30, Jamsheed C M wrote: >> Revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ > You don't need #ifdef ASSERT in escape.cpp:2252. Otherwise looks good to me! No new webrev required. Missed removing it. Thank you for the review. Best regards, Jamsheed > > Best regards, > Tobias From patric.hedlin at oracle.com Mon Jul 20 14:03:24 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 20 Jul 2020 16:03:24 +0200 Subject: RFR(S): 8248901: Signed immediate support in .../share/assembler.hpp is broken. In-Reply-To: <91ddfdac-2ce4-637c-b68c-7e042d67483f@oracle.com> References: <3df3dab6-aa2f-bbbc-d231-6cda8f2a0ff7@oracle.com> <91ddfdac-2ce4-637c-b68c-7e042d67483f@oracle.com> Message-ID: <378b61b1-06a6-82ca-9c5f-eb76e024292f@oracle.com> Thanks for reviewing Tobias. /Patric On 2020-07-20 09:46, Tobias Hartmann wrote: > Hi Patric, > > looks good to me. 
>
> Best regards,
> Tobias
>
> On 07.07.20 13:00, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8248901
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8248901/
>>
>>
>> Current definition(s) of is_simm() and friends are not robust over inputs. Both min and max values
>> are undefined for width > 32 (and width < 0).
>> No is_uimm() is currently provided (added). Several definitions are not used (cleanup).
>>
>> NOTE: Adding currently unused is_simm9() and is_uimm12(), required by JDK-8247766.
>>
>>
>> Testing: hs-tier1-3
>>
>>
>> Best regards,
>> Patric

From igor.ignatyev at oracle.com Mon Jul 20 16:13:35 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 20 Jul 2020 09:13:35 -0700
Subject: [15] RFR(T) : 8249698 : java/lang/invoke/LFCaching/LFGarbageCollectedTest.java should be ProblemList-ed and not @ignored
In-Reply-To:
References: <61EBB792-FAF4-4DFD-A674-4BE7153F20AA@oracle.com>
Message-ID: <74D1782A-4AA1-44CB-98DC-BD038B263F3A@oracle.com>

Mandy, Vladimir,

thanks for your reviews, pushed to jdk15.

-- Igor

> On Jul 18, 2020, at 9:33 PM, Mandy Chung wrote:
>
> +1
>
> Mandy
>
> On 7/17/20 8:57 PM, Igor Ignatyev wrote:
>> http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00
>>> 3 lines changed: 1 ins; 1 del; 1 mod;
>>
>> Hi all,
>>
>> could you please review this trivial patch which removes @ignore from LFGarbageCollectedTest and adds it into problem-list instead?
>>
>> from 8249698:
>>> java/lang/invoke/LFCaching/LFGarbageCollectedTest.java is excluded from execution due to JDK-8078602. although the test might indeed fail due to JDK-8078602, it still can be useful and isn't harmful to run, therefore this test should be put in ProblemList.txt and @ignore is to be removed.
>> from main issue(8249618):
>>> although ProblemList and @ignore achieve the same end result (test exclusion), they serve different goals and have slightly different meanings, simplified @ignore should be used to exclude useless or harmful tests, and ProblemList in all other cases (see yet-not-integrated `ProblemListing or `@ignore`-ing a Test` section of dev guide, PR -- https://github.com/openjdk/guide/pull/21 for more details).
>>>
>>> due to different reasons, this hasn't been always followed and some currently @ignore-d tests should rather be ProblemList-ed, and some of ProblemList-ed should be @ignore-d, this issue is to clean up the current state in a hope that this will reduce further confusion.
>>
>> webrev: http://cr.openjdk.java.net/~iignatyev//8249698/webrev.00
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249698
>>
>> Thanks,
>> -- Igor
>>
>> 8078602: https://bugs.openjdk.java.net/browse/JDK-8078602
>> 8249618: https://bugs.openjdk.java.net/browse/JDK-8249618

From daniel.daugherty at oracle.com Mon Jul 20 17:07:10 2020
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Mon, 20 Jul 2020 13:07:10 -0400
Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage
In-Reply-To: <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
Message-ID: <328fb322-5b14-968b-7b13-4b449a8d98fd@oracle.com>

On 7/20/20 1:53 AM, David Holmes wrote:
> Hi Kim,
>
> Thanks for looking at this.
>
> Updated webrev at:
>
> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/

I like this cleanup very much!

src/hotspot/share/classfile/javaClasses.cpp
    No comments.

src/hotspot/share/classfile/verifier.cpp
    L298:   JavaThread* thread = (JavaThread*)THREAD;
    L307:   ResourceMark rm(THREAD);
        Since we've gone to the trouble of creating the 'thread' variable,
        I would prefer it to be used instead of THREAD where possible.

src/hotspot/share/jvmci/jvmciCompilerToVM.cpp
    L1021:   HandleMark hm;
        Can this be 'hm(THREAD)'? (Not your problem, but while you're
        in that file?)

src/hotspot/share/prims/jni.cpp
    No comments.

src/hotspot/share/prims/jvm.cpp
    L140:   ResourceMark rm;
        Can this be 'rm(THREAD)'? (Not your problem, but while you're
        in that file?)

    L611:   Handle stackStream_h(THREAD, JNIHandles::resolve_non_null(stackStream));
    L617:   objArrayHandle frames_array_h(THREAD, fa);
    L626:   return JNIHandles::make_local(THREAD, result);
        Since we've gone to the trouble of creating the 'jt' variable,
        I would prefer it to be used instead of THREAD where possible.

    L767:   vframeStream vfst(thread);
    L788         return (jclass) JNIHandles::make_local(THREAD, m->method_holder()->java_mirror());
        Can we use 'thread' on L788? (preferred)
        Can we use 'THREAD' on L767? (less preferred)

    L949:   ResourceMark rm(THREAD);
    L951:   Handle class_loader (THREAD, JNIHandles::resolve(loader));
    L955:                            THREAD);
    L957:   Handle protection_domain (THREAD, JNIHandles::resolve(pd));
    L968:   return (jclass) JNIHandles::make_local(THREAD, k->java_mirror());
        Since we've gone to the trouble of creating the 'jt' variable,
        I would prefer it to be used instead of THREAD where possible.

    L986:   JavaThread* jt = (JavaThread*) THREAD;
        This 'jt' is unused and can be deleted (Not your problem, but while you're
        in that file?)

    L1154:   while (*p != '\0') {
    L1155:       if (*p == '.') {
    L1156:           *p = '/';
    L1157:       }
    L1158:       p++;
        Nit - the indents are wrong on L1155-58. (Not your problem, but while you're
        in that file?)

    L1389:   ResourceMark rm(THREAD);
    L1446:     return JNIHandles::make_local(THREAD, result);
    L1460:   return JNIHandles::make_local(THREAD, result);
        Can we use 'thread' on L1389? (preferred) And then the line you
        touched could also be 'thread' and we'll be consistent in this
        function...

    L3287:   oop jthread = thread->threadObj();
    L3288:   assert (thread != NULL, "no current thread!");
        I think the assert is wrong. It should be:
            assert(jthread != NULL, "no current thread!");
        If 'thread == NULL', then we would have crashed at L3287.
        Also notice that I deleted the extra ' ' before '('. (Not
        your problem, but while you're in that file?)

    L3289:   return JNIHandles::make_local(THREAD, jthread);
        Can you use 'thread' instead of 'THREAD' here for consistency?

    L3681:     method_handle = Handle(THREAD, JNIHandles::resolve(method));
    L3682:     Handle receiver(THREAD, JNIHandles::resolve(obj));
    L3683:     objArrayHandle args(THREAD, objArrayOop(JNIHandles::resolve(args0)));
    L3685:     jobject res = JNIHandles::make_local(THREAD, result);
        Can you use 'thread' instead of 'THREAD' here for consistency?

    L3705:   objArrayHandle args(THREAD, objArrayOop(JNIHandles::resolve(args0)));
    L3707   jobject res = JNIHandles::make_local(THREAD, result);
        Can you use 'thread' instead of 'THREAD' here for consistency?

src/hotspot/share/prims/methodHandles.cpp
    No comments.

src/hotspot/share/prims/methodHandles.hpp
    No comments.

src/hotspot/share/prims/unsafe.cpp
    No comments.

src/hotspot/share/prims/whitebox.cpp
    No comments.

src/hotspot/share/runtime/jniHandles.cpp
    No comments.

src/hotspot/share/runtime/jniHandles.hpp
    No comments.

src/hotspot/share/services/management.cpp
    No comments.

None of my comments above are "must do". If you choose to make the
changes, a new webrev isn't required, but would be useful for a sanity
check.

Thumbs up.
Dan > > On 20/07/2020 3:22 pm, Kim Barrett wrote: >>> On Jul 20, 2020, at 12:16 AM, David Holmes >>> wrote: >>> >>> Subject line got truncated by accident ... >>> >>> On 20/07/2020 11:06 am, David Holmes wrote: >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650 >>>> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/ >>>> This is a simple cleanup that touches files across a number of VM >>>> areas - hence the cross-post. >>>> Whilst working on a different JNI fix I noticed that in most cases >>>> in jni.cpp we were using the following form of make_local: >>>> JNIHandles::make_local(env, obj); >>>> and what that form does is first extract the thread from the JNIEnv: >>>> JavaThread* thread = JavaThread::thread_from_jni_environment(env); >>>> return thread->active_handles()->allocate_handle(obj); >>>> but there is also another, faster, variant for when you already >>>> have the "thread": >>>> jobject JNIHandles::make_local(Thread* thread, oop obj) { >>>> ?? return thread->active_handles()->allocate_handle(obj); >>>> } >>>> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY, >>>> WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread >>>> from the JNIEnv: >>>> ???? JavaThread* thread=JavaThread::thread_from_jni_environment(env); >>>> and further defined: >>>> ???? Thread* THREAD = thread; >>>> so we always already have direct access to the "thread" available >>>> (or indirect via TRAPS), and in fact we can end up removing the >>>> make_local(JNIEnv* env, oop obj) variant altogether. >>>> Along the way I spotted some related issues with unnecessary use of >>>> Thread::current() when it is already available from TRAPS, and some >>>> other cases where we extracted the JNIEnv from a thread only to >>>> later extract the thread from the JNIEnv. 
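[Editorial sketch, not part of the original message.] The redundancy described above can be illustrated with a small toy model. Everything here is an assumption made for illustration: ToyThread, ToyEnv, thread_from_env and the lookup counter are hypothetical stand-ins, not HotSpot types, and the real JavaThread::thread_from_jni_environment is not implemented with a back-pointer like this. The sketch only shows why calling the thread-based overload avoids a second env-to-thread lookup once the entry wrapper has already resolved the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyThread;

// Toy stand-in for JNIEnv: it only remembers which thread owns it (assumption).
struct ToyEnv {
  ToyThread* owner;
};

// Toy stand-in for JavaThread: owns an env and a list of "local handles".
struct ToyThread {
  ToyEnv env;
  std::vector<int> local_handles;

  ToyThread() : env{this} {}
  ToyThread(const ToyThread&) = delete;             // env.owner must stay valid
  ToyThread& operator=(const ToyThread&) = delete;

  // Stores the object and returns its slot index as the "handle".
  std::size_t allocate_handle(int obj) {
    local_handles.push_back(obj);
    return local_handles.size() - 1;
  }
};

// Counts how often the thread is recovered from the env.
static int env_to_thread_lookups = 0;

ToyThread* thread_from_env(ToyEnv* env) {
  assert(env != nullptr);
  ++env_to_thread_lookups;
  return env->owner;
}

// Env-based variant: has to recover the thread first.
std::size_t make_local(ToyEnv* env, int obj) {
  return thread_from_env(env)->allocate_handle(obj);
}

// Thread-based variant: no lookup needed.
std::size_t make_local(ToyThread* thread, int obj) {
  return thread->allocate_handle(obj);
}

// Entry wrapper resolves the thread, then the body looks it up again via env.
std::size_t entry_old_style(ToyEnv* env, int obj) {
  ToyThread* thread = thread_from_env(env);  // done by the entry wrapper
  (void)thread;
  return make_local(env, obj);               // second, redundant lookup
}

// Entry wrapper resolves the thread once and the body reuses it.
std::size_t entry_new_style(ToyEnv* env, int obj) {
  ToyThread* thread = thread_from_env(env);  // done by the entry wrapper
  return make_local(thread, obj);            // no further lookup
}
```

In this toy model entry_old_style performs two env-to-thread lookups per call and entry_new_style performs one, which is the saving the cleanup is after.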
>>>> Testing: tiers 1 - 3 >>>> Thanks, >>>> David >>>> ----- >> >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/javaClasses.cpp >> ? 439???? JNIEnv *env = thread->jni_environment(); >> >> Since env is no longer used on the next line, move this down to where >> it is used, at line 444. > > Fixed. > >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/classfile/verifier.cpp >> ? 299?? JNIEnv *env = thread->jni_environment(); >> >> env now seems to only be used at line 320.? Move this closer. > > Fixed. > >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jni.cpp >> ? 743???? result = JNIHandles::make_local(THREAD, result_handle()); >> >> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where >> previously it just used "thread". Maybe this change shouldn't be made? >> Or can the other uses be changed to THREAD for consistency? > > "thread" and "THREAD" are interchangeable for anything expecting a > "Thread*" (and somewhat surprisingly a number of API's that only work > for JavaThreads actually take a Thread*. :( ). I had choice between > trying to be file-wide consistent with the make_local calls, versus > local-code consistent, and used THREAD as it is available in both > JNI_ENTRY and via TRAPS. But I can certainly make a local change to > "thread" for local consistency. > >> ------------------------------------------------------------------------------ >> >> src/hotspot/share/prims/jvm.cpp >> >> The calls to JvmtiExport::post_vm_object_alloc have to use "thread" >> instead of "THREAD", even though other places nearby are using >> "THREAD".? That inconsistency is kind of unfortunate, but doesn't seem >> easily avoidable. > > Everything that uses THREAD in a JVM_ENTRY method can be changed to > use "thread" instead. 
But I'm not sure it's a consistency worth
> pursuing at least as part of these changes (there are likely similar
> issues with most of the touched files).
>
> Thanks,
> David
>
>> ------------------------------------------------------------------------------
>>
>>

From vladimir.a.ivanov at intel.com Mon Jul 20 17:12:25 2020
From: vladimir.a.ivanov at intel.com (Ivanov, Vladimir A)
Date: Mon, 20 Jul 2020 17:12:25 +0000
Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86
In-Reply-To:
References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com>
Message-ID:

Hi,

The updated patch is available at
http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.03/

It uses 'fgets' instead of 'getline' so that it works with local memory.
The tier1 tests passed on the release and fastdebug builds on Linux and on the fastdebug builds on MacOS systems. Test results are the same for patched and non-patched builds.

Thanks, Vladimir

From: Thomas Stüfe
Sent: Friday, July 17, 2020 10:25 PM
To: Ivanov, Vladimir A
Cc: Vladimir Kozlov ; Hotspot dev runtime ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86

Oh, sorry, you are right :( I was under the assumption you wanted to call os::cpu_microcode_revision() directly from within VMError::report(). During initialization using c-heap like this should not be a problem and you can forget about 9/10ths of what I wrote, sorry.

In that case your original variant is fine, my only suggestion would be to clearly mark the free as ::free() with a comment to prevent someone from correcting it to os::free.

Thank you, Thomas

On Sat, Jul 18, 2020 at 7:08 AM Ivanov, Vladimir A > wrote:

Hi, it seems this info is created during the initialization phase. Is that correct? Collecting or parsing common info at the crash point is usually not a good idea. During initialization, using the C heap is not a problem. The '::free' works OK here.
At least tier1 test produce same results for patched and non-patched builds. But these tests not generates real case for hs_err files. It looks like 2k byte array enough for the one record for CPU from cpuinfo file. Will update code to use local buffer. Thanks, Vladimir From: Thomas St?fe > Sent: Friday, July 17, 2020 9:42 PM To: Ivanov, Vladimir A > Cc: Vladimir Kozlov >; Hotspot dev runtime >; hotspot-compiler-dev at openjdk.java.net Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 Hi, yes, you must use the raw free here (for the same reason we cannot pass in an os::malloc() allocated buffer to getline, since if it were to resize it would use raw ::realloc() internally and crash the same way). But as I wrote in my first mail to the original thread, I would not use c-heap memory at all, since this function is used during crash reporting in the signal handler and the c-heap may be corrupted. It the max line length of /proc/cpu can be reliably predicted (so that getline wont realloc()) I would pass a stack allocated buffer into getline. If not, I would not use getline() at all but rewrite this, probably using fgets(). Cheers, Thomas On Sat, Jul 18, 2020 at 1:24 AM Ivanov, Vladimir A > wrote: Thanks, I expected the C's functions here. Let's wait a little bit for Runtime team and update work with buffer. Thanks, Vladimir -----Original Message----- From: Vladimir Kozlov > Sent: Friday, July 17, 2020 4:17 PM To: Thomas St?fe >; Ivanov, Vladimir A > Cc: Hotspot dev runtime >; hotspot-compiler-dev at openjdk.java.net Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 I think the issue is 'line' buffer is allocated by libc getline() and os:free() which is HotSpot function [1] does not know about it. You need C's ::free() or use HS's os::malloc() to allocate 'line' buffer. Someone from Runtime may suggest what is the best for this case. 
Thanks, Vladimir K [1] http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792 On 7/17/20 4:03 PM, Vladimir Kozlov wrote: > I updated subject to our formal review request format (JDK version, RFE's id and subject). > > I moved RFE to runtime group as Thomas said: > > https://bugs.openjdk.java.net/browse/JDK-8249672 > > Submitted tier1 testing to build on all our supported platforms. And debug builds on linux failed: > > # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718 # V > [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > const+0xeb > > V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) > const+0xeb V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a V > [libjvm.so+0x13cd30b] os::free(void*)+0x5b V [libjvm.so+0x13e5598] > os::cpu_microcode_revision()+0xc8 V [libjvm.so+0x17d314c] > VM_Version::get_processor_features()+0x76c > V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d V > [libjvm.so+0x17ce6c6] VM_Version_init()+0x26 V [libjvm.so+0xcb2895] > init_globals()+0x55 V [libjvm.so+0x16dde63] > Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3 > > > Regards, > Vladimir K > > On 7/17/20 3:02 PM, Thomas St?fe wrote: >> Hi Vladimir, >> >> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < >> vladimir.a.ivanov at intel.com> wrote: >> >>>> +#if defined(IA32) || defined(AMD64) >>>> >>>> Is that not synonymous with x86? >>> >>> This patter was copied from the method ?print_model_name_and_flags? >>> (file os/linux/os_linux.cpp). >>> >>> This method also read the ?/proc/cpuinfo? file and I reuse it as >>> ?template? for the new method. >>> >>> It is better to use one pattern to work with exactly same file but >>> in general you are right. >>> >>> The X86 is defined in the file ./share/utilities/macros.hpp as: >>> >>> #if defined(IA32) || defined(AMD64) >>> >>> #define X86 >>> >>> #define X86_ONLY(code) code >>> >>> #define NOT_X86(code) >>> >>> >>> >>> The question here: could I delete this ?ifdefs? 
while this method >>> should work on x86 only? >>> >>> >>> >> >> os_linux_x86.cpp is compiled for x86 platforms only, whereas >> os_linux.cpp is shared among all architectures. >> >> So, in the former you do not need to exclude non-x86 architectures. >> >> Cheers, Thomas >> >> >>> Thanks, Vladimir >>> >>> >>> >>> *From:* Thomas St?fe > >>> *Sent:* Friday, July 17, 2020 2:26 PM >>> *To:* Ivanov, Vladimir A >; Hotspot dev >>> runtime > >>> *Cc:* hotspot-compiler-dev at openjdk.java.net >>> *Subject:* Re: add microcode version to the hs_err files >>> >>> >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe >>> > >>> wrote: >>> >>> Hi Vladimir, >>> >>> >>> >>> I think this would be more suited to hotspot-runtime. >>> >>> >>> >>> >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html >>> >>> >>> >>> +#if defined(IA32) || defined(AMD64) >>> >>> Is that not synonymous with x86? >>> >>> >>> >>> + while ((read = getline(&line, &len, fp)) != -1) { >>> + if (len > 10 && strstr(line, "microcode") != NULL) { >>> + char* rev = strchr(line, ':'); >>> + if (rev != NULL) sscanf(rev + 1, "%x", &result); >>> + break; >>> + } >>> + } >>> + free(line); >>> >>> >>> >>> Not sure this works as intended. At the first call to getline() it >>> will allocate a line buffer for you and return it. That buffer will >>> be as large as the first line you happen to read. You then pass that >>> same buffer into getline to fetch the next lines, but what if those >>> are longer than the first? >>> >>> >>> >>> >>> >>> Forget that point, getline calls realloc() on the line buffer to >>> resize it, so this should be okay. >>> >>> >>> >>> Thanks, Thomas >>> >>> >>> >>> But anyway it would be better to pass a simple caller provided >>> buffer in - stack allocated. Since this function is called at crash >>> time and the C heap could be corrupted. 
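[Editorial sketch, not part of the original message.] The suggestion above, reading into a caller-provided stack buffer instead of letting getline() allocate from the C heap, could look roughly like the following. This is not the code from the webrev: the function name read_microcode_revision, the path parameter, and the 2048-byte buffer size (a figure mentioned elsewhere in this thread as enough for one cpuinfo record) are assumptions for illustration. A line longer than the buffer would simply be read in pieces by fgets.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: scans a /proc/cpuinfo-style text file and returns the
// value of the first "microcode : 0x..." entry, or 0 if none is found.
// No heap allocation is requested by this function itself: the line buffer
// lives on the stack.
unsigned int read_microcode_revision(const char* path) {
  assert(path != nullptr);
  unsigned int result = 0;

  FILE* fp = std::fopen(path, "r");
  if (fp == nullptr) {
    return 0;
  }

  char line[2048];  // assumed large enough for one cpuinfo line
  while (std::fgets(line, sizeof(line), fp) != nullptr) {
    if (std::strstr(line, "microcode") != nullptr) {
      const char* rev = std::strchr(line, ':');
      if (rev != nullptr) {
        // %x skips leading whitespace and accepts an optional 0x prefix.
        std::sscanf(rev + 1, "%x", &result);
      }
      break;  // only the first matching line is needed
    }
  }

  std::fclose(fp);
  return result;
}
```

Because the buffer is a local array, there is nothing to release afterwards, so the ::free() versus os::free() question does not arise in this variant.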
>>> >>> >>> >>> Cheers, Thomas >>> >>> >>> >>> >>> >>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < >>> vladimir.a.ivanov at intel.com> wrote: >>> >>> Hello, >>> >>> could you please review the patch >>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>> >>> This patch add the microcode version for different OSes that may be >>> useful in the issue resolution process. >>> >>> >>> >>> The reported microcode version for different OSes loos as: >>> >>> >>> >>> Linux (RHEL7.7): >>> >>> # cat hs_err_pid251046.log |grep microc >>> >>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per >>> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, >>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, >>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, >>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb >>> >>> >>> >>> Windows (Win10, v1809): >>> >>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per >>> core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, >>> avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, >>> tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt >>> >>> >>> >>> MacOS (Darwin): >>> >>> $ cat hs_err_pid95187.log |grep microc >>> >>> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per >>> core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, >>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, >>> avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, >>> bmi1, bmi2, adx, sha, fma, clflush, clflushopt >>> >>> >>> >>> Thanks, Vladimir >>> >>> >>> Thanks, Vladimir >>> >>> From igor.ignatyev at oracle.com Mon Jul 20 17:13:34 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 20 Jul 2020 10:13:34 -0700 Subject: [15] RFR(T) : 8249697 : java/lang/invoke/RicochetTest.java should use 
@requires instead of @ignore In-Reply-To: <60806519-8e57-d126-8a2e-800053b4ee9a@oracle.com> References: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> <60806519-8e57-d126-8a2e-800053b4ee9a@oracle.com> Message-ID: <956AE2D8-1D95-4357-9DBD-9A9D5ABF9CD1@oracle.com> Hi Mandy, that's actually the opposite, the 2nd subtest is run only in modes other than Xcomp, as w/ Xcomp the test creates lots of adapters and used to lead to JVM failure as described in 7049122. I tried to reproduce this failure, but in vain,.. after a bit more historical digging, I realized that the underlying problem was 7009641, which has been fixed in hs25/jdk8. so I've changed the fix for 8249697 to simply return run w/ '-DRicochetTest.MAX_ARITY=255': http://cr.openjdk.java.net/~iignatyev//8249697/webrev.02 I've verified that the test passes w/ Xcomp and - -XX:+TieredCompilation (c1 + c2); - -XX:-TieredCompilation (c2-only); - -XX:+NeverActAsServerClassMachine (emulated-client, c1-only) the test was run 100 times on {linux,windows,macos}-x64 w/ 0 failures. Thanks, -- Igor > On Jul 18, 2020, at 9:32 PM, Mandy Chung wrote: > > > > On 7/17/20 8:54 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >> > > I suggest to change this: > 32 * @comment The following test creates an unreasonable number of adapters in -Xcomp mode (7049122) > > To: > > @bug 8249697 > @summary verify very high number of adapters in -Xcomp mode > > Otherwise, looks fine. > > Mandy >> Hi all, >> >> could you please review this small and trivial patch for java/lang/invoke/RicochetTest.java test? >> from JBS: >>> a run of java/lang/invoke/RicochetTest.java w/ MAX_ARITY=255 was removed from all configurations by JDK-7049122, yet the problem manifests itself only w/ Xcomp. as now we have @requires to filter out tests from certain configurations, the test can be updated to run MAX_ARITY=255 in all configs but Xcomp. 
>> the patch splits the test into two subtests, each one w/ one @run, and use @requires to exclude one w/ MAX_ARITY=255 from execution if Xcomp flag is used. >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8249697 >> webrev: http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >> testing: java/lang/invoke/RicochetTest.java on {linux,windows,macos}-x64 w/ and w/o -Xcomp; Xcomp runs, as expected, had only 1 test run >> >> Thanks, >> -- Igor >> >> JDK-7049122 : https://bugs.openjdk.java.net/browse/JDK-7049122 From vladimir.kozlov at oracle.com Mon Jul 20 18:01:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 20 Jul 2020 11:01:31 -0700 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <3d5fc552-d3e2-494b-e921-c65967af8207@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> <3d5fc552-d3e2-494b-e921-c65967af8207@oracle.com> Message-ID: I asked to have 2 different test methods to reproduce 2 cases separately. You can't mix them. Regards, Vladimir On 7/20/20 6:48 AM, Jamsheed C M wrote: > Hi Tobias, > On 20/07/2020 19:05, Tobias Hartmann wrote: >> Hi Jamsheed, >> >> On 20.07.20 15:30, Jamsheed C M wrote: >>> Revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ >> You don't need #ifdef ASSERT in escape.cpp:2252. Otherwise looks good to me! No new webrev required. > > Missed removing it. Thank you for the review. 
> > Best regards, > > Jamsheed > >> >> Best regards, >> Tobias From mandy.chung at oracle.com Mon Jul 20 18:57:09 2020 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 20 Jul 2020 11:57:09 -0700 Subject: [15] RFR(T) : 8249697 : java/lang/invoke/RicochetTest.java should use @requires instead of @ignore In-Reply-To: <956AE2D8-1D95-4357-9DBD-9A9D5ABF9CD1@oracle.com> References: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> <60806519-8e57-d126-8a2e-800053b4ee9a@oracle.com> <956AE2D8-1D95-4357-9DBD-9A9D5ABF9CD1@oracle.com> Message-ID: <9f5959b8-f9ff-9962-77f7-7807b247ae90@oracle.com> Hi Igor, OK.? Should this revert the change by 7049122 then? i.e. simply change -DRicochetTest.MAX_ARITY=10 to 255 Your proposed patch adds a new @run instead of modifying the existing @run command: ? * @run junit/othervm/timeout=3600 -XX:+IgnoreUnrecognizedVMOptions -XX:-VerifyDependencies -DRicochetTest.MAX_ARITY=10 test.java.lang.invoke.RicochetTest I looked at the history and this @run was modified by JDK-7197210 that adds -XX:+IgnoreUnrecognizedVMOptions -XX:-VerifyDependencies options and reduce MAX_ARITY from 50 to 10. This issue is not critical to target for 15.? It may worth considering target this test fix for 16.? Just a suggestion. Mandy On 7/20/20 10:13 AM, Igor Ignatyev wrote: > Hi Mandy, > > that's actually the opposite, the 2nd subtest is run only in modes > other than Xcomp, as w/ Xcomp the test creates lots of adapters and > used to lead to JVM failure as described in?7049122. I tried to > reproduce this failure, but in vain,.. ?after a bit more historical > digging, I realized that the underlying problem was?7009641, which has > been fixed in hs25/jdk8. 
so I've changed the fix for?8249697 to simply > return run w/ '-DRicochetTest.MAX_ARITY=255': > http://cr.openjdk.java.net/~iignatyev//8249697/webrev.02 > > I've verified that the test passes w/ Xcomp and > ?- -XX:+TieredCompilation (c1 + c2); > ?-?-XX:-TieredCompilation?(c2-only); > ?- -XX:+NeverActAsServerClassMachine (emulated-client, c1-only) > > the test was run 100 times on {linux,windows,macos}-x64 w/ 0 failures. > Thanks, > -- Igor > >> On Jul 18, 2020, at 9:32 PM, Mandy Chung > > wrote: >> >> >> >> On 7/17/20 8:54 PM, Igor Ignatyev wrote: >>> http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >>> >> >> I suggest to change this: >> ? 32? * @comment The following test creates an unreasonable number of >> adapters in -Xcomp mode (7049122) >> >> To: >> >> ?? @bug 8249697 >> ?? @summary verify very high number of adapters in -Xcomp mode >> >> Otherwise, looks fine. >> >> Mandy >>> Hi all, >>> >>> could you please review this small and trivial patch for java/lang/invoke/RicochetTest.java test? >>> from JBS: >>>> a run of java/lang/invoke/RicochetTest.java w/ MAX_ARITY=255 was removed from all configurations by JDK-7049122, yet the problem manifests itself only w/ Xcomp. as now we have @requires to filter out tests from certain configurations, the test can be updated to run MAX_ARITY=255 in all configs but Xcomp. >>> the patch splits the test into two subtests, each one w/ one @run, and use @requires to exclude one w/ MAX_ARITY=255 from execution if Xcomp flag is used. 
>>> >>> JBS:https://bugs.openjdk.java.net/browse/JDK-8249697 >>> webrev:http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >>> testing: java/lang/invoke/RicochetTest.java on {linux,windows,macos}-x64 w/ and w/o -Xcomp; Xcomp runs, as expected, had only 1 test run >>> >>> Thanks, >>> -- Igor >>> >>> JDK-7049122 :https://bugs.openjdk.java.net/browse/JDK-7049122 >> > From jamsheed.c.m at oracle.com Mon Jul 20 19:22:11 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Tue, 21 Jul 2020 00:52:11 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> <3d5fc552-d3e2-494b-e921-c65967af8207@oracle.com> Message-ID: Hi Vladimir, Added both the tests http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.04/ Best Regards, Jamsheed On 20/07/2020 23:31, Vladimir Kozlov wrote: > I asked to have 2 different test methods to reproduce 2 cases separately. > You can't mix them. > > Regards, > Vladimir > > > On 7/20/20 6:48 AM, Jamsheed C M wrote: >> Hi Tobias, >> On 20/07/2020 19:05, Tobias Hartmann wrote: >>> Hi Jamsheed, >>> >>> On 20.07.20 15:30, Jamsheed C M wrote: >>>> Revised webrev: >>>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ >>> You don't need #ifdef ASSERT in escape.cpp:2252. Otherwise looks >>> good to me! No new webrev required. >> >> Missed removing it. Thank you for the review. 
>> >> Best regards, >> >> Jamsheed >> >>> >>> Best regards, >>> Tobias From igor.ignatyev at oracle.com Mon Jul 20 19:22:05 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Mon, 20 Jul 2020 12:22:05 -0700 Subject: [15] RFR(T) : 8249697 : java/lang/invoke/RicochetTest.java should use @requires instead of @ignore In-Reply-To: <9f5959b8-f9ff-9962-77f7-7807b247ae90@oracle.com> References: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> <60806519-8e57-d126-8a2e-800053b4ee9a@oracle.com> <956AE2D8-1D95-4357-9DBD-9A9D5ABF9CD1@oracle.com> <9f5959b8-f9ff-9962-77f7-7807b247ae90@oracle.com> Message-ID: <3BD624C3-D3B7-4306-959A-1062CA34DF64@oracle.com> Hi Mandy, you are right, it's better to have just one @run, and as I don't think that 7197210 changes '-XX:-VerifyDependencies' nor '/timeout=3600' are needed anymore, I suggest to restore the test to its original version w/ `@run junit/othervm -DRicochetTest.MAX_ARITY=255 test.java.lang.invoke.RicochetTest`, so the patch (http://cr.openjdk.java.net/~iignatyev//8249697/webrev.03) would be just: > -/* @test > +/* > + * @test > * @summary unit tests for recursive method handles > - * @run junit/othervm/timeout=3600 -XX:+IgnoreUnrecognizedVMOptions -XX:-VerifyDependencies -DRicochetTest.MAX_ARITY=10 test.java.lang.invoke.RicochetTest > - */ > -/* > - * @ignore The following test creates an unreasonable number of adapters in -Xcomp mode (7049122) > * @run junit/othervm -DRicochetTest.MAX_ARITY=255 test.java.lang.invoke.RicochetTest > */ and then the bug's summary would be smth like 'remove temporary fixes from java/lang/invoke/RicochetTest.java' . sure there is no reason for it to be pushed into 15, I've retargeted to 16. -- Igor > On Jul 20, 2020, at 11:57 AM, Mandy Chung wrote: > > Hi Igor, > > OK. Should this revert the change by 7049122 then? i.e. 
simply change -DRicochetTest.MAX_ARITY=10 to 255 > > Your proposed patch adds a new @run instead of modifying the existing @run command: > > * @run junit/othervm/timeout=3600 -XX:+IgnoreUnrecognizedVMOptions -XX:-VerifyDependencies -DRicochetTest.MAX_ARITY=10 test.java.lang.invoke.RicochetTest > > I looked at the history and this @run was modified by JDK-7197210 that adds -XX:+IgnoreUnrecognizedVMOptions -XX:-VerifyDependencies options and reduce MAX_ARITY from 50 to 10. > > This issue is not critical to target for 15. It may worth considering target this test fix for 16. Just a suggestion. > > Mandy > > On 7/20/20 10:13 AM, Igor Ignatyev wrote: >> Hi Mandy, >> >> that's actually the opposite, the 2nd subtest is run only in modes other than Xcomp, as w/ Xcomp the test creates lots of adapters and used to lead to JVM failure as described in 7049122. I tried to reproduce this failure, but in vain,.. after a bit more historical digging, I realized that the underlying problem was 7009641, which has been fixed in hs25/jdk8. so I've changed the fix for 8249697 to simply return run w/ '-DRicochetTest.MAX_ARITY=255': http://cr.openjdk.java.net/~iignatyev//8249697/webrev.02 >> >> I've verified that the test passes w/ Xcomp and >> - -XX:+TieredCompilation (c1 + c2); >> - -XX:-TieredCompilation (c2-only); >> - -XX:+NeverActAsServerClassMachine (emulated-client, c1-only) >> >> the test was run 100 times on {linux,windows,macos}-x64 w/ 0 failures. >> >> Thanks, >> -- Igor >> >>> On Jul 18, 2020, at 9:32 PM, Mandy Chung > wrote: >>> >>> >>> >>> On 7/17/20 8:54 PM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >>>> >>> >>> I suggest to change this: >>> 32 * @comment The following test creates an unreasonable number of adapters in -Xcomp mode (7049122) >>> >>> To: >>> >>> @bug 8249697 >>> @summary verify very high number of adapters in -Xcomp mode >>> >>> Otherwise, looks fine. 
>>> >>> Mandy >>>> Hi all, >>>> >>>> could you please review this small and trivial patch for java/lang/invoke/RicochetTest.java test? >>>> from JBS: >>>>> a run of java/lang/invoke/RicochetTest.java w/ MAX_ARITY=255 was removed from all configurations by JDK-7049122, yet the problem manifests itself only w/ Xcomp. as now we have @requires to filter out tests from certain configurations, the test can be updated to run MAX_ARITY=255 in all configs but Xcomp. >>>> the patch splits the test into two subtests, each one w/ one @run, and use @requires to exclude one w/ MAX_ARITY=255 from execution if Xcomp flag is used. >>>> >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8249697 >>>> webrev: http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >>>> testing: java/lang/invoke/RicochetTest.java on {linux,windows,macos}-x64 w/ and w/o -Xcomp; Xcomp runs, as expected, had only 1 test run >>>> >>>> Thanks, >>>> -- Igor >>>> >>>> JDK-7049122 : https://bugs.openjdk.java.net/browse/JDK-7049122 >> > From mandy.chung at oracle.com Mon Jul 20 19:44:02 2020 From: mandy.chung at oracle.com (Mandy Chung) Date: Mon, 20 Jul 2020 12:44:02 -0700 Subject: [15] RFR(T) : 8249697 : java/lang/invoke/RicochetTest.java should use @requires instead of @ignore In-Reply-To: <3BD624C3-D3B7-4306-959A-1062CA34DF64@oracle.com> References: <054E0326-B61C-40FA-A8E3-89C433A49EE3@oracle.com> <60806519-8e57-d126-8a2e-800053b4ee9a@oracle.com> <956AE2D8-1D95-4357-9DBD-9A9D5ABF9CD1@oracle.com> <9f5959b8-f9ff-9962-77f7-7807b247ae90@oracle.com> <3BD624C3-D3B7-4306-959A-1062CA34DF64@oracle.com> Message-ID: <1ceb03d9-5f15-22ed-286e-881dfa751c06@oracle.com> webrev.03 looks good. 
Mandy On 7/20/20 12:22 PM, Igor Ignatyev wrote: > Hi Mandy, > > you are right, it's better to have just one @run, and as I don't think > that 7197210 changes '-XX:-VerifyDependencies' nor '/timeout=3600' are > needed anymore, I suggest to restore the test to its original version > w/ ?`@run junit/othervm -DRicochetTest.MAX_ARITY=255 > test.java.lang.invoke.RicochetTest`, so the patch > (http://cr.openjdk.java.net/~iignatyev//8249697/webrev.03) would be just: > >> -/* @test >> +/* >> + * @test >> ??* @summary unit tests for recursive method handles >> - * @run junit/othervm/timeout=3600 -XX:+IgnoreUnrecognizedVMOptions >> -XX:-VerifyDependencies -DRicochetTest.MAX_ARITY=10 >> test.java.lang.invoke.RicochetTest >> - */ >> -/* >> - * @ignore The following test creates an unreasonable number of >> adapters in -Xcomp mode (7049122) >> ??* @run junit/othervm -DRicochetTest.MAX_ARITY=255 >> test.java.lang.invoke.RicochetTest >> ??*/ > > and then the bug's summary would be smth like 'remove temporary fixes > from?java/lang/invoke/RicochetTest.java' . > > sure there is no reason for it to be pushed into 15, I've retargeted > to 16. > > -- Igor > >> On Jul 20, 2020, at 11:57 AM, Mandy Chung > > wrote: >> >> Hi Igor, >> >> OK.? Should this revert the change by 7049122 then? i.e. simply >> change -DRicochetTest.MAX_ARITY=10 to 255 >> >> Your proposed patch adds a new @run instead of modifying the existing >> @run command: >> >> ? * @run junit/othervm/timeout=3600 -XX:+IgnoreUnrecognizedVMOptions >> -XX:-VerifyDependencies -DRicochetTest.MAX_ARITY=10 >> test.java.lang.invoke.RicochetTest >> >> I looked at the history and this @run was modified by JDK-7197210 >> that adds -XX:+IgnoreUnrecognizedVMOptions -XX:-VerifyDependencies >> options and reduce MAX_ARITY from 50 to 10. >> >> This issue is not critical to target for 15.? It may worth >> considering target this test fix for 16.? Just a suggestion. 
>> >> Mandy >> >> On 7/20/20 10:13 AM, Igor Ignatyev wrote: >>> Hi Mandy, >>> >>> that's actually the opposite, the 2nd subtest is run only in modes >>> other than Xcomp, as w/ Xcomp the test creates lots of adapters and >>> used to lead to JVM failure as described in?7049122. I tried to >>> reproduce this failure, but in vain,.. ?after a bit more historical >>> digging, I realized that the underlying problem was?7009641, which >>> has been fixed in hs25/jdk8. so I've changed the fix for?8249697 to >>> simply return run w/ '-DRicochetTest.MAX_ARITY=255': >>> http://cr.openjdk.java.net/~iignatyev//8249697/webrev.02 >>> >>> I've verified that the test passes w/ Xcomp and >>> ?- -XX:+TieredCompilation (c1 + c2); >>> ?-?-XX:-TieredCompilation?(c2-only); >>> ?- -XX:+NeverActAsServerClassMachine (emulated-client, c1-only) >>> >>> the test was run 100 times on {linux,windows,macos}-x64 w/ 0 failures. >>> Thanks, >>> -- Igor >>> >>>> On Jul 18, 2020, at 9:32 PM, Mandy Chung >>> > wrote: >>>> >>>> >>>> >>>> On 7/17/20 8:54 PM, Igor Ignatyev wrote: >>>>> http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >>>>> >>>> >>>> I suggest to change this: >>>> ? 32? * @comment The following test creates an unreasonable number >>>> of adapters in -Xcomp mode (7049122) >>>> >>>> To: >>>> >>>> ?? @bug 8249697 >>>> ?? @summary verify very high number of adapters in -Xcomp mode >>>> >>>> Otherwise, looks fine. >>>> >>>> Mandy >>>>> Hi all, >>>>> >>>>> could you please review this small and trivial patch for java/lang/invoke/RicochetTest.java test? >>>>> from JBS: >>>>>> a run of java/lang/invoke/RicochetTest.java w/ MAX_ARITY=255 was removed from all configurations by JDK-7049122, yet the problem manifests itself only w/ Xcomp. as now we have @requires to filter out tests from certain configurations, the test can be updated to run MAX_ARITY=255 in all configs but Xcomp. 
>>>>> the patch splits the test into two subtests, each one w/ one @run, and use @requires to exclude one w/ MAX_ARITY=255 from execution if Xcomp flag is used. >>>>> >>>>> JBS:https://bugs.openjdk.java.net/browse/JDK-8249697 >>>>> webrev:http://cr.openjdk.java.net/~iignatyev/8249697/webrev.00/ >>>>> testing: java/lang/invoke/RicochetTest.java on {linux,windows,macos}-x64 w/ and w/o -Xcomp; Xcomp runs, as expected, had only 1 test run >>>>> >>>>> Thanks, >>>>> -- Igor >>>>> >>>>> JDK-7049122 :https://bugs.openjdk.java.net/browse/JDK-7049122 >>>> >>> >> > From vladimir.kozlov at oracle.com Mon Jul 20 20:05:22 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 20 Jul 2020 13:05:22 -0700 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8325fac5-6258-9b88-6507-5dcb0597cc17@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> <3d5fc552-d3e2-494b-e921-c65967af8207@oracle.com> Message-ID: <104285e4-811a-5314-54de-d6461320a76c@oracle.com> Good. Thanks, Vladimir On 7/20/20 12:22 PM, Jamsheed C M wrote: > Hi Vladimir, > > Added both the tests > > http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.04/ > > Best Regards, > > Jamsheed > > On 20/07/2020 23:31, Vladimir Kozlov wrote: >> I asked to have 2 different test methods to reproduce 2 cases separately. >> You can't mix them. 
>> >> Regards, >> Vladimir >> >> >> On 7/20/20 6:48 AM, Jamsheed C M wrote: >>> Hi Tobias, >>> On 20/07/2020 19:05, Tobias Hartmann wrote: >>>> Hi Jamsheed, >>>> >>>> On 20.07.20 15:30, Jamsheed C M wrote: >>>>> Revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ >>>> You don't need #ifdef ASSERT in escape.cpp:2252. Otherwise looks good to me! No new webrev required. >>> >>> Missed removing it. Thank you for the review. >>> >>> Best regards, >>> >>> Jamsheed >>> >>>> >>>> Best regards, >>>> Tobias From vladimir.kozlov at oracle.com Mon Jul 20 20:20:26 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 20 Jul 2020 13:20:26 -0700 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage In-Reply-To: <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> Message-ID: <5e298ff3-6dc1-c4fa-4545-1fc26d7379b5@oracle.com> Hi David, Changes look good. On 7/20/20 10:07 AM, Daniel D. Daugherty wrote: > src/hotspot/share/jvmci/jvmciCompilerToVM.cpp > L1021: HandleMark hm; > Can this be 'hm(THREAD)'? (Not your problem, but while you're in that file?) There are several cases like this in jvmciCompilerToVM.cpp and may be in other places. I think it should be done as separate clean up. Thanks, Vladimir On 7/19/20 10:53 PM, David Holmes wrote: > Hi Kim, > > Thanks for looking at this. > > Updated webrev at: > > http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/ > > On 20/07/2020 3:22 pm, Kim Barrett wrote: >>> On Jul 20, 2020, at 12:16 AM, David Holmes wrote: >>> >>> Subject line got truncated by accident ... 
>>> >>> On 20/07/2020 11:06 am, David Holmes wrote: >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650 >>>> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/ >>>> This is a simple cleanup that touches files across a number of VM areas - hence the cross-post. >>>> Whilst working on a different JNI fix I noticed that in most cases in jni.cpp we were using the following form of >>>> make_local: >>>> JNIHandles::make_local(env, obj); >>>> and what that form does is first extract the thread from the JNIEnv: >>>> JavaThread* thread = JavaThread::thread_from_jni_environment(env); >>>> return thread->active_handles()->allocate_handle(obj); >>>> but there is also another, faster, variant for when you already have the "thread": >>>> jobject JNIHandles::make_local(Thread* thread, oop obj) { >>>> return thread->active_handles()->allocate_handle(obj); >>>> } >>>> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY, WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted >>>> the thread from the JNIEnv: >>>> JavaThread* thread=JavaThread::thread_from_jni_environment(env); >>>> and further defined: >>>> Thread* THREAD = thread; >>>> so we always already have direct access to the "thread" available (or indirect via TRAPS), and in fact we can end up >>>> removing the make_local(JNIEnv* env, oop obj) variant altogether. >>>> Along the way I spotted some related issues with unnecessary use of Thread::current() when it is already available >>>> from TRAPS, and some other cases where we extracted the JNIEnv from a thread only to later extract the thread from >>>> the JNIEnv. >>>> Testing: tiers 1 - 3 >>>> Thanks, >>>> David >>>> ----- >> >> ------------------------------------------------------------------------------ >> src/hotspot/share/classfile/javaClasses.cpp >> 439 JNIEnv *env = thread->jni_environment(); >> >> Since env is no longer used on the next line, move this down to where >> it is used, at line 444. > > Fixed.
> >> ------------------------------------------------------------------------------ >> src/hotspot/share/classfile/verifier.cpp >> ? 299?? JNIEnv *env = thread->jni_environment(); >> >> env now seems to only be used at line 320.? Move this closer. > > Fixed. > >> ------------------------------------------------------------------------------ >> src/hotspot/share/prims/jni.cpp >> ? 743???? result = JNIHandles::make_local(THREAD, result_handle()); >> >> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where >> previously it just used "thread". Maybe this change shouldn't be made? >> Or can the other uses be changed to THREAD for consistency? > > "thread" and "THREAD" are interchangeable for anything expecting a "Thread*" (and somewhat surprisingly a number of > API's that only work for JavaThreads actually take a Thread*. :( ). I had choice between trying to be file-wide > consistent with the make_local calls, versus local-code consistent, and used THREAD as it is available in both JNI_ENTRY > and via TRAPS. But I can certainly make a local change to "thread" for local consistency. > >> ------------------------------------------------------------------------------ >> src/hotspot/share/prims/jvm.cpp >> >> The calls to JvmtiExport::post_vm_object_alloc have to use "thread" >> instead of "THREAD", even though other places nearby are using >> "THREAD".? That inconsistency is kind of unfortunate, but doesn't seem >> easily avoidable. > > Everything that uses THREAD in a JVM_ENTRY method can be changed to use "thread" instead. But I'm not sure it's a > consistency worth pursuing at least as part of these changes (there are likely similar issues with most of the touched > files). 
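The difference between the two make_local variants discussed in this thread can be illustrated with a small stand-alone sketch. All types here (MockThread, MockEnv) are hypothetical stand-ins written for illustration, not the real HotSpot JavaThread, JNIEnv or JNIHandles classes; the sketch only shows that the env-based overload performs one extra thread lookup that is redundant when the caller already holds the thread.

```cpp
#include <deque>

// Simplified stand-ins for illustration only; NOT the real HotSpot types.
struct MockThread;

struct MockEnv {
  MockThread* owner;  // the thread this environment belongs to
};

struct MockThread {
  MockEnv env;
  std::deque<const void*> handles;  // stands in for active_handles()
  static int lookups;               // counts env -> thread extractions

  static MockThread* thread_from_jni_environment(MockEnv* e) {
    ++lookups;
    return e->owner;
  }
};

int MockThread::lookups = 0;

// Fast variant: the caller already has the thread.
const void** make_local(MockThread* thread, const void* obj) {
  thread->handles.push_back(obj);
  return &thread->handles.back();  // deque keeps element addresses stable on push_back
}

// Env-based variant: first recovers the thread from the environment,
// then delegates to the fast variant.
const void** make_local(MockEnv* env, const void* obj) {
  return make_local(MockThread::thread_from_jni_environment(env), obj);
}
```

In an entry wrapper that has already extracted the thread, calling the thread-based overload skips the lookup entirely, which is the cleanup the change describes.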
> > Thanks, > David > >> ------------------------------------------------------------------------------ >> From vladimir.kozlov at oracle.com Mon Jul 20 22:37:11 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 20 Jul 2020 15:37:11 -0700 Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 In-Reply-To: References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com> Message-ID: Looks good. Passed my tier1 testing. Thanks, Vladimir On 7/20/20 10:12 AM, Ivanov, Vladimir A wrote: > Hi, > The updated patch is available at http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.03/ > It uses "fgets" instead of "getline" so that it works with local memory. > The tier1 tests passed on the release and fastdebug builds on Linux and fastdebug builds on MacOS systems. > Testing results are the same for patched and non-patched builds. > > Thanks, Vladimir > > From: Thomas Stüfe > Sent: Friday, July 17, 2020 10:25 PM > To: Ivanov, Vladimir A > Cc: Vladimir Kozlov ; Hotspot dev runtime ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 > > Oh, sorry, you are right :( > > I was under the assumption you wanted to call os::cpu_microcode_revision() directly from within VMError::report(). During initialization using c-heap like this should not be a problem and you can forget about 9/10ths of what I wrote, sorry. > > In that case your original variant is fine, my only suggestion would be to clearly mark the free as ::free() with a comment to prevent someone from correcting it to os::free. > > Thank you, > > Thomas > > > > On Sat, Jul 18, 2020 at 7:08 AM Ivanov, Vladimir A > wrote: > Hi, > it seems this info is created during the initialization phase. Is that correct? Collecting or parsing common info at the crash point is usually not a good idea. During initialization, use of the c-heap is not a problem. > The "::free" works OK here.
At least tier1 test produce same results for patched and non-patched builds. But these tests not generates real case for hs_err files. > It looks like 2k byte array enough for the one record for CPU from cpuinfo file. Will update code to use local buffer. > > Thanks, Vladimir > > From: Thomas St?fe > > Sent: Friday, July 17, 2020 9:42 PM > To: Ivanov, Vladimir A > > Cc: Vladimir Kozlov >; Hotspot dev runtime >; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 > > Hi, > > yes, you must use the raw free here (for the same reason we cannot pass in an os::malloc() allocated buffer to getline, since if it were to resize it would use raw ::realloc() internally and crash the same way). > > But as I wrote in my first mail to the original thread, I would not use c-heap memory at all, since this function is used during crash reporting in the signal handler and the c-heap may be corrupted. > > It the max line length of /proc/cpu can be reliably predicted (so that getline wont realloc()) I would pass a stack allocated buffer into getline. If not, I would not use getline() at all but rewrite this, probably using fgets(). > > Cheers, Thomas > > > > > On Sat, Jul 18, 2020 at 1:24 AM Ivanov, Vladimir A > wrote: > Thanks, I expected the C's functions here. Let's wait a little bit for Runtime team and update work with buffer. > > Thanks, Vladimir > > -----Original Message----- > From: Vladimir Kozlov > > Sent: Friday, July 17, 2020 4:17 PM > To: Thomas St?fe >; Ivanov, Vladimir A > > Cc: Hotspot dev runtime >; hotspot-compiler-dev at openjdk.java.net > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 > > I think the issue is 'line' buffer is allocated by libc getline() and os:free() which is HotSpot function [1] does not know about it. You need C's ::free() or use HS's os::malloc() to allocate 'line' buffer. 
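The fgets-based alternative suggested in this thread can be sketched roughly as follows. This is a minimal illustration, not the code from the webrev: the helper names parse_microcode_line and read_microcode_revision are hypothetical, the 2k buffer size is taken from the estimate mentioned above, and the caller is assumed to pass an already opened stream for a /proc/cpuinfo-style file. The point is only that the line buffer lives on the stack, so there is no getline-allocated memory to release with ::free().

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical helper: returns true and stores the revision if 'line'
// is a "microcode : 0x..." record, false otherwise.
static bool parse_microcode_line(const char* line, unsigned int* result) {
  if (strstr(line, "microcode") == nullptr) {
    return false;
  }
  const char* rev = strchr(line, ':');
  if (rev == nullptr) {
    return false;
  }
  // %x accepts an optional "0x" prefix and skips leading whitespace.
  return sscanf(rev + 1, "%x", result) == 1;
}

// Hypothetical helper: scans the stream line by line with a fixed-size
// stack buffer and returns the first microcode revision found, or 0.
static unsigned int read_microcode_revision(FILE* fp) {
  char buf[2048];  // assumption: one cpuinfo record line fits in 2k
  unsigned int result = 0;
  while (fgets(buf, sizeof(buf), fp) != nullptr) {
    if (parse_microcode_line(buf, &result)) {
      break;
    }
  }
  return result;
}
```

Note that stdio may still allocate internally when the file is opened; the sketch only removes the explicit heap buffer that getline() would manage.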
> > Someone from Runtime may suggest what is the best for this case. > > Thanks, > Vladimir K > > [1] http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792 > > On 7/17/20 4:03 PM, Vladimir Kozlov wrote: >> I updated subject to our formal review request format (JDK version, RFE's id and subject). >> >> I moved RFE to runtime group as Thomas said: >> >> https://bugs.openjdk.java.net/browse/JDK-8249672 >> >> Submitted tier1 testing to build on all our supported platforms. And debug builds on linux failed: >> >> # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718 # V >> [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) >> const+0xeb >> >> V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) >> const+0xeb V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a V >> [libjvm.so+0x13cd30b] os::free(void*)+0x5b V [libjvm.so+0x13e5598] >> os::cpu_microcode_revision()+0xc8 V [libjvm.so+0x17d314c] >> VM_Version::get_processor_features()+0x76c >> V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d V >> [libjvm.so+0x17ce6c6] VM_Version_init()+0x26 V [libjvm.so+0xcb2895] >> init_globals()+0x55 V [libjvm.so+0x16dde63] >> Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3 >> >> >> Regards, >> Vladimir K >> >> On 7/17/20 3:02 PM, Thomas St?fe wrote: >>> Hi Vladimir, >>> >>> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A < >>> vladimir.a.ivanov at intel.com> wrote: >>> >>>>> +#if defined(IA32) || defined(AMD64) >>>>> >>>>> Is that not synonymous with x86? >>>> >>>> This patter was copied from the method ?print_model_name_and_flags? >>>> (file os/linux/os_linux.cpp). >>>> >>>> This method also read the ?/proc/cpuinfo? file and I reuse it as >>>> ?template? for the new method. >>>> >>>> It is better to use one pattern to work with exactly same file but >>>> in general you are right. 
>>>> >>>> The X86 is defined in the file ./share/utilities/macros.hpp as: >>>> >>>> #if defined(IA32) || defined(AMD64) >>>> >>>> #define X86 >>>> >>>> #define X86_ONLY(code) code >>>> >>>> #define NOT_X86(code) >>>> >>>> >>>> >>>> The question here: could I delete this ?ifdefs? while this method >>>> should work on x86 only? >>>> >>>> >>>> >>> >>> os_linux_x86.cpp is compiled for x86 platforms only, whereas >>> os_linux.cpp is shared among all architectures. >>> >>> So, in the former you do not need to exclude non-x86 architectures. >>> >>> Cheers, Thomas >>> >>> >>>> Thanks, Vladimir >>>> >>>> >>>> >>>> *From:* Thomas St?fe > >>>> *Sent:* Friday, July 17, 2020 2:26 PM >>>> *To:* Ivanov, Vladimir A >; Hotspot dev >>>> runtime > >>>> *Cc:* hotspot-compiler-dev at openjdk.java.net >>>> *Subject:* Re: add microcode version to the hs_err files >>>> >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Jul 17, 2020 at 11:19 PM Thomas St?fe >>>> > >>>> wrote: >>>> >>>> Hi Vladimir, >>>> >>>> >>>> >>>> I think this would be more suited to hotspot-runtime. >>>> >>>> >>>> >>>> >>>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>>> src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html >>>> >>>> >>>> >>>> +#if defined(IA32) || defined(AMD64) >>>> >>>> Is that not synonymous with x86? >>>> >>>> >>>> >>>> + while ((read = getline(&line, &len, fp)) != -1) { >>>> + if (len > 10 && strstr(line, "microcode") != NULL) { >>>> + char* rev = strchr(line, ':'); >>>> + if (rev != NULL) sscanf(rev + 1, "%x", &result); >>>> + break; >>>> + } >>>> + } >>>> + free(line); >>>> >>>> >>>> >>>> Not sure this works as intended. At the first call to getline() it >>>> will allocate a line buffer for you and return it. That buffer will >>>> be as large as the first line you happen to read. You then pass that >>>> same buffer into getline to fetch the next lines, but what if those >>>> are longer than the first? 
>>>> >>>> >>>> >>>> >>>> >>>> Forget that point, getline calls realloc() on the line buffer to >>>> resize it, so this should be okay. >>>> >>>> >>>> >>>> Thanks, Thomas >>>> >>>> >>>> >>>> But anyway it would be better to pass a simple caller provided >>>> buffer in - stack allocated. Since this function is called at crash >>>> time and the C heap could be corrupted. >>>> >>>> >>>> >>>> Cheers, Thomas >>>> >>>> >>>> >>>> >>>> >>>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A < >>>> vladimir.a.ivanov at intel.com> wrote: >>>> >>>> Hello, >>>> >>>> could you please review the patch >>>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/ >>>> >>>> This patch add the microcode version for different OSes that may be >>>> useful in the issue resolution process. >>>> >>>> >>>> >>>> The reported microcode version for different OSes loos as: >>>> >>>> >>>> >>>> Linux (RHEL7.7): >>>> >>>> # cat hs_err_pid251046.log |grep microc >>>> >>>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per >>>> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8, >>>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, >>>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, >>>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb >>>> >>>> >>>> >>>> Windows (Win10, v1809): >>>> >>>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per >>>> core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8, fxsr, >>>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, vzeroupper, >>>> avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht, tsc, >>>> tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt >>>> >>>> >>>> >>>> MacOS (Darwin): >>>> >>>> $ cat hs_err_pid95187.log |grep microc >>>> >>>> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per >>>> core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8, fxsr, >>>> mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt, 
vzeroupper, >>>> avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc, tscinvbit, >>>> bmi1, bmi2, adx, sha, fma, clflush, clflushopt >>>> >>>> >>>> >>>> Thanks, Vladimir >>>> >>>> >>>> Thanks, Vladimir >>>> >>>> From ningsheng.jian at arm.com Tue Jul 21 06:05:48 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Tue, 21 Jul 2020 14:05:48 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> Message-ID: <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> [Ping] Could anyone please help to review this patch, especially for the c2 register allocation part? JBS: https://bugs.openjdk.java.net/browse/JDK-8231441 The latest webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.02 In the latest webrev, we block one predicate register (p7) with all elements preset to TRUE, so that c2 compiled code can use it freely to generate instructions for unpredicated operations. And the split parts: 1) SVE feature detection: http://cr.openjdk.java.net/~njian/8231441/webrev.02-feature 2) c2 register allocation: http://cr.openjdk.java.net/~njian/8231441/webrev.02-ra 3) SVE c2 backend: http://cr.openjdk.java.net/~njian/8231441/webrev.02-c2 The initial RFR which has some descriptions of the patch: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-March/037628.html The description can also be found at: http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt Notes to verify the patch on QEMU user emulation, with an example of compiled code: http://cr.openjdk.java.net/~njian/8231441/running-sve-in-qemu-user.txt Thanks, Ningsheng On 5/27/20 3:23 PM, Ningsheng Jian wrote: > Hi, > > I have rebased this patch with some more comments added. And also > relaxed the instruction matching conditions for 128-bit vector. > > I would appreciate if someone could help to review this. 
> > Whole patch: > http://cr.openjdk.java.net/~njian/8231441/webrev.01 > > Different parts of changes: > > 1) SVE feature detection > http://cr.openjdk.java.net/~njian/8231441/webrev.01-feature > > 2) c2 registion allocation > http://cr.openjdk.java.net/~njian/8231441/webrev.01-ra > > 3) SVE c2 backend > http://cr.openjdk.java.net/~njian/8231441/webrev.01-c2 > > (Or should I split this into different JBS?) > > Thanks, > Ningsheng > > On 3/25/20 2:37 PM, Ningsheng Jian wrote: >> Hi, >> >> Could you please help to review this patch adding AArch64 SVE support? >> It also touches c2 compiler shared code. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 >> Webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.00 >> >> Arm has released new vector ISA extension for AArch64, SVE [1] and >> SVE2 [2]. This patch adds the initial SVE support in OpenJDK. In this >> patch we have: >> >> 1) SVE feature enablement and detection >> 2) SVE vector register allocation support with initial predicate >> register definition >> 3) SVE c2 backend for current SLP based vectorizer. (We also have a POC >> patch of a new vectorizer using SVE predicate-driven loop control, but >> that's still under development.) >> >> SVE register definition >> ======================= >> Unlike other SIMD architectures, SVE allows hardware implementations to >> choose a vector register length from 128 and 2048 bits, multiple of 128 >> bits. So we introduce a new vector type VectorA, i.e. length agnostic >> (scalable) vector type, and Op_VecA for machine vectora register. In the >> meantime, to minimize register allocation code changes, we also take >> advantage of one JIT compiler aspect, that is during the compile time we >> actually know the real hardware SVE vector register size of current >> running machine. So, the register allocator actually knows how many >> register slots an Op_VecA ideal reg requires, and could work fine >> without much modification. 
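The slot arithmetic behind this register definition can be sketched in a few lines. This is an illustrative calculation only, assuming 32-bit register slots and 8 mask slots per vector register as described in this RFR; the function names are hypothetical and are not part of the patch or of adlc.

```cpp
#include <cstdint>

// Assumption from the RFR text: c2 uses one mask bit per 32-bit register slot.
constexpr int kBitsPerSlot = 32;

// Number of 32-bit slots needed to describe a vector of the given width.
constexpr int slots_for_vector_bits(int bits) {
  return bits / kBitsPerSlot;
}

// SVE lengths are multiples of 128 bits, from 128 up to 2048 bits.
constexpr bool is_valid_sve_length(int bits) {
  return bits >= 128 && bits <= 2048 && bits % 128 == 0;
}

// With 8 mask slots per vector register, one 32-bit mask word covers
// 4 registers. A register class using the low 'used_slots' slots of each
// register repeats the same bit pattern in every byte of the word.
constexpr std::uint32_t mask_word(int used_slots) {
  return ((used_slots >= 8) ? 0xFFu : ((1u << used_slots) - 1u)) * 0x01010101u;
}
```

Under these assumptions a full 2048-bit register would need 64 slots, while a 64-bit, a 128-bit and a scalable vector occupy 2, 4 and 8 of the 8 defined slots, giving mask words 0x03030303, 0x0f0f0f0f and 0xffffffff respectively.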
>> >> Since the bottom 128 bits are shared with the NEON, we extend current >> register mask definition of V0-V31 registers. Currently, c2 uses one bit >> mask for a 32-bit register slot, so to define at most 2048 bits we will >> need to add 64 slots in AD file. That's a really large number, and will >> also break current regmask assumption. Considering the SVE vector >> register is architecturally scalable for different sizes, we just define >> double of original NEON vector register slots, i.e. 8 slots: Vx, Vx_H, >> Vx_J ... Vx_O. After adlc, the generated register masks now looks like: >> >> const RegMask _VECTORA_REG_mask( 0x0, 0x0, 0xffffffff, 0xffffffff, >> 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, ... >> >> const RegMask _VECTORD_REG_mask( 0x0, 0x0, 0x3030303, 0x3030303, >> 0x3030303, 0x3030303, 0x3030303, 0x3030303, ... >> >> const RegMask _VECTORX_REG_mask( 0x0, 0x0, 0xf0f0f0f, 0xf0f0f0f, >> 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, ... >> >> And we use SlotsPerVecA to indicate regmask bit size for a VecA register. >> >> Although for physical register allocation, register allocator does not >> need to know the real VecA register size, while doing spill/unspill, >> current register allocation needs to know actual stack slot size to >> store/load VecA registers. SVE is able to do vector size agnostic >> spilling, but to minimize the code changes, as I mentioned before, we >> just let RA know the actual vector register size in current running >> machine, by calling scalable_vector_reg_size(). >> >> In the meantime, since some vector operations do not have unpredicated >> SVE1 instructions, but only predicate version, e.g. vector multiply, >> vector load/store. We have also defined predicate registers in this >> patch, and c2 register allocator will allocate a temp predicate register >> to fulfill the expecting unpredicated operations. And this can also be >> used for future predicate-driven vectorizer. 
This is not efficient for >> now, as we can see many ptrue instructions in the generated code. One >> possible solution I can see, is to block one predicate register, and >> preset it to all true. But to preserve/reinitialize a caller save >> register value cross calls seems risky to work in this patch. I decide >> to defer it to further optimization work. If anyone has any suggestions >> on this, I would appreciate. >> >> SVE feature detection >> ===================== >> Since we may have some compiled code based on the initial detected SVE >> vector register length and the compiled code is compiled only for that >> vector register length, we assume that the SVE vector register length >> will not be changed during the JVM lifetime. However, SVE vector length >> is per-thread and can be changed by system call [3], so we need to make >> sure that each jni call will not change the sve vector length. >> >> Currently, we verify the SVE vector register length on each JNI return, >> and if an SVE vector length change is detected, jvm simply reports error >> and stops running. The VM running vector length can also be set by >> existing VM option MaxVectorSize with c2 enabled. If MaxVectorSize is >> specified not the same as system default sve vector length (in >> /proc/sys/abi/sve_default_vector_length), JVM will set current process >> sve vector length to the specified vector length. >> >> Compiled code >> ============= >> We have added all current c2 backend codegen on par with NEON, but only >> for vector length larger than 128-bit. >> >> On a 1024 bit SVE environment, for the following simple loop with int >> array element type:
>>
>>     for (int i = 0; i < LENGTH; i++) {
>>       c[i] = a[i] + b[i];
>>     }
>>
>> c2 generated loop:
>>
>>     0x0000ffff811c0820:   sbfiz   x11, x10, #2, #32
>>     0x0000ffff811c0824:   add     x13, x18, x11
>>     0x0000ffff811c0828:   add     x14, x1, x11
>>     0x0000ffff811c082c:   add     x13, x13, #0x10
>>     0x0000ffff811c0830:   add     x14, x14, #0x10
>>     0x0000ffff811c0834:   add     x11, x0, x11
>>     0x0000ffff811c0838:   add     x11, x11, #0x10
>>     0x0000ffff811c083c:   ptrue   p1.s    // To be optimized
>>     0x0000ffff811c0840:   ld1w    {z16.s}, p1/z, [x14]
>>     0x0000ffff811c0844:   ptrue   p0.s
>>     0x0000ffff811c0848:   ld1w    {z17.s}, p0/z, [x13]
>>     0x0000ffff811c084c:   add     z16.s, z17.s, z16.s
>>     0x0000ffff811c0850:   ptrue   p1.s
>>     0x0000ffff811c0854:   st1w    {z16.s}, p1, [x11]
>>     0x0000ffff811c0858:   add     w10, w10, #0x20
>>     0x0000ffff811c085c:   cmp     w10, w12
>>     0x0000ffff811c0860:   b.lt    0x0000ffff811c0820
>>
>> Test >> ==== >> Currently, we don't have real hardware to verify SVE features (and >> performance). But we have run jtreg tests with SVE in some emulators. On >> QEMU system emulator, which has SVE emulation support, jtreg tier1-3 >> passed with different vector sizes. We've also verified it with full >> jtreg tests without SVE on both x86 and AArch64, to make sure that >> there's no regression. >> >> The patch has also been applied to Vector API code base, and verified on >> emulator. In Vector API, there are more vector related tests and is more >> possible to generate vector instructions by intrinsification. >> >> A simple test can also run in QEMU user emulation, e.g. >> >> $ qemu-aarch64 -cpu max,sve-max-vq=2 java -XX:UseSVE=1 SIMD >> >> ( >> To run it in user emulation mode, we will need to bypass SVE feature >> detection code in this patch. E.g.
apply: >> http://cr.openjdk.java.net/~njian/8231441/user-emulation.patch >> )l >> >> Others >> ====== >> Since this patch is a bit large, I've also split it into 3 parts, for >> easy review: >> >> 1) SVE feature detection >> http://cr.openjdk.java.net/~njian/8231441/webrev.00-feature >> >> 2) c2 registion allocation >> http://cr.openjdk.java.net/~njian/8231441/webrev.00-ra >> >> 3) SVE c2 backend >> http://cr.openjdk.java.net/~njian/8231441/webrev.00-c2 >> >> Part of this patch has been contributed by Joshua Zhu and Yang Zhang. >> >> Refs >> ==== >> [1] https://developer.arm.com/docs/ddi0584/latest >> [2] https://developer.arm.com/docs/ddi0602/latest >> [3] https://www.kernel.org/doc/Documentation/arm64/sve.txt >> >> Thanks, >> Ningsheng >> > From tobias.hartmann at oracle.com Tue Jul 21 06:17:31 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 21 Jul 2020 08:17:31 +0200 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <104285e4-811a-5314-54de-d6461320a76c@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <8522a69e-e538-2cc9-5364-887e450fc653@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> <3d5fc552-d3e2-494b-e921-c65967af8207@oracle.com> <104285e4-811a-5314-54de-d6461320a76c@oracle.com> Message-ID: <36de481a-d4b1-d9cf-3632-db1f82c5baba@oracle.com> +1 Best regards, Tobias On 20.07.20 22:05, Vladimir Kozlov wrote: > Good. 
> > Thanks, > Vladimir > > On 7/20/20 12:22 PM, Jamsheed C M wrote: >> Hi Vladimir, >> >> Added both the tests >> >> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.04/ >> >> Best Regards, >> >> Jamsheed >> >> On 20/07/2020 23:31, Vladimir Kozlov wrote: >>> I asked to have 2 different test methods to reproduce 2 cases separately. >>> You can't mix them. >>> >>> Regards, >>> Vladimir >>> >>> >>> On 7/20/20 6:48 AM, Jamsheed C M wrote: >>>> Hi Tobias, >>>> On 20/07/2020 19:05, Tobias Hartmann wrote: >>>>> Hi Jamsheed, >>>>> >>>>> On 20.07.20 15:30, Jamsheed C M wrote: >>>>>> Revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ >>>>> You don't need #ifdef ASSERT in escape.cpp:2252. Otherwise looks good to me! No new webrev >>>>> required. >>>> >>>> Missed removing it. Thank you for the review. >>>> >>>> Best regards, >>>> >>>> Jamsheed >>>> >>>>> >>>>> Best regards, >>>>> Tobias From jamsheed.c.m at oracle.com Tue Jul 21 06:40:32 2020 From: jamsheed.c.m at oracle.com (Jamsheed C M) Date: Tue, 21 Jul 2020 12:10:32 +0530 Subject: [15] RFR: 8242895: failed: sanity at src/hotspot/share/opto/escape.cpp:2361 In-Reply-To: <36de481a-d4b1-d9cf-3632-db1f82c5baba@oracle.com> References: <4a389db7-ebce-e2b8-4691-2ce6625e2709@oracle.com> <46144d6d-5714-05ad-a263-01507db937cc@oracle.com> <7a361c29-4771-9ed4-1542-b3f68a5726f3@oracle.com> <6dc4c99b-1d90-09f1-60d1-fb2caf981266@oracle.com> <6b4e4dda-01d4-37d0-5403-a4f5481e5bf0@oracle.com> <32d7fb64-75a5-7add-d496-df33cfaefabf@oracle.com> <4ffa8190-d57e-a9a2-e508-0d98035a34c6@oracle.com> <0fa9d47a-e568-bf22-4c49-74c926ae9f14@oracle.com> <16aead29-6788-a7e8-bf6e-ae2b56fdb9dc@oracle.com> <2805861f-4760-c768-9b1e-55cd6af1cde1@oracle.com> <3d5fc552-d3e2-494b-e921-c65967af8207@oracle.com> <104285e4-811a-5314-54de-d6461320a76c@oracle.com> <36de481a-d4b1-d9cf-3632-db1f82c5baba@oracle.com> Message-ID: <6baf448a-f44c-cd78-d7d0-121589a3c9cf@oracle.com> Thank you for the reviews. 
Initiated the Fix Request for JDK15, testing links are added in JBS. Best regards, Jamsheed On 21/07/2020 11:47, Tobias Hartmann wrote: > +1 > > Best regards, > Tobias > > On 20.07.20 22:05, Vladimir Kozlov wrote: >> Good. >> >> Thanks, >> Vladimir >> >> On 7/20/20 12:22 PM, Jamsheed C M wrote: >>> Hi Vladimir, >>> >>> Added both the tests >>> >>> http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.04/ >>> >>> Best Regards, >>> >>> Jamsheed >>> >>> On 20/07/2020 23:31, Vladimir Kozlov wrote: >>>> I asked to have 2 different test methods to reproduce 2 cases separately. >>>> You can't mix them. >>>> >>>> Regards, >>>> Vladimir >>>> >>>> >>>> On 7/20/20 6:48 AM, Jamsheed C M wrote: >>>>> Hi Tobias, >>>>> On 20/07/2020 19:05, Tobias Hartmann wrote: >>>>>> Hi Jamsheed, >>>>>> >>>>>> On 20.07.20 15:30, Jamsheed C M wrote: >>>>>>> Revised webrev: http://cr.openjdk.java.net/~jcm/8242895/webrev_fix_EA.03/ >>>>>> You don't need #ifdef ASSERT in escape.cpp:2252. Otherwise looks good to me! No new webrev >>>>>> required. >>>>> Missed removing it. Thank you for the review. >>>>> >>>>> Best regards, >>>>> >>>>> Jamsheed >>>>> >>>>>> Best regards, >>>>>> Tobias From sandhya.viswanathan at intel.com Tue Jul 21 16:28:54 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Tue, 21 Jul 2020 16:28:54 +0000 Subject: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 In-Reply-To: References: <29dd9cde-48c8-915f-fa28-26312c7af17a@oracle.com> Message-ID: Hi VladimirK, Please let me know if I can push this onto jdk/jdk. Best Regards, Sandhya -----Original Message----- From: hotspot-compiler-dev On Behalf Of Vladimir Kozlov Sent: Monday, July 20, 2020 3:37 PM To: Ivanov, Vladimir A Cc: Hotspot dev runtime ; hotspot-compiler-dev at openjdk.java.net Subject: Re: [16] RFR(S) 8249672: Include microcode revision in features_string on x86 Looks good. Passed my tier1 testing. 
Thanks, Vladimir On 7/20/20 10:12 AM, Ivanov, Vladimir A wrote: > HI, > The updated patch available as > http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.03/ > It use the ?fgets? instead of ?getline? to use local memory. > The tier1 tests passed on the release and fastdebug builds on Linux and fastdebug builds on MacOS systems. > Testing results same for patched and non-patched builds. > > Thanks, Vladmir > > From: Thomas St?fe > Sent: Friday, July 17, 2020 10:25 PM > To: Ivanov, Vladimir A > Cc: Vladimir Kozlov ; Hotspot dev runtime > ; > hotspot-compiler-dev at openjdk.java.net > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in > features_string on x86 > > Oh, sorry, you are right :( > > I was under the assumption you wanted to call os::cpu_microcode_revision() directly from within VMError::report(). During initialization using c-heap like this should not be a problem and you can forget about 9/10ths of what I wrote, sorry. > > In that case your original variant is fine, my only suggestion would be to clearly mark the free as ::free() with a comment to prevent someone from correcting it to os::free. > > Thank you, > > Thomas > > > > On Sat, Jul 18, 2020 at 7:08 AM Ivanov, Vladimir A > wrote: > Hi, > seems, this info created during initialization phase. Is it correct? Collect or parse common info at the crash point usually not a good idea. During initialization usage of the c-heap not a problem. > The ?::free? work OK here. At least tier1 test produce same results for patched and non-patched builds. But these tests not generates real case for hs_err files. > It looks like 2k byte array enough for the one record for CPU from cpuinfo file. Will update code to use local buffer. 
> > Thanks, Vladimir > > From: Thomas St?fe > > > Sent: Friday, July 17, 2020 9:42 PM > To: Ivanov, Vladimir A > > > Cc: Vladimir Kozlov > >; > Hotspot dev runtime > dk.java.net>>; > hotspot-compiler-dev at openjdk.java.net jdk.java.net> > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in > features_string on x86 > > Hi, > > yes, you must use the raw free here (for the same reason we cannot pass in an os::malloc() allocated buffer to getline, since if it were to resize it would use raw ::realloc() internally and crash the same way). > > But as I wrote in my first mail to the original thread, I would not use c-heap memory at all, since this function is used during crash reporting in the signal handler and the c-heap may be corrupted. > > It the max line length of /proc/cpu can be reliably predicted (so that getline wont realloc()) I would pass a stack allocated buffer into getline. If not, I would not use getline() at all but rewrite this, probably using fgets(). > > Cheers, Thomas > > > > > On Sat, Jul 18, 2020 at 1:24 AM Ivanov, Vladimir A > wrote: > Thanks, I expected the C's functions here. Let's wait a little bit for Runtime team and update work with buffer. > > Thanks, Vladimir > > -----Original Message----- > From: Vladimir Kozlov > > > Sent: Friday, July 17, 2020 4:17 PM > To: Thomas St?fe > >; Ivanov, > Vladimir A > > > Cc: Hotspot dev runtime > dk.java.net>>; > hotspot-compiler-dev at openjdk.java.net jdk.java.net> > Subject: Re: [16] RFR(S) 8249672: Include microcode revision in > features_string on x86 > > I think the issue is 'line' buffer is allocated by libc getline() and os:free() which is HotSpot function [1] does not know about it. You need C's ::free() or use HS's os::malloc() to allocate 'line' buffer. > > Someone from Runtime may suggest what is the best for this case. 
>
> Thanks,
> Vladimir K
>
> [1] http://hg.openjdk.java.net/jdk/jdk/file/14f465f62984/src/hotspot/share/runtime/os.cpp#l792
>
> On 7/17/20 4:03 PM, Vladimir Kozlov wrote:
>> I updated subject to our formal review request format (JDK version, RFE's id and subject).
>>
>> I moved RFE to runtime group as Thomas said:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8249672
>>
>> Submitted tier1 testing to build on all our supported platforms. And debug builds on linux failed:
>>
>> # SIGSEGV (0xb) at pc=0x0000146fc6af4b0b, pid=9715, tid=9718
>> # V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb
>>
>> V [libjvm.so+0xc12b0b] GuardedMemory::print_on(outputStream*) const+0xeb
>> V [libjvm.so+0x13c898a] verify_memory(void*)+0x26a
>> V [libjvm.so+0x13cd30b] os::free(void*)+0x5b
>> V [libjvm.so+0x13e5598] os::cpu_microcode_revision()+0xc8
>> V [libjvm.so+0x17d314c] VM_Version::get_processor_features()+0x76c
>> V [libjvm.so+0x17d6ead] VM_Version::initialize()+0x10d
>> V [libjvm.so+0x17ce6c6] VM_Version_init()+0x26
>> V [libjvm.so+0xcb2895] init_globals()+0x55
>> V [libjvm.so+0x16dde63] Threads::create_vm(JavaVMInitArgs*, bool*)+0x2d3
>>
>>
>> Regards,
>> Vladimir K
>>
>> On 7/17/20 3:02 PM, Thomas Stüfe wrote:
>>> Hi Vladimir,
>>>
>>> On Fri, Jul 17, 2020 at 11:57 PM Ivanov, Vladimir A <vladimir.a.ivanov at intel.com> wrote:
>>>
>>>>> +#if defined(IA32) || defined(AMD64)
>>>>>
>>>>> Is that not synonymous with x86?
>>>>
>>>> This pattern was copied from the method 'print_model_name_and_flags'
>>>> (file os/linux/os_linux.cpp).
>>>>
>>>> This method also reads the '/proc/cpuinfo' file and I reused it as
>>>> a 'template' for the new method.
>>>>
>>>> It is better to use one pattern to work with exactly the same file but
>>>> in general you are right.
>>>>
>>>> The X86 is defined in the file ./share/utilities/macros.hpp as:
>>>>
>>>> #if defined(IA32) || defined(AMD64)
>>>> #define X86
>>>> #define X86_ONLY(code) code
>>>> #define NOT_X86(code)
>>>>
>>>> The question here: could I delete these 'ifdefs' since this method
>>>> should work on x86 only?
>>>>
>>>
>>> os_linux_x86.cpp is compiled for x86 platforms only, whereas
>>> os_linux.cpp is shared among all architectures.
>>>
>>> So, in the former you do not need to exclude non-x86 architectures.
>>>
>>> Cheers, Thomas
>>>
>>>> Thanks, Vladimir
>>>>
>>>> *From:* Thomas Stüfe
>>>> *Sent:* Friday, July 17, 2020 2:26 PM
>>>> *To:* Ivanov, Vladimir A; Hotspot dev runtime
>>>> *Cc:* hotspot-compiler-dev at openjdk.java.net
>>>> *Subject:* Re: add microcode version to the hs_err files
>>>>
>>>> On Fri, Jul 17, 2020 at 11:19 PM Thomas Stüfe wrote:
>>>>
>>>> Hi Vladimir,
>>>>
>>>> I think this would be more suited to hotspot-runtime.
>>>>
>>>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/src/hotspot/os_cpu/linux_x86/os_linux_x86.cpp.udiff.html
>>>>
>>>> +#if defined(IA32) || defined(AMD64)
>>>>
>>>> Is that not synonymous with x86?
>>>>
>>>> + while ((read = getline(&line, &len, fp)) != -1) {
>>>> +   if (len > 10 && strstr(line, "microcode") != NULL) {
>>>> +     char* rev = strchr(line, ':');
>>>> +     if (rev != NULL) sscanf(rev + 1, "%x", &result);
>>>> +     break;
>>>> +   }
>>>> + }
>>>> + free(line);
>>>>
>>>> Not sure this works as intended. At the first call to getline() it
>>>> will allocate a line buffer for you and return it. That buffer will
>>>> be as large as the first line you happen to read. You then pass
>>>> that same buffer into getline to fetch the next lines, but what if
>>>> those are longer than the first?
>>>>
>>>> Forget that point, getline calls realloc() on the line buffer to
>>>> resize it, so this should be okay.
>>>>
>>>> Thanks, Thomas
>>>>
>>>> But anyway it would be better to pass a simple caller provided
>>>> buffer in - stack allocated. Since this function is called at crash
>>>> time and the C heap could be corrupted.
>>>>
>>>> Cheers, Thomas
>>>>
>>>> On Fri, Jul 17, 2020 at 10:22 PM Ivanov, Vladimir A <vladimir.a.ivanov at intel.com> wrote:
>>>>
>>>> Hello,
>>>>
>>>> could you please review the patch
>>>> http://cr.openjdk.java.net/~sviswanathan/Vladimir/8249672/webrev.00/
>>>>
>>>> This patch adds the microcode version for different OSes, which may be
>>>> useful in the issue resolution process.
>>>>
>>>> The reported microcode version for different OSes looks as follows:
>>>>
>>>> Linux (RHEL7.7):
>>>>
>>>> # cat hs_err_pid251046.log |grep microc
>>>>
>>>> CPU: total 112 (initial active 112) (28 cores per cpu, 2 threads per
>>>> core) family 6 model 85 stepping 4 microcode 0x200005e, cmov, cx8,
>>>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt,
>>>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht,
>>>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt, clwb
>>>>
>>>> Windows (Win10, v1809):
>>>>
>>>> CPU: total 4 (initial active 4) (2 cores per cpu, 2 threads per
>>>> core) family 6 model 142 stepping 9 microcode 0xb4, cmov, cx8,
>>>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt,
>>>> vzeroupper, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, ht,
>>>> tsc, tscinvbit, bmi1, bmi2, adx, fma, clflush, clflushopt
>>>>
>>>> MacOS (Darwin):
>>>>
>>>> $ cat hs_err_pid95187.log |grep microc
>>>>
>>>> CPU: total 8 (initial active 8) (4 cores per cpu, 2 threads per
>>>> core) family 6 model 126 stepping 5 microcode 0x78, cmov, cx8,
>>>> fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse4.2, popcnt,
>>>> vzeroupper, avx, avx2, aes, clmul, erms, 3dnowpref, lzcnt, ht, tsc,
>>>> tscinvbit, bmi1, bmi2, adx, sha, fma, clflush, clflushopt
>>>>
>>>> Thanks, Vladimir
>>>>

From coleen.phillimore at oracle.com  Tue Jul 21 17:57:36 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 21 Jul 2020 13:57:36 -0400
Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage
In-Reply-To: <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com>
 <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com>
 <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
Message-ID: <2b52127c-8637-ed24-2a63-0b1372d4bff0@oracle.com>


One note below:

On 7/20/20 1:53 AM, David Holmes wrote:
> Hi Kim,
>
> Thanks for looking at this.
>
> Updated webrev at:
>
> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/
>
> On 20/07/2020 3:22 pm, Kim Barrett wrote:
>>> On Jul 20, 2020, at 12:16 AM, David Holmes wrote:
>>>
>>> Subject line got truncated by accident ...
>>>
>>> On 20/07/2020 11:06 am, David Holmes wrote:
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650
>>>> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/
>>>> This is a simple cleanup that touches files across a number of VM
>>>> areas - hence the cross-post.
>>>> Whilst working on a different JNI fix I noticed that in most cases
>>>> in jni.cpp we were using the following form of make_local:
>>>> JNIHandles::make_local(env, obj);
>>>> and what that form does is first extract the thread from the JNIEnv:
>>>> JavaThread* thread = JavaThread::thread_from_jni_environment(env);
>>>> return thread->active_handles()->allocate_handle(obj);
>>>> but there is also another, faster, variant for when you already
>>>> have the "thread":
>>>> jobject JNIHandles::make_local(Thread* thread, oop obj) {
>>>>    return thread->active_handles()->allocate_handle(obj);
>>>> }
>>>> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY,
>>>> WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread
>>>> from the JNIEnv:
>>>>      JavaThread* thread=JavaThread::thread_from_jni_environment(env);
>>>> and further defined:
>>>>      Thread* THREAD = thread;
>>>> so we always already have direct access to the "thread" available
>>>> (or indirect via TRAPS), and in fact we can end up removing the
>>>> make_local(JNIEnv* env, oop obj) variant altogether.
>>>> Along the way I spotted some related issues with unnecessary use of
>>>> Thread::current() when it is already available from TRAPS, and some
>>>> other cases where we extracted the JNIEnv from a thread only to
>>>> later extract the thread from the JNIEnv.
>>>> Testing: tiers 1 - 3
>>>> Thanks,
>>>> David
>>>> -----
>>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/share/classfile/javaClasses.cpp
>>   439     JNIEnv *env = thread->jni_environment();
>>
>> Since env is no longer used on the next line, move this down to where
>> it is used, at line 444.
>
> Fixed.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/share/classfile/verifier.cpp
>>   299   JNIEnv *env = thread->jni_environment();
>>
>> env now seems to only be used at line 320.  Move this closer.
>
> Fixed.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/share/prims/jni.cpp
>>   743     result = JNIHandles::make_local(THREAD, result_handle());
>>
>> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where
>> previously it just used "thread". Maybe this change shouldn't be made?
>> Or can the other uses be changed to THREAD for consistency?
>
> "thread" and "THREAD" are interchangeable for anything expecting a
> "Thread*" (and somewhat surprisingly a number of API's that only work
> for JavaThreads actually take a Thread*. :( ). I had choice between
> trying to be file-wide consistent with the make_local calls, versus
> local-code consistent, and used THREAD as it is available in both
> JNI_ENTRY and via TRAPS. But I can certainly make a local change to
> "thread" for local consistency.
>
>> ------------------------------------------------------------------------------
>>
>> src/hotspot/share/prims/jvm.cpp
>>
>> The calls to JvmtiExport::post_vm_object_alloc have to use "thread"
>> instead of "THREAD", even though other places nearby are using
>> "THREAD".  That inconsistency is kind of unfortunate, but doesn't seem
>> easily avoidable.
>
> Everything that uses THREAD in a JVM_ENTRY method can be changed to
> use "thread" instead. But I'm not sure it's a consistency worth
> pursuing at least as part of these changes (there are likely similar
> issues with most of the touched files).

The thing I like about THREAD if it's available is that it's assumed to be *always* the current thread, so I have to wonder no further. Also, "thread" is generally the current thread too, but if you have a choice, my preference would be to use THREAD. I wouldn't want to see this changed.

Thanks,
Coleen

>
> Thanks,
> David
>
>> ------------------------------------------------------------------------------
>>
>>

From coleen.phillimore at oracle.com  Tue Jul 21 18:01:36 2020
From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com)
Date: Tue, 21 Jul 2020 14:01:36 -0400
Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage
In-Reply-To: <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com>
 <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com>
 <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
Message-ID: <82ac807a-1492-9ac0-570a-d08b1dc93e09@oracle.com>


This looks like a nice cleanup.

http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/src/hotspot/share/runtime/jniHandles.cpp.udiff.html

I'm wondering why you took out the NULL return for make_local() without a thread argument? Here you may call Thread::current() unnecessarily.

 jobject JNIHandles::make_local(oop obj) {
-  if (obj == NULL) {
-    return NULL;                // ignore null handles
-  } else {
-    Thread* thread = Thread::current();
-    assert(oopDesc::is_oop(obj), "not an oop");
-    assert(!current_thread_in_native(), "must not be in native");
-    return thread->active_handles()->allocate_handle(obj);
-  }
+  return make_local(Thread::current(), obj);
 }

Beyond the scope of this fix, but it'd be cool to not have a version that doesn't take thread, since there may be many more callers that already have Thread::current().

Coleen

From serguei.spitsyn at oracle.com  Tue Jul 21 19:25:31 2020
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 21 Jul 2020 12:25:31 -0700
Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage
In-Reply-To: <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com>
 <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com>
 <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
Message-ID: <1256c311-76cf-2d59-2e12-c79516728d34@oracle.com>

Hi David,

The fix looks good to me.

Thanks,
Serguei

From david.holmes at oracle.com  Wed Jul 22 02:34:02 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 22 Jul 2020 12:34:02 +1000
Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage
In-Reply-To: <1256c311-76cf-2d59-2e12-c79516728d34@oracle.com>
References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com>
 <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com>
 <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
 <1256c311-76cf-2d59-2e12-c79516728d34@oracle.com>
Message-ID: <63ff96e0-bcba-5041-0844-fb55b4fbfc1f@oracle.com>

Thanks Serguei!

David

From david.holmes at oracle.com  Wed Jul 22 02:46:26 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 22 Jul 2020 12:46:26 +1000
Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage
In-Reply-To: <328fb322-5b14-968b-7b13-4b449a8d98fd@oracle.com>
References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com>
 <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com>
 <6166e191-c954-70e5-5595-956a0c145d10@oracle.com>
 <328fb322-5b14-968b-7b13-4b449a8d98fd@oracle.com>
Message-ID: <4d763c6f-96e1-5c9b-8739-a441ee3b4b31@oracle.com>

Hi Dan,

On 21/07/2020 3:07 am, Daniel D. Daugherty wrote:
> On 7/20/20 1:53 AM, David Holmes wrote:
>> Hi Kim,
>>
>> Thanks for looking at this.
>>
>> Updated webrev at:
>>
>> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/
>
> I like this cleanup very much!

Thanks for looking at it.

> src/hotspot/share/classfile/javaClasses.cpp
>     No comments.
>
> src/hotspot/share/classfile/verifier.cpp
>     L298:   JavaThread* thread = (JavaThread*)THREAD;
>     L307:   ResourceMark rm(THREAD);
>         Since we've gone to the trouble of creating the 'thread' variable,
>         I would prefer it to be used instead of THREAD where possible.

Okay I made this change as we already use "thread" throughout that method.

> src/hotspot/share/jvmci/jvmciCompilerToVM.cpp
>     L1021:   HandleMark hm;
>         Can this be 'hm(THREAD)'? (Not your problem, but while you're
>         in that file?)

It probably could but there are around 8 such uses and I don't want to expand this change any further than necessary for the current issue.
I filed a general RFE for things that should take advantage of having a current thread reference already (that will encompass Coleen's make_local(obj) change as well). https://bugs.openjdk.java.net/browse/JDK-8249837 > src/hotspot/share/prims/jni.cpp > ??? No comments. > > src/hotspot/share/prims/jvm.cpp > ??? L140: ? ResourceMark rm; > ??????? Can this be 'rm(THREAD)'? (Not your problem, but while you're > ??????? in that file?) > > ??? L611: ? Handle stackStream_h(THREAD, > JNIHandles::resolve_non_null(stackStream)); > ??? L617: ? objArrayHandle frames_array_h(THREAD, fa); > ??? L626: ? return JNIHandles::make_local(THREAD, result); > ??????? Since we've gone to the trouble of creating the 'jt' variable, > ??????? I would prefer it to be used instead of THREAD where possible. > > ??? L767: ? vframeStream vfst(thread); > ??? L788???????? return (jclass) JNIHandles::make_local(THREAD, > m->method_holder()->java_mirror()); > ??????? Can we use 'thread' on L788? (preferred) > ??????? Can we use 'THREAD' on L767? (less preferred) > > ??? L949: ? ResourceMark rm(THREAD); > ??? L951: ? Handle class_loader (THREAD, JNIHandles::resolve(loader)); > ??? L955: ?????????????????????????? THREAD); > ??? L957: ? Handle protection_domain (THREAD, JNIHandles::resolve(pd)); > ??? L968: ? return (jclass) JNIHandles::make_local(THREAD, > k->java_mirror()); > ??????? Since we've gone to the trouble of creating the 'jt' variable, > ??????? I would prefer it to be used instead of THREAD where possible. As per our slack chat, and the fact you are okay with things as-is, I will forego a more general "consistency" pass as it is unclear what is best here. As Coleen notes THREAD is generally understood to always be the current thread, while thread/jthread/jt could be any old thread in general. Also THREAD usage can highlight a Thread* API, while "thread" has to be used for JavaThread* API - but obviously that needs to be carefully and consistently applied to be useful. :) > ??? L986: ? 
JavaThread* jt = (JavaThread*) THREAD; > ??????? This 'jt' is unused and can be deleted (Not your problem, but > while you're > ??????? in that file?) Fixed (and another case elsewhere). > ??? L1154: ? while (*p != '\0') { > ??? L1155: ????? if (*p == '.') { > ??? L1156: ????????? *p = '/'; > ??? L1157: ????? } > ??? L1158: ????? p++; > ??????? Nit - the indents are wrong on L1155-58. (Not your problem, but > while you're > ??????? in that file?) Fixed > ??? L1389: ? ResourceMark rm(THREAD); > ??? L1446: ??? return JNIHandles::make_local(THREAD, result); > ??? L1460: ? return JNIHandles::make_local(THREAD, result); > ??????? Can we use 'thread' on L1389? (preferred) And then the line you > ??????? touched could also be 'thread' and we'll be consistent in this > ??????? function... Left as-is. > ??? L3287: ? oop jthread = thread->threadObj(); > ??? L3288: ? assert (thread != NULL, "no current thread!"); > ??????? I think the assert is wrong. It should be: > > ??????????? assert(jthread != NULL, "no current thread!"); > > ??????? If 'thread == NULL', then we would have crashed at L3287. > ??????? Also notice that I deleted the extra ' ' before '('. (Not > ??????? your problem, but while you're in that file?) Fixed. I was initially concerned about bootstrapping but it is fine - we ensure we set threadObj() before executing any Java code. > ??? L3289: ? return JNIHandles::make_local(THREAD, jthread); > ??????? Can you use 'thread' instead of 'THREAD' here for consistency? > > ??? L3681: ??? method_handle = Handle(THREAD, > JNIHandles::resolve(method)); > ??? L3682: ??? Handle receiver(THREAD, JNIHandles::resolve(obj)); > ??? L3683: ??? objArrayHandle args(THREAD, > objArrayOop(JNIHandles::resolve(args0))); > ??? L3685: ??? jobject res = JNIHandles::make_local(THREAD, result); > ??????? Can you use 'thread' instead of 'THREAD' here for consistency? > > ??? L3705: ? objArrayHandle args(THREAD, > objArrayOop(JNIHandles::resolve(args0))); > ??? L3707?? 
jobject res = JNIHandles::make_local(THREAD, result); > ??????? Can you use 'thread' instead of 'THREAD' here for consistency? Left as-is. > src/hotspot/share/prims/methodHandles.cpp > ??? No comments. > > src/hotspot/share/prims/methodHandles.hpp > ??? No comments. > > src/hotspot/share/prims/unsafe.cpp > ??? No comments. > > src/hotspot/share/prims/whitebox.cpp > ??? No comments. > > src/hotspot/share/runtime/jniHandles.cpp > ??? No comments. > > src/hotspot/share/runtime/jniHandles.hpp > ??? No comments. > > src/hotspot/share/services/management.cpp > ??? No comments. > > > None of my comments above are "must do". If you choose to make the > changes, a new webrev isn't required, but would be useful for a > sanity check. In addition to the tweak above I found a bunch of make_locasl(obj) usages in jvm.cpp and jni.cpp thanks to Coleen, which I have also fixed. Updated webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev.v3/ If this passes tier 1-3 re-testing then I plan to push. Thanks, David ----- > Thumbs up. > > Dan > > >> >> On 20/07/2020 3:22 pm, Kim Barrett wrote: >>>> On Jul 20, 2020, at 12:16 AM, David Holmes >>>> wrote: >>>> >>>> Subject line got truncated by accident ... >>>> >>>> On 20/07/2020 11:06 am, David Holmes wrote: >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650 >>>>> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/ >>>>> This is a simple cleanup that touches files across a number of VM >>>>> areas - hence the cross-post. 
>>>>> Whilst working on a different JNI fix I noticed that in most cases
>>>>> in jni.cpp we were using the following form of make_local:
>>>>> JNIHandles::make_local(env, obj);
>>>>> and what that form does is first extract the thread from the JNIEnv:
>>>>> JavaThread* thread = JavaThread::thread_from_jni_environment(env);
>>>>> return thread->active_handles()->allocate_handle(obj);
>>>>> but there is also another, faster, variant for when you already
>>>>> have the "thread":
>>>>> jobject JNIHandles::make_local(Thread* thread, oop obj) {
>>>>>    return thread->active_handles()->allocate_handle(obj);
>>>>> }
>>>>> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY,
>>>>> WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread
>>>>> from the JNIEnv:
>>>>>      JavaThread* thread=JavaThread::thread_from_jni_environment(env);
>>>>> and further defined:
>>>>>      Thread* THREAD = thread;
>>>>> so we always already have direct access to the "thread" available
>>>>> (or indirect via TRAPS), and in fact we can end up removing the
>>>>> make_local(JNIEnv* env, oop obj) variant altogether.
>>>>> Along the way I spotted some related issues with unnecessary use of
>>>>> Thread::current() when it is already available from TRAPS, and some
>>>>> other cases where we extracted the JNIEnv from a thread only to
>>>>> later extract the thread from the JNIEnv.
>>>>> Testing: tiers 1 - 3
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/classfile/javaClasses.cpp
>>>   439     JNIEnv *env = thread->jni_environment();
>>>
>>> Since env is no longer used on the next line, move this down to where
>>> it is used, at line 444.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/classfile/verifier.cpp
>>>   299
JNIEnv *env = thread->jni_environment();
>>>
>>> env now seems to only be used at line 320.  Move this closer.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/prims/jni.cpp
>>>   743     result = JNIHandles::make_local(THREAD, result_handle());
>>>
>>> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where
>>> previously it just used "thread". Maybe this change shouldn't be made?
>>> Or can the other uses be changed to THREAD for consistency?
>>
>> "thread" and "THREAD" are interchangeable for anything expecting a
>> "Thread*" (and somewhat surprisingly a number of API's that only work
>> for JavaThreads actually take a Thread*. :( ). I had choice between
>> trying to be file-wide consistent with the make_local calls, versus
>> local-code consistent, and used THREAD as it is available in both
>> JNI_ENTRY and via TRAPS. But I can certainly make a local change to
>> "thread" for local consistency.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/prims/jvm.cpp
>>>
>>> The calls to JvmtiExport::post_vm_object_alloc have to use "thread"
>>> instead of "THREAD", even though other places nearby are using
>>> "THREAD".  That inconsistency is kind of unfortunate, but doesn't seem
>>> easily avoidable.
>>
>> Everything that uses THREAD in a JVM_ENTRY method can be changed to
>> use "thread" instead. But I'm not sure it's a consistency worth
>> pursuing at least as part of these changes (there are likely similar
>> issues with most of the touched files).
>>
>> Thanks,
>> David
>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>

From david.holmes at oracle.com Wed Jul 22 02:46:56 2020
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 22 Jul 2020 12:46:56 +1000
Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage
In-Reply-To: <82ac807a-1492-9ac0-570a-d08b1dc93e09@oracle.com>
References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> <82ac807a-1492-9ac0-570a-d08b1dc93e09@oracle.com>
Message-ID: <4ca86ddb-8a73-783c-0b3f-e8003f7160a3@oracle.com>

Hi Coleen,

On 22/07/2020 4:01 am, coleen.phillimore at oracle.com wrote:
>
> This looks like a nice cleanup.

Thanks for looking at this.

> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/src/hotspot/share/runtime/jniHandles.cpp.udiff.html
>
> I'm wondering why you took out the NULL return for make_local() without
> a thread argument?  Here you may call Thread::current() unnecessarily.
>
> jobject JNIHandles::make_local(oop obj) {
> - if (obj == NULL) {
> - return NULL; // ignore null handles
> - } else {
> - Thread* thread = Thread::current();
> - assert(oopDesc::is_oop(obj), "not an oop");
> - assert(!current_thread_in_native(), "must not be in native");
> - return thread->active_handles()->allocate_handle(obj);
> - }
> + return make_local(Thread::current(), obj);
> }

I was simply using a standard call forwarding pattern to avoid code duplication. I suspect passing NULL is very rare so the unnecessary Thread::current() call is not an issue. Otherwise, if not NULL, the NULL check would happen twice (unless I keep the duplicated implementations).

> Beyond the scope of this fix, but it'd be cool to not have a version
> that doesn't take thread, since there may be many more callers that
> already have Thread::current().

Indeed!
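The forwarding trade-off discussed in this exchange can be looked at in isolation. The following is only a minimal sketch, not HotSpot code: Obj, Handle, FakeThread, current_thread() and g_current_lookups are hypothetical stand-ins for oop, jobject, Thread, Thread::current() and a lookup counter, and it assumes (as the reply suggests) that the thread-taking overload is the one that performs the null check.

```cpp
#include <vector>

// Hypothetical stand-ins for oop, jobject and Thread; not HotSpot types.
struct Obj { int value; };
using Handle = const Obj*;

struct FakeThread {
  std::vector<Handle> handles;  // plays the role of the active handle block
};

// Counts how often the "current thread" had to be looked up.
static int g_current_lookups = 0;

// Stand-in for Thread::current().
FakeThread* current_thread() {
  static thread_local FakeThread t;
  ++g_current_lookups;
  return &t;
}

// Variant for callers that already hold the thread. The null check lives
// here (an assumption of this sketch), so it runs exactly once per call.
Handle make_local(FakeThread* thread, const Obj* obj) {
  if (obj == nullptr) {
    return nullptr;  // ignore null handles
  }
  thread->handles.push_back(obj);
  return obj;
}

// Convenience variant: plain forwarding, no duplicated logic. The cost is
// one thread lookup even when obj turns out to be null.
Handle make_local(const Obj* obj) {
  return make_local(current_thread(), obj);
}
```

A caller that already has the thread at hand uses the two-argument form and never pays for the lookup; only the one-argument form does, including in the (presumably rare) null case.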
And in fact I had missed a number of these in jvm.cpp and jni.cpp so I have fixed those. I've filed an RFE for other cases:

https://bugs.openjdk.java.net/browse/JDK-8249837

Updated webrev:

http://cr.openjdk.java.net/~dholmes/8249650/webrev.v3/

If this passes tier 1-3 re-testing then I plan to push.

Thanks,
David
-----

> Coleen
>
>
> On 7/20/20 1:53 AM, David Holmes wrote:
>> Hi Kim,
>>
>> Thanks for looking at this.
>>
>> Updated webrev at:
>>
>> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/
>>
>> On 20/07/2020 3:22 pm, Kim Barrett wrote:
>>>> On Jul 20, 2020, at 12:16 AM, David Holmes wrote:
>>>>
>>>> Subject line got truncated by accident ...
>>>>
>>>> On 20/07/2020 11:06 am, David Holmes wrote:
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650
>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/
>>>>> This is a simple cleanup that touches files across a number of VM
>>>>> areas - hence the cross-post.
>>>>> Whilst working on a different JNI fix I noticed that in most cases
>>>>> in jni.cpp we were using the following form of make_local:
>>>>> JNIHandles::make_local(env, obj);
>>>>> and what that form does is first extract the thread from the JNIEnv:
>>>>> JavaThread* thread = JavaThread::thread_from_jni_environment(env);
>>>>> return thread->active_handles()->allocate_handle(obj);
>>>>> but there is also another, faster, variant for when you already
>>>>> have the "thread":
>>>>> jobject JNIHandles::make_local(Thread* thread, oop obj) {
>>>>>    return thread->active_handles()->allocate_handle(obj);
>>>>> }
>>>>> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY,
>>>>> WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread
>>>>> from the JNIEnv:
>>>>>      JavaThread* thread=JavaThread::thread_from_jni_environment(env);
>>>>> and further defined:
>>>>>
Thread* THREAD = thread;
>>>>> so we always already have direct access to the "thread" available
>>>>> (or indirect via TRAPS), and in fact we can end up removing the
>>>>> make_local(JNIEnv* env, oop obj) variant altogether.
>>>>> Along the way I spotted some related issues with unnecessary use of
>>>>> Thread::current() when it is already available from TRAPS, and some
>>>>> other cases where we extracted the JNIEnv from a thread only to
>>>>> later extract the thread from the JNIEnv.
>>>>> Testing: tiers 1 - 3
>>>>> Thanks,
>>>>> David
>>>>> -----
>>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/classfile/javaClasses.cpp
>>>   439     JNIEnv *env = thread->jni_environment();
>>>
>>> Since env is no longer used on the next line, move this down to where
>>> it is used, at line 444.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/classfile/verifier.cpp
>>>   299   JNIEnv *env = thread->jni_environment();
>>>
>>> env now seems to only be used at line 320.  Move this closer.
>>
>> Fixed.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/prims/jni.cpp
>>>   743     result = JNIHandles::make_local(THREAD, result_handle());
>>>
>>> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where
>>> previously it just used "thread". Maybe this change shouldn't be made?
>>> Or can the other uses be changed to THREAD for consistency?
>>
>> "thread" and "THREAD" are interchangeable for anything expecting a
>> "Thread*" (and somewhat surprisingly a number of API's that only work
>> for JavaThreads actually take a Thread*. :( ). I had choice between
>> trying to be file-wide consistent with the make_local calls, versus
>> local-code consistent, and used THREAD as it is available in both
>> JNI_ENTRY and via TRAPS.
But I can certainly make a local change to
>> "thread" for local consistency.
>>
>>> ------------------------------------------------------------------------------
>>>
>>> src/hotspot/share/prims/jvm.cpp
>>>
>>> The calls to JvmtiExport::post_vm_object_alloc have to use "thread"
>>> instead of "THREAD", even though other places nearby are using
>>> "THREAD".  That inconsistency is kind of unfortunate, but doesn't seem
>>> easily avoidable.
>>
>> Everything that uses THREAD in a JVM_ENTRY method can be changed to
>> use "thread" instead. But I'm not sure it's a consistency worth
>> pursuing at least as part of these changes (there are likely similar
>> issues with most of the touched files).
>>
>> Thanks,
>> David
>>
>>> ------------------------------------------------------------------------------
>>>
>>>
>

From xxinliu at amazon.com Wed Jul 22 07:12:40 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Wed, 22 Jul 2020 07:12:40 +0000
Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic
In-Reply-To: <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com>
References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com>, <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com>
Message-ID: <1595401959932.33284@amazon.com>

Hi Tobias,

Thank you for reviewing my patch. I made changes according to your feedback. Here is the updated revision:
https://cr.openjdk.java.net/~xliu/8247732/01/webrev/

1. I moved the validation logic for compiler directives to compilerOracle::scan_flag_and_value. If something goes wrong in the parser, the patch will "gracefully" quit the JVM using jvm_exit(1). Is that okay?
Here is the example:

$ ./build/linux-x86_64-server-release/jdk/bin/java -XX:CompileCommand=option,java.util.HashMap::putVal,ccstrlist,DisableIntrinsic,_hello -version
CompileCommand: An error occurred during parsing
Line: option,java/util/HashMap putVal ccstrlist DisableIntrinsic _hello
Error: Unrecognized intrinsic detected in DisableIntrinsic: _hello

Usage: '-XX:CompileCommand=command,"package/Class.method()"'
Use: '-XX:CompileCommand=help' for more information.

2. I removed Method::external_name_short().
3. Fixed the indentation issue.

Test: hotspot:tier1 and gtest:all

thanks,
--lx

________________________________________
From: Tobias Hartmann
Sent: Monday, July 20, 2020 1:23 AM
To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev
Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic

Hi,

On 08.07.20 10:26, Liu, Xin wrote:
> ControlIntrinsic/DisableIntrinsic in compiler directives are more complex. The matched directive is only parsed when hotspot attempts to compile the corresponding method.
>
> I validate at that time and JVM will crash if it does not meet guarantee() statement.

I don't think a guarantee should be used here, i.e. the VM shouldn't crash but we should exit gracefully with an error message. Isn't it possible to piggy-back on the error mechanism in DirectivesParser?

> I added Method::external_name_short() which only returns the shorter method name in the form of "classname::method".
>
> Probably hotspot has had similar code, but I failed to discover. please let me know and I will remove it.

I would just use name_and_sig_as_C_string().
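The "report an error instead of hitting guarantee()" idea discussed in this thread can be illustrated on its own. The following is only a minimal sketch under stated assumptions, not the patch's code and not HotSpot's parser: validate_intrinsic_list and kKnownIntrinsics are hypothetical names, the three entries are merely sample intrinsic names, and the error wording follows the message quoted in the thread.

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical table of recognized intrinsic names; a real VM would derive
// this from its own intrinsic ID list.
static const std::set<std::string> kKnownIntrinsics = {
  "_hashCode", "_getClass", "_clone"
};

// Validates a comma-separated list such as "_hashCode,_clone".
// Returns an empty string on success, otherwise an error message naming the
// first unrecognized entry. Nothing aborts here; the caller decides how to
// report the message and how to exit.
std::string validate_intrinsic_list(const std::string& option_name,
                                    const std::string& list) {
  std::istringstream in(list);
  std::string name;
  while (std::getline(in, name, ',')) {
    if (kKnownIntrinsics.count(name) == 0) {
      return "Unrecognized intrinsic detected in " + option_name + ": " + name;
    }
  }
  return "";
}
```

Returning the message rather than asserting keeps the decision about how to terminate (print usage, then exit with a non-zero status) with the option parser, which is the behaviour asked for in the review.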
jvmFlagConstraintList.cpp:180/181 - Wrong indentation jvmFlagConstraintsCompiler.cpp:388/400 - Maybe change the error message to "Unrecognized intrinsic detected in DisableIntrinsic [...]" Best regards, Tobias From christian.hagedorn at oracle.com Wed Jul 22 08:23:15 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 22 Jul 2020 10:23:15 +0200 Subject: [16] RFR(XS): 8248467: C2: compiler/intrinsics/object/TestClone fails with -XX:+VerifyGraphEdges In-Reply-To: <2ac39054-e9bf-d7a8-2dcc-a954d1a94abf@oracle.com> References: <60c17f38-6cb2-d380-252f-15f8d5151b29@oracle.com> <6a458143-aeee-486b-2bc5-a210779c26dc@oracle.com> <2ac39054-e9bf-d7a8-2dcc-a954d1a94abf@oracle.com> Message-ID: <96a83931-cc69-42bd-43b8-71b688403920@oracle.com> Thank you Tobias for your review! Best regards, Christian On 20.07.20 10:29, Tobias Hartmann wrote: > +1 > > Best regards, > Tobias > > On 15.07.20 19:26, Vladimir Kozlov wrote: >> Good. >> >> Thanks, >> Vladimir >> >> On 7/15/20 8:04 AM, Christian Hagedorn wrote: >>> Hi >>> >>> Please review the following patch: >>> https://bugs.openjdk.java.net/browse/JDK-8248467 >>> http://cr.openjdk.java.net/~chagedorn/8248467/webrev.00/ >>> >>> The assertion is hit due to a MemBarNode whose precedence edge was set to NULL at [1] >>> (result_phi_rawoop is NULL and _resproj is the precedence edge to a MemBarStoreStore). This is >>> possible since JDK-8237581 [2] which can remove some allocations. The fix just adds this >>> additional case in the assert. 
>>>
>>> Best regards,
>>> Christian
>>>
>>>
>>> [1] http://hg.openjdk.java.net/jdk/jdk/file/4a8fd81d64ba/src/hotspot/share/opto/macro.cpp#l1566
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8237581

From christian.hagedorn at oracle.com Wed Jul 22 08:23:44 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Wed, 22 Jul 2020 10:23:44 +0200
Subject: [16] RFR(S): 8247743: Segmentation fault in debug builds due to stack overflow in find_recur with deep graphs
In-Reply-To:
References: <9af7a44c-4267-4900-812c-12aa0c37713a@oracle.com> <518ffdf1-143a-06f3-9aa4-96871d72d024@oracle.com> <9b3a9632-c7bb-2f51-c295-72935add2670@oracle.com> <2f317601-4845-541d-e2ef-ad7735386f1c@oracle.com> <7cfafcb9-6232-5738-6cad-508127fd31e8@oracle.com> <53d1eebe-e85f-58cb-7fba-0baf2ecf8701@oracle.com>
Message-ID:

Thank you Tobias for your review!

Best regards,
Christian

On 20.07.20 10:32, Tobias Hartmann wrote:
> +1
>
> Best regards,
> Tobias
>
> On 15.07.20 19:37, Vladimir Kozlov wrote:
>> Looks good.
>>
>> Thanks,
>> Vladimir K
>>
>> On 7/15/20 12:58 AM, Christian Hagedorn wrote:
>>> Hi Vladimir
>>>
>>> On 14.07.20 20:46, Vladimir Kozlov wrote:
>>>> Can you move next up to where other small find*() methods are defined?:
>>>>
>>>> +Node* Node::find_ctrl(int idx) {
>>>> +  return find(idx, true);
>>>>   }
>>>>
>>>> Also add '// not PRODUCT' comment to #endif for #ifndef PRODUCT. It is hard to find where this
>>>> not product code ends.
>>>>
>>>> Looks good otherwise.
>>>
>>> Thanks, I added these changes in a new webrev:
>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.02/
>>>
>>> Best regards,
>>> Christian
>>>
>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 7/14/20 2:54 AM, Christian Hagedorn wrote:
>>>>> Hi Vladimir
>>>>>
>>>>> On 13.07.20 19:43, Vladimir Kozlov wrote:
>>>>>> Node::find_ctrl() is used during debugging when you want to print and look on only control nodes.
>>>>>> We have several such methods which are only used in debugger.
>>>>> >>>>> I see, I restored this method and changed Node::find() accordingly. I additionally added two >>>>> find_ctrl() methods to make it easier to call it from a debugger (as already present for >>>>> find_node()). >>>>> >>>>>> I suggest to store old_arena() in local var and pass into add_to_worklist(). >>>>>> >>>>>> You can make add_to_worklist() static since you pass node as argument. >>>>> >>>>> Okay. I updated this and the change above in a new webrev: >>>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.01/ >>>>> >>>>> Best regards, >>>>> Christian >>>>> >>>>>> Thanks, >>>>>> Vladimir >>>>>> >>>>>> On 7/13/20 3:09 AM, Christian Hagedorn wrote: >>>>>>> Ping - could anyone review it, please? Thanks! >>>>>>> >>>>>>> Best regards, >>>>>>> Christian >>>>>>> >>>>>>> On 02.07.20 09:33, Christian Hagedorn wrote: >>>>>>>> Hi >>>>>>>> >>>>>>>> Please review the following patch: >>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8247743 >>>>>>>> http://cr.openjdk.java.net/~chagedorn/8247743/webrev.00/ >>>>>>>> >>>>>>>> The testcase creates a deep graph with a lot of nodes on a chain. When running with the >>>>>>>> specified test flags, it recursively calls Node::find_recur() for each node discovered which >>>>>>>> eventually results in a segmentation fault due to a stack overflow (around 10000 calls due to >>>>>>>> such a long chain of nodes). The fix just converts the recursive algorithm into an iterative >>>>>>>> one to avoid a segmentation fault. This is similar to JDK-8246203 [1]. >>>>>>>> >>>>>>>> I additionally removed Node::find_ctrl() and its special handling in the algorithm since it >>>>>>>> is not used. >>>>>>>> >>>>>>>> There is actually another problem with the recursive version. When running the testcase >>>>>>>> without -XX:CompileOnly=compiler/c2/TestFindNode, it will spin forever inside [2] because >>>>>>>> there is a debug_orig node cycle and the loop does not break based on the debug_orig nodes >>>>>>>> being visited. 
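The recursive-to-iterative conversion described in this RFR, together with the cycle problem, can be shown in a small standalone form. The following is only a minimal sketch, not C2's Node::find(): GNode and find_node are hypothetical names, and all edge kinds (inputs, outputs, debug_orig) are lumped into one edge list for brevity.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical graph node; not C2's Node class.
struct GNode {
  int idx;
  std::vector<GNode*> edges;  // all neighbours, lumped together for brevity
};

// Iteratively searches the graph reachable from 'start' for a node with the
// given idx. An explicit worklist replaces the recursion, so a long chain of
// nodes consumes heap instead of native stack. The visited set ensures every
// node is processed at most once, so cycles cannot make the loop spin forever.
GNode* find_node(GNode* start, int idx) {
  if (start == nullptr) {
    return nullptr;
  }
  std::vector<GNode*> worklist{start};
  std::unordered_set<GNode*> visited{start};
  while (!worklist.empty()) {
    GNode* n = worklist.back();
    worklist.pop_back();
    if (n->idx == idx) {
      return n;
    }
    for (GNode* next : n->edges) {
      // insert() reports whether 'next' was newly added to the visited set.
      if (next != nullptr && visited.insert(next).second) {
        worklist.push_back(next);
      }
    }
  }
  return nullptr;  // no reachable node carries this idx
}
```

With this shape, a chain of 100000 nodes that loops back to its head is searched without deep recursion, and a lookup for an idx that does not exist still terminates.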
This is also fixed in the patch.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>>
>>>>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8246203
>>>>>>>> [2] http://hg.openjdk.java.net/jdk/jdk/file/e2622818f0bd/src/hotspot/share/opto/node.cpp#l1589

From richard.reingruber at sap.com Wed Jul 22 08:20:15 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Wed, 22 Jul 2020 08:20:15 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com> <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com> <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com>
Message-ID:

Hi Goetz,

> I'll answer to the obvious things in this mail now.
> I'll go through the code thoroughly again and write
> a review of my findings thereafter.

Sure. I've trimmed my citations to the relevant parts.

> > The delta includes many changes in comments, renaming of names, etc. So
> > I'd like to summarize functional changes:
> >
> > * Collected all the code for the testing feature DeoptimizeObjectsALot in
> > compileBroker.cpp and reworked it.
> Thanks, this makes it much more compact.
> > With DeoptimizeObjectsALot enabled internal threads are started that
> > deoptimize frames and objects. The number of threads started are given with
> > DeoptimizeObjectsALotThreadCountAll and
> > DeoptimizeObjectsALotThreadCountSingle. The former targets all existing
> > threads whereas the latter operates on a single thread selected round robin.
> >
> > I removed the mode where deoptimizations were performed at every nth
> > exit from the runtime. I never used it.
> Do I get it right? You have a n:1 and a n:all test scenario.
> n:1: n threads deoptimize 1 Java thread where n = DOALThreadCountSingle
> n:m: n threads deoptimize all Java threads where n = DOALThreadCountAll?

Not quite.
-XX:+DeoptimizeObjectsALot // required
-XX:DeoptimizeObjectsALotThreadCountAll=m
-XX:DeoptimizeObjectsALotThreadCountSingle=n

This will start m+n threads, each operating on all existing JavaThreads using EscapeBarriers. The difference between the 2 thread types is that one distinct EscapeBarrier targets either just a single thread or all existing threads at once. If just one single thread is targeted per EscapeBarrier, then it is not always the same thread, but threads are selected round robin. So there will be n threads selecting independently single threads round robin per EscapeBarrier and m threads that target all threads in every EscapeBarrier.

> > * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and
> > execute it always independently of is_thread_fully_suspended().
> Is this also a performance optimization?

Maybe a minor one.

> > * JavaThread::wait_for_object_deoptimization():
> > - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the
> > safepoint check! This caused issues with not walkable stacks with
> > DeoptimizeObjectsALot.
> OK. As I understand, there was one safepoint check in the old version,
> now there is one in each iteration. I assume this is intended, right?

Yes it is. The important thing here is

(A) a safepoint check is needed /after/ leaving a safe state (_thread_in_native, _thread_blocked).

(B) Shared variables that are modified at safepoints or with handshakes need to be reread /after/ the safepoint check.

BTW: I only noticed now that since JDK-8240918 JavaThreads themselves must disarm their polling page. Originally (before handshakes) this was done by the VM thread. With handshakes it was done by the thread executing the handshake op. This was changed for OrderAccess::cross_modify_fence() where the poll is left armed if the thread is in native and since JDK-8240918 it is always left armed.
So when a thread leaves a safe state (native, blocked) and there was a handshake/vm op, it will always call SafepointMechanism::block_if_requested_slow(), even if the handshake/vm operation has been processed already and everybody else is happily executing bytecodes :)

Still (A) and (B) hold.

> > - Added limited spinning inspired by HandshakeSpinYield to fix regression in
> > microbenchmark [1]
> Ok. Nice improvement, nice catch!

Yes. It certainly took some time to find out.

> >
> > I refer to some more changes answering your questions and comments inline
> > below.
> >
> > Thanks,
> > Richard.
> >
> > [1] Microbenchmark:
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/
> >
> > > I understand you annotate at safepoints where the escape analysis
> > > finds out that an object is "better" than global escape.
> > > These are the cases where the analysis identifies optimization
> > > opportunities. These annotations are then used to deoptimize
> > > frames and the objects referenced by them.
> > > Doesn't this overestimate the optimized
> > > objects? E.g., eliminate_alloc_node has many cases where it bails
> > > out.
> >
> > Yes, the implementation is conservative, but it is comparatively simple and
> > the additional debug info is just 2 flags per safepoint.
> Thanks. It also helped that you explained to me offline that
> there are more optimizations than only lock elimination and scalar
> replacement done based on the ea information.
> The ea refines the IR graph which allows follow up optimizations
> which can not easily be tracked back to the escaping objects or
> the call sites where they do not escape.
> Thus, if there are non-global escaping objects, you have to
> deoptimize the frame.
> Did I repeat that correctly?

Mostly, but there are also cases where deoptimization is required if and only if ea-local objects are passed as arguments.
This is the case when values are not read directly from a frame, but from a callee frame.

> With this understanding, a row of my proposed renamings/comments
> are obsolete.

Ok.

> > On the other hand, those JVMTI operations that really trigger
> > deoptimizations are expected to be comparatively infrequent such that
> > switching to the interpreter for a few microseconds will hardly have an effect.
> That sounds reasonable.
> > I've done microbenchmarking to check this.
> >
> > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/
> >
> > I found that in the worst case performance can be impacted by 10%. If the
> > agent is extremely active and does relevant JVMTI calls like
> > GetOwnedMonitorStackDepthInfo() every millisecond or more often,
> > then the performance impact can be 30%. But I would think that this is not
> > realistic. These calls are issued in interactive sessions to analyze deadlocks.
> Ok.
> > We could get more precise deoptimizations by adding a third flag per
> > safepoint for ea-local objects among the owned monitors. This would help
> > improve the worst case in the benchmark. But I'm not convinced, if it is
> > worth it.
> >
> > Refer to the README.txt of the microbenchmark for a more detailed
> > discussion.

> > > pcDesc.hpp
> > >
> > > I would like to see some documentation of the methods.
> > Done. I didn't take your text, though, because I only noticed it after writing
> > my own. Let me know if you are not ok with it.
> That's fine. My texts were only proposals, you as author know better
> what goes on anyways.

Ok.

> > > scopeDesc.cpp
> > >
> > > Besides refactoring copy escape info from pcDesc to scopeDesc
> > > and add accessors. Trivial.
> > >
> > > In scopeDesc.hpp you talk about NoEscape and ArgEscape.
> > > These are opto terms, but scopeDesc is a shared datastructure
> > > that does not depend on a specific compiler.
> > > Please explain what is going on without using these terms. > > > > Actually these are not too opto specific terms. They are used in the paper > > referenced in > > escape.hpp. Also you can easily google them. I'd rather keep the comments > > as they are. > Hmm, I'm not really happy with this, as also the papers > are for the compiler community, and probably not familiar to > others that work with HotSpot. > But stay with your terms if you think it makes it clearer. > Anyways, with now understanding why you use conservative > Information (see above), the descriptions I had in mind are not precise. Ok. > > > callnode.hpp > > > > > > You add functionality to annotate callnodes with escape information > > > This is carried through code generation to final output where it is > > > added to the compiled methods meta information. > > > > > > At Safepoints in general jvmti can access > > > - Objects that were scalar replaced. They must be reallocated. > > > (Flag EliminateAllocations) > > > - Objects that should be locked but are not because they never > > > escape the thread. They need to be relocked. > > > > > > At calls, Objects where locks have been removed escape to callees. > > > We must persist this information so that if jvmti accesses the > > > object in a callee, we can determine by looking at the caller that > > > it needs to be relocked. > > > > Note that the ea-optimization must not be at the current location, it can also > > follow when control > > returns to the caller. Lock elimination isn't the only relevant optimization. > Yes, I understood now, see above. Thanks for explaining. Ok. > > Accesses to instance > > members or array elements can be optimized as well. > You mean the compiler can/will ignore volatile or memory ordering > requirements for non-escaping objects? Sounds reasonable to do. Yes, for instance. Also without volatile modifiers it will eliminate accesses. 
Here is an example: Method A has a NoEscape allocation O that is not scalar replaced. A calls Method B, which is not inlined. When you use your debugger to break in B, then modify a field of O, then this modification would have no effect without deoptimization, because the jit assumes that B cannot modify O without a reference to it.

> > You are right, it is not correct how flags are checked. Especially if only
> > running with the JVMCI compiler.
> >
> > I changed Deoptimization::deoptimize_objects_internal() to make
> > reallocation and relocking dependent on similar checks as in
> > Deoptimization::fetch_unroll_info_helper(). Furthermore EscapeBarriers are
> > conditionally activated depending on the following (see EscapeBarrier ctors):
> >
> > JVMCI_ONLY(UseJVMCICompiler) NOT_JVMCI(false)
> > COMPILER2_PRESENT(|| DoEscapeAnalysis)
> >
> > So the enhancement can be practically completely disabled by disabling
> > DoEscapeAnalysis, which is what C2 currently does if JVMTI capabilities
> > that allow access to local references are taken.
> Thanks for fixing.

Thanks for finding :)

> > I went for the latter.
> >
> > > In fetch_unroll_info_helper, I don't understand why you need
> > > && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) {
> > > for eliminated locks, but not for scalar replaced objects?
> >
> > In short reallocation is idempotent, relocking is not.
> >
> > Without the enhancement Deoptimization::realloc_objects() can already be
> > called more than once for a frame:
> >
> > First call in materializeVirtualObjects() (also iterateFrames()).
> >
> > Second (indirect) call in fetch_unroll_info_helper().
> >
> > The objects from the first call are saved as jvmti deferred updates when
> > realloc_objects() returns. Note that there is no relationship to jvmti.
The thing in common is
> > that updates cannot be directly installed into a compiled frame, it is
> > necessary to deoptimize the frame and defer the updates until the compiled
> > frame gets replaced. Every time the vframes corresponding to the owner
> > frame are iterated, they get the deferred updates. So in
> > fetch_unroll_info_helper() the GrowableArray* chunk reference them too. All
> > references to the objects created by the second (indirect) call to
> > realloc_objects() are never used, because compiledVFrame accessors to
> > locals, expressions, and monitors override them with the deferred updates.
> > The objects become unreachable and get gc'ed.
> OK, so repeatedly computed vFrames always have the first version of
> reallocated objects by construction, so it needs not be handled here.
> But also due to construction, objects might be allocated just to be
> discarded.

Yes.

> > materializeVirtualObjects() does not bother with relocking.
> > deoptimize_objects_internal(), which is introduced by the enhancement,
> > does relock objects, after all the lock elimination becomes illegal
> > with the change in escape state. Relocking twice does not work, so the
> > enhancement avoids it by
> > checking EscapeBarrier::objs_are_deoptimized(thread, deoptee.id()).
> >
> > Note that materializeVirtualObjects() can be called more than once and will
> > always return the very same objects, even though it calls
> > realloc_objects() again.
> Ok.
> > > I would guess it is because the eliminated locks can be applied to
> > > argEscape, but scalar replacement only to noescape objects?
> > > I.e. it might have been done before?
> > >
> > > But why isn't this the case for eliminate_allocations?
> > > deoptimize_objects_internal does both unconditionally,
> > > so both can happen to inner frames, right?
> >
> > Sorry, I don't quite understand. Hope the explanation above helps.
> Yes.
I was guessing wrong :)

Ok, good :)

> > >
> > > Code will get much more simple if BiasedLocking is removed.
> > >
> > > EscapeBarrier:: ...
> > >
> > > (This class maybe would qualify for a file of its own.)
> > >
> > > deoptimize_objects()
> > > I would mention escape analysis only as side remark. Also, as I understand,
> > > there is only one frame at given depth?
> > > // Deoptimize frames with optimized objects. This can be omitted locks and
> > > // objects not allocated but replaced by scalars. In C2, these optimizations
> > > // are based on escape analysis.
> > > // Up to depth, deoptimize frames with any optimized objects.
> > > // From depth to entry_frame, deoptimize only frames that
> > > // pass optimized objects to their callees.
> > > (First part similar for the comment above
> > > EscapeBarrier::deoptimize_objects_internal().)
> >
> > I've reworked the comment. Let me know if you still think it needs to be
> > improved.
> Good now, thanks (maybe break the long line ...)

Ok. Will do in next webrev.7

> > > Synchronization: looks good. I think others had a look at this before.
> > >
> > > EscapeBarrier::deoptimize_objects_internal()
> > > The method name is misleading, it is not used by
> > > deoptimize_objects().
> > > Also, method with the same name is in Deoptimization.
> > > Proposal: deoptimize_objects_thread() ?
> >
> > Sorry, but I don't see, why it would be misleading.
> > What would be the meaning of 'deoptimize_objects_thread'? I don't
> > understand that name.
> 1. I have no idea why it's called "_internal". Because it is private?
> By the name, I would expect that EscapeBarrier::deoptimize_objects()
> calls it for some internal tasks. But it does not.

Well, I'd say it is pretty internal, what's happening in that method. So IMHO the suffix _internal is a match.

> 2. My proposal: deoptimize_objects_all_threads() iterates all threads
> and calls deoptimize_objects(_one)_thread(thread) for each of these.
> That's how I would have named it.
> But no bike shedding, if you don't see what I mean it's not obvious. Ok. We could have a quick call, too, if you like. > > > Renaming deferred_locals to deferred_updates is good, as well as > > > adding a datastructure for it. > > > (Adding this data structure might be a breakout, too.) > > > > > > good. > > > > > > thread.cpp > > > > > > good. > > > > > > vframe.cpp > > > > > > Is this a bug in existing code? > > > Makes sense. > > > > Depends on your definition of bug. There are no references to > > vframe::is_entry_frame() in the > > existing code. I would think it is a bug. > So it is :) I'm just afraid it could get fixed by removing the class entryVFrame. > > > > > > > > vframe_hp.hpp > > > (What stands _hp for? helper? The file should be named > > compiledVFrame ...) > > > > > > not_global_escape_in_scope() ... > > > Again, you mention escape analysis here. Comments above hold, too. > > > > I think it is the right name, because it is meaningful and simple. > Ok, accepted ... given my understandings from above. Ok. > > > > > You introduce JvmtiDeferredUpdates. Good. > > > > > > vframe_hp.cpp > > > > > > Changes for JvmtiDeferredUpdates, escape state accessors, > > > > > > line 422: > > > Would an assertion assert(!info->owner_is_scalar_replaced(), ...) hold here? > > > > > > > > > macros.hpp > > > Good. > > > > > > > > > Test coding > > > ============ > > > > > > compileBroker.h|cpp > > > > > > You introduce a third class of threads handled here and > > > add a new flag to distinguish it. Before, the two kinds > > > of threads were distinguished implicitly by passing in > > > a compiler for compiler threads. > > > The new thread kind is only used for testing in debug. > > > > > > make_thread: > > > You could assert (comp != NULL...) to assure previous > > > conditions. > > > > If replaced the if-statements with a switch-statement, made sure all enum- > > elements are covered, and > > added the assertion you suggested. 
> > > > > line 989 indentation broken > > > > You are referring to this block I assume: > > (from > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/src/hots > > pot/share/compiler/compileBroker.cpp.frames.html) > > > > 976 if (MethodFlushing) { > > 977 // Initialize the sweeper thread > > 978 Handle thread_oop = create_thread_oop("Sweeper thread", CHECK); > > 979 jobject thread_handle = JNIHandles::make_local(THREAD, > > thread_oop()); > > 980 make_thread(sweeper_t, thread_handle, NULL, NULL, THREAD); > > 981 } > > 982 > > 983 #if defined(ASSERT) && COMPILER2_OR_JVMCI > > 984 if (DeoptimizeObjectsALot == 2) { > > 985 // Initialize and start the object deoptimizer threads > > 986 for (int thread_count = 0; thread_count < > > DeoptimizeObjectsALotThreadCount; thread_count++) { > > 987 Handle thread_oop = create_thread_oop("Deoptimize objects a lot > > thread", CHECK); > > 988 jobject thread_handle = JNIHandles::make_local(THREAD, > > thread_oop()); > > 989 make_thread(deoptimizer_t, thread_handle, NULL, NULL, THREAD); > > 990 } > > 991 } > > 992 #endif // defined(ASSERT) && COMPILER2_OR_JVMCI > > > > I cannot really see broken indentation here. Am I looking at the wrong > > location? > I don't have the source version I reviewed last time any more, so > I can't check. But maybe an artefact from patching ... if there were > tabs jcheck would have told you, so that's not it. No problem. Ok. Thanks again! Cheers, Richard. -----Original Message----- From: Lindenmaier, Goetz Sent: Donnerstag, 16. Juli 2020 18:30 To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents Hi Richard, I'll answer to the obvious things in this mail now. I'll go through the code thoroughly again and write a review of my findings thereafter. 
> So here is the new webrev.6 > > Webrev.6: > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/ > Delta: > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.inc/ Thanks for the incremental webrev, it's helpful! > I spent most of the time running a microbenchmark [1] I wrote to answer > questions from your > review. At first I had trouble with variance in the results until I found out it > was due to the NUMA > architecture of the server I used. After that I noticed that there was a > performance regression of > about 5% even at low agent activity. I finally found out that it was due to the > implementation of > JavaThread::wait_for_object_deoptimization() which is called by the target > of the JVMTI operation to > self suspend for object deoptimization. I fixed this by adding limited spinning > before calling > wait() on the monitor. > > The delta includes many changes in comments, renaming of names, etc. So > I'd like to summarize > functional changes: > > * Collected all the code for the testing feature DeoptimizeObjectsALot in > compileBroker.cpp and reworked it. Thanks, this makes it much more compact. > With DeoptimizeObjectsALot enabled internal threads are started that > deoptimize frames and > objects. The number of threads started is given with > DeoptimizeObjectsALotThreadCountAll and > DeoptimizeObjectsALotThreadCountSingle. The former targets all existing > threads whereas the > latter operates on a single thread selected round robin. > > I removed the mode where deoptimizations were performed at every nth > exit from the runtime. I never used it. Do I get it right? You have an n:1 and an n:all test scenario. n:1: n threads deoptimize 1 Java thread where n = DOALThreadCountSingle n:all: n threads deoptimize all Java threads where n = DOALThreadCountAll? > * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and > execute it always independently > of is_thread_fully_suspended().
Is this also a performance optimization? > * Bugfix in EscapeBarrier::thread_added(): must not clear deopt flag. Found > this testing with DeoptimizeObjectsALot. Ok. > * Added EscapeBarrier::thread_removed(). Ok. > * EscapeBarrier constructors: barriers can now be entirely disabled by > disabling DoEscapeAnalysis. > This effectively disables the enhancement. Good! > * JavaThread::wait_for_object_deoptimization(): > - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the > safepoint check! This > caused issues with not walkable stacks with DeoptimizeObjectsALot. OK. As I understand, there was one safepoint check in the old version, now there is one in each iteration. I assume this is intended, right? > - Added limited spinning inspired by HandshakeSpinYield to fix regression in > microbenchmark [1] Ok. Nice improvement, nice catch! > > I refer to some more changes answering your questions and comments inline > below. > > Thanks, > Richard. > > [1] Microbenchmark: > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/ > > > I understand you annotate at safepoints where the escape analysis > > finds out that an object is "better" than global escape. > > These are the cases where the analysis identifies optimization > > opportunities. These annotations are then used to deoptimize > > frames and the objects referenced by them. > > Doesn't this overestimate the optimized > > objects? E.g., eliminate_alloc_node has many cases where it bails > > out. > > Yes, the implementation is conservative, but it is comparatively simple and > the additional debug > info is just 2 flags per safepoint. Thanks. It also helped that you explained to me offline that there are more optimizations than only lock elimination and scalar replacement done based on the ea information. The ea refines the IR graph, which allows follow-up optimizations which cannot easily be tracked back to the escaping objects or the call sites where they do not escape.
Thus, if there are non-global escaping objects, you have to deoptimize the frame. Did I repeat that correctly? With this understanding, a row of my proposed renamings/comments are obsolete. > On the other hand, those JVMTI operations > that really trigger > deoptimizations are expected to be comparatively infrequent such that > switching to the interpreter > for a few microseconds will hardly have an effect. That sounds reasonable. > I've done microbenchmarking to check this. > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbe > nchmark/ > > I found that in the worst case performance can be impacted by 10%. If the > agent is extremely active > and does relevant JVMTI calls like GetOwnedMonitorStackDepthInfo() every > millisecond or more often, > then the performance impact can be 30%. But I would think that this is not > realistic. These calls > are issued in interactive sessions to analyze deadlocks. Ok. > We could get more precise deoptimizations by adding a third flag per > safepoint for ea-local objects > among the owned monitors. This would help improve the worst case in the > benchmark. But I'm not > convinced, if it is worth it. > > Refer to the README.txt of the microbenchmark for a more detailled > discussion. > > pcDesc.hpp > > > > I would like to see some documentation of the methods. > Done. I didn't take your text, though, because I only noticed it after writing > my own. Let me know if you are not ok with it. That's fine. My texts were only proposals, you as author know better what goes on anyways. > > scopeDesc.cpp > > > > Besides refactoring copy escape info from pcDesc to scopeDesc > > and add accessors. Trivial. > > > > In scopeDesc.hpp you talk about NoEscape and ArgEscape. > > This are opto terms, but scopeDesc is a shared datastructure > > that does not depend on a specific compiler. > > Please explain what is going on without using these terms. > > Actually these are not too opto specific terms. 
They are used in the paper > referenced in > escape.hpp. Also you can easily google them. I'd rather keep the comments > as they are. Hmm, I'm not really happy with this, as also the papers are for the compiler community, and probably not familiar to others that work with HotSpot. But stay with your terms if you think it makes it clearer. Anyways, with now understanding why you use conservative Information (see above), the descriptions I had in mind are not precise. > > callnode.hpp > > > > You add functionality to annotate callnodes with escape information > > This is carried through code generation to final output where it is > > added to the compiled methods meta information. > > > > At Safepoints in general jvmti can access > > - Objects that were scalar replaced. They must be reallocated. > > (Flag EliminateAllocations) > > - Objects that should be locked but are not because they never > > escape the thread. They need to be relocked. > > > > At calls, Objects where locks have been removed escape to callees. > > We must persist this information so that if jvmti accesses the > > object in a callee, we can determine by looking at the caller that > > it needs to be relocked. > > Note that the ea-optimization must not be at the current location, it can also > follow when control > returns to the caller. Lock elimination isn't the only relevant optimization. Yes, I understood now, see above. Thanks for explaining. > Accesses to instance > members or array elements can be optimized as well. You mean the compiler can/will ignore volatile or memory ordering requirements for non-escaping objects? Sounds reasonable to do. > > // Returns true if at least one of the arguments to the call is an oop > > // that does not escape globally. > > bool ConnectionGraph::has_arg_escape(CallJavaNode* call) { > > IMHO the method names are descriptive and don't need the comments. But I > give in :) (only replaced > "oop" with "object") Thanks. Yes, object is better than oop. 
> You are right, it is not correct how flags are checked. Especially if only > running with the JVMCI compiler. > > I changed Deoptimization::deoptimize_objects_internal() to make > reallocation and relocking dependent > on similar checks as in Deoptimization::fetch_unroll_info_helper(). > Furthermore EscapeBarriers are > conditionally activated depending on the following (see EscapeBarrier ctors): > > JVMCI_ONLY(UseJVMCICompiler) NOT_JVMCI(false) > COMPILER2_PRESENT(|| DoEscapeAnalysis) > > So the enhancement can be practically completely disabled by disabling > DoEscapeAnalysis, which is > what C2 currently does if JVMTI capabilities that allow access to local > references are taken. Thanks for fixing. > I went for the latter. > > > In fetch_unroll_info_helper, I don't understand why you need > > && !EscapeBarrier::objs_are_deoptimized(thread, deoptee.id())) { > > for eliminated locks, but not for skalar replaced objects? > > In short reallocation is idempotent, relocking is not. > > Without the enhancement Deoptimization::realloc_objects() can already be > called more than once for a frame: > > First call in materializeVirtualObjects() (also iterateFrames()). > > Second (indirect) call in fetch_unroll_info_helper(). > > The objects from the first call are saved as jvmti deferred updates when > realloc_objects() > returns. Note that there is no relationship to jvmti. The thing in common is > that updates cannot be > directely installed into a compiled frame, it is necessary to deoptimize the > frame and defer the > updates until the compiled frame gets replaced. Every time the vframes > corresponding to the owner > frame are iterated, they get the deferred updates. So in > fetch_unroll_info_helper() the > GrowableArray* chunk reference them too. 
All > references to the objects created by > the second (indirect) call to realloc_objects() are never used, because > compiledVFrame accessors to > locals, expressions, and monitors override them with the deferred updates. > The objects become > unreachable and get gc'ed. OK, so repeatedly computed vFrames always have the first version of reallocated objects by construction, so it needs not be handled here. But also due to construction, objects might be allocated just to be discarded. > materializeVirtualObjects() does not bother with relocking. > deoptimize_objects_internal(), which is > introduced by the enhancement, does relock objects, after all the lock > elimination becomes illegal > with the change in escape state. Relocking twice does not work, so the > enhancement avoids it by > checking EscapeBarrier::objs_are_deoptimized(thread, deoptee.id()). > > Note that materializeVirtualObjects() can be called more than once and will > always return the very > same objects, even though it calls realloc_objects() again. Ok. > > I would guess it is because the eliminated locks can be applied to > > argEscape, but scalar replacement only to noescape objects? > > I.e. it might have been done before? > > > > But why isn't this the case for eliminate_allocations? > > deoptimize_objects_internal does both unconditionally, > > so both can happen to inner frames, right? > > Sorry, I don't quite understand. Hope the explanation above helps. Yes. I was guessing wrong :) > > I like if boolean operators are at the beginning of broken lines, > > but I think hotspot convention is to have them at the end. > Ok, fixed. Thanks. > > > Code will get much more simple if BiasedLocking is removed. > > > > EscapeBarrier:: ... > > > > (This class maybe would qualify for a file of its own.) > > > > deoptimize_objects() > > I would mention escape analysis only as side remark. Also, as I understand, > > there is only one frame at given depth? > > // Deoptimize frames with optimized objects. 
This can be omitted locks and > > // objects not allocated but replaced by scalars. In C2, these optimizations > > // are based on escape analysis. > > // Up to depth, deoptimize frames with any optimized objects. > > // From depth to entry_frame, deoptimize only frames that > > // pass optimized objects to their callees. > > (First part similar for the comment above > EscapeBarrier::deoptimize_objects_internal().) > > I've reworked the comment. Let me know if you still think it needs to be > improved. Good now, thanks (maybe break the long line ...) > > What is the check (cur_depth <= depth) good for? Can you > > ever walk past entry_frame? > > Yes (assuming you mean the outer while-statement), there are java frames > beyond the entry frame if a > native method calls java methods again. So we visit all frames up to the given > depth and from there > we continue to the entry frame. It is not necessary to continue beyond that > entry frame, because > escape analysis assumes that arguments to native functions escape globally. > > Example: Let the java stack look like this: > > +---------+ > | Frame A | > +---------+ > | Frame N | > +---------+ > | Frame B | > +---------+ <- top of stack > > Where java method A calls native method N and N calls java method B. > > Very simplified the native stack will look like this > > +-------------------------+ > | Frame of JIT Compiled A | > +-------------------------+ > | Frame N | > +-------------------------+ > | Entry Frame | > +-------------------------+ > | Frame B | > +-------------------------+ <- top of stack > > The entry frame is an activation of the call stub, which is a small assembler > routine that > translates from the native calling convention to the java calling convention. > > There cannot be any ArgEscape that is passed to B (see above), therefore we > can stop the stackwalk > at the entry frame if depth is 1. If depth is 3 we have to continue to Frame A, > as it is directely > accessed. 
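To illustrate the stack walk described above, here is a minimal sketch in Java. It is only a model under stated assumptions: the names `Frame`, `Kind` and `framesToVisit` are hypothetical and do not exist in HotSpot, the real logic is C++ code in the EscapeBarrier implementation, and the sketch assumes that entry frames are not counted towards the depth. Frames are listed from the top of the stack outward.

```java
import java.util.ArrayList;
import java.util.List;

public class StackWalkSketch {
    // Hypothetical frame kinds; ENTRY models an activation of the call stub.
    enum Kind { JAVA, NATIVE, ENTRY }

    static final class Frame {
        final String name;
        final Kind kind;
        Frame(String name, Kind kind) { this.name = name; this.kind = kind; }
    }

    // Returns the names of the frames the walk inspects, top of stack first.
    // All frames up to 'depth' are visited; after that the walk continues
    // only until the next entry frame.
    static List<String> framesToVisit(List<Frame> stackTopFirst, int depth) {
        List<String> visited = new ArrayList<>();
        int curDepth = 0;
        for (Frame f : stackTopFirst) {
            if (f.kind == Kind.ENTRY) {
                if (curDepth >= depth) {
                    // Requested depth reached: arguments passed to native
                    // callers are assumed to escape globally, so stop here.
                    break;
                }
                continue; // assumption: entry frames are not counted
            }
            curDepth++;
            visited.add(f.name);
        }
        return visited;
    }

    public static void main(String[] args) {
        // Java method A calls native method N, which calls Java method B.
        List<Frame> stack = List.of(
            new Frame("B", Kind.JAVA),
            new Frame("entry", Kind.ENTRY),
            new Frame("N", Kind.NATIVE),
            new Frame("A", Kind.JAVA));
        System.out.println(framesToVisit(stack, 1)); // [B]
        System.out.println(framesToVisit(stack, 3)); // [B, N, A]
    }
}
```

With depth 1 the walk stops at the entry frame below B; with depth 3 it has to pass that entry frame to reach A, matching the two cases in the example.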
Ok, thanks, nice explanation!! > > Isn't vf->is_compiled_frame() prerequisite that "Move to next physical > frame" > > is needed? You could move it into the other check. > > If so, similar for deoptimize_objects_all_threads(). > > Only compiledVFrame require moving to the /top/ frame. Fixed. Thanks, this looks better. > > Syncronization: looks good. I think others had a look at this before. > > > > EscapeBarrier::deoptimize_objects_internal() > > The method name is misleading, it is not used by > > deoptimize_objects(). > > Also, method with the same name is in Deopitmization. > > Proposal: deoptimize_objects_thread() ? > > Sorry, but I don't see, why it would be misleading. > What would be the meaning of 'deoptimize_objects_thread'? I don't > understand that name. 1. I have no idea why it's called "_internal". Because it is private? By the name, I would expect that EscapeBarrier::deoptimize_objects() calls it for some internal tasks. But it does not. 2. My proposal: deoptimize_objects_all_threads() iterates all threads and calls deoptimize_objects(_one)_thread(thread) for each of these. That's how I would have named it. But no bike shedding, if you don't see what I mean it's not obvious. > > C1 stubs: this really shows you tested all configurations, great! > > > > > > mutexLocker: ok. > > objectMonitor.cpp: ok > > stackValue.hpp Is this missing clearing a bug? > > In short: that change is not needed anymore. I'll remove it again. Good. Thanks for the details. > > Renaming deferred_locals to deferred_updates is good, as well as > > adding a datastructure for it. > > (Adding this data structure might be a breakout, too.) > > > > good. > > > > thread.cpp > > > > good. > > > > vframe.cpp > > > > Is this a bug in existing code? > > Makes sense. > > Depends on your definition of bug. There are no references to > vframe::is_entry_frame() in the > existing code. I would think it is a bug. So it is :) > > > > > vframe_hp.hpp > > (What stands _hp for? helper? 
The file should be named > compiledVFrame ...) > > > > not_global_escape_in_scope() ... > > Again, you mention escape analysis here. Comments above hold, too. > > I think it is the right name, because it is meaningful and simple. Ok, accepted ... given my understanding from above. > > > You introduce JvmtiDeferredUpdates. Good. > > > > vframe_hp.cpp > > > > Changes for JvmtiDeferredUpdates, escape state accessors, > > > > line 422: > > Would an assertion assert(!info->owner_is_scalar_replaced(), ...) hold here? > > > > > > macros.hpp > > Good. > > > > > > Test coding > > ============ > > > > compileBroker.h|cpp > > > > You introduce a third class of threads handled here and > > add a new flag to distinguish it. Before, the two kinds > > of threads were distinguished implicitly by passing in > > a compiler for compiler threads. > > The new thread kind is only used for testing in debug. > > > > make_thread: > > You could assert (comp != NULL...) to assure previous > > conditions. > > I replaced the if-statements with a switch-statement, made sure all enum- > elements are covered, and > added the assertion you suggested.
> > > line 989 indentation broken > > You are referring to this block I assume: > (from > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.5/src/hots > pot/share/compiler/compileBroker.cpp.frames.html) > > 976 if (MethodFlushing) { > 977 // Initialize the sweeper thread > 978 Handle thread_oop = create_thread_oop("Sweeper thread", CHECK); > 979 jobject thread_handle = JNIHandles::make_local(THREAD, > thread_oop()); > 980 make_thread(sweeper_t, thread_handle, NULL, NULL, THREAD); > 981 } > 982 > 983 #if defined(ASSERT) && COMPILER2_OR_JVMCI > 984 if (DeoptimizeObjectsALot == 2) { > 985 // Initialize and start the object deoptimizer threads > 986 for (int thread_count = 0; thread_count < > DeoptimizeObjectsALotThreadCount; thread_count++) { > 987 Handle thread_oop = create_thread_oop("Deoptimize objects a lot > thread", CHECK); > 988 jobject thread_handle = JNIHandles::make_local(THREAD, > thread_oop()); > 989 make_thread(deoptimizer_t, thread_handle, NULL, NULL, THREAD); > 990 } > 991 } > 992 #endif // defined(ASSERT) && COMPILER2_OR_JVMCI > > I cannot really see broken indentation here. Am I looking at the wrong > location? I don't have the source version I reviewed last time any more, so I can't check. But maybe an artefact from patching ... if there were tabs jcheck would have told you, so that's not it. No problem. Best regards, Goetz. From jatin.bhateja at intel.com Wed Jul 22 10:27:26 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Wed, 22 Jul 2020 10:27:26 +0000 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> Message-ID: Hi Vladimir, Please find the updated patch at following link http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ Change Summary: 1) Unified the handling for scalar rotate operation. 
All scalar rotate selection patterns are now dependent on newly created RotateLeft/RotateRight nodes. This promotes rotate inferencing. Currently if DAG nodes corresponding to a sub-pattern are shared (have multiple users) then existing complex patterns based on Or/LShiftL/URShift do not get matched and this prevents inferring rotate nodes. Please refer to JIT'ed assembly output with baseline[1] and with patch[2]. We can see that generated code size also went down from 832 bytes to 768 bytes. Also this can cause perf degradation if a shift-or dependency chain appears inside a hot region. 2) Due to enhanced rotate inferencing the new patch shows better performance even for legacy targets (non AVX-512). Please refer to the perf result[3] over an AVX2 machine for the JMH benchmark that is part of the patch. 3) As suggested, removed Java API intrinsification changes; scalar rotate transformations are done during OrI/OrL node idealizations. 4) SLP always gets to work on new scalar Rotate nodes and creates vector rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes if the target does not support vector rotates (non-AVX512). 5) Added new instruction patterns for vector shift Left/Right operations with constant shift operands. This prevents emitting extra moves to XMM. 6) Constant folding scenarios are covered in RotateLeft/RotateRight idealization; inferencing of vector rotate through OrV idealization covers the vector patterns generated through the non-SLP route, i.e. VectorAPI. Following are the JMH benchmark results over an AVX3 target.
Baseline:
Benchmark                         (SHIFT)  (TESTSIZE)   Mode  Cnt      Score  Error   Units
RotateBenchmark.testRotateLeftI        20         512  thrpt    2  33541.569         ops/ms
RotateBenchmark.testRotateLeftL        20         512  thrpt    2  20363.973         ops/ms
RotateBenchmark.testRotateRightI       20         512  thrpt    2  33944.085         ops/ms
RotateBenchmark.testRotateRightL       20         512  thrpt    2  20443.967         ops/ms

With Changes:
Benchmark                         (SHIFT)  (TESTSIZE)   Mode  Cnt      Score  Error   Units
RotateBenchmark.testRotateLeftI        20         512  thrpt    2  48439.220         ops/ms
RotateBenchmark.testRotateLeftL        20         512  thrpt    2  35758.933         ops/ms
RotateBenchmark.testRotateRightI       20         512  thrpt    2  49702.219         ops/ms
RotateBenchmark.testRotateRightL       20         512  thrpt    2  35618.666         ops/ms

Please push the patch through your testing framework and let me know your review feedback. Best Regards, Jatin [1] http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm.txt [2] http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx2_asm.txt [3] http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new_patch.txt > -----Original Message----- > From: Vladimir Ivanov > Sent: Saturday, July 18, 2020 12:25 AM > To: Bhateja, Jatin ; Andrew Haley > Cc: Viswanathan, Sandhya ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 > > Hi Jatin, > > > http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ > > It definitely looks better, but IMO it hasn't reached the sweet spot yet. > It feels like the focus is on auto-vectorizer while the burden is put on > scalar cases. > > First of all, considering GVN folds relevant operation patterns into a > single Rotate node now, what's the motivation to introduce intrinsics? > > Another point is there's still significant duplication for scalar cases. > > I'd prefer to see the legacy cases which rely on pattern matching to go > away and be substituted with instructions which match Rotate instructions > (migrating ).
> > I understand that it will penalize the vectorization implementation, but > IMO reducing overall complexity is worth it. On auto-vectorizer side, I see > 2 ways to fix it: > > (1) introduce additional AD instructions for RotateLeftV/RotateRightV > specifically for pre-AVX512 hardware; > > (2) in SuperWord::output(), when matcher doesn't support > RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), > generate vectorized version of the original pattern. > > Overall, it looks like more and more focus is made on scalar part. > Considering the main goal of the patch is to enable vectorization, I'm fine > with separating cleanup of scalar part. As an interim solution, it seems > that leaving the scalar part as it is now and matching scalar bit rotate > pattern in VectorNode::is_rotate() should be enough to keep the > vectorization part functioning. Then scalar Rotate nodes and relevant > cleanups can be integrated later. (Or vice versa: clean up scalar part > first and then follow up with vectorization.) > > Some other comments: > > * There's a lot of duplication between OrINode::Ideal and OrLNode::Ideal. > What do you think about introducing a super type > (OrNode) and put a unified version (OrNode::Ideal) there? > > > * src/hotspot/cpu/x86/x86.ad > > +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ > + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT || > + n->bottom_type()->is_vect()->element_basic_type() == > +T_LONG); > > +instruct vprorate(vec dst, vec src, vec shift) %{ > + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT || > + n->bottom_type()->is_vect()->element_basic_type() == > +T_LONG); > > The predicates are redundant here. 
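For readers following the thread, the scalar shift-or idiom under discussion can be sketched at the Java level as below. This is only an illustration: the class and method names (`RotateIdiom`, `rotlInt`, `rotlLong`) are made up for this sketch, and it says nothing about which exact IR shapes C2 matches. It merely checks that the idiom computes the same values as `Integer.rotateLeft` and `Long.rotateLeft`.

```java
public class RotateIdiom {
    // Shift-or form of a 32-bit rotate left. Java masks int shift
    // distances to their low 5 bits, so s == 0 and s >= 32 also work.
    static int rotlInt(int x, int s) {
        return (x << s) | (x >>> (32 - s));
    }

    // Shift-or form of a 64-bit rotate left (shift distance masked to 6 bits).
    static long rotlLong(long x, int s) {
        return (x << s) | (x >>> (64 - s));
    }

    public static void main(String[] args) {
        int xi = 0x12345678;
        long xl = 0x0123456789ABCDEFL;
        for (int s = 0; s < 64; s++) {
            if (rotlInt(xi, s) != Integer.rotateLeft(xi, s)) {
                throw new AssertionError("int mismatch at shift " + s);
            }
            if (rotlLong(xl, s) != Long.rotateLeft(xl, s)) {
                throw new AssertionError("long mismatch at shift " + s);
            }
        }
        System.out.println("shift-or idiom agrees with rotateLeft");
    }
}
```

Source code written in either form expresses the same operation, which is why folding the Or/shift shape into a single rotate node early lets later phases treat both alike.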
> > > * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > > +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, > XMMRegister dst, XMMRegister src, > + int shift, int vector_len) { if > + (opcode == Op_RotateLeftV) { > + if (etype == T_INT) { > + evprold(dst, src, shift, vector_len); > + } else { > + evprolq(dst, src, shift, vector_len); > + } > > Please, put an assert for the false case (assert(etype == T_LONG, "...")). > > > * On testing (with previous version of the patch): -XX:UseAVX is x86- > specific flag, so new/adjusted tests now fail on non-x86 platforms. > Either omitting the flag or adding -XX:+IgnoreUnrecognizedVMOptions will > solve the issue. > > Best regards, > Vladimir Ivanov > > > > > > > Summary of changes: > > 1) Optimization is specifically targeted to exploit vector rotation > instruction added for X86 AVX512. A single rotate instruction encapsulates > entire vector OR/SHIFTs pattern thus offers better latency at reduced > instruction count. > > > > 2) There were two approaches to implement this: > > a) Let everything remain the same and add new wide complex > instruction patterns in the matcher for e.g. > > set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI shift)) > (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate shift)) > > It would have been an overoptimistic assumption to expect that graph > shape would be preserved till the matcher for correct inferencing. > > In addition we would have required multiple such bulky patterns. > > b) Create new RotateLeft/RotateRight scalar nodes, these gets > generated during intrinsification as well as during additional pattern > > matching during node Idealization, later on these nodes are consumed > by SLP for valid vectorization scenarios to emit their vector > > counterparts which eventually emits vector rotates. 
> > > > 3) I chose approach 2b) since it's cleaner; the only problem here was that > > in non-evex mode (UseAVX < 3) new scalar Rotate nodes should either be > dismantled back to OR/SHIFT pattern or we penalize the vectorization which > would be very costly, other option would have been to add additional vector > rotate pattern for UseAVX=3 in the matcher which emit vector OR-SHIFTs > instruction but then it will lose on emitting efficient instruction > sequence which node sharing (OrV/LShiftV/URShift) offer in current > implementation - thus it will not be beneficial for non-AVX512 targets, > only saving will be in terms of cleanup of few existing scalar rotate > matcher patterns, also old targets do not offer this powerful rotate > instruction. Therefore new scalar nodes are created only for AVX512 > targets. > > > > As per suggestions constant folding scenarios have been covered during > Idealizations of newly added scalar nodes. > > > > Please review the latest version and share your feedback and test > results. > > > > Best Regards, > > Jatin > > > > > >> -----Original Message----- > >> From: Andrew Haley > >> Sent: Saturday, July 11, 2020 2:24 PM > >> To: Vladimir Ivanov ; Bhateja, Jatin > >> ; hotspot-compiler-dev at openjdk.java.net > >> Cc: Viswanathan, Sandhya > >> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification for > >> X86 > >> > >> On 10/07/2020 18:32, Vladimir Ivanov wrote: > >> > >> > High-level comment: so far, there was no pressing need for > > >> explicitly marking the methods as intrinsics. ROR/ROL instructions > > >> were selected during matching [1]. Now the patch introduces > > >> dedicated nodes > >> (RotateLeft/RotateRight) specifically for intrinsics > which partly > >> duplicates existing logic. > >> > >> The lack of rotate nodes in the IR has always meant that AArch64 > >> doesn't generate optimal code for e.g.
> >> > >> (Set dst (XorL reg1 (RotateLeftL reg2 imm))) > >> > >> because, with the RotateLeft expanded to its full combination of ORs > >> and shifts, it's too complicated to match. At the time I put this to > >> one side because it wasn't urgent. This is a shame because although > >> such combinations are unusual they are used in some crypto operations. > >> > >> If we can generate immediate-form rotate nodes early by pattern > >> matching during parsing (rather than depending on intrinsics) we'll > >> get more value than by depending on programmers calling intrinsics. > >> > >> -- > >> Andrew Haley (he/him) > >> Java Platform Lead Engineer > >> Red Hat UK Ltd. > >> https://keybase.io/andrewhaley > >> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > > From coleen.phillimore at oracle.com Wed Jul 22 12:25:13 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 22 Jul 2020 08:25:13 -0400 Subject: RFR (M) 8249650: Optimize JNIHandle::make_local thread variable usage In-Reply-To: <4ca86ddb-8a73-783c-0b3f-e8003f7160a3@oracle.com> References: <8410d4a2-bbad-090f-55bf-88940f786781@oracle.com> <0590E210-6F23-4498-A51A-C3DAEF54B5AB@oracle.com> <6166e191-c954-70e5-5595-956a0c145d10@oracle.com> <82ac807a-1492-9ac0-570a-d08b1dc93e09@oracle.com> <4ca86ddb-8a73-783c-0b3f-e8003f7160a3@oracle.com> Message-ID: Ok, looks good to me. Coleen On 7/21/20 10:46 PM, David Holmes wrote: > Hi Coleen, > > On 22/07/2020 4:01 am, coleen.phillimore at oracle.com wrote: >> >> This looks like a nice cleanup. > > Thanks for looking at this. > >> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/src/hotspot/share/runtime/jniHandles.cpp.udiff.html >> >> >> I'm wondering why you took out the NULL return for make_local() >> without a thread argument? Here you may call Thread::current() >> unnecessarily. >> >>
>>   jobject JNIHandles::make_local(oop obj) {
>> -  if (obj == NULL) {
>> -    return NULL; // ignore null handles
>> -  } else {
>> -    Thread* thread = Thread::current();
>> -    assert(oopDesc::is_oop(obj), "not an oop");
>> -    assert(!current_thread_in_native(), "must not be in native");
>> -    return thread->active_handles()->allocate_handle(obj);
>> -  }
>> +  return make_local(Thread::current(), obj);
>>   }
>
> I was simply using a standard call forwarding pattern to avoid code
> duplication. I suspect passing NULL is very rare so the unnecessary
> Thread::current() call is not an issue. Otherwise, if not NULL, the
> NULL check would happen twice (unless I keep the duplicated
> implementations).
>
>> Beyond the scope of this fix, but it'd be cool to not have a version
>> that doesn't take thread, since there may be many more callers that
>> already have Thread::current().
>
> Indeed! And in fact I had missed a number of these in jvm.cpp and
> jni.cpp so I have fixed those. I've filed an RFE for other cases:
>
> https://bugs.openjdk.java.net/browse/JDK-8249837
>
> Updated webrev:
>
> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v3/
>
> If this passes tier 1-3 re-testing then I plan to push.
>
> Thanks,
> David
> -----
>
>> Coleen
>>
>>
>> On 7/20/20 1:53 AM, David Holmes wrote:
>>> Hi Kim,
>>>
>>> Thanks for looking at this.
>>>
>>> Updated webrev at:
>>>
>>> http://cr.openjdk.java.net/~dholmes/8249650/webrev.v2/
>>>
>>> On 20/07/2020 3:22 pm, Kim Barrett wrote:
>>>>> On Jul 20, 2020, at 12:16 AM, David Holmes wrote:
>>>>>
>>>>> Subject line got truncated by accident ...
>>>>>
>>>>> On 20/07/2020 11:06 am, David Holmes wrote:
>>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249650
>>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8249650/webrev/
>>>>>> This is a simple cleanup that touches files across a number of VM
>>>>>> areas - hence the cross-post.
>>>>>> Whilst working on a different JNI fix I noticed that in most
>>>>>> cases in jni.cpp we were using the following form of make_local:
>>>>>>     JNIHandles::make_local(env, obj);
>>>>>> and what that form does is first extract the thread from the JNIEnv:
>>>>>>     JavaThread* thread = JavaThread::thread_from_jni_environment(env);
>>>>>>     return thread->active_handles()->allocate_handle(obj);
>>>>>> but there is also another, faster, variant for when you already
>>>>>> have the "thread":
>>>>>>     jobject JNIHandles::make_local(Thread* thread, oop obj) {
>>>>>>       return thread->active_handles()->allocate_handle(obj);
>>>>>>     }
>>>>>> When you look at the JNI_ENTRY wrapper (and related JVM_ENTRY,
>>>>>> WB_ENTRY, UNSAFE_ENTRY etc) it has already extracted the thread
>>>>>> from the JNIEnv:
>>>>>>     JavaThread* thread=JavaThread::thread_from_jni_environment(env);
>>>>>> and further defined:
>>>>>>     Thread* THREAD = thread;
>>>>>> so we always already have direct access to the "thread" available
>>>>>> (or indirect via TRAPS), and in fact we can end up removing the
>>>>>> make_local(JNIEnv* env, oop obj) variant altogether.
>>>>>> Along the way I spotted some related issues with unnecessary use
>>>>>> of Thread::current() when it is already available from TRAPS, and
>>>>>> some other cases where we extracted the JNIEnv from a thread only
>>>>>> to later extract the thread from the JNIEnv.
>>>>>> Testing: tiers 1 - 3
>>>>>> Thanks,
>>>>>> David
>>>>>> -----
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/classfile/javaClasses.cpp
>>>>  439     JNIEnv *env = thread->jni_environment();
>>>>
>>>> Since env is no longer used on the next line, move this down to where
>>>> it is used, at line 444.
>>>
>>> Fixed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/classfile/verifier.cpp
>>>>  299
JNIEnv *env = thread->jni_environment();
>>>>
>>>> env now seems to only be used at line 320. Move this closer.
>>>
>>> Fixed.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/prims/jni.cpp
>>>>  743     result = JNIHandles::make_local(THREAD, result_handle());
>>>>
>>>> jni_PopLocalFrame is now using a mix of "thread" and "THREAD", where
>>>> previously it just used "thread". Maybe this change shouldn't be made?
>>>> Or can the other uses be changed to THREAD for consistency?
>>>
>>> "thread" and "THREAD" are interchangeable for anything expecting a
>>> "Thread*" (and somewhat surprisingly a number of APIs that only
>>> work for JavaThreads actually take a Thread*. :( ). I had a choice
>>> between trying to be file-wide consistent with the make_local calls,
>>> versus local-code consistent, and used THREAD as it is available in
>>> both JNI_ENTRY and via TRAPS. But I can certainly make a local
>>> change to "thread" for local consistency.
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>> src/hotspot/share/prims/jvm.cpp
>>>>
>>>> The calls to JvmtiExport::post_vm_object_alloc have to use "thread"
>>>> instead of "THREAD", even though other places nearby are using
>>>> "THREAD". That inconsistency is kind of unfortunate, but doesn't seem
>>>> easily avoidable.
>>>
>>> Everything that uses THREAD in a JVM_ENTRY method can be changed to
>>> use "thread" instead. But I'm not sure it's a consistency worth
>>> pursuing at least as part of these changes (there are likely similar
>>> issues with most of the touched files).
>>>
>>> Thanks,
>>> David
>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>

From boris.ulasevich at bell-sw.com Wed Jul 22 13:36:34 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Wed, 22 Jul 2020 16:36:34 +0300
Subject: RFR 8249189: AARCH64: more L2I conversions can be skipped (ubfiz)
Message-ID: <209c5713-4218-4e9c-037d-fe337734697f@bell-sw.com>

Hi,

Please review the update for aarch64 AD template file to generate more
bitfield extraction rules where I2L and L2I conversions can be skipped.

http://cr.openjdk.java.net/~bulasevich/8249189/webrev.02
http://bugs.openjdk.java.net/browse/JDK-8249189

Tested with JTREG and manual [1] tests.

thanks,
Boris

[1] http://cr.openjdk.java.net/~bulasevich/8249189/webrev.02/TestConversionSkip.java

From bob.vandette at oracle.com Wed Jul 22 15:07:53 2020
From: bob.vandette at oracle.com (Bob Vandette)
Date: Wed, 22 Jul 2020 11:07:53 -0400
Subject: RFR: 8249880 - JVMCI calling register_nmethod without CodeCache lock
Message-ID: <07AE1117-D70D-4CE5-A636-3B0C789E0555@oracle.com>

Please review this fix which adds a CodeCache lock around registering an nmethod with
the collector. This is causing a guarantee to fire when the nmethod sweeper runs during
a CompileTheWorld test.

BUG:
https://bugs.openjdk.java.net/browse/JDK-8249880

PATCH:

diff --git a/src/hotspot/share/jvmci/jvmciRuntime.cpp b/src/hotspot/share/jvmci/jvmciRuntime.cpp
--- a/src/hotspot/share/jvmci/jvmciRuntime.cpp
+++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp
@@ -668,6 +668,7 @@

     // Since we've patched some oops in the nmethod,
     // (re)register it with the heap.
+    MutexLocker ml(CodeCache_lock, Mutex::_no_safepoint_check_flag);
     Universe::heap()->register_nmethod(nm);
   }

Bob.
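
The one-line patch above takes a lock for the remainder of the enclosing scope before calling into the heap. As a rough, standalone illustration of that scoped-guard idea (not HotSpot code), the sketch below uses standard C++ only. Every name in it (TrackedMutex, Heap, NMethod, install_with_lock, install_without_lock) is hypothetical and invented for this example; the real MutexLocker, CodeCache_lock and register_nmethod have their own semantics, and the check here throws rather than firing a guarantee.

```cpp
#include <cstddef>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical lock that remembers whether it is currently held, so that a
// callee can verify its precondition. It does not track *which* thread owns
// it, so this is only meaningful in the single-threaded sketch below.
class TrackedMutex {
 public:
  void lock()   { _m.lock(); _held = true; }
  void unlock() { _held = false; _m.unlock(); }
  bool held() const { return _held; }
 private:
  std::mutex _m;
  bool _held = false;
};

// Hypothetical stand-in for a compiled method.
struct NMethod { int id; };

// Hypothetical stand-in for the collector-side registration.
class Heap {
 public:
  explicit Heap(TrackedMutex& lock) : _lock(lock) {}

  // Precondition: the caller holds the lock. Throws (instead of aborting)
  // when the precondition is violated, so the failure is easy to observe.
  void register_nmethod(const NMethod& nm) {
    if (!_lock.held()) {
      throw std::logic_error("register_nmethod called without the lock");
    }
    _registered.push_back(nm.id);
  }

  std::size_t registered_count() const { return _registered.size(); }

 private:
  TrackedMutex& _lock;
  std::vector<int> _registered;
};

// Shape of the fixed code: the guard acquires the lock here and releases it
// automatically when the scope ends, including on early return or exception.
void install_with_lock(TrackedMutex& lock, Heap& heap, const NMethod& nm) {
  std::lock_guard<TrackedMutex> guard(lock);
  heap.register_nmethod(nm);
}

// Shape of the code before the fix: the registration runs unprotected.
void install_without_lock(Heap& heap, const NMethod& nm) {
  heap.register_nmethod(nm);
}
```

Calling install_with_lock registers the method and leaves the lock released afterwards, while install_without_lock trips the precondition check. The guard form keeps acquisition and release paired without an explicit unlock on every exit path.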

From erik.osterlund at oracle.com Wed Jul 22 15:12:41 2020
From: erik.osterlund at oracle.com (Erik Österlund)
Date: Wed, 22 Jul 2020 17:12:41 +0200
Subject: RFR: 8249880 - JVMCI calling register_nmethod without CodeCache lock
In-Reply-To: <07AE1117-D70D-4CE5-A636-3B0C789E0555@oracle.com>
References: <07AE1117-D70D-4CE5-A636-3B0C789E0555@oracle.com>
Message-ID: <6723fc8c-b25e-1545-e653-8851f746b7a9@oracle.com>

Hi Bob,

Looks good.

Thanks,
/Erik

On 2020-07-22 17:07, Bob Vandette wrote:
> Please review this fix which adds a CodeCache lock around registering an nmethod with
> the collector. This is causing a guarantee to fire when the nmethod sweeper runs during
> a CompileTheWorld test.
>
> BUG:
> https://bugs.openjdk.java.net/browse/JDK-8249880
>
> PATCH:
>
> diff --git a/src/hotspot/share/jvmci/jvmciRuntime.cpp b/src/hotspot/share/jvmci/jvmciRuntime.cpp
> --- a/src/hotspot/share/jvmci/jvmciRuntime.cpp
> +++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp
> @@ -668,6 +668,7 @@
>
>      // Since we've patched some oops in the nmethod,
>      // (re)register it with the heap.
> +    MutexLocker ml(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>      Universe::heap()->register_nmethod(nm);
>    }
>
> Bob.
>

From doug.simon at oracle.com Wed Jul 22 15:53:05 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 22 Jul 2020 17:53:05 +0200
Subject: RFR: 8249888: failure to create a libgraal JavaVM should result in a VM crash
Message-ID: <4ED30258-0888-4D6B-867E-6CC5DB4159E4@oracle.com>

Please review this enhancement which improves the debuggability of certain libgraal isolate creation issues.

BUG:
https://bugs.openjdk.java.net/browse/JDK-8249888

PATCH:

diff -r 8995e9efdee7 src/hotspot/share/jvmci/jvmciRuntime.cpp
--- a/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:08:01 2020 +0200
+++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:41:06 2020 +0200
@@ -800,7 +800,7 @@

     JNI_CreateJavaVM = CAST_TO_FN_PTR(JNI_CreateJavaVM_t, os::dll_lookup(sl_handle, "JNI_CreateJavaVM"));
     if (JNI_CreateJavaVM == NULL) {
-      vm_exit_during_initialization("Unable to find JNI_CreateJavaVM", sl_path);
+      fatal("Unable to find JNI_CreateJavaVM in %s", sl_path);
     }

     ResourceMark rm;
@@ -835,7 +835,7 @@
       JVMCI_event_1("created JavaVM[%ld]@" PTR_FORMAT " for JVMCI runtime %d", javaVM_id, p2i(javaVM), _id);
       return env;
     } else {
-      vm_exit_during_initialization(err_msg("JNI_CreateJavaVM failed with return value %d", result), sl_path);
+      fatal("JNI_CreateJavaVM failed with return value %d", result);
     }
   }
   return NULL;

-Doug

From goetz.lindenmaier at sap.com Wed Jul 22 16:21:38 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Wed, 22 Jul 2020 16:21:38 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com>
 <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com>
 <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com>
Message-ID:

Hi Richard,

Thanks for the quick reply.

> > > With DeoptimizeObjectsALot enabled internal threads are started that
> > > deoptimize frames and objects. The number of threads started are given
> > > with DeoptimizeObjectsALotThreadCountAll and
> > > DeoptimizeObjectsALotThreadCountSingle. The former targets all existing
> > > threads whereas the latter operates on a single thread selected round
> > > robin.
> > >
> > > I removed the mode where deoptimizations were performed at every nth
> > > exit from the runtime. I never used it.
>
> > Do I get it right? You have a n:1 and a n:all test scenario.
> > n:1: n threads deoptimize 1 Java thread where n => DOALThreadCountSingle
> > n:m: n threads deoptimize all Java threads where n = DOALThreadCountAll?
>
> Not quite.
>
> -XX:+DeoptimizeObjectsALot // required
> -XX:DeoptimizeObjectsALotThreadCountAll=m
> -XX:DeoptimizeObjectsALotThreadCountSingle=n
>
> Will start m+n threads. Each operating on all existing JavaThreads using
> EscapeBarriers. The difference between the 2 thread types is that one
> distinct EscapeBarrier targets either just a single thread or all existing
> threads at once. If just one single thread is targeted per EscapeBarrier,
> then it is not always the same thread, but threads are selected round
> robin. So there will be n threads selecting independently single threads
> round robin per EscapeBarrier and m threads that target all threads in
> every EscapeBarrier.

Ok, yes, that is how I understood it.

> > > * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and
> > > execute it always independently
> > > of is_thread_fully_suspended().
> > Is this also a performance optimization?
>
> Maybe a minor one.

OK

> > > * JavaThread::wait_for_object_deoptimization():
> > > - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the
> > > safepoint check! This
> > > caused issues with not walkable stacks with DeoptimizeObjectsALot.
> > OK. As I understand, there was one safepoint check in the old version,
> > now there is one in each iteration. I assume this is intended, right?
>
> Yes it is. The important thing here is (A) a safepoint check is needed
> /after/ leaving a safe state (_thread_in_native, _thread_blocked).
> (B) Shared variables that are modified at safepoints or with handshakes
> need to be reread /after/ the safepoint check.
>
> BTW: I only noticed now that since JDK-8240918 JavaThreads themselves
> must disarm their polling page. Originally (before handshakes) this was
> done by the VM thread.
> With handshakes it was done by the thread executing the handshake op. This
> was changed for OrderAccess::cross_modify_fence() where the poll is left
> armed if the thread is in native and since JDK-8240918 it is always left
> armed. So when a thread leaves a safe state (native, blocked) and there was
> a handshake/vm op, it will always call
> SafepointMechanism::block_if_requested_slow(), even if the handshake/vm
> operations have been processed already and everybody else is happily
> executing bytecodes :)

Ok.

> Still (A) and (B) hold.
>
> > > - Added limited spinning inspired by HandshakeSpinYield to fix regression in
> > > microbenchmark [1]
> > Ok. Nice improvement, nice catch!
>
> Yes. It certainly took some time to find out.
>
> > >
> > > I refer to some more changes answering your questions and comments inline
> > > below.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > [1] Microbenchmark:
> > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/
> > >
> > >
> > > > I understand you annotate at safepoints where the escape analysis
> > > > finds out that an object is "better" than global escape.
> > > > These are the cases where the analysis identifies optimization
> > > > opportunities. These annotations are then used to deoptimize
> > > > frames and the objects referenced by them.
> > > > Doesn't this overestimate the optimized
> > > > objects? E.g., eliminate_alloc_node has many cases where it bails
> > > > out.
> > >
> > > Yes, the implementation is conservative, but it is comparatively simple and
> > > the additional debug
> > > info is just 2 flags per safepoint.
> > Thanks. It also helped that you explained to me offline that
> > there are more optimizations than only lock elimination and scalar
> > replacement done based on the ea information.
> > The ea refines the IR graph, which allows follow-up optimizations
> > that cannot easily be tracked back to the escaping objects or
> > the call sites where they do not escape.
> > Thus, if there are non-global escaping objects, you have to
> > deoptimize the frame.
> > Did I repeat that correctly?
>
> Mostly, but there are also cases where deoptimization is required if and only
> if ea-local objects are passed as arguments. This is the case when values are
> not read directly from a frame, but from a callee frame.

Hmm, don't get this completely, but ok.

> > > Accesses to instance
> > > members or array elements can be optimized as well.
> > You mean the compiler can/will ignore volatile or memory ordering
> > requirements for non-escaping objects? Sounds reasonable to do.
>
> Yes, for instance. Also without volatile modifiers it will eliminate accesses.
> Here is an example:
> Method A has a NoEscape allocation O that is not scalar replaced. A calls
> Method B, which is not inlined. When you use your debugger to break in B, then
> modify a field of O, then this modification would have no effect without
> deoptimization, because the jit assumes that B cannot modify O without
> a reference to it.

Yes, A can keep O in a register, while the JVMTI thread would write to
the location in the stack where the local is held (if it was written back).

> > > > Synchronization: looks good. I think others had a look at this before.
> > > >
> > > > EscapeBarrier::deoptimize_objects_internal()
> > > > The method name is misleading, it is not used by
> > > > deoptimize_objects().
> > > > Also, method with the same name is in Deoptimization.
> > > > Proposal: deoptimize_objects_thread() ?
> > >
> > > Sorry, but I don't see why it would be misleading.
> > > What would be the meaning of 'deoptimize_objects_thread'? I don't
> > > understand that name.
> > 1. I have no idea why it's called "_internal". Because it is private?
> > By the name, I would expect that EscapeBarrier::deoptimize_objects()
> > calls it for some internal tasks. But it does not.
>
> Well, I'd say it is pretty internal, what's happening in that method. So IMHO
> the suffix _internal is a match.
>
> > 2. My proposal: deoptimize_objects_all_threads() iterates all threads
> > and calls deoptimize_objects(_one)_thread(thread) for each of these.
> > That's how I would have named it.
> > But no bike shedding, if you don't see what I mean it's not obvious.
> Ok. We could have a quick call, too, if you like.

Ok, I think I have understood the remaining points.

I'm fine with this so far.

Thanks,
Goetz.

From vladimir.kozlov at oracle.com Wed Jul 22 17:40:18 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 22 Jul 2020 10:40:18 -0700
Subject: RFR: 8249880 - JVMCI calling register_nmethod without CodeCache lock
In-Reply-To: <6723fc8c-b25e-1545-e653-8851f746b7a9@oracle.com>
References: <07AE1117-D70D-4CE5-A636-3B0C789E0555@oracle.com>
 <6723fc8c-b25e-1545-e653-8851f746b7a9@oracle.com>
Message-ID: <3625530e-4e48-8346-0ae4-86c88a2778a2@oracle.com>

+1

Thanks,
Vladimir

On 7/22/20 8:12 AM, Erik Österlund wrote:
> Hi Bob,
>
> Looks good.
>
> Thanks,
> /Erik
>
> On 2020-07-22 17:07, Bob Vandette wrote:
>> Please review this fix which adds a CodeCache lock around registering an nmethod with
>> the collector. This is causing a guarantee to fire when the nmethod sweeper runs during
>> a CompileTheWorld test.
>>
>> BUG:
>> https://bugs.openjdk.java.net/browse/JDK-8249880
>>
>> PATCH:
>>
>> diff --git a/src/hotspot/share/jvmci/jvmciRuntime.cpp b/src/hotspot/share/jvmci/jvmciRuntime.cpp
>> --- a/src/hotspot/share/jvmci/jvmciRuntime.cpp
>> +++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp
>> @@ -668,6 +668,7 @@
>>      // Since we've patched some oops in the nmethod,
>>      // (re)register it with the heap.
>> +    MutexLocker ml(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>>      Universe::heap()->register_nmethod(nm);
>>    }
>>
>> Bob.
>>
>

From tom.rodriguez at oracle.com Wed Jul 22 17:56:11 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 22 Jul 2020 10:56:11 -0700
Subject: RFR: 8249880 - JVMCI calling register_nmethod without CodeCache lock
In-Reply-To: <07AE1117-D70D-4CE5-A636-3B0C789E0555@oracle.com>
References: <07AE1117-D70D-4CE5-A636-3B0C789E0555@oracle.com>
Message-ID: <9951f7db-24fe-c1c5-26e8-a5174ca57f79@oracle.com>

Looks good.

tom

Bob Vandette wrote on 7/22/20 8:07 AM:
> Please review this fix which adds a CodeCache lock around registering an nmethod with
> the collector. This is causing a guarantee to fire when the nmethod sweeper runs during
> a CompileTheWorld test.
>
> BUG:
> https://bugs.openjdk.java.net/browse/JDK-8249880
>
> PATCH:
>
> diff --git a/src/hotspot/share/jvmci/jvmciRuntime.cpp b/src/hotspot/share/jvmci/jvmciRuntime.cpp
> --- a/src/hotspot/share/jvmci/jvmciRuntime.cpp
> +++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp
> @@ -668,6 +668,7 @@
>
>      // Since we've patched some oops in the nmethod,
>      // (re)register it with the heap.
> +    MutexLocker ml(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>      Universe::heap()->register_nmethod(nm);
>    }
>
> Bob.
>

From vladimir.kozlov at oracle.com Wed Jul 22 18:02:53 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 22 Jul 2020 11:02:53 -0700
Subject: RFR: 8249888: failure to create a libgraal JavaVM should result in a VM crash
In-Reply-To: <4ED30258-0888-4D6B-867E-6CC5DB4159E4@oracle.com>
References: <4ED30258-0888-4D6B-867E-6CC5DB4159E4@oracle.com>
Message-ID: <09a740fa-73e8-5e6b-cc92-db382130c486@oracle.com>

Looks good.

Thanks,
Vladimir

On 7/22/20 8:53 AM, Doug Simon wrote:
> Please review this enhancement which improves the debuggability of certain libgraal isolate creation issues.
>
> BUG:
> https://bugs.openjdk.java.net/browse/JDK-8249888
>
> PATCH:
>
> diff -r 8995e9efdee7 src/hotspot/share/jvmci/jvmciRuntime.cpp
> --- a/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:08:01 2020 +0200
> +++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:41:06 2020 +0200
> @@ -800,7 +800,7 @@
>
>      JNI_CreateJavaVM = CAST_TO_FN_PTR(JNI_CreateJavaVM_t, os::dll_lookup(sl_handle, "JNI_CreateJavaVM"));
>      if (JNI_CreateJavaVM == NULL) {
> -      vm_exit_during_initialization("Unable to find JNI_CreateJavaVM", sl_path);
> +      fatal("Unable to find JNI_CreateJavaVM in %s", sl_path);
>      }
>
>      ResourceMark rm;
> @@ -835,7 +835,7 @@
>        JVMCI_event_1("created JavaVM[%ld]@" PTR_FORMAT " for JVMCI runtime %d", javaVM_id, p2i(javaVM), _id);
>        return env;
>      } else {
> -      vm_exit_during_initialization(err_msg("JNI_CreateJavaVM failed with return value %d", result), sl_path);
> +      fatal("JNI_CreateJavaVM failed with return value %d", result);
>      }
>    }
>    return NULL;
>
> -Doug
>

From tom.rodriguez at oracle.com Wed Jul 22 18:04:42 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Wed, 22 Jul 2020 11:04:42 -0700
Subject: RFR: 8249888: failure to create a libgraal JavaVM should result in a VM crash
In-Reply-To: <4ED30258-0888-4D6B-867E-6CC5DB4159E4@oracle.com>
References: <4ED30258-0888-4D6B-867E-6CC5DB4159E4@oracle.com>
Message-ID: <62657817-db4d-8887-4d90-9aedd530db78@oracle.com>

Looks good.

tom

Doug Simon wrote on 7/22/20 8:53 AM:
> Please review this enhancement which improves the debuggability of certain libgraal isolate creation issues.
>
> BUG:
> https://bugs.openjdk.java.net/browse/JDK-8249888
>
> PATCH:
>
> diff -r 8995e9efdee7 src/hotspot/share/jvmci/jvmciRuntime.cpp
> --- a/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:08:01 2020 +0200
> +++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:41:06 2020 +0200
> @@ -800,7 +800,7 @@
>
>      JNI_CreateJavaVM = CAST_TO_FN_PTR(JNI_CreateJavaVM_t, os::dll_lookup(sl_handle, "JNI_CreateJavaVM"));
>      if (JNI_CreateJavaVM == NULL) {
> -      vm_exit_during_initialization("Unable to find JNI_CreateJavaVM", sl_path);
> +      fatal("Unable to find JNI_CreateJavaVM in %s", sl_path);
>      }
>
>      ResourceMark rm;
> @@ -835,7 +835,7 @@
>        JVMCI_event_1("created JavaVM[%ld]@" PTR_FORMAT " for JVMCI runtime %d", javaVM_id, p2i(javaVM), _id);
>        return env;
>      } else {
> -      vm_exit_during_initialization(err_msg("JNI_CreateJavaVM failed with return value %d", result), sl_path);
> +      fatal("JNI_CreateJavaVM failed with return value %d", result);
>      }
>    }
>    return NULL;
>
> -Doug
>

From richard.reingruber at sap.com Wed Jul 22 20:18:23 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Wed, 22 Jul 2020 20:18:23 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com>
 <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com>
 <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com>
Message-ID:

Hi Goetz,

> > I'll answer the obvious things in this mail now.
> > I'll go through the code thoroughly again and write
> > a review of my findings thereafter.

> As promised a detailed walk-through, but without any major findings:
> c1_IR.hpp: ok
> ci_Env.h|cpp: ok
> compiledMethod.cpp, nmethod.cpp: ok
> debugInfoRec.h|cpp: ok
> scopeDesc.h|cpp ok
> compileBroker.h|cpp:
> Maybe a bit of documentation how and why you start
> the threads?
> I had expected there are two test
> scenarios run after each other, but now I understand 'Single'
> and 'All' run simultaneously. Well, this really is a stress test!
> Also good the two variants of deoptimization are
> stressed against each other.
> Besides that really nice it's all in one place.

Done.

> rootResolver.cpp: ok
> jvmciCodeInstaller.cpp: ok
> c2compiler.cpp: The essence of this change! Just one line :)
> Great!

:)

> callnode.hpp ok
> escape.h|cpp ok
> macro.cpp
> I was not that happy with the names saying not_global_escape
> and similar. I now agreed you have to use the terms of the escape
> analysis (NoEscape, ArgEscape) throughout the runtime code. I'm still not happy with
> the 'not' in the term, I always try to expand the name to some
> sentence with a negated verb, but it makes no sense.
> For example, "has_not_global_escape_in_scope" expands to
> "Hasn't a global escape in its scope." in my thinking, which makes
> no sense. You probably mean
> "Has not-global escape in its scope." or "Has {ArgEscape|NoEscape}
> in its scope."
> C2 is using the word "non" in this context, e.g., here
> alloc->is_non_escaping.

There is also ConnectionGraph::not_global_escape()

> non obviously negates the adjective 'global',
> non-global or nonglobal even is an English term I find in the
> net.
> So what about "has_non_global_escape_in_scope?"

And what about has_ea_local_in_scope?

> matcher.cpp ok
> output.cpp:1071
> Please break the long line.

Done.

> jvmtiCodeBlobEvents.cpp ok
> jvmtiEnv.cpp
> MaxJavaStackTraceDepth is only documented to affect
> the exceptions stack trace depth, not to limit jvmti
> operations. Therefore I wondered why it is used here.
> None of your business, but the flag should
> document this in globals.hpp, too.
> Does jvmti specify that the same limits are used ...?
> ok on your side.

I don't know and didn't find anything in a quick search.
> jvmtiEnvBase.cpp ok
> jvmtiImpl.h|cpp ok
> jvmtiTagMap.cpp ok
> whitebox.cpp ok
> deoptimization.cpp
> line 177: Please break line
> line 246, 281: Please break line
> 1578, 1583, 1589, 1632, 1649, 1651 Break line
> 1651: You use 'non'-terms, too: non-escaping :)

I know :) At least here it is wrong I'd say. "...has to be a not escaping obj..."
sounds better (hopefully not only to my German ears).

> 2805, 2929, 2946ff, break lines
> deoptimization.hpp
> 158, 174, 176 ... I would break lines too, but here you are in
> good company :)

Done.

> globals.hpp ok
> mutexLocker.h|cpp ok
> objectMonitor.cpp ok
> thread.cpp
> 2631 typo: sapfepont --> safepoint

Done.

> thread.hpp ok
> thread.inline.hpp ok
> vframe.cpp ok
> vframe_hp.cpp 458ff break lines
> vframe_hp.hpp ok
> macros.hpp ok
> TEST.ROOT ok
> WhiteBox.java ok
> IterateHeapWithEscapeAnalysisEnabled.java
> line 415:
> msg("wait until target thread has set testMethod_result");
> while (testMethod_result == 0) {
> Thread.sleep(50);
> }
> Might the test run into timeouts at this place?
> The field is volatile, i.e. it will be reloaded
> in each iteration. But will dontinline_testMethod
> write it back to main memory in time?

You mean, the test could hang in that loop for a couple of minutes? I don't
think so. There are cache coherence protocols in place which will invalidate
stale data promptly.

> libIterateHeapWithEscapeAnalysisEnabled.c ok
> EATests.java
> This is a very elaborate test.
> I found a row of test cases illustrating issues
> we talked about before. Really helpful!
> 1311: TypeO materialize -> materialized

Found and fixed the typo at line 1369. (Probably the cursor was on 1311 and your eyes on 1369 ;))

> 1640: setting local variable i triggers always deoptimization
> --> setting local variable i always triggers deoptimization

Fixed.

> 2176: dontinline_calee --> dontinline_callee
> 2510: poping --> popping ... but I'm not sure here.

Done.
> https://www.urbandictionary.com/define.php?term=poping
> poping
> Drinking large amounts of Dextromethorphan Hydrobromide (DXM)based cough syrup, and then embarking on an adventure while wandering around neighborhoods or parks all night. This is usually done while listening to Punk rock music from a portable jambox.
> ;)
> Don't do it!

OMG! How come you know?! ;)

> EATestsJVMTI.java
> I think you can just copy this test description into the other
> test. You can have two @test comments, they will be treated
> as separate tests. The @requires will be evaluated accordingly.
> For an example see
> test/hotspot/jtreg/runtime/exceptionMsgs/NullPointerException/NullPointerExceptionTest.java
> which has two different compile setups for the test class (-g).

Done.

> so, that's it for reading code ...
> Some general remarks, maybe a bit picky ...:
> I think you could use fewer commas ',' in comments.
> As I understand, you need a comma if the relative
> sentence is at the beginning, but not if it is at
> the end:
> If Corona is over, I go to the office.
> but
> I go to the office if Corona is over.

That seems to be correct except "If Corona is over" isn't a relative sentence
but a conditional sentence, isn't it? The general rule seems to be: the
subordinate clause is separated with a comma from a following main clause. No
comma separation is needed if the subordinate clause follows the main clause.

Thanks, that's a lesson I learned!

> I think the same holds for 'because', 'while' etc.
> E.g., jvmtiEnvBase.cpp:1313, jvmtiImpl.cpp:646ff,
> vframe_hp.hpp 104ff

Ok. I've removed quite a lot of the occurrences.

> Also, I like full sentences in comments.
> Especially for me as a foreign speaker, this makes
> things much more clear. I.e., I try to make it
> a real sentence with articles, capitalized and a
> dot at the end if there is a subject and a verb
> in first place.
> E.g., jvmtiEnvBase.cpp:1327

Are you referring to the following?
(from http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/src/hotspot/share/prims/jvmtiEnvBase.cpp.frames.html)

1326
1327     // If the frame is a compiled one, need to deoptimize it.
1328     if (vf->is_compiled_frame()) {

This line 1327 is preexisting.

> In many places, your comments read really
> well but some are quite abbreviated I think.

Yeah, but not only because I'm lazy... It is the style that I prefer and I think
it matches the surrounding code quite well.

> E.g. thread.cpp:2601 is an example where a simple
> 'a' helps a lot.
> "Single deoptimization is typically very short."
> I would add 'A': "A single deoptimization is typically very short (fast?)."
> Another meaning of the comment I first considered is this:
> "Single deoptimization is typically very short, all_threads deoptimization takes longer"
> having in mind the functions
> EscapeBarrier::deoptimize_objects_all_threads()
> and
> EscapeBarrier::deoptimize_objects() doing a single thread.
> German with its compound nouns is helpful here :)
> Einzeldeoptimierung <--> eine einzelne Deoptimierung

I've added the 'A' and I'll try to use complete sentences in the future. The
telegram style has advantages, too, though ;)

Thanks!

Cheers, Richard.

-----Original Message-----
From: Lindenmaier, Goetz
Sent: Friday, 17 July 2020 14:31
To: Lindenmaier, Goetz ; Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Hi Richard,

> I'll answer the obvious things in this mail now.
> I'll go through the code thoroughly again and write
> a review of my findings thereafter.

As promised a detailed walk-through, but without any major findings:

c1_IR.hpp: ok
ci_Env.h|cpp: ok
compiledMethod.cpp, nmethod.cpp: ok
debugInfoRec.h|cpp: ok
scopeDesc.h|cpp ok
compileBroker.h|cpp:
Maybe a bit of documentation how and why you start
the threads? I had expected there are two test
scenarios run after each other, but now I understand 'Single'
and 'All' run simultaneously. Well, this really is a stress test!
Also good the two variants of deoptimization are
stressed against each other.
Besides that really nice it's all in one place.
rootResolver.cpp: ok
jvmciCodeInstaller.cpp: ok
c2compiler.cpp: The essence of this change! Just one line :)
Great!
callnode.hpp ok
escape.h|cpp ok
macro.cpp
I was not that happy with the names saying not_global_escape
and similar. I now agreed you have to use the terms of the escape
analysis (NoEscape, ArgEscape) throughout the runtime code. I'm still not happy with
the 'not' in the term, I always try to expand the name to some
sentence with a negated verb, but it makes no sense.
For example, "has_not_global_escape_in_scope" expands to
"Hasn't a global escape in its scope." in my thinking, which makes
no sense. You probably mean
"Has not-global escape in its scope." or "Has {ArgEscape|NoEscape}
in its scope."
C2 is using the word "non" in this context, e.g., here
alloc->is_non_escaping.
non obviously negates the adjective 'global',
non-global or nonglobal even is an English term I find in the
net.
So what about "has_non_global_escape_in_scope?"
matcher.cpp ok
output.cpp:1071
Please break the long line.
jvmtiCodeBlobEvents.cpp ok
jvmtiEnv.cpp
MaxJavaStackTraceDepth is only documented to affect
the exceptions stack trace depth, not to limit jvmti
operations. Therefore I wondered why it is used here.
None of your business, but the flag should
document this in globals.hpp, too.
Does jvmti specify that the same limits are used ...?
ok on your side.
jvmtiEnvBase.cpp ok jvmtiImpl.h|cpp ok jvmtiTagMap.cpp ok whitebox.cpp ok deoptimization.cpp line 177: Please break line line 246, 281: Please break line 1578, 1583, 1589, 1632, 1649, 1651 Break line 1651: You use 'non'-terms, too: non-escaping :) 2805, 2929, 2946ff, break lines deoptimization.hpp 158, 174, 176 ... I would break lines too, but here you are in good company :) globals.hpp ok mutexLocker.h|cpp ok objectMonitor.cpp ok thread.cpp 2631 typo: sapfepont --> safepoint thread.hpp ok thread.inline.hpp ok vframe.cpp ok vframe_hp.cpp 458ff break lines vframe_hp.hpp ok macros.hpp ok TEST.ROOT ok WhiteBox.java ok IterateHeapWithEscapeAnalysisEnabled.java line 415: msg("wait until target thread has set testMethod_result"); while (testMethod_result == 0) { Thread.sleep(50); } Might the test run into timeouts at this place? The field is volatile, i.e. it will be reloaded in each iteration. But will dontinline_testMethod write it back to main memory in time? libIterateHeapWithEscapeAnalysisEnabled.c ok EATests.java This is a very elaborate test. I found a row of test cases illustrating issues we talked about before. Really helpful! 1311: TypeO materialize -> materialized 1640: setting local variable i triggers always deoptimization --> setting local variable i always triggers deoptimization 2176: dontinline_calee --> dontinline_callee 2510: poping --> popping ... but I'm not sure here. https://www.urbandictionary.com/define.php?term=poping poping Drinking large amounts of Dextromethorphan Hydrobromide (DXM)based cough syrup, and then embarking on an adventure while wandering around neighborhoods or parks all night. This is usually done while listening to Punk rock music from a portable jambox. ;) Don?t do it! ?? EATestsJVMTI.java I think you can just copy this test description into the other test. You can have two @test comments, they will be treated as separate tests. The @requires will be evaluated accordingly. 
For an example see test/hotspot/jtreg/runtime/exceptionMsgs/NullPointerException/NullPointerExceptionTest.java which has two different compile setups for the test class (-g). so, that's it for reading code ... Some general remarks, maybe a bit picky ...: I think you could use less commas ',' in comments. As I understand, you need a comma if the relative sentence is at the beginning, but not if it is at the end: If Corona is over, I go to the office. but I go to the office if Corona is over. I think the same holds for 'because', 'while' etc. E.g., jvmtiEnvBase.cpp:1313, jvmtiImpl.cpp:646ff, vframe_hp.hpp 104ff Also, I like full sentences in comments. Especially for me as foreign speaker, this makes things much more clear. I.e., I try to make it a real sentence with articles, capitalized and a dot at the end if there is a subject and a verb in first place. E.g., jvmtiEnvBase.cpp:1327 In many places, your comments read really well but some are quite abbreviated I think. E.g. thread.cpp:2601 is an example where a simple 'a' helps a lot. "Single deoptimization is typically very short." I would add 'A': "A single deoptimization is typically very short (fast?)." An other meaning of the comment I first considered is this: "Single deoptimization is typically very short, all_threads deoptimization takes longer" having in mind the functions EscapeBarries::deoptimize_objects_all_threads() and EscapeBarries::deoptimize_objects() doing a single thread. German with it's compound nouns is helpful here :) Einzeldeoptimierung <--> eine einzelne Deoptimierung Best regards, Goetz. 
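The wait loop questioned in the review above (line 415 of IterateHeapWithEscapeAnalysisEnabled.java) can be sketched in isolation as follows. This is only an illustration under stated assumptions, not the test's actual code: the class name, the `testMethodResult` field, the deadline guard, and the value 42 are all hypothetical. It shows a reader polling a volatile field that another thread writes, with a timeout so that a missing write produces a bounded wait rather than a hang.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the polling loop discussed in the review;
// none of these names come from the webrev.
class VolatilePollSketch {
    // volatile: each loop iteration re-reads the field, and a write by the
    // target thread becomes visible to the polling thread.
    static volatile int testMethodResult = 0;

    // Polls until the field is non-zero or the deadline passes.
    // Returns the observed value, or 0 if the wait timed out.
    static int waitForResult(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (testMethodResult == 0 && System.nanoTime() - deadline < 0) {
            Thread.sleep(50);
        }
        return testMethodResult;
    }

    // Starts a writer thread, waits for its result, and returns what was seen.
    static int runOnce() {
        testMethodResult = 0;
        Thread target = new Thread(() -> testMethodResult = 42);
        target.start();
        try {
            int seen = waitForResult(5_000);
            target.join();
            return seen;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("observed result: " + runOnce());
    }
}
```

The explicit deadline is a design choice of this sketch; the loop quoted in the review has no such guard and relies on the test harness timeout instead.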
From richard.reingruber at sap.com Wed Jul 22 20:53:19 2020
From: richard.reingruber at sap.com (Reingruber, Richard)
Date: Wed, 22 Jul 2020 20:53:19 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com>
 <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com>
 <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com>
Message-ID:

Hi Goetz,

> Thanks for the quick reply.

Yes, this time it didn't take that long...

[... snip ...]

> > > > > I understand you annotate at safepoints where the escape analysis
> > > > > finds out that an object is "better" than global escape.
> > > > > These are the cases where the analysis identifies optimization
> > > > > opportunities. These annotations are then used to deoptimize
> > > > > frames and the objects referenced by them.
> > > > > Doesn't this overestimate the optimized
> > > > > objects? E.g., eliminate_alloc_node has many cases where it bails
> > > > > out.
> > > >
> > > > Yes, the implementation is conservative, but it is comparatively simple and
> > > > the additional debug info is just 2 flags per safepoint.
> > > Thanks. It also helped that you explained to me offline that
> > > there are more optimizations than only lock elimination and scalar
> > > replacement done based on the ea information.
> > > The ea refines the IR graph which allows follow up optimizations
> > > which can not easily be tracked back to the escaping objects or
> > > the call sites where they do not escape.
> > > Thus, if there are non-global escaping objects, you have to
> > > deoptimize the frame.
> > > Did I repeat that correctly?
> >
> > Mostly, but there are also cases where deoptimization is required if and only
> > if ea-local objects are passed as arguments. This is the case when values are
> > not read directly from a frame, but from a callee frame.
> Hmm, don't get this completely, but ok.

Let C be a callee frame of B which is a callee of A. If you use JVMTI to read an
object reference from a local variable of C then the implementation of
JDK-8227745 deoptimizes A if it passes any ea-local as argument, because the
reference could be ea-local in A and there might be optimizations that are
invalid after the escape state change.

> > > > Accesses to instance
> > > > members or array elements can be optimized as well.
> > > You mean the compiler can/will ignore volatile or memory ordering
> > > requirements for non-escaping objects? Sounds reasonable to do.
> >
> > Yes, for instance. Also without volatile modifiers it will eliminate accesses.
> > Here is an example:
> > Method A has a NoEscape allocation O that is not scalar replaced. A calls
> > Method B, which is not inlined. When you use your debugger to break in B,
> > then modify a field of O, then this modification would have no effect
> > without deoptimization, because the jit assumes that B cannot modify O
> > without a reference to it.
> Yes, A can keep O in a register, while the JVMTI thread would write to
> the location in the stack where the local is held (if it was written back).

Not quite. It is the value of the field of O that is in a register, not the
reference to O itself. The agent changes the field's value in the /java heap/
(remember: O is _not_ scalar replaced), but the field's value is not reloaded
after return from B.

> > > > > Synchronization: looks good. I think others had a look at this before.
> > > > >
> > > > > EscapeBarrier::deoptimize_objects_internal()
> > > > > The method name is misleading, it is not used by
> > > > > deoptimize_objects().
> > > > > Also, method with the same name is in Deoptimization.
> > > > > Proposal: deoptimize_objects_thread() ?
> > > >
> > > > Sorry, but I don't see, why it would be misleading.
> > > > What would be the meaning of 'deoptimize_objects_thread'? I don't
> > > > understand that name.
> > > 1. I have no idea why it's called "_internal". Because it is private?
> > > By the name, I would expect that EscapeBarrier::deoptimize_objects()
> > > calls it for some internal tasks. But it does not.
> >
> > Well, I'd say it is pretty internal, what's happening in that method. So IMHO
> > the suffix _internal is a match.
> >
> > > 2. My proposal: deoptimize_objects_all_threads() iterates all threads
> > > and calls deoptimize_objects(_one)_thread(thread) for each of these.
> > > That's how I would have named it.
> > > But no bike shedding, if you don't see what I mean it's not obvious.
> > Ok. We could have a quick call, too, if you like.
> Ok, I think I have understood the remaining points. I'm fine with this
> so far.

Thanks again and best regards,
Richard.

-----Original Message-----
From: Lindenmaier, Goetz
Sent: Wednesday, 22 July 2020 18:22
To: Reingruber, Richard ; serviceability-dev at openjdk.java.net; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: RE: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents

Hi Richard,

Thanks for the quick reply.

> > > With DeoptimizeObjectsALot enabled internal threads are started that
> > > deoptimize frames and objects. The number of threads started is given with
> > > DeoptimizeObjectsALotThreadCountAll and
> > > DeoptimizeObjectsALotThreadCountSingle. The former targets all existing
> > > threads whereas the latter operates on a single thread selected round robin.
> > >
> > > I removed the mode where deoptimizations were performed at every nth
> > > exit from the runtime. I never used it.
>
> > Do I get it right? You have a n:1 and a n:all test scenario.
> > n:1: n threads deoptimize 1 Java thread where n = DOALThreadCountSingle
> > n:m: n threads deoptimize all Java threads where n = DOALThreadCountAll?
>
> Not quite.
>
> -XX:+DeoptimizeObjectsALot // required
> -XX:DeoptimizeObjectsALotThreadCountAll=m
> -XX:DeoptimizeObjectsALotThreadCountSingle=n
>
> Will start m+n threads. Each operating on all existing JavaThreads using
> EscapeBarriers. The difference between the 2 thread types is that one distinct
> EscapeBarrier targets either just a single thread or all existing threads at
> once. If just one single thread is targeted per EscapeBarrier, then it is not
> always the same thread, but threads are selected round robin. So there will be
> n threads selecting independently single threads round robin per EscapeBarrier
> and m threads that target all threads in every EscapeBarrier.
Ok, yes, that is how I understood it.

> > > * EscapeBarrier::sync_and_suspend_one(): use a direct handshake and
> > > execute it always independently of is_thread_fully_suspended().
> > Is this also a performance optimization?
>
> Maybe a minor one.
OK

> > > * JavaThread::wait_for_object_deoptimization():
> > > - Bugfix: the last check of is_obj_deopt_suspend() must be /after/ the
> > > safepoint check! This caused issues with not walkable stacks with
> > > DeoptimizeObjectsALot.
> > OK. As I understand, there was one safepoint check in the old version,
> > now there is one in each iteration. I assume this is intended, right?
>
> Yes it is. The important thing here is (A) a safepoint check is needed /after/
> leaving a safe state (_thread_in_native, _thread_blocked). (B) Shared variables
> that are modified at safepoints or with handshakes need to be reread /after/
> the safepoint check.
>
> BTW: I only noticed now that since JDK-8240918 JavaThreads themselves
> must disarm their polling page. Originally (before handshakes) this was done
> by the VM thread. With handshakes it was done by the thread executing the
> handshake op. This was changed for OrderAccess::cross_modify_fence() where
> the poll is left armed if the thread is in native and since JDK-8240918 it is
> always left armed. So when a thread leaves a safe state (native, blocked) and
> there was a handshake/vm op, it will always call
> SafepointMechanism::block_if_requested_slow(), even if the handshake/vm
> operation has been processed already and everybody else is happily executing
> bytecodes :)
Ok.

> Still (A) and (B) hold.

> > > - Added limited spinning inspired by HandshakeSpinYield to fix regression in
> > > microbenchmark [1]
> > Ok. Nice improvement, nice catch!
>
> Yes. It certainly took some time to find out.
>
> > >
> > > I refer to some more changes answering your questions and comments inline
> > > below.
> > >
> > > Thanks,
> > > Richard.
> > >
> > > [1] Microbenchmark:
> > > http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6.microbenchmark/
> > >
> > >
> > > > I understand you annotate at safepoints where the escape analysis
> > > > finds out that an object is "better" than global escape.
> > > > These are the cases where the analysis identifies optimization
> > > > opportunities. These annotations are then used to deoptimize
> > > > frames and the objects referenced by them.
> > > > Doesn't this overestimate the optimized
> > > > objects? E.g., eliminate_alloc_node has many cases where it bails
> > > > out.
> > >
> > > Yes, the implementation is conservative, but it is comparatively simple and
> > > the additional debug info is just 2 flags per safepoint.
> > Thanks. It also helped that you explained to me offline that
> > there are more optimizations than only lock elimination and scalar
> > replacement done based on the ea information.
> > The ea refines the IR graph which allows follow up optimizations
> > which can not easily be tracked back to the escaping objects or
> > the call sites where they do not escape.
> > Thus, if there are non-global escaping objects, you have to
> > deoptimize the frame.
> > Did I repeat that correctly?
>
> Mostly, but there are also cases where deoptimization is required if and only
> if ea-local objects are passed as arguments. This is the case when values are
> not read directly from a frame, but from a callee frame.
Hmm, don't get this completely, but ok.

> > > Accesses to instance
> > > members or array elements can be optimized as well.
> > You mean the compiler can/will ignore volatile or memory ordering
> > requirements for non-escaping objects? Sounds reasonable to do.
>
> Yes, for instance. Also without volatile modifiers it will eliminate accesses.
> Here is an example:
> Method A has a NoEscape allocation O that is not scalar replaced. A calls
> Method B, which is not inlined. When you use your debugger to break in B,
> then modify a field of O, then this modification would have no effect
> without deoptimization, because the jit assumes that B cannot modify O
> without a reference to it.
Yes, A can keep O in a register, while the JVMTI thread would write to
the location in the stack where the local is held (if it was written back).

> > > > Synchronization: looks good. I think others had a look at this before.
> > > >
> > > > EscapeBarrier::deoptimize_objects_internal()
> > > > The method name is misleading, it is not used by
> > > > deoptimize_objects().
> > > > Also, method with the same name is in Deoptimization.
> > > > Proposal: deoptimize_objects_thread() ?
> > >
> > > Sorry, but I don't see, why it would be misleading.
> > > What would be the meaning of 'deoptimize_objects_thread'? I don't
> > > understand that name.
> > 1. I have no idea why it's called "_internal". Because it is private?
> > By the name, I would expect that EscapeBarrier::deoptimize_objects()
> > calls it for some internal tasks. But it does not.
>
> Well, I'd say it is pretty internal, what's happening in that method. So IMHO
> the suffix _internal is a match.
>
> > 2. My proposal: deoptimize_objects_all_threads() iterates all threads
> > and calls deoptimize_objects(_one)_thread(thread) for each of these.
> > That's how I would have named it.
> > But no bike shedding, if you don't see what I mean it's not obvious.
> Ok. We could have a quick call, too, if you like.
Ok, I think I have understood the remaining points. I'm fine with this
so far.

Thanks,
  Goetz.

From doug.simon at oracle.com Wed Jul 22 20:56:47 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 22 Jul 2020 22:56:47 +0200
Subject: RFR: 8249888: failure to create a libgraal JavaVM should result in a VM crash
In-Reply-To: <09a740fa-73e8-5e6b-cc92-db382130c486@oracle.com>
References: <4ED30258-0888-4D6B-867E-6CC5DB4159E4@oracle.com>
 <09a740fa-73e8-5e6b-cc92-db382130c486@oracle.com>
Message-ID: <439CC977-9905-4D3E-9FC1-F345D0785F37@oracle.com>

Thanks Vladimir.

-Doug

> On 22 Jul 2020, at 20:02, Vladimir Kozlov wrote:
>
> Looks good.
>
> Thanks,
> Vladimir
>
> On 7/22/20 8:53 AM, Doug Simon wrote:
>> Please review this enhancement which improves the debuggability of certain libgraal isolate creation issues.
>> BUG:
>> https://bugs.openjdk.java.net/browse/JDK-8249888
>> PATCH:
>> diff -r 8995e9efdee7 src/hotspot/share/jvmci/jvmciRuntime.cpp
>> --- a/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:08:01 2020 +0200
>> +++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:41:06 2020 +0200
>> @@ -800,7 +800,7 @@
>>      JNI_CreateJavaVM = CAST_TO_FN_PTR(JNI_CreateJavaVM_t, os::dll_lookup(sl_handle, "JNI_CreateJavaVM"));
>>      if (JNI_CreateJavaVM == NULL) {
>> -      vm_exit_during_initialization("Unable to find JNI_CreateJavaVM", sl_path);
>> +      fatal("Unable to find JNI_CreateJavaVM in %s", sl_path);
>>      }
>>      ResourceMark rm;
>> @@ -835,7 +835,7 @@
>>        JVMCI_event_1("created JavaVM[%ld]@" PTR_FORMAT " for JVMCI runtime %d", javaVM_id, p2i(javaVM), _id);
>>        return env;
>>      } else {
>> -      vm_exit_during_initialization(err_msg("JNI_CreateJavaVM failed with return value %d", result), sl_path);
>> +      fatal("JNI_CreateJavaVM failed with return value %d", result);
>>      }
>>    }
>>    return NULL;
>> -Doug

From doug.simon at oracle.com Wed Jul 22 20:56:57 2020
From: doug.simon at oracle.com (Doug Simon)
Date: Wed, 22 Jul 2020 22:56:57 +0200
Subject: RFR: 8249888: failure to create a libgraal JavaVM should result in a VM crash
In-Reply-To: <62657817-db4d-8887-4d90-9aedd530db78@oracle.com>
References: <4ED30258-0888-4D6B-867E-6CC5DB4159E4@oracle.com>
 <62657817-db4d-8887-4d90-9aedd530db78@oracle.com>
Message-ID: <8B5E6F86-B1DD-4DE6-AF9F-3A110F152997@oracle.com>

Thanks Tom.

-Doug

> On 22 Jul 2020, at 20:04, Tom Rodriguez wrote:
>
> Looks good.
>
> tom
>
> Doug Simon wrote on 7/22/20 8:53 AM:
>> Please review this enhancement which improves the debuggability of certain libgraal isolate creation issues.
>> BUG:
>> https://bugs.openjdk.java.net/browse/JDK-8249888
>> PATCH:
>> diff -r 8995e9efdee7 src/hotspot/share/jvmci/jvmciRuntime.cpp
>> --- a/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:08:01 2020 +0200
>> +++ b/src/hotspot/share/jvmci/jvmciRuntime.cpp Wed Jul 22 17:41:06 2020 +0200
>> @@ -800,7 +800,7 @@
>>      JNI_CreateJavaVM = CAST_TO_FN_PTR(JNI_CreateJavaVM_t, os::dll_lookup(sl_handle, "JNI_CreateJavaVM"));
>>      if (JNI_CreateJavaVM == NULL) {
>> -      vm_exit_during_initialization("Unable to find JNI_CreateJavaVM", sl_path);
>> +      fatal("Unable to find JNI_CreateJavaVM in %s", sl_path);
>>      }
>>      ResourceMark rm;
>> @@ -835,7 +835,7 @@
>>        JVMCI_event_1("created JavaVM[%ld]@" PTR_FORMAT " for JVMCI runtime %d", javaVM_id, p2i(javaVM), _id);
>>        return env;
>>      } else {
>> -      vm_exit_during_initialization(err_msg("JNI_CreateJavaVM failed with return value %d", result), sl_path);
>> +      fatal("JNI_CreateJavaVM failed with return value %d", result);
>>      }
>>    }
>>    return NULL;
>> -Doug

From vladimir.x.ivanov at oracle.com Wed Jul 22 21:36:52 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 23 Jul 2020 00:36:52 +0300
Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes
In-Reply-To: <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com>
References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com>
 <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com>
 <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com>
 <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com>
 <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com>
 <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com>
 <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com>
 <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com>
Message-ID: <8c05d468-8753-b671-e3a9-92a7148f4f14@oracle.com>

> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/

FTR there's one more aarch64-specific change in shared code to enable
aarch64_neon.ad processing:

diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk b/make/hotspot/gensrc/GensrcAdlc.gmk
--- a/make/hotspot/gensrc/GensrcAdlc.gmk
+++ b/make/hotspot/gensrc/GensrcAdlc.gmk
@@ -129,6 +129,12 @@
       $d/os_cpu/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH).ad \
     )))

+  ifeq ($(HOTSPOT_TARGET_CPU_ARCH), aarch64)
+    AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \
+      $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_CPU_ARCH)_neon.ad \
+    )))
+  endif
+
   ifeq ($(call check-jvm-feature, shenandoahgc), true)
     AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \
       $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/shenandoah/shenandoah_$(HOTSPOT_TARGET_CPU).ad \

Best regards,
Vladimir Ivanov

> On 7/8/20 3:05 PM, Yang Zhang wrote:
>> Hi Andrew
>>
>> I have updated this patch. Could you please help to review it again?
>> In this patch, the following changes are made:
>> 1. Separate newly added NEON instructions to a new ad file
>>    aarch64_neon.ad
>> 2. Add assembler tests for NEON instructions. Trailing spaces
>>    in the python script are also removed.
>>
>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/
>>
>>
>> Thanks,
>> Yang
>>
>>
>> -----Original Message-----
>> From: Andrew Haley
>> Sent: Tuesday, June 30, 2020 12:10 AM
>> To: Yang Zhang ; Viswanathan, Sandhya
>> ; Paul Sandoz
>> Cc: nd ; hotspot-compiler-dev at openjdk.java.net;
>> hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net;
>> aarch64-port-dev at openjdk.java.net
>> Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of
>> Vector API (Incubator): AArch64 backend changes
>>
>> On 29/06/2020 08:48, Yang Zhang wrote:
>>> 1. Instructions that can be matched with NEON instructions directly.
>>> MulVB, SqrtVF and AbsV have been merged into jdk master already.
>>>
>>> 2. Instructions that jdk master has middle end support for, but they
>>> cannot be matched with NEON instructions directly.
>>> Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These
>>> new instructions can be moved into jdk master first, but for
>>> auto-vectorization, the performance might not get improved.
>>>
>>> 3. Panama/Vector API specific instructions such as Load/StoreVector
>>> ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend
>>> etc.
>>> These instructions cannot be moved into jdk master first because
>>> there isn't middle-end support.
>>>
>>> I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also
>>> update aarch64_asmtest.py and macroassemler.cpp. When the patch is
>>> ready, I will send it again.
>>
>> Thank you *very* much for your hard work. Appreciated!
>>
>> --
>> Andrew Haley (he/him)
>> Java Platform Lead Engineer
>> Red Hat UK Ltd. https://keybase.io/andrewhaley
>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>>
>

From tobias.hartmann at oracle.com Thu Jul 23 09:13:41 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Thu, 23 Jul 2020 11:13:41 +0200
Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic
In-Reply-To: <1595401959932.33284@amazon.com>
References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com>
 <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com>
 <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com>
 <1595401959932.33284@amazon.com>
Message-ID:

On 22.07.20 09:12, Liu, Xin wrote:
> 1. I move the validation logic for compiler directives to compilerOracle::scan_flag_and_value.
> If something wrong happens in parser, the patch will "gracefully" quit JVM using jvm_exit(1). is that okay?

With "piggy-back on the error mechanism" I meant that you should use the existing bailout mechanism
in the parser. In this case, couldn't you simply put the error message in 'errorbuf' and let the
caller take care of handling it?

Best regards,
Tobias

From boris.ulasevich at bell-sw.com Thu Jul 23 11:25:00 2020
From: boris.ulasevich at bell-sw.com (Boris Ulasevich)
Date: Thu, 23 Jul 2020 14:25:00 +0300
Subject: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values
In-Reply-To:
References: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com>
Message-ID: <05369383-c6d8-5e61-50ce-51fec955e2d4@bell-sw.com>

Hi Andrew,

Since the JDK-8248414 patch has been committed, I believe we can revive
this review. I think it is still better to move my rule to the ubfiz
command group, which is in the auto-generated area.

http://cr.openjdk.java.net/~bulasevich/8248870/webrev.02

regards,
Boris

On 09.07.2020 19:20, Boris Ulasevich wrote:
> Hi Andrew,
>
> Ok, let us proceed after 8248414.
>
> Meanwhile, I moved the change out of do-not-edit scope, thanks:
> http://cr.openjdk.java.net/~bulasevich/8248870/webrev.01
>
> regards,
> Boris
>
> On 08.07.2020 12:46, Andrew Haley wrote:
>> On 07/07/2020 16:47, Boris Ulasevich wrote:
>>> Please review the change to skip i2l conversion after the mask:
>>>
>>> http://cr.openjdk.java.net/~bulasevich/8248870/webrev.00
>>> http://bugs.openjdk.java.net/browse/JDK-8248870
>> You seem to have inserted this between the DO NOT EDIT THIS SECTION
>> markers.
>>
>> Please hold off this change until I've committed the patch for
>> 8248414.
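
As background for the 8248870 review above, here is a minimal sketch of why an i2l conversion can be skipped after a small positive mask. It is not taken from the webrev and does not model the AArch64 matcher rule; the class and method names are hypothetical. It only checks the underlying arithmetic: when the mask has its sign bit clear, the masked int is non-negative, so sign extension and zero extension to long give the same value.

```java
// Illustrative sketch only: it demonstrates the arithmetic fact behind the
// optimization, not the AArch64 rule from the webrev. All names are made up.
class MaskedI2LSketch {
    // Sample inputs, including negative values and both int extremes.
    static final int[] SAMPLES = {0, 1, -1, 0x12345678, Integer.MIN_VALUE, Integer.MAX_VALUE};

    // What the Java source asks for: mask as int, then widen with sign extension (i2l).
    static long signExtended(int value, int mask) {
        return (long) (value & mask);
    }

    // The same masked value widened with zero extension instead.
    static long zeroExtended(int value, int mask) {
        return Integer.toUnsignedLong(value & mask);
    }

    // True if both widenings agree for every sample under the given mask.
    static boolean agreeForMask(int mask) {
        for (int value : SAMPLES) {
            if (signExtended(value, mask) != zeroExtended(value, mask)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("mask 0xFF agrees:       " + agreeForMask(0xFF));
        System.out.println("mask 0x80000000 agrees: " + agreeForMask(0x80000000));
    }
}
```

A mask with the sign bit set, such as 0x80000000, breaks the equivalence, which matches the "small positive" restriction in the subject line.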
>>
>

From ningsheng.jian at arm.com Thu Jul 23 08:02:47 2020
From: ningsheng.jian at arm.com (Ningsheng Jian)
Date: Thu, 23 Jul 2020 16:02:47 +0800
Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes
In-Reply-To: <8c05d468-8753-b671-e3a9-92a7148f4f14@oracle.com>
References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com>
 <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com>
 <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com>
 <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com>
 <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com>
 <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com>
 <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com>
 <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com>
 <8c05d468-8753-b671-e3a9-92a7148f4f14@oracle.com>
Message-ID:

Hi Vladimir,

Thanks for pointing out this. Yes, I missed that change in shared code.
I've regenerated the webrev, with GensrcAdlc.gmk file change included:

http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/

Also add build-dev.

Thanks,
Ningsheng

On 7/23/20 5:36 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/
>
>
> FTR there's one more aarch64-specific change in shared code to enable
> aarch64_neon.ad processing:
>
> diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk b/make/hotspot/gensrc/GensrcAdlc.gmk
> --- a/make/hotspot/gensrc/GensrcAdlc.gmk
> +++ b/make/hotspot/gensrc/GensrcAdlc.gmk
> @@ -129,6 +129,12 @@
>        $d/os_cpu/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH).ad \
>      )))
>
> +  ifeq ($(HOTSPOT_TARGET_CPU_ARCH), aarch64)
> +    AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \
> +      $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_CPU_ARCH)_neon.ad \
> +    )))
> +  endif
> +
>    ifeq ($(call check-jvm-feature, shenandoahgc), true)
>      AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, $(AD_SRC_ROOTS), \
>        $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/shenandoah/shenandoah_$(HOTSPOT_TARGET_CPU).ad \
>
> Best regards,
> Vladimir Ivanov
>
>> On 7/8/20 3:05 PM, Yang Zhang wrote:
>>> Hi Andrew
>>>
>>> I have updated this patch. Could you please help to review it again?
>>> In this patch, the following changes are made:
>>> 1. Separate newly added NEON instructions to a new ad file
>>>    aarch64_neon.ad
>>> 2. Add assembler tests for NEON instructions. Trailing spaces
>>>    in the python script are also removed.
>>>
>>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/
>>>
>>>
>>> Thanks,
>>> Yang
>>>
>>>
>>> -----Original Message-----
>>> From: Andrew Haley
>>> Sent: Tuesday, June 30, 2020 12:10 AM
>>> To: Yang Zhang ; Viswanathan, Sandhya
>>> ; Paul Sandoz
>>> Cc: nd ; hotspot-compiler-dev at openjdk.java.net;
>>> hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net;
>>> aarch64-port-dev at openjdk.java.net
>>> Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of
>>> Vector API (Incubator): AArch64 backend changes
>>>
>>> On 29/06/2020 08:48, Yang Zhang wrote:
>>>> 1. Instructions that can be matched with NEON instructions directly.
>>>> MulVB, SqrtVF and AbsV have been merged into jdk master already.
>>>>
>>>> 2. Instructions that jdk master has middle end support for, but they
>>>> cannot be matched with NEON instructions directly.
>>>> Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These
>>>> new instructions can be moved into jdk master first, but for
>>>> auto-vectorization, the performance might not get improved.
>>>>
>>>> 3. Panama/Vector API specific instructions such as Load/StoreVector
>>>> ( 16 bits), VectorReinterpret, VectorMaskCmp, MaxV/MinV, VectorBlend
>>>> etc.
>>>> These instructions cannot be moved into jdk master first because
>>>> there isn't middle-end support.
>>>>
>>>> I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also
>>>> update aarch64_asmtest.py and macroassemler.cpp. When the patch is
>>>> ready, I will send it again.
>>>
>>> Thank you *very* much for your hard work. Appreciated!
>>>
>>> --
>>> Andrew Haley (he/him)
>>> Java Platform Lead Engineer
>>> Red Hat UK Ltd. https://keybase.io/andrewhaley
>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>>>
>>

From goetz.lindenmaier at sap.com Thu Jul 23 14:19:57 2020
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Thu, 23 Jul 2020 14:19:57 +0000
Subject: RFR(L) 8227745: Enable Escape Analysis for Better Performance in the Presence of JVMTI Agents
In-Reply-To:
References: <1f8a3c7a-fa0f-b5b2-4a8a-7d3d8dbbe1b5@oracle.com>
 <4b56a45c-a14c-6f74-2bfd-25deaabe8201@oracle.com>
 <5271429a-481d-ddb9-99dc-b3f6670fcc0b@oracle.com>
Message-ID:

Hi Richard,

Thanks for your two further explanations in the other
thread. That made the points clear to me.

> > I was not that happy with the names saying not_global_escape
> > and similar. I now agreed you have to use the terms of the escape
> > analysis (NoEscape, ArgEscape) throughout the runtime code. I'm still not happy with
> > the 'not' in the term, I always try to expand the name to some
> > sentence with a negated verb, but it makes no sense.
> > For example, "has_not_global_escape_in_scope" expands to
> > "Hasn't a global escape in its scope." in my thinking, which makes
> > no sense. You probably mean
> > "Has not-global escape in its scope." or "Has {ArgEscape|NoEscape}
> > in its scope."
>
> > C2 is using the word "non" in this context, e.g., here
> > alloc->is_non_escaping.
>
> There is also ConnectionGraph::not_global_escape()
That talks about a single node that represents a single Object.
An object has a single state wrt. ea. You use the term for
safepoint which tracks a set of objects.
Here, has_not_global_escape can mean
 1. None of the several objects does escape globally.
 2. There is at least one object that escapes globally.

> > non obviously negates the adjective 'global',
> > non-global or nonglobal even is an English term I find in the
> > net.
> > So what about "has_non_global_escape_in_scope?"
>
> And what about has_ea_local_in_scope?
That's good. Please document somewhere that
Ea_local == ArgEscape | NoEscape.
That's what it is, right?

> > Does jvmti specify that the same limits are used ...?
> > ok on your side.
>
> I don't know and didn't find anything in a quick search.
Ok, not your business.

>
> > jvmtiEnvBase.cpp ok
> > jvmtiImpl.h|cpp ok
> > jvmtiTagMap.cpp ok
> > whitebox.cpp ok
>
> > deoptimization.cpp
>
> > line 177: Please break line
> > line 246, 281: Please break line
> > 1578, 1583, 1589, 1632, 1649, 1651 Break line
>
> > 1651: You use 'non'-terms, too: non-escaping :)
>
> I know :) At least here it is wrong I'd say. "...has to be a not escaping obj..."
> sounds better
> (hopefully not only to my german ears).
I thought the term non-escaping makes it quite clear.
I just wanted to point out that using non above would be similar
to the wording here.

> > IterateHeapWithEscapeAnalysisEnabled.java
>
> > line 415:
> > msg("wait until target thread has set testMethod_result");
> > while (testMethod_result == 0) {
> > Thread.sleep(50);
> > }
> > Might the test run into timeouts at this place?
> > The field is volatile, i.e. it will be reloaded
> > in each iteration. But will dontinline_testMethod
> > write it back to main memory in time?
>
> You mean, the test could hang in that loop for a couple of minutes? I don't
> think so. There are cache coherence protocols in place which will invalidate
> stale data very timely.
Ok, anyways, it would only be a hanging test.

>
> Ok. I've removed quite a lot of the occurrences.
>
> > Also, I like full sentences in comments.
> > Especially for me as foreign speaker, this makes
> > things much more clear. I.e., I try to make it
> > a real sentence with articles, capitalized and a
> > dot at the end if there is a subject and a verb
> > in first place.
> > E.g., jvmtiEnvBase.cpp:1327
>
> Are you referring to the following?
> (from
> http://cr.openjdk.java.net/~rrich/webrevs/2019/8227745/webrev.6/src/hotspot/share/prims/jvmtiEnvBase.cpp.frames.html)
>
> 1326
> 1327 // If the frame is a compiled one, need to deoptimize it.
> 1328 if (vf->is_compiled_frame()) {
>
> This line 1327 is preexisting.
Sorry, wrong line number again. I think I meant
1333 // eagerly reallocate scalar replaced objects.
But I must admit, the subject is missing. It's one of these
imperative sentences where the subject is left out, which
are used throughout documentation. Bad example, but still
a correct sentence, so qualifies for punctuation?

Best regards,
  Goetz.

From erik.joelsson at oracle.com Thu Jul 23 13:06:22 2020
From: erik.joelsson at oracle.com (Erik Joelsson)
Date: Thu, 23 Jul 2020 06:06:22 -0700
Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes
In-Reply-To:
References: <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com>
 <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com>
 <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com>
 <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com>
 <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com>
 <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com>
 <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com>
 <8c05d468-8753-b671-e3a9-92a7148f4f14@oracle.com>
Message-ID: <2bc029fc-2823-18ac-9aa0-1a8edd7f9094@oracle.com>

Hello Ningsheng,

Build change looks good.

/Erik

On 2020-07-23 01:02, Ningsheng Jian wrote:
> Hi Vladimir,
>
> Thanks for pointing out this. Yes, I missed that change in shared
> code. I've regenerated the webrev, with GensrcAdlc.gmk file change
> included:
>
> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/
>
>
> Also add build-dev.
> > Thanks, > Ningsheng > > On 7/23/20 5:36 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ >> >> >> >> FTR there's one more aarch64-specific change in shared code to enable >> aarch64_neon.ad processing: >> >> diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk >> b/make/hotspot/gensrc/GensrcAdlc.gmk >> --- a/make/hotspot/gensrc/GensrcAdlc.gmk >> +++ b/make/hotspot/gensrc/GensrcAdlc.gmk >> @@ -129,6 +129,12 @@ >> >> $d/os_cpu/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH).ad >> \ >> ))) >> >> + ifeq ($(HOTSPOT_TARGET_CPU_ARCH), aarch64) >> + AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, >> $(AD_SRC_ROOTS), \ >> + $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_CPU_ARCH)_neon.ad \ >> + ))) >> + endif >> + >> ifeq ($(call check-jvm-feature, shenandoahgc), true) >> AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, >> $(AD_SRC_ROOTS), \ >> >> $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/shenandoah/shenandoah_$(HOTSPOT_TARGET_CPU).ad >> \ >> >> Best regards, >> Vladimir Ivanov >> >>> On 7/8/20 3:05 PM, Yang Zhang wrote: >>>> Hi Andrew >>>> >>>> I have updated this patch. Could you please help to review it again? >>>> In this patch, the following changes are made: >>>> 1. Separate newly added NEON instructions to a new ad file >>>> aarch64_neon.ad >>>> 2. Add assembler tests for NEON instructions. Trailing spaces >>>> in the python script are also removed.
>>>> >>>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/ >>>> >>>> >>>> Thanks, >>>> Yang >>>> >>>> >>>> -----Original Message----- >>>> From: Andrew Haley >>>> Sent: Tuesday, June 30, 2020 12:10 AM >>>> To: Yang Zhang ; Viswanathan, Sandhya >>>> ; Paul Sandoz >>>> Cc: nd ; hotspot-compiler-dev at openjdk.java.net; >>>> hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; >>>> aarch64-port-dev at openjdk.java.net >>>> Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of >>>> Vector API (Incubator): AArch64 backend changes >>>> >>>> On 29/06/2020 08:48, Yang Zhang wrote: >>>>> 1. Instructions that can be matched with NEON instructions directly. >>>>> MulVB, SqrtVF and AbsV have been merged into jdk master already. >>>>> >>>>> 2. Instructions that jdk master has middle end support for, but >>>>> they cannot be matched with NEON instructions directly. >>>>> Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These >>>>> new instructions can be moved into jdk master first, but for >>>>> auto-vectorization, the performance might not get improved. >>>>> >>>>> 3. Panama/Vector API specific instructions such as >>>>> Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, >>>>> MaxV/MinV, VectorBlend etc. >>>>> These instructions cannot be moved into jdk master first because >>>>> there isn't middle-end support. >>>>> >>>>> I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also >>>>> update aarch64_asmtest.py and macroassemler.cpp. When the patch is >>>>> ready, I will send it again. >>>> >>>> Thank you *very* much for your hard work. Appreciated! >>>> >>>> -- >>>> Andrew Haley (he/him) >>>> Java Platform Lead Engineer >>>> Red Hat UK Ltd.
>>>> https://keybase.io/andrewhaley >>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>> >>> > From xxinliu at amazon.com Thu Jul 23 16:02:42 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Thu, 23 Jul 2020 16:02:42 +0000 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com>, Message-ID: <1595520162373.22868@amazon.com> Hi Tobias, That is my intention too, but CompilerOracle doesn't exit the JVM when it encounters parsing errors. It just extracts as much information from CompileCommand as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. I do put the error message in the parser's errorbuf. I set a flag "exit_on_error" to quit the JVM after it dumps parser errors. Yes, I treat undefined intrinsics as fatal errors. This behavior is from Nils's comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. thanks, --lx ________________________________________ From: Tobias Hartmann Sent: Thursday, July 23, 2020 2:13 AM To: Liu, Xin; Nils Eliasson; hotspot-compiler-dev at openjdk.java.net; hotspot-runtime-dev Subject: RE: [EXTERNAL] RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic On 22.07.20 09:12, Liu, Xin wrote: > 1. I moved the validation logic for compiler directives to compilerOracle::scan_flag_and_value. > If something goes wrong in the parser, the patch will "gracefully" quit the JVM using jvm_exit(1). Is that okay?
With "piggy-back on the error mechanism" I meant that you should use the existing bailout mechanism in the parser. In this case, couldn't you simply put the error message in 'errorbuf' and let the caller take care of handling it? Best regards, Tobias From vladimir.x.ivanov at oracle.com Thu Jul 23 21:50:54 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 24 Jul 2020 00:50:54 +0300 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> Message-ID: Hi Jatin, > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ Much better! Thanks. > Change Summary: > > 1) Unified the handling for scalar rotate operation. All scalar rotate selection patterns are now dependent on newly created RotateLeft/RotateRight nodes. This promotes rotate inferencing. Currently if DAG nodes corresponding to a sub-pattern are shared (have multiple users) then existing complex patterns based on Or/LShiftL/URShift do not get matched and this prevents inferring rotate nodes. Please refer to JIT'ed assembly output with baseline[1] and with patch[2]. We can see that generated code size also went down from 832 bytes to 768 bytes. Also this can cause perf degradation if shift-or dependency chain appears inside a hot region. > > 2) Due to enhanced rotate inferencing new patch shows better performance even for legacy targets (non AVX-512). Please refer to the perf result[3] over AVX2 machine for JMH benchmark part of the patch. Very nice! > 3) As suggested, removed Java API intrinsification changes and scalar rotate transformations are done during OrI/OrL node idealizations. Good. (Still would be nice to factor the matching code from Ideal() and share it between multiple use sites. Especially considering OrVNode::Ideal() now does basically the same thing. As an example/idea, take a look at is_bmi_pattern() in x86.ad.)
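For readers following the rotate discussion, here is a minimal Java sketch of the source-level shift-or idiom that the summary above says is turned into a rotate during OrI/OrL idealization. The class and method names are made up for illustration; the only library calls used are Integer.rotateLeft and Long.rotateLeft. Whether a given JIT build actually emits a rotate instruction for this shape depends on the patch under review and is not shown here; the sketch only checks that the idiom and the library rotate agree on values.

```java
// Illustrative only: RotateIdiom and rotlShiftOr are hypothetical names.
final class RotateIdiom {
    // 32-bit left rotate written as shift-or. Java masks int shift counts
    // to the low 5 bits, so s == 0 is handled correctly (x | x == x).
    static int rotlShiftOr(int x, int s) {
        return (x << s) | (x >>> (32 - s));
    }

    // 64-bit variant; long shift counts are masked to the low 6 bits.
    static long rotlShiftOr(long x, int s) {
        return (x << s) | (x >>> (64 - s));
    }

    public static void main(String[] args) {
        int x = 0x80000001;
        // Both forms should print the same value for every shift count.
        for (int s = 0; s < 32; s++) {
            if (rotlShiftOr(x, s) != Integer.rotateLeft(x, s)) {
                throw new AssertionError("mismatch at s=" + s);
            }
        }
        System.out.println(Integer.toHexString(rotlShiftOr(x, 1)));
    }
}
```

The point of the comparison is that both spellings are value-equivalent, which is what makes it legitimate for the compiler to canonicalize one into the other.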
> 4) SLP always gets to work on new scalar Rotate nodes and creates vector rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes if target does not supports vector rotates(non-AVX512). Good. > 5) Added new instruction patterns for vector shift Left/Right operations with constant shift operands. This prevents emitting extra moves to XMM. +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ + match(Set dst (LShiftVI src shift)); I'd prefer to see a uniform Ideal IR shape being used irrespective of whether the argument is a constant or not. It should also simplify the logic in SuperWord and make it easier to support on non-x86 architectures. For example, here's how it is done on AArch64: instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ predicate(n->as_Vector()->length() == 4); match(Set dst (LShiftVI src (LShiftCntV shift))); ... > 6) Constant folding scenarios are covered in RotateLeft/RotateRight idealization, inferencing of vector rotate through OrV idealization covers the vector patterns generated though non SLP route i.e. VectorAPI. I'm fine with keeping OrV::Ideal(), but I'm concerned with the general direction here - duplication of scalar transformations to lane-wise vector operations. It definitely won't scale and in a longer run it risks to diverge. Would be nice to find a way to automatically "lift" scalar transformations to vectors and apply them uniformly. But right now it is just an idea which requires more experimentation. Some other minor comments/suggestions: + // Swap the computed left and right shift counts. + if (is_rotate_left) { + Node* temp = shiftRCnt; + shiftRCnt = shiftLCnt; + shiftLCnt = temp; + } Maybe use swap() here (declared in globalDefinitions.hpp)? + if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) + return true; Please, don't omit curly braces (even for simple cases). 
-// Rotate Right by variable -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 zero, rFlagsReg cr) +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) %{ - match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero shift)))); - + predicate(!VM_Version::supports_bmi2() && n->bottom_type()->basic_type() == T_INT); + match(Set dst (RotateRight dst shift)); + format %{ "rorl $dst, $shift" %} expand %{ - rorI_rReg_CL(dst, shift, cr); + rorI_rReg_imm8(dst, shift, cr); %} It would be really nice to migrate to MacroAssembler along the way (as a cleanup). > Please push the patch through your testing framework and let me know your review feedback. There's one new assertion failure: # Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), pid=5476, tid=6219 # assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize should return new nodes, use Identity to return old nodes I believe it comes from RotateLeftNode::Ideal/RotateRightNode::Ideal which can return pre-constructed constants. I suggest getting rid of the Ideal() methods and moving the constant folding logic into Node::Value() (as implemented for other bitwise/arithmetic nodes in addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more generic approach since it enables richer type information (ranges vs constants) and IMO it's more convenient to work with constants through Types than ConNodes. (I suspect that original/expanded IR shape may already provide more precise type info for non-constant case which can affect the benchmarks.)
Best regards, Vladimir Ivanov > > Best Regards, > Jatin > > [1] http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm.txt > [2] http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx2_asm.txt > [3] http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new_patch.txt > > >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Saturday, July 18, 2020 12:25 AM >> To: Bhateja, Jatin ; Andrew Haley >> Cc: Viswanathan, Sandhya ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 >> >> Hi Jatin, >> >>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >> >> It definitely looks better, but IMO it hasn't reached the sweet spot yet. >> It feels like the focus is on auto-vectorizer while the burden is put on >> scalar cases. >> >> First of all, considering GVN folds relevant operation patterns into a >> single Rotate node now, what's the motivation to introduce intrinsics? >> >> Another point is there's still significant duplication for scalar cases. >> >> I'd prefer to see the legacy cases which rely on pattern matching to go >> away and be substituted with instructions which match Rotate instructions >> (migrating ). >> >> I understand that it will penalize the vectorization implementation, but >> IMO reducing overall complexity is worth it. On auto-vectorizer side, I see >> 2 ways to fix it: >> >> (1) introduce additional AD instructions for RotateLeftV/RotateRightV >> specifically for pre-AVX512 hardware; >> >> (2) in SuperWord::output(), when matcher doesn't support >> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), >> generate vectorized version of the original pattern. >> >> Overall, it looks like more and more focus is made on scalar part. >> Considering the main goal of the patch is to enable vectorization, I'm fine >> with separating cleanup of scalar part. 
As an interim solution, it seems >> that leaving the scalar part as it is now and matching scalar bit rotate >> pattern in VectorNode::is_rotate() should be enough to keep the >> vectorization part functioning. Then scalar Rotate nodes and relevant >> cleanups can be integrated later. (Or vice versa: clean up scalar part >> first and then follow up with vectorization.) >> >> Some other comments: >> >> * There's a lot of duplication between OrINode::Ideal and OrLNode::Ideal. >> What do you think about introducing a super type >> (OrNode) and put a unified version (OrNode::Ideal) there? >> >> >> * src/hotspot/cpu/x86/x86.ad >> >> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >> + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT || >> + n->bottom_type()->is_vect()->element_basic_type() == >> +T_LONG); >> >> +instruct vprorate(vec dst, vec src, vec shift) %{ >> + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT || >> + n->bottom_type()->is_vect()->element_basic_type() == >> +T_LONG); >> >> The predicates are redundant here. >> >> >> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >> >> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, >> XMMRegister dst, XMMRegister src, >> + int shift, int vector_len) { if >> + (opcode == Op_RotateLeftV) { >> + if (etype == T_INT) { >> + evprold(dst, src, shift, vector_len); >> + } else { >> + evprolq(dst, src, shift, vector_len); >> + } >> >> Please, put an assert for the false case (assert(etype == T_LONG, "...")). >> >> >> * On testing (with previous version of the patch): -XX:UseAVX is x86- >> specific flag, so new/adjusted tests now fail on non-x86 platforms. >> Either omitting the flag or adding -XX:+IgnoreUnrecognizedVMOptions will >> solve the issue. >> >> Best regards, >> Vladimir Ivanov >> >>> >>> >>> Summary of changes: >>> 1) Optimization is specifically targeted to exploit vector rotation >> instruction added for X86 AVX512. 
A single rotate instruction encapsulates >> entire vector OR/SHIFTs pattern thus offers better latency at reduced >> instruction count. >>> >>> 2) There were two approaches to implement this: >>> a) Let everything remain the same and add new wide complex >> instruction patterns in the matcher for e.g. >>> set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI shift)) >> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate shift)) >>> It would have been an overoptimistic assumption to expect that graph >> shape would be preserved till the matcher for correct inferencing. >>> In addition we would have required multiple such bulky patterns. >>> b) Create new RotateLeft/RotateRight scalar nodes, these gets >> generated during intrinsification as well as during additional pattern >>> matching during node Idealization, later on these nodes are consumed >> by SLP for valid vectorization scenarios to emit their vector >>> counterparts which eventually emits vector rotates. >>> >>> 3) I choose approach 2b) since its cleaner, only problem here was that >>> in non-evex mode (UseAVX < 3) new scalar Rotate nodes should either be >> dismantled back to OR/SHIFT pattern or we penalize the vectorization which >> would be very costly, other option would have been to add additional vector >> rotate pattern for UseAVX=3 in the matcher which emit vector OR-SHIFTs >> instruction but then it will loose on emitting efficient instruction >> sequence which node sharing (OrV/LShiftV/URShift) offer in current >> implementation - thus it will not be beneficial for non-AVX512 targets, >> only saving will be in terms of cleanup of few existing scalar rotate >> matcher patterns, also old targets does not offer this powerful rotate >> instruction. Therefore new scalar nodes are created only for AVX512 >> targets. >>> >>> As per suggestions constant folding scenarios have been covered during >> Idealizations of newly added scalar nodes. 
>>> >>> Please review the latest version and share your feedback and test >> results. >>> >>> Best Regards, >>> Jatin >>> >>> >>>> -----Original Message----- >>>> From: Andrew Haley >>>> Sent: Saturday, July 11, 2020 2:24 PM >>>> To: Vladimir Ivanov ; Bhateja, Jatin >>>> ; hotspot-compiler-dev at openjdk.java.net >>>> Cc: Viswanathan, Sandhya >>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification for >>>> X86 >>>> >>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>> >>>> > High-level comment: so far, there were no pressing need in > >>>> explicitly marking the methods as intrinsics. ROR/ROL instructions > >>>> were selected during matching [1]. Now the patch introduces > >>>> dedicated nodes >>>> (RotateLeft/RotateRight) specifically for intrinsics > which partly >>>> duplicates existing logic. >>>> >>>> The lack of rotate nodes in the IR has always meant that AArch64 >>>> doesn't generate optimal code for e.g. >>>> >>>> (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>> >>>> because, with the RotateLeft expanded to its full combination of ORs >>>> and shifts, it's to complicated to match. At the time I put this to >>>> one side because it wasn't urgent. This is a shame because although >>>> such combinations are unusual they are used in some crypto operations. >>>> >>>> If we can generate immediate-form rotate nodes early by pattern >>>> matching during parsing (rather than depending on intrinsics) we'll >>>> get more value than by depending on programmers calling intrinsics. >>>> >>>> -- >>>> Andrew Haley (he/him) >>>> Java Platform Lead Engineer >>>> Red Hat UK Ltd. 
>>>> https://keybase.io/andrewhaley >>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>> From nick.gasson at arm.com Fri Jul 24 06:08:52 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 24 Jul 2020 14:08:52 +0800 Subject: RFR(XS): 8249781: AArch64: AOT compiled code crashes if C2 allocates r27 Message-ID: <85eep11m2z.fsf@nicgas01-pc.shanghai.arm.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8249781 Webrev: http://cr.openjdk.java.net/~ngasson/8249781/webrev.0/ AOT compiled code always assumes r27 is the heap base pointer, but since JDK-8242449 C2 can allocate it as a general register if the compressed class base is null. If C2-compiled code that uses r27 runs before AOT code, rheapbase will be clobbered, causing a crash in the AOT code. To reproduce: make test TEST="compiler/aot/cli/jaotc/CompileModuleTest.java" \ JTREG="VM_OPTIONS=-Xcomp -XX:-TieredCompilation" Fix by checking if AOT is enabled before using r27 as a general register. Tested with jtreg hotspot_all_no_apps and jdk_core. -- Thanks, Nick From rwestrel at redhat.com Fri Jul 24 07:20:38 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 24 Jul 2020 09:20:38 +0200 Subject: [8u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 Message-ID: <87zh7pmla1.fsf@redhat.com> Original bug: https://bugs.openjdk.java.net/browse/JDK-8240676 https://hg.openjdk.java.net/jdk/jdk/rev/6ccf082f50d4 The context in compile.hpp changed so the original patch requires a small adjustment. Testing triggered a crash, so I had to cherry-pick the change in type.cpp line 3996 from an RFE that was integrated in a later version of the jdk: 8031755 (Type speculation should be used to optimize explicit null checks). 8u webrev: http://cr.openjdk.java.net/~roland/8240676.8u/webrev.00/ Testing: x86_64, verified new test fails with the fix commented out, works otherwise, hotspot/compiler jtreg, some CTW, ran octane with nashorn. Roland.
From aph at redhat.com Fri Jul 24 07:49:37 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 24 Jul 2020 08:49:37 +0100 Subject: [aarch64-port-dev ] RFR(XS): 8249781: AArch64: AOT compiled code crashes if C2 allocates r27 In-Reply-To: <85eep11m2z.fsf@nicgas01-pc.shanghai.arm.com> References: <85eep11m2z.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <6780400a-dd63-fef2-fb38-b92d2e9d8292@redhat.com> Hi, On 24/07/2020 07:08, Nick Gasson wrote: > Bug: https://bugs.openjdk.java.net/browse/JDK-8249781 > Webrev: http://cr.openjdk.java.net/~ngasson/8249781/webrev.0/ > > AOT compiled code always assumes r27 is the heap base pointer, but since > JDK-8242449 C2 can allocate it as a general register if the compressed > class base is null. If C2 complied code that uses r27 runs before AOT > code, rheapbase will be clobbered causing a crash in the AOT code. > > To reproduce: > > make test TEST="compiler/aot/cli/jaotc/CompileModuleTest.java" \ > JTREG="VM_OPTIONS=-Xcomp -XX:-TieredCompilation" > > Fix by checking if AOT is enabled before using r27 as a general > register. > > Tested with jtreg hotspot_all_no_apps and jdk_core. OK, thanks. Are there any backports needed? -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From nick.gasson at arm.com Fri Jul 24 07:59:55 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Fri, 24 Jul 2020 15:59:55 +0800 Subject: [aarch64-port-dev ] RFR(XS): 8249781: AArch64: AOT compiled code crashes if C2 allocates r27 In-Reply-To: <6780400a-dd63-fef2-fb38-b92d2e9d8292@redhat.com> References: <85eep11m2z.fsf@nicgas01-pc.shanghai.arm.com> <6780400a-dd63-fef2-fb38-b92d2e9d8292@redhat.com> Message-ID: <85blk51gxw.fsf@nicgas01-pc.shanghai.arm.com> On 07/24/20 15:49 pm, Andrew Haley wrote: > > OK, thanks. Are there any backports needed? It only affects JDK 15 and tip. 
I don't know if it's appropriate for jdk15 in the current RDP2 phase as AOT is an experimental feature? (The JBS entry is P3.) -- Thanks, Nick From aph at redhat.com Fri Jul 24 08:15:23 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 24 Jul 2020 09:15:23 +0100 Subject: RFR 8249189: AARCH64: more L2I conversions can be skipped (ubfiz) In-Reply-To: <209c5713-4218-4e9c-037d-fe337734697f@bell-sw.com> References: <209c5713-4218-4e9c-037d-fe337734697f@bell-sw.com> Message-ID: <8dc598ba-f17e-9d0c-db02-1a329dc010c4@redhat.com> On 22/07/2020 14:36, Boris Ulasevich wrote: > Please review the update for aarch64 AD template file to generate more > bitfield extraction rules where I2L and L2I conversions can be skipped. > > http://cr.openjdk.java.net/~bulasevich/8249189/webrev.02 > http://bugs.openjdk.java.net/browse/JDK-8249189 > > Tested with JTREG and manual [1] tests. 4056 operand immL_positive_bitmaskI() 4057 %{ 4058 predicate((n->get_long() != 0) 4059 && ((n->get_long() & 0xffffffff80000000L) == 0) 4060 && is_power_of_2(n->get_long() + 1)); 4061 match(ConL); 4062 4063 op_cost(0); 4064 format %{ %} 4065 interface(CONST_INTER); 4066 %} Isn't this a difficult-to-understand way of saying 4058 predicate((n->get_long() != 0) 4059 && ((julong)n->get_long() < 0x80000000LL) 4060 && is_power_of_2(n->get_long() + 1)); Note the "LL" here: we have to work with LLP64 systems. Otherwise OK. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Jul 24 08:18:40 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 24 Jul 2020 09:18:40 +0100 Subject: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values In-Reply-To: <05369383-c6d8-5e61-50ce-51fec955e2d4@bell-sw.com> References: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com> <05369383-c6d8-5e61-50ce-51fec955e2d4@bell-sw.com> Message-ID: On 23/07/2020 12:25, Boris Ulasevich wrote: > Since the JDK-8248414 patch has been committed, I believe we can revive > this review. I think it is still better to move my rule to the ubfiz > command group, > which is in the auto-generated area. > > http://cr.openjdk.java.net/~bulasevich/8248870/webrev.02 OK, thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Fri Jul 24 08:19:10 2020 From: aph at redhat.com (Andrew Haley) Date: Fri, 24 Jul 2020 09:19:10 +0100 Subject: [aarch64-port-dev ] RFR(XS): 8249781: AArch64: AOT compiled code crashes if C2 allocates r27 In-Reply-To: <85blk51gxw.fsf@nicgas01-pc.shanghai.arm.com> References: <85eep11m2z.fsf@nicgas01-pc.shanghai.arm.com> <6780400a-dd63-fef2-fb38-b92d2e9d8292@redhat.com> <85blk51gxw.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: On 24/07/2020 08:59, Nick Gasson wrote: > > On 07/24/20 15:49 pm, Andrew Haley wrote: >> >> OK, thanks. Are there any backports needed? > > It only affects JDK 15 and tip. I don't know if it's appropriate for > jdk15 in the current RDP2 phase as AOT is an experimental feature? (The > JBS entry is P3.) Probably no need, thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From adinn at redhat.com Fri Jul 24 08:57:53 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 24 Jul 2020 09:57:53 +0100 Subject: [8u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <87zh7pmla1.fsf@redhat.com> References: <87zh7pmla1.fsf@redhat.com> Message-ID: <9ae30a2b-3443-9954-950e-08e7e26ddd97@redhat.com> Hi Roland, On 24/07/2020 08:20, Roland Westrelin wrote: > > Original bug: > https://bugs.openjdk.java.net/browse/JDK-8240676 > https://hg.openjdk.java.net/jdk/jdk/rev/6ccf082f50d4 > > The context in compile.hpp changed so the original patch requires a > small adjustment. Testing triggered a crash, so I had to cherry-pick the > change in type.cpp line 3996 from an RFE that was integrated in a later > version of the jdk: 8031755 (Type speculation should be used to optimize > explicit null checks). > > 8u webrev: > http://cr.openjdk.java.net/~roland/8240676.8u/webrev.00/ > > Testing: x86_64, verified new test fails with the fix commented out, > works otherwise, hotspot/compiler jtreg, some CTW, ran octane with > nashorn. The changes to Type::meet_helper and Type::check_symmetrical look fine. However, I don't understand what the cherry-picked change to line 3996 in TypeAryPtr::xmeet_helper does and why it is legitimate: - return make(NotNull, NULL, tary, lazy_klass, false, off, InstanceBot); + return make(NotNull, NULL, tary, lazy_klass, false, off, InstanceBot, speculative, depth); Obviously it fixes a crash but -- for the record -- can you explain 1) why the crash happened and how this fixes it 2) why this was not needed in the upstream patch and is needed here regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From rwestrel at redhat.com Fri Jul 24 09:26:36 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 24 Jul 2020 11:26:36 +0200 Subject: [8u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <9ae30a2b-3443-9954-950e-08e7e26ddd97@redhat.com> References: <87zh7pmla1.fsf@redhat.com> <9ae30a2b-3443-9954-950e-08e7e26ddd97@redhat.com> Message-ID: <87wo2tmfg3.fsf@redhat.com> Hi Andrew, Thanks for looking at this one. > The changes to Type::meet_helper and Type::check_symmetrical look fine. > > However, I don't understand what the cherry-picked change to line 3996 > in TypeAryPtr::xmeet_helper does and why it is legitimate: > > - return make(NotNull, NULL, tary, lazy_klass, false, off, > InstanceBot); > + return make(NotNull, NULL, tary, lazy_klass, false, off, > InstanceBot, speculative, depth); > > Obviously it fixes a crash but -- for the record -- can you explain > > 1) why the crash happened and how this fixes it The background for this patch is the following: we saw a rare crash during testing. The crash couldn't be reproduced. My attempts at a test case didn't succeed either. So instead, I made a change to the verification code in the type system so it stress tested some combinations of types that were usually rarely exercised. It was then easy to write a test case that triggered the failure and implement a fix. The risk with this change is not so much in the fix itself but in the improvement to the verification code that can uncover bugs that we were not aware of before. That's what happens with 8u where we hit a bug that was never seen with 8u before. Object pointer types have 2 parts: a known type part and a speculative part. When the verification code triggers, it verifies both parts. In the case of this fix, the speculative part gets accidentally dropped. The new verification code catches it. The previous one didn't for some reason.
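To make the "known part plus speculative part" point concrete, here is a small toy model in Java. It is not HotSpot's type lattice: ToyType, its bit-set encoding, and the meetBuggy/meetFixed methods are all assumptions invented for illustration. The sketch only shows why a meet that drops the speculative part in one branch stops being symmetric, which is the kind of inconsistency a symmetry check can catch.

```java
import java.util.Objects;

// Toy stand-in for a pointer type with a known part and a speculative part.
// All names and the bit-set encoding are hypothetical.
final class ToyType {
    final int known;      // bit set of possible classes (meet = union)
    final Integer spec;   // speculative bit set, or null if there is none
    final boolean array;  // selects the branch that contains the bug

    ToyType(int known, Integer spec, boolean array) {
        this.known = known;
        this.spec = spec;
        this.array = array;
    }

    private static Integer meetSpec(Integer a, Integer b) {
        return (a == null || b == null) ? null : Integer.valueOf(a | b);
    }

    // Buggy meet: the array branch forgets to pass the speculative part along.
    ToyType meetBuggy(ToyType o) {
        boolean arr = array && o.array;
        if (array) {
            return new ToyType(known | o.known, null, arr);
        }
        return new ToyType(known | o.known, meetSpec(spec, o.spec), arr);
    }

    // Fixed meet: the speculative part is carried through on every path.
    ToyType meetFixed(ToyType o) {
        return new ToyType(known | o.known, meetSpec(spec, o.spec), array && o.array);
    }

    @Override
    public boolean equals(Object x) {
        if (!(x instanceof ToyType)) return false;
        ToyType t = (ToyType) x;
        return known == t.known && array == t.array && Objects.equals(spec, t.spec);
    }

    @Override
    public int hashCode() {
        return Objects.hash(known, spec, array);
    }
}
```

In this model, a symmetry check is simply a.meet(b).equals(b.meet(a)); with the buggy variant the answer depends on which operand takes the array branch.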
> 2) why this was not needed in the upstream patch and is needed here I cherry-picked the change from a later release (jdk 9 I think). So the change was not needed in the 11u patch because it was already there. Roland From tobias.hartmann at oracle.com Fri Jul 24 09:52:42 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 24 Jul 2020 11:52:42 +0200 Subject: RFR(S): 8247732: validate user-input intrinsic_ids in ControlIntrinsic In-Reply-To: <1595520162373.22868@amazon.com> References: <821e3d29-c95b-aafc-8ee5-6e49a1bdde82@amazon.com> <9b324805-eb86-27e1-5dcb-96a823f8495b@amazon.com> <82cba5e4-2020-ce0a-4576-e8e0cc2e5ae5@oracle.com> <1595401959932.33284@amazon.com> <1595520162373.22868@amazon.com> Message-ID: <916b3a4a-5617-941d-6161-840f3ea900bd@oracle.com> Hi Liu, On 23.07.20 18:02, Liu, Xin wrote: > That is my intention too, but CompilerOracle doesn't exit JVM when it encounters parsing errors. > It just exacts information from CompileCommand as many as possible. That makes sense because compiler "directives" are supposed to be optional for program execution. > > I do put the error message in parser's errorbuf. I set a flag "exit_on_error" to quit JVM after it dumps parser errors. yes, I treat undefined intrinsics as fatal errors. > This behavior is from Nils comment: "I want to see an error on startup if the user has specified unknown intrinsic names." It is also consistent with JVM option -XX:ControlIntrinsic=. Okay, thanks for the explanation! I would prefer consistency in error handling of compiler directives, i.e., handle all parser failures the same way. But I leave it to Nils to decide. 
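The two error-handling styles discussed above (exit immediately versus record the problem and let the caller decide) can be sketched as follows. This is a hedged Java illustration, not the HotSpot parser: IntrinsicListCheck, the KNOWN set, and the "+name,-name" list syntax are assumptions made for the example, and the real list of intrinsic names lives inside the VM.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical validator: collects errors instead of exiting, so every
// caller can apply the same policy to all parse failures.
final class IntrinsicListCheck {
    // Assumed sample of known names, for illustration only.
    static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("_dabs", "_fabs", "_getClass"));

    // Returns one message per bad entry; an empty list means the option is valid.
    static List<String> validate(String option) {
        List<String> errors = new ArrayList<>();
        for (String token : option.split(",")) {
            String t = token.trim();
            if (t.isEmpty()) {
                continue; // tolerate stray commas
            }
            char sign = t.charAt(0);
            if (sign != '+' && sign != '-') {
                errors.add("missing +/- prefix: " + t);
                continue;
            }
            String name = t.substring(1);
            if (!KNOWN.contains(name)) {
                errors.add("unrecognized intrinsic: " + name);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        // The caller, not the parser, decides whether errors are fatal.
        List<String> errors = validate("+_dabs,-_nosuch");
        errors.forEach(System.err::println);
    }
}
```

Keeping the exit decision out of the parsing routine is what allows a command-line flag and a directives file to share the same validation while choosing different failure policies.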
Best regards, Tobias From boris.ulasevich at bell-sw.com Fri Jul 24 10:48:16 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 24 Jul 2020 13:48:16 +0300 Subject: RFR 8249893: AARCH64: optimize the construction of the value from the bits of the other two Message-ID: Hi, Please review the change to C2 and AArch64 which reduces constructs like "(v1 & 0xFF) | ((v2 & 0xFF) << 8)" into two Bitfield Insert instructions. http://bugs.openjdk.java.net/browse/JDK-8249893 http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00 The change in common code was made to enable Node::is_AndL method. The method in the rule predicate is required to find out if we are within the straight or reversed rule (ADLC adds rule with swapped parameters for commutative operands). Tested with JTREG and generated [1] tests. thanks, Boris [1] http://cr.openjdk.java.net/~bulasevich/8249893/webrev.00/Gen.java From boris.ulasevich at bell-sw.com Fri Jul 24 10:35:41 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 24 Jul 2020 13:35:41 +0300 Subject: RFR 8249189: AARCH64: more L2I conversions can be skipped (ubfiz) In-Reply-To: <8dc598ba-f17e-9d0c-db02-1a329dc010c4@redhat.com> References: <209c5713-4218-4e9c-037d-fe337734697f@bell-sw.com> <8dc598ba-f17e-9d0c-db02-1a329dc010c4@redhat.com> Message-ID: <34b9e725-de15-4cb4-c211-9c870c871c52@bell-sw.com> Hi Andrew, Thank you! Fixed inline: (julong)n->get_long() < 0x80000000ULL Boris On 24.07.2020 11:15, Andrew Haley wrote: > On 22/07/2020 14:36, Boris Ulasevich wrote: >> Please review the update for aarch64 AD template file to generate more >> bitfield extraction rules where I2L and L2I conversions can be skipped. >> >> http://cr.openjdk.java.net/~bulasevich/8249189/webrev.02 >> http://bugs.openjdk.java.net/browse/JDK-8249189 >> >> Tested with JTREG and manual [1] tests.
> 4056 operand immL_positive_bitmaskI() > 4057 %{ > 4058 predicate((n->get_long() != 0) > 4059 && ((n->get_long() & 0xffffffff80000000L) == 0) > 4060 && is_power_of_2(n->get_long() + 1)); > 4061 match(ConL); > 4062 > 4063 op_cost(0); > 4064 format %{ %} > 4065 interface(CONST_INTER); > 4066 %} > > Isn't this a difficult-to-understand way of saying > > 4058 predicate((n->get_long() != 0) > 4059 && ((julong)n->get_long() < 0x80000000LL) > 4060 && is_power_of_2(n->get_long() + 1)); > > Note the "LL" here: we have to work with LLP64 systems. > > Otherwise OK. > From christian.hagedorn at oracle.com Fri Jul 24 11:44:41 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 24 Jul 2020 13:44:41 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: <518cd022-73e1-cb5c-499d-86853ae679c3@oracle.com> References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com> <518cd022-73e1-cb5c-499d-86853ae679c3@oracle.com> Message-ID: <2f5978fe-af76-df18-15c0-dcc62563299d@oracle.com> Hi Tobias Thank you for your review! > Please make sure to run performance testing. There is a repeated regression in the micros open crypto benchmark openjdk.bench.javax.crypto.small.SecureRandomBench.nextBytes with these two settings: - algorithm=SHA1PRNG-dataSize:64-provider:-shared:false - algorithm=SHA1PRNG-dataSize:64-provider:-shared:true Repeated runs with these two settings resulted in a regression between 1 and 2%. I could trace it back to the additional type filtering in PhiNode::Value() (webrev.02). This is only required for the assertion code and not for the bailout fix itself. When running performance testing with webrev.01, the regressions disappear. 
I therefore suggest to go with webrev.01 (without assertion code and type filtering) and file a new RFE to investigate the usage of type filtering in PhiNode::Value() for iv phis and why we get a performance regression in these two benchmark settings. In theory, I think it should be beneficial to narrow the type range of iv phis. > cfgnode.cpp:1083 > - There's an extra whitespace before "," > > loopopts.cpp:84/86 > - No need for extra brackets These are not present anymore in webrev.01. http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ Best regards, Christian On 20.07.20 11:14, Tobias Hartmann wrote: > Hi Christian, > > On 15.07.20 15:08, Christian Hagedorn wrote: >> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/ > > Looks good to me. > > Some code style comments: > Best regards, > Tobias > From adinn at redhat.com Fri Jul 24 12:32:21 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 24 Jul 2020 13:32:21 +0100 Subject: [8u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <87wo2tmfg3.fsf@redhat.com> References: <87zh7pmla1.fsf@redhat.com> <9ae30a2b-3443-9954-950e-08e7e26ddd97@redhat.com> <87wo2tmfg3.fsf@redhat.com> Message-ID: On 24/07/2020 10:26, Roland Westrelin wrote: >> The changes to Type::meet_helper and Type::check_symmetrical look fine. >> >> However, I don't understand what the cherry-picked change to line 3996 >> in TypeAryPtr::xmeet_helper does and why it is legitimate: >> >> - return make(NotNull, NULL, tary, lazy_klass, false, off, >> InstanceBot); >> + return make(NotNull, NULL, tary, lazy_klass, false, off, >> InstanceBot, speculative, depth); >> >> Obviously it fixes a crash but -- for the record -- can you explain >> >> 1) why the crash happened and how this fixes it > > The background for this patch is the following: we saw a rare crash > during testing. The crash couldn't be reproduced. My attempts at a test > case didn't succeed either. 
So instead, I made a change to the > verification code in the type system so it stress tested some > combinations of types that were usually rarely exercised. It was then > easy to write a test case that triggered the failure and implement a > fix. Ok, understood. > The risk with this change is not so much in the fix itself but in the > improvement to the verification code that can uncover bugs that we were > not aware of before. That's what happens with 8u where we hit a bug that > was never seen with 8u before. Ok, but all the verification code happens under #ifdef ASSERT so that is only going to change behaviour in non-production builds right? i.e. the important change is the one to the meet code? > Object pointer types have 2 parts: a known type part and a speculative > part. When the verification code triggers it verify both parts. In the > case of this fix, the speculative parts gets accidentally dropped. The > new verification code catches it. The previous one didn't for some > reason. Ah ok, I get this now. The change ensures that the speculative type of the meet type is the meet of the respective speculative types. That may well change behaviour for some programs as meets are computed outside of the changed verification path. I'd like to assume the benefits of improving type accuracy override the risk. Do you think that is justified? (one might argue that improved type accuracy is not always better, especially for speculative info where avoiding the erasure might enable optimizations not previously attempted). >> 2) why this was not needed in the upstream patch and is needed here > > I cherry-picked the change from a later release (jdk 9 I think). So the > change was not needed in the 11u patch because it was already there. Doh! Of course. Thanks for the explanation. Well, the change looks good to me but I'm not really in a position to assess the risk of the xmeet change. 
I am reassured that it exists in the upstream code and is not known to have caused any errors. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From lutz.schmidt at sap.com Fri Jul 24 12:51:14 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 24 Jul 2020 12:51:14 +0000 Subject: RFR(XS): -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp Message-ID: Dear all, may I please request reviews for this small fix? I would even say it is a trivial fix. It inverts an if condition such that JVMCI specific code is called only when JVMCI compilation is enabled via UseJVMCICompiler. Bug: https://bugs.openjdk.java.net/browse/JDK-8250233 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/ Local testing looks good. jdk/submit tests pending. Thank you! Lutz From lutz.schmidt at sap.com Fri Jul 24 12:53:21 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Fri, 24 Jul 2020 12:53:21 +0000 Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp Message-ID: Resending after updating subject line with bug id. Sorry for the spam. Lutz On 24.07.20, 14:51, "Schmidt, Lutz" wrote: Dear all, may I please request reviews for this small fix? I would even say it is a trivial fix. It inverts an if condition such that JVMCI specific code is called only when JVMCI compilation is enabled via UseJVMCICompiler. Bug: https://bugs.openjdk.java.net/browse/JDK-8250233 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/ Local testing looks good. jdk/submit tests pending. Thank you!
Lutz From christian.hagedorn at oracle.com Fri Jul 24 12:57:57 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 24 Jul 2020 14:57:57 +0200 Subject: [16] RFR(XS): 8249602: C2: assert(cnt == _outcnt) failed: no insertions allowed Message-ID: <2cd118ab-c117-bf61-ae03-117b9383a5e6@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8249602 http://cr.openjdk.java.net/~chagedorn/8249602/webrev.00/ The testcase hits the assert when inserting a post loop. When correcting the fall-in values to the post-loop phis to take the values from the main-loop, we have to separately handle nodes that belong to the backedge control block and cannot float. In this process, we clone data nodes in PhaseIdealLoop::clone_up_backedge_goo and then hit the assert because some nodes to be cloned have a control input from the main-loop header node (main_head). These nodes are cloned and the main_head node gets these nodes as additional output nodes. This should be fine but the DUIterator_Fast forbids insertions. The fix simply switches to a normal DUIterator which allows insertions. This should also be done when correcting the fall-in values to the main-loop to take the values from the pre-loop. Best regards, Christian From rwestrel at redhat.com Fri Jul 24 12:59:44 2020 From: rwestrel at redhat.com (Roland Westrelin) Date: Fri, 24 Jul 2020 14:59:44 +0200 Subject: [8u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: References: <87zh7pmla1.fsf@redhat.com> <9ae30a2b-3443-9954-950e-08e7e26ddd97@redhat.com> <87wo2tmfg3.fsf@redhat.com> Message-ID: <87lfj9m5kv.fsf@redhat.com> > Ok, but all the verification code happens under #ifdef ASSERT so that is > only going to change behaviour in non-production builds right? > > i.e. the important change is the one to the meet code? Yes, to both. > That may well change behaviour for some programs as meets are computed > outside of the changed verification path. 
I'd like to assume the > benefits of improving type accuracy override the risk. Do you think that > is justified? (one might argue that improved type accuracy is not always > better, especially for speculative info where avoiding the erasure might > enable optimizations not previously attempted). I would say both benefit and risk are small. Without the speculative type change, we'll hit failures in the new verification code so some other tweak would have to be done to work around them. Not sure what could be done but that would likely be as risky. Or only the actual fix is backported and the new verification code is left out. Then the speculative type fix is not required. But a regression wouldn't be caught either. Roland. From coleen.phillimore at oracle.com Fri Jul 24 13:10:24 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 24 Jul 2020 09:10:24 -0400 Subject: RFR (T) 8250042: Clean up methodOop and method_oop names from the code In-Reply-To: References: <85efc3ab-abbf-c5f2-9b7b-47fa516d9a2d@oracle.com> <6f973a0a-cf55-e1ab-8de3-b57f68dbd2cf@oracle.com> Message-ID: I can also replace method_oop with method_ptr in the CPU ad files, and this seems to build but now someone who knows the compiler area needs to comment; this was supposed to be trivial... :) But it still is really trivial to look at. I left interpreter_method_oop_reg and compiler_method_oop_reg and friends in opto/matcher.cpp for someone else. incremental webrev at http://cr.openjdk.java.net/~coleenp/2020/8250042.02.incr/webrev full webrev at http://cr.openjdk.java.net/~coleenp/2020/8250042.02/webrev Thanks, Coleen On 7/24/20 8:23 AM, coleen.phillimore at oracle.com wrote: > > Thanks for looking at this. > > On 7/24/20 1:01 AM, David Holmes wrote: >> Hi Coleen, >> >> On 24/07/2020 2:58 am, coleen.phillimore at oracle.com wrote: >>> See bug for more details. I've been running into these names a lot >>> lately. Many of these names are in JVMTI.
>>> >>> Tested with tier1 on all Oracle platforms and built on non-Oracle >>> platforms. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2020/8250042.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8250042 >> >> src/hotspot/cpu/*/*.ad >> >> These still refer to "method oop" and method_oop in a number of places. > > Yes, I only replaced method_oop in the shared code and not in the AD > code. method_oop can be the name of a parameter and using "sed" to > change it to "method" doesn't work. Somebody who understands this > code and looks at it will have to make the rest of the changes. > > What I did was replace "method oop" with "method" and "methodOop" with > "method" in all the sources. I replaced "method_oop" with "method" or > "checked_method" in the shared sources. > >> >> src/hotspot/share/adlc/adlparse.cpp >> >> + frame->_interpreter_method_oop_reg = parse_one_arg("method reg >> entry"); >> >> I guess I'm not understanding the scope of this renaming - why is >> _interpreter_method_oop_reg not renamed as well? Should this (and >> other uses) be parsed as method-(oop-reg) rather than (method-oop)-reg? > > I don't know this code, so I'd rather not change any more of it. The > comment makes sense changed, even though the variable name still > refers to method_oop. > > Thanks, > Coleen >> >> Otherwise all okay. >> >> Thanks, >> David >> >>> Thanks, >>> Coleen > From boris.ulasevich at bell-sw.com Fri Jul 24 13:19:33 2020 From: boris.ulasevich at bell-sw.com (Boris Ulasevich) Date: Fri, 24 Jul 2020 16:19:33 +0300 Subject: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values In-Reply-To: References: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com> <05369383-c6d8-5e61-50ce-51fec955e2d4@bell-sw.com> Message-ID: Thank you for review, Andrew!
Boris On 24.07.2020 11:18, Andrew Haley wrote: > On 23/07/2020 12:25, Boris Ulasevich wrote: >> Since the JDK-8248414 patch has been committed, I believe we can revive >> this review. I think it is still better to move my rule to the ubfiz >> command group, >> which is in the auto-generated area. >> >> http://cr.openjdk.java.net/~bulasevich/8248870/webrev.02 > OK, thanks. > From doug.simon at oracle.com Fri Jul 24 13:21:14 2020 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 24 Jul 2020 15:21:14 +0200 Subject: RFR(XS): -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp In-Reply-To: References: Message-ID: <9800BE84-F2C2-456A-BDAD-66D3CA74DBF9@oracle.com> This is not quite right. UseJVMCICompiler means "use JVMCI compiler as top tier JIT compiler". It is still possible to use the JVMCI compiler via its Java API without using it as the top tier JIT compiler (i.e. "hosted" mode). I think what you are aiming for is to omit printing this info when the JVMCI compiler is not used at all. This patch should achieve that: diff -r f564ec7074f0 src/hotspot/share/jvmci/jvmciCompiler.cpp --- a/src/hotspot/share/jvmci/jvmciCompiler.cpp Thu Jul 23 11:47:20 2020 +0200 +++ b/src/hotspot/share/jvmci/jvmciCompiler.cpp Fri Jul 24 15:18:22 2020 +0200 @@ -146,6 +146,8 @@ // Print compilation timers and statistics void JVMCICompiler::print_compilation_timers() { - JVMCI_event_1("JVMCICompiler::print_timers"); - tty->print_cr(" JVMCI code install time: %6.3f s", _codeInstallTimer.seconds()); + if (_codeInstallTimer.seconds() != 0) { + JVMCI_event_1("JVMCICompiler::print_timers"); + tty->print_cr(" JVMCI code install time: %6.3f s", _codeInstallTimer.seconds()); + } } -Doug > On 24 Jul 2020, at 14:51, Schmidt, Lutz wrote: > > Dear all, > > may I please request reviews for this small fix? I would even say it is a trivial fix. It inverts an if condition such that JVMCI specific code is called only when JVMCI compilation is enabled via UseJVMCICompiler.
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8250233 > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/ > > Local testing looks good. jdk/submit tests pending. > > Thank you! > Lutz > > > From luhenry at microsoft.com Fri Jul 24 15:53:25 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Fri, 24 Jul 2020 15:53:25 +0000 Subject: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC Message-ID: Hi, Could I please have a review on the following. It simply adds the `DEPRECATED` macro to wrap `__attribute__ ((deprecated))` for GCC, and the equivalent for MSVC. JBS: https://bugs.openjdk.java.net/browse/JDK-8248672 Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248672/webrev.00 Thank you, -- Ludovic From vladimir.kozlov at oracle.com Fri Jul 24 16:02:23 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 24 Jul 2020 09:02:23 -0700 Subject: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: Message-ID: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> "And use it in the AArch64 sub system." Can you explain more? Do you have RFE filed for DEPRECATED use? It is small change which usually done together with usage. Why do this separately? Thanks, Vladimir K On 7/24/20 8:53 AM, Ludovic Henry wrote: > Hi, > > Could I please have a review on the following. It simply adds the `DEPRECATED` macro to wrap `__attribute__ ((deprecated))` for GCC, and the equivalent for MSVC. 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8248672 > Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248672/webrev.00 > > Thank you, > > -- > Ludovic > From luhenry at microsoft.com Fri Jul 24 16:02:35 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Fri, 24 Jul 2020 16:02:35 +0000 Subject: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor In-Reply-To: <0aed0646-c770-03e6-4e0b-5108919b7203@redhat.com> References: , <0aed0646-c770-03e6-4e0b-5108919b7203@redhat.com> Message-ID: Hi Andrew, Are you saying that you would like this change to land into `aarch64-port/jdk-windows` before getting into jdk/jdk? This change doesn't strike me as windows-aarch64 specific and is in line with general removal of GCC-specific code (similarly to the LP64 vs LLP64, or JDK-8248666). Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.01 Thank you for your review, -- Ludovic ________________________________________ From: Andrew Haley Sent: Thursday, July 16, 2020 01:44 To: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Cc: openjdk-aarch64 Subject: Re: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor On 15/07/2020 14:27, Ludovic Henry wrote: > A quick follow-up on that patch. Is there anything you would like to see done differently? It's fine, but (as discussed) it should go into https://nam06.safelinks.protection.outlook.com/?url=http%3A%2F%2Fhg.openjdk.java.net%2Faarch64-port%2Fjdk-windows%2F&data=02%7C01%7Cluhenry%40microsoft.com%7Cc5846b05f89e459465c008d829647194%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637304858666610938&sdata=XHQJVnYMgVPu6NTEq94rJRO2sgXGCVFCaCr8yFVa60I%3D&reserved=0 We'll need to do a regular pull from jdk/jdk into that tree. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fkeybase.io%2Fandrewhaley&data=02%7C01%7Cluhenry%40microsoft.com%7Cc5846b05f89e459465c008d829647194%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637304858666610938&sdata=Sa%2BaNNLEzkQqnHjobj3CfdW%2B6oX3ItJrBV3IHlgAvek%3D&reserved=0 EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From luhenry at microsoft.com Fri Jul 24 16:41:52 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Fri, 24 Jul 2020 16:41:52 +0000 Subject: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> References: , <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> Message-ID: I'm not sure I understand your question. Are you asking whether I also replaced all uses of __attribute__((deprecated)) with DEPRECATED? If so, I did replace the only use of it [1] together with defining the macro. Please let me know if I misunderstood your question. Thank you. -- Ludovic [1] in src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp. ________________________________________ From: Vladimir Kozlov Sent: Friday, July 24, 2020 09:02 To: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Cc: openjdk-aarch64; hotspot-gc-dev at openjdk.java.net Subject: Re: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC "And use it in the AArch64 sub system." Can you explain more? Do you have RFE filed for DEPRECATED use? It is small change which usually done together with usage. Why do this separately? Thanks, Vladimir K On 7/24/20 8:53 AM, Ludovic Henry wrote: > Hi, > > Could I please have a review on the following. It simply adds the `DEPRECATED` macro to wrap `__attribute__ ((deprecated))` for GCC, and the equivalent for MSVC. 
> > JBS: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.openjdk.java.net%2Fbrowse%2FJDK-8248672&data=02%7C01%7Cluhenry%40microsoft.com%7Cf8e2909451c44ee9c9b108d82feafcd6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637312033704245529&sdata=p%2Bec5f3YYaazBblPt9vRWjQ2ZWa209lHGPLlsuMbpk8%3D&reserved=0 > Webrev: https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openjdk.java.net%2F~burban%2Fluhenry%2F8248672%2Fwebrev.00&data=02%7C01%7Cluhenry%40microsoft.com%7Cf8e2909451c44ee9c9b108d82feafcd6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637312033704245529&sdata=aOC6Xq%2BJExcKQ8oeNew4aZMHZUD2idlTi3tRihRwpjs%3D&reserved=0 > > Thank you, > > -- > Ludovic > From doug.simon at oracle.com Fri Jul 24 16:53:54 2020 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 24 Jul 2020 18:53:54 +0200 Subject: RFR: 8250548: libgraal can deadlock in -Xcomp mode Message-ID: Please review this bug fix for a deadlock in libgraal under Xcomp. BUG: https://bugs.openjdk.java.net/browse/JDK-8250548 PATCH: diff -r 1f37a5cd6afc src/hotspot/share/compiler/compileBroker.cpp --- a/src/hotspot/share/compiler/compileBroker.cpp Fri Jul 24 11:00:50 2020 -0400 +++ b/src/hotspot/share/compiler/compileBroker.cpp Fri Jul 24 18:52:30 2020 +0200 @@ -1655,7 +1655,7 @@ bool free_task; #if INCLUDE_JVMCI AbstractCompiler* comp = compiler(task->comp_level()); - if (!UseJVMCINativeLibrary && comp->is_jvmci() && !task->should_wait_for_compilation()) { + if (comp->is_jvmci() && !task->should_wait_for_compilation()) { // It may return before compilation is completed. free_task = wait_for_jvmci_completion((JVMCICompiler*) comp, task, thread); } else Testing: hs-tier1,hs-tier2,hs-tier3-graal -Doug From tom.rodriguez at oracle.com Fri Jul 24 16:54:56 2020 From: tom.rodriguez at oracle.com (Tom Rodriguez) Date: Fri, 24 Jul 2020 09:54:56 -0700 Subject: RFR: 8250548: libgraal can deadlock in -Xcomp mode In-Reply-To: References: Message-ID: Looks good. 
tom Doug Simon wrote on 7/24/20 9:53 AM: > Please review this bug fix for a deadlock in libgraal under Xcomp. > > BUG: > https://bugs.openjdk.java.net/browse/JDK-8250548 > > PATCH: > > diff -r 1f37a5cd6afc src/hotspot/share/compiler/compileBroker.cpp > --- a/src/hotspot/share/compiler/compileBroker.cpp Fri Jul 24 11:00:50 2020 -0400 > +++ b/src/hotspot/share/compiler/compileBroker.cpp Fri Jul 24 18:52:30 2020 +0200 > @@ -1655,7 +1655,7 @@ > bool free_task; > #if INCLUDE_JVMCI > AbstractCompiler* comp = compiler(task->comp_level()); > - if (!UseJVMCINativeLibrary && comp->is_jvmci() && !task->should_wait_for_compilation()) { > + if (comp->is_jvmci() && !task->should_wait_for_compilation()) { > // It may return before compilation is completed. > free_task = wait_for_jvmci_completion((JVMCICompiler*) comp, task, thread); > } else > > > Testing: hs-tier1,hs-tier2,hs-tier3-graal > > -Doug > From vladimir.kozlov at oracle.com Fri Jul 24 17:23:58 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 24 Jul 2020 10:23:58 -0700 Subject: RFR: 8250548: libgraal can deadlock in -Xcomp mode In-Reply-To: References: Message-ID: +1 Thanks, Vladimir K On 7/24/20 9:54 AM, Tom Rodriguez wrote: > Looks good. > > tom > > Doug Simon wrote on 7/24/20 9:53 AM: >> Please review this bug fix for a deadlock in libgraal under Xcomp. >> >> BUG: >> https://bugs.openjdk.java.net/browse/JDK-8250548 >> >> PATCH: >> >> diff -r 1f37a5cd6afc src/hotspot/share/compiler/compileBroker.cpp >> --- a/src/hotspot/share/compiler/compileBroker.cpp Fri Jul 24 11:00:50 2020 -0400 >> +++ b/src/hotspot/share/compiler/compileBroker.cpp Fri Jul 24 18:52:30 2020 +0200 >> @@ -1655,7 +1655,7 @@ >> bool free_task; >> #if INCLUDE_JVMCI >> AbstractCompiler* comp = compiler(task->comp_level()); >> - if (!UseJVMCINativeLibrary && comp->is_jvmci() && !task->should_wait_for_compilation()) { >> + if (comp->is_jvmci() && !task->should_wait_for_compilation()) { >>
// It may return before compilation is completed. >> free_task = wait_for_jvmci_completion((JVMCICompiler*) comp, task, thread); >> } else >> >> >> Testing: hs-tier1,hs-tier2,hs-tier3-graal >> >> -Doug >> From vladimir.kozlov at oracle.com Fri Jul 24 17:26:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 24 Jul 2020 10:26:31 -0700 Subject: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> Message-ID: It was my mistake - I missed that it is indeed used in macroAssembler_aarch64.hpp And thank you, Monica, for RFE description change - it is more clear now. Change is fine. You need someone from aarch64 to review this too to make sure it works with their GCC. Regards, Vladimir K On 7/24/20 9:41 AM, Ludovic Henry wrote: > I'm not sure I understand your question. Are you asking whether I also replaced all uses of __attribute__((deprecated)) with DEPRECATED? If so, I did replace the only use of it [1] together with defining the macro. > > Please let me know if I misunderstood your question. > > Thank you. > > -- > Ludovic > > [1] in src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp. > > ________________________________________ > From: Vladimir Kozlov > Sent: Friday, July 24, 2020 09:02 > To: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Cc: openjdk-aarch64; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC > > "And use it in the AArch64 sub system." > > Can you explain more? Do you have RFE filed for DEPRECATED use? > It is small change which usually done together with usage. Why do this separately? > > Thanks, > Vladimir K > > On 7/24/20 8:53 AM, Ludovic Henry wrote: >> Hi, >> >> Could I please have a review on the following.
It simply adds the `DEPRECATED` macro to wrap `__attribute__ ((deprecated))` for GCC, and the equivalent for MSVC. >> >> JBS: https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.openjdk.java.net%2Fbrowse%2FJDK-8248672&data=02%7C01%7Cluhenry%40microsoft.com%7Cf8e2909451c44ee9c9b108d82feafcd6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637312033704245529&sdata=p%2Bec5f3YYaazBblPt9vRWjQ2ZWa209lHGPLlsuMbpk8%3D&reserved=0 >> Webrev: https://nam06.safelinks.protection.outlook.com/?url=http:%2F%2Fcr.openjdk.java.net%2F~burban%2Fluhenry%2F8248672%2Fwebrev.00&data=02%7C01%7Cluhenry%40microsoft.com%7Cf8e2909451c44ee9c9b108d82feafcd6%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637312033704245529&sdata=aOC6Xq%2BJExcKQ8oeNew4aZMHZUD2idlTi3tRihRwpjs%3D&reserved=0 >> >> Thank you, >> >> -- >> Ludovic >> From vladimir.kozlov at oracle.com Fri Jul 24 18:03:29 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 24 Jul 2020 11:03:29 -0700 Subject: [16] RFR(XS): 8249602: C2: assert(cnt == _outcnt) failed: no insertions allowed In-Reply-To: <2cd118ab-c117-bf61-ae03-117b9383a5e6@oracle.com> References: <2cd118ab-c117-bf61-ae03-117b9383a5e6@oracle.com> Message-ID: <82cbd463-d480-b882-04da-0d1269717fff@oracle.com> Looks good. Thanks, Vladimir On 7/24/20 5:57 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8249602 > http://cr.openjdk.java.net/~chagedorn/8249602/webrev.00/ > > The testcase hits the assert when inserting a post loop. When correcting the fall-in values to the post-loop phis to > take the values from the main-loop, we have to separately handle nodes that belong to the backedge control block and > cannot float. In this process, we clone data nodes in PhaseIdealLoop::clone_up_backedge_goo and then hit the assert > because some nodes to be cloned have a control input from the main-loop header node (main_head). 
These nodes are cloned > and the main_head node gets these nodes as additional output nodes. This should be fine but the DUIterator_Fast forbids > insertions. > > The fix simply switches to a normal DUIterator which allows insertions. This should also be done when correcting the > fall-in values to the main-loop to take the values from the pre-loop. > > Best regards, > Christian From doug.simon at oracle.com Fri Jul 24 18:12:33 2020 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 24 Jul 2020 20:12:33 +0200 Subject: RFR: 8250548: libgraal can deadlock in -Xcomp mode Message-ID: <0DE64525-958F-4885-84D0-151990B5D8F9@oracle.com> Please review this bug fix to revert the JVMCI changes made as part of JDK-8230395. Instead of aborting the VM when JVMCI counter expansion fails, the JVMCI client should simply be informed of the failure (as was originally suggested by David). https://bugs.openjdk.java.net/browse/JDK-8250556 https://cr.openjdk.java.net/~dnsimon/8250556/webrev.00/ Testing: hs-tier1,hs-tier2,hs-tier3-graal -Doug From doug.simon at oracle.com Fri Jul 24 18:35:21 2020 From: doug.simon at oracle.com (Doug Simon) Date: Fri, 24 Jul 2020 20:35:21 +0200 Subject: RFR: 8250556: revert JVMCI part of JDK-8230395 Message-ID: <0DE58760-A197-46FD-99B0-A2C1A5394DEE@oracle.com> (with correct subject this time) Please review this bug fix to revert the JVMCI changes made as part of JDK-8230395. Instead of aborting the VM when JVMCI counter expansion fails, the JVMCI client should simply be informed of the failure (as was originally suggested by David). 
https://bugs.openjdk.java.net/browse/JDK-8250556 https://cr.openjdk.java.net/~dnsimon/8250556/webrev.00/ Testing: hs-tier1,hs-tier2,hs-tier3-graal -Doug From vladimir.kozlov at oracle.com Fri Jul 24 18:40:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 24 Jul 2020 11:40:13 -0700 Subject: RFR: 8250556: revert JVMCI part of JDK-8230395 In-Reply-To: <0DE58760-A197-46FD-99B0-A2C1A5394DEE@oracle.com> References: <0DE58760-A197-46FD-99B0-A2C1A5394DEE@oracle.com> Message-ID: Looks good. Thanks, Vladimir K On 7/24/20 11:35 AM, Doug Simon wrote: > (with correct subject this time) > > Please review this bug fix to revert the JVMCI changes made as part of JDK-8230395. > > Instead of aborting the VM when JVMCI counter expansion fails, the JVMCI client should simply be informed of the failure (as was originally suggested by David). > > https://bugs.openjdk.java.net/browse/JDK-8250556 > https://cr.openjdk.java.net/~dnsimon/8250556/webrev.00/ > > Testing: hs-tier1,hs-tier2,hs-tier3-graal > > -Doug > From serguei.spitsyn at oracle.com Fri Jul 24 20:28:18 2020 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 24 Jul 2020 13:28:18 -0700 Subject: RFR (T) 8250042: Clean up methodOop and method_oop names from the code In-Reply-To: References: <85efc3ab-abbf-c5f2-9b7b-47fa516d9a2d@oracle.com> <6f973a0a-cf55-e1ab-8de3-b57f68dbd2cf@oracle.com> Message-ID: <60098204-b23d-7da6-908f-80f3d40c2ebc@oracle.com> Hi Coleen, The fix looks good to me. I've more focused on the serviceability related update. Thank you for taking care about it! Thanks, Serguei On 7/24/20 06:10, coleen.phillimore at oracle.com wrote: > > I can also replace method_oop with method_ptr in the CPU ad files, and > this seems to build but now someone who knows the compiler area needs > to comment; this was supposed to be trivial... :) But it still is > really trivial to look at.
> > I left interpreter_method_oop_reg and compiler_method_oop_reg and > friends in opto/matcher.cpp for someone else. > > incremental webrev at > http://cr.openjdk.java.net/~coleenp/2020/8250042.02.incr/webrev > full webrev at http://cr.openjdk.java.net/~coleenp/2020/8250042.02/webrev > > Thanks, > Coleen > > > On 7/24/20 8:23 AM, coleen.phillimore at oracle.com wrote: >> >> Thanks for looking at this. >> >> On 7/24/20 1:01 AM, David Holmes wrote: >>> Hi Coleen, >>> >>> On 24/07/2020 2:58 am, coleen.phillimore at oracle.com wrote: >>>> See bug for more details.? I've been running into these names a lot >>>> lately.?? Many of these names are in JVMTI. >>>> >>>> Tested with tier1 on all Oracle platforms and built on non-Oracle >>>> platforms. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2020/8250042.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8250042 >>> >>> src/hotspot/cpu/*/*.ad >>> >>> These still refer to "method oop" and method_oop in a number of places. >> >> Yes, I only replaced method_oop in the shared code and not in the AD >> code.? method_oop can be the name of a parameter and using "sed" to >> change it to "method" doesn't work.?? Somebody who understands this >> code and looks at it will have to make the rest of the changes. >> >> What I did was replace "method oop" with "method" and "methodOop" >> with "method" in all the sources.? I replaced "method_oop" with >> "method" or "checked_method" in the shared sources. >> >>> >>> src/hotspot/share/adlc/adlparse.cpp >>> >>> +? frame->_interpreter_method_oop_reg = parse_one_arg("method reg >>> entry"); >>> >>> I guess I'm not understanding the scope of this renaming - why is >>> _interpreter_method_oop_reg not renamed as well? Should this (and >>> other uses) be parsed as method-(oop-reg) rather than (method-oop)-reg? >> >> I don't know this code, so I'd rather not change any more of it. 
The >> comment makes sense changed, even though the variable name still >> refers to method_oop. >> >> Thanks, >> Coleen >>> >>> Otherwise all okay. >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Coleen >> > From adinn at redhat.com Fri Jul 24 21:07:48 2020 From: adinn at redhat.com (Andrew Dinn) Date: Fri, 24 Jul 2020 22:07:48 +0100 Subject: [8u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <87lfj9m5kv.fsf@redhat.com> References: <87zh7pmla1.fsf@redhat.com> <9ae30a2b-3443-9954-950e-08e7e26ddd97@redhat.com> <87wo2tmfg3.fsf@redhat.com> <87lfj9m5kv.fsf@redhat.com> Message-ID: <7e9268bf-41d5-52d3-cb6c-449fabb0f192@redhat.com> On 24/07/2020 13:59, Roland Westrelin wrote: >> That may well change behaviour for some programs as meets are computed >> outside of the changed verification path. I'd like to assume the >> benefits of improving type accuracy override the risk. Do you think that >> is justified? (one might argue that improved type accuracy is not always >> better, especially for speculative info where avoiding the erasure might >> enable optimizations not previously attempted). > > I would say both benefit and risk are small. Without the speculative > type change, we'll hit failures in the new verification code so some > other tweak would have to be done to work around them. Not sure what > could be done but that would likely be as risky. Or only the actual fix > is backported and the new verification code is left out. Then the > speculative type fix is not required. But a regression wouldn't be > caught either. That's a good enough justification for me. Ship it! . . . well, modulo maintainer approval ;-) regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 
03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill From vladimir.kozlov at oracle.com Fri Jul 24 21:41:29 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Fri, 24 Jul 2020 14:41:29 -0700 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: <2f5978fe-af76-df18-15c0-dcc62563299d@oracle.com> References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com> <518cd022-73e1-cb5c-499d-86853ae679c3@oracle.com> <2f5978fe-af76-df18-15c0-dcc62563299d@oracle.com> Message-ID: Good. Thanks, Vladimir K On 7/24/20 4:44 AM, Christian Hagedorn wrote: > Hi Tobias > > Thank you for your review! > >> Please make sure to run performance testing. > > There is a repeated regression in the micros open crypto benchmark > openjdk.bench.javax.crypto.small.SecureRandomBench.nextBytes with these two settings: > - algorithm=SHA1PRNG-dataSize:64-provider:-shared:false > - algorithm=SHA1PRNG-dataSize:64-provider:-shared:true > > Repeated runs with these two settings resulted in a regression between 1 and 2%. I could trace it back to the additional > type filtering in PhiNode::Value() (webrev.02). This is only required for the assertion code and not for the bailout fix > itself. When running performance testing with webrev.01, the regressions disappear. > > I therefore suggest to go with webrev.01 (without assertion code and type filtering) and file a new RFE to investigate > the usage of type filtering in PhiNode::Value() for iv phis and why we get a performance regression in these two > benchmark settings. In theory, I think it should be beneficial to narrow the type range of iv phis. > >> cfgnode.cpp:1083 >> - There's an extra whitespace before "," >> >> loopopts.cpp:84/86 >> - No need for extra brackets > > These are not present anymore in webrev.01. 
> > http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ > > Best regards, > Christian > > > On 20.07.20 11:14, Tobias Hartmann wrote: >> Hi Christian, >> >> On 15.07.20 15:08, Christian Hagedorn wrote: >>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/ >> >> Looks good to me. >> >> Some code style comments: > > >> Best regards, >> Tobias >> From coleen.phillimore at oracle.com Fri Jul 24 22:20:23 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 24 Jul 2020 18:20:23 -0400 Subject: RFR (T) 8250042: Clean up methodOop and method_oop names from the code In-Reply-To: <60098204-b23d-7da6-908f-80f3d40c2ebc@oracle.com> References: <85efc3ab-abbf-c5f2-9b7b-47fa516d9a2d@oracle.com> <6f973a0a-cf55-e1ab-8de3-b57f68dbd2cf@oracle.com> <60098204-b23d-7da6-908f-80f3d40c2ebc@oracle.com> Message-ID: <4232a089-3984-7feb-d2eb-46f1551ee0ab@oracle.com> On 7/24/20 4:28 PM, serguei.spitsyn at oracle.com wrote: > Hi Coleen, > > The fix looks good to me. > I've more focused on the serviceability related update. > Thank you for taking care about it! Thank you for reviewing it! Most of the name changes were in jvmti. Hope it's cleaner to work on now. Coleen > > Thanks, > Serguei > > > On 7/24/20 06:10, coleen.phillimore at oracle.com wrote: >> >> I can also replace method_oop with method_ptr in the CPU ad files, >> and this seems to build but now someone who knows the compiler area >> needs to comment; this was supposed to be trivial... :) But it still >> is really trivial to look at. >> >> I left interpreter_method_oop_reg and compiler_method_oop_reg and >> friends in opto/matcher.cpp for someone else. >> >> incremental webrev at >> http://cr.openjdk.java.net/~coleenp/2020/8250042.02.incr/webrev >> full webrev at >> http://cr.openjdk.java.net/~coleenp/2020/8250042.02/webrev >> >> Thanks, >> Coleen >> >> >> On 7/24/20 8:23 AM, coleen.phillimore at oracle.com wrote: >>> >>> Thanks for looking at this.
>>> >>> On 7/24/20 1:01 AM, David Holmes wrote: >>>> Hi Coleen, >>>> >>>> On 24/07/2020 2:58 am, coleen.phillimore at oracle.com wrote: >>>>> See bug for more details. I've been running into these names a >>>>> lot lately. Many of these names are in JVMTI. >>>>> >>>>> Tested with tier1 on all Oracle platforms and built on non-Oracle >>>>> platforms. >>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/2020/8250042.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8250042 >>>> >>>> src/hotspot/cpu/*/*.ad >>>> >>>> These still refer to "method oop" and method_oop in a number of >>>> places. >>> >>> Yes, I only replaced method_oop in the shared code and not in the AD >>> code. method_oop can be the name of a parameter and using "sed" to >>> change it to "method" doesn't work. Somebody who understands this >>> code and looks at it will have to make the rest of the changes. >>> >>> What I did was replace "method oop" with "method" and "methodOop" >>> with "method" in all the sources. I replaced "method_oop" with >>> "method" or "checked_method" in the shared sources. >>> >>>> >>>> src/hotspot/share/adlc/adlparse.cpp >>>> >>>> +  frame->_interpreter_method_oop_reg = parse_one_arg("method reg >>>> entry"); >>>> >>>> I guess I'm not understanding the scope of this renaming - why is >>>> _interpreter_method_oop_reg not renamed as well? Should this (and >>>> other uses) be parsed as method-(oop-reg) rather than >>>> (method-oop)-reg? >>> >>> I don't know this code, so I'd rather not change any more of it. The >>> comment makes sense changed, even though the variable name still >>> refers to method_oop. >>> >>> Thanks, >>> Coleen >>>> >>>> Otherwise all okay.
>>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Coleen >>> >> > From kim.barrett at oracle.com Fri Jul 24 23:42:43 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 24 Jul 2020 19:42:43 -0400 Subject: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> Message-ID: > On Jul 24, 2020, at 12:41 PM, Ludovic Henry wrote: > > I'm not sure I understand your question. Are you asking whether I also replaced all uses of __attribute__((deprecated)) with DEPRECATED? If so, I did replace the only use of it [1] together with defining the macro. > > Please let me know if I misunderstood your question. > > Thank you. > > -- > Ludovic > > [1] in src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp. Why are we deprecating something rather than just deleting it and fixing any users? If the point is to keep the overload but prevent it from being called, there are better ways than a deprecation warning. And if we *really* needed deprecation warnings, I suggest using the C++14 [[deprecated]] attribute (after adding it to the approved new feature list). But I think we shouldn't be doing this at all. > > ________________________________________ > From: Vladimir Kozlov > Sent: Friday, July 24, 2020 09:02 > To: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net > Cc: openjdk-aarch64; hotspot-gc-dev at openjdk.java.net > Subject: Re: RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC > > "And use it in the AArch64 sub system." > > Can you explain more? Do you have RFE filed for DEPRECATED use? > It is small change which usually done together with usage. Why do this separately? > > Thanks, > Vladimir K > > On 7/24/20 8:53 AM, Ludovic Henry wrote: >> Hi, >> >> Could I please have a review on the following. It simply adds the `DEPRECATED` macro to wrap `__attribute__ ((deprecated))` for GCC, and the equivalent for MSVC. 
>> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8248672 >> Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248672/webrev.00 >> >> Thank you, >> >> -- >> Ludovic From aph at redhat.com Sat Jul 25 15:14:12 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 25 Jul 2020 16:14:12 +0100 Subject: [8u] RFR: 8240676: Meet not symmetric failure when running lucene on jdk8 In-Reply-To: <7e9268bf-41d5-52d3-cb6c-449fabb0f192@redhat.com> References: <87zh7pmla1.fsf@redhat.com> <9ae30a2b-3443-9954-950e-08e7e26ddd97@redhat.com> <87wo2tmfg3.fsf@redhat.com> <87lfj9m5kv.fsf@redhat.com> <7e9268bf-41d5-52d3-cb6c-449fabb0f192@redhat.com> Message-ID: On 24/07/2020 22:07, Andrew Dinn wrote: > On 24/07/2020 13:59, Roland Westrelin wrote: >>> That may well change behaviour for some programs as meets are computed >>> outside of the changed verification path. I'd like to assume the >>> benefits of improving type accuracy override the risk. Do you think that >>> is justified? (one might argue that improved type accuracy is not always >>> better, especially for speculative info where avoiding the erasure might >>> enable optimizations not previously attempted). >> >> I would say both benefit and risk are small. Without the speculative >> type change, we'll hit failures in the new verification code so some >> other tweak would have to be done to work around them. Not sure what >> could be done but that would likely be as risky.
Or only the actual fix >> is backported and the new verification code is left out. Then the >> speculative type fix is not required. But a regression wouldn't be >> caught either. > That's a good enough justification for me. Ship it! > > . . . well, modulo maintainer approval ;-) Yeah, OK. Of course I'm not super keen on a change in C2 which fixes a bug that we can't reproduce, but it'll have to do. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Sat Jul 25 15:30:26 2020 From: aph at redhat.com (Andrew Haley) Date: Sat, 25 Jul 2020 16:30:26 +0100 Subject: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor In-Reply-To: References: <0aed0646-c770-03e6-4e0b-5108919b7203@redhat.com> Message-ID: On 24/07/2020 17:02, Ludovic Henry wrote: > Are you saying that you would like this change to land into > `aarch64-port/jdk-windows` before getting into jdk/jdk? This change > doesn't strike me as windows-aarch64 specific and is in line with > general removal of GCC-specific code (similarly to the LP64 vs > LLP64, or JDK-8248666). > > Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.01 You make a good point. I didn't want to get integer type cleanups mixed up with the Windows import, so I wanted to do them first. I think there was a general feeling, expressed by Dalibor, the leader of the Porters' Group, that the Windows changes should be integrated into the http://hg.openjdk.java.net/aarch64-port/jdk-windows/ tree. This change is marginal, IMO. Clearly it's a GCC-ism, so I won't refuse it being cleaned up in mainline if you want. But I think we should now move to integrating all of your Windows- specific changes in the jdk-windows tree and then we'll put together a Big Windows Patch and push that to mainline. I don't think it'll take long. Let's just get it done! 
-- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From luhenry at microsoft.com Sat Jul 25 15:51:21 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sat, 25 Jul 2020 15:51:21 +0000 Subject: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor In-Reply-To: References: <0aed0646-c770-03e6-4e0b-5108919b7203@redhat.com> , Message-ID: > But I think we should now move to integrating all of your Windows- > specific changes in the jdk-windows tree and then we'll put together a > Big Windows Patch and push that to mainline. I don't think it'll take > long. Let's just get it done! Sounds good to me, let's do that then. ________________________________________ From: Andrew Haley Sent: Saturday, July 25, 2020 08:30 To: Ludovic Henry; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net Cc: openjdk-aarch64 Subject: Re: [aarch64-port-dev ] RFR(S): 8248676: AArch64: Add workaround for LITable constructor On 24/07/2020 17:02, Ludovic Henry wrote: > Are you saying that you would like this change to land into > `aarch64-port/jdk-windows` before getting into jdk/jdk? This change > doesn't strike me as windows-aarch64 specific and is in line with > general removal of GCC-specific code (similarly to the LP64 vs > LLP64, or JDK-8248666). > > Webrev: http://cr.openjdk.java.net/~burban/luhenry/8248676/webrev.01 You make a good point. I didn't want to get integer type cleanups mixed up with the Windows import, so I wanted to do them first.
I think there was a general feeling, expressed by Dalibor, the leader of the Porters' Group, that the Windows changes should be integrated into the http://hg.openjdk.java.net/aarch64-port/jdk-windows/ tree. This change is marginal, IMO. Clearly it's a GCC-ism, so I won't refuse it being cleaned up in mainline if you want. But I think we should now move to integrating all of your Windows- specific changes in the jdk-windows tree and then we'll put together a Big Windows Patch and push that to mainline. I don't think it'll take long. Let's just get it done! -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From aph at redhat.com Sun Jul 26 09:56:31 2020 From: aph at redhat.com (Andrew Haley) Date: Sun, 26 Jul 2020 10:56:31 +0100 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> Message-ID: <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> On 25/07/2020 00:42, Kim Barrett wrote: > Why are we deprecating something rather than just deleting it and > fixing any users? C++ overloading. AArch64 CMP (immediate) only has a limited range, so we only have a byte-wide Assembler::cmp() definition. The deprecation warning on the wider version makes sure that any maintenance programmer is immediately warned if it is used.
There are other things we could do: by not providing a definition for the wider cmp() you get a link error, but that wouldn't be as explicit as a deprecation warning. The root problem is that the immediate value to CMP isn't always known when HotSpot is compiled, but may be calculated at runtime. We have seen failures in production when an immediate offset overflowed. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From kim.barrett at oracle.com Sun Jul 26 21:41:47 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Sun, 26 Jul 2020 17:41:47 -0400 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> Message-ID: > On Jul 26, 2020, at 5:56 AM, Andrew Haley wrote: > > On 25/07/2020 00:42, Kim Barrett wrote: >> Why are we deprecating something rather than just deleting it and >> fixing any users? > > C++ overloading. AArch64 CMP (immediate) only has a limited range, so > we only have a byte-wide Assembler::cmp() definition. The deprecation > warning on the wider version makes sure that any maintenance > programmer is immediately warned if it is used. There are other things > we could do: by not providing a definition for the wider cmp() you get > a link error, but that wouldn't be as explicit as a deprecation > warning. > > The root problem is that the immediate value to CMP isn't always known > when HotSpot is compiled, but may be calculated at runtime. We have > seen failures in production when an immediate offset overflowed. Yeah, I'd guessed that might be the point, and confirmed it later by looking at the changeset that originally introduced the attribute. As of early last week, a definition of "= delete;" is the way to poison an overload. 
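A minimal sketch of the "= delete" approach mentioned above, assuming C++11 or later. ToyAssembler, checked_imm8 and can_cmp are hypothetical names used only for illustration and are not HotSpot code; the byte-wide immediate follows the description in this thread rather than the real Assembler::cmp() signature.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for an assembler class; not HotSpot's Assembler.
class ToyAssembler {
 public:
  // Supported form: the immediate must already be a byte-wide value.
  // Returns the immediate it would encode so the behaviour is observable.
  int cmp(int /*reg*/, std::uint8_t imm) { return imm; }

  // Poisoned form: any other immediate type is an exact match for this
  // template, so overload resolution picks it and the call fails to compile.
  template <typename T>
  void cmp(int /*reg*/, T /*imm*/) = delete;
};

// Values computed at runtime have to be narrowed explicitly, which gives
// one obvious place for a range check.
inline std::uint8_t checked_imm8(long value) {
  assert(value >= 0 && value <= 0xff);
  return static_cast<std::uint8_t>(value);
}

// Detection helper: true only if cmp(0, Imm) is a usable call.
template <typename Imm, typename = void>
struct can_cmp : std::false_type {};

template <typename Imm>
struct can_cmp<Imm, decltype(void(std::declval<ToyAssembler&>().cmp(
                        0, std::declval<Imm>())))> : std::true_type {};

static_assert(can_cmp<std::uint8_t>::value, "byte-wide immediate accepted");
static_assert(!can_cmp<int>::value, "int immediate rejected at compile time");
static_assert(!can_cmp<long>::value, "long immediate rejected at compile time");
```

With this in place a call such as a.cmp(r, 300) no longer compiles, while a.cmp(r, checked_imm8(n)) does. Unlike a deprecation attribute this is a hard error rather than a warning, which may or may not be what the port wants.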
From luhenry at microsoft.com Sun Jul 26 23:10:56 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Sun, 26 Jul 2020 23:10:56 +0000 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com>, Message-ID: > As of early last week, a definition of "= delete;" is the way to > poison an overload. Let me try that locally, compile on Windows-AArch64 and Linux-AArch64, and confirm whether it works for MSVC. ________________________________________ From: Kim Barrett Sent: Sunday, July 26, 2020 14:41 To: Andrew Haley Cc: Ludovic Henry; Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64; hotspot-gc-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC > On Jul 26, 2020, at 5:56 AM, Andrew Haley wrote: > > On 25/07/2020 00:42, Kim Barrett wrote: >> Why are we deprecating something rather than just deleting it and >> fixing any users? > > C++ overloading. AArch64 CMP (immediate) only has a limited range, so > we only have a byte-wide Assembler::cmp() definition. The deprecation > warning on the wider version makes sure that any maintenance > programmer is immediately warned if it is used. There are other things > we could do: by not providing a definition for the wider cmp() you get > a link error, but that wouldn't be as explicit as a deprecation > warning. > > The root problem is that the immediate value to CMP isn't always known > when HotSpot is compiled, but may be calculated at runtime. We have > seen failures in production when an immediate offset overflowed. Yeah, I'd guessed that might be the point, and confirmed it later by looking at the changeset that originally introduced the attribute. 
As of early last week, a definition of "= delete;" is the way to poison an overload. From xxinliu at amazon.com Sun Jul 26 23:46:38 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Sun, 26 Jul 2020 23:46:38 +0000 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init Message-ID: <1595807197546.52082@amazon.com> hi, Reviewers, Could you review this simple patch? bug: https://bugs.openjdk.java.net/browse/JDK-8249809 webrev: https://cr.openjdk.java.net/~xliu/8249809/00/webrev/ When the users specify a method-level compiler directive, the DirectiveSet is cloned for every single compiling method. It's expensive but rarely hit. Actually, only user-specified methods must clone the DirectiveSet. I introduce a smart pointer DirectiveSetPtr. operator->() returns a pointer to a constant DirectiveSet, which is read-only. It doesn't clone the _origin until C2 needs to update its members. transfer() yields the ownership of the pointer. Test: manual tests with different CompileCommand options. hotspot:tier1 and gtest:all. thanks, --lx From david.holmes at oracle.com Mon Jul 27 01:26:31 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 27 Jul 2020 11:26:31 +1000 Subject: RFR (T) 8250042: Clean up methodOop and method_oop names from the code In-Reply-To: References: <85efc3ab-abbf-c5f2-9b7b-47fa516d9a2d@oracle.com> <6f973a0a-cf55-e1ab-8de3-b57f68dbd2cf@oracle.com> Message-ID: <8eab1729-9a35-8c79-65cf-e67a098965d5@oracle.com> Hi Coleen, On 24/07/2020 11:10 pm, coleen.phillimore at oracle.com wrote: > > I can also replace method_oop with method_ptr in the CPU ad files, and > this seems to build but now someone who knows the compiler area needs to > comment; this was supposed to be trivial... :) But it still is really > trivial to look at. method_ptr works for me. Changes seem fine. > I left interpreter_method_oop_reg and compiler_method_oop_reg and > friends in opto/matcher.cpp for someone else. Okay.
Hopefully someone will pick it up. Thanks, David ----- > incremental webrev at > http://cr.openjdk.java.net/~coleenp/2020/8250042.02.incr/webrev > full webrev at http://cr.openjdk.java.net/~coleenp/2020/8250042.02/webrev > > Thanks, > Coleen > > > On 7/24/20 8:23 AM, coleen.phillimore at oracle.com wrote: >> >> Thanks for looking at this. >> >> On 7/24/20 1:01 AM, David Holmes wrote: >>> Hi Coleen, >>> >>> On 24/07/2020 2:58 am, coleen.phillimore at oracle.com wrote: >>>> See bug for more details. I've been running into these names a lot >>>> lately. Many of these names are in JVMTI. >>>> >>>> Tested with tier1 on all Oracle platforms and built on non-Oracle >>>> platforms. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2020/8250042.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8250042 >>> >>> src/hotspot/cpu/*/*.ad >>> >>> These still refer to "method oop" and method_oop in a number of places. >> >> Yes, I only replaced method_oop in the shared code and not in the AD >> code. method_oop can be the name of a parameter and using "sed" to >> change it to "method" doesn't work. Somebody who understands this >> code and looks at it will have to make the rest of the changes. >> >> What I did was replace "method oop" with "method" and "methodOop" with >> "method" in all the sources. I replaced "method_oop" with "method" or >> "checked_method" in the shared sources. >> >>> >>> src/hotspot/share/adlc/adlparse.cpp >>> >>> +  frame->_interpreter_method_oop_reg = parse_one_arg("method reg >>> entry"); >>> >>> I guess I'm not understanding the scope of this renaming - why is >>> _interpreter_method_oop_reg not renamed as well? Should this (and >>> other uses) be parsed as method-(oop-reg) rather than (method-oop)-reg? >> >> I don't know this code, so I'd rather not change any more of it. The >> comment makes sense changed, even though the variable name still >> refers to method_oop.
>> >> Thanks, >> Coleen >>> >>> Otherwise all okay. >>> >>> Thanks, >>> David >>> >>>> Thanks, >>>> Coleen >> > From david.holmes at oracle.com Mon Jul 27 01:46:03 2020 From: david.holmes at oracle.com (David Holmes) Date: Mon, 27 Jul 2020 11:46:03 +1000 Subject: RFR: 8250556: revert JVMCI part of JDK-8230395 In-Reply-To: <0DE58760-A197-46FD-99B0-A2C1A5394DEE@oracle.com> References: <0DE58760-A197-46FD-99B0-A2C1A5394DEE@oracle.com> Message-ID: Hi Doug, This looks like an accurate reversal of the previous changes. Thanks, David ----- On 25/07/2020 4:35 am, Doug Simon wrote: > (with correct subject this time) > > Please review this bug fix to revert the JVMCI changes made as part of JDK-8230395. > > Instead of aborting the VM when JVMCI counter expansion fails, the JVMCI client should simply be informed of the failure (as was originally suggested by David). > > https://bugs.openjdk.java.net/browse/JDK-8250556 > https://cr.openjdk.java.net/~dnsimon/8250556/webrev.00/ > > Testing: hs-tier1,hs-tier2,hs-tier3-graal > > -Doug > From ningsheng.jian at arm.com Mon Jul 27 01:58:38 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Mon, 27 Jul 2020 09:58:38 +0800 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: <2bc029fc-2823-18ac-9aa0-1a8edd7f9094@oracle.com> References: <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com> <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com> <8c05d468-8753-b671-e3a9-92a7148f4f14@oracle.com> <2bc029fc-2823-18ac-9aa0-1a8edd7f9094@oracle.com> Message-ID: <942c4be0-4f5d-acd6-86ae-e6769215ca37@arm.com> Thank you Erik! Regards, Ningsheng On 7/23/20 9:06 PM, Erik Joelsson wrote: > Hello Ningsheng, > > Build change looks good. 
> > /Erik > > On 2020-07-23 01:02, Ningsheng Jian wrote: >> Hi Vladimir, >> >> Thanks for pointing out this. Yes, I missed that change in shared >> code. I've regenerated the webrev, with GensrcAdlc.gmk file change >> included: >> >> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ >> >> >> Also add build-dev. >> >> Thanks, >> Ningsheng >> >> On 7/23/20 5:36 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ >>> >>> >>> >>> >>> FTR there's one more aarch64-specific change in shared code to enable >>> aarch64_neon.ad processing: >>> >>> diff --git a/make/hotspot/gensrc/GensrcAdlc.gmk >>> b/make/hotspot/gensrc/GensrcAdlc.gmk >>> --- a/make/hotspot/gensrc/GensrcAdlc.gmk >>> +++ b/make/hotspot/gensrc/GensrcAdlc.gmk >>> @@ -129,6 +129,12 @@ >>> >>> $d/os_cpu/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_OS)_$(HOTSPOT_TARGET_CPU_ARCH).ad >>> \ >>>       ))) >>> >>> +  ifeq ($(HOTSPOT_TARGET_CPU_ARCH), aarch64) >>> +    AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, >>> $(AD_SRC_ROOTS), \ >>> + $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/$(HOTSPOT_TARGET_CPU_ARCH)_neon.ad \ >>> +    ))) >>> +  endif >>> + >>>     ifeq ($(call check-jvm-feature, shenandoahgc), true) >>>       AD_SRC_FILES += $(call uniq, $(wildcard $(foreach d, >>> $(AD_SRC_ROOTS), \ >>> >>> $d/cpu/$(HOTSPOT_TARGET_CPU_ARCH)/gc/shenandoah/shenandoah_$(HOTSPOT_TARGET_CPU).ad >>> \ >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> On 7/8/20 3:05 PM, Yang Zhang wrote: >>>>> Hi Andrew >>>>> >>>>> I have updated this patch. Could you please help to review it again? >>>>> In this patch, the following changes are made: >>>>> 1. Separate newly added NEON instructions to a new ad file >>>>>     aarch64_neon.ad >>>>> 2. Add assembler tests for NEON instructions. Trailing spaces >>>>>     in the python script are also removed.
>>>>> >>>>> http://cr.openjdk.java.net/~yzhang/vectorapi/vectorapi.rfr/aarch64_webrev/webrev.02/ >>>>> >>>>> >>>>> Thanks, >>>>> Yang >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Andrew Haley >>>>> Sent: Tuesday, June 30, 2020 12:10 AM >>>>> To: Yang Zhang ; Viswanathan, Sandhya >>>>> ; Paul Sandoz >>>>> Cc: nd ; hotspot-compiler-dev at openjdk.java.net; >>>>> hotspot-dev at openjdk.java.net; core-libs-dev at openjdk.java.net; >>>>> aarch64-port-dev at openjdk.java.net >>>>> Subject: Re: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of >>>>> Vector API (Incubator): AArch64 backend changes >>>>> >>>>> On 29/06/2020 08:48, Yang Zhang wrote: >>>>>> 1. Instructions that can be matched with NEON instructions directly. >>>>>> MulVB, SqrtVF and AbsV have been merged into jdk master already. >>>>>> >>>>>> 2. Instructions that jdk master has middle end support for, but >>>>>> they cannot be matched with NEON instructions directly. >>>>>> Such as AddReductionVL, MulReductionVL, And/Or/XorReductionV These >>>>>> new instructions can be moved into jdk master first, but for >>>>>> auto-vectorization, the performance might not get improved. >>>>>> >>>>>> 3. Panama/Vector API specific instructions such as >>>>>> Load/StoreVector ( 16 bits), VectorReinterpret, VectorMaskCmp, >>>>>> MaxV/MinV, VectorBlend etc. >>>>>> These instructions cannot be moved into jdk master first because >>>>>> there isn't middle-end support. >>>>>> >>>>>> I will put 2 and 3 in a new ad file aarch64_neon.ad. I will also >>>>>> update aarch64_asmtest.py and macroassemler.cpp. When the patch is >>>>>> ready, I will send it again. >>>>> >>>>> Thank you *very* much for your hard work. Appreciated! >>>>> >>>>> -- >>>>> Andrew Haley (he/him) >>>>> Java Platform Lead Engineer >>>>> Red Hat UK Ltd.
>>>>> https://keybase.io/andrewhaley >>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>> >>>> >> From christian.hagedorn at oracle.com Mon Jul 27 06:42:45 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 27 Jul 2020 08:42:45 +0200 Subject: [16] RFR(XS): 8249602: C2: assert(cnt == _outcnt) failed: no insertions allowed In-Reply-To: <82cbd463-d480-b882-04da-0d1269717fff@oracle.com> References: <2cd118ab-c117-bf61-ae03-117b9383a5e6@oracle.com> <82cbd463-d480-b882-04da-0d1269717fff@oracle.com> Message-ID: Thank you Vladimir for your review! Best regards, Christian On 24.07.20 20:03, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/24/20 5:57 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8249602 >> http://cr.openjdk.java.net/~chagedorn/8249602/webrev.00/ >> >> The testcase hits the assert when inserting a post loop. When >> correcting the fall-in values to the post-loop phis to take the values >> from the main-loop, we have to separately handle nodes that belong to >> the backedge control block and cannot float. In this process, we clone >> data nodes in PhaseIdealLoop::clone_up_backedge_goo and then hit the >> assert because some nodes to be cloned have a control input from the >> main-loop header node (main_head). These nodes are cloned and the >> main_head node gets these nodes as additional output nodes. This >> should be fine but the DUIterator_Fast forbids insertions. >> >> The fix simply switches to a normal DUIterator which allows >> insertions. This should also be done when correcting the fall-in >> values to the main-loop to take the values from the pre-loop. 
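The difference between an iterator that forbids insertions and one that tolerates them can be sketched with an analogy in plain Java. This is not HotSpot code: an ArrayList stands in for a node's out-edge array, the fail-fast collection iterator plays the role of the strict iterator, and an index loop that re-reads the size plays the role of the tolerant one. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.List;

// Analogy only: a list of integers stands in for the outputs of a node.
final class InsertDuringIteration {

    // Strict walk: the collection iterator rejects any insertion made while
    // it is active, similar in spirit to a "no insertions allowed" assert.
    static boolean strictWalkFails(List<Integer> outs) {
        try {
            for (Integer use : outs) {
                if (use % 2 == 0) {
                    outs.add(use + 1); // "clone" a use while iterating
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Tolerant walk: the bound is re-read on every step, so entries appended
    // during the walk are simply visited later.
    static int tolerantWalk(List<Integer> outs) {
        int visited = 0;
        for (int i = 0; i < outs.size(); i++) {
            int use = outs.get(i);
            if (use % 2 == 0) {
                outs.add(use + 1);
            }
            visited++;
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(strictWalkFails(new ArrayList<>(List.of(2, 4))));
        System.out.println(tolerantWalk(new ArrayList<>(List.of(2, 4))));
    }
}
```

The tolerant form pays for a size check on every step, which is presumably why a stricter, cheaper iterator is preferred wherever the loop body is known not to add outputs.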
>> >> Best regards, >> Christian From tobias.hartmann at oracle.com Mon Jul 27 07:35:56 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2020 09:35:56 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com> <518cd022-73e1-cb5c-499d-86853ae679c3@oracle.com> <2f5978fe-af76-df18-15c0-dcc62563299d@oracle.com> Message-ID: <15b4d739-a972-13db-5b61-f08ab24d2ca7@oracle.com> +1 Best regards, Tobias On 24.07.20 23:41, Vladimir Kozlov wrote: > Good. > > Thanks, > Vladimir K > > On 7/24/20 4:44 AM, Christian Hagedorn wrote: >> Hi Tobias >> >> Thank you for your review! >> >>> Please make sure to run performance testing. >> >> There is a repeated regression in the micros open crypto benchmark >> openjdk.bench.javax.crypto.small.SecureRandomBench.nextBytes with these two settings: >> - algorithm=SHA1PRNG-dataSize:64-provider:-shared:false >> - algorithm=SHA1PRNG-dataSize:64-provider:-shared:true >> >> Repeated runs with these two settings resulted in a regression between 1 and 2%. I could trace it >> back to the additional type filtering in PhiNode::Value() (webrev.02). This is only required for >> the assertion code and not for the bailout fix itself. When running performance testing with >> webrev.01, the regressions disappear. >> >> I therefore suggest to go with webrev.01 (without assertion code and type filtering) and file a >> new RFE to investigate the usage of type filtering in PhiNode::Value() for iv phis and why we get >> a performance regression in these two benchmark settings. In theory, I think it should be >> beneficial to narrow the type range of iv phis. 
>> >>> cfgnode.cpp:1083 >>> - There's an extra whitespace before "," >>> >>> loopopts.cpp:84/86 >>> - No need for extra brackets >> >> These are not present anymore in webrev.01. >> >> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ >> >> Best regards, >> Christian >> >> >> On 20.07.20 11:14, Tobias Hartmann wrote: >>> Hi Christian, >>> >>> On 15.07.20 15:08, Christian Hagedorn wrote: >>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/ >>> >>> Looks good to me. >>> >>> Some code style comments: >> >> >>> Best regards, >>> Tobias >>> From tobias.hartmann at oracle.com Mon Jul 27 07:50:15 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Mon, 27 Jul 2020 09:50:15 +0200 Subject: [16] RFR(XS): 8249602: C2: assert(cnt == _outcnt) failed: no insertions allowed In-Reply-To: <2cd118ab-c117-bf61-ae03-117b9383a5e6@oracle.com> References: <2cd118ab-c117-bf61-ae03-117b9383a5e6@oracle.com> Message-ID: <81184963-aa5a-17e8-cc99-58e60045cbbb@oracle.com> Hi Christian, On 24.07.20 14:57, Christian Hagedorn wrote: > http://cr.openjdk.java.net/~chagedorn/8249602/webrev.00/ Looks good to me! Small suggestion: The test can be executed in same VM mode because it does not require any additional flags. No new webrev required. JDK 11 is probably affected as well, right? If so, please add the corresponding affects version. 
Best regards, Tobias From christian.hagedorn at oracle.com Mon Jul 27 07:58:23 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 27 Jul 2020 09:58:23 +0200 Subject: [16] RFR(S): 8248552: C2 crashes with SIGFPE due to division by zero In-Reply-To: <15b4d739-a972-13db-5b61-f08ab24d2ca7@oracle.com> References: <70e8e42b-5cb3-9c1e-419e-2f771f042368@oracle.com> <3ba2ef6a-8ade-7ede-5252-21051c34b472@oracle.com> <9e2f26bd-daa4-9540-8401-9850e0beea94@oracle.com> <5b2e7b1b-24f7-d575-58a3-376ec9ab7944@oracle.com> <518cd022-73e1-cb5c-499d-86853ae679c3@oracle.com> <2f5978fe-af76-df18-15c0-dcc62563299d@oracle.com> <15b4d739-a972-13db-5b61-f08ab24d2ca7@oracle.com> Message-ID: <8d1cd893-b366-272a-59f9-e3180526a07c@oracle.com> Thank you Vladimir and Tobias for reviewing it again! I filed an RFE [1] to investigate the change in PhiNode::Value() further. Best regards, Christian [1] https://bugs.openjdk.java.net/browse/JDK-8250607 On 27.07.20 09:35, Tobias Hartmann wrote: > +1 > > Best regards, > Tobias > > On 24.07.20 23:41, Vladimir Kozlov wrote: >> Good. >> >> Thanks, >> Vladimir K >> >> On 7/24/20 4:44 AM, Christian Hagedorn wrote: >>> Hi Tobias >>> >>> Thank you for your review! >>> >>>> Please make sure to run performance testing. >>> >>> There is a repeated regression in the micros open crypto benchmark >>> openjdk.bench.javax.crypto.small.SecureRandomBench.nextBytes with these two settings: >>> - algorithm=SHA1PRNG-dataSize:64-provider:-shared:false >>> - algorithm=SHA1PRNG-dataSize:64-provider:-shared:true >>> >>> Repeated runs with these two settings resulted in a regression between 1 and 2%. I could trace it >>> back to the additional type filtering in PhiNode::Value() (webrev.02). This is only required for >>> the assertion code and not for the bailout fix itself. When running performance testing with >>> webrev.01, the regressions disappear. 
>>> >>> I therefore suggest to go with webrev.01 (without assertion code and type filtering) and file a >>> new RFE to investigate the usage of type filtering in PhiNode::Value() for iv phis and why we get >>> a performance regression in these two benchmark settings. In theory, I think it should be >>> beneficial to narrow the type range of iv phis. >>> >>>> cfgnode.cpp:1083 >>>> - There's an extra whitespace before "," >>>> >>>> loopopts.cpp:84/86 >>>> - No need for extra brackets >>> >>> These are not present anymore in webrev.01. >>> >>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.01/ >>> >>> Best regards, >>> Christian >>> >>> >>> On 20.07.20 11:14, Tobias Hartmann wrote: >>>> Hi Christian, >>>> >>>> On 15.07.20 15:08, Christian Hagedorn wrote: >>>>> http://cr.openjdk.java.net/~chagedorn/8248552/webrev.02/ >>>> >>>> Looks good to me. >>>> >>>> Some code style comments: >>> >>> >>>> Best regards, >>>> Tobias >>>> From christian.hagedorn at oracle.com Mon Jul 27 08:12:35 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Mon, 27 Jul 2020 10:12:35 +0200 Subject: [16] RFR(XS): 8249602: C2: assert(cnt == _outcnt) failed: no insertions allowed In-Reply-To: <81184963-aa5a-17e8-cc99-58e60045cbbb@oracle.com> References: <2cd118ab-c117-bf61-ae03-117b9383a5e6@oracle.com> <81184963-aa5a-17e8-cc99-58e60045cbbb@oracle.com> Message-ID: Hi Tobias On 27.07.20 09:50, Tobias Hartmann wrote: > Hi Christian, > > On 24.07.20 14:57, Christian Hagedorn wrote: >> http://cr.openjdk.java.net/~chagedorn/8249602/webrev.00/ > > Looks good to me! Thank you for your review! > Small suggestion: The test can be executed in same VM mode because it does not require any > additional flags. No new webrev required. Yes that makes sense. I updated my webrev inline. > JDK 11 is probably affected as well, right? If so, please add the corresponding affects version. 
Even though I could not directly reproduce it in JDK-11, it should also be affected as the changed code was already there for a long time. I added 11 as affected version. Best regards, Christian From nick.gasson at arm.com Mon Jul 27 08:50:56 2020 From: nick.gasson at arm.com (Nick Gasson) Date: Mon, 27 Jul 2020 16:50:56 +0800 Subject: RFR(S): 8237483: AArch64 C1 OopMap inserted twice fatal error Message-ID: <85k0ypjq8f.fsf@nicgas01-pc.shanghai.arm.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8237483 Webrev: http://cr.openjdk.java.net/~ngasson/8237483/webrev.0/ In the method java.util.zip.Inflater::inflate C1 generates these two LIR instructions: 724 move [c_rarg3|I] [Base:[c_rarg1|L] Disp: 2147483647|I] [patch_normal] [bci:95] 728 throw [c_rarg3|I] [c_rarg0|L] [bci:100] The move instruction at 724 generates a runtime call to deoptimise the method since this patching is not implemented on AArch64. An oop map is inserted for the return PC of the runtime call (LIR_Assembler::deoptimize_trap()). The following throw LIR instruction then inserts another oop map at the same PC, triggering an assertion failure. To reproduce: make test TEST="compiler/c1/CanonicalizeArrayLength.java" \ JTREG="VM_OPTIONS=-Xcomp" This patch just adds a NOP in this situation to ensure the PCs are unique. Not sure if there's a better way to do it? Tested hotspot_all_no_apps, jdk_core. 
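The collision described above can be modelled in a few lines of Java. This is a toy model under stated assumptions, not the HotSpot implementation: OopMapPcSketch, emit and recordOopMap are hypothetical names, the PC is counted in whole instructions, and the padding rule (emit a NOP when a map already exists at the current PC) is only one way to read "adds a NOP in this situation".

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: a code buffer that keeps at most one oop map per PC.
final class OopMapPcSketch {
    private final Map<Integer, String> oopMaps = new HashMap<>();
    private int pc = 0;   // current emit position, counted in instructions
    private int nops = 0; // padding instructions emitted so far

    // Emitting any instruction advances the PC by one; the text is only a label.
    void emit(String insn) {
        pc++;
    }

    // Record a map at the current PC, padding with a NOP if one is already there.
    void recordOopMap(String owner) {
        if (oopMaps.containsKey(pc)) {
            emit("nop");
            nops++;
        }
        oopMaps.put(pc, owner);
    }

    int mapCount() { return oopMaps.size(); }
    int nopCount() { return nops; }

    // The sequence from the mail: an unpatched move followed by a throw.
    static OopMapPcSketch moveThenThrow() {
        OopMapPcSketch buf = new OopMapPcSketch();
        buf.emit("call deopt stub");     // the move becomes a runtime call
        buf.recordOopMap("deopt call");  // map at the call's return PC
        buf.recordOopMap("throw");       // the throw starts at that same PC
        buf.emit("throw sequence");
        return buf;
    }

    public static void main(String[] args) {
        OopMapPcSketch buf = moveThenThrow();
        System.out.println(buf.mapCount() + " maps, " + buf.nopCount() + " nop");
    }
}
```

Without the padding step the second recordOopMap call would overwrite, or in a checked build reject, the first entry, which is the situation the assertion guards against.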
-- Thanks, Nick From aph at redhat.com Mon Jul 27 09:40:57 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 27 Jul 2020 10:40:57 +0100 Subject: [aarch64-port-dev ] RFR(S): 8237483: AArch64 C1 OopMap inserted twice fatal error In-Reply-To: <85k0ypjq8f.fsf@nicgas01-pc.shanghai.arm.com> References: <85k0ypjq8f.fsf@nicgas01-pc.shanghai.arm.com> Message-ID: <04c4f9e0-e29a-3250-878c-2b29c11a45d8@redhat.com> On 7/27/20 9:50 AM, Nick Gasson wrote: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8237483 > Webrev: http://cr.openjdk.java.net/~ngasson/8237483/webrev.0/ > > In the method java.util.zip.Inflater::inflate C1 generates these two LIR > instructions: > > 724 move [c_rarg3|I] [Base:[c_rarg1|L] Disp: 2147483647|I] [patch_normal] [bci:95] > 728 throw [c_rarg3|I] [c_rarg0|L] [bci:100] > > The move instruction at 724 generates a runtime call to deoptimise the > method since this patching is not implemented on AArch64. An oop map is > inserted for the return PC of the runtime call > (LIR_Assembler::deoptimize_trap()). The following throw LIR instruction > then inserts another oop map at the same PC, triggering an assertion > failure. > > To reproduce: > > make test TEST="compiler/c1/CanonicalizeArrayLength.java" \ > JTREG="VM_OPTIONS=-Xcomp" > > This patch just adds a NOP in this situation to ensure the PCs are > unique. Not sure if there's a better way to do it? I would have thought it would make more sense, rather than asserting, simply to detect that we already have an oopmap so we don't need another one. Having said that, it's probably not worth worrying about so your fix is OK. It needs a better comment, though. The only way to find out why this code is here would be to trawl the email archives. 
Something like this would do:

// In the method java.util.zip.Inflater::inflate C1 generates these two LIR
// instructions:
// 724 move [c_rarg3|I] [Base:[c_rarg1|L] Disp: 2147483647|I] [patch_normal] [bci:95]
// 728 throw [c_rarg3|I] [c_rarg0|L] [bci:100]
// The move instruction at 724 generates a runtime call to deoptimise the
// method since this patching is not implemented on AArch64. An oop map is
// inserted for the return PC of the runtime call
// (LIR_Assembler::deoptimize_trap()). The following throw LIR instruction
// then inserts another oop map at the same PC, triggering an assertion
// failure.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From patric.hedlin at oracle.com Mon Jul 27 10:02:49 2020
From: patric.hedlin at oracle.com (Patric Hedlin)
Date: Mon, 27 Jul 2020 12:02:49 +0200
Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn
In-Reply-To: <2809ab8c-4a2e-c0c3-9b93-a0f5df41b992@redhat.com>
References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> <2809ab8c-4a2e-c0c3-9b93-a0f5df41b992@redhat.com>
Message-ID: <4d5b4219-3f9a-f606-64cc-4bc40fe2c7bd@oracle.com>

Hi Andrew,

On 2020-07-09 17:48, Andrew Haley wrote:
> On 07/07/2020 12:17, Patric Hedlin wrote:
>> Dear all,
>>
>> I would like to ask for help to review the following change/update:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8247766
>> Webrev: http://cr.openjdk.java.net/~phedlin/tr8247766/
>>
>>
>> C1 code generation for reading and writing stack-slots does not handle
>> large immediate offsets on aarch64. This patch will ensure that
>> immediate offsets are admissible for base+(immediate)offset encoding or,
>> if this is not the case, will enforce an explicit address calculation to
>> a scratch register. (Also correcting a small glitch in 9-bit signed
>> immediate encoding check.)
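For readers unfamiliar with the encoding limits involved, here is a minimal sketch of the admissibility test. It assumes the two AArch64 load/store immediate forms: an unsigned 12-bit offset scaled by the access size, and an unscaled signed 9-bit offset. The class and method names are hypothetical and this is not the code from the webrev.

```java
// Sketch of "is this stack-slot offset encodable as base + immediate?".
final class StackSlotOffsetSketch {

    // Signed 9-bit immediate: -256 .. 255.
    static boolean isSimm9(long v) {
        return v >= -256 && v <= 255;
    }

    // Unsigned 12-bit immediate: 0 .. 4095.
    static boolean isUimm12(long v) {
        return v >= 0 && v <= 4095;
    }

    // sizeBytes is the access size (1, 2, 4 or 8).
    static boolean encodableAsImmediate(long offset, int sizeBytes) {
        boolean aligned = offset >= 0 && offset % sizeBytes == 0;
        if (aligned && isUimm12(offset / sizeBytes)) {
            return true;            // scaled, unsigned 12-bit form
        }
        return isSimm9(offset);     // unscaled, signed 9-bit form
    }

    // If this returns true, the address has to be computed into a
    // scratch register first and the access made through that register.
    static boolean needsScratchRegister(long offset, int sizeBytes) {
        return !encodableAsImmediate(offset, sizeBytes);
    }

    public static void main(String[] args) {
        System.out.println(needsScratchRegister(32760, 8));
        System.out.println(needsScratchRegister(32768, 8));
    }
}
```

For an 8-byte access the largest directly encodable aligned offset under these assumptions is 4095 * 8 = 32760; one slot further already needs the explicit address calculation.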
> This is all very complicated. > > So it seems to me that there is a better way to do this. We already have > MacroAssembler::legitimize_address(), and you should use that. > > Like so: > > diff -r 7c59af4db158 src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp > --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp Thu Jul 09 11:01:29 2020 -0400 > +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp Thu Jul 09 11:36:02 2020 -0400 > @@ -736,25 +736,32 @@ > > void LIR_Assembler::reg2stack(LIR_Opr src, LIR_Opr dest, BasicType type, bool pop_fpu_stack) { > if (src->is_single_cpu()) { > + int index = dest->single_stack_ix(); > if (is_reference_type(type)) { > - __ str(src->as_register(), frame_map()->address_for_slot(dest->single_stack_ix())); > + __ str(src->as_register(), > + __ legitimize_address(frame_map()->address_for_slot(index), BytesPerWord, rscratch1)); > __ verify_oop(src->as_register()); > } else if (type == T_METADATA || type == T_DOUBLE || type == T_ADDRESS) { > - __ str(src->as_register(), frame_map()->address_for_slot(dest->single_stack_ix())); > + __ str(src->as_register(), > + __ legitimize_address(frame_map()->address_for_slot(index), BytesPerWord, rscratch1)); > } else { > - __ strw(src->as_register(), frame_map()->address_for_slot(dest->single_stack_ix())); > + __ strw(src->as_register(), > + __ legitimize_address(frame_map()->address_for_slot(index), BytesPerInt, rscratch1)); > } > > } else if (src->is_double_cpu()) { > Address dest_addr_LO = frame_map()->address_for_slot(dest->double_stack_ix(), lo_word_offset_in_bytes); > + dest_addr_LO = __ legitimize_address(dest_addr_LO, BytesPerLong, rscratch1); > __ str(src->as_register_lo(), dest_addr_LO); > > } else if (src->is_single_fpu()) { > Address dest_addr = frame_map()->address_for_slot(dest->single_stack_ix()); > + dest_addr = __ legitimize_address(dest_addr, BytesPerInt, rscratch1); > __ strs(src->as_float_reg(), dest_addr); > > } else if (src->is_double_fpu()) { > Address dest_addr = 
frame_map()->address_for_slot(dest->double_stack_ix()); > + dest_addr = __ legitimize_address(dest_addr, BytesPerLong, rscratch1); > __ strd(src->as_double_reg(), dest_addr); > > } else { > > stack_offset_in_reach() seems to duplicate the functionality of offset_ok_for_immed(), > and it's only used in this one place. By all means please use the new is_uimm() and > is_simm() in offset_ok_for_immed(). > I've refreshed the webrev (as discussed off-line), moving legitimize_address() into the stack_slot_address() with additional conditions related to the (well-aligned) frame slot address produced. Use of is_simm9() and is_uimm12() is now using implementation in Assembler. /Patric From felix.yang at huawei.com Mon Jul 27 12:27:19 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Mon, 27 Jul 2020 12:27:19 +0000 Subject: RFR(S): 8250609: C2 crash in IfNode::fold_compares Message-ID: Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8250609 Webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.00/ In IfNode::fold_compares_helper, C2 tries to fold 2 CmpI into a single CmpU. At the crash site in IfNode::fold_compares_helper: 995 if (lo && hi) { 996 // Merge the two compares into a single unsigned compare by building (CmpU (n - lo) (hi - lo)) 997 Node* adjusted_val = igvn->transform(new SubINode(n, lo)); 998 if (adjusted_lim == NULL) { 999 adjusted_lim = igvn->transform(new SubINode(hi, lo)); 1000 } At line 997, we have: (gdb) p lo->dump() 641 AddI === _ 513 92 [[]] $1 = void After the transformation at line 997, we have (gdb) p lo->dump() 641 AddI === _ _ _ [[]] [34200641] $3 = void Then node 641 was used at line 999, which triggers the crash. Patch fixes the issue by delaying transformation in IfNode::fold_compares temporarily. Tier1-3 tested on aarch64-linux-gnu & x86_64-linux-gnu. Newly added test fail without the patch and pass otherwise. Suggestions? 
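As background on the folding itself: the (CmpU (n - lo) (hi - lo)) form relies on the fact that, for lo <= hi, the signed range test lo <= n && n < hi gives the same answer as one unsigned comparison of the wrapped differences. A small Java sketch of that identity follows. The half-open range and the names are assumptions made for the illustration; the exact conditions C2 folds may differ.

```java
// Two signed compares versus one unsigned compare of the differences.
final class RangeCheckFoldSketch {

    static boolean twoSignedCompares(int n, int lo, int hi) {
        return lo <= n && n < hi;
    }

    // Valid when lo <= hi; the subtractions are allowed to wrap around.
    static boolean oneUnsignedCompare(int n, int lo, int hi) {
        return Integer.compareUnsigned(n - lo, hi - lo) < 0;
    }

    // Cross-check both forms on a few values, including the extremes.
    static boolean agreeOnSamples() {
        int[] samples = {Integer.MIN_VALUE, -7, -1, 0, 1, 5, 9, 10, Integer.MAX_VALUE};
        for (int lo : samples) {
            for (int hi : samples) {
                if (lo > hi) {
                    continue; // the identity is only claimed for lo <= hi
                }
                for (int n : samples) {
                    if (twoSignedCompares(n, lo, hi) != oneUnsignedCompare(n, lo, hi)) {
                        return false;
                    }
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(agreeOnSamples());
    }
}
```

The trick works because a value below lo wraps to a very large unsigned difference, so both failure cases (n too small, n too large) end up on the same side of the single comparison.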
Thanks, Felix From aph at redhat.com Mon Jul 27 13:24:23 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 27 Jul 2020 14:24:23 +0100 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: <4d5b4219-3f9a-f606-64cc-4bc40fe2c7bd@oracle.com> References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> <2809ab8c-4a2e-c0c3-9b93-a0f5df41b992@redhat.com> <4d5b4219-3f9a-f606-64cc-4bc40fe2c7bd@oracle.com> Message-ID: <6663f2a2-ccd8-5692-d90b-6ea664294ea8@redhat.com> Hi, On 27/07/2020 11:02, Patric Hedlin wrote: > > I've refreshed the webrev (as discussed off-line), moving > legitimize_address() into the stack_slot_address() with additional > conditions related to the (well-aligned) frame slot address produced. > Use of is_simm9() and is_uimm12() is now using implementation in Assembler. That's much nicer. Some minor nits... Please pass the scratch register to be used as an argument to stack_slot_address: +// Ensure a valid Address (base + offset) to a stack-slot. If stack access is +// not encodable as a base + (immediate) offset, generate an explicit address +// calculation to hold the address in a temporary register (rscratch1). +Address LIR_Assembler::stack_slot_address(int index, uint size, int adjust) { These consts are too obscure. Please be explicit: either use one of the predefined constants that mean the same thing (such as BytesPerInt) or if you really want low-level types, sizeof int32_t: + uint const c_sz32 = 4; + uint const c_sz64 = 8; Otherwise OK. It doesn't need another review with these changes. Thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From eric.c.liu at arm.com Mon Jul 27 14:35:17 2020
From: eric.c.liu at arm.com (Eric Liu)
Date: Mon, 27 Jul 2020 14:35:17 +0000
Subject: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values
In-Reply-To:
References: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com> <05369383-c6d8-5e61-50ce-51fec955e2d4@bell-sw.com>,
Message-ID:

Hi,

We are planning to elide the redundant 'sxt' on AArch64 in the
macro-assembler for better performance and smaller code size.

I think the redundant sign extension can be generated in the following
cases:

A) Load a data less than 32 bits and then using it as 64 bits data. E.g.

  ldrsb w1, mem
  sxt x1, w1

B) And with a possible number. E.g.

  and w11, w1, #0xffff
  sxtw x0, w11

C) Sign extend a number twice. E.g.

  sxth w11, w1
  sxtw x0, w11

To address issue A), current C2's ad file has about 8 match rules to
match those kinds of patterns. E.g.

  // Load Byte (8 bit signed) into long
  instruct loadB2L(iRegLNoSp dst, memory1 mem)
  %{
    match(Set dst (ConvI2L (LoadB mem)));
    predicate(!needs_acquiring_load(n->in(1)));
    ins_cost(4 * INSN_COST);
    format %{ "ldrsb $dst, $mem\t# byte" %}
    ins_encode(aarch64_enc_ldrsb(dst, mem));
    ins_pipe(iload_reg_mem);
  %}

For issue B), Boris' patch did a good job of eliding the redundant 'sxt'
that follows an 'and'. But this pair can also be generated by another
pattern, e.g. (ConvI2L (RShiftI (LShiftI src lshift_count) rshift_count)).
This pattern can be reproduced by the case below:

  public static long l2c2l (long x) { return (char) x; }

For issue C), type conversions usually generate this kind of code, e.g.

  private static long test_l2s2l(long x) { return (short) x; }

In my view, eliding the redundant 'sxt' above is more of a machine-code
problem than an IR problem. I think peephole is the best fit for above optimization.
However, C2's peephole is very complicated and I'm not sure whether it has been enabled in AArch64. So I was thinking if it better to remove this kinds of instructions in macro-assembler, even this sounds somehow beyond the assembler's responsibility. By handling those pair in macro-assembler, we can only focus on instruction sequence rather than the type and shape of IR node. What do you think? Welcome any feedback! -- Best regards, Eric From: hotspot-compiler-dev on behalf of Andrew Haley Sent: 24 July 2020 16:18 To: Boris Ulasevich ; aarch64-port-dev at openjdk.java.net Cc: hotspot-compiler-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values On 23/07/2020 12:25, Boris Ulasevich wrote: > Since the JDK-8248414 patch has been committed, I believe we can revive > this review. I think it is still better to move my rule to the ubfiz > command group, > which is in the auto-generated area. > > http://cr.openjdk.java.net/~bulasevich/8248870/webrev.02 OK, thanks. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you. 
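For reference, the two narrowing-then-widening conversions in Eric's examples can be written purely as 64-bit shifts, which is one way to see that a single extension is enough in each case. The sketch below only checks the Java-level identities; it says nothing about what C2 currently emits, and the class name and the two shift-based helpers are made up for the illustration.

```java
// (short) and (char) narrowing of a long, compared with pure 64-bit shifts.
final class NarrowThenWidenSketch {

    // From the mail: long -> short -> long (sign-extends bit 15).
    static long l2s2l(long x) {
        return (short) x;
    }

    // From the mail: long -> char -> long (zero-extends the low 16 bits).
    static long l2c2l(long x) {
        return (char) x;
    }

    // Same result as l2s2l: shift the low 16 bits to the top, then shift back arithmetically.
    static long viaSignedShift(long x) {
        return (x << 48) >> 48;
    }

    // Same result as l2c2l: shift the low 16 bits to the top, then shift back logically.
    static long viaUnsignedShift(long x) {
        return (x << 48) >>> 48;
    }

    public static void main(String[] args) {
        long x = 0x1234_5678_9ABC_ABCDL;
        System.out.println(l2s2l(x) == viaSignedShift(x));
        System.out.println(l2c2l(x) == viaUnsignedShift(x));
    }
}
```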
From aph at redhat.com Mon Jul 27 15:45:24 2020 From: aph at redhat.com (Andrew Haley) Date: Mon, 27 Jul 2020 16:45:24 +0100 Subject: [aarch64-port-dev ] RFR 8248870: AARCH64: I2L conversions can be skipped for small positive masked values In-Reply-To: References: <9ccf64f1-7a88-0f67-8b50-4dea09af9c8b@redhat.com> <05369383-c6d8-5e61-50ce-51fec955e2d4@bell-sw.com> Message-ID: <7c7cf3e3-1d7a-8d83-25c3-47a605055deb@redhat.com> > In my view, eliding the redundant 'sxt' above are more likely a > machine code problem rather than a IR problem. I think peephole is > the best fit for above optimization. However, C2's peephole is very > complicated and I'm not sure whether it has been enabled in AArch64. > So I was thinking if it better to remove this kinds of instructions > in macro-assembler, even this sounds somehow beyond the assembler's > responsibility. By handling those pair in macro-assembler, we can > only focus on instruction sequence rather than the type and shape of > IR node. Is that even possible? Sure, it works if an instruction's output is the same as its sole register input, but that's all. If these things can be canonicalized earlier in compilation they should be. This one: sxth w11, w1 sxtw x0, w11 Corresponds with (((long)n) << 48) >> 48 and and w11, w1, #0xffff sxtw x0, w11 with (((long)n) << 48) >>> 48 and that's how they could be canonicalized early in compilation. Whether that's a good idea depends on other processors, though. It'll probably not hurt them but it'll help us. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. 
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From volker.simonis at gmail.com Mon Jul 27 15:51:55 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Mon, 27 Jul 2020 17:51:55 +0200 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: <1595807197546.52082@amazon.com> References: <1595807197546.52082@amazon.com> Message-ID: Hi Xin, I'm not sure if saving the allocation of an DirectiveSet has any visible effect compared to the much larger allocations required for the method compilation itself. Apart from that, I must confess that I'm not totally understanding the original logic. From what I see, it sets "changed" to true in the case where it changes the cloned DirectiveSet. But it doesn't do that in the cases where it only changes the clone's control word: 341 set->_intrinsic_control_words.fill_in(TriBool()); ... 348 set->_intrinsic_control_words[id] = iter.is_enabled(); ... 361 set->_intrinsic_control_words.fill_in(TriBool()); ... 368 set->_intrinsic_control_words[id] = false; Why don't these mutations count as "changing" the cloned DirectiveSet? After your patch, you've changed the above lines such that they will always create a clone which seems different from the initial behaviour. Which of the two behaviours is correct here, the original one, the new one after your change or doesn't it matter for reasons I don't understand? I also wonder why you need to overload both operators "operator*()" and "operator->()"? It seems a little bit arbitrary (and hard to understand for people reading the code) that "operator*()" clones the underlying directiveSet while "operator->()" uses the original one. Why not just define two versions of "operator->()" and let the compiler choose the right one like so: DirectiveSet const* operator->() const { return !_clone ? 
_origin : _clone; } DirectiveSet* operator->() { if (!_clone) { _clone = DirectiveSet::clone(_origin); } return _clone; } ... if (!_modified[LogIndex]) { bool log = CompilerOracle::should_log(method); if (log != const_cast(set)->LogOption) { set->LogOption = log; } } Thank you and best regards, Volker On Mon, Jul 27, 2020 at 1:47 AM Liu, Xin wrote: > > hi, Reviewers, > > Could you review this simple patch? > bug: https://bugs.openjdk.java.net/browse/JDK-8249809 > webrev: https://cr.openjdk.java.net/~xliu/8249809/00/webrev/ > > When the users specify a method-level compiler directive, the DirectiveSet is cloned for every single compiling method. It's expensive but rarely hit. Actually, Only user-specified methods must clone the DirectiveSet. I introduce a smart pointer DirectiveSetPtr. operator->() returns a pointer to a constant DirectiveSet, which is read-only. It doesn't clone the _origin until c2 need to update its members. transfer() yield the ownership of the pointer. > > Test: > manually tests with different CompileComand options. > hotspot:tier1 and gtest:all. > > thanks, > --lx > From patric.hedlin at oracle.com Mon Jul 27 16:11:59 2020 From: patric.hedlin at oracle.com (Patric Hedlin) Date: Mon, 27 Jul 2020 18:11:59 +0200 Subject: [aarch64-port-dev ] RFR(S/M): 8247766: [aarch64] guarantee(val < (1U << nbits)) failed: Field too big for insn In-Reply-To: <6663f2a2-ccd8-5692-d90b-6ea664294ea8@redhat.com> References: <0cdbdf26-ad4d-056b-a801-cc31b2cc4ab3@oracle.com> <2809ab8c-4a2e-c0c3-9b93-a0f5df41b992@redhat.com> <4d5b4219-3f9a-f606-64cc-4bc40fe2c7bd@oracle.com> <6663f2a2-ccd8-5692-d90b-6ea664294ea8@redhat.com> Message-ID: <19f664ec-429e-0927-84e2-90749bffca5a@oracle.com> Thanks for reviewing Andrew. 
/Patric On 2020-07-27 15:24, Andrew Haley wrote: > Hi, > > On 27/07/2020 11:02, Patric Hedlin wrote: >> I've refreshed the webrev (as discussed off-line), moving >> legitimize_address() into the stack_slot_address() with additional >> conditions related to the (well-aligned) frame slot address produced. >> Use of is_simm9() and is_uimm12() is now using implementation in Assembler. > That's much nicer. Some minor nits... > > Please pass the scratch register to be used as an argument to > stack_slot_address: > > +// Ensure a valid Address (base + offset) to a stack-slot. If stack access is > +// not encodable as a base + (immediate) offset, generate an explicit address > +// calculation to hold the address in a temporary register (rscratch1). > +Address LIR_Assembler::stack_slot_address(int index, uint size, int adjust) { > > These consts are too obscure. Please be explicit: either use one of > the predefined constants that mean the same thing (such as > BytesPerInt) or if you really want low-level types, sizeof int32_t: > > + uint const c_sz32 = 4; > + uint const c_sz64 = 8; > > Otherwise OK. It doesn't need another review with these changes. > > Thanks. > From lutz.schmidt at sap.com Mon Jul 27 16:45:33 2020 From: lutz.schmidt at sap.com (Schmidt, Lutz) Date: Mon, 27 Jul 2020 16:45:33 +0000 Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp In-Reply-To: References: Message-ID: Hi Community, any volunteers for a review? Sorry for being impatient. I'll be on vacation starting Wednesday, EOB, and would like to get this thing out of the way before. Thanks, Lutz ?On 24.07.20, 14:53, "Schmidt, Lutz" wrote: Resending after updating subject line with bug id. Sorry for the spam. Lutz On 24.07.20, 14:51, "Schmidt, Lutz" wrote: Dear all, may I please request reviews for this small fix? I would even say it is a trivial fix. 
It inverts an if condition such that JVMCI specific code is called only when JVMCI compilation is enabled via UseJVMCICompiler. Bug: https://bugs.openjdk.java.net/browse/JDK-8250233 Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/ Local testing looks good. jdk/submit tests pending. Thank you! Lutz From vladimir.kozlov at oracle.com Mon Jul 27 19:15:03 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 27 Jul 2020 12:15:03 -0700 Subject: RFR(S): 8250609: C2 crash in IfNode::fold_compares In-Reply-To: References: Message-ID: <23f0ab18-8bfb-3874-3000-ee2b37caca7c@oracle.com> It happens because 'lo' is new node created just now and have no uses yet. For such new nodes we usually add dummy use to avoid removal from graph: http://hg.openjdk.java.net/jdk/jdk/file/c379dc750a02/src/hotspot/share/opto/convertnode.cpp#l403 Thanks, Vladimir K On 7/27/20 5:27 AM, Yangfei (Felix) wrote: > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250609 > Webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.00/ > > In IfNode::fold_compares_helper, C2 tries to fold 2 CmpI into a single CmpU. > At the crash site in IfNode::fold_compares_helper: > 995 if (lo && hi) { > 996 // Merge the two compares into a single unsigned compare by building (CmpU (n - lo) (hi - lo)) > 997 Node* adjusted_val = igvn->transform(new SubINode(n, lo)); > 998 if (adjusted_lim == NULL) { > 999 adjusted_lim = igvn->transform(new SubINode(hi, lo)); > 1000 } > > At line 997, we have: > (gdb) p lo->dump() > 641 AddI === _ 513 92 [[]] > $1 = void > > After the transformation at line 997, we have > (gdb) p lo->dump() > 641 AddI === _ _ _ [[]] [34200641] > $3 = void > > Then node 641 was used at line 999, which triggers the crash. > Patch fixes the issue by delaying transformation in IfNode::fold_compares temporarily. > Tier1-3 tested on aarch64-linux-gnu & x86_64-linux-gnu. > Newly added test fail without the patch and pass otherwise. > Suggestions? 
> > Thanks, > Felix > From vladimir.kozlov at oracle.com Mon Jul 27 19:35:48 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Mon, 27 Jul 2020 12:35:48 -0700 Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp In-Reply-To: References: Message-ID: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com> Nope, the check is correct. In hosted mode Graal is used as Java application and not as JIT compiler so that UseJVMCICompiler flag is false. The problem is really caused by recent 8248321 changes - it is regression. Doug, please advice how to fix it. I think the new JVMCI events code should be adjusted for hosted mode. Thanks, Vladimir On 7/24/20 5:53 AM, Schmidt, Lutz wrote: > Resending after updating subject line with bug id. > Sorry for the spam. > Lutz > > ?On 24.07.20, 14:51, "Schmidt, Lutz" wrote: > > Dear all, > > may I please request reviews for this small fix? I would even say it is a trivial fix. It inverts an if condition such that JVMCI specific code is called only when JVMCI compilation is enabled via UseJVMCICompiler. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250233 > Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/ > > Local testing looks good. jdk/submit tests pending. > > Thank you! > Lutz > > > > From evgeny.nikitin at oracle.com Mon Jul 27 19:38:14 2020 From: evgeny.nikitin at oracle.com (Evgeny Nikitin) Date: Mon, 27 Jul 2020 21:38:14 +0200 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java Message-ID: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> Hi, Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ Adjusting the test to current state of the VM. - Definition of 'trivial code' does not depend on whether the method has been profiled or not; - Trivial code does only go level 0 to level 1; - Some refactoring. The change has been checked in mach5 for the 5 platforms (passed). 
Please review,
/Evgeny Nikitin.

From tom.rodriguez at oracle.com Mon Jul 27 19:51:27 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 27 Jul 2020 12:51:27 -0700
Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp
In-Reply-To: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com>
References: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com>
Message-ID: <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com>

Doug is away, but it seems like we could just remove the JVMCI_event_1 call in print_compilation_timers. It's not a particularly worthwhile notification and all the other events should be able to safely assume that JVMCI has actually been initialized which I think was probably the point of the guarantee. If that's not sufficient then we need to convert that guarantee into a check for NULL and return.

tom

Vladimir Kozlov wrote on 7/27/20 12:35 PM:
> Nope, the check is correct. In hosted mode Graal is used as Java
> application and not as JIT compiler so that UseJVMCICompiler flag is false.
>
> The problem is really caused by recent 8248321 changes - it is
> regression. Doug, please advise how to fix it.
> I think the new JVMCI events code should be adjusted for hosted mode.
>
> Thanks,
> Vladimir
>
> On 7/24/20 5:53 AM, Schmidt, Lutz wrote:
>> Resending after updating subject line with bug id.
>> Sorry for the spam.
>> Lutz
>>
>> On 24.07.20, 14:51, "Schmidt, Lutz" wrote:
>>
>>     Dear all,
>>
>>     may I please request reviews for this small fix? I would even say
>> it is a trivial fix. It inverts an if condition such that JVMCI
>> specific code is called only when JVMCI compilation is enabled via
>> UseJVMCICompiler.
>>
>>     Bug: https://bugs.openjdk.java.net/browse/JDK-8250233
>>     Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/
>>
>>     Local testing looks good. jdk/submit tests pending.
>>
>>     Thank you!
>>     Lutz
>>
>>
>>

From vladimir.kozlov at oracle.com Mon Jul 27 20:47:24 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Mon, 27 Jul 2020 13:47:24 -0700
Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp
In-Reply-To: <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com>
References: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com> <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com>
Message-ID: <994bb7a0-20dd-ea53-021c-9f8d49b49917@oracle.com>

We are simply missing EnableJVMCI flag check! That is why JVMCI is not initialized. EnableJVMCI should be true in hosted mode.

I looked and I see few problematic places in compileBroker.cpp statistic code guarded by #if INCLUDE_JVMCI but which does not check EnableJVMCI or comp->is_jvmci(). I think it should be fixed.

Also I think JVMCI_event_1 is useless now in print_compilation_timers() because its output does not go into tty.

Thanks,
Vladimir

On 7/27/20 12:51 PM, Tom Rodriguez wrote:
> Doug is away, but it seems like we could just remove the JVMCI_event_1 call in print_compilation_timers. It's not a
> particularly worthwhile notification and all the other events should be able to safely assume that JVMCI has actually
> been initialized which I think was probably the point of the guarantee. If that's not sufficient then we need to
> convert that guarantee into a check for NULL and return.
>
> tom
>
> Vladimir Kozlov wrote on 7/27/20 12:35 PM:
>> Nope, the check is correct. In hosted mode Graal is used as Java application and not as JIT compiler so that
>> UseJVMCICompiler flag is false.
>>
>> The problem is really caused by recent 8248321 changes - it is regression. Doug, please advise how to fix it.
>> I think the new JVMCI events code should be adjusted for hosted mode.
>>
>> Thanks,
>> Vladimir
>>
>> On 7/24/20 5:53 AM, Schmidt, Lutz wrote:
>>> Resending after updating subject line with bug id.
>>> Sorry for the spam.
>>> Lutz
>>>
>>> On 24.07.20, 14:51, "Schmidt, Lutz" wrote:
>>>
>>>     Dear all,
>>>
>>>     may I please request reviews for this small fix? I would even say it is a trivial fix. It inverts an if
>>> condition such that JVMCI specific code is called only when JVMCI compilation is enabled via UseJVMCICompiler.
>>>
>>>     Bug: https://bugs.openjdk.java.net/browse/JDK-8250233
>>>     Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/
>>>
>>>     Local testing looks good. jdk/submit tests pending.
>>>
>>>     Thank you!
>>>     Lutz
>>>
>>>
>>>

From tom.rodriguez at oracle.com Mon Jul 27 20:50:20 2020
From: tom.rodriguez at oracle.com (Tom Rodriguez)
Date: Mon, 27 Jul 2020 13:50:20 -0700
Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp
In-Reply-To: <994bb7a0-20dd-ea53-021c-9f8d49b49917@oracle.com>
References: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com> <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com> <994bb7a0-20dd-ea53-021c-9f8d49b49917@oracle.com>
Message-ID:

Vladimir Kozlov wrote on 7/27/20 1:47 PM:
> We are simply missing EnableJVMCI flag check! That is why JVMCI is not
> initialized. EnableJVMCI should be true in hosted mode.

Yes that makes sense.

tom

>
> I looked and I see few problematic places in compileBroker.cpp
> statistic code guarded by #if INCLUDE_JVMCI but which does not check
> EnableJVMCI or comp->is_jvmci(). I think it should be fixed.
>
> Also I think JVMCI_event_1 is useless now in print_compilation_timers()
> because its output does not go into tty.
>
> Thanks,
> Vladimir
>
> On 7/27/20 12:51 PM, Tom Rodriguez wrote:
>> Doug is away, but it seems like we could just remove the JVMCI_event_1
>> call in print_compilation_timers. It's not a particularly worthwhile
>> notification and all the other events should be able to safely assume
>> that JVMCI has actually been initialized which I think was probably
>> the point of the guarantee. If that's not sufficient then we need to
>> convert that guarantee into a check for NULL and return.
>>
>> tom
>>
>> Vladimir Kozlov wrote on 7/27/20 12:35 PM:
>>> Nope, the check is correct. In hosted mode Graal is used as Java
>>> application and not as JIT compiler so that UseJVMCICompiler flag is
>>> false.
>>>
>>> The problem is really caused by recent 8248321 changes - it is
>>> regression. Doug, please advise how to fix it.
>>> I think the new JVMCI events code should be adjusted for hosted mode.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/24/20 5:53 AM, Schmidt, Lutz wrote:
>>>> Resending after updating subject line with bug id.
>>>> Sorry for the spam.
>>>> Lutz
>>>>
>>>> On 24.07.20, 14:51, "Schmidt, Lutz" wrote:
>>>>
>>>>     Dear all,
>>>>
>>>>     may I please request reviews for this small fix? I would even
>>>> say it is a trivial fix. It inverts an if condition such that JVMCI
>>>> specific code is called only when JVMCI compilation is enabled via
>>>> UseJVMCICompiler.
>>>>
>>>>     Bug: https://bugs.openjdk.java.net/browse/JDK-8250233
>>>>     Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/
>>>>
>>>>     Local testing looks good. jdk/submit tests pending.
>>>>
>>>>     Thank you!
>>>>     Lutz
>>>>
>>>>
>>>>

From cjashfor at linux.ibm.com Tue Jul 28 01:49:06 2020
From: cjashfor at linux.ibm.com (Corey Ashford)
Date: Mon, 27 Jul 2020 18:49:06 -0700
Subject: RFR(S): 8248188: [PATCH] Add HotSpotIntrinsicCandidate and API for Base64 decoding
In-Reply-To:
References: <11ca749f-3015-c004-aa6b-3194e1dfe4eb@linux.ibm.com>
Message-ID:

Michihiro Horie uploaded a new revision of the Base64 decodeBlock intrinsic API for me:

http://cr.openjdk.java.net/~mhorie/8248188/webrev.01/

It has the following changes with respect to the original one posted:

* In the event of encountering a non-base64 character, instead of having a separate error code of -1, the intrinsic can now just return either 0, or the number of data bytes produced up to the point where the illegal base64 character was encountered. This reduces the number of special cases, and also provides a way to speed up the process of finding the bad character by the slower, pure-Java algorithm.

* The isMIME boolean is removed from the API for the following reasons:
  - The current API is not sufficient to handle the isMIME case, because there isn't a strict relationship between the number of input bytes and the number of output bytes, because there can be an arbitrary number of non-base64 characters in the source.
  - If an intrinsic only implements the (isMIME == false) case as ours does, it will always return 0 bytes processed, which will slightly slow down the normal path of processing an (isMIME == true) instantiation.
  - We considered adding a separate hotspot candidate for the (isMIME == true) case, but since we don't have an intrinsic implementation to test that, we decided to leave it as a future optimization.

Comments and suggestions are welcome. Thanks for your consideration.

- Corey

On 6/23/20 6:23 PM, Michihiro Horie wrote:
> Hi Corey,
>
> Following is the issue I created.
> https://bugs.openjdk.java.net/browse/JDK-8248188
>
> I will upload a webrev when you're ready as we talked in private.
>
> Best regards,
> Michihiro
>
> From: "Corey Ashford"
> To: "hotspot-compiler-dev at openjdk.java.net",
>     "ppc-aix-port-dev at openjdk.java.net"
> Cc: Michihiro Horie/Japan/IBM at IBMJP, Kazunori Ogata/Japan/IBM at IBMJP,
>     joserz at br.ibm.com
> Date: 2020/06/24 09:40
> Subject: RFR(S): [PATCH] Add HotSpotIntrinsicCandidate and API for
>     Base64 decoding
>
> ------------------------------------------------------------------------
>
>
>
> Currently in java.util.Base64, there is a HotSpotIntrinsicCandidate and
> API for encodeBlock, but none for decoding. This means that only
> encoding gets acceleration from the underlying CPU's vector hardware.
>
> I'd like to propose adding a new intrinsic for decodeBlock. The
> considerations I have for this new intrinsic's API:
>
>  * Don't make any assumptions about the underlying capability of the
> hardware. For example, do not impose any specific block size granularity.
>
>  * Don't assume the underlying intrinsic can handle isMIME or isURL
> modes, but also let them decide if they will process the data regardless
> of the settings of the two booleans.
>
>  * Any remaining data that is not processed by the intrinsic will be
> processed by the pure Java implementation. This allows the intrinsic to
> process whatever block sizes it's good at without the complexity of
> handling the end fragments.
>
>  * If any illegal character is discovered in the decoding process, the
> intrinsic will simply return -1, instead of requiring it to throw a
> proper exception from the context of the intrinsic. In the event of
> getting a -1 returned from the intrinsic, the Java Base64 library code
> simply calls the pure Java implementation to have it find the error and
> properly throw an exception. This is a performance trade-off in the
> case of an error (which I expect to be very rare).
>
>  * One thought I have for a further optimization (not implemented in
> the current patch), is that when the intrinsic decides not to process a
> block because of some combination of isURL and isMIME settings it
> doesn't handle, it could return extra bits in the return code, encoded
> as a negative number. For example:
>
> Illegal_Base64_char   = 0b001;
> isMIME_unsupported    = 0b010;
> isURL_unsupported     = 0b100;
>
> These can be OR'd together as needed and then negated (flip the sign).
> The Base64 library code could then cache these flags, so it will know
> not to call the intrinsic again when another decodeBlock is requested
> but with an unsupported mode. This will save the performance hit of
> calling the intrinsic when it is guaranteed to fail.
>
> I've tested the attached patch with an actual intrinsic coded up for
> Power9/Power10, but those runtime intrinsics and arch-specific patches
> aren't attached today. I want to get some consensus on the
> library-level intrinsic API first.
>
> Also attached is a simple test case to test that the new intrinsic API
> doesn't break anything.
>
> I'm open to any comments about this.
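
The revised contract described at the top of this message (the accelerated routine returns the number of data bytes it produced, and the pure-Java code handles whatever remains, including locating any illegal character) can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in the webrev: the class name, fastDecodeBlock, and its signature are hypothetical stand-ins for the intrinsic candidate, and the caller assumes the non-MIME case where every 3 bytes produced correspond to exactly 4 source characters.

```java
import java.util.Arrays;
import java.util.Base64;

/** Sketch of a "fast path returns bytes produced, slow path finishes" decoder. Names are hypothetical. */
final class DecodeBlockSketch {
    private static final int[] FROM_BASE64 = new int[256];
    static {
        Arrays.fill(FROM_BASE64, -1);
        String alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        for (int i = 0; i < alphabet.length(); i++) {
            FROM_BASE64[alphabet.charAt(i)] = i;
        }
    }

    private DecodeBlockSketch() {}

    /**
     * Stand-in for an accelerated routine: decodes whole 4-character groups and
     * stops at the first group containing anything outside the alphabet
     * (padding included). Returns the number of bytes written to dst.
     */
    static int fastDecodeBlock(byte[] src, int sp, int sl, byte[] dst, int dp) {
        int start = dp;
        while (sl - sp >= 4) {
            int b0 = FROM_BASE64[src[sp] & 0xff];
            int b1 = FROM_BASE64[src[sp + 1] & 0xff];
            int b2 = FROM_BASE64[src[sp + 2] & 0xff];
            int b3 = FROM_BASE64[src[sp + 3] & 0xff];
            if ((b0 | b1 | b2 | b3) < 0) {
                break; // leave this group, and everything after it, to the slow path
            }
            int bits = b0 << 18 | b1 << 12 | b2 << 6 | b3;
            dst[dp++] = (byte) (bits >> 16);
            dst[dp++] = (byte) (bits >> 8);
            dst[dp++] = (byte) bits;
            sp += 4;
        }
        return dp - start;
    }

    /** Caller side: run the fast path, then let the library decoder finish or report the error. */
    static byte[] decode(byte[] src) {
        byte[] dst = new byte[src.length / 4 * 3];
        int produced = fastDecodeBlock(src, 0, src.length, dst, 0);
        int consumed = produced / 3 * 4; // assumption: non-MIME, 4 chars per 3 bytes
        // The slow path starts at the first unprocessed group, so an illegal
        // character is found without rescanning the already decoded prefix.
        byte[] tail = Base64.getDecoder().decode(Arrays.copyOfRange(src, consumed, src.length));
        byte[] out = Arrays.copyOf(dst, produced + tail.length);
        System.arraycopy(tail, 0, out, produced, tail.length);
        return out;
    }
}
```

Because the fast path never reports an error itself, there is no separate -1 case to handle: a return value smaller than expected simply means the slow path has more work to do, and the slow path is the one that throws IllegalArgumentException if the remaining input is malformed.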
>
> Thanks for your consideration,
>
> - Corey
>
>
> Corey Ashford
> IBM Systems, Linux Technology Center, OpenJDK team
> cjashfor at us dot ibm dot com
> [attachment "decodeBlock_api-20200623.patch" deleted by Michihiro
> Horie/Japan/IBM] [attachment "TestBase64.java" deleted by Michihiro
> Horie/Japan/IBM]
>
>

From xxinliu at amazon.com Tue Jul 28 03:39:08 2020
From: xxinliu at amazon.com (Liu, Xin)
Date: Tue, 28 Jul 2020 03:39:08 +0000
Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init
In-Reply-To:
References: <1595807197546.52082@amazon.com>,
Message-ID: <1595907547514.55531@amazon.com>

hi, Volker,

Thank you for reviewing my patch.

1. yes. I guess nodes are the major memory consumption, but compiler directives are for both c1 and c2. it still can reduce memory footprint a little.

2. Previous code set the flag here. both ControlIntrinsic and DisableIntrinsic belong to compilerdirectives_common_flags.

  #define init_default_cc(name, type, dvalue, cc_flag) { type v; if (!_modified[name##Index] && CompilerOracle::has_option_value(method, #cc_flag, v) && v != this->name##Option) { set->name##Option = v; changed = true;} }
  compilerdirectives_common_flags(init_default_cc)

When method-level directives override the global directives, this code snippet sets changed. Even though I remove the flag 'changed', I have the same logic in the smart pointer.

3. the smart pointer DirectiveSetPtr needs to provide 2 accesses of the underlying pointer. one is read-only and the other one is mutable. Ideally, it should have the following 2 operator->().

  DirectiveSet* operator->();
  DirectiveSet const* operator->().

AFAIK, C++ doesn't support overloading by return type alone. That is to say, we need to find a way to work around. the reason I provide overload operator*() is that it returns a reference to the object. Users who want to modify the pointee have to explicitly dereference the smart pointer.

  (*set).member = newvalue;
  set->member = newvalue; // compiler error.

your approach also works, but you need to invoke const_cast<> for all places where you want to read. I think my approach has shorter code.

I just came up with a new idea. How about I provide a method cloned(), which returns the unqualified pointer?

+  DirectiveSet* cloned() {
+    if (!_clone) {
+      _clone = DirectiveSet::clone(_origin);
+    }
+    return _clone;
+  }
+
   DirectiveSet* transfer() {
     assert(_origin != NULL, "_origin is NULL! transfer() can only be invoked once.");
     if (_clone != NULL) {
@@ -340,7 +347,7 @@
   if (CompilerOracle::should_print(method)) {
     if (!_modified[PrintAssemblyIndex]) {
-      (*set).PrintAssemblyOption = true;
+      set.cloned()->PrintAssemblyOption = true;
     }
   }

thanks,
--lx

________________________________________
From: Volker Simonis
Sent: Monday, July 27, 2020 8:51 AM
To: Liu, Xin
Cc: hotspot-compiler-dev at openjdk.java.net
Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init

Hi Xin,

I'm not sure if saving the allocation of a DirectiveSet has any visible effect compared to the much larger allocations required for the method compilation itself.

Apart from that, I must confess that I'm not totally understanding the original logic. From what I see, it sets "changed" to true in the case where it changes the cloned DirectiveSet. But it doesn't do that in the cases where it only changes the clone's control word:

341 set->_intrinsic_control_words.fill_in(TriBool());
...
348 set->_intrinsic_control_words[id] = iter.is_enabled();
...
361 set->_intrinsic_control_words.fill_in(TriBool());
...
368 set->_intrinsic_control_words[id] = false;

Why don't these mutations count as "changing" the cloned DirectiveSet? After your patch, you've changed the above lines such that they will always create a clone which seems different from the initial behaviour. Which of the two behaviours is correct here, the original one, the new one after your change or doesn't it matter for reasons I don't understand?

I also wonder why you need to overload both operators "operator*()" and "operator->()"? It seems a little bit arbitrary (and hard to understand for people reading the code) that "operator*()" clones the underlying directiveSet while "operator->()" uses the original one. Why not just define two versions of "operator->()" and let the compiler choose the right one like so:

  DirectiveSet const* operator->() const {
    return !_clone ? _origin : _clone;
  }
  DirectiveSet* operator->() {
    if (!_clone) {
      _clone = DirectiveSet::clone(_origin);
    }
    return _clone;
  }
  ...
  if (!_modified[LogIndex]) {
    bool log = CompilerOracle::should_log(method);
    if (log != const_cast(set)->LogOption) {
      set->LogOption = log;
    }
  }

Thank you and best regards,
Volker

On Mon, Jul 27, 2020 at 1:47 AM Liu, Xin wrote:
>
> hi, Reviewers,
>
> Could you review this simple patch?
> bug: https://bugs.openjdk.java.net/browse/JDK-8249809
> webrev: https://cr.openjdk.java.net/~xliu/8249809/00/webrev/
>
> When the users specify a method-level compiler directive, the DirectiveSet is cloned for every single compiling method. It's expensive but rarely hit. Actually, only user-specified methods must clone the DirectiveSet. I introduce a smart pointer DirectiveSetPtr. operator->() returns a pointer to a constant DirectiveSet, which is read-only. It doesn't clone the _origin until c2 needs to update its members. transfer() yields the ownership of the pointer.
>
> Test:
> manually tested with different CompileCommand options.
> hotspot:tier1 and gtest:all.
>
> thanks,
> --lx
>

From nick.gasson at arm.com Tue Jul 28 05:56:54 2020
From: nick.gasson at arm.com (Nick Gasson)
Date: Tue, 28 Jul 2020 13:56:54 +0800
Subject: [aarch64-port-dev ] RFR(S): 8237483: AArch64 C1 OopMap inserted twice fatal error
In-Reply-To: <04c4f9e0-e29a-3250-878c-2b29c11a45d8@redhat.com>
References: <85k0ypjq8f.fsf@nicgas01-pc.shanghai.arm.com> <04c4f9e0-e29a-3250-878c-2b29c11a45d8@redhat.com>
Message-ID: <85h7tsji6x.fsf@nicgas01-pc.shanghai.arm.com>

On 07/27/20 17:40 pm, Andrew Haley wrote:
>
> I would have thought it would make more sense, rather than asserting,
> simply to detect that we already have an oopmap so we don't need
> another one. Having said that, it's probably not worth worrying about
> so your fix is OK.
>
> It needs a better comment, though. The only way to find out why this
> code is here would be to trawl the email archives. Something like this
> would do:
>
> // In the method java.util.zip.Inflater::inflate C1 generates these two LIR
> // instructions:
>
> // 724 move [c_rarg3|I] [Base:[c_rarg1|L] Disp: 2147483647|I] [patch_normal] [bci:95]
> // 728 throw [c_rarg3|I] [c_rarg0|L] [bci:100]
>
> // The move instruction at 724 generates a runtime call to deoptimise the
> // method since this patching is not implemented on AArch64. An oop map is
> // inserted for the return PC of the runtime call
> // (LIR_Assembler::deoptimize_trap()). The following throw LIR instruction
> // then inserts another oop map at the same PC, triggering an assertion
> // failure.

Seems a bit too verbose? How about this:

--- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
+++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
@@ -2085,6 +2085,13 @@ void LIR_Assembler::throw_op(LIR_Opr exceptionPC, LIR_Opr exceptionOop, CodeEmit

   // get current pc information
   // pc is only needed if the method has an exception handler, the unwind code does not need it.
+  if (compilation()->debug_info_recorder()->last_pc_offset() == __ offset()) {
+    // As no instructions have been generated yet for this LIR node it's
+    // possible that an oop map already exists for the current offset.
+    // In that case insert a dummy NOP here to ensure all oop map PCs
+    // are unique. See JDK-8237483.
+    __ nop();
+  }
   int pc_for_athrow_offset = __ offset();
   InternalAddress pc_for_athrow(__ pc());
   __ adr(exceptionPC->as_register(), pc_for_athrow);

--
Thanks,
Nick

From shade at redhat.com Tue Jul 28 07:09:33 2020
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 28 Jul 2020 09:09:33 +0200
Subject: RFR (XS) 8250612: jvmciCompilerToVM.cpp declares jio_printf with "void" return type, should be "int"
Message-ID: <90ebea60-d625-e67f-918b-1ba2a531316e@redhat.com>

Bug:
  https://bugs.openjdk.java.net/browse/JDK-8250612

Fix:
  https://cr.openjdk.java.net/~shade/8250612/webrev.01/

Testing: Linux x86_64 builds; jdk-submit

--
Thanks,
-Aleksey

From christian.hagedorn at oracle.com Tue Jul 28 07:31:41 2020
From: christian.hagedorn at oracle.com (Christian Hagedorn)
Date: Tue, 28 Jul 2020 09:31:41 +0200
Subject: RFR (T) 8250042: Clean up methodOop and method_oop names from the code
In-Reply-To:
References: <85efc3ab-abbf-c5f2-9b7b-47fa516d9a2d@oracle.com> <6f973a0a-cf55-e1ab-8de3-b57f68dbd2cf@oracle.com>
Message-ID: <8c737ff0-7a62-d18d-78b2-b415802ebbdb@oracle.com>

Hi Coleen

On 24.07.20 15:10, coleen.phillimore at oracle.com wrote:
> incremental webrev at
> http://cr.openjdk.java.net/~coleenp/2020/8250042.02.incr/webrev
> full webrev at http://cr.openjdk.java.net/~coleenp/2020/8250042.02/webrev

Thanks for cleaning this up! The compiler changes look good to me. Just a minor comment (no new webrev required):

- arm.ad:8873 & x86_32.ad:13321: There is an extra whitespace before ")"

Best regards,
Christian

> Thanks,
> Coleen
>
>
> On 7/24/20 8:23 AM, coleen.phillimore at oracle.com wrote:
>>
>> Thanks for looking at this.
>>
>> On 7/24/20 1:01 AM, David Holmes wrote:
>>> Hi Coleen,
>>>
>>> On 24/07/2020 2:58 am, coleen.phillimore at oracle.com wrote:
>>>> See bug for more details. I've been running into these names a lot
>>>> lately. Many of these names are in JVMTI.
>>>>
>>>> Tested with tier1 on all Oracle platforms and built on non-Oracle
>>>> platforms.
>>>>
>>>> open webrev at
>>>> http://cr.openjdk.java.net/~coleenp/2020/8250042.01/webrev
>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8250042
>>>
>>> src/hotspot/cpu/*/*.ad
>>>
>>> These still refer to "method oop" and method_oop in a number of places.
>>
>> Yes, I only replaced method_oop in the shared code and not in the AD
>> code. method_oop can be the name of a parameter and using "sed" to
>> change it to "method" doesn't work. Somebody who understands this
>> code and looks at it will have to make the rest of the changes.
>>
>> What I did was replace "method oop" with "method" and "methodOop" with
>> "method" in all the sources. I replaced "method_oop" with "method" or
>> "checked_method" in the shared sources.
>>
>>>
>>> src/hotspot/share/adlc/adlparse.cpp
>>>
>>> +  frame->_interpreter_method_oop_reg = parse_one_arg("method reg
>>> entry");
>>>
>>> I guess I'm not understanding the scope of this renaming - why is
>>> _interpreter_method_oop_reg not renamed as well? Should this (and
>>> other uses) be parsed as method-(oop-reg) rather than (method-oop)-reg?
>>
>> I don't know this code, so I'd rather not change any more of it. The
>> comment makes sense changed, even though the variable name still
>> refers to method_oop.
>>
>> Thanks,
>> Coleen
>>>
>>> Otherwise all okay.
>>>
>>> Thanks,
>>> David
>>>
>>>> Thanks,
>>>> Coleen
>>
>

From aph at redhat.com Tue Jul 28 08:34:24 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 28 Jul 2020 09:34:24 +0100
Subject: [aarch64-port-dev ] RFR(S): 8237483: AArch64 C1 OopMap inserted twice fatal error
In-Reply-To: <85h7tsji6x.fsf@nicgas01-pc.shanghai.arm.com>
References: <85k0ypjq8f.fsf@nicgas01-pc.shanghai.arm.com> <04c4f9e0-e29a-3250-878c-2b29c11a45d8@redhat.com> <85h7tsji6x.fsf@nicgas01-pc.shanghai.arm.com>
Message-ID: <569a6967-408c-1895-ce99-18c5147c5958@redhat.com>

On 7/28/20 6:56 AM, Nick Gasson wrote:
> Seems a bit too verbose? How about this:
>
> --- a/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
> +++ b/src/hotspot/cpu/aarch64/c1_LIRAssembler_aarch64.cpp
> @@ -2085,6 +2085,13 @@ void LIR_Assembler::throw_op(LIR_Opr exceptionPC, LIR_Opr exceptionOop, CodeEmit
>
>    // get current pc information
>    // pc is only needed if the method has an exception handler, the unwind code does not need it.
> +  if (compilation()->debug_info_recorder()->last_pc_offset() == __ offset()) {
> +    // As no instructions have been generated yet for this LIR node it's
> +    // possible that an oop map already exists for the current offset.
> +    // In that case insert a dummy NOP here to ensure all oop map PCs
> +    // are unique. See JDK-8237483.
> +    __ nop();
> +  }

OK.

I wonder if this bug exists in other ports. They too deoptimize C1 code, albeit more rarely.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From tobias.hartmann at oracle.com Tue Jul 28 09:05:26 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 28 Jul 2020 11:05:26 +0200
Subject: RFR (XS) 8250612: jvmciCompilerToVM.cpp declares jio_printf with "void" return type, should be "int"
In-Reply-To: <90ebea60-d625-e67f-918b-1ba2a531316e@redhat.com>
References: <90ebea60-d625-e67f-918b-1ba2a531316e@redhat.com>
Message-ID: <5f0a2b21-c897-c474-b54d-03e587bdf046@oracle.com>

Hi Aleksey,

looks good and trivial to me.

Best regards,
Tobias

On 28.07.20 09:09, Aleksey Shipilev wrote:
> Bug:
> https://bugs.openjdk.java.net/browse/JDK-8250612
>
> Fix:
> https://cr.openjdk.java.net/~shade/8250612/webrev.01/
>
> Testing: Linux x86_64 builds; jdk-submit
>

From lutz.schmidt at sap.com Tue Jul 28 09:24:30 2020
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 28 Jul 2020 09:24:30 +0000
Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp
In-Reply-To:
References: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com> <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com> <994bb7a0-20dd-ea53-021c-9f8d49b49917@oracle.com>
Message-ID: <3D2D6A62-52A7-43B2-88EF-AF6589E5B21A@sap.com>

OK then, I withdraw my proposed fix which is apparently incorrect.

May I assume the issue is handled by knowledgeable people from now on? Tom, will you take over?

Thanks,
Lutz

On 27.07.20, 22:50, "hotspot-compiler-dev on behalf of Tom Rodriguez" wrote:

    Vladimir Kozlov wrote on 7/27/20 1:47 PM:
    > We are simply missing EnableJVMCI flag check! That is why JVMCI is not
    > initialized. EnableJVMCI should be true in hosted mode.

    Yes that makes sense.

    tom

    >
    > I looked and I see few problematic places in compileBroker.cpp
    > statistic code guarded by #if INCLUDE_JVMCI but which does not check
    > EnableJVMCI or comp->is_jvmci(). I think it should be fixed.
    >
    > Also I think JVMCI_event_1 is useless now in print_compilation_timers()
    > because its output does not go into tty.
    >
    > Thanks,
    > Vladimir
    >
    > On 7/27/20 12:51 PM, Tom Rodriguez wrote:
    >> Doug is away, but it seems like we could just remove the JVMCI_event_1
    >> call in print_compilation_timers. It's not a particularly worthwhile
    >> notification and all the other events should be able to safely assume
    >> that JVMCI has actually been initialized which I think was probably
    >> the point of the guarantee. If that's not sufficient then we need to
    >> convert that guarantee into a check for NULL and return.
    >>
    >> tom
    >>
    >> Vladimir Kozlov wrote on 7/27/20 12:35 PM:
    >>> Nope, the check is correct. In hosted mode Graal is used as Java
    >>> application and not as JIT compiler so that UseJVMCICompiler flag is
    >>> false.
    >>>
    >>> The problem is really caused by recent 8248321 changes - it is
    >>> regression. Doug, please advise how to fix it.
    >>> I think the new JVMCI events code should be adjusted for hosted mode.
    >>>
    >>> Thanks,
    >>> Vladimir
    >>>
    >>> On 7/24/20 5:53 AM, Schmidt, Lutz wrote:
    >>>> Resending after updating subject line with bug id.
    >>>> Sorry for the spam.
    >>>> Lutz
    >>>>
    >>>> On 24.07.20, 14:51, "Schmidt, Lutz" wrote:
    >>>>
    >>>> Dear all,
    >>>>
    >>>> may I please request reviews for this small fix? I would even
    >>>> say it is a trivial fix. It inverts an if condition such that JVMCI
    >>>> specific code is called only when JVMCI compilation is enabled via
    >>>> UseJVMCICompiler.
    >>>>
    >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8250233
    >>>> Webrev: http://cr.openjdk.java.net/~lucy/webrevs/8250233.00/
    >>>>
    >>>> Local testing looks good. jdk/submit tests pending.
    >>>>
    >>>> Thank you!
    >>>> Lutz
    >>>>
    >>>>
    >>>>
    >>>>

From tobias.hartmann at oracle.com Tue Jul 28 10:00:41 2020
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 28 Jul 2020 12:00:41 +0200
Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp
In-Reply-To: <3D2D6A62-52A7-43B2-88EF-AF6589E5B21A@sap.com>
References: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com> <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com> <994bb7a0-20dd-ea53-021c-9f8d49b49917@oracle.com> <3D2D6A62-52A7-43B2-88EF-AF6589E5B21A@sap.com>
Message-ID:

Hi Lutz,

On 28.07.20 11:24, Schmidt, Lutz wrote:
> May I assume the issue is handled by knowledgeable people from now on? Tom, will you take over?

You've probably missed that but Doug already replied in your original RFR:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039171.html

Best regards,
Tobias

From lutz.schmidt at sap.com Tue Jul 28 10:31:06 2020
From: lutz.schmidt at sap.com (Schmidt, Lutz)
Date: Tue, 28 Jul 2020 10:31:06 +0000
Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp
In-Reply-To:
References: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com> <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com> <994bb7a0-20dd-ea53-021c-9f8d49b49917@oracle.com> <3D2D6A62-52A7-43B2-88EF-AF6589E5B21A@sap.com>
Message-ID: <0BA04FB6-18C2-4AE9-B4ED-677270C910E8@sap.com>

Hi Tobias,

thank you for pointing me to Doug's reply. You are right, I missed just that single one - my fault (e-mail filter issue).

Regards,
Lutz

On 28.07.20, 12:00, "Tobias Hartmann" wrote:

    Hi Lutz,

    On 28.07.20 11:24, Schmidt, Lutz wrote:
    > May I assume the issue is handled by knowledgeable people from now on? Tom, will you take over?
You've probably missed that but Doug already replied in your original RFR:
https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039171.html

Best regards,
Tobias

From sergei.tsypanov at yandex.ru Tue Jul 28 10:35:35 2020
From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=)
Date: Tue, 28 Jul 2020 12:35:35 +0200
Subject: Performance degradation due to probable (?) C2 issue
Message-ID: <925401595926726@mail.yandex.ru>

Hello,

I've run into a strange issue while trying to improve java.net.URLEncoder.encode() for the case where the URL contains UTF-8 symbols. The idea of the fix is to replace the contents of line 276

    String str = new String(charArrayWriter.toCharArray());

with

    String str = charArrayWriter.toString();

CharArrayWriter.toCharArray() allocates a copy of the underlying char[], which is then passed to the String constructor, while CharArrayWriter.toString() passes the char[] to the String constructor directly. In theory this should give us a certain improvement in both time and memory, as we don't allocate a redundant char[].

To verify it I've used a benchmark encoding the link to the article about the UN in the Russian wiki:

    @State(Scope.Thread)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    @Fork(jvmArgsAppend = {"-Xms2g", "-Xmx2g", "-XX:+UseParallelGC"})
    public class UrlEncoderBenchmark {
      private final Charset charset = Charset.defaultCharset();
      private final String utf8Url = "https://ru.wikipedia.org/wiki/???????????_????????????_?????";

      @Benchmark
      public String encodeUtf8() {
        return URLEncoder.encode(utf8Url, charset);
      }
    }

In practice it turned out that we win only in the interpreter and in tier 1:

    Benchmark                                            Mode  Cnt     Score   Error  Units

    -Xint before
    UrlEncoderBenchmark.encodeUtf8                       avgt  100   179.905 ± 2.498  us/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm   avgt  100  1712.752 ± 0.542   B/op

    -Xint after
    UrlEncoderBenchmark.encodeUtf8                       avgt  100   173.323 ± 3.459  us/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm   avgt  100  1552.409 ± 0.339   B/op

    -XX:TieredStopAtLevel=1 before
    UrlEncoderBenchmark.encodeUtf8                       avgt  100     3.846 ± 0.021  us/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm   avgt  100  1712.271 ± 0.011   B/op

    -XX:TieredStopAtLevel=1 after
    UrlEncoderBenchmark.encodeUtf8                       avgt  100     3.732 ± 0.013  us/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm   avgt  100  1552.246 ± 0.014   B/op

Here we see that we indeed consume less time and memory. However, in the case of full compilation we have a severe degradation (+30%) in time consumption, while for memory we still have the same improvement:

    before
    UrlEncoderBenchmark.encodeUtf8                       avgt  100  1108.668 ± 6.226  ns/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm   avgt  100  1712.202 ± 0.003   B/op

    after
    UrlEncoderBenchmark.encodeUtf8                       avgt  100  1454.647 ± 6.067  ns/op
    UrlEncoderBenchmark.encodeUtf8:·gc.alloc.rate.norm   avgt  100  1528.219 ± 0.007   B/op

As the inlining log shows, in the second case there's something wrong:

Compilation before

       @ 186   java.io.CharArrayWriter::flush (1 bytes)   inline (hot)
    !m @ 195   java.io.CharArrayWriter::toCharArray (26 bytes)   inline (hot)
       @ 15   java.util.Arrays::copyOf (19 bytes)   inline (hot)
       @ 11   java.lang.Math::min (11 bytes)   (intrinsic)
       @ 14   java.lang.System::arraycopy (0 bytes)   (intrinsic)
       @ 198   java.lang.String:: (10 bytes)   inline (hot)
       @ 6   java.lang.String:: (74 bytes)   inline (hot)
       @ 1   java.lang.Object:: (1 bytes)   inline (hot)
       @ 36   java.lang.StringUTF16::compress (20 bytes)   inline (hot)
       @ 9   java.lang.StringUTF16::compress (50 bytes)   (intrinsic)
       @ 67   java.lang.StringUTF16::toBytes (34 bytes)   (intrinsic)

Compilation after

       @ 186   java.io.CharArrayWriter::flush (1 bytes)   inline (hot)
    !m @ 191   java.io.CharArrayWriter::toString (31 bytes)   already compiled into a big method   <----------------
       @ 199   java.lang.String::getBytes (25 bytes)   inline (hot)
       @ 14   java.lang.String::coder (15 bytes)   inline (hot)
    !  @ 21   java.lang.StringCoding::encode (324 bytes)   inline (hot)
       @ 10   java.lang.StringCoding::encodeUTF8 (132 bytes)   inline (hot)
       @ 7   java.lang.StringCoding::encodeUTF8_UTF16 (369 bytes)   hot method too big   <----------------
       @ 15   java.lang.StringCoding::hasNegatives (25 bytes)   (intrinsic)
       @ 24   java.util.Arrays::copyOf (19 bytes)   inline (hot)
       @ 11   java.lang.Math::min (11 bytes)   (intrinsic)
       @ 14   java.lang.System::arraycopy (0 bytes)   (intrinsic)

And in the compilation log for the patched case I have this entry:

This is consistent with the results of profiling with perfasm:

- for the original code we have only 1 hot region:

    ....................................................................................................
     62.29%

    ....[Hottest Regions]...............................................................................
     62.29%   c2, level 4   java.net.URLEncoder::encode, version 1032 (1487 bytes)

- for the patched code we have 2 hot regions:

    ....[Hottest Region 1]..............................................................................
    c2, level 4, java.net.URLEncoder::encode, version 1019 (1467 bytes)
    ....................................................................................................
     61.44%

    ....[Hottest Region 2]..............................................................................
    c2, level 4, java.net.URLEncoder::encode, version 1019 (1048 bytes)
    ....................................................................................................
     10.90%

So my question is whether there's something wrong with the compiler, or whether the original idea of the improvement was wrong?

Here are some attachments, if one finds them useful:

1. Output of LinuxPerfAsmProfiler for original code:
   https://gist.github.com/stsypanov/6bcd95fd9fbe79afc5f29db929e517f1
2.
Output of LinuxPerfAsmProfiler for patched code: https://gist.github.com/stsypanov/794c0b4fdb13bad9fcb7fc890cec3dc8 Regards, Sergey Tsypanov From coleen.phillimore at oracle.com Tue Jul 28 11:20:50 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 28 Jul 2020 07:20:50 -0400 Subject: RFR (T) 8250042: Clean up methodOop and method_oop names from the code In-Reply-To: <8c737ff0-7a62-d18d-78b2-b415802ebbdb@oracle.com> References: <85efc3ab-abbf-c5f2-9b7b-47fa516d9a2d@oracle.com> <6f973a0a-cf55-e1ab-8de3-b57f68dbd2cf@oracle.com> <8c737ff0-7a62-d18d-78b2-b415802ebbdb@oracle.com> Message-ID: <460241cc-7b49-b793-fa50-c12898d4b332@oracle.com> Hi, Thank you for reviewing the compiler changes. On 7/28/20 3:31 AM, Christian Hagedorn wrote: > Hi Coleen > > On 24.07.20 15:10, coleen.phillimore at oracle.com wrote: >> incremental webrev at >> http://cr.openjdk.java.net/~coleenp/2020/8250042.02.incr/webrev >> full webrev at >> http://cr.openjdk.java.net/~coleenp/2020/8250042.02/webrev > > Thanks for cleaning this up! The compiler changes look good to me. > > Just a minor comment (no new webrev required): > - arm.ad:8873 & x86_32.ad:13321: There is an extra whitespace before ")" Fixed! Thanks, Coleen > > Best regards, > Christian > >> Thanks, >> Coleen >> >> >> On 7/24/20 8:23 AM, coleen.phillimore at oracle.com wrote: >>> >>> Thanks for looking at this. >>> >>> On 7/24/20 1:01 AM, David Holmes wrote: >>>> Hi Coleen, >>>> >>>> On 24/07/2020 2:58 am, coleen.phillimore at oracle.com wrote: >>>>> See bug for more details.? I've been running into these names a >>>>> lot lately.?? Many of these names are in JVMTI. >>>>> >>>>> Tested with tier1 on all Oracle platforms and built on non-Oracle >>>>> platforms. 
>>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/2020/8250042.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8250042 >>>> >>>> src/hotspot/cpu/*/*.ad >>>> >>>> These still refer to "method oop" and method_oop in a number of >>>> places. >>> >>> Yes, I only replaced method_oop in the shared code and not in the AD >>> code.? method_oop can be the name of a parameter and using "sed" to >>> change it to "method" doesn't work.?? Somebody who understands this >>> code and looks at it will have to make the rest of the changes. >>> >>> What I did was replace "method oop" with "method" and "methodOop" >>> with "method" in all the sources.? I replaced "method_oop" with >>> "method" or "checked_method" in the shared sources. >>> >>>> >>>> src/hotspot/share/adlc/adlparse.cpp >>>> >>>> +? frame->_interpreter_method_oop_reg = parse_one_arg("method reg >>>> entry"); >>>> >>>> I guess I'm not understanding the scope of this renaming - why is >>>> _interpreter_method_oop_reg not renamed as well? Should this (and >>>> other uses) be parsed as method-(oop-reg) rather than >>>> (method-oop)-reg? >>> >>> I don't know this code, so I'd rather not change any more of it. The >>> comment makes sense changed, even though the variable name still >>> refers to method_oop. >>> >>> Thanks, >>> Coleen >>>> >>>> Otherwise all okay. 
>>>> >>>> Thanks, >>>> David >>>> >>>>> Thanks, >>>>> Coleen >>> >> From felix.yang at huawei.com Tue Jul 28 12:10:13 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Tue, 28 Jul 2020 12:10:13 +0000 Subject: RFR(S): 8250609: C2 crash in IfNode::fold_compares In-Reply-To: <23f0ab18-8bfb-3874-3000-ee2b37caca7c@oracle.com> References: <23f0ab18-8bfb-3874-3000-ee2b37caca7c@oracle.com> Message-ID: Hi, > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Tuesday, July 28, 2020 3:15 AM > To: Yangfei (Felix) ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares > > It happens because 'lo' is new node created just now and have no uses yet. > For such new nodes we usually add dummy use to avoid removal from graph: > > http://hg.openjdk.java.net/jdk/jdk/file/c379dc750a02/src/hotspot/share/op > to/convertnode.cpp#l403 Thanks for the suggestions. Yes, that will also fix the issue. New webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.01/ Performed the same tests as before. Does it look better? Felix > On 7/27/20 5:27 AM, Yangfei (Felix) wrote: > > Hi, > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8250609 > > Webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.00/ > > > > In IfNode::fold_compares_helper, C2 tries to fold 2 CmpI into a single > CmpU. 
> > At the crash site in IfNode::fold_compares_helper:
> > 995   if (lo && hi) {
> > 996     // Merge the two compares into a single unsigned compare by building (CmpU (n - lo) (hi - lo))
> > 997     Node* adjusted_val = igvn->transform(new SubINode(n, lo));
> > 998     if (adjusted_lim == NULL) {
> > 999       adjusted_lim = igvn->transform(new SubINode(hi, lo));
> > 1000    }
> >
> > At line 997, we have:
> > (gdb) p lo->dump()
> > 641 AddI === _ 513 92 [[]]
> > $1 = void
> >
> > After the transformation at line 997, we have
> > (gdb) p lo->dump()
> > 641 AddI === _ _ _ [[]] [34200641]
> > $3 = void
> >
> > Then node 641 was used at line 999, which triggers the crash.
> > Patch fixes the issue by delaying transformation in IfNode::fold_compares temporarily.
> > Tier1-3 tested on aarch64-linux-gnu & x86_64-linux-gnu.
> > Newly added test fail without the patch and pass otherwise.
> > Suggestions?
> >
> > Thanks,
> > Felix
> >

From aph at redhat.com Tue Jul 28 12:12:43 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 28 Jul 2020 13:12:43 +0100
Subject: Performance degradation due to probable (?) C2 issue
In-Reply-To: <925401595926726@mail.yandex.ru>
References: <925401595926726@mail.yandex.ru>
Message-ID:

Hi,

On 28/07/2020 11:35, Сергей Цыпанов wrote:
> So my question is whether there's something wrong with the compiler, or
> whether the original idea of the improvement was wrong?

No, and (probably) no. C2 uses a bunch of heuristics. Here, it has detected that CharArrayWriter::toString is large and has already been compiled, so there's no sense in inlining another copy of it. This isn't necessarily true, but it's a good guess.

Try playing with InlineSmallCode: start with =1000, and increase it from there to see if it helps.

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From volker.simonis at gmail.com Tue Jul 28 13:06:29 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Tue, 28 Jul 2020 15:06:29 +0200 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: <1595907547514.55531@amazon.com> References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> Message-ID: On Tue, Jul 28, 2020 at 5:40 AM Liu, Xin wrote: > > hi, Volker, > > Thank you to review my patch. > > 1. yes. I guess nodes are the major memory consumption, but compiler directives are for both c1 and c2. > it still can reduce memory footprint a little. > > 2. Previous code set the flag here. both ControlIntrinsic and DisableIntrinsic belong to compilerdirectives_common_flags. > > 327 #define init_default_cc(name, type, dvalue, cc_flag) { type v; if (!_modified[name##Index] && CompilerOracle::has_option_value(method, #cc_flag, v) && v != this->name##Option) { set->name##Option = v; changed = true;} } > 328 compilerdirectives_common_flags(init_default_cc) > > When method-level directives override the global directives, this code snippet set changed. > Even though I remove the flag 'changed', I have the same logic in the smart pointer. > > 3. the smart pointer DirectiveSetPtr needs to provide 2 accesses of the underlying pointer. > one is read-only and the other one is mutable. > > Ideally, it should has the following 2 operator->(). > DirectiveSet* operator->(); > DirectiveSet const* operator->(). > > AFAFI, C++ doesn't support covariant return type overload. That is to say, we need to find a way to work around. > the reason I provide overload operator*() because it returns a reference to object. Users who want to modify the pointee have to explicitly dereference the smart pointer. > (*set).member = newvalue; > set->member = newvalue; // compiler error. 
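As a hedged illustration of the access pattern described in point 3 above (read-only access through operator->(), mutation only through an explicit operator*() that clones lazily), here is a minimal sketch. Options, LazyClonePtr and has_clone() are hypothetical stand-ins invented for this example; they are not HotSpot's DirectiveSet or DirectiveSetPtr, and the actual patch may well differ.

```cpp
#include <cassert>

// Hypothetical stand-in for DirectiveSet; only two flags for illustration.
struct Options {
  bool print_assembly = false;
  bool log = false;
};

// Hypothetical copy-on-write handle (an assumption, not the patch itself).
class LazyClonePtr {
  const Options* _origin;  // shared settings, never modified through this handle
  Options* _clone;         // private copy, created on the first write only

 public:
  explicit LazyClonePtr(const Options* origin) : _origin(origin), _clone(nullptr) {}
  ~LazyClonePtr() { delete _clone; }
  LazyClonePtr(const LazyClonePtr&) = delete;
  LazyClonePtr& operator=(const LazyClonePtr&) = delete;

  // Read-only access: returns a pointer-to-const and never clones.
  const Options* operator->() const { return _clone != nullptr ? _clone : _origin; }

  // Mutable access: the caller must dereference explicitly; clones on first use.
  Options& operator*() {
    if (_clone == nullptr) {
      _clone = new Options(*_origin);
    }
    return *_clone;
  }

  bool has_clone() const { return _clone != nullptr; }
};

// Usage, assuming the names above:
//   Options defaults;
//   LazyClonePtr set(&defaults);
//   bool b = set->log;                // read, no clone
//   (*set).print_assembly = true;     // write, clones once
//   set->print_assembly = true;       // does not compile: pointee is const
```

In this sketch the write syntax differs visibly from the read syntax, so a reader of the calling code can see where a clone may be created.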
> > your approach also works, but you need to invoke const_cast<> for all places where you want to read. > I think my approach has shorter code. > > > I just came up a new idea. How about I provide a method cloned(), which returns the unqualified pointer? > > + DirectiveSet* cloned() { > + if (!_clone) { > + _clone = DirectiveSet::clone(_origin); > + } > + return _clone; > + } > + > DirectiveSet* transfer() { > assert(_origin != NULL, "_origin is NULL! transfer() can only be invoked once."); > if (_clone != NULL) { > @@ -340,7 +347,7 @@ > > if (CompilerOracle::should_print(method)) { > if (!_modified[PrintAssemblyIndex]) { > - (*set).PrintAssemblyOption = true; > + set.cloned()->PrintAssemblyOption = true; > } > } > Hi Xin, I like this solution much better. It makes it clear that we want to alter the state of the Directive Set. Can you please provide a new webrev based on this idea? Thank you and best regards, Volker > thanks, > --lx > > ________________________________________ > From: Volker Simonis > Sent: Monday, July 27, 2020 8:51 AM > To: Liu, Xin > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > Hi Xin, > > I'm not sure if saving the allocation of an DirectiveSet has any > visible effect compared to the much larger allocations required for > the method compilation itself. > > Apart from that, I must confess that I'm not totally understanding the > original logic. From what I see, it sets "changed" to true in the case > where it changes the cloned DirectiveSet. But it doesn't do that in > the cases where it only changes the clone's control word: > > 341 set->_intrinsic_control_words.fill_in(TriBool()); > ... 
> 348 set->_intrinsic_control_words[id] = iter.is_enabled(); > ... > 361 set->_intrinsic_control_words.fill_in(TriBool()); > ... > 368 set->_intrinsic_control_words[id] = false; > > Why don't these mutations count as "changing" the cloned DirectiveSet? > > After your patch, you've changed the above lines such that they will > always create a clone which seems different from the initial > behaviour. > > Which of the two behaviours is correct here, the original one, the new > one after your change or doesn't it matter for reasons I don't > understand? > > > I also wonder why you need to overload both operators "operator*()" > and "operator->()"? It seems a little bit arbitrary (and hard to > understand for people reading the code) that "operator*()" clones the > underlying directiveSet while "operator->()" uses the original one. > Why not just define two versions of "operator->()" and let the > compiler choose the right one like so: > > DirectiveSet const* operator->() const { > return !_clone ? _origin : _clone; > } > > DirectiveSet* operator->() { > if (!_clone) { > _clone = DirectiveSet::clone(_origin); > } > return _clone; > } > ... > if (!_modified[LogIndex]) { > bool log = CompilerOracle::should_log(method); > if (log != const_cast(set)->LogOption) { > set->LogOption = log; > } > } > > Thank you and best regards, > Volker > > On Mon, Jul 27, 2020 at 1:47 AM Liu, Xin wrote: > > > > hi, Reviewers, > > > > Could you review this simple patch? > > bug: https://bugs.openjdk.java.net/browse/JDK-8249809 > > webrev: https://cr.openjdk.java.net/~xliu/8249809/00/webrev/ > > > > When the users specify a method-level compiler directive, the DirectiveSet is cloned for every single compiling method. It's expensive but rarely hit. Actually, Only user-specified methods must clone the DirectiveSet. I introduce a smart pointer DirectiveSetPtr. operator->() returns a pointer to a constant DirectiveSet, which is read-only. 
It doesn't clone the _origin until c2 need to update its members. transfer() yield the ownership of the pointer. > > > > Test: > > manually tests with different CompileComand options. > > hotspot:tier1 and gtest:all. > > > > thanks, > > --lx > > From vladimir.kozlov at oracle.com Tue Jul 28 16:01:45 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2020 09:01:45 -0700 Subject: RFR (XS) 8250612: jvmciCompilerToVM.cpp declares jio_printf with "void" return type, should be "int" In-Reply-To: <5f0a2b21-c897-c474-b54d-03e587bdf046@oracle.com> References: <90ebea60-d625-e67f-918b-1ba2a531316e@redhat.com> <5f0a2b21-c897-c474-b54d-03e587bdf046@oracle.com> Message-ID: <97787ade-5d9f-ab11-5e38-11345d4d3c95@oracle.com> +1 Thanks, Vladimir K On 7/28/20 2:05 AM, Tobias Hartmann wrote: > Hi Aleksey, > > looks good and trivial to me. > > Best regards, > Tobias > > On 28.07.20 09:09, Aleksey Shipilev wrote: >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8250612 >> >> Fix: >> https://cr.openjdk.java.net/~shade/8250612/webrev.01/ >> >> Testing: Linux x86_64 builds; jdk-submit >> From luhenry at microsoft.com Tue Jul 28 16:22:52 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Tue, 28 Jul 2020 16:22:52 +0000 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com>, , Message-ID: Hi, I confirm that `= delete` works, and that we get a compile-time error if you try to use it. 
Please find the updated webrev at http://cr.openjdk.java.net/~burban/luhenry/8248672/webrev.01 Thank you Ludovic ________________________________________ From: Ludovic Henry Sent: Sunday, July 26, 2020 16:10 To: Kim Barrett; Andrew Haley Cc: Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64; hotspot-gc-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC > As of early last week, a definition of "= delete;" is the way to > poison an overload. Let me try that locally, compile on Windows-AArch64 and Linux-AArch64, and confirm whether it works for MSVC. ________________________________________ From: Kim Barrett Sent: Sunday, July 26, 2020 14:41 To: Andrew Haley Cc: Ludovic Henry; Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64; hotspot-gc-dev at openjdk.java.net Subject: Re: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC > On Jul 26, 2020, at 5:56 AM, Andrew Haley wrote: > > On 25/07/2020 00:42, Kim Barrett wrote: >> Why are we deprecating something rather than just deleting it and >> fixing any users? > > C++ overloading. AArch64 CMP (immediate) only has a limited range, so > we only have a byte-wide Assembler::cmp() definition. The deprecation > warning on the wider version makes sure that any maintenance > programmer is immediately warned if it is used. There are other things > we could do: by not providing a definition for the wider cmp() you get > a link error, but that wouldn't be as explicit as a deprecation > warning. > > The root problem is that the immediate value to CMP isn't always known > when HotSpot is compiled, but may be calculated at runtime. We have > seen failures in production when an immediate offset overflowed. 
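A minimal sketch of the `= delete` technique discussed in this thread, under stated assumptions: MiniAssembler, its cmp() overloads and the can_cmp() helper are hypothetical names made up for this example. HotSpot's real Assembler::cmp takes a register operand and uses its own types, so this only shows the overload-poisoning idea, not the actual change.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical miniature assembler (not HotSpot's Assembler).
struct MiniAssembler {
  int last_imm = 0;

  // Narrow immediate: always encodable in this toy model.
  void cmp(std::uint8_t imm) { last_imm = imm; }

  // Wide immediate: poisoned, so any call that selects it fails to compile.
  void cmp(unsigned imm) = delete;
};

// Compile-time probe: is `a.cmp(T)` a well-formed call?
// Selecting a deleted overload is a substitution failure here.
template <typename T>
auto can_cmp_impl(int)
    -> decltype(std::declval<MiniAssembler&>().cmp(std::declval<T>()), std::true_type{});
template <typename T>
std::false_type can_cmp_impl(...);

template <typename T>
constexpr bool can_cmp() { return decltype(can_cmp_impl<T>(0))::value; }

static_assert(can_cmp<std::uint8_t>(), "byte-wide immediate is accepted");
static_assert(!can_cmp<unsigned>(), "wide immediate hits the deleted overload");

// Usage, assuming the names above:
//   MiniAssembler a;
//   a.cmp(static_cast<std::uint8_t>(42));  // fine
//   a.cmp(300u);                           // error: use of deleted function
```

Compared with leaving the wide overload undefined, which only fails at link time, the deleted definition is rejected by the compiler at the call site, which matches the behaviour reported above.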
Yeah, I'd guessed that might be the point, and confirmed it later by looking at the changeset that originally introduced the attribute. As of early last week, a definition of "= delete;" is the way to poison an overload. From vladimir.kozlov at oracle.com Tue Jul 28 16:29:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2020 09:29:20 -0700 Subject: RFR(XS): 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp In-Reply-To: <0BA04FB6-18C2-4AE9-B4ED-677270C910E8@sap.com> References: <7f5c8191-748f-d26d-9a4b-4efcea72ab3c@oracle.com> <8ff1c8ad-4111-03da-90cc-9c22b9e2b078@oracle.com> <994bb7a0-20dd-ea53-021c-9f8d49b49917@oracle.com> <3D2D6A62-52A7-43B2-88EF-AF6589E5B21A@sap.com> <0BA04FB6-18C2-4AE9-B4ED-677270C910E8@sap.com> Message-ID: <4584367b-c6f4-cb21-d319-357cc9003796@oracle.com> I actually did not see Doug's reply too. But I think we still need to add EnableJVMCI check in few places to not do useless work. I took this bug. Thanks, Vladimir K On 7/28/20 3:31 AM, Schmidt, Lutz wrote: > Hi Tobias, > > thank you for pointing me to Doug's reply. You are right, I missed just that single one - my fault (e-mail filter issue). > > Regards, > Lutz > > > ?On 28.07.20, 12:00, "Tobias Hartmann" wrote: > > Hi Lutz, > > On 28.07.20 11:24, Schmidt, Lutz wrote: > > May I assume the issue is handled by knowledgeable people from now on? Tom, will you take over? > > You've probably missed that but Doug already replied in your original RFR: > https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039171.html > > Best regards, > Tobias > From vladimir.kozlov at oracle.com Tue Jul 28 17:27:47 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2020 10:27:47 -0700 Subject: RFR(S): 8250609: C2 crash in IfNode::fold_compares In-Reply-To: References: <23f0ab18-8bfb-3874-3000-ee2b37caca7c@oracle.com> Message-ID: Yes, this looks good. 
Thanks, Vladimir K On 7/28/20 5:10 AM, Yangfei (Felix) wrote: > Hi, > >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Tuesday, July 28, 2020 3:15 AM >> To: Yangfei (Felix) ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares >> >> It happens because 'lo' is new node created just now and have no uses yet. >> For such new nodes we usually add dummy use to avoid removal from graph: >> >> http://hg.openjdk.java.net/jdk/jdk/file/c379dc750a02/src/hotspot/share/op >> to/convertnode.cpp#l403 > > Thanks for the suggestions. Yes, that will also fix the issue. > New webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.01/ > Performed the same tests as before. Does it look better? > > Felix > >> On 7/27/20 5:27 AM, Yangfei (Felix) wrote: >>> Hi, >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8250609 >>> Webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.00/ >>> >>> In IfNode::fold_compares_helper, C2 tries to fold 2 CmpI into a single >> CmpU. >>> At the crash site in IfNode::fold_compares_helper: >>> 995 if (lo && hi) { >>> 996 // Merge the two compares into a single unsigned compare by >> building (CmpU (n - lo) (hi - lo)) >>> 997 Node* adjusted_val = igvn->transform(new SubINode(n, lo)); >>> 998 if (adjusted_lim == NULL) { >>> 999 adjusted_lim = igvn->transform(new SubINode(hi, lo)); >>> 1000 } >>> >>> At line 997, we have: >>> (gdb) p lo->dump() >>> 641 AddI === _ 513 92 [[]] >>> $1 = void >>> >>> After the transformation at line 997, we have >>> (gdb) p lo->dump() >>> 641 AddI === _ _ _ [[]] [34200641] >>> $3 = void >>> >>> Then node 641 was used at line 999, which triggers the crash. >>> Patch fixes the issue by delaying transformation in IfNode::fold_compares >> temporarily. >>> Tier1-3 tested on aarch64-linux-gnu & x86_64-linux-gnu. >>> Newly added test fail without the patch and pass otherwise. >>> Suggestions? 
>>>
>>> Thanks,
>>> Felix
>>>

From aph at redhat.com Tue Jul 28 18:21:18 2020
From: aph at redhat.com (Andrew Haley)
Date: Tue, 28 Jul 2020 19:21:18 +0100
Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC
In-Reply-To:
References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com>
Message-ID: <279cd44e-dfcc-00c2-4aee-1cca630bd5ec@redhat.com>

On 28/07/2020 17:22, Ludovic Henry wrote:
> I confirm that `= delete` works, and that we get a compile-time error if you try to use it.
>
> Please find the updated webrev at http://cr.openjdk.java.net/~burban/luhenry/8248672/webrev.01

OK, thanks.

-- 
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From Charlie.Gracie at microsoft.com Tue Jul 28 20:39:58 2020
From: Charlie.Gracie at microsoft.com (Charlie Gracie)
Date: Tue, 28 Jul 2020 20:39:58 +0000
Subject: Inlining difference when using G1GC instead of ParallelGC
Message-ID: <4EBCB07A-73AF-43D9-AE9D-3F71152082BB@microsoft.com>

Hi,

I have noticed an inlining difference in C2 when the JVM is using G1GC as compared to ParallelGC. It is causing a measurable difference in performance, since other optimizations cannot take place if the method is not inlined. This is a code snippet from my small example [1] that demonstrates the difference:

    public class TypeCheck {
        public static void main(String[] args) {
            ...
            Handler handler1 = new Handler(new InnerImpl1());
            handler1.doIt();

            Handler handler2 = new Handler(new InnerImpl2());
            handler2.doIt();

            Handler handler3 = new Handler(new InnerImpl3());
            handler3.doIt();
        }
    }

    public class Handler {
        Inner inner;
        public Handler(Inner i) {
            inner = i;
        }
        public int doIt() {
            return inner.getValue();
        }
    }

    abstract class Inner {
        public abstract int getValue();
    }

Handler.doIt() is invoked with Handler.inner having more than 2 different types, so TypeSpeculation is not used. When the JVM is using ParallelGC, C2 determines a concrete type because it can see the value stored to the `inner` field in the constructor instead of reading it from the field. I believe this is happening because of MemNode::can_see_stored_value(). With this optimization the concrete subclass type is known and the method is inlined.

When using G1GC, the GC write barrier contains an Op_MemBarVolatile. I believe that the volatile memory barrier generated for the field write in the constructor stops the stored value from being visible after the write barrier. This forces the read of `inner` in doIt() to happen, and the result then only has the type Inner, so getValue() cannot be inlined.

Is this a deficiency that should be investigated further to attempt a "fix"? I would like to work on a solution, but I am looking for feedback on whether this is something the community feels can and should be fixed.

Cheers,
Charlie Gracie

Extra information:
If the code is modified such that the allocations are on separate lines, then C2 can inline the getValue() method when the JVM is using either G1GC or ParallelGC. This is because the constructor will directly follow the allocation of the Handler object. When this happens the GC barrier can be elided, so the original value being stored can be used instead of having to do the read. I have a 2nd example [2] which can be used to demonstrate this.

[1] https://github.com/charliegracie/code-examples/tree/master/java/InlineTests [2] https://github.com/charliegracie/code-examples/blob/master/java/InlineTests/TypeCheck2.java From xxinliu at amazon.com Tue Jul 28 20:56:25 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 28 Jul 2020 20:56:25 +0000 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com>, Message-ID: <1595969785292.62158@amazon.com> hi, Volker, Here is a new revision with cloned(). http://cr.openjdk.java.net/~xliu/8249809/01/webrev/ thanks, --lx ________________________________________ From: Volker Simonis Sent: Tuesday, July 28, 2020 6:06 AM To: Liu, Xin Cc: hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. On Tue, Jul 28, 2020 at 5:40 AM Liu, Xin wrote: > > hi, Volker, > > Thank you to review my patch. > > 1. yes. I guess nodes are the major memory consumption, but compiler directives are for both c1 and c2. > it still can reduce memory footprint a little. > > 2. Previous code set the flag here. both ControlIntrinsic and DisableIntrinsic belong to compilerdirectives_common_flags. > > 327 #define init_default_cc(name, type, dvalue, cc_flag) { type v; if (!_modified[name##Index] && CompilerOracle::has_option_value(method, #cc_flag, v) && v != this->name##Option) { set->name##Option = v; changed = true;} } > 328 compilerdirectives_common_flags(init_default_cc) > > When method-level directives override the global directives, this code snippet set changed. > Even though I remove the flag 'changed', I have the same logic in the smart pointer. > > 3. 
the smart pointer DirectiveSetPtr needs to provide 2 accesses of the underlying pointer. > one is read-only and the other one is mutable. > > Ideally, it should has the following 2 operator->(). > DirectiveSet* operator->(); > DirectiveSet const* operator->(). > > AFAFI, C++ doesn't support covariant return type overload. That is to say, we need to find a way to work around. > the reason I provide overload operator*() because it returns a reference to object. Users who want to modify the pointee have to explicitly dereference the smart pointer. > (*set).member = newvalue; > set->member = newvalue; // compiler error. > > your approach also works, but you need to invoke const_cast<> for all places where you want to read. > I think my approach has shorter code. > > > I just came up a new idea. How about I provide a method cloned(), which returns the unqualified pointer? > > + DirectiveSet* cloned() { > + if (!_clone) { > + _clone = DirectiveSet::clone(_origin); > + } > + return _clone; > + } > + > DirectiveSet* transfer() { > assert(_origin != NULL, "_origin is NULL! transfer() can only be invoked once."); > if (_clone != NULL) { > @@ -340,7 +347,7 @@ > > if (CompilerOracle::should_print(method)) { > if (!_modified[PrintAssemblyIndex]) { > - (*set).PrintAssemblyOption = true; > + set.cloned()->PrintAssemblyOption = true; > } > } > Hi Xin, I like this solution much better. It makes it clear that we want to alter the state of the Directive Set. Can you please provide a new webrev based on this idea? Thank you and best regards, Volker > thanks, > --lx > > ________________________________________ > From: Volker Simonis > Sent: Monday, July 27, 2020 8:51 AM > To: Liu, Xin > Cc: hotspot-compiler-dev at openjdk.java.net > Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init > > CAUTION: This email originated from outside of the organization. 
Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > Hi Xin, > > I'm not sure if saving the allocation of an DirectiveSet has any > visible effect compared to the much larger allocations required for > the method compilation itself. > > Apart from that, I must confess that I'm not totally understanding the > original logic. From what I see, it sets "changed" to true in the case > where it changes the cloned DirectiveSet. But it doesn't do that in > the cases where it only changes the clone's control word: > > 341 set->_intrinsic_control_words.fill_in(TriBool()); > ... > 348 set->_intrinsic_control_words[id] = iter.is_enabled(); > ... > 361 set->_intrinsic_control_words.fill_in(TriBool()); > ... > 368 set->_intrinsic_control_words[id] = false; > > Why don't these mutations count as "changing" the cloned DirectiveSet? > > After your patch, you've changed the above lines such that they will > always create a clone which seems different from the initial > behaviour. > > Which of the two behaviours is correct here, the original one, the new > one after your change or doesn't it matter for reasons I don't > understand? > > > I also wonder why you need to overload both operators "operator*()" > and "operator->()"? It seems a little bit arbitrary (and hard to > understand for people reading the code) that "operator*()" clones the > underlying directiveSet while "operator->()" uses the original one. > Why not just define two versions of "operator->()" and let the > compiler choose the right one like so: > > DirectiveSet const* operator->() const { > return !_clone ? _origin : _clone; > } > > DirectiveSet* operator->() { > if (!_clone) { > _clone = DirectiveSet::clone(_origin); > } > return _clone; > } > ... 
> if (!_modified[LogIndex]) { > bool log = CompilerOracle::should_log(method); > if (log != const_cast(set)->LogOption) { > set->LogOption = log; > } > } > > Thank you and best regards, > Volker > > On Mon, Jul 27, 2020 at 1:47 AM Liu, Xin wrote: > > > > hi, Reviewers, > > > > Could you review this simple patch? > > bug: https://bugs.openjdk.java.net/browse/JDK-8249809 > > webrev: https://cr.openjdk.java.net/~xliu/8249809/00/webrev/ > > > > When the users specify a method-level compiler directive, the DirectiveSet is cloned for every single compiling method. It's expensive but rarely hit. Actually, Only user-specified methods must clone the DirectiveSet. I introduce a smart pointer DirectiveSetPtr. operator->() returns a pointer to a constant DirectiveSet, which is read-only. It doesn't clone the _origin until c2 need to update its members. transfer() yield the ownership of the pointer. > > > > Test: > > manually tests with different CompileComand options. > > hotspot:tier1 and gtest:all. > > > > thanks, > > --lx > > From igor.ignatyev at oracle.com Tue Jul 28 21:38:05 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 28 Jul 2020 14:38:05 -0700 Subject: RFR(T) : 8250739 : remove Compile::Generate_*_Graph methods declarations Message-ID: <122EF98B-1ED5-4813-8D7C-7F6326D8ABD7@oracle.com> http://cr.openjdk.java.net/~iignatyev//8250739/webrev.00/ > 8 lines changed: 0 ins; 8 del; 0 mod; Hi all, could you please review this trivial cleanup? 
from JBS: > Compile::Generate_Compiled_To_Interpreter_Graph and Generate_Interpreter_To_Compiled_Graph methods are declared but not defined (and not used) webrev: http://cr.openjdk.java.net/~iignatyev//8250739/webrev.00/ JBS: https://bugs.openjdk.java.net/browse/JDK-8250739 Thanks, -- Igor From igor.ignatyev at oracle.com Tue Jul 28 21:40:46 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 28 Jul 2020 14:40:46 -0700 Subject: RFR(T) : 8250738 : C2Compiler::is_intrinsic_supported(methodHandle&,bool) shouldn't be virtual Message-ID: http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 > 2 lines changed: 0 ins; 0 del; 2 mod; Hi all, could you please review this trivial one-liner which removes virtual specifier from C2Compiler::is_intrinsic_supported(methodHandle&,bool)? from JBS: > C2Compiler::is_intrinsic_supported(methodHandle&,bool) is declared by C2Compiler which doesn't and shouldn't have any subclasses. webrev: http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 JBS: https://bugs.openjdk.java.net/browse/JDK-8250738 Thanks, -- Igor From xxinliu at amazon.com Tue Jul 28 22:01:40 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Tue, 28 Jul 2020 22:01:40 +0000 Subject: RFR(T) : 8250738 : C2Compiler::is_intrinsic_supported(methodHandle&, bool) shouldn't be virtual In-Reply-To: References: Message-ID: <1595973700138.36641@amazon.com> hi, Igor, Reviewed your code. you are right. I don't think it intends to be virtual. thanks, --lx ________________________________________ From: hotspot-compiler-dev on behalf of Igor Ignatyev Sent: Tuesday, July 28, 2020 2:40 PM To: hotspot compiler Subject: [EXTERNAL] RFR(T) : 8250738 : C2Compiler::is_intrinsic_supported(methodHandle&, bool) shouldn't be virtual CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. 
http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 > 2 lines changed: 0 ins; 0 del; 2 mod; Hi all, could you please review this trivial one-liner which removes virtual specifier from C2Compiler::is_intrinsic_supported(methodHandle&,bool)? from JBS: > C2Compiler::is_intrinsic_supported(methodHandle&,bool) is declared by C2Compiler which doesn't and shouldn't have any subclasses. webrev: http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 JBS: https://bugs.openjdk.java.net/browse/JDK-8250738 Thanks, -- Igor From vladimir.x.ivanov at oracle.com Tue Jul 28 22:09:54 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 29 Jul 2020 01:09:54 +0300 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): Hotspot and x86 backend changes In-Reply-To: References: Message-ID: > Shared Hotspot: > Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/hs_webrev/webrev.01/ > Incremental: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/hs_webrev/webrev.00-webrev.01/ FTR here are the latest changes in HotSpot shared code: http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 Incremental changes: http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 Best regards, Vladimir Ivanov > Older webrev links for your reference: > Shared Hotspot: http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/ From vladimir.kozlov at oracle.com Tue Jul 28 22:21:25 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2020 15:21:25 -0700 Subject: RFR(T) : 8250739 : remove Compile::Generate_*_Graph methods declarations In-Reply-To: <122EF98B-1ED5-4813-8D7C-7F6326D8ABD7@oracle.com> References: <122EF98B-1ED5-4813-8D7C-7F6326D8ABD7@oracle.com> Message-ID: Cleanup is good and trivial. The code was removed in JDK 6 as part of preparing for tiered JIT system "JDK-5082720: Remove adapter frames". 
Thanks, Vladimir K On 7/28/20 2:38 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8250739/webrev.00/ >> 8 lines changed: 0 ins; 8 del; 0 mod; > > Hi all, > > could you please review this trivial cleanup? > > from JBS: >> Compile::Generate_Compiled_To_Interpreter_Graph and Generate_Interpreter_To_Compiled_Graph methods are declared but not defined (and not used) > > webrev: http://cr.openjdk.java.net/~iignatyev//8250739/webrev.00/ > JBS: https://bugs.openjdk.java.net/browse/JDK-8250739 > > Thanks, > -- Igor > From vladimir.x.ivanov at oracle.com Tue Jul 28 22:29:41 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 29 Jul 2020 01:29:41 +0300 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: References: Message-ID: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Hi, Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and Ekaterina! Here are the latest changes for Vector API support in HotSpot shared code: http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 Incremental changes (diff against webrev.00): http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor cleanups / simple bug fixes. 
Detailed summary: - rebased to jdk/jdk tip; - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); - got rid of x86-specific changes in shared code; - fix for 8244867 [1]; - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics; - numerous minor cleanups. Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 http://jbs.oracle.com/browse/JDK-8244867 8244867: 2 vector api tests crash with assert(is_reference_type(basic_type())) failed: wrong type Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. On 04.04.2020 02:12, Vladimir Ivanov wrote: > Hi, > > Following up on review requests of API [0] and Java implementation [1] > for Vector API (JEP 338 [2]), here's a request for review of general > HotSpot changes (in shared code) required for supporting the API: > > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ > > > (First of all, to set proper expectations: since the JEP is still in > Candidate state, the intention is to initiate preliminary round(s) of > review to inform the community and gather feedback before sending out > final/official RFRs once the JEP is Targeted to a release.) > > Vector API (being developed in Project Panama [3]) relies on JVM support > to utilize optimal vector hardware instructions at runtime. It interacts > with JVM through intrinsics (declared in > jdk.internal.vm.vector.VectorSupport [4]) which expose vector operations > support in C2 JIT-compiler. > > As Paul wrote earlier: "A vector intrinsic is an internal low-level > vector operation. The last argument to the intrinsic is fall back > behavior in Java, implementing the scalar operation over the number of > elements held by the vector.
Thus, If the intrinsic is not supported in > C2 for the other arguments then the Java implementation is executed (the > Java implementation is always executed when running in the interpreter > or for C1)." > > The rest of JVM support is about aggressively optimizing vector boxes to > minimize (ideally eliminate) the overhead of boxing for vector values. > It's a stop-the-gap solution for vector box elimination problem until > inline classes arrive. Vector classes are value-based and in the longer > term will be migrated to inline classes once the support becomes available. > > Vector API talk from JVMLS'18 [5] contains brief overview of JVM > implementation and some details. > > Complete implementation resides in vector-unstable branch of panama/dev > repository [6]. > > Now to gory details (the patch is split in multiple "sub-webrevs"): > > =========================================================== > > (1) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ > > > Ideal vector nodes for new operations introduced by Vector API. > > (Platform-specific back end support will be posted for review separately). > > =========================================================== > > (2) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ > > > JVM Java interface (VectorSupport) and intrinsic support in C2. > > Vector instances are initially represented as VectorBox macro nodes and > "unboxing" is represented by VectorUnbox node. It simplifies vector box > elimination analysis and the nodes are expanded later right before EA pass. > > Vectors have 2-level on-heap representation: for the vector value > primitive array is used as a backing storage and it is encapsulated in a > typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a int[8] > instance which is used to store vector value). 
> > Unless VectorBox node goes away, it needs to be expanded into an > allocation eventually, but it is a pure node and doesn't have any JVM > state associated with it. The problem is solved by keeping JVM state > separately in a VectorBoxAllocate node associated with VectorBox node > and using it during expansion. > > Also, to simplify vector box elimination, inlining of vector reboxing > calls (VectorSupport::maybeRebox) is delayed until the analysis is over. > > =========================================================== > > (3) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ > > > Vector box elimination analysis implementation. (Brief overview: slides > #36-42 [5].) > > The main part is devoted to scalarization across safepoints and > rematerialization support during deoptimization. In C2-generated code > vector operations work with raw vector values which live in registers or > are spilled on the stack, which makes it possible to avoid boxing/unboxing when a > vector value is alive across a safepoint. As with other values, there's > just a location of the vector value at the safepoint and vector type > information recorded in the relevant nmethod metadata and all the > heavy-lifting happens only when rematerialization takes place. > > The analysis preserves object identity invariants except during > aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). > > (Aggressive reboxing is crucial for cases when vectors "escape": it > allocates a fresh instance at every escape point thus enabling original > instance to go away.) > > =========================================================== > > (4) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ > > > HotSpot changes for jdk.incubator.vector module. Vector support is > marked experimental and turned off by default.
JEP 338 proposes the API > to be released as an incubator module, so a user has to specify > "--add-modules jdk.incubator.vector" on the command line to be able to > use it. > When user does that, JVM automatically enables Vector API support. > It improves usability (user doesn't need to separately "open" the API > and enable JVM support) while minimizing risks of destabilization from > new code when the API is not used. > > > That's it! Will be happy to answer any questions. > > And thanks in advance for any feedback! > > Best regards, > Vladimir Ivanov > > [0] > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html > > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html > > [2] https://openjdk.java.net/jeps/338 > > [3] https://openjdk.java.net/projects/panama/ > > [4] > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html > > > [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf > > [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 > > $ hg clone http://hg.openjdk.java.net/panama/dev/ -b vector-unstable From vladimir.kozlov at oracle.com Tue Jul 28 22:27:26 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Tue, 28 Jul 2020 15:27:26 -0700 Subject: RFR(T) : 8250738 : C2Compiler::is_intrinsic_supported(methodHandle&,bool) shouldn't be virtual In-Reply-To: References: Message-ID: <19167625-9430-3230-87ea-e67b86728de0@oracle.com> Good and trivial. Thanks, Vladimir K On 7/28/20 2:40 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 >> 2 lines changed: 0 ins; 0 del; 2 mod; > > Hi all, > > could you please review this trivial one-liner which removes virtual specifier from C2Compiler::is_intrinsic_supported(methodHandle&,bool)?
> > from JBS: >> C2Compiler::is_intrinsic_supported(methodHandle&,bool) is declared by C2Compiler which doesn't and shouldn't have any subclasses. > > webrev: http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 > JBS: https://bugs.openjdk.java.net/browse/JDK-8250738 > > Thanks, > -- Igor > From igor.ignatyev at oracle.com Tue Jul 28 22:31:58 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 28 Jul 2020 15:31:58 -0700 Subject: RFR(T) : 8250739 : remove Compile::Generate_*_Graph methods declarations In-Reply-To: References: <122EF98B-1ED5-4813-8D7C-7F6326D8ABD7@oracle.com> Message-ID: <7EBECEA2-FB20-4A46-8FBA-A3CA6DD5055D@oracle.com> Thanks Vladimir, pushed. -- Igor > On Jul 28, 2020, at 3:21 PM, Vladimir Kozlov wrote: > > Cleanup is good and trivial. > > The code was removed in JDK 6 as part of preparing for tiered JIT system "JDK-5082720: Remove adapter frames". > > Thanks, > Vladimir K > > On 7/28/20 2:38 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8250739/webrev.00/ >>> 8 lines changed: 0 ins; 8 del; 0 mod; >> Hi all, >> could you please review this trivial cleanup? >> from JBS: >>> Compile::Generate_Compiled_To_Interpreter_Graph and Generate_Interpreter_To_Compiled_Graph methods are declared but not defined (and not used) >> webrev: http://cr.openjdk.java.net/~iignatyev//8250739/webrev.00/ >> JBS: https://bugs.openjdk.java.net/browse/JDK-8250739 >> Thanks, >> -- Igor From igor.ignatyev at oracle.com Tue Jul 28 22:33:56 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Tue, 28 Jul 2020 15:33:56 -0700 Subject: RFR(T) : 8250738 : C2Compiler::is_intrinsic_supported(methodHandle&,bool) shouldn't be virtual In-Reply-To: <19167625-9430-3230-87ea-e67b86728de0@oracle.com> References: <19167625-9430-3230-87ea-e67b86728de0@oracle.com> Message-ID: <1402219D-0403-47AD-82E0-1E73C7434815@oracle.com> Vladimir, Xin, thank you for your reviews, pushed. 
-- Igor > On Jul 28, 2020, at 3:27 PM, Vladimir Kozlov wrote: > > Good and trivial. > > Thanks, > Vladimir K > > On 7/28/20 2:40 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 >>> 2 lines changed: 0 ins; 0 del; 2 mod; >> Hi all, >> could you please review this trivial one-liner which removes virtual specifier from C2Compiler::is_intrinsic_supported(methodHandle&,bool)? >> from JBS: >>> C2Compiler::is_intrinsic_supported(methodHandle&,bool) is declared by C2Compiler which doesn't and shouldn't have any subclasses. >> webrev: http://cr.openjdk.java.net/~iignatyev//8250738/webrev.00 >> JBS: https://bugs.openjdk.java.net/browse/JDK-8250738 >> Thanks, >> -- Igor From kim.barrett at oracle.com Wed Jul 29 00:34:35 2020 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 28 Jul 2020 20:34:35 -0400 Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC In-Reply-To: References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> Message-ID: <116277BD-EA21-49AA-8DE1-DBC06ED43C43@oracle.com> > On Jul 28, 2020, at 12:22 PM, Ludovic Henry wrote: > > Hi, > > I confirm that `= delete` works, and that we get a compile-time error if you try to use it. > > Please find the updated webrev at http://cr.openjdk.java.net/~burban/luhenry/8248672/webrev.01 > > Thank you > Ludovic Looks good. Probably the bug title should be updated. From felix.yang at huawei.com Wed Jul 29 03:20:34 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 29 Jul 2020 03:20:34 +0000 Subject: RFR(S): 8250609: C2 crash in IfNode::fold_compares In-Reply-To: References: <23f0ab18-8bfb-3874-3000-ee2b37caca7c@oracle.com> Message-ID: Hi, Thanks for reviewing this. Committed to jdk/submit repo and test result received looks good. Will do the push. 
Thanks, Felix > -----Original Message----- > From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > Sent: Wednesday, July 29, 2020 1:28 AM > To: Yangfei (Felix) ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares > > Yes, this looks good. > > Thanks, > Vladimir K > > On 7/28/20 5:10 AM, Yangfei (Felix) wrote: > > Hi, > > > >> -----Original Message----- > >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >> Sent: Tuesday, July 28, 2020 3:15 AM > >> To: Yangfei (Felix) ; hotspot-compiler- > >> dev at openjdk.java.net > >> Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares > >> > >> It happens because 'lo' is new node created just now and have no uses > yet. > >> For such new nodes we usually add dummy use to avoid removal from > graph: > >> > >> http://hg.openjdk.java.net/jdk/jdk/file/c379dc750a02/src/hotspot/shar > >> e/op > >> to/convertnode.cpp#l403 > > > > Thanks for the suggestions. Yes, that will also fix the issue. > > New webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.01/ > > Performed the same tests as before. Does it look better? > > > > Felix > > > >> On 7/27/20 5:27 AM, Yangfei (Felix) wrote: > >>> Hi, > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8250609 > >>> Webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.00/ > >>> > >>> In IfNode::fold_compares_helper, C2 tries to fold 2 CmpI into a > >>> single > >> CmpU. 
> >>> At the crash site in IfNode::fold_compares_helper: > >>> 995 if (lo && hi) { > >>> 996 // Merge the two compares into a single unsigned compare by > >> building (CmpU (n - lo) (hi - lo)) > >>> 997 Node* adjusted_val = igvn->transform(new SubINode(n, lo)); > >>> 998 if (adjusted_lim == NULL) { > >>> 999 adjusted_lim = igvn->transform(new SubINode(hi, lo)); > >>> 1000 } > >>> > >>> At line 997, we have: > >>> (gdb) p lo->dump() > >>> 641 AddI === _ 513 92 [[]] > >>> $1 = void > >>> > >>> After the transformation at line 997, we have > >>> (gdb) p lo->dump() > >>> 641 AddI === _ _ _ [[]] [34200641] > >>> $3 = void > >>> > >>> Then node 641 was used at line 999, which triggers the crash. > >>> Patch fixes the issue by delaying transformation in > >>> IfNode::fold_compares > >> temporarily. > >>> Tier1-3 tested on aarch64-linux-gnu & x86_64-linux-gnu. > >>> Newly added test fail without the patch and pass otherwise. > >>> Suggestions? > >>> > >>> Thanks, > >>> Felix > >>> From jiefu at tencent.com Wed Jul 29 03:43:22 2020 From: jiefu at tencent.com (jiefu(傅杰)) Date: Wed, 29 Jul 2020 03:43:22 +0000 Subject: RFR: 8250745: Fix a potential bug on AVX512 machines with assert(eval_map.contains(n)) failed: absent Message-ID: <6E0374A1-1E57-4FF3-A8B0-BB605E5E8F68@tencent.com> Hi all, This bug[1] was first observed while testing Panama's vector api on AVX512 machines. During the discussion on panama-dev, Vladimir Ivanov pointed out that this is not Vector API-specific[2]. So it would be better to fix the potential bug in the auto-vectorizer. JBS: https://bugs.openjdk.java.net/browse/JDK-8250745 Webrev: http://cr.openjdk.java.net/~jiefu/8250745/webrev.00/ Testing: 1. jdk/jdk: tier1-3 on Linux/x64 AVX512 machines 2. Panama(vectorIntrinsics): jdk/incubator/vector on Linux/x64 AVX512 machines Thanks a lot.
Best regards, Jie [1] https://bugs.openjdk.java.net/browse/JDK-8250675 [2] https://mail.openjdk.java.net/pipermail/panama-dev/2020-July/010113.html From christian.hagedorn at oracle.com Wed Jul 29 07:03:10 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Wed, 29 Jul 2020 09:03:10 +0200 Subject: RFR(S): 8250609: C2 crash in IfNode::fold_compares In-Reply-To: References: <23f0ab18-8bfb-3874-3000-ee2b37caca7c@oracle.com> Message-ID: Hi Felix Looks good to me. Just some minor comments (no new webrev required): - L996: The asterisk should be at the type (Node*) - L997/1003: You could remove one extra whitespace before the comment starts Best regards, Christian On 29.07.20 05:20, Yangfei (Felix) wrote: > Hi, > > Thanks for reviewing this. > Committed to jdk/submit repo and test result received looks good. > Will do the push. > > Thanks, > Felix > >> -----Original Message----- >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >> Sent: Wednesday, July 29, 2020 1:28 AM >> To: Yangfei (Felix) ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares >> >> Yes, this looks good. >> >> Thanks, >> Vladimir K >> >> On 7/28/20 5:10 AM, Yangfei (Felix) wrote: >>> Hi, >>> >>>> -----Original Message----- >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] >>>> Sent: Tuesday, July 28, 2020 3:15 AM >>>> To: Yangfei (Felix) ; hotspot-compiler- >>>> dev at openjdk.java.net >>>> Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares >>>> >>>> It happens because 'lo' is new node created just now and have no uses >> yet. >>>> For such new nodes we usually add dummy use to avoid removal from >> graph: >>>> >>>> http://hg.openjdk.java.net/jdk/jdk/file/c379dc750a02/src/hotspot/shar >>>> e/op >>>> to/convertnode.cpp#l403 >>> >>> Thanks for the suggestions. Yes, that will also fix the issue. 
>>> New webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.01/ >>> Performed the same tests as before. Does it look better? >>> >>> Felix >>> >>>> On 7/27/20 5:27 AM, Yangfei (Felix) wrote: >>>>> Hi, >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8250609 >>>>> Webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.00/ >>>>> >>>>> In IfNode::fold_compares_helper, C2 tries to fold 2 CmpI into a >>>>> single >>>> CmpU. >>>>> At the crash site in IfNode::fold_compares_helper: >>>>> 995 if (lo && hi) { >>>>> 996 // Merge the two compares into a single unsigned compare by >>>> building (CmpU (n - lo) (hi - lo)) >>>>> 997 Node* adjusted_val = igvn->transform(new SubINode(n, lo)); >>>>> 998 if (adjusted_lim == NULL) { >>>>> 999 adjusted_lim = igvn->transform(new SubINode(hi, lo)); >>>>> 1000 } >>>>> >>>>> At line 997, we have: >>>>> (gdb) p lo->dump() >>>>> 641 AddI === _ 513 92 [[]] >>>>> $1 = void >>>>> >>>>> After the transformation at line 997, we have >>>>> (gdb) p lo->dump() >>>>> 641 AddI === _ _ _ [[]] [34200641] >>>>> $3 = void >>>>> >>>>> Then node 641 was used at line 999, which triggers the crash. >>>>> Patch fixes the issue by delaying transformation in >>>>> IfNode::fold_compares >>>> temporarily. >>>>> Tier1-3 tested on aarch64-linux-gnu & x86_64-linux-gnu. >>>>> Newly added test fail without the patch and pass otherwise. >>>>> Suggestions? >>>>> >>>>> Thanks, >>>>> Felix >>>>> From felix.yang at huawei.com Wed Jul 29 07:17:47 2020 From: felix.yang at huawei.com (Yangfei (Felix)) Date: Wed, 29 Jul 2020 07:17:47 +0000 Subject: RFR(S): 8250609: C2 crash in IfNode::fold_compares In-Reply-To: References: <23f0ab18-8bfb-3874-3000-ee2b37caca7c@oracle.com> Message-ID: Hi Christian, Thanks for the careful reviewing :-) That will be easy to fix and I will modify when I push. 
Felix > -----Original Message----- > From: Christian Hagedorn [mailto:christian.hagedorn at oracle.com] > Sent: Wednesday, July 29, 2020 3:03 PM > To: Yangfei (Felix) ; Vladimir Kozlov > ; hotspot-compiler-dev at openjdk.java.net > Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares > > Hi Felix > > Looks good to me. > > Just some minor comments (no new webrev required): > - L996: The asterisk should be at the type (Node*) > - L997/1003: You could remove one extra whitespace before the comment > starts > > Best regards, > Christian > > On 29.07.20 05:20, Yangfei (Felix) wrote: > > Hi, > > > > Thanks for reviewing this. > > Committed to jdk/submit repo and test result received looks good. > > Will do the push. > > > > Thanks, > > Felix > > > >> -----Original Message----- > >> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >> Sent: Wednesday, July 29, 2020 1:28 AM > >> To: Yangfei (Felix) ; hotspot-compiler- > >> dev at openjdk.java.net > >> Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares > >> > >> Yes, this looks good. > >> > >> Thanks, > >> Vladimir K > >> > >> On 7/28/20 5:10 AM, Yangfei (Felix) wrote: > >>> Hi, > >>> > >>>> -----Original Message----- > >>>> From: Vladimir Kozlov [mailto:vladimir.kozlov at oracle.com] > >>>> Sent: Tuesday, July 28, 2020 3:15 AM > >>>> To: Yangfei (Felix) ; hotspot-compiler- > >>>> dev at openjdk.java.net > >>>> Subject: Re: RFR(S): 8250609: C2 crash in IfNode::fold_compares > >>>> > >>>> It happens because 'lo' is new node created just now and have no > >>>> uses > >> yet. > >>>> For such new nodes we usually add dummy use to avoid removal from > >> graph: > >>>> > >>>> http://hg.openjdk.java.net/jdk/jdk/file/c379dc750a02/src/hotspot/sh > >>>> ar > >>>> e/op > >>>> to/convertnode.cpp#l403 > >>> > >>> Thanks for the suggestions. Yes, that will also fix the issue. > >>> New webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.01/ > >>> Performed the same tests as before. 
Does it look better? > >>> > >>> Felix > >>> > >>>> On 7/27/20 5:27 AM, Yangfei (Felix) wrote: > >>>>> Hi, > >>>>> > >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8250609 > >>>>> Webrev: http://cr.openjdk.java.net/~fyang/8250609/webrev.00/ > >>>>> > >>>>> In IfNode::fold_compares_helper, C2 tries to fold 2 CmpI into a > >>>>> single > >>>> CmpU. > >>>>> At the crash site in IfNode::fold_compares_helper: > >>>>> 995 if (lo && hi) { > >>>>> 996 // Merge the two compares into a single unsigned compare by > >>>> building (CmpU (n - lo) (hi - lo)) > >>>>> 997 Node* adjusted_val = igvn->transform(new SubINode(n, lo)); > >>>>> 998 if (adjusted_lim == NULL) { > >>>>> 999 adjusted_lim = igvn->transform(new SubINode(hi, lo)); > >>>>> 1000 } > >>>>> > >>>>> At line 997, we have: > >>>>> (gdb) p lo->dump() > >>>>> 641 AddI === _ 513 92 [[]] > >>>>> $1 = void > >>>>> > >>>>> After the transformation at line 997, we have > >>>>> (gdb) p lo->dump() > >>>>> 641 AddI === _ _ _ [[]] [34200641] > >>>>> $3 = void > >>>>> > >>>>> Then node 641 was used at line 999, which triggers the crash. > >>>>> Patch fixes the issue by delaying transformation in > >>>>> IfNode::fold_compares > >>>> temporarily. > >>>>> Tier1-3 tested on aarch64-linux-gnu & x86_64-linux-gnu. > >>>>> Newly added test fail without the patch and pass otherwise. > >>>>> Suggestions? > >>>>> > >>>>> Thanks, > >>>>> Felix > >>>>> From tobias.hartmann at oracle.com Wed Jul 29 07:38:50 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Wed, 29 Jul 2020 09:38:50 +0200 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: <1595969785292.62158@amazon.com> References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> <1595969785292.62158@amazon.com> Message-ID: Hi Xin, On 28.07.20 22:56, Liu, Xin wrote: > http://cr.openjdk.java.net/~xliu/8249809/01/webrev/ Overall looks good to me. 
Some style comments: - Add a comment to 'DirectiveSetPtr' to describe its purpose - Why not put the "cloned" logic in "operator->"? - Do not use the _clone pointer as boolean (see "Miscellaneous" section in the style guide [1]) - Indentation in line 301-303 is wrong - Line 306 use brackets around the "else" and move it one line up "} else {" Best regards, Tobias [1] https://hg.openjdk.java.net/jdk/jdk/raw-file/tip/doc/hotspot-style.html From sergei.tsypanov at yandex.ru Wed Jul 29 07:43:10 2020 From: sergei.tsypanov at yandex.ru (Сергей Цыпанов) Date: Wed, 29 Jul 2020 09:43:10 +0200 Subject: Performance degradation due to probable (?) C2 issue In-Reply-To: References: <925401595926726@mail.yandex.ru> Message-ID: <79821596008482@mail.yandex.ru> Hi Andrew, your suggestion was correct: with -XX:InlineSmallCode=1000 patched code works faster than original as expected. Thanks for explaining that to me! 28.07.2020, 14:12, "Andrew Haley" : > Hi, > > On 28/07/2020 11:35, Сергей Цыпанов wrote: > >> So my question is whether there's something wrong with compiler or >> the original idea of improvement was wrong? > > No, and (probably) no. > > C2 uses a bunch of heuristics. Here, it's detected that > CharArrayWriter::toString is large and has already been compiled so > there's no sense inlining another copy of it. This isn't necessarily > true, but it's a good guess. Try playing with InlineSmallCode: start > with =1000, and increase it from there to see if it helps. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd.
> https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From vladimir.x.ivanov at oracle.com Wed Jul 29 09:20:10 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 29 Jul 2020 12:20:10 +0300 Subject: RFR: 8250745: Fix a potential bug on AVX512 machines with assert(eval_map.contains(n)) failed: absent In-Reply-To: <6E0374A1-1E57-4FF3-A8B0-BB605E5E8F68@tencent.com> References: <6E0374A1-1E57-4FF3-A8B0-BB605E5E8F68@tencent.com> Message-ID: > Webrev: http://cr.openjdk.java.net/~jiefu/8250745/webrev.00/ Looks good. FTR the bug was introduced by JDK-8241040, but I don't see a way it can be hit by auto-vectorizer: before it kicks in, scalar code is strongly normalized and constants are pushed to the right. It leads to the shape where (Replicate -1) is always the second input of bitwise NOT shape (XorV v (Replicate -1)). Since there are no GVN transformations happening for vector nodes, both left-hand and right-hand variants become possible with Vector API. Best regards, Vladimir Ivanov From aph at redhat.com Wed Jul 29 11:44:50 2020 From: aph at redhat.com (Andrew Haley) Date: Wed, 29 Jul 2020 12:44:50 +0100 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com> References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com> <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com> Message-ID: <852a3a09-a627-c0fc-89c6-8c8100ae17f5@redhat.com> On 20/07/2020 04:51, Ningsheng Jian wrote: > Since we are getting ready to propose Vector API target to JDK 16 [1]. 
> I have regenerated webrev of aarch64 backend parts from panama repo, which
> has been rebased to jdk/jdk very recently, by:
>
> $ hg update vector-unstable && hg diff -r default > all.patch
> $ grep "diff -r" all.patch | grep -e "src/hotspot/cpu/aarch64" | awk '{print $4}' > aarch64_list
> $ ksh ./webrev.ksh -r default -o aarch64_webrev aarch64_list
>
> The new webrev:
> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/
>
> Could you please help to take a look?

OK, thanks. It all looks fine. Sorry for the delay.

--
Andrew Haley (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd.
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From luhenry at microsoft.com Wed Jul 29 13:59:47 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Wed, 29 Jul 2020 13:59:47 +0000
Subject: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC
In-Reply-To: <116277BD-EA21-49AA-8DE1-DBC06ED43C43@oracle.com>
References: <5e301790-8bfe-0ced-b5e2-8a9c76ae33de@oracle.com> <1259c3fd-b69c-6d81-0427-cb769f00bca5@redhat.com> , <116277BD-EA21-49AA-8DE1-DBC06ED43C43@oracle.com>
Message-ID:

Hi Kim,

I just had it updated.

Thanks

________________________________________
From: Kim Barrett
Sent: Tuesday, July 28, 2020 17:34
To: Ludovic Henry
Cc: Andrew Haley; Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net; aarch64-port-dev at openjdk.java.net; openjdk-aarch64; hotspot-gc-dev at openjdk.java.net
Subject: Re: [aarch64-port-dev ] RFR[XXS] 8248672: utilities: Introduce DEPRECATED macro for GCC and MSVC

> On Jul 28, 2020, at 12:22 PM, Ludovic Henry wrote:
>
> Hi,
>
> I confirm that `= delete` works, and that we get a compile-time error if you try to use it.
>
> Please find the updated webrev at http://cr.openjdk.java.net/~burban/luhenry/8248672/webrev.01
>
> Thank you
> Ludovic

Looks good. Probably the bug title should be updated.

From volker.simonis at gmail.com Wed Jul 29 14:34:08 2020
From: volker.simonis at gmail.com (Volker Simonis)
Date: Wed, 29 Jul 2020 16:34:08 +0200
Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init
In-Reply-To:
References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> <1595969785292.62158@amazon.com>
Message-ID:

On Wed, Jul 29, 2020 at 9:38 AM Tobias Hartmann wrote:
>
> Hi Xin,
>
> On 28.07.20 22:56, Liu, Xin wrote:
> > http://cr.openjdk.java.net/~xliu/8249809/01/webrev/
>
> Overall looks good to me.
>
> Some style comments:
> - Add a comment to 'DirectiveSetPtr' to describe its purpose
> - Why not put the "cloned" logic in "operator->"?

Because there's also a "read-only" access of the DirectiveSetPtr which doesn't mutate its content and therefore shouldn't clone the underlying DirectiveSet. See my first mail where I proposed to add a second, `const`-version of "operator->". But that still required const casts in the places where we didn't want to clone. I've therefore voted for the new "cloned()" method which makes cloning and mutating explicit and which is much easier to understand from my point of view (compared to two overloaded operators).
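
For illustration only, the explicit-cloning idea described above might look roughly like the sketch below. It is not taken from the webrev: the member names (`_origin`, `_clone`), the `is_cloned()` helper, and the one-field `DirectiveSet` are hypothetical stand-ins, and `std::unique_ptr` is used purely to keep the example self-contained rather than to reflect HotSpot's own memory management.

```cpp
#include <memory>

// Hypothetical stand-in for HotSpot's DirectiveSet; only one option is modeled.
struct DirectiveSet {
  bool log_option = false;
};

// Hypothetical copy-on-write handle. Reads go through the shared original;
// a private copy is created only when the caller explicitly asks for one.
class DirectiveSetPtr {
 public:
  explicit DirectiveSetPtr(const DirectiveSet* origin) : _origin(origin) {}

  // Read-only access: never clones, returns the private copy if one exists.
  const DirectiveSet* operator->() const {
    return _clone != nullptr ? _clone.get() : _origin;
  }

  // Mutating access: clone on first use, then hand out the private copy.
  DirectiveSet* cloned() {
    if (_clone == nullptr) {
      _clone = std::make_unique<DirectiveSet>(*_origin);
    }
    return _clone.get();
  }

  // Explicit query instead of treating the pointer as a boolean.
  bool is_cloned() const { return _clone != nullptr; }

 private:
  const DirectiveSet* _origin;           // shared, never modified through this handle
  std::unique_ptr<DirectiveSet> _clone;  // created lazily by cloned()
};
```

With a shape like this, a caller that only inspects options pays nothing, while a caller that writes through `cloned()` leaves the shared original untouched.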
> - Do not use the _clone pointer as boolean (see "Miscellaneous" section in the style guide [1])
> - Indentation in line 301-303 is wrong
> - Line 306 use brackets around the "else" and move it one line up "} else {"
>
> Best regards,
> Tobias
>
> [1] https://hg.openjdk.java.net/jdk/jdk/raw-file/tip/doc/hotspot-style.html

From jatin.bhateja at intel.com Wed Jul 29 14:45:49 2020
From: jatin.bhateja at intel.com (Bhateja, Jatin)
Date: Wed, 29 Jul 2020 14:45:49 +0000
Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
In-Reply-To:
References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com>
Message-ID:

Hi Vladimir,

Thanks for the pointers, following is the link to updated patch:
http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/

> I'd prefer to see a uniform Ideal IR shape being used irrespective of
> whether the argument is a constant or not. It should also simplify the
> logic in SuperWord and make it easier to support on non-x86 architectures.
>
> For example, here's how it is done on AArch64:
>
> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{
>   predicate(n->as_Vector()->length() == 4);
>   match(Set dst (LShiftVI src (LShiftCntV shift))); ...
>

Graph shape has been made consistent, we could have also optimized the pattern for ARM port for immediate shifts.

> # Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), pid=5476, tid=6219
> # assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize should
> return new nodes, use Identity to return old nodes
>
> I believe it comes from RotateLeftNode::Ideal/RotateRightNode::Ideal
> which can return pre-constructed constants. I suggest to get rid of
> Ideal() methods and move constant folding logic into Node::Value() (as
> implemented for other bitwise/arithmetic nodes in
> addnode.cpp/subnode.cpp/mulnode.cpp et al).
> It's a more generic approach
> since it enables richer type information (ranges vs constants) and IMO it's
> more convenient to work with constants through Types than ConNodes.

I have removed RotateLeftNode/RotateRightNode::Ideal routines since we are anyways doing constant folding in LShiftI/URShiftI value routines. Since the Java rotate APIs are no longer intrinsified, these routines may no longer be useful.

>
> It would be really nice to migrate to MacroAssembler along the way (as a
> cleanup).

I guess you are saying remove opcodes/encoding from patterns and move them to Assembler. Can we take this cleanup activity separately since other patterns are also using these matcher directives.

Other synthetic comments have been taken care of. I have extended the Test to cover all the newly added scalar transforms. Kindly let me know if there are other comments.

Best Regards,
Jatin

> -----Original Message-----
> From: Vladimir Ivanov
> Sent: Friday, July 24, 2020 3:21 AM
> To: Bhateja, Jatin
> Cc: Viswanathan, Sandhya ; Andrew Haley
> ; hotspot-compiler-dev at openjdk.java.net
> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
>
> Hi Jatin,
>
> > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/
>
> Much better! Thanks.
>
> > Change Summary:
> >
> > 1) Unified the handling for scalar rotate operation. All scalar rotate
> selection patterns are now dependent on newly created
> RotateLeft/RotateRight nodes. This promotes rotate inferencing. Currently
> if DAG nodes corresponding to a sub-pattern are shared (have multiple
> users) then existing complex patterns based on Or/LShiftL/URShift do not
> get matched and this prevents inferring rotate nodes. Please refer to
> JIT'ed assembly output with baseline[1] and with patch[2]. We can see that
> generated code size also went down from 832 bytes to 768 bytes. Also this
> can cause perf degradation if shift-or dependency chain appears inside a
> hot region.
> > > > 2) Due to enhanced rotate inferencing new patch shows better performance > even for legacy targets (non AVX-512). Please refer to the perf result[3] > over AVX2 machine for JMH benchmark part of the patch. > > Very nice! > > 3) As suggested, removed Java API intrinsification changes and scalar > rotate transformation are done during OrI/OrL node idealizations. > > Good. > > (Still would be nice to factor the matching code from Ideal() and share it > between multiple use sites. Especially considering OrVNode::Ideal() now > does basically the same thing. As an example/idea, take a look at > is_bmi_pattern() in x86.ad.) > > > 4) SLP always gets to work on new scalar Rotate nodes and creates vector > rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes if > target does not supports vector rotates(non-AVX512). > > Good. > > > 5) Added new instruction patterns for vector shift Left/Right operations > with constant shift operands. This prevents emitting extra moves to XMM. > > +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ > + match(Set dst (LShiftVI src shift)); > > I'd prefer to see a uniform Ideal IR shape being used irrespective of > whether the argument is a constant or not. It should also simplify the > logic in SuperWord and make it easier to support on non-x86 architectures. > > For example, here's how it is done on AArch64: > > instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ > predicate(n->as_Vector()->length() == 4); > match(Set dst (LShiftVI src (LShiftCntV shift))); ... > > > 6) Constant folding scenarios are covered in RotateLeft/RotateRight > idealization, inferencing of vector rotate through OrV idealization covers > the vector patterns generated though non SLP route i.e. VectorAPI. > > I'm fine with keeping OrV::Ideal(), but I'm concerned with the general > direction here - duplication of scalar transformations to lane-wise vector > operations. It definitely won't scale and in a longer run it risks to > diverge. 
Would be nice to find a way to automatically "lift" > scalar transformations to vectors and apply them uniformly. But right now > it is just an idea which requires more experimentation. > > > Some other minor comments/suggestions: > > + // Swap the computed left and right shift counts. > + if (is_rotate_left) { > + Node* temp = shiftRCnt; > + shiftRCnt = shiftLCnt; > + shiftLCnt = temp; > + } > > Maybe use swap() here (declared in globalDefinitions.hpp)? > > > + if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) > + return true; > > Please, don't omit curly braces (even for simple cases). > > > -// Rotate Right by variable > -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 zero, > rFlagsReg cr) > +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) > %{ > - match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero > shift)))); > - > + predicate(!VM_Version::supports_bmi2() && > n->bottom_type()->basic_type() == T_INT); > + match(Set dst (RotateRight dst shift)); > + format %{ "rorl $dst, $shift" %} > expand %{ > - rorI_rReg_CL(dst, shift, cr); > + rorI_rReg_imm8(dst, shift, cr); > %} > > It would be really nice to migrate to MacroAssembler along the way (as a > cleanup). > > > Please push the patch through your testing framework and let me know your > review feedback. > > There's one new assertion failure: > > # Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), > pid=5476, tid=6219 > # assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize should > return new nodes, use Identity to return old nodes > > I believe it comes from RotateLeftNode::Ideal/RotateRightNode::Ideal > which can return pre-contructed constants. I suggest to get rid of > Ideal() methods and move constant folding logic into Node::Value() (as > implemented for other bitwise/arithmethic nodes in > addnode.cpp/subnode.cpp/mulnode.cpp et al). 
It's a more generic approach > since it enables richer type information (ranges vs constants) and IMO it's > more convenient to work with constants through Types than ConNodes. > > (I suspect that original/expanded IR shape may already provide more precise > type info for non-constant case which can affect the benchmarks.) > > Best regards, > Vladimir Ivanov > > > > > Best Regards, > > Jatin > > > > [1] > > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. > > txt [2] > > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx2_asm > > .txt [3] > > http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new_patc > > h.txt > > > > > >> -----Original Message----- > >> From: Vladimir Ivanov > >> Sent: Saturday, July 18, 2020 12:25 AM > >> To: Bhateja, Jatin ; Andrew Haley > >> > >> Cc: Viswanathan, Sandhya ; > >> hotspot-compiler- dev at openjdk.java.net > >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for > >> X86 > >> > >> Hi Jatin, > >> > >>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ > >> > >> It definitely looks better, but IMO it hasn't reached the sweet spot > yet. > >> It feels like the focus is on auto-vectorizer while the burden is put > >> on scalar cases. > >> > >> First of all, considering GVN folds relevant operation patterns into > >> a single Rotate node now, what's the motivation to introduce intrinsics? > >> > >> Another point is there's still significant duplication for scalar cases. > >> > >> I'd prefer to see the legacy cases which rely on pattern matching to > >> go away and be substituted with instructions which match Rotate > >> instructions (migrating ). > >> > >> I understand that it will penalize the vectorization implementation, > >> but IMO reducing overall complexity is worth it. 
On auto-vectorizer > >> side, I see > >> 2 ways to fix it: > >> > >> (1) introduce additional AD instructions for > >> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; > >> > >> (2) in SuperWord::output(), when matcher doesn't support > >> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), > >> generate vectorized version of the original pattern. > >> > >> Overall, it looks like more and more focus is made on scalar part. > >> Considering the main goal of the patch is to enable vectorization, > >> I'm fine with separating cleanup of scalar part. As an interim > >> solution, it seems that leaving the scalar part as it is now and > >> matching scalar bit rotate pattern in VectorNode::is_rotate() should > >> be enough to keep the vectorization part functioning. Then scalar > >> Rotate nodes and relevant cleanups can be integrated later. (Or vice > >> versa: clean up scalar part first and then follow up with > >> vectorization.) > >> > >> Some other comments: > >> > >> * There's a lot of duplication between OrINode::Ideal and > OrLNode::Ideal. > >> What do you think about introducing a super type > >> (OrNode) and put a unified version (OrNode::Ideal) there? > >> > >> > >> * src/hotspot/cpu/x86/x86.ad > >> > >> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ > >> + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT > || > >> + n->bottom_type()->is_vect()->element_basic_type() == > >> +T_LONG); > >> > >> +instruct vprorate(vec dst, vec src, vec shift) %{ > >> + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT > || > >> + n->bottom_type()->is_vect()->element_basic_type() == > >> +T_LONG); > >> > >> The predicates are redundant here. 
> >> > >> > >> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp > >> > >> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, > >> XMMRegister dst, XMMRegister src, > >> + int shift, int vector_len) { > >> + if (opcode == Op_RotateLeftV) { > >> + if (etype == T_INT) { > >> + evprold(dst, src, shift, vector_len); > >> + } else { > >> + evprolq(dst, src, shift, vector_len); > >> + } > >> > >> Please, put an assert for the false case (assert(etype == T_LONG, > "...")). > >> > >> > >> * On testing (with previous version of the patch): -XX:UseAVX is x86- > >> specific flag, so new/adjusted tests now fail on non-x86 platforms. > >> Either omitting the flag or adding -XX:+IgnoreUnrecognizedVMOptions > >> will solve the issue. > >> > >> Best regards, > >> Vladimir Ivanov > >> > >>> > >>> > >>> Summary of changes: > >>> 1) Optimization is specifically targeted to exploit vector rotation > >> instruction added for X86 AVX512. A single rotate instruction > >> encapsulates entire vector OR/SHIFTs pattern thus offers better > >> latency at reduced instruction count. > >>> > >>> 2) There were two approaches to implement this: > >>> a) Let everything remain the same and add new wide complex > >> instruction patterns in the matcher for e.g. > >>> set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI > >>> shift)) > >> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate > >> shift)) > >>> It would have been an overoptimistic assumption to expect that > >>> graph > >> shape would be preserved till the matcher for correct inferencing. > >>> In addition we would have required multiple such bulky patterns. > >>> b) Create new RotateLeft/RotateRight scalar nodes, these gets > >> generated during intrinsification as well as during additional > >> pattern > >>> matching during node Idealization, later on these nodes are > >>> consumed > >> by SLP for valid vectorization scenarios to emit their vector > >>> counterparts which eventually emits vector rotates. 
> >>> > >>> 3) I choose approach 2b) since its cleaner, only problem here was > >>> that in non-evex mode (UseAVX < 3) new scalar Rotate nodes should > >>> either be > >> dismantled back to OR/SHIFT pattern or we penalize the vectorization > >> which would be very costly, other option would have been to add > >> additional vector rotate pattern for UseAVX=3 in the matcher which > >> emit vector OR-SHIFTs instruction but then it will loose on emitting > >> efficient instruction sequence which node sharing > >> (OrV/LShiftV/URShift) offer in current implementation - thus it will > >> not be beneficial for non-AVX512 targets, only saving will be in > >> terms of cleanup of few existing scalar rotate matcher patterns, also > >> old targets does not offer this powerful rotate instruction. > >> Therefore new scalar nodes are created only for AVX512 targets. > >>> > >>> As per suggestions constant folding scenarios have been covered > >>> during > >> Idealizations of newly added scalar nodes. > >>> > >>> Please review the latest version and share your feedback and test > >> results. > >>> > >>> Best Regards, > >>> Jatin > >>> > >>> > >>>> -----Original Message----- > >>>> From: Andrew Haley > >>>> Sent: Saturday, July 11, 2020 2:24 PM > >>>> To: Vladimir Ivanov ; Bhateja, Jatin > >>>> ; hotspot-compiler-dev at openjdk.java.net > >>>> Cc: Viswanathan, Sandhya > >>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification > >>>> for > >>>> X86 > >>>> > >>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: > >>>> > >>>> > High-level comment: so far, there were no pressing need in > > >>>> explicitly marking the methods as intrinsics. ROR/ROL instructions > >>>> > were selected during matching [1]. Now the patch introduces > > >>>> dedicated nodes > >>>> (RotateLeft/RotateRight) specifically for intrinsics > which > >>>> partly duplicates existing logic. 
> >>>>
> >>>> The lack of rotate nodes in the IR has always meant that AArch64
> >>>> doesn't generate optimal code for e.g.
> >>>>
> >>>> (Set dst (XorL reg1 (RotateLeftL reg2 imm)))
> >>>>
> >>>> because, with the RotateLeft expanded to its full combination of
> >>>> ORs and shifts, it's too complicated to match. At the time I put
> >>>> this to one side because it wasn't urgent. This is a shame because
> >>>> although such combinations are unusual they are used in some crypto
> operations.
> >>>>
> >>>> If we can generate immediate-form rotate nodes early by pattern
> >>>> matching during parsing (rather than depending on intrinsics) we'll
> >>>> get more value than by depending on programmers calling intrinsics.
> >>>>
> >>>> --
> >>>> Andrew Haley (he/him)
> >>>> Java Platform Lead Engineer
> >>>> Red Hat UK Ltd.
> >>>> https://keybase.io/andrewhaley
> >>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
> >>>

From eric.caspole at oracle.com Wed Jul 29 15:06:01 2020
From: eric.caspole at oracle.com (eric.caspole at oracle.com)
Date: Wed, 29 Jul 2020 11:06:01 -0400
Subject: RFR (S) - 8249663: LogCompilation cannot process log from o.r.scala.dotty.JmhDotty
Message-ID: <72cabad4-ee7d-f045-b2f9-5969c58abb4a@oracle.com>

Hi everyone,

Could I get reviews on this bug fix to the LogCompilation tool?

There were actually 2 problems. The first is that "site" in LogParser is not reset after a , and so the next parse could misuse the stale site, which led to the stack trace in the bug.

The second problem appeared after I put the first fix in place: that can have an uncommon trap reason='intrinsic_or_type_checked_inlining', so we need to record the location of that to process the uncommon trap that will follow it.

I tested this with logs from some jvm08's and a short run of all renaissance, and about 80 runs of the specific renaissance dotty that usually showed the first problem.
JBS: https://bugs.openjdk.java.net/browse/JDK-8249663
webrev: http://cr.openjdk.java.net/~ecaspole/JDK-8249663/01/webrev/

Thanks,
Eric

From vladimir.kozlov at oracle.com Wed Jul 29 16:03:13 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2020 09:03:13 -0700
Subject: RFR: 8250745: Fix a potential bug on AVX512 machines with assert(eval_map.contains(n)) failed: absent
In-Reply-To:
References: <6E0374A1-1E57-4FF3-A8B0-BB605E5E8F68@tencent.com>
Message-ID:

On 7/29/20 2:20 AM, Vladimir Ivanov wrote:
>> Webrev: http://cr.openjdk.java.net/~jiefu/8250745/webrev.00/
>
> Looks good.

+1

>
> FTR the bug was introduced by JDK-8241040, but I don't see a way it can be hit by auto-vectorizer: before it kicks in,
> scalar code is strongly normalized and constants are pushed to the right. It leads to the shape where (Replicate -1) is
> always the second input of bitwise NOT shape (XorV v (Replicate -1)). Since there are no GVN transformations happening
> for vector nodes, both left-hand and right-hand variants become possible with Vector API.

So it is difficult to write a test?

Thanks,
Vladimir K

>
> Best regards,
> Vladimir Ivanov

From vladimir.x.ivanov at oracle.com Wed Jul 29 16:18:39 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 29 Jul 2020 19:18:39 +0300
Subject: RFR: 8250745: Fix a potential bug on AVX512 machines with assert(eval_map.contains(n)) failed: absent
In-Reply-To:
References: <6E0374A1-1E57-4FF3-A8B0-BB605E5E8F68@tencent.com>
Message-ID:

>> FTR the bug was introduced by JDK-8241040, but I don't see a way it
>> can be hit by auto-vectorizer: before it kicks in, scalar code is
>> strongly normalized and constants are pushed to the right. It leads to
>> the shape where (Replicate -1) is always the second input of bitwise
>> NOT shape (XorV v (Replicate -1)). Since there are no GVN
>> transformations happening for vector nodes, both left-hand and
>> right-hand variants become possible with Vector API.
>
> So it is difficult to write a test?

IMO there's no way to hit the bug without using Vector API.

Best regards,
Vladimir Ivanov

From john.r.rose at oracle.com Wed Jul 29 16:48:00 2020
From: john.r.rose at oracle.com (John Rose)
Date: Wed, 29 Jul 2020 09:48:00 -0700
Subject: Performance degradation due to probable (?) C2 issue
In-Reply-To:
References: <925401595926726@mail.yandex.ru>
Message-ID: <5B377E17-952C-409B-98AB-2E6270A84185@oracle.com>

On Jul 28, 2020, at 5:12 AM, Andrew Haley wrote:
>
> ... This isn't necessarily true, but it's a good guess. ...

And that is the history of HotSpot heuristics, in a nutshell.

From luhenry at microsoft.com Wed Jul 29 16:55:29 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Wed, 29 Jul 2020 16:55:29 +0000
Subject: Adding an Intrinsic for MD5
Message-ID:

Hi,

After profiling some applications on Azure, I noticed that MD5 takes a significant amount of time when verifying the content of large amounts of downloaded data (see [1] for a flamegraph of some Spark operations pulling data from Azure Storage, look at the topmost `Lsun/security/pro..` entry representing 11.68% of the samples). I then looked into the code generated for `sun.security.provider.MD5.implCompress` (the hottest method). I observed that the generated code contains many branches that are never taken and not even necessary (array-bound checks on a fixed-size array for which we already checked the size, for example). On top of that, MD5 doesn't require any branches (there are no conditions and no loops), making all these branches pure overhead.

Accelerating MD5 will be beneficial not only to Azure workloads, but to anyone doing any sort of content hashing/verification with MD5 (which is quite unfortunate given the known flaws of MD5 and the availability of faster alternatives with greater cryptographical qualities).

I worked last night on a prototype of an intrinsic, which I've uploaded at [2].
It's a very rough draft and I want to have your input before I invest further into it. As it is the first time I do such work (adding an intrinsic, generating assembly by hand, adding support for one instruction in the assembler), I'm still running into a crash and I am not sure how to debug it further. I would really appreciate any pointer on how I need to approach debugging such an issue, or even for an expert to look into my change and help me pinpoint what's going wrong.

So far, I used the disassembly and hs_err*.log file to clearly see the generated code and the machine state at the time of the crash. I expect the problem to be around calling conventions and assumptions around the shape/content of the parameters. I'll keep debugging in the meantime.

Thank you very much,

--
Ludovic

[1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/flamegraph-45235.svg
[2] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/webrev.00/

From vladimir.x.ivanov at oracle.com Wed Jul 29 17:14:17 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Wed, 29 Jul 2020 20:14:17 +0300
Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
In-Reply-To:
References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com>
Message-ID:

> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/

Looks good. (Testing is in progress.)

>> I'd prefer to see a uniform Ideal IR shape being used irrespective of
>> whether the argument is a constant or not. It should also simplify the
>> logic in SuperWord and make it easier to support on non-x86 architectures.
>>
>> For example, here's how it is done on AArch64:
>>
>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{
>>   predicate(n->as_Vector()->length() == 4);
>>   match(Set dst (LShiftVI src (LShiftCntV shift))); ...
>>
>
> Graph shape has been made consistent, we could have also optimized the pattern for ARM port for
> immediate shifts.

Good.

> I have removed RotateLeftNode/RotateRightNode::Ideal routines since we are anyways
> doing constant folding in LShiftI/URShiftI value routines. Since the Java rotate APIs are no longer
> intrinsified, these routines may no longer be useful.

Nice observation! Good.

>> It would be really nice to migrate to MacroAssembler along the way (as a
>> cleanup).
>
> I guess you are saying remove opcodes/encoding from patterns and move them to Assembler.
> Can we take this cleanup activity separately since other patterns are also using these matcher
> directives.

I'm perfectly fine with handling it as a separate enhancement.

> Other synthetic comments have been taken care of. I have extended the Test to cover all the newly
> added scalar transforms. Kindly let me know if there are other comments.

Nice!

Best regards,
Vladimir Ivanov

>> -----Original Message-----
>> From: Vladimir Ivanov
>> Sent: Friday, July 24, 2020 3:21 AM
>> To: Bhateja, Jatin
>> Cc: Viswanathan, Sandhya ; Andrew Haley
>> ; hotspot-compiler-dev at openjdk.java.net
>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86
>>
>> Hi Jatin,
>>
>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/
>>
>> Much better! Thanks.
>>
>>> Change Summary:
>>>
>>> 1) Unified the handling for scalar rotate operation. All scalar rotate
>> selection patterns are now dependent on newly created
>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. Currently
>> if DAG nodes corresponding to a sub-pattern are shared (have multiple
>> users) then existing complex patterns based on Or/LShiftL/URShift do not
>> get matched and this prevents inferring rotate nodes. Please refer to
>> JIT'ed assembly output with baseline[1] and with patch[2]. We can see that
>> generated code size also went down from 832 bytes to 768 bytes. Also this
>> can cause perf degradation if shift-or dependency chain appears inside a
>> hot region.
>>> >>> 2) Due to enhanced rotate inferencing new patch shows better performance >> even for legacy targets (non AVX-512). Please refer to the perf result[3] >> over AVX2 machine for JMH benchmark part of the patch. >> >> Very nice! >>> 3) As suggested, removed Java API intrinsification changes and scalar >> rotate transformation are done during OrI/OrL node idealizations. >> >> Good. >> >> (Still would be nice to factor the matching code from Ideal() and share it >> between multiple use sites. Especially considering OrVNode::Ideal() now >> does basically the same thing. As an example/idea, take a look at >> is_bmi_pattern() in x86.ad.) >> >>> 4) SLP always gets to work on new scalar Rotate nodes and creates vector >> rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes if >> target does not supports vector rotates(non-AVX512). >> >> Good. >> >>> 5) Added new instruction patterns for vector shift Left/Right operations >> with constant shift operands. This prevents emitting extra moves to XMM. >> >> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >> + match(Set dst (LShiftVI src shift)); >> >> I'd prefer to see a uniform Ideal IR shape being used irrespective of >> whether the argument is a constant or not. It should also simplify the >> logic in SuperWord and make it easier to support on non-x86 architectures. >> >> For example, here's how it is done on AArch64: >> >> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >> predicate(n->as_Vector()->length() == 4); >> match(Set dst (LShiftVI src (LShiftCntV shift))); ... >> >>> 6) Constant folding scenarios are covered in RotateLeft/RotateRight >> idealization, inferencing of vector rotate through OrV idealization covers >> the vector patterns generated though non SLP route i.e. VectorAPI. >> >> I'm fine with keeping OrV::Ideal(), but I'm concerned with the general >> direction here - duplication of scalar transformations to lane-wise vector >> operations. 
It definitely won't scale and in a longer run it risks to >> diverge. Would be nice to find a way to automatically "lift" >> scalar transformations to vectors and apply them uniformly. But right now >> it is just an idea which requires more experimentation. >> >> >> Some other minor comments/suggestions: >> >> + // Swap the computed left and right shift counts. >> + if (is_rotate_left) { >> + Node* temp = shiftRCnt; >> + shiftRCnt = shiftLCnt; >> + shiftLCnt = temp; >> + } >> >> Maybe use swap() here (declared in globalDefinitions.hpp)? >> >> >> + if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >> + return true; >> >> Please, don't omit curly braces (even for simple cases). >> >> >> -// Rotate Right by variable >> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 zero, >> rFlagsReg cr) >> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) >> %{ >> - match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero >> shift)))); >> - >> + predicate(!VM_Version::supports_bmi2() && >> n->bottom_type()->basic_type() == T_INT); >> + match(Set dst (RotateRight dst shift)); >> + format %{ "rorl $dst, $shift" %} >> expand %{ >> - rorI_rReg_CL(dst, shift, cr); >> + rorI_rReg_imm8(dst, shift, cr); >> %} >> >> It would be really nice to migrate to MacroAssembler along the way (as a >> cleanup). >> >>> Please push the patch through your testing framework and let me know your >> review feedback. >> >> There's one new assertion failure: >> >> # Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >> pid=5476, tid=6219 >> # assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize should >> return new nodes, use Identity to return old nodes >> >> I believe it comes from RotateLeftNode::Ideal/RotateRightNode::Ideal >> which can return pre-contructed constants. 
I suggest to get rid of >> Ideal() methods and move constant folding logic into Node::Value() (as >> implemented for other bitwise/arithmethic nodes in >> addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more generic approach >> since it enables richer type information (ranges vs constants) and IMO it's >> more convenient to work with constants through Types than ConNodes. >> >> (I suspect that original/expanded IR shape may already provide more precise >> type info for non-constant case which can affect the benchmarks.) >> >> Best regards, >> Vladimir Ivanov >> >>> >>> Best Regards, >>> Jatin >>> >>> [1] >>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. >>> txt [2] >>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx2_asm >>> .txt [3] >>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new_patc >>> h.txt >>> >>> >>>> -----Original Message----- >>>> From: Vladimir Ivanov >>>> Sent: Saturday, July 18, 2020 12:25 AM >>>> To: Bhateja, Jatin ; Andrew Haley >>>> >>>> Cc: Viswanathan, Sandhya ; >>>> hotspot-compiler- dev at openjdk.java.net >>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>>> X86 >>>> >>>> Hi Jatin, >>>> >>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>> >>>> It definitely looks better, but IMO it hasn't reached the sweet spot >> yet. >>>> It feels like the focus is on auto-vectorizer while the burden is put >>>> on scalar cases. >>>> >>>> First of all, considering GVN folds relevant operation patterns into >>>> a single Rotate node now, what's the motivation to introduce intrinsics? >>>> >>>> Another point is there's still significant duplication for scalar cases. >>>> >>>> I'd prefer to see the legacy cases which rely on pattern matching to >>>> go away and be substituted with instructions which match Rotate >>>> instructions (migrating ). 
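For context, the legacy matcher rules mentioned above recognize the scalar shift/or rotate idiom. The following is a minimal Java sketch of that idiom, checked against the library rotate methods. The class and method names (RotateIdiom, rotlShiftOr, rotrShiftOr) are illustrative assumptions and do not come from the patch under review.

```java
public class RotateIdiom {
    // Shift/or form of a 32-bit left rotate. Java masks int shift counts to
    // their low 5 bits, so (32 - s) behaves like (-s) and s == 0 is safe.
    public static int rotlShiftOr(int x, int s) {
        return (x << s) | (x >>> (32 - s));
    }

    // Same idiom for a 64-bit right rotate (long shift counts use the low 6 bits).
    public static long rotrShiftOr(long x, int s) {
        return (x >>> s) | (x << (64 - s));
    }

    public static void main(String[] args) {
        int[] samples = {0, 1, 0x80000001, 0x12345678, -1};
        for (int x : samples) {
            for (int s = 0; s < 32; s++) {
                if (rotlShiftOr(x, s) != Integer.rotateLeft(x, s)) {
                    throw new AssertionError("mismatch for x=" + x + ", s=" + s);
                }
            }
        }
        long y = 0x0123456789ABCDEFL;
        for (int s = 0; s < 64; s++) {
            if (rotrShiftOr(y, s) != Long.rotateRight(y, s)) {
                throw new AssertionError("mismatch for s=" + s);
            }
        }
        System.out.println("shift/or idiom agrees with Integer.rotateLeft and Long.rotateRight");
    }
}
```

Whether C2 folds such an expression into a single Rotate node depends on the patch version and target being discussed; the sketch only shows that the two source-level forms compute the same value.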
>>>> >>>> I understand that it will penalize the vectorization implementation, >>>> but IMO reducing overall complexity is worth it. On auto-vectorizer >>>> side, I see >>>> 2 ways to fix it: >>>> >>>> (1) introduce additional AD instructions for >>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>> >>>> (2) in SuperWord::output(), when matcher doesn't support >>>> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), >>>> generate vectorized version of the original pattern. >>>> >>>> Overall, it looks like more and more focus is made on scalar part. >>>> Considering the main goal of the patch is to enable vectorization, >>>> I'm fine with separating cleanup of scalar part. As an interim >>>> solution, it seems that leaving the scalar part as it is now and >>>> matching scalar bit rotate pattern in VectorNode::is_rotate() should >>>> be enough to keep the vectorization part functioning. Then scalar >>>> Rotate nodes and relevant cleanups can be integrated later. (Or vice >>>> versa: clean up scalar part first and then follow up with >>>> vectorization.) >>>> >>>> Some other comments: >>>> >>>> * There's a lot of duplication between OrINode::Ideal and >> OrLNode::Ideal. >>>> What do you think about introducing a super type >>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>> >>>> >>>> * src/hotspot/cpu/x86/x86.ad >>>> >>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>> + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT >> || >>>> + n->bottom_type()->is_vect()->element_basic_type() == >>>> +T_LONG); >>>> >>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>> + predicate(n->bottom_type()->is_vect()->element_basic_type() == T_INT >> || >>>> + n->bottom_type()->is_vect()->element_basic_type() == >>>> +T_LONG); >>>> >>>> The predicates are redundant here. 
>>>> >>>> >>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>> >>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, >>>> XMMRegister dst, XMMRegister src, >>>> + int shift, int vector_len) { >>>> + if (opcode == Op_RotateLeftV) { >>>> + if (etype == T_INT) { >>>> + evprold(dst, src, shift, vector_len); >>>> + } else { >>>> + evprolq(dst, src, shift, vector_len); >>>> + } >>>> >>>> Please, put an assert for the false case (assert(etype == T_LONG, >> "...")). >>>> >>>> >>>> * On testing (with previous version of the patch): -XX:UseAVX is x86- >>>> specific flag, so new/adjusted tests now fail on non-x86 platforms. >>>> Either omitting the flag or adding -XX:+IgnoreUnrecognizedVMOptions >>>> will solve the issue. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>>> >>>>> >>>>> Summary of changes: >>>>> 1) Optimization is specifically targeted to exploit vector rotation >>>> instruction added for X86 AVX512. A single rotate instruction >>>> encapsulates entire vector OR/SHIFTs pattern thus offers better >>>> latency at reduced instruction count. >>>>> >>>>> 2) There were two approaches to implement this: >>>>> a) Let everything remain the same and add new wide complex >>>> instruction patterns in the matcher for e.g. >>>>> set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI >>>>> shift)) >>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate >>>> shift)) >>>>> It would have been an overoptimistic assumption to expect that >>>>> graph >>>> shape would be preserved till the matcher for correct inferencing. >>>>> In addition we would have required multiple such bulky patterns. >>>>> b) Create new RotateLeft/RotateRight scalar nodes, these gets >>>> generated during intrinsification as well as during additional >>>> pattern >>>>> matching during node Idealization, later on these nodes are >>>>> consumed >>>> by SLP for valid vectorization scenarios to emit their vector >>>>> counterparts which eventually emits vector rotates. 
>>>>> >>>>> 3) I chose approach 2b) since it's cleaner; the only problem here was >>>>> that in non-evex mode (UseAVX < 3) new scalar Rotate nodes should >>>>> either be >>>> dismantled back to OR/SHIFT pattern or we penalize the vectorization >>>> which would be very costly; the other option would have been to add >>>> additional vector rotate pattern for UseAVX=3 in the matcher which >>>> emit vector OR-SHIFTs instruction but then it will lose on emitting >>>> efficient instruction sequence which node sharing >>>> (OrV/LShiftV/URShift) offer in current implementation - thus it will >>>> not be beneficial for non-AVX512 targets, only saving will be in >>>> terms of cleanup of few existing scalar rotate matcher patterns, also >>>> old targets do not offer this powerful rotate instruction. >>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>> >>>>> As per suggestions constant folding scenarios have been covered >>>>> during >>>> Idealizations of newly added scalar nodes. >>>>> >>>>> Please review the latest version and share your feedback and test >>>> results. >>>>> >>>>> Best Regards, >>>>> Jatin >>>>> >>>>> >>>>>> -----Original Message----- >>>>>> From: Andrew Haley >>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>> To: Vladimir Ivanov ; Bhateja, Jatin >>>>>> ; hotspot-compiler-dev at openjdk.java.net >>>>>> Cc: Viswanathan, Sandhya >>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification >>>>>> for >>>>>> X86 >>>>>> >>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>> >>>>>> > High-level comment: so far, there was no pressing need for > >>>>>> explicitly marking the methods as intrinsics. ROR/ROL instructions >>>>>>> were selected during matching [1]. Now the patch introduces > >>>>>> dedicated nodes >>>>>> (RotateLeft/RotateRight) specifically for intrinsics > which >>>>>> partly duplicates existing logic.
>>>>>> >>>>>> The lack of rotate nodes in the IR has always meant that AArch64 >>>>>> doesn't generate optimal code for e.g. >>>>>> >>>>>> (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>> >>>>>> because, with the RotateLeft expanded to its full combination of >>>>>> ORs and shifts, it's too complicated to match. At the time I put >>>>>> this to one side because it wasn't urgent. This is a shame because >>>>>> although such combinations are unusual they are used in some crypto >> operations. >>>>>> >>>>>> If we can generate immediate-form rotate nodes early by pattern >>>>>> matching during parsing (rather than depending on intrinsics) we'll >>>>>> get more value than by depending on programmers calling intrinsics. >>>>>> >>>>>> -- >>>>>> Andrew Haley (he/him) >>>>>> Java Platform Lead Engineer >>>>>> Red Hat UK Ltd. >>>>>> https://keybase.io/andrewhaley >>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>> From vladimir.x.ivanov at oracle.com Wed Jul 29 17:15:35 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 29 Jul 2020 20:15:35 +0300 Subject: RFR (S) - 8249663: LogCompilation cannot process log from o.r.scala.dotty.JmhDotty In-Reply-To: <72cabad4-ee7d-f045-b2f9-5969c58abb4a@oracle.com> References: <72cabad4-ee7d-f045-b2f9-5969c58abb4a@oracle.com> Message-ID: <9013c6c8-563d-b220-4cc9-f846df76fd2e@oracle.com> > http://cr.openjdk.java.net/~ecaspole/JDK-8249663/01/webrev/ Looks good.
Best regards, Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Jul 29 18:17:25 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 29 Jul 2020 11:17:25 -0700 Subject: RFR (S) - 8249663: LogCompilation cannot process log from o.r.scala.dotty.JmhDotty In-Reply-To: <9013c6c8-563d-b220-4cc9-f846df76fd2e@oracle.com> References: <72cabad4-ee7d-f045-b2f9-5969c58abb4a@oracle.com> <9013c6c8-563d-b220-4cc9-f846df76fd2e@oracle.com> Message-ID: <8df2e1bc-d27b-1de2-10e6-d9d3b0ec1532@oracle.com> +1 Thanks, Vladimir K On 7/29/20 10:15 AM, Vladimir Ivanov wrote: > >> http://cr.openjdk.java.net/~ecaspole/JDK-8249663/01/webrev/ > > Looks good. > > Best regards, > Vladimir Ivanov From sandhya.viswanathan at intel.com Wed Jul 29 18:19:14 2020 From: sandhya.viswanathan at intel.com (Viswanathan, Sandhya) Date: Wed, 29 Jul 2020 18:19:14 +0000 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: Hi, Likewise, the corresponding x86 backend changes since first review are also only minor cleanups and simple bug fixes: X86: Full: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.01/ Incremental: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00-webrev.01/ Summary: - rebased to jdk/jdk tip; - backend changes related to removal of NotV, VLShiftV, VRShiftV, VURShiftV nodes; - vector insert bug fix - some minor cleanups Older webrev links for your reference: X86b backend: http://cr.openjdk.java.net/~sviswanathan/VAPI_RFR/x86_webrev/webrev.00/ Best Regards, Sandhya -----Original Message----- From: Vladimir Ivanov Sent: Tuesday, July 28, 2020 3:30 PM To: hotspot-dev ; hotspot compiler Cc: Viswanathan, Sandhya ; panama-dev Subject: Re: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes Hi, Thanks for the feedback on webrev.00, Remi, 
Coleen, Vladimir K., and Ekaterina! Here are the latest changes for Vector API support in HotSpot shared code: http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 Incremental changes (diff against webrev.00): http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 I decided to post it here and not initiate a new round of reviews because the changes are mostly limited to minor cleanups / simple bug fixes. Detailed summary: - rebased to jdk/jdk tip; - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; - restore lazy cleanup logic during incremental inlining (see needs_cleanup in compile.cpp); - got rid of x86-specific changes in shared code; - fix for 8244867 [1]; - fix Graal test failure: enumerate VectorSupport intrinsics in CheckGraalIntrinsics - numerous minor cleanups Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 http://jbs.oracle.com/browse/JDK-8244867 8244867: 2 vector api tests crash with assert(is_reference_type(basic_type())) failed: wrong type Summary: Adding safety checks to prevent intrinsification if class arguments of non-primitive types are uninitialized. On 04.04.2020 02:12, Vladimir Ivanov wrote: > Hi, > > Following up on review requests of API [0] and Java implementation [1] > for Vector API (JEP 338 [2]), here's a request for review of general > HotSpot changes (in shared code) required for supporting the API: > > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar > ed/webrev.00/all.00-03/ > > > (First of all, to set proper expectations: since the JEP is still in > Candidate state, the intention is to initiate preliminary round(s) of > review to inform the community and gather feedback before sending out > final/official RFRs once the JEP is Targeted to a release.) > > Vector API (being developed in Project Panama [3]) relies on JVM > support to utilize optimal vector hardware instructions at runtime. 
It > interacts with JVM through intrinsics (declared in > jdk.internal.vm.vector.VectorSupport [4]) which expose vector > operations support in C2 JIT-compiler. > > As Paul wrote earlier: "A vector intrinsic is an internal low-level > vector operation. The last argument to the intrinsic is fall back > behavior in Java, implementing the scalar operation over the number of > elements held by the vector. Thus, if the intrinsic is not supported > in > C2 for the other arguments then the Java implementation is executed > (the Java implementation is always executed when running in the > interpreter or for C1)." > > The rest of JVM support is about aggressively optimizing vector boxes > to minimize (ideally eliminate) the overhead of boxing for vector values. > It's a stopgap solution for vector box elimination problem until > inline classes arrive. Vector classes are value-based and in the > longer term will be migrated to inline classes once the support becomes available. > > Vector API talk from JVMLS'18 [5] contains brief overview of JVM > implementation and some details. > > Complete implementation resides in vector-unstable branch of > panama/dev repository [6]. > > Now to gory details (the patch is split in multiple "sub-webrevs"): > > =========================================================== > > (1) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ > > > Ideal vector nodes for new operations introduced by Vector API. > > (Platform-specific back end support will be posted for review separately). > > =========================================================== > > (2) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ > > > JVM Java interface (VectorSupport) and intrinsic support in C2. > > Vector instances are initially represented as VectorBox macro nodes > and "unboxing" is represented by VectorUnbox node.
It simplifies > vector box elimination analysis and the nodes are expanded later right before EA pass. > > Vectors have 2-level on-heap representation: for the vector value > primitive array is used as a backing storage and it is encapsulated in > a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains a > int[8] instance which is used to store vector value). > > Unless VectorBox node goes away, it needs to be expanded into an > allocation eventually, but it is a pure node and doesn't have any JVM > state associated with it. The problem is solved by keeping JVM state > separately in a VectorBoxAllocate node associated with VectorBox node > and use it during expansion. > > Also, to simplify vector box elimination, inlining of vector reboxing > calls (VectorSupport::maybeRebox) is delayed until the analysis is over. > > =========================================================== > > (3) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shar > ed/webrev.00/02.vbox_elimination/ > > > Vector box elimination analysis implementation. (Brief overview: > slides > #36-42 [5].) > > The main part is devoted to scalarization across safepoints and > rematerialization support during deoptimization. In C2-generated code > vector operations work with raw vector values which live in registers > or spilled on the stack and it allows to avoid boxing/unboxing when a > vector value is alive across a safepoint. As with other values, > there's just a location of the vector value at the safepoint and > vector type information recorded in the relevant nmethod metadata and > all the heavy-lifting happens only when rematerialization takes place. > > The analysis preserves object identity invariants except during > aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing). > > (Aggressive reboxing is crucial for cases when vectors "escape": it > allocates a fresh instance at every escape point thus enabling > original instance to go away.) 
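The two-level on-heap representation and the scalar fallback argument described in this message can be pictured with a small plain-Java sketch. Everything below is hypothetical: IntVec8 stands in for a typed wrapper such as Int256Vector, and binaryOp is a simplified stand-in for an intrinsic entry point whose last argument is the Java fallback. It is not the VectorSupport API, and it takes a per-lane operator only to keep the example short.

```java
import java.util.function.IntBinaryOperator;

public class VectorBoxSketch {
    // Hypothetical typed wrapper: the vector value lives in a primitive
    // array (the backing storage), encapsulated by the wrapper object.
    public static final class IntVec8 {
        public final int[] payload;

        public IntVec8(int[] payload) {
            if (payload.length != 8) {
                throw new IllegalArgumentException("expected 8 lanes");
            }
            this.payload = payload.clone();
        }
    }

    // Hypothetical intrinsic-style entry point. A JIT that supports the
    // operation could replace the whole call with a vector instruction;
    // otherwise this scalar loop over the lanes is what runs.
    public static IntVec8 binaryOp(IntVec8 a, IntVec8 b, IntBinaryOperator fallback) {
        int[] result = new int[8];
        for (int lane = 0; lane < 8; lane++) {
            result[lane] = fallback.applyAsInt(a.payload[lane], b.payload[lane]);
        }
        return new IntVec8(result);
    }

    public static void main(String[] args) {
        IntVec8 a = new IntVec8(new int[] {1, 2, 3, 4, 5, 6, 7, 8});
        IntVec8 b = new IntVec8(new int[] {10, 20, 30, 40, 50, 60, 70, 80});
        IntVec8 sum = binaryOp(a, b, (x, y) -> x + y);
        System.out.println(java.util.Arrays.toString(sum.payload));
    }
}
```

In this picture, box elimination corresponds to never materializing the IntVec8 wrapper or its array when the value can stay in a vector register.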
> > =========================================================== > > (4) > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/ > > > HotSpot changes for jdk.incubator.vector module. Vector support is > marked experimental and turned off by default. JEP 338 proposes the > API to be released as an incubator module, so a user has to specify > "--add-modules jdk.incubator.vector" on the command line to be able to > use it. > When user does that, JVM automatically enables Vector API support. > It improves usability (user doesn't need to separately "open" the API > and enable JVM support) while minimizing risks of destabilization > from new code when the API is not used. > > > That's it! Will be happy to answer any questions. > > And thanks in advance for any feedback! > > Best regards, > Vladimir Ivanov > > [0] > https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html > > > [1] > https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html > > [2] https://openjdk.java.net/jeps/338 > > [3] https://openjdk.java.net/projects/panama/ > > [4] > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html > > > [5] > http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf > > [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 > > $ hg clone http://hg.openjdk.java.net/panama/dev/ -b vector-unstable From luhenry at microsoft.com Wed Jul 29 19:13:32 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Wed, 29 Jul 2020 19:13:32 +0000 Subject: Adding an Intrinsic for MD5 In-Reply-To: References: Message-ID: To add some more information, I've uploaded one of the `hs_err_pid*.log` files at [1].
-- Ludovic [1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/hs_err_pid28286.log -----Original Message----- From: hotspot-compiler-dev On Behalf Of Ludovic Henry Sent: Wednesday, July 29, 2020 9:55 AM To: hotspot-compiler-dev at openjdk.java.net Subject: Adding an Intrinsic for MD5 Hi, After doing profiling on some applications on Azure, I noticed that MD5 takes a significant time when verifying the content of large amount of downloaded data (see [1] for a flamegraph of some Spark operations pulling data from Azure Storage, look at the top most `Lsun/securitu/pro..` entry representing 11.68% of the samples). I then looked into the code generated for `sun.security.provider.MD5.implCompress` (the hottest method). I observed that the generated code contains many branches that are never taken and not even necessary (array-bound checks on a fixed sized array for which we already checked the size, for example). On top of that, MD5 doesn't require any (there are no conditions and no loops), making all these branches pure overhead. Accelerating MD5 will not be only beneficial to Azure workloads, but to anyone doing any sort of content hashing/verification with MD5 (which is quite unfortunate given the known flaws of MD5 and the availability of faster alternatives with greater cryptographical qualities). I worked last night on a prototype of an intrinsic, which I've uploaded at [2]. It's a very rough draft and I want to have your input before I invest further into it. As it is the first time I do such work (adding an intrinsic, generating assembly by hand, adding support for one instruction in the assembler), I'm still running into a crash and I am not sure how to debug it further. I would really appreciate any pointer on how I need to approach debugging such an issue, or even for an expert to look into my change and help me pinpoint what's going wrong. 
So far, I used the disassembly and hs_err*.log file to clearly see the generated code and the machine state at the time of the crash. I expect the problem to be around calling conventions and assumptions around the shape/content of the parameters. I'll keep debugging in the meantime. Thank you very much, -- Ludovic [1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/flamegraph-45235.svg [2] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/webrev.00/ From igor.ignatyev at oracle.com Wed Jul 29 19:34:02 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 29 Jul 2020 12:34:02 -0700 Subject: RFR(T) : 8250797 : remove CompileReason::Reason_CTW Message-ID: <2C1589D3-2FEC-411A-8CC2-DF184593BD25@oracle.com> http://cr.openjdk.java.net/~iignatyev//8250797/webrev.00 > 5 lines changed: 0 ins; 4 del; 1 mod; Hi all, could you please review this patch? from JBS: > "native" CTW has been removed by JDK-8213812 (JDK-8214917), so CompileReason::Reason_CTW isn't used anymore and should be removed. besides removing CompileReason::Reason_CTW and corresponding element from reason_names[] array, the patch also updates the comment for CompileReason as CompileTask::can_become_stale doesn't really depend on the order.
webrev: JBS: https://bugs.openjdk.java.net/browse/JDK-8250797 -- Igor From ekaterina.pavlova at oracle.com Wed Jul 29 19:44:32 2020 From: ekaterina.pavlova at oracle.com (Ekaterina Pavlova) Date: Wed, 29 Jul 2020 12:44:32 -0700 Subject: RFR(T) : 8250797 : remove CompileReason::Reason_CTW In-Reply-To: <2C1589D3-2FEC-411A-8CC2-DF184593BD25@oracle.com> References: <2C1589D3-2FEC-411A-8CC2-DF184593BD25@oracle.com> Message-ID: <1948381d-5e52-1f6d-2ad1-a8e445bdea5a@oracle.com> Looks good. -katya On 7/29/20 12:34 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8250797/webrev.00 >> 5 lines changed: 0 ins; 4 del; 1 mod; > > Hi all, > > could you please review this patch? > from JBS: >> "native" CTW has been removed by JDK-8213812 (JDK-8214917), so CompileReason::Reason_CTW isn't used anymore and should be removed. > > besides removing CompileReason::Reason_CTW and corresponding element from reason_names[] array, the patch also updates the comment for CompileReason as CompileTask::can_become_stale doesn't really depend on the order. > > webrev: > JBS: https://bugs.openjdk.java.net/browse/JDK-8250797 > > -- Igor > From dean.long at oracle.com Wed Jul 29 19:48:01 2020 From: dean.long at oracle.com (Dean Long) Date: Wed, 29 Jul 2020 12:48:01 -0700 Subject: [15] RFR(XS) 8248597: [Graal] api/java_security/SignatureSpi/DelegationTests.html fails with Method "javasoft.sqe.tests.api.java.security.SignatureSpi.JCKSignatureSpi.clear" doesn't exist. Message-ID: <688dfe7e-c8ad-3c37-d2f7-432609f45e2f@oracle.com> https://bugs.openjdk.java.net/browse/JDK-8248597 http://cr.openjdk.java.net/~dlong/8248597/webrev/ This change fixes an issue with frame states in Graal that was causing a JCK test to fail. The fix is from Tom Rodriguez. This change has already been reviewed and merged into Graal. A new unit test was added to detect the problem. After this is reviewed I'll be requesting permission to push this into 15.
dl From vladimir.x.ivanov at oracle.com Wed Jul 29 20:08:05 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Wed, 29 Jul 2020 23:08:05 +0300 Subject: RFR: 8250745: Fix a potential bug on AVX512 machines with assert(eval_map.contains(n)) failed: absent In-Reply-To: References: <6E0374A1-1E57-4FF3-A8B0-BB605E5E8F68@tencent.com> Message-ID: FYI test results are clean (hs-precheckin-comp,hs-tier1,hs-tier2). Best regards, Vladimir Ivanov On 29.07.2020 12:20, Vladimir Ivanov wrote: >> Webrev: http://cr.openjdk.java.net/~jiefu/8250745/webrev.00/ > > Looks good. > > FTR the bug was introduced by JDK-8241040, but I don't see a way it can > be hit by auto-vectorizer: before it kicks in, scalar code is strongly > normalized and constants are pushed to the right. It leads to the shape > where (Replicate -1) is always the second input of bitwise NOT shape > (XorV v (Replicate -1)). Since there are no GVN transformations > happening for vector nodes, both left-hand and right-hand variants > become possible with Vector API. > > Best regards, > Vladimir Ivanov From vladimir.kozlov at oracle.com Wed Jul 29 20:49:20 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 29 Jul 2020 13:49:20 -0700 Subject: [15] RFR(XS) 8248597: [Graal] api/java_security/SignatureSpi/DelegationTests.html fails with Method "javasoft.sqe.tests.api.java.security.SignatureSpi.JCKSignatureSpi.clear" doesn't exist. In-Reply-To: <688dfe7e-c8ad-3c37-d2f7-432609f45e2f@oracle.com> References: <688dfe7e-c8ad-3c37-d2f7-432609f45e2f@oracle.com> Message-ID: Looks good. Thanks, Vladimir On 7/29/20 12:48 PM, Dean Long wrote: > https://bugs.openjdk.java.net/browse/JDK-8248597 > http://cr.openjdk.java.net/~dlong/8248597/webrev/ > > This change fixes an issue with frame states in Graal that was causing a JCK test to fail. The fix is from Tom > Rodriguez. This change has already been reviewed and merged into Graal. A new unit test was added to detect the > problem.
After this is reviewed I'll be requesting permission to push this into 15. > > dl From dean.long at oracle.com Wed Jul 29 20:52:43 2020 From: dean.long at oracle.com (Dean Long) Date: Wed, 29 Jul 2020 13:52:43 -0700 Subject: [15] RFR(XS) 8248597: [Graal] api/java_security/SignatureSpi/DelegationTests.html fails with Method "javasoft.sqe.tests.api.java.security.SignatureSpi.JCKSignatureSpi.clear" doesn't exist. In-Reply-To: References: <688dfe7e-c8ad-3c37-d2f7-432609f45e2f@oracle.com> Message-ID: Thanks Vladimir. dl On 7/29/20 1:49 PM, Vladimir Kozlov wrote: > Looks good. > > Thanks, > Vladimir > > On 7/29/20 12:48 PM, Dean Long wrote: >> https://bugs.openjdk.java.net/browse/JDK-8248597 >> http://cr.openjdk.java.net/~dlong/8248597/webrev/ >> >> This change fixes an issue with frame states in Graal that was >> causing a JCK test to fail. The fix is from Tom Rodriguez. This >> change has already been reviewed and merged into Graal. A new unit >> test was added to detect the problem. After this is reviewed I'll be >> requesting permission to push this into 15. >> >> dl From vladimir.kozlov at oracle.com Wed Jul 29 20:54:31 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 29 Jul 2020 13:54:31 -0700 Subject: RFR(T) : 8250797 : remove CompileReason::Reason_CTW In-Reply-To: <2C1589D3-2FEC-411A-8CC2-DF184593BD25@oracle.com> References: <2C1589D3-2FEC-411A-8CC2-DF184593BD25@oracle.com> Message-ID: <18e52d8d-afe9-7ccb-1d3a-ae1b37a2b8d8@oracle.com> Igor, You missed a reference in should_wait_for_compilation(). Thanks, Vladimir K On 7/29/20 12:34 PM, Igor Ignatyev wrote: > http://cr.openjdk.java.net/~iignatyev//8250797/webrev.00 >> 5 lines changed: 0 ins; 4 del; 1 mod; > > Hi all, > > could you please review this patch? > from JBS: >> "native" CTW has been removed by JDK-8213812 (JDK-8214917), so CompileReason::Reason_CTW isn't used anymore and should be removed.
> > besides removing CompileReason::Reason_CTW and corresponding element from reason_names[] array, the patch also updates the comment for CompileReason as CompileTask::can_become_stale doesn't really depend on the order. > > webrev: > JBS: https://bugs.openjdk.java.net/browse/JDK-8250797 > > -- Igor > From xxinliu at amazon.com Wed Jul 29 20:55:53 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Wed, 29 Jul 2020 20:55:53 +0000 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> <1595969785292.62158@amazon.com> , Message-ID: <1596056152748.75196@amazon.com> hi, Volker and Tobias, Here is a new revision. http://cr.openjdk.java.net/~xliu/8249809/02/webrev/ 1. This one adds comments about this smart pointer and fixes the formatting issue. 2. Thanks for pointing me to the new document on HotSpot code style. Since it has been updated to -std=c++14, I changed all NULL to nullptr. 3. I also added NON_COPYABLE because it's not intended to be copied. DirectiveSetPtr is just a thin wrapper of the raw pointer. If users only use it to read, nothing will be cloned. It simply goes through. thanks, --lx ________________________________________ From: Volker Simonis Sent: Wednesday, July 29, 2020 7:34 AM To: Tobias Hartmann Cc: Liu, Xin; hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init On Wed, Jul 29, 2020 at 9:38 AM Tobias Hartmann wrote: > > Hi Xin, > > On 28.07.20 22:56, Liu, Xin wrote: > > http://cr.openjdk.java.net/~xliu/8249809/01/webrev/ > > Overall looks good to me.
> > Some style comments: > - Add a comment to 'DirectiveSetPtr' to describe its purpose > - Why not put the "cloned" logic in "operator->"? Because there's also a "read-only" access of the DirectiveSetPtr which doesn't mutate its content and therefore should not clone the underlying DirectiveSet. See my first mail where I proposed to add a second, `const`-version of "operator->". But that still required const casts in the places where we didn't want to clone. I've therefore voted for the new "cloned()" method which makes cloning and mutating explicit and which is much easier to understand from my point of view (compared to two overloaded operators). > - Do not use the _clone pointer as boolean (see "Miscellaneous" section in the style guide [1]) > - Indentation in line 301-303 is wrong > - Line 306 use brackets around the "else" and move it one line up "} else {" > > Best regards, > Tobias > > [1] https://hg.openjdk.java.net/jdk/jdk/raw-file/tip/doc/hotspot-style.html From vladimir.x.ivanov at oracle.com Wed Jul 29 22:00:26 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Thu, 30 Jul 2020 01:00:26 +0300 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> Message-ID: <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> >> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ > > Looks good. (Testing is in progress.) FYI test results are clean (tier1-tier5). >> I have removed RotateLeftNode/RotateRightNode::Ideal routines since we >> are anyways >> doing constant folding in LShiftI/URShiftI value routines. Since JAVA >> rotate APIs are no longer >> intrinsified hence these routines may no longer be useful. > > Nice observation! Good. On second thought, it seems there's still a chance left that Rotate nodes get their input type narrowed after the folding happened. For example, as a result of incremental inlining or CFG transformations during loop optimizations.
And it does happen in practice since the testing revealed some crashes due to the bug in RotateLeftNode/RotateRightNode::Ideal(). So, it makes sense to keep the transformations. But I'm fine with addressing that as a followup enhancement. Best regards, Vladimir Ivanov > >>> It would be really nice to migrate to MacroAssembler along the way (as a >>> cleanup). >> >> I guess you are saying remove opcodes/encoding from patterns and move >> then to Assembler, >> Can we take this cleanup activity separately since other patterns are >> also using these matcher >> directives. > > I'm perfectly fine with handling it as a separate enhancement. > >> Other synthetic comments have been taken care of. I have extended the >> Test to cover all the newly >> added scalar transforms. Kindly let me know if there other comments. > > Nice! > > Best regards, > Vladimir Ivanov > >>> -----Original Message----- >>> From: Vladimir Ivanov >>> Sent: Friday, July 24, 2020 3:21 AM >>> To: Bhateja, Jatin >>> Cc: Viswanathan, Sandhya ; Andrew Haley >>> ; hotspot-compiler-dev at openjdk.java.net >>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 >>> >>> Hi Jatin, >>> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>> >>> Much better! Thanks. >>> >>>> Change Summary: >>>> >>>> 1) Unified the handling for scalar rotate operation. All scalar rotate >>> selection patterns are now dependent on newly created >>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>> Currently >>> if DAG nodes corresponding to a sub-pattern are shared (have multiple >>> users) then existing complex patterns based on Or/LShiftL/URShift >>> does not >>> get matched and this prevents inferring rotate nodes. Please refer to >>> JIT'ed assembly output with baseline[1] and with patch[2] . We can >>> see that >>> generated code size also went done from 832 byte to 768 bytes. Also this >>> can cause perf degradation if shift-or dependency chain appears inside a >>> hot region. 
>>>>
>>>> 2) Due to enhanced rotate inferencing new patch shows better performance even for legacy targets (non AVX-512). Please refer to the perf result[3] over AVX2 machine for JMH benchmark part of the patch.
>>>
>>> Very nice!
>>>
>>>> 3) As suggested, removed Java API intrinsification changes and scalar rotate transformation are done during OrI/OrL node idealizations.
>>>
>>> Good.
>>>
>>> (Still would be nice to factor the matching code from Ideal() and share it
>>> between multiple use sites. Especially considering OrVNode::Ideal() now
>>> does basically the same thing. As an example/idea, take a look at
>>> is_bmi_pattern() in x86.ad.)
>>>
>>>> 4) SLP always gets to work on new scalar Rotate nodes and creates vector rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes if target does not support vector rotates (non-AVX512).
>>>
>>> Good.
>>>
>>>> 5) Added new instruction patterns for vector shift Left/Right operations with constant shift operands. This prevents emitting extra moves to XMM.
>>>
>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{
>>> +  match(Set dst (LShiftVI src shift));
>>>
>>> I'd prefer to see a uniform Ideal IR shape being used irrespective of
>>> whether the argument is a constant or not. It should also simplify the
>>> logic in SuperWord and make it easier to support on non-x86
>>> architectures.
>>>
>>> For example, here's how it is done on AArch64:
>>>
>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{
>>>     predicate(n->as_Vector()->length() == 4);
>>>     match(Set dst (LShiftVI src (LShiftCntV shift))); ...
>>>
>>>> 6) Constant folding scenarios are covered in RotateLeft/RotateRight idealization, inferencing of vector rotate through OrV idealization covers the vector patterns generated through non SLP route i.e. VectorAPI.
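
For readers following the thread: a minimal sketch (not taken from any of the webrevs) of why the Or/LShift/URShift shape discussed here is a rotate. It models Java's 32-bit shift semantics, where only the low five bits of the shift count are used, and mirrors the matcher shape `(OrI (URShiftI dst shift) (LShiftI dst (SubI zero shift)))`. The helper names `shl32`, `ushr32`, `rotl_via_or` and `rotr_via_or` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Java-style shifts on 32-bit values: only the low 5 bits of the count are
// used. Masking also keeps the C++ shifts well defined.
inline std::uint32_t shl32(std::uint32_t x, std::uint32_t count) {
  return x << (count & 31u);
}

inline std::uint32_t ushr32(std::uint32_t x, std::uint32_t count) {
  return x >> (count & 31u);
}

// (x << s) | (x >>> (0 - s)): the bits shifted out on the left re-enter on
// the right, which is a rotate left by s.
inline std::uint32_t rotl_via_or(std::uint32_t x, std::uint32_t s) {
  return shl32(x, s) | ushr32(x, 0u - s);
}

// (x >>> s) | (x << (0 - s)): the mirrored shape, a rotate right by s.
inline std::uint32_t rotr_via_or(std::uint32_t x, std::uint32_t s) {
  return ushr32(x, s) | shl32(x, 0u - s);
}
```

Because the count is masked, a rotate by 0 or by a multiple of 32 degenerates to `x | x`, so the shape needs no special case for those counts.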
>>>
>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the general
>>> direction here - duplication of scalar transformations to lane-wise vector
>>> operations. It definitely won't scale and in the longer run it risks
>>> diverging. Would be nice to find a way to automatically "lift"
>>> scalar transformations to vectors and apply them uniformly. But right now
>>> it is just an idea which requires more experimentation.
>>>
>>>
>>> Some other minor comments/suggestions:
>>>
>>> +  // Swap the computed left and right shift counts.
>>> +  if (is_rotate_left) {
>>> +    Node* temp = shiftRCnt;
>>> +    shiftRCnt  = shiftLCnt;
>>> +    shiftLCnt  = temp;
>>> +  }
>>>
>>> Maybe use swap() here (declared in globalDefinitions.hpp)?
>>>
>>>
>>> +  if (Matcher::match_rule_supported_vector(vopc, vlen, bt))
>>> +    return true;
>>>
>>> Please, don't omit curly braces (even for simple cases).
>>>
>>>
>>> -// Rotate Right by variable
>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 zero, rFlagsReg cr)
>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr)
>>>    %{
>>> -  match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero shift))));
>>> -
>>> +  predicate(!VM_Version::supports_bmi2() && n->bottom_type()->basic_type() == T_INT);
>>> +  match(Set dst (RotateRight dst shift));
>>> +  format %{ "rorl     $dst, $shift" %}
>>>      expand %{
>>> -    rorI_rReg_CL(dst, shift, cr);
>>> +    rorI_rReg_imm8(dst, shift, cr);
>>>      %}
>>>
>>> It would be really nice to migrate to MacroAssembler along the way (as a
>>> cleanup).
>>>
>>>> Please push the patch through your testing framework and let me know your review feedback.
>>>
>>> There's one new assertion failure:
>>>
>>> #  Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), pid=5476, tid=6219
>>> # 
assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize should >>> return new nodes, use Identity to return old nodes >>> >>> I believe it comes from RotateLeftNode::Ideal/RotateRightNode::Ideal >>> which can return pre-contructed constants. I suggest to get rid of >>> Ideal() methods and move constant folding logic into Node::Value() (as >>> implemented for other bitwise/arithmethic nodes in >>> addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more generic approach >>> since it enables richer type information (ranges vs constants) and >>> IMO it's >>> more convenient to work with constants through Types than ConNodes. >>> >>> (I suspect that original/expanded IR shape may already provide more >>> precise >>> type info for non-constant case which can affect the benchmarks.) >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>> >>>> Best Regards, >>>> Jatin >>>> >>>> [1] >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. >>>> txt [2] >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx2_asm >>>> .txt [3] >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new_patc >>>> h.txt >>>> >>>> >>>>> -----Original Message----- >>>>> From: Vladimir Ivanov >>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>> To: Bhateja, Jatin ; Andrew Haley >>>>> >>>>> Cc: Viswanathan, Sandhya ; >>>>> hotspot-compiler- dev at openjdk.java.net >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>>>> X86 >>>>> >>>>> Hi Jatin, >>>>> >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>> >>>>> It definitely looks better, but IMO it hasn't reached the sweet spot >>> yet. >>>>> It feels like the focus is on auto-vectorizer while the burden is put >>>>> on scalar cases. >>>>> >>>>> First of all, considering GVN folds relevant operation patterns into >>>>> a single Rotate node now, what's the motivation to introduce >>>>> intrinsics? 
>>>>> >>>>> Another point is there's still significant duplication for scalar >>>>> cases. >>>>> >>>>> I'd prefer to see the legacy cases which rely on pattern matching to >>>>> go away and be substituted with instructions which match Rotate >>>>> instructions (migrating ). >>>>> >>>>> I understand that it will penalize the vectorization implementation, >>>>> but IMO reducing overall complexity is worth it. On auto-vectorizer >>>>> side, I see >>>>> 2 ways to fix it: >>>>> >>>>> ???? (1) introduce additional AD instructions for >>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>> >>>>> ???? (2) in SuperWord::output(), when matcher doesn't support >>>>> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), >>>>> generate vectorized version of the original pattern. >>>>> >>>>> Overall, it looks like more and more focus is made on scalar part. >>>>> Considering the main goal of the patch is to enable vectorization, >>>>> I'm fine with separating cleanup of scalar part. As an interim >>>>> solution, it seems that leaving the scalar part as it is now and >>>>> matching scalar bit rotate pattern in VectorNode::is_rotate() should >>>>> be enough to keep the vectorization part functioning. Then scalar >>>>> Rotate nodes and relevant cleanups can be integrated later. (Or vice >>>>> versa: clean up scalar part first and then follow up with >>>>> vectorization.) >>>>> >>>>> Some other comments: >>>>> >>>>> * There's a lot of duplication between OrINode::Ideal and >>> OrLNode::Ideal. >>>>> What do you think about introducing a super type >>>>> (OrNode) and put a unified version (OrNode::Ideal) there? >>>>> >>>>> >>>>> * src/hotspot/cpu/x86/x86.ad >>>>> >>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() == >>>>> T_INT >>> || >>>>> +??????????? 
n->bottom_type()->is_vect()->element_basic_type() == >>>>> +T_LONG); >>>>> >>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() == >>>>> T_INT >>> || >>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() == >>>>> +T_LONG); >>>>> >>>>> The predicates are redundant here. >>>>> >>>>> >>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>> >>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, >>>>> XMMRegister dst, XMMRegister src, >>>>> +???????????????????????????????????? int shift, int vector_len) { >>>>> + if (opcode == Op_RotateLeftV) { >>>>> +??? if (etype == T_INT) { >>>>> +????? evprold(dst, src, shift, vector_len); >>>>> +??? } else { >>>>> +????? evprolq(dst, src, shift, vector_len); >>>>> +??? } >>>>> >>>>> Please, put an assert for the false case (assert(etype == T_LONG, >>> "...")). >>>>> >>>>> >>>>> * On testing (with previous version of the patch): -XX:UseAVX is x86- >>>>> specific flag, so new/adjusted tests now fail on non-x86 platforms. >>>>> Either omitting the flag or adding -XX:+IgnoreUnrecognizedVMOptions >>>>> will solve the issue. >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> >>>>>> >>>>>> Summary of changes: >>>>>> 1) Optimization is specifically targeted to exploit vector rotation >>>>> instruction added for X86 AVX512. A single rotate instruction >>>>> encapsulates entire vector OR/SHIFTs pattern thus offers better >>>>> latency at reduced instruction count. >>>>>> >>>>>> 2) There were two approaches to implement this: >>>>>> ?????? a)? Let everything remain the same and add new wide complex >>>>> instruction patterns in the matcher for e.g. >>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI >>>>>> shift)) >>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate >>>>> shift)) >>>>>> ?????? 
It would have been an overoptimistic assumption to expect that >>>>>> graph >>>>> shape would be preserved till the matcher for correct inferencing. >>>>>> ?????? In addition we would have required multiple such bulky >>>>>> patterns. >>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, these gets >>>>> generated during intrinsification as well as during additional >>>>> pattern >>>>>> ?????? matching during node Idealization, later on these nodes are >>>>>> consumed >>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>> ?????? counterparts which eventually emits vector rotates. >>>>>> >>>>>> 3) I choose approach 2b) since its cleaner, only problem here was >>>>>> that in non-evex mode (UseAVX < 3) new scalar Rotate nodes should >>>>>> either be >>>>> dismantled back to OR/SHIFT pattern or we penalize the vectorization >>>>> which would be very costly, other option would have been to add >>>>> additional vector rotate pattern for UseAVX=3 in the matcher which >>>>> emit vector OR-SHIFTs instruction but then it will loose on emitting >>>>> efficient instruction sequence which node sharing >>>>> (OrV/LShiftV/URShift) offer in current implementation - thus it will >>>>> not be beneficial for non-AVX512 targets, only saving will be in >>>>> terms of cleanup of few existing scalar rotate matcher patterns, also >>>>> old targets does not offer this powerful rotate instruction. >>>>> Therefore new scalar nodes are created only for AVX512 targets. >>>>>> >>>>>> As per suggestions constant folding scenarios have been covered >>>>>> during >>>>> Idealizations of newly added scalar nodes. >>>>>> >>>>>> Please review the latest version and share your feedback and test >>>>> results. 
>>>>>> >>>>>> Best Regards, >>>>>> Jatin >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Andrew Haley >>>>>>> Sent: Saturday, July 11, 2020 2:24 PM >>>>>>> To: Vladimir Ivanov ; Bhateja, Jatin >>>>>>> ; hotspot-compiler-dev at openjdk.java.net >>>>>>> Cc: Viswanathan, Sandhya >>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification >>>>>>> for >>>>>>> X86 >>>>>>> >>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote: >>>>>>> >>>>>>> ??? > High-level comment: so far, there were no pressing need in? > >>>>>>> explicitly marking the methods as intrinsics. ROR/ROL instructions >>>>>>>> were selected during matching [1]. Now the patch introduces? > >>>>>>> dedicated nodes >>>>>>> (RotateLeft/RotateRight) specifically for intrinsics? > which >>>>>>> partly duplicates existing logic. >>>>>>> >>>>>>> The lack of rotate nodes in the IR has always meant that AArch64 >>>>>>> doesn't generate optimal code for e.g. >>>>>>> >>>>>>> ????? (Set dst (XorL reg1 (RotateLeftL reg2 imm))) >>>>>>> >>>>>>> because, with the RotateLeft expanded to its full combination of >>>>>>> ORs and shifts, it's to complicated to match. At the time I put >>>>>>> this to one side because it wasn't urgent. This is a shame because >>>>>>> although such combinations are unusual they are used in some crypto >>> operations. >>>>>>> >>>>>>> If we can generate immediate-form rotate nodes early by pattern >>>>>>> matching during parsing (rather than depending on intrinsics) we'll >>>>>>> get more value than by depending on programmers calling intrinsics. >>>>>>> >>>>>>> -- >>>>>>> Andrew Haley? (he/him) >>>>>>> Java Platform Lead Engineer >>>>>>> Red Hat UK Ltd. 
>>>>>>> https://keybase.io/andrewhaley >>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 >>>>>> From igor.ignatyev at oracle.com Wed Jul 29 22:24:05 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Wed, 29 Jul 2020 15:24:05 -0700 Subject: RFR(T) : 8250797 : remove CompileReason::Reason_CTW In-Reply-To: <18e52d8d-afe9-7ccb-1d3a-ae1b37a2b8d8@oracle.com> References: <2C1589D3-2FEC-411A-8CC2-DF184593BD25@oracle.com> <18e52d8d-afe9-7ccb-1d3a-ae1b37a2b8d8@oracle.com> Message-ID: <87C23499-6F1E-4B89-BFF6-CC9269C1C279@oracle.com> oopsie, removed, http://cr.openjdk.java.net/~iignatyev//8250797/webrev.01 : > diff -r e5afd04596e7 src/hotspot/share/compiler/compileTask.hpp > --- a/src/hotspot/share/compiler/compileTask.hpp Wed Jul 29 15:02:31 2020 -0700 > +++ b/src/hotspot/share/compiler/compileTask.hpp Wed Jul 29 15:02:53 2020 -0700 > @@ -133,7 +133,6 @@ > bool should_wait_for_compilation() const { > // Wait for blocking compilation to finish. > switch (_compile_reason) { > - case Reason_CTW: > case Reason_Replay: > case Reason_Whitebox: > case Reason_Bootstrap: thanks for noticing. for luck, I've started builds-tier1 job. -- Igor > On Jul 29, 2020, at 1:54 PM, Vladimir Kozlov wrote: > > Igor, > > You missed reference in should_wait_for_compilation(). > > Thanks, > Vladimir K > > On 7/29/20 12:34 PM, Igor Ignatyev wrote: >> http://cr.openjdk.java.net/~iignatyev//8250797/webrev.00 >>> 5 lines changed: 0 ins; 4 del; 1 mod; >> Hi all, >> could you please review this patch? >> from JBS: >>> "native" CTW has been removed by JDK-8213812 (JDK-8214917), so CompileReason::Reason_CTW isn't used anymore and should be removed. >> besides removing CompileReason::Reason_CTW and corresponding element from reason_names[] array, the patch also updates the comment for CompileReason as CompileTask::can_become_stale doesn't really depend on the order. 
>> webrev:
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8250797
>> -- Igor

From vladimir.kozlov at oracle.com Wed Jul 29 22:45:50 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 29 Jul 2020 15:45:50 -0700
Subject: RFR(T) : 8250797 : remove CompileReason::Reason_CTW
In-Reply-To: <87C23499-6F1E-4B89-BFF6-CC9269C1C279@oracle.com>
References: <87C23499-6F1E-4B89-BFF6-CC9269C1C279@oracle.com>
Message-ID: <27CCBBD2-A2DE-40AA-A894-BB257EFF95B5@oracle.com>

Good.

Thanks
Vladimir

> On Jul 29, 2020, at 3:24 PM, Igor Ignatyev wrote:
>
> oopsie, removed, http://cr.openjdk.java.net/~iignatyev//8250797/webrev.01 :
>
>> diff -r e5afd04596e7 src/hotspot/share/compiler/compileTask.hpp
>> --- a/src/hotspot/share/compiler/compileTask.hpp Wed Jul 29 15:02:31 2020 -0700
>> +++ b/src/hotspot/share/compiler/compileTask.hpp Wed Jul 29 15:02:53 2020 -0700
>> @@ -133,7 +133,6 @@
>>    bool should_wait_for_compilation() const {
>>      // Wait for blocking compilation to finish.
>>      switch (_compile_reason) {
>> -        case Reason_CTW:
>>          case Reason_Replay:
>>          case Reason_Whitebox:
>>          case Reason_Bootstrap:
>
> thanks for noticing. for luck, I've started builds-tier1 job.
>
> -- Igor
>
>> On Jul 29, 2020, at 1:54 PM, Vladimir Kozlov wrote:
>>
>> Igor,
>>
>> You missed reference in should_wait_for_compilation().
>>
>> Thanks,
>> Vladimir K
>>
>>> On 7/29/20 12:34 PM, Igor Ignatyev wrote:
>>> http://cr.openjdk.java.net/~iignatyev//8250797/webrev.00
>>>> 5 lines changed: 0 ins; 4 del; 1 mod;
>>> Hi all,
>>> could you please review this patch?
>>> from JBS:
>>>> "native" CTW has been removed by JDK-8213812 (JDK-8214917), so CompileReason::Reason_CTW isn't used anymore and should be removed.
>>> besides removing CompileReason::Reason_CTW and corresponding element from reason_names[] array, the patch also updates the comment for CompileReason as CompileTask::can_become_stale doesn't really depend on the order.
>>> webrev:
>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8250797
>>> -- Igor
>

From vladimir.x.ivanov at oracle.com Wed Jul 29 23:10:34 2020
From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov)
Date: Thu, 30 Jul 2020 02:10:34 +0300
Subject: Adding an Intrinsic for MD5
In-Reply-To:
References:
Message-ID: <5445fe9c-f113-be29-3835-e342e10623db@oracle.com>

Hi Ludovic,

It's a crash due to an out-of-bounds Java heap access (right at the upper heap boundary). Something is wrong either with the initial buf value (r15) or with the limit check:

  166   if (multi_block) {
  167     // increment data pointer and loop if more to process
  168     addptr(buf, 64);
  169     movptr(rsi, ofs);
  170     addptr(rsi, 64);
  171     movptr(ofs, rsi);
  172     cmpptr(rsi, limit);
  173     jcc(Assembler::belowEqual, loop0);
  174   }

From the hs_err log:

# SIGSEGV (0xb) at pc=0x00007f34f10354a1, pid=28286, tid=28305

siginfo: si_signo: 11 (SIGSEGV), si_code: 2 (SEGV_ACCERR), si_addr: 0x0000000510800000

0x00007f34f10354a1: add 0x18(%r15),%ecx

R15=0x00000005107fffe8 points into unknown readable memory: 0x0000000000000000 | 00 00 00 00 00 00 00 00

| 100|0x0000000510000000, 0x0000000510800000, 0x0000000510800000|100%| E|CS|TAMS 0x0000000510000000, 0x0000000510000000| Complete

Regarding ways to debug it, I'd put a breakpoint right at the beginning of the stub first to validate that parameters are valid. Then I'd dump parameters on stack in order to simplify post-mortem analysis. (If the problem is with limit check, then many iterations should pass before it reaches the end of the Java heap.)

Also, inserting debug checks in the stub itself can catch an inconsistency much closer to the actual place where the bug lurks.

Best regards,
Vladimir Ivanov

On 29.07.2020 22:13, Ludovic Henry wrote:
> To add some more information, I've uploaded one of the `hs_err_pid*.log` files at [1].
>
> --
> Ludovic
>
> [1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/hs_err_pid28286.log
>
> -----Original Message-----
> From: hotspot-compiler-dev On Behalf Of Ludovic Henry
> Sent: Wednesday, July 29, 2020 9:55 AM
> To: hotspot-compiler-dev at openjdk.java.net
> Subject: Adding an Intrinsic for MD5
>
> Hi,
>
> After doing profiling on some applications on Azure, I noticed that MD5 takes a significant amount of time when verifying the content of large amounts of downloaded data (see [1] for a flamegraph of some Spark operations pulling data from Azure Storage, look at the top most `Lsun/securitu/pro..` entry representing 11.68% of the samples). I then looked into the code generated for `sun.security.provider.MD5.implCompress` (the hottest method). I observed that the generated code contains many branches that are never taken and not even necessary (array-bound checks on a fixed sized array for which we already checked the size, for example). On top of that, MD5 doesn't require any branches (there are no conditions and no loops), making all these branches pure overhead. Accelerating MD5 will be beneficial not only to Azure workloads, but to anyone doing any sort of content hashing/verification with MD5 (which is quite unfortunate given the known flaws of MD5 and the availability of faster alternatives with greater cryptographic qualities).
>
> I worked last night on a prototype of an intrinsic, which I've uploaded at [2]. It's a very rough draft and I want to have your input before I invest further into it.
>
> As it is the first time I do such work (adding an intrinsic, generating assembly by hand, adding support for one instruction in the assembler), I'm still running into a crash and I am not sure how to debug it further. I would really appreciate any pointer on how I need to approach debugging such an issue, or even for an expert to look into my change and help me pinpoint what's going wrong.
So far, I used the disassembly and hs_err*.log file to clearly see the generated code and the machine state at the time of the crash. I expect the problem to be around calling conventions and assumptions around the shape/content of the parameters. I'll keep debugging in the meantime.
>
> Thank you very much,
>
> --
> Ludovic
>
> [1] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/flamegraph-45235.svg
> [2] http://cr.openjdk.java.net/~burban/luhenry/md5-intrinsics/webrev.00/
>

From jiefu at tencent.com Wed Jul 29 23:09:34 2020
From: jiefu at tencent.com (jiefu(傅杰))
Date: Wed, 29 Jul 2020 23:09:34 +0000
Subject: RFR: 8250745: Fix a potential bug on AVX512 machines with assert(eval_map.contains(n)) failed: absent
Message-ID:

Thanks Vladimir Ivanov and Vladimir Kozlov for your review.

Will push it several hours later (since the hotspot code review must be there for at least 24 hours).

Best regards,
Jie

On 2020/7/30, 4:07 AM, "Vladimir Ivanov" wrote:

FYI test results are clean (hs-precheckin-comp,hs-tier1,hs-tier2).

Best regards,
Vladimir Ivanov

On 29.07.2020 12:20, Vladimir Ivanov wrote:
>> Webrev: http://cr.openjdk.java.net/~jiefu/8250745/webrev.00/
>
> Looks good.
>
> FTR the bug was introduced by JDK-8241040, but I don't see a way it can
> be hit by auto-vectorizer: before it kicks in, scalar code is strongly
> normalized and constants are pushed to the right.
It leads to the shape
> where (Replicate -1) is always the second input of bitwise NOT shape
> (XorV v (Replicate -1)). Since there are no GVN transformations
> happening for vector nodes, both left-hand and right-hand variants
> become possible with Vector API.
>
> Best regards,
> Vladimir Ivanov

From dean.long at oracle.com Wed Jul 29 23:48:23 2020
From: dean.long at oracle.com (Dean Long)
Date: Wed, 29 Jul 2020 16:48:23 -0700
Subject: Adding an Intrinsic for MD5
In-Reply-To: <5445fe9c-f113-be29-3835-e342e10623db@oracle.com>
References: <5445fe9c-f113-be29-3835-e342e10623db@oracle.com>
Message-ID:

Does this cmp have the lhs and rhs reversed?

dl

On 7/29/20 4:10 PM, Vladimir Ivanov wrote:
> 172 cmpptr(rsi, limit);

From igor.ignatyev at oracle.com Wed Jul 29 23:56:17 2020
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Wed, 29 Jul 2020 16:56:17 -0700
Subject: RFR(T) : 8250797 : remove CompileReason::Reason_CTW
In-Reply-To: <27CCBBD2-A2DE-40AA-A894-BB257EFF95B5@oracle.com>
References: <87C23499-6F1E-4B89-BFF6-CC9269C1C279@oracle.com> <27CCBBD2-A2DE-40AA-A894-BB257EFF95B5@oracle.com>
Message-ID: <75838403-3E5D-4A7F-87A7-A8A6B905D095@oracle.com>

Vladimir, Katya,

Thanks for your reviews, pushed.

-- Igor

> On Jul 29, 2020, at 3:45 PM, Vladimir Kozlov wrote:
>
> Good.
>
> Thanks
> Vladimir
>
>> On Jul 29, 2020, at 3:24 PM, Igor Ignatyev wrote:
>>
>> oopsie, removed, http://cr.openjdk.java.net/~iignatyev//8250797/webrev.01 :
>>
>>> diff -r e5afd04596e7 src/hotspot/share/compiler/compileTask.hpp
>>> --- a/src/hotspot/share/compiler/compileTask.hpp Wed Jul 29 15:02:31 2020 -0700
>>> +++ b/src/hotspot/share/compiler/compileTask.hpp Wed Jul 29 15:02:53 2020 -0700
>>> @@ -133,7 +133,6 @@
>>>    bool should_wait_for_compilation() const {
>>>      // Wait for blocking compilation to finish.
>>>      switch (_compile_reason) {
>>> -        case Reason_CTW:
>>>          case Reason_Replay:
>>>          case Reason_Whitebox:
>>>          case Reason_Bootstrap:
>>
>> thanks for noticing.
for luck, I've started builds-tier1 job. >> >> -- Igor >> >>> On Jul 29, 2020, at 1:54 PM, Vladimir Kozlov wrote: >>> >>> Igor, >>> >>> You missed reference in should_wait_for_compilation(). >>> >>> Thanks, >>> Vladimir K >>> >>>> On 7/29/20 12:34 PM, Igor Ignatyev wrote: >>>> http://cr.openjdk.java.net/~iignatyev//8250797/webrev.00 >>>>> 5 lines changed: 0 ins; 4 del; 1 mod; >>>> Hi all, >>>> could you please review this patch? >>>> from JBS: >>>>> "native" CTW has been removed by JDK-8213812 (JDK-8214917), so CompileReason::Reason_CTW isn't used anymore and should be removed. >>>> besides removing CompileReason::Reason_CTW and corresponding element from reason_names[] array, the patch also updates the comment for CompileReason as CompileTask::can_become_stale doesn't really depend on the order. >>>> webrev: >>>> JBS: https://bugs.openjdk.java.net/browse/JDK-8250797 >>>> -- Igor >> > From ningsheng.jian at arm.com Thu Jul 30 01:53:07 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Thu, 30 Jul 2020 09:53:07 +0800 Subject: [aarch64-port-dev ] RFR (XXL): 8223347: Integration of Vector API (Incubator): AArch64 backend changes In-Reply-To: <852a3a09-a627-c0fc-89c6-8c8100ae17f5@redhat.com> References: <275eb57c-51c0-675e-c32a-91b198023559@redhat.com> <719F9169-ABC4-408E-B732-F1BD9A84337F@oracle.com> <9a13f5df-d946-579d-4282-917dc7338dc8@redhat.com> <09BC0693-80E0-4F87-855E-0B38A6F5EFA2@oracle.com> <668e500e-f621-5a2c-a41e-f73536880f73@redhat.com> <1909fa9d-98bb-c2fb-45d8-540247d1ca8b@redhat.com> <2acbcc99-8dd4-b8f1-5982-1d439953c416@redhat.com> <54d6b2b6-b79a-4700-981c-6ab33aca82f2@arm.com> <852a3a09-a627-c0fc-89c6-8c8100ae17f5@redhat.com> Message-ID: <564c8283-0c8f-9487-af3c-c971fa6b736d@arm.com> On 7/29/20 7:44 PM, Andrew Haley wrote: > On 20/07/2020 04:51, Ningsheng Jian wrote: >> Since we are getting ready to propose Vector API target to JDK 16 [1]. 
I >> have regenerated webrev of aarch64 backend parts from panama repo, which >> has been rebased to jdk/jdk very recently, by: >> >> $ hg update vector-unstable && hg diff -r default > all.patch >> $ grep "diff -r" all.patch | grep -e "src/hotspot/cpu/aarch64" | awk >> '{print $4}' > aarch64_list >> $ ksh ./webrev.ksh -r default -o aarch64_webrev aarch64_list >> >> The new webrev: >> http://cr.openjdk.java.net/~njian/vectorapi/8223347-integration/aarch64-webrev.01/ >> >> Could you please help to take a look? > > OK, thanks. It all looks fine. Sorry for the delay. > Thank you Andrew! Regards, Ningsheng From ningsheng.jian at arm.com Thu Jul 30 06:22:35 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Thu, 30 Jul 2020 14:22:35 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> Message-ID: <5f6dc64c-f51a-50d4-995f-ed0c7a7724e8@arm.com> Hi, Pengfei helped to review the patch offline and found that some multiply-add/sub and popcount match rules are missing for SVE. Added in the new webrev. Thanks to Pengfei! New webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.03 Incremental changes: http://cr.openjdk.java.net/~njian/8231441/webrev.03-vs-02/ Split parts: 1) SVE feature detection: http://cr.openjdk.java.net/~njian/8231441/webrev.03-feature 2) c2 register allocation: http://cr.openjdk.java.net/~njian/8231441/webrev.03-ra 3) SVE c2 backend: http://cr.openjdk.java.net/~njian/8231441/webrev.03-c2 Thanks, Ningsheng On 7/21/20 2:05 PM, Ningsheng Jian wrote: > [Ping] > > Could anyone please help to review this patch, especially for the c2 > register allocation part? 
> > JBS: https://bugs.openjdk.java.net/browse/JDK-8231441 > > The latest webrev: > http://cr.openjdk.java.net/~njian/8231441/webrev.02 > > In the latest webrev, we block one predicate register (p7) with all > elements preset to TRUE, so that c2 compiled code can use it freely to > generate instructions for unpredicated operations. > > And the split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.02-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.02-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.02-c2 > > The initial RFR which has some descriptions of the patch: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-March/037628.html > > The description can also be found at: > http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt > > Notes to verify the patch on QEMU user emulation, with an example of > compiled code: > http://cr.openjdk.java.net/~njian/8231441/running-sve-in-qemu-user.txt > > Thanks, > Ningsheng > > > On 5/27/20 3:23 PM, Ningsheng Jian wrote: >> Hi, >> >> I have rebased this patch with some more comments added. And also >> relaxed the instruction matching conditions for 128-bit vector. >> >> I would appreciate if someone could help to review this. >> >> Whole patch: >> http://cr.openjdk.java.net/~njian/8231441/webrev.01 >> >> Different parts of changes: >> >> 1) SVE feature detection >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-feature >> >> 2) c2 registion allocation >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-ra >> >> 3) SVE c2 backend >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-c2 >> >> (Or should I split this into different JBS?) >> >> Thanks, >> Ningsheng >> >> On 3/25/20 2:37 PM, Ningsheng Jian wrote: >>> Hi, >>> >>> Could you please help to review this patch adding AArch64 SVE support? >>> It also touches c2 compiler shared code. 
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441
>>> Webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.00
>>>
>>> Arm has released a new vector ISA extension for AArch64, SVE [1] and
>>> SVE2 [2]. This patch adds the initial SVE support in OpenJDK. In this
>>> patch we have:
>>>
>>> 1) SVE feature enablement and detection
>>> 2) SVE vector register allocation support with initial predicate
>>> register definition
>>> 3) SVE c2 backend for current SLP based vectorizer. (We also have a POC
>>> patch of a new vectorizer using SVE predicate-driven loop control, but
>>> that's still under development.)
>>>
>>> SVE register definition
>>> =======================
>>> Unlike other SIMD architectures, SVE allows hardware implementations to
>>> choose a vector register length from 128 to 2048 bits, in multiples of
>>> 128 bits. So we introduce a new vector type VectorA, i.e. a length-agnostic
>>> (scalable) vector type, and Op_VecA for the machine vectora register. In
>>> the meantime, to minimize register allocation code changes, we also take
>>> advantage of one JIT compiler aspect: at compile time we actually know
>>> the real hardware SVE vector register size of the currently running
>>> machine. So the register allocator actually knows how many register
>>> slots an Op_VecA ideal reg requires, and can work fine without much
>>> modification.
>>>
>>> Since the bottom 128 bits are shared with NEON, we extend the current
>>> register mask definition of the V0-V31 registers. Currently, c2 uses one
>>> mask bit per 32-bit register slot, so to define at most 2048 bits we
>>> would need to add 64 slots in the AD file. That's a really large number,
>>> and would also break the current regmask assumption. Considering that the
>>> SVE vector register is architecturally scalable for different sizes, we
>>> just define double the original NEON vector register slots, i.e. 8 slots:
>>> Vx, Vx_H, Vx_J ... Vx_O. After adlc, the generated register masks now
>>> look like:
>>>
>>> const RegMask _VECTORA_REG_mask( 0x0, 0x0, 0xffffffff, 0xffffffff,
>>> 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, ...
>>>
>>> const RegMask _VECTORD_REG_mask( 0x0, 0x0, 0x3030303, 0x3030303,
>>> 0x3030303, 0x3030303, 0x3030303, 0x3030303, ...
>>>
>>> const RegMask _VECTORX_REG_mask( 0x0, 0x0, 0xf0f0f0f, 0xf0f0f0f,
>>> 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, ...
>>>
>>> And we use SlotsPerVecA to indicate the regmask bit size for a VecA
>>> register.
>>>
>>> Although the register allocator does not need to know the real VecA
>>> register size for physical register allocation, when doing spill/unspill
>>> it needs to know the actual stack slot size to store/load VecA registers.
>>> SVE is able to do vector size agnostic spilling, but to minimize the code
>>> changes, as I mentioned before, we just let RA know the actual vector
>>> register size of the currently running machine, by calling
>>> scalable_vector_reg_size().
>>>
>>> In the meantime, some vector operations, e.g. vector multiply and vector
>>> load/store, have no unpredicated SVE1 instructions, only predicated
>>> versions. So we have also defined predicate registers in this patch, and
>>> the c2 register allocator will allocate a temp predicate register to
>>> fulfill the expected unpredicated operations. This can also be used for
>>> a future predicate-driven vectorizer. This is not efficient for now, as
>>> we can see many ptrue instructions in the generated code. One possible
>>> solution I can see is to block one predicate register and preset it to
>>> all true. But preserving/reinitializing a caller-save register value
>>> across calls seems too risky for this patch. I decided to defer it to
>>> further optimization work. If anyone has any suggestions on this, I
>>> would appreciate it.
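The mask constants quoted above follow from the slot layout the mail describes: one mask bit per 32-bit slot and 8 slots per V register. Reading the constants, a 64-bit VecD sets 2 of those bits, a 128-bit VecX sets 4, and a VecA sets all 8. A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical, and this is an illustration rather than code from the patch or from adlc.

```java
// Illustrative only: reproduces the 32-bit mask words shown in the mail,
// assuming one mask bit per 32-bit slot and 8 slots per V register.
class RegMaskSketch {
    static final int SLOT_BITS = 32;     // one regmask bit covers a 32-bit slot
    static final int SLOTS_PER_REG = 8;  // Vx, Vx_H, Vx_J ... Vx_O

    // Builds one 32-bit mask word in which each register occupies
    // SLOTS_PER_REG bits and its low `slotsUsed` bits are set.
    static int maskWord(int slotsUsed) {
        int perRegister = (1 << slotsUsed) - 1;
        int word = 0;
        for (int reg = 0; reg < Integer.SIZE / SLOTS_PER_REG; reg++) {
            word |= perRegister << (reg * SLOTS_PER_REG);
        }
        return word;
    }

    public static void main(String[] args) {
        // A full 2048-bit register would need this many 32-bit slots.
        System.out.println("slots for 2048 bits: " + 2048 / SLOT_BITS);
        System.out.println("VecD word: 0x" + Integer.toHexString(maskWord(64 / SLOT_BITS)));
        System.out.println("VecX word: 0x" + Integer.toHexString(maskWord(128 / SLOT_BITS)));
        System.out.println("VecA word: 0x" + Integer.toHexString(maskWord(SLOTS_PER_REG)));
    }
}
```

Each 32-bit mask word therefore describes four V registers, which is why the same word repeats across the V0-V31 range in the generated masks.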
>>>
>>> SVE feature detection
>>> =====================
>>> Since we may have some compiled code based on the initially detected SVE
>>> vector register length, and that code is compiled only for that vector
>>> register length, we assume that the SVE vector register length will not
>>> change during the JVM lifetime. However, the SVE vector length is
>>> per-thread and can be changed by a system call [3], so we need to make
>>> sure that each JNI call does not change the SVE vector length.
>>>
>>> Currently, we verify the SVE vector register length on each JNI return,
>>> and if an SVE vector length change is detected, the JVM simply reports
>>> an error and stops running. The VM's running vector length can also be
>>> set by the existing VM option MaxVectorSize with c2 enabled. If the
>>> specified MaxVectorSize differs from the system default SVE vector
>>> length (in /proc/sys/abi/sve_default_vector_length), the JVM will set
>>> the current process's SVE vector length to the specified vector length.
>>>
>>> Compiled code
>>> =============
>>> We have added all current c2 backend codegen on par with NEON, but only
>>> for vector lengths larger than 128 bits.
>>>
>>> On a 1024-bit SVE environment, for the following simple loop with int
>>> array element type:
>>>
>>>     for (int i = 0; i < LENGTH; i++) {
>>>       c[i] = a[i] + b[i];
>>>     }
>>>
>>> c2 generated loop:
>>>
>>>     0x0000ffff811c0820:   sbfiz   x11, x10, #2, #32
>>>     0x0000ffff811c0824:   add     x13, x18, x11
>>>     0x0000ffff811c0828:   add     x14, x1, x11
>>>     0x0000ffff811c082c:   add     x13, x13, #0x10
>>>     0x0000ffff811c0830:   add     x14, x14, #0x10
>>>     0x0000ffff811c0834:   add     x11, x0, x11
>>>     0x0000ffff811c0838:   add     x11, x11, #0x10
>>>     0x0000ffff811c083c:   ptrue   p1.s    // To be optimized
>>>     0x0000ffff811c0840:   ld1w    {z16.s}, p1/z, [x14]
>>>     0x0000ffff811c0844:   ptrue   p0.s
>>>     0x0000ffff811c0848:   ld1w    {z17.s}, p0/z, [x13]
>>>     0x0000ffff811c084c:   add     z16.s, z17.s, z16.s
>>>     0x0000ffff811c0850:   ptrue   p1.s
>>>     0x0000ffff811c0854:   st1w    {z16.s}, p1, [x11]
>>>     0x0000ffff811c0858:   add     w10, w10, #0x20
>>>     0x0000ffff811c085c:   cmp     w10, w12
>>>     0x0000ffff811c0860:   b.lt    0x0000ffff811c0820
>>>
>>> Test
>>> ====
>>> Currently, we don't have real hardware to verify SVE features (and
>>> performance). But we have run jtreg tests with SVE in some emulators. On
>>> the QEMU system emulator, which has SVE emulation support, jtreg tier1-3
>>> passed with different vector sizes. We've also verified it with full
>>> jtreg tests without SVE on both x86 and AArch64, to make sure that
>>> there's no regression.
>>>
>>> The patch has also been applied to the Vector API code base, and
>>> verified on the emulator. The Vector API has more vector-related tests
>>> and is more likely to generate vector instructions through
>>> intrinsification.
>>>
>>> A simple test can also run in QEMU user emulation, e.g.
>>>
>>> $ qemu-aarch64 -cpu max,sve-max-vq=2 java -XX:UseSVE=1 SIMD
>>>
>>> (
>>> To run it in user emulation mode, we will need to bypass the SVE feature
>>> detection code in this patch. E.g. apply:
>>> http://cr.openjdk.java.net/~njian/8231441/user-emulation.patch
>>> )
>>>
>>> Others
>>> ======
>>> Since this patch is a bit large, I've also split it into 3 parts, for
>>> easy review:
>>>
>>> 1) SVE feature detection
>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-feature
>>>
>>> 2) c2 register allocation
>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-ra
>>>
>>> 3) SVE c2 backend
>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-c2
>>>
>>> Part of this patch has been contributed by Joshua Zhu and Yang Zhang.
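For the 1024-bit loop listing in the "Compiled code" section above, the induction-variable step `add w10, w10, #0x20` matches the number of int lanes in one vector. A minimal Java sketch of that arithmetic follows; the class and method names are hypothetical, and the list of vector lengths simply enumerates a few of the multiples of 128 bits mentioned in the mail.

```java
// Illustrative only: lane counts for int elements at several SVE vector lengths.
class SveLaneSketch {
    // Number of elements of `elementBits` that fit in one vector register.
    static int lanes(int vectorBits, int elementBits) {
        return vectorBits / elementBits;
    }

    public static void main(String[] args) {
        int[] vectorLengths = {128, 256, 512, 1024, 2048};
        for (int bits : vectorLengths) {
            int n = lanes(bits, Integer.SIZE);
            System.out.println(bits + "-bit vector: " + n + " int lanes (step 0x"
                    + Integer.toHexString(n) + ")");
        }
    }
}
```

At 1024 bits this gives 32 lanes, i.e. the 0x20 step seen in the listing, with one vector add per loop iteration.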
>>> >>> Refs >>> ==== >>> [1] https://developer.arm.com/docs/ddi0584/latest >>> [2] https://developer.arm.com/docs/ddi0602/latest >>> [3] https://www.kernel.org/doc/Documentation/arm64/sve.txt >>> >>> Thanks, >>> Ningsheng >>> >> > From Pengfei.Li at arm.com Thu Jul 30 06:59:23 2020 From: Pengfei.Li at arm.com (Pengfei Li) Date: Thu, 30 Jul 2020 06:59:23 +0000 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <5f6dc64c-f51a-50d4-995f-ed0c7a7724e8@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <5f6dc64c-f51a-50d4-995f-ed0c7a7724e8@arm.com> Message-ID: Hi, To help reviewing the large ad file changes in the AArch64 backend, I created some jtreg tests checking if expected SVE/NEON instructions are correctly generated for each C2 vectornode. I've uploaded my jtreg at http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/. Hope it would be useful for other reviewers. -- Thanks, Pengfei > -----Original Message----- > From: Ningsheng Jian > Sent: Thursday, July 30, 2020 14:23 > To: hotspot-compiler-dev at openjdk.java.net; Pengfei Li > ; Vladimir Kozlov ; > Vladimir Ivanov ; Andrew Haley > > Cc: aarch64-port-dev at openjdk.java.net > Subject: Re: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE > backend support > > Hi, > > Pengfei helped to review the patch offline and found that some multiply- > add/sub and popcount match rules are missing for SVE. Added in the new > webrev. Thanks to Pengfei! 
> > New webrev: > http://cr.openjdk.java.net/~njian/8231441/webrev.03 > > Incremental changes: > http://cr.openjdk.java.net/~njian/8231441/webrev.03-vs-02/ > > Split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.03-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.03-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.03-c2 > > Thanks, > Ningsheng > > On 7/21/20 2:05 PM, Ningsheng Jian wrote: > > [Ping] > > > > Could anyone please help to review this patch, especially for the c2 > > register allocation part? > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8231441 > > > > The latest webrev: > > http://cr.openjdk.java.net/~njian/8231441/webrev.02 > > > > In the latest webrev, we block one predicate register (p7) with all > > elements preset to TRUE, so that c2 compiled code can use it freely to > > generate instructions for unpredicated operations. > > > > And the split parts: > > > > 1) SVE feature detection: > > http://cr.openjdk.java.net/~njian/8231441/webrev.02-feature > > > > 2) c2 register allocation: > > http://cr.openjdk.java.net/~njian/8231441/webrev.02-ra > > > > 3) SVE c2 backend: > > http://cr.openjdk.java.net/~njian/8231441/webrev.02-c2 > > > > The initial RFR which has some descriptions of the patch: > > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-March > > /037628.html > > > > The description can also be found at: > > http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt > > > > Notes to verify the patch on QEMU user emulation, with an example of > > compiled code: > > http://cr.openjdk.java.net/~njian/8231441/running-sve-in-qemu-user.txt > > > > Thanks, > > Ningsheng > > > > > > On 5/27/20 3:23 PM, Ningsheng Jian wrote: > >> Hi, > >> > >> I have rebased this patch with some more comments added. And also > >> relaxed the instruction matching conditions for 128-bit vector. 
> >> > >> I would appreciate if someone could help to review this. > >> > >> Whole patch: > >> http://cr.openjdk.java.net/~njian/8231441/webrev.01 > >> > >> Different parts of changes: > >> > >> 1) SVE feature detection > >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-feature > >> > >> 2) c2 registion allocation > >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-ra > >> > >> 3) SVE c2 backend > >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-c2 > >> > >> (Or should I split this into different JBS?) > >> > >> Thanks, > >> Ningsheng > >> > >> On 3/25/20 2:37 PM, Ningsheng Jian wrote: > >>> Hi, > >>> > >>> Could you please help to review this patch adding AArch64 SVE support? > >>> It also touches c2 compiler shared code. > >>> > >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 > >>> Webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.00 > >>> > >>> Arm has released new vector ISA extension for AArch64, SVE [1] and > >>> SVE2 [2]. This patch adds the initial SVE support in OpenJDK. In > >>> this patch we have: > >>> > >>> 1) SVE feature enablement and detection > >>> 2) SVE vector register allocation support with initial predicate > >>> register definition > >>> 3) SVE c2 backend for current SLP based vectorizer. (We also have a > >>> POC patch of a new vectorizer using SVE predicate-driven loop > >>> control, but that's still under development.) > >>> > >>> SVE register definition > >>> ======================= > >>> Unlike other SIMD architectures, SVE allows hardware implementations > >>> to choose a vector register length from 128 and 2048 bits, multiple > >>> of 128 bits. So we introduce a new vector type VectorA, i.e. length > >>> agnostic > >>> (scalable) vector type, and Op_VecA for machine vectora register. 
In > >>> the meantime, to minimize register allocation code changes, we also > >>> take advantage of one JIT compiler aspect, that is during the > >>> compile time we actually know the real hardware SVE vector register > >>> size of current running machine. So, the register allocator actually > >>> knows how many register slots an Op_VecA ideal reg requires, and > >>> could work fine without much modification. > >>> > >>> Since the bottom 128 bits are shared with the NEON, we extend > >>> current register mask definition of V0-V31 registers. Currently, c2 > >>> uses one bit mask for a 32-bit register slot, so to define at most > >>> 2048 bits we will need to add 64 slots in AD file. That's a really > >>> large number, and will also break current regmask assumption. > >>> Considering the SVE vector register is architecturally scalable for > >>> different sizes, we just define double of original NEON vector > >>> register slots, i.e. 8 slots: Vx, Vx_H, Vx_J ... Vx_O. After adlc, the > generated register masks now looks like: > >>> > >>> const RegMask _VECTORA_REG_mask( 0x0, 0x0, 0xffffffff, 0xffffffff, > >>> 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, ... > >>> > >>> const RegMask _VECTORD_REG_mask( 0x0, 0x0, 0x3030303, 0x3030303, > >>> 0x3030303, 0x3030303, 0x3030303, 0x3030303, ... > >>> > >>> const RegMask _VECTORX_REG_mask( 0x0, 0x0, 0xf0f0f0f, 0xf0f0f0f, > >>> 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, ... > >>> > >>> And we use SlotsPerVecA to indicate regmask bit size for a VecA register. > >>> > >>> Although for physical register allocation, register allocator does > >>> not need to know the real VecA register size, while doing > >>> spill/unspill, current register allocation needs to know actual > >>> stack slot size to store/load VecA registers. 
SVE is able to do > >>> vector size agnostic spilling, but to minimize the code changes, as > >>> I mentioned before, we just let RA know the actual vector register > >>> size in current running machine, by calling scalable_vector_reg_size(). > >>> > >>> In the meantime, since some vector operations do not have > >>> unpredicated > >>> SVE1 instructions, but only predicate version, e.g. vector multiply, > >>> vector load/store. We have also defined predicate registers in this > >>> patch, and c2 register allocator will allocate a temp predicate > >>> register to fulfill the expecting unpredicated operations. And this > >>> can also be used for future predicate-driven vectorizer. This is not > >>> efficient for now, as we can see many ptrue instructions in the > >>> generated code. One possible solution I can see, is to block one > >>> predicate register, and preset it to all true. But to > >>> preserve/reinitialize a caller save register value cross calls seems > >>> risky to work in this patch. I decide to defer it to further > >>> optimization work. If anyone has any suggestions on this, I would > appreciate. > >>> > >>> SVE feature detection > >>> ===================== > >>> Since we may have some compiled code based on the initial detected > >>> SVE vector register length and the compiled code is compiled only > >>> for that vector register length, we assume that the SVE vector > >>> register length will not be changed during the JVM lifetime. > >>> However, SVE vector length is per-thread and can be changed by > >>> system call [3], so we need to make sure that each jni call will not change > the sve vector length. > >>> > >>> Currently, we verify the SVE vector register length on each JNI > >>> return, and if an SVE vector length change is detected, jvm simply > >>> reports error and stops running. The VM running vector length can > >>> also be set by existing VM option MaxVectorSize with c2 enabled. 
If > >>> MaxVectorSize is specified not the same as system default sve vector > >>> length (in /proc/sys/abi/sve_default_vector_length), JVM will set > >>> current process sve vector length to the specified vector length. > >>> > >>> Compiled code > >>> ============= > >>> We have added all current c2 backend codegen on par with NEON, but > >>> only for vector length larger than 128-bit. > >>> > >>> On a 1024 bit SVE environment, for the following simple loop with > >>> int array element type: > >>> > >>> ??? for (int i = 0; i < LENGTH; i++) { > >>> ????? c[i] = a[i] + b[i]; > >>> ??? } > >>> > >>> c2 generated loop: > >>> > >>> ??? 0x0000ffff811c0820:?? sbfiz?? x11, x10, #2, #32 > >>> ??? 0x0000ffff811c0824:?? add???? x13, x18, x11 > >>> ??? 0x0000ffff811c0828:?? add???? x14, x1, x11 > >>> ??? 0x0000ffff811c082c:?? add???? x13, x13, #0x10 > >>> ??? 0x0000ffff811c0830:?? add???? x14, x14, #0x10 > >>> ??? 0x0000ffff811c0834:?? add???? x11, x0, x11 > >>> ??? 0x0000ffff811c0838:?? add???? x11, x11, #0x10 > >>> ??? 0x0000ffff811c083c:?? ptrue?? p1.s??? // To be optimized > >>> ??? 0x0000ffff811c0840:?? ld1w??? {z16.s}, p1/z, [x14] > >>> ??? 0x0000ffff811c0844:?? ptrue?? p0.s > >>> ??? 0x0000ffff811c0848:?? ld1w??? {z17.s}, p0/z, [x13] > >>> ??? 0x0000ffff811c084c:?? add???? z16.s, z17.s, z16.s > >>> ??? 0x0000ffff811c0850:?? ptrue?? p1.s > >>> ??? 0x0000ffff811c0854:?? st1w??? {z16.s}, p1, [x11] > >>> ??? 0x0000ffff811c0858:?? add???? w10, w10, #0x20 > >>> ??? 0x0000ffff811c085c:?? cmp???? w10, w12 > >>> ??? 0x0000ffff811c0860:?? b.lt??? 0x0000ffff811c0820 > >>> > >>> Test > >>> ==== > >>> Currently, we don't have real hardware to verify SVE features (and > >>> performance). But we have run jtreg tests with SVE in some > >>> emulators. On QEMU system emulator, which has SVE emulation > support, > >>> jtreg tier1-3 passed with different vector sizes. 
We've also > >>> verified it with full jtreg tests without SVE on both x86 and > >>> AArch64, to make sure that there's no regression. > >>> > >>> The patch has also been applied to Vector API code base, and > >>> verified on emulator. In Vector API, there are more vector related > >>> tests and is more possible to generate vector instructions by > intrinsification. > >>> > >>> A simple test can also run in QEMU user emulation, e.g. > >>> > >>> $ qemu-aarch64 -cpu max,sve-max-vq=2 java -XX:UseSVE=1 SIMD > >>> > >>> ( > >>> To run it in user emulation mode, we will need to bypass SVE feature > >>> detection code in this patch. E.g. apply: > >>> http://cr.openjdk.java.net/~njian/8231441/user-emulation.patch > >>> )l > >>> > >>> Others > >>> ====== > >>> Since this patch is a bit large, I've also split it into 3 parts, > >>> for easy review: > >>> > >>> 1) SVE feature detection > >>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-feature > >>> > >>> 2) c2 registion allocation > >>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-ra > >>> > >>> 3) SVE c2 backend > >>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-c2 > >>> > >>> Part of this patch has been contributed by Joshua Zhu and Yang Zhang. 
> >>> > >>> Refs > >>> ==== > >>> [1] https://developer.arm.com/docs/ddi0584/latest > >>> [2] https://developer.arm.com/docs/ddi0602/latest > >>> [3] https://www.kernel.org/doc/Documentation/arm64/sve.txt > >>> > >>> Thanks, > >>> Ningsheng > >>> > >> > > From ningsheng.jian at arm.com Thu Jul 30 08:13:23 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Thu, 30 Jul 2020 16:13:23 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <5f6dc64c-f51a-50d4-995f-ed0c7a7724e8@arm.com> Message-ID: <5fdc66e4-da99-f9bf-f656-a85a0e9e7b00@arm.com> On 7/30/20 2:59 PM, Pengfei Li wrote: > Hi, > > To help reviewing the large ad file changes in the AArch64 backend, I created some jtreg tests checking if expected SVE/NEON instructions are correctly generated for each C2 vectornode. > > I've uploaded my jtreg at http://cr.openjdk.java.net/~pli/rfr/8231441/jtreg.webrev.00/. Hope it would be useful for other reviewers. > Thanks! I like the idea of Opto checker. That would be helpful to the SVE vectorization work. Regards, Ningsheng From jatin.bhateja at intel.com Thu Jul 30 10:55:22 2020 From: jatin.bhateja at intel.com (Bhateja, Jatin) Date: Thu, 30 Jul 2020 10:55:22 +0000 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> Message-ID: Hi Vladimir, > So, it makes sense to keep the transformations. But I'm fine with > addressing that as a followup enhancement. Updated patch placed at following link http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ test-tier1 shows no surprises. 
Have submitted the patch to submit-repo for testing: http://hg.openjdk.java.net/jdk/submit/rev/3ed477bb24a7 Best Regards, Jatin > -----Original Message----- > From: Vladimir Ivanov > Sent: Thursday, July 30, 2020 3:30 AM > To: Bhateja, Jatin > Cc: Viswanathan, Sandhya ; hotspot-compiler- > dev at openjdk.java.net > Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 > > > >> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ > > > > Looks good. (Testing is in progress.) > > FYI test results are clean (tier1-tier5). > > >> I have removed RotateLeftNode/RotateRightNode::Ideal routines since > >> we are anyways doing constant folding in LShiftI/URShiftI value > >> routines. Since JAVA rotate APIs are no longer intrincified hence > >> these routines may no longer be useful. > > > > Nice observation! Good. > > As a second thought, it seems there's still a chance left that Rotate nodes > get their input type narrowed after the folding happened. For example, as a > result of incremental inlining or CFG transformations during loop > optimizations. And it does happen in practice since the testing revealed > some crashes due to the bug in RotateLeftNode/RotateRightNode::Ideal(). > > So, it makes sense to keep the transformations. But I'm fine with > addressing that as a followup enhancement. > > Best regards, > Vladimir Ivanov > > > > >>> It would be really nice to migrate to MacroAssembler along the way > >>> (as a cleanup). > >> > >> I guess you are saying remove opcodes/encoding from patterns and move > >> then to Assembler, Can we take this cleanup activity separately since > >> other patterns are also using these matcher directives. > > > > I'm perfectly fine with handling it as a separate enhancement. > > > >> Other synthetic comments have been taken care of. I have extended the > >> Test to cover all the newly added scalar transforms. Kindly let me > >> know if there other comments. > > > > Nice! 
> > > > Best regards, > > Vladimir Ivanov > > > >>> -----Original Message----- > >>> From: Vladimir Ivanov > >>> Sent: Friday, July 24, 2020 3:21 AM > >>> To: Bhateja, Jatin > >>> Cc: Viswanathan, Sandhya ; Andrew > >>> Haley ; hotspot-compiler-dev at openjdk.java.net > >>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for > >>> X86 > >>> > >>> Hi Jatin, > >>> > >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ > >>> > >>> Much better! Thanks. > >>> > >>>> Change Summary: > >>>> > >>>> 1) Unified the handling for scalar rotate operation. All scalar > >>>> rotate > >>> selection patterns are now dependent on newly created > >>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. > >>> Currently > >>> if DAG nodes corresponding to a sub-pattern are shared (have > >>> multiple > >>> users) then existing complex patterns based on Or/LShiftL/URShift > >>> does not get matched and this prevents inferring rotate nodes. > >>> Please refer to JIT'ed assembly output with baseline[1] and with > >>> patch[2] . We can see that generated code size also went done from > >>> 832 byte to 768 bytes. Also this can cause perf degradation if > >>> shift-or dependency chain appears inside a hot region. > >>>> > >>>> 2) Due to enhanced rotate inferencing new patch shows better > >>>> performance > >>> even for legacy targets (non AVX-512). Please refer to the perf > >>> result[3] over AVX2 machine for JMH benchmark part of the patch. > >>> > >>> Very nice! > >>>> 3) As suggested, removed Java API intrinsification changes and > >>>> scalar > >>> rotate transformation are done during OrI/OrL node idealizations. > >>> > >>> Good. > >>> > >>> (Still would be nice to factor the matching code from Ideal() and > >>> share it between multiple use sites. Especially considering > >>> OrVNode::Ideal() now does basically the same thing. As an > >>> example/idea, take a look at > >>> is_bmi_pattern() in x86.ad.) 
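The scalar shape that the OrI/OrL idealization discussed above has to recognize is an OR of two opposite shifts of the same value. A minimal Java sketch of that source-level idiom, checked against `Integer.rotateLeft` and `Integer.rotateRight`, follows. The class and method names are hypothetical; this shows the pattern a Java programmer might write, not C2's matching code.

```java
// Illustrative only: the OR-of-shifts rotate idiom at the Java source level.
class RotateIdiomSketch {
    // Java masks int shift counts to their low 5 bits, so `-s` acts as `32 - s`.
    static int rotateLeftByShifts(int x, int s) {
        return (x << s) | (x >>> -s);
    }

    static int rotateRightByShifts(int x, int s) {
        return (x >>> s) | (x << -s);
    }

    public static void main(String[] args) {
        int[] values = {0, 1, -1, 0x80000001, 0x12345678};
        for (int x : values) {
            for (int s = 0; s < 64; s++) {
                if (rotateLeftByShifts(x, s) != Integer.rotateLeft(x, s)
                        || rotateRightByShifts(x, s) != Integer.rotateRight(x, s)) {
                    throw new AssertionError("mismatch for x=" + x + ", s=" + s);
                }
            }
        }
        System.out.println("shift/or idiom agrees with Integer.rotateLeft/rotateRight");
    }
}
```

Because the idiom can be written by hand like this, recognizing it during idealization covers code that never calls the rotate API methods.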
> >>>
> >>>> 4) SLP always gets to work on new scalar Rotate nodes and creates
> >>>> vector rotate nodes which are degenerated into OrV/LShiftV/URShiftV
> >>>> nodes if the target does not support vector rotates (non-AVX512).
> >>>
> >>> Good.
> >>>
> >>>> 5) Added new instruction patterns for vector shift Left/Right
> >>>> operations with constant shift operands. This prevents emitting
> >>>> extra moves to XMM.
> >>>
> >>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{
> >>> +  match(Set dst (LShiftVI src shift));
> >>>
> >>> I'd prefer to see a uniform Ideal IR shape being used irrespective
> >>> of whether the argument is a constant or not. It should also
> >>> simplify the logic in SuperWord and make it easier to support on
> >>> non-x86 architectures.
> >>>
> >>> For example, here's how it is done on AArch64:
> >>>
> >>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{
> >>>     predicate(n->as_Vector()->length() == 4);
> >>>     match(Set dst (LShiftVI src (LShiftCntV shift))); ...
> >>>
> >>>> 6) Constant folding scenarios are covered in RotateLeft/RotateRight
> >>>> idealization; inferencing of vector rotate through OrV idealization
> >>>> covers the vector patterns generated through the non-SLP route, i.e.
> >>>> VectorAPI.
> >>>
> >>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the
> >>> general direction here - duplication of scalar transformations to
> >>> lane-wise vector operations. It definitely won't scale and in the
> >>> longer run it risks diverging. Would be nice to find a way to
> >>> automatically "lift" scalar transformations to vectors and apply
> >>> them uniformly. But right now it is just an idea which requires
> >>> more experimentation.
> >>>
> >>>
> >>> Some other minor comments/suggestions:
> >>>
> >>> +  // Swap the computed left and right shift counts.
> >>> +  if (is_rotate_left) {
> >>> +    Node* temp = shiftRCnt;
> >>> +    shiftRCnt  = shiftLCnt;
> >>> +    shiftLCnt  = temp;
> >>> +  }
> >>>
> >>> Maybe use swap() here (declared in globalDefinitions.hpp)?
> >>>
> >>>
> >>> +  if (Matcher::match_rule_supported_vector(vopc, vlen, bt))
> >>> +    return true;
> >>>
> >>> Please, don't omit curly braces (even for simple cases).
> >>>
> >>>
> >>> -// Rotate Right by variable
> >>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 zero, rFlagsReg cr)
> >>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr)
> >>>    %{
> >>> -  match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero shift))));
> >>> -
> >>> +  predicate(!VM_Version::supports_bmi2() && n->bottom_type()->basic_type() == T_INT);
> >>> +  match(Set dst (RotateRight dst shift));
> >>> +  format %{ "rorl     $dst, $shift" %}
> >>>      expand %{
> >>> -    rorI_rReg_CL(dst, shift, cr);
> >>> +    rorI_rReg_imm8(dst, shift, cr);
> >>>      %}
> >>>
> >>> It would be really nice to migrate to MacroAssembler along the way
> >>> (as a cleanup).
> >>>
> >>>> Please push the patch through your testing framework and let me
> >>>> know your review feedback.
> >>>
> >>> There's one new assertion failure:
> >>>
> >>> #  Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238),
> >>> pid=5476, tid=6219
> >>> #  assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize
> >>> should return new nodes, use Identity to return old nodes
> >>>
> >>> I believe it comes from RotateLeftNode::Ideal/RotateRightNode::Ideal
> >>> which can return pre-constructed constants. I suggest getting rid of
> >>> the Ideal() methods and moving the constant folding logic into
> >>> Node::Value() (as implemented for other bitwise/arithmetic nodes in
> >>> addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more generic
> >>> approach since it enables richer type information (ranges vs
> >>> constants) and IMO it's more convenient to work with constants
> >>> through Types than ConNodes.
> >>> > >>> (I suspect that original/expanded IR shape may already provide more > >>> precise type info for non-constant case which can affect the > >>> benchmarks.) > >>> > >>> Best regards, > >>> Vladimir Ivanov > >>> > >>>> > >>>> Best Regards, > >>>> Jatin > >>>> > >>>> [1] > >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. > >>>> txt [2] > >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx2_ > >>>> asm > >>>> .txt [3] > >>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new_p > >>>> atc > >>>> h.txt > >>>> > >>>> > >>>>> -----Original Message----- > >>>>> From: Vladimir Ivanov > >>>>> Sent: Saturday, July 18, 2020 12:25 AM > >>>>> To: Bhateja, Jatin ; Andrew Haley > >>>>> > >>>>> Cc: Viswanathan, Sandhya ; > >>>>> hotspot-compiler- dev at openjdk.java.net > >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification > >>>>> for > >>>>> X86 > >>>>> > >>>>> Hi Jatin, > >>>>> > >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ > >>>>> > >>>>> It definitely looks better, but IMO it hasn't reached the sweet > >>>>> spot > >>> yet. > >>>>> It feels like the focus is on auto-vectorizer while the burden is > >>>>> put on scalar cases. > >>>>> > >>>>> First of all, considering GVN folds relevant operation patterns > >>>>> into a single Rotate node now, what's the motivation to introduce > >>>>> intrinsics? > >>>>> > >>>>> Another point is there's still significant duplication for scalar > >>>>> cases. > >>>>> > >>>>> I'd prefer to see the legacy cases which rely on pattern matching > >>>>> to go away and be substituted with instructions which match Rotate > >>>>> instructions (migrating ). > >>>>> > >>>>> I understand that it will penalize the vectorization > >>>>> implementation, but IMO reducing overall complexity is worth it. > >>>>> On auto-vectorizer side, I see > >>>>> 2 ways to fix it: > >>>>> > >>>>> ???? 
(1) introduce additional AD instructions for > >>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; > >>>>> > >>>>> ???? (2) in SuperWord::output(), when matcher doesn't support > >>>>> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), > >>>>> generate vectorized version of the original pattern. > >>>>> > >>>>> Overall, it looks like more and more focus is made on scalar part. > >>>>> Considering the main goal of the patch is to enable vectorization, > >>>>> I'm fine with separating cleanup of scalar part. As an interim > >>>>> solution, it seems that leaving the scalar part as it is now and > >>>>> matching scalar bit rotate pattern in VectorNode::is_rotate() > >>>>> should be enough to keep the vectorization part functioning. Then > >>>>> scalar Rotate nodes and relevant cleanups can be integrated later. > >>>>> (Or vice > >>>>> versa: clean up scalar part first and then follow up with > >>>>> vectorization.) > >>>>> > >>>>> Some other comments: > >>>>> > >>>>> * There's a lot of duplication between OrINode::Ideal and > >>> OrLNode::Ideal. > >>>>> What do you think about introducing a super type > >>>>> (OrNode) and put a unified version (OrNode::Ideal) there? > >>>>> > >>>>> > >>>>> * src/hotspot/cpu/x86/x86.ad > >>>>> > >>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ > >>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() == > >>>>> T_INT > >>> || > >>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() == > >>>>> +T_LONG); > >>>>> > >>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ > >>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() == > >>>>> T_INT > >>> || > >>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() == > >>>>> +T_LONG); > >>>>> > >>>>> The predicates are redundant here. 
> >>>>>
> >>>>>
> >>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp
> >>>>>
> >>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, XMMRegister dst, XMMRegister src,
> >>>>> +                                     int shift, int vector_len) {
> >>>>> +  if (opcode == Op_RotateLeftV) {
> >>>>> +    if (etype == T_INT) {
> >>>>> +      evprold(dst, src, shift, vector_len);
> >>>>> +    } else {
> >>>>> +      evprolq(dst, src, shift, vector_len);
> >>>>> +    }
> >>>>>
> >>>>> Please, put an assert for the false case (assert(etype == T_LONG, "...")).
> >>>>>
> >>>>>
> >>>>> * On testing (with previous version of the patch): -XX:UseAVX is an
> >>>>> x86-specific flag, so new/adjusted tests now fail on non-x86 platforms.
> >>>>> Either omitting the flag or adding
> >>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue.
> >>>>>
> >>>>> Best regards,
> >>>>> Vladimir Ivanov
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> Summary of changes:
> >>>>>> 1) Optimization is specifically targeted to exploit the vector rotation
> >>>>>> instruction added for X86 AVX512. A single rotate instruction
> >>>>>> encapsulates the entire vector OR/SHIFTs pattern and thus offers better
> >>>>>> latency at reduced instruction count.
> >>>>>>
> >>>>>> 2) There were two approaches to implement this:
> >>>>>>        a)  Let everything remain the same and add new wide complex
> >>>>>>        instruction patterns in the matcher for e.g.
> >>>>>>             set Dst ( OrV (Binary (LShiftVI dst (Binary ReplicateI shift))
> >>>>>>             (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate shift))
> >>>>>>        It would have been an overoptimistic assumption to expect that the graph
> >>>>>>        shape would be preserved till the matcher for correct inferencing.
> >>>>>>        In addition we would have required multiple such bulky patterns.
> >>>>>>        b) Create new RotateLeft/RotateRight scalar nodes; these get
> >>>>>>        generated during intrinsification as well as during additional pattern
> >>>>>>        matching during node Idealization. Later on these nodes are consumed
> >>>>>>        by SLP for valid vectorization scenarios to emit their vector
> >>>>>>        counterparts, which eventually emit vector rotates.
> >>>>>>
> >>>>>> 3) I chose approach 2b) since it's cleaner. The only problem here was
> >>>>>> that in non-EVEX mode (UseAVX < 3) new scalar Rotate nodes should
> >>>>>> either be dismantled back to the OR/SHIFT pattern, or we penalize the
> >>>>>> vectorization, which would be very costly. The other option would have
> >>>>>> been to add an additional vector rotate pattern for UseAVX=3 in the
> >>>>>> matcher which emits vector OR-SHIFT instructions, but then it would
> >>>>>> lose the efficient instruction sequence which node sharing
> >>>>>> (OrV/LShiftV/URShift) offers in the current implementation. Thus it
> >>>>>> would not be beneficial for non-AVX512 targets; the only saving would be
> >>>>>> the cleanup of a few existing scalar rotate matcher patterns, and old
> >>>>>> targets do not offer this powerful rotate instruction.
> >>>>>> Therefore new scalar nodes are created only for AVX512 targets.
> >>>>>>
> >>>>>> As per suggestions, constant folding scenarios have been covered during
> >>>>>> Idealization of the newly added scalar nodes.
> >>>>>>
> >>>>>> Please review the latest version and share your feedback and test results.
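
For readers following the thread, the scalar pattern being discussed is the usual shift/OR rotate idiom. The following is a minimal C++ sketch of what a RotateLeft/RotateRight node stands for. It is not HotSpot code: the helper names are made up for illustration, and the masking of the shift count is an assumption chosen to mirror how Java's Integer.rotateLeft treats its distance argument.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative helpers only; these names do not exist in HotSpot.
// Rotate a 32-bit value left using the OR/SHIFT pattern that the
// compiler would otherwise have to recognise in the ideal graph.
inline uint32_t rotate_left_32(uint32_t x, int shift) {
  unsigned s = static_cast<unsigned>(shift) & 31u;  // only the low 5 bits count
  if (s == 0) return x;                             // avoid an undefined shift by 32
  return (x << s) | (x >> (32u - s));
}

// Rotate right is the mirror image of the same pattern.
inline uint32_t rotate_right_32(uint32_t x, int shift) {
  unsigned s = static_cast<unsigned>(shift) & 31u;
  if (s == 0) return x;
  return (x >> s) | (x << (32u - s));
}
```

With a constant shift the whole expression can be folded at compile time, and a rotate by a multiple of 32 is the identity; those are the kinds of cases the mail refers to as constant folding during Idealization.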
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Jatin
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: Andrew Haley
> >>>>>>> Sent: Saturday, July 11, 2020 2:24 PM
> >>>>>>> To: Vladimir Ivanov ; Bhateja, Jatin ;
> >>>>>>> hotspot-compiler-dev at openjdk.java.net
> >>>>>>> Cc: Viswanathan, Sandhya
> >>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification for X86
> >>>>>>>
> >>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote:
> >>>>>>>
> >>>>>>>  > High-level comment: so far, there was no pressing need in
> >>>>>>>  > explicitly marking the methods as intrinsics. ROR/ROL instructions
> >>>>>>>  > were selected during matching [1]. Now the patch introduces
> >>>>>>>  > dedicated nodes (RotateLeft/RotateRight) specifically for intrinsics
> >>>>>>>  > which partly duplicates existing logic.
> >>>>>>>
> >>>>>>> The lack of rotate nodes in the IR has always meant that AArch64
> >>>>>>> doesn't generate optimal code for e.g.
> >>>>>>>
> >>>>>>>       (Set dst (XorL reg1 (RotateLeftL reg2 imm)))
> >>>>>>>
> >>>>>>> because, with the RotateLeft expanded to its full combination of
> >>>>>>> ORs and shifts, it's too complicated to match. At the time I put
> >>>>>>> this to one side because it wasn't urgent. This is a shame
> >>>>>>> because although such combinations are unusual they are used in
> >>>>>>> some crypto operations.
> >>>>>>>
> >>>>>>> If we can generate immediate-form rotate nodes early by pattern
> >>>>>>> matching during parsing (rather than depending on intrinsics)
> >>>>>>> we'll get more value than by depending on programmers calling intrinsics.
> >>>>>>>
> >>>>>>> --
> >>>>>>> Andrew Haley  (he/him)
> >>>>>>> Java Platform Lead Engineer
> >>>>>>> Red Hat UK Ltd.
> >>>>>>> https://keybase.io/andrewhaley > >>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 > >>>>>> From adinn at redhat.com Thu Jul 30 11:26:42 2020 From: adinn at redhat.com (Andrew Dinn) Date: Thu, 30 Jul 2020 12:26:42 +0100 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> Message-ID: <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> Hi Ningsheng, I will start to review this either later today or (more likely) tomorrow. It will probably take some time to work through it all. I will work from the updated patch posted by PengFei. regards, Andrew Dinn ----------- Red Hat Distinguished Engineer Red Hat UK Ltd Registered in England and Wales under Company Registration No. 03798903 Directors: Michael Cunningham, Michael ("Mike") O'Neill On 21/07/2020 07:05, Ningsheng Jian wrote: > [Ping] > > Could anyone please help to review this patch, especially for the c2 > register allocation part? > > JBS: https://bugs.openjdk.java.net/browse/JDK-8231441 > > The latest webrev: > http://cr.openjdk.java.net/~njian/8231441/webrev.02 > > In the latest webrev, we block one predicate register (p7) with all > elements preset to TRUE, so that c2 compiled code can use it freely to > generate instructions for unpredicated operations. 
> > And the split parts: > > 1) SVE feature detection: > http://cr.openjdk.java.net/~njian/8231441/webrev.02-feature > > 2) c2 register allocation: > http://cr.openjdk.java.net/~njian/8231441/webrev.02-ra > > 3) SVE c2 backend: > http://cr.openjdk.java.net/~njian/8231441/webrev.02-c2 > > The initial RFR which has some descriptions of the patch: > http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-March/037628.html > > > The description can also be found at: > http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt > > Notes to verify the patch on QEMU user emulation, with an example of > compiled code: > http://cr.openjdk.java.net/~njian/8231441/running-sve-in-qemu-user.txt > > Thanks, > Ningsheng > > > On 5/27/20 3:23 PM, Ningsheng Jian wrote: >> Hi, >> >> I have rebased this patch with some more comments added. And also >> relaxed the instruction matching conditions for 128-bit vector. >> >> I would appreciate if someone could help to review this. >> >> Whole patch: >> http://cr.openjdk.java.net/~njian/8231441/webrev.01 >> >> Different parts of changes: >> >> 1) SVE feature detection >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-feature >> >> 2) c2 registion allocation >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-ra >> >> 3) SVE c2 backend >> http://cr.openjdk.java.net/~njian/8231441/webrev.01-c2 >> >> (Or should I split this into different JBS?) >> >> Thanks, >> Ningsheng >> >> On 3/25/20 2:37 PM, Ningsheng Jian wrote: >>> Hi, >>> >>> Could you please help to review this patch adding AArch64 SVE support? >>> It also touches c2 compiler shared code. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 >>> Webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.00 >>> >>> Arm has released new vector ISA extension for AArch64, SVE [1] and >>> SVE2 [2]. This patch adds the initial SVE support in OpenJDK. 
In this >>> patch we have: >>> >>> 1) SVE feature enablement and detection >>> 2) SVE vector register allocation support with initial predicate >>> register definition >>> 3) SVE c2 backend for current SLP based vectorizer. (We also have a POC >>> patch of a new vectorizer using SVE predicate-driven loop control, but >>> that's still under development.) >>> >>> SVE register definition >>> ======================= >>> Unlike other SIMD architectures, SVE allows hardware implementations to >>> choose a vector register length from 128 and 2048 bits, multiple of 128 >>> bits. So we introduce a new vector type VectorA, i.e. length agnostic >>> (scalable) vector type, and Op_VecA for machine vectora register. In the >>> meantime, to minimize register allocation code changes, we also take >>> advantage of one JIT compiler aspect, that is during the compile time we >>> actually know the real hardware SVE vector register size of current >>> running machine. So, the register allocator actually knows how many >>> register slots an Op_VecA ideal reg requires, and could work fine >>> without much modification. >>> >>> Since the bottom 128 bits are shared with the NEON, we extend current >>> register mask definition of V0-V31 registers. Currently, c2 uses one bit >>> mask for a 32-bit register slot, so to define at most 2048 bits we will >>> need to add 64 slots in AD file. That's a really large number, and will >>> also break current regmask assumption. Considering the SVE vector >>> register is architecturally scalable for different sizes, we just define >>> double of original NEON vector register slots, i.e. 8 slots: Vx, Vx_H, >>> Vx_J ... Vx_O. After adlc, the generated register masks now looks like: >>> >>> const RegMask _VECTORA_REG_mask( 0x0, 0x0, 0xffffffff, 0xffffffff, >>> 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, ... >>> >>> const RegMask _VECTORD_REG_mask( 0x0, 0x0, 0x3030303, 0x3030303, >>> 0x3030303, 0x3030303, 0x3030303, 0x3030303, ... 
>>> >>> const RegMask _VECTORX_REG_mask( 0x0, 0x0, 0xf0f0f0f, 0xf0f0f0f, >>> 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, ... >>> >>> And we use SlotsPerVecA to indicate regmask bit size for a VecA >>> register. >>> >>> Although for physical register allocation, register allocator does not >>> need to know the real VecA register size, while doing spill/unspill, >>> current register allocation needs to know actual stack slot size to >>> store/load VecA registers. SVE is able to do vector size agnostic >>> spilling, but to minimize the code changes, as I mentioned before, we >>> just let RA know the actual vector register size in current running >>> machine, by calling scalable_vector_reg_size(). >>> >>> In the meantime, since some vector operations do not have unpredicated >>> SVE1 instructions, but only predicate version, e.g. vector multiply, >>> vector load/store. We have also defined predicate registers in this >>> patch, and c2 register allocator will allocate a temp predicate register >>> to fulfill the expecting unpredicated operations. And this can also be >>> used for future predicate-driven vectorizer. This is not efficient for >>> now, as we can see many ptrue instructions in the generated code. One >>> possible solution I can see, is to block one predicate register, and >>> preset it to all true. But to preserve/reinitialize a caller save >>> register value cross calls seems risky to work in this patch. I decide >>> to defer it to further optimization work. If anyone has any suggestions >>> on this, I would appreciate. >>> >>> SVE feature detection >>> ===================== >>> Since we may have some compiled code based on the initial detected SVE >>> vector register length and the compiled code is compiled only for that >>> vector register length, we assume that the SVE vector register length >>> will not be changed during the JVM lifetime. 
However, SVE vector length
>>> is per-thread and can be changed by system call [3], so we need to make
>>> sure that each JNI call will not change the SVE vector length.
>>>
>>> Currently, we verify the SVE vector register length on each JNI return,
>>> and if an SVE vector length change is detected, the JVM simply reports an
>>> error and stops running. The VM running vector length can also be set by
>>> the existing VM option MaxVectorSize with c2 enabled. If the specified
>>> MaxVectorSize differs from the system default SVE vector length (in
>>> /proc/sys/abi/sve_default_vector_length), the JVM will set the current
>>> process SVE vector length to the specified vector length.
>>>
>>> Compiled code
>>> =============
>>> We have added all current c2 backend codegen on par with NEON, but only
>>> for vector length larger than 128-bit.
>>>
>>> On a 1024 bit SVE environment, for the following simple loop with int
>>> array element type:
>>>
>>>     for (int i = 0; i < LENGTH; i++) {
>>>       c[i] = a[i] + b[i];
>>>     }
>>>
>>> c2 generated loop:
>>>
>>>     0x0000ffff811c0820:   sbfiz   x11, x10, #2, #32
>>>     0x0000ffff811c0824:   add     x13, x18, x11
>>>     0x0000ffff811c0828:   add     x14, x1, x11
>>>     0x0000ffff811c082c:   add     x13, x13, #0x10
>>>     0x0000ffff811c0830:   add     x14, x14, #0x10
>>>     0x0000ffff811c0834:   add     x11, x0, x11
>>>     0x0000ffff811c0838:   add     x11, x11, #0x10
>>>     0x0000ffff811c083c:   ptrue   p1.s    // To be optimized
>>>     0x0000ffff811c0840:   ld1w    {z16.s}, p1/z, [x14]
>>>     0x0000ffff811c0844:   ptrue   p0.s
>>>     0x0000ffff811c0848:   ld1w    {z17.s}, p0/z, [x13]
>>>     0x0000ffff811c084c:   add     z16.s, z17.s, z16.s
>>>     0x0000ffff811c0850:   ptrue   p1.s
>>>     0x0000ffff811c0854:   st1w    {z16.s}, p1, [x11]
>>>     0x0000ffff811c0858:   add     w10, w10, #0x20
>>>     0x0000ffff811c085c:   cmp     w10, w12
>>>     0x0000ffff811c0860:   b.lt    0x0000ffff811c0820
>>>
>>> Test
>>> ====
>>> Currently, we don't have real hardware to verify SVE features (and
>>> performance). But we have run jtreg tests with SVE in some emulators. On
>>> QEMU system emulator, which has SVE emulation support, jtreg tier1-3
>>> passed with different vector sizes. We've also verified it with full
>>> jtreg tests without SVE on both x86 and AArch64, to make sure that
>>> there's no regression.
>>>
>>> The patch has also been applied to the Vector API code base, and verified
>>> on an emulator. The Vector API has more vector related tests, and it is
>>> more likely to generate vector instructions via intrinsification.
>>>
>>> A simple test can also run in QEMU user emulation, e.g.
>>>
>>> $ qemu-aarch64 -cpu max,sve-max-vq=2 java -XX:UseSVE=1 SIMD
>>>
>>> (
>>> To run it in user emulation mode, we will need to bypass SVE feature
>>> detection code in this patch. E.g. apply:
>>> http://cr.openjdk.java.net/~njian/8231441/user-emulation.patch
>>> )
>>>
>>> Others
>>> ======
>>> Since this patch is a bit large, I've also split it into 3 parts, for
>>> easy review:
>>>
>>> 1) SVE feature detection
>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-feature
>>>
>>> 2) c2 register allocation
>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-ra
>>>
>>> 3) SVE c2 backend
>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-c2
>>>
>>> Part of this patch has been contributed by Joshua Zhu and Yang Zhang.
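
The slot and lane arithmetic from the "SVE register definition" and "Compiled code" sections of the mail above can be checked with a small C++ sketch. It only restates numbers given in the mail (32-bit slots, vector lengths from 128 to 2048 bits in multiples of 128); the constant and function names are invented for illustration and are not taken from the patch.

```cpp
#include <cassert>

// Illustrative constants; the names are assumptions, not patch identifiers.
constexpr int kBitsPerSlot = 32;    // one RegMask bit covers a 32-bit slot
constexpr int kSveMinBits  = 128;   // smallest SVE length, same width as NEON
constexpr int kSveMaxBits  = 2048;  // largest SVE length the mail mentions

// True if 'bits' is a vector length that the mail describes as allowed.
constexpr bool is_valid_sve_length(int bits) {
  return bits >= kSveMinBits && bits <= kSveMaxBits && bits % 128 == 0;
}

// Number of 32-bit slots a vector of 'bits' occupies.
constexpr int slots_for_vector(int bits) { return bits / kBitsPerSlot; }

// Number of elements of 'elem_bits' each that fit in one vector.
constexpr int lanes(int vector_bits, int elem_bits) {
  return vector_bits / elem_bits;
}
```

Under these assumptions, slots_for_vector(2048) is 64, the figure the mail calls too large for the AD file, while a NEON-sized vector needs 4 slots, so the doubled definition has 8. For the 1024-bit example, lanes(1024, 32) is 32, which matches the `add w10, w10, #0x20` loop increment in the generated code.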
>>> >>> Refs >>> ==== >>> [1] https://developer.arm.com/docs/ddi0584/latest >>> [2] https://developer.arm.com/docs/ddi0602/latest >>> [3] https://www.kernel.org/doc/Documentation/arm64/sve.txt >>> >>> Thanks, >>> Ningsheng >>> >> > From jiefu at tencent.com Thu Jul 30 13:09:14 2020 From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=) Date: Thu, 30 Jul 2020 13:09:14 +0000 Subject: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field Message-ID: <71F94D35-2B7B-4032-AD01-954524A150B7@tencent.com> Hi all, JBS: https://bugs.openjdk.java.net/browse/JDK-8250825 Webrev: http://cr.openjdk.java.net/~jiefu/8250825/webrev.00/ When C2 tries to inline an unsafe-access method, it may generate the following pattern in make_unsafe_address: ConP ConL \ | \ | AddP Current implementation of TypeOopPtr::TypeOopPtr(...) failed to recognize it as an unsafe operation, which leads to the crash. Testing: - tier1-3 on Linux/x64 Could you please review it and give me some advice? Thanks a lot. Best regards, Jie From christian.hagedorn at oracle.com Thu Jul 30 13:15:34 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Thu, 30 Jul 2020 15:15:34 +0200 Subject: [16] RFR(S): C2: assert(no_dead_loop) failed: dead loop detected Message-ID: <66e123b1-35d1-5b96-d0d7-6b4a8cdf2404@oracle.com> Hi Please review the following patch: https://bugs.openjdk.java.net/browse/JDK-8249605 http://cr.openjdk.java.net/~chagedorn/8249605/webrev.00/ There is a dead memory loop detected during IGVN. In the testcase, many nodes are dying during IGVN because they are not reachable anymore. In this process, a (not yet dead) memory phi node (150 Phi) with two inputs is processed (see [1]): (1) 289 MergeMem, whose base memory is 150 Phi and has one slice for 274 StoreD which is again an output of 150 Phi (2) 356 MergeMem, whose base memory is top (i.e. 
dead and would be removed when IGVN processes this node) In PhiNode::Ideal, we check if a phi node is part of a dead loop where all its inputs reference itself over a MergeMemNode input whose base memory is the phi node itself again [2]. However, in this check we do not account for dead MergeMemNodes (like the input 356 MergeMem of 150 Phi). Therefore, we do not return top and apply the optimization [3] to replace 150 Phi by a new MergeMemNode (380 MergeMem) whose base memory is top and now has again one slice which is input and output to 274 StoreD [4]. This cycle is later detected and the assertion fails. The fix accounts additionally for dead MergeMemNodes when trying to detect dead loops in PhiNode::Ideal to return top instead of a new dead MergeMemNode. Best regards, Christian [1] https://bugs.openjdk.java.net/secure/attachment/89582/before_PhiNode_Ideal.png [2] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2234 [3] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2246 [4] https://bugs.openjdk.java.net/secure/attachment/89583/after_PhiNode_Ideal.png From volker.simonis at gmail.com Thu Jul 30 17:03:49 2020 From: volker.simonis at gmail.com (Volker Simonis) Date: Thu, 30 Jul 2020 19:03:49 +0200 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: <1596056152748.75196@amazon.com> References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> <1595969785292.62158@amazon.com> <1596056152748.75196@amazon.com> Message-ID: On Wed, Jul 29, 2020 at 10:56 PM Liu, Xin wrote: > > hi, Volker and Tobias, > > Here is a new revision. > http://cr.openjdk.java.net/~xliu/8249809/02/webrev/ > > 1. This one add comments about this smart pointer and fix the formation issue. > > 2. Thanks to point me out a new document of hotspot code style. > Since it has updated to -std=c++14, I change all NULL to nullptr. > > 3. 
I also add NON_COPYABLE because it's not intended to be copied. > > > DirectiveSetPtr is just a thin wrapper of the raw pointer. if users only use it to read, nothing will be cloned. It simply goes through. > Hi Xin, I like the new version :) I think it's fine except the assertion in "transfer()": 308 DirectiveSet* transfer() { 309 assert(_origin != nullptr, "_origin is NULL! transfer() can only be invoked once."); 310 311 if (_clone != nullptr) { 312 // We are returning a (parentless) copy. The original parent don't need to account for this. 313 DirectivesStack::release(_origin); 314 _origin = nullptr; 315 return _clone; 316 } 317 else { 318 return _origin; 319 } 320 } 321 }; You should either move it into the " if (_clone != nullptr)" block or set "_origin" to NULL in the "else" branch as well. Best regards, Volker PS: I won't have access to mail for the next two weeks. If there won't be any fundamental changes to this patch any more you can consider it reviewed from my side. > thanks, > --lx > > ________________________________________ > From: Volker Simonis > Sent: Wednesday, July 29, 2020 7:34 AM > To: Tobias Hartmann > Cc: Liu, Xin; hotspot-compiler-dev at openjdk.java.net > Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > On Wed, Jul 29, 2020 at 9:38 AM Tobias Hartmann > wrote: > > > > Hi Xin, > > > > On 28.07.20 22:56, Liu, Xin wrote: > > > http://cr.openjdk.java.net/~xliu/8249809/01/webrev/ > > > > Overall looks good to me. > > > > Some style comments: > > - Add a comment to 'DirectiveSetPtr' to describe its purpose > > - Why not put the "cloned" logic in "operator->"? 
> > Because there's also a "read-only" access of the DirectiveSetPtr > which doesn't mutate its content and therefore should clone the > underlying DirectiveSet. See my first mail where I proposed to add a > second, `const`-version of "operator->". But that still required const > casts in the places where we didn't want to clone. I've therefore > voted for the new "cloned()" method which makes cloning and mutating > explicit and which is much easier to understand from my point of view > (compared to two overloaded operators). > > > - Do not use the _clone pointer as boolean (see "Miscellaneous" section in the style guide [1]) > > - Indentation in line 301-303 is wrong > > - Line 306 use brackets around the "else" and move it one line up "} else {" > > > > Best regards, > > Tobias > > > > [1] https://hg.openjdk.java.net/jdk/jdk/raw-file/tip/doc/hotspot-style.html From vladimir.kozlov at oracle.com Thu Jul 30 17:17:13 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 30 Jul 2020 10:17:13 -0700 Subject: [16] RFR(S): C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: <66e123b1-35d1-5b96-d0d7-6b4a8cdf2404@oracle.com> References: <66e123b1-35d1-5b96-d0d7-6b4a8cdf2404@oracle.com> Message-ID: Very good. Thanks, Vladimir K On 7/30/20 6:15 AM, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8249605 > http://cr.openjdk.java.net/~chagedorn/8249605/webrev.00/ > > There is a dead memory loop detected during IGVN. In the testcase, many nodes are dying during IGVN because they are not > reachable anymore. In this process, a (not yet dead) memory phi node (150 Phi) with two inputs is processed (see [1]): > (1) 289 MergeMem, whose base memory is 150 Phi and has one slice for 274 StoreD which is again an output of 150 Phi > (2) 356 MergeMem, whose base memory is top (i.e. 
dead and would be removed when IGVN processes this node) > > In PhiNode::Ideal, we check if a phi node is part of a dead loop where all its inputs reference itself over a > MergeMemNode input whose base memory is the phi node itself again [2]. However, in this check we do not account for dead > MergeMemNodes (like the input 356 MergeMem of 150 Phi). Therefore, we do not return top and apply the optimization [3] > to replace 150 Phi by a new MergeMemNode (380 MergeMem) whose base memory is top and now has again one slice which is > input and output to 274 StoreD [4]. This cycle is later detected and the assertion fails. > > The fix accounts additionally for dead MergeMemNodes when trying to detect dead loops in PhiNode::Ideal to return top > instead of a new dead MergeMemNode. > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/secure/attachment/89582/before_PhiNode_Ideal.png > [2] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2234 > [3] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2246 > [4] https://bugs.openjdk.java.net/secure/attachment/89583/after_PhiNode_Ideal.png From vladimir.kozlov at oracle.com Thu Jul 30 18:23:07 2020 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Thu, 30 Jul 2020 11:23:07 -0700 Subject: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field In-Reply-To: <71F94D35-2B7B-4032-AD01-954524A150B7@tencent.com> References: <71F94D35-2B7B-4032-AD01-954524A150B7@tencent.com> Message-ID: <91ac9d1c-410c-786d-f0c1-e4e4c4afda2c@oracle.com> Hi Jie Nodes generated by make_unsafe_address() are correct. The issue is that Unsafe API allows to genereate unaligned (to fields) offset with arbitrary type. As result C2 type system can't find corresponding field. Did you tried to do unaligned unsafe access to instance fields? Also try to unsafe set value (Store node). There is code in C2 which checks for narrow stores. 
Would be interesting how it behave in unsafe case. Please, extend your test. Otherwise fix is good. Thanks, Vladimir K On 7/30/20 6:09 AM, jiefu(??) wrote: > Hi all, > > JBS: https://bugs.openjdk.java.net/browse/JDK-8250825 > Webrev: http://cr.openjdk.java.net/~jiefu/8250825/webrev.00/ > > When C2 tries to inline an unsafe-access method, it may generate the following pattern in make_unsafe_address: > ConP ConL > \ | > \ | > AddP > Current implementation of TypeOopPtr::TypeOopPtr(...) failed to recognize it as an unsafe operation, which leads to the crash. > > Testing: > - tier1-3 on Linux/x64 > > Could you please review it and give me some advice? > > Thanks a lot. > Best regards, > Jie > From xxinliu at amazon.com Thu Jul 30 19:33:23 2020 From: xxinliu at amazon.com (Liu, Xin) Date: Thu, 30 Jul 2020 19:33:23 +0000 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> <1595969785292.62158@amazon.com> <1596056152748.75196@amazon.com>, Message-ID: <1596137602709.774@amazon.com> hi, Volker, Your suggestion is great. I took it. The assertion is there because I want to prevent a pointer from releasing more than one time. the downside is I limit how to use the function transfer(). I just came up a new idea. I changed the function name from transfer() to commit(). if _clone is not nullptr, commit() will overwrite _origin and reset itself to nullptr. cloned() provisions a new object to update. commit() finalizes it. it's exaggerated, but we can use the smart pointer repeat. + set.commit(); // update _origin + set.cloned(); // clone it again + set.commit(); // update _origin again + set.commit(); // no-op + set.cloned(); // clone a new one. 
+ set.cloned(); // no-op return set.commit(); here is the new revision: https://cr.openjdk.java.net/~xliu/8249809/03/webrev/ thanks, --lx ________________________________________ From: Volker Simonis Sent: Thursday, July 30, 2020 10:03 AM To: Liu, Xin Cc: Tobias Hartmann; hotspot-compiler-dev at openjdk.java.net Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. On Wed, Jul 29, 2020 at 10:56 PM Liu, Xin wrote: > > hi, Volker and Tobias, > > Here is a new revision. > http://cr.openjdk.java.net/~xliu/8249809/02/webrev/ > > 1. This one add comments about this smart pointer and fix the formation issue. > > 2. Thanks to point me out a new document of hotspot code style. > Since it has updated to -std=c++14, I change all NULL to nullptr. > > 3. I also add NON_COPYABLE because it's not intended to be copied. > > > DirectiveSetPtr is just a thin wrapper of the raw pointer. if users only use it to read, nothing will be cloned. It simply goes through. > Hi Xin, I like the new version :) I think it's fine except the assertion in "transfer()": 308 DirectiveSet* transfer() { 309 assert(_origin != nullptr, "_origin is NULL! transfer() can only be invoked once."); 310 311 if (_clone != nullptr) { 312 // We are returning a (parentless) copy. The original parent don't need to account for this. 313 DirectivesStack::release(_origin); 314 _origin = nullptr; 315 return _clone; 316 } 317 else { 318 return _origin; 319 } 320 } 321 }; You should either move it into the " if (_clone != nullptr)" block or set "_origin" to NULL in the "else" branch as well. Best regards, Volker PS: I won't have access to mail for the next two weeks. If there won't be any fundamental changes to this patch any more you can consider it reviewed from my side. 
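
The cloned()/commit() protocol described in this exchange can be illustrated with a toy copy-on-write handle. This is a hedged sketch, not the code in the webrev: ToyDirectiveSet and ToyDirectiveSetPtr are hypothetical stand-ins, the reference counting done by DirectivesStack::release is only mentioned in a comment, and the choice that reads see the pending clone is an assumption.

```cpp
#include <cassert>

// Stand-in for HotSpot's DirectiveSet; one field is enough for the sketch.
struct ToyDirectiveSet {
  bool log_enabled = false;
};

// Copy-on-write handle: reads go through without cloning, the first
// cloned() makes a private copy, and commit() publishes that copy.
class ToyDirectiveSetPtr {
  ToyDirectiveSet* _origin;  // not owned by the handle
  ToyDirectiveSet* _clone;   // owned until commit() hands it out

 public:
  explicit ToyDirectiveSetPtr(ToyDirectiveSet* origin)
      : _origin(origin), _clone(nullptr) {}

  // Mirrors the NON_COPYABLE intent mentioned in the thread.
  ToyDirectiveSetPtr(const ToyDirectiveSetPtr&) = delete;
  ToyDirectiveSetPtr& operator=(const ToyDirectiveSetPtr&) = delete;

  ~ToyDirectiveSetPtr() { delete _clone; }  // drop an uncommitted copy

  // Read-only access never triggers a clone.
  const ToyDirectiveSet* operator->() const {
    return _clone != nullptr ? _clone : _origin;
  }

  // Explicit request for a mutable copy; a second call is a no-op.
  ToyDirectiveSet* cloned() {
    if (_clone == nullptr) {
      _clone = new ToyDirectiveSet(*_origin);
    }
    return _clone;
  }

  // Publish the pending copy, if any; otherwise hand back the original.
  // The real code would release the old _origin at this point; the toy
  // leaves ownership of returned objects to the caller.
  ToyDirectiveSet* commit() {
    if (_clone != nullptr) {
      _origin = _clone;
      _clone = nullptr;
    }
    return _origin;
  }
};
```

A caller that only reads never allocates, and a caller that mutates writes `p.cloned()->log_enabled = true;` followed by `return p.commit();`, which keeps the cloning visible at the call site as argued earlier in the thread.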
> thanks, > --lx > > ________________________________________ > From: Volker Simonis > Sent: Wednesday, July 29, 2020 7:34 AM > To: Tobias Hartmann > Cc: Liu, Xin; hotspot-compiler-dev at openjdk.java.net > Subject: RE: [EXTERNAL] RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init > > CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you can confirm the sender and know the content is safe. > > > > On Wed, Jul 29, 2020 at 9:38 AM Tobias Hartmann > wrote: > > > > Hi Xin, > > > > On 28.07.20 22:56, Liu, Xin wrote: > > > http://cr.openjdk.java.net/~xliu/8249809/01/webrev/ > > > > Overall looks good to me. > > > > Some style comments: > > - Add a comment to 'DirectiveSetPtr' to describe its purpose > > - Why not put the "cloned" logic in "operator->"? > > Because there's also a "read-only" access of the DirectiveSetPtr > which doesn't mutate its content and therefore should clone the > underlying DirectiveSet. See my first mail where I proposed to add a > second, `const`-version of "operator->". But that still required const > casts in the places where we didn't want to clone. I've therefore > voted for the new "cloned()" method which makes cloning and mutating > explicit and which is much easier to understand from my point of view > (compared to two overloaded operators). 
>
> > - Do not use the _clone pointer as boolean (see "Miscellaneous" section in the style guide [1])
> > - Indentation in line 301-303 is wrong
> > - Line 306 use brackets around the "else" and move it one line up "} else {"
> >
> > Best regards,
> > Tobias
> >
> > [1] https://hg.openjdk.java.net/jdk/jdk/raw-file/tip/doc/hotspot-style.html

From luhenry at microsoft.com Fri Jul 31 01:26:00 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Fri, 31 Jul 2020 01:26:00 +0000
Subject: RFR[M]: Adding MD5 Intrinsic on x86-64
Message-ID:

JBS: I just got authorship status and I'll create a bug as soon as I have access to JBS
Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.00/

The problem ended up not being with how `ofs` was incremented, but with a callee-saved register not being restored properly before returning from the intrinsic.

The performance results from running with JMH are very encouraging. I ran the `org.openjdk.bench.java.security.MessageDigests` with MD5 only enabled, and following are the results with and without the intrinsic.

-XX:-UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  3459.747 ± 10.508  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   446.407 ±  3.383  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    30.685 ±  0.676  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.483 ±  0.004  ops/ms

-XX:+UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  4011.556 ± 10.212  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   526.873 ±  2.101  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    35.012 ±  0.088  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.573 ±  0.002  ops/ms

That's overall a jump from ~483MB/s to ~573MB/s on the 1M chunks, or a ~19% speedup.
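
The throughput and speedup figures quoted above follow directly from the JMH scores. A small C++ sketch of the arithmetic is given here for reference; the function names are invented for illustration, and the conversion assumes each operation digests exactly `length` bytes and that "MB" in the mail means 2^20 bytes.

```cpp
#include <cassert>
#include <cmath>

// Convert a JMH score in ops/ms on inputs of 'bytes_per_op' bytes
// into MiB per second (1 MiB = 1048576 bytes).
inline double throughput_mib_per_s(double ops_per_ms, double bytes_per_op) {
  return ops_per_ms * 1000.0 * bytes_per_op / (1024.0 * 1024.0);
}

// Relative gain of 'after' over 'before', in percent.
inline double speedup_percent(double before, double after) {
  return (after / before - 1.0) * 100.0;
}
```

For the 1048576-byte row this gives 483 and 573 MiB/s and a gain of about 18.6%, which the mail rounds to ~19%. The 64-byte row works out to roughly 16%, so the benefit is somewhat smaller on short inputs.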
Thank you, Ludovic From ningsheng.jian at arm.com Fri Jul 31 01:41:45 2020 From: ningsheng.jian at arm.com (Ningsheng Jian) Date: Fri, 31 Jul 2020 09:41:45 +0800 Subject: [aarch64-port-dev ] RFR(L): 8231441: AArch64: Initial SVE backend support In-Reply-To: <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> References: <42fca25d-7172-b4f3-335b-92e2b05e8195@arm.com> <707df21c-849d-ac9d-0ab2-61a30d1354f9@arm.com> <2df4a73f-7e84-87f1-6b2f-1ed6b45bbc27@redhat.com> Message-ID: <8bc0d357-07e7-ae55-b7b2-23ec54ea3e6a@arm.com> Hi Andrew, Thanks a lot!! FYI, the latest patch: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-July/039289.html And some descriptions: http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt Thanks, Ningsheng On 7/30/20 7:26 PM, Andrew Dinn wrote: > Hi Ningsheng, > > I will start to review this either later today or (more likely) > tomorrow. It will probably take some time to work through it all. I will > work from the updated patch posted by PengFei. > > regards, > > > Andrew Dinn > ----------- > Red Hat Distinguished Engineer > Red Hat UK Ltd > Registered in England and Wales under Company Registration No. 03798903 > Directors: Michael Cunningham, Michael ("Mike") O'Neill > > On 21/07/2020 07:05, Ningsheng Jian wrote: >> [Ping] >> >> Could anyone please help to review this patch, especially for the c2 >> register allocation part? >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8231441 >> >> The latest webrev: >> http://cr.openjdk.java.net/~njian/8231441/webrev.02 >> >> In the latest webrev, we block one predicate register (p7) with all >> elements preset to TRUE, so that c2 compiled code can use it freely to >> generate instructions for unpredicated operations. 
>> >> And the split parts: >> >> 1) SVE feature detection: >> http://cr.openjdk.java.net/~njian/8231441/webrev.02-feature >> >> 2) c2 register allocation: >> http://cr.openjdk.java.net/~njian/8231441/webrev.02-ra >> >> 3) SVE c2 backend: >> http://cr.openjdk.java.net/~njian/8231441/webrev.02-c2 >> >> The initial RFR which has some descriptions of the patch: >> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-March/037628.html >> >> >> The description can also be found at: >> http://cr.openjdk.java.net/~njian/8231441/README-RFR.txt >> >> Notes to verify the patch on QEMU user emulation, with an example of >> compiled code: >> http://cr.openjdk.java.net/~njian/8231441/running-sve-in-qemu-user.txt >> >> Thanks, >> Ningsheng >> >> >> On 5/27/20 3:23 PM, Ningsheng Jian wrote: >>> Hi, >>> >>> I have rebased this patch with some more comments added. And also >>> relaxed the instruction matching conditions for 128-bit vector. >>> >>> I would appreciate if someone could help to review this. >>> >>> Whole patch: >>> http://cr.openjdk.java.net/~njian/8231441/webrev.01 >>> >>> Different parts of changes: >>> >>> 1) SVE feature detection >>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-feature >>> >>> 2) c2 registion allocation >>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-ra >>> >>> 3) SVE c2 backend >>> http://cr.openjdk.java.net/~njian/8231441/webrev.01-c2 >>> >>> (Or should I split this into different JBS?) >>> >>> Thanks, >>> Ningsheng >>> >>> On 3/25/20 2:37 PM, Ningsheng Jian wrote: >>>> Hi, >>>> >>>> Could you please help to review this patch adding AArch64 SVE support? >>>> It also touches c2 compiler shared code. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8231441 >>>> Webrev: http://cr.openjdk.java.net/~njian/8231441/webrev.00 >>>> >>>> Arm has released new vector ISA extension for AArch64, SVE [1] and >>>> SVE2 [2]. This patch adds the initial SVE support in OpenJDK. 
In this >>>> patch we have: >>>> >>>> 1) SVE feature enablement and detection >>>> 2) SVE vector register allocation support with initial predicate >>>> register definition >>>> 3) SVE c2 backend for current SLP based vectorizer. (We also have a POC >>>> patch of a new vectorizer using SVE predicate-driven loop control, but >>>> that's still under development.) >>>> >>>> SVE register definition >>>> ======================= >>>> Unlike other SIMD architectures, SVE allows hardware implementations to >>>> choose a vector register length from 128 and 2048 bits, multiple of 128 >>>> bits. So we introduce a new vector type VectorA, i.e. length agnostic >>>> (scalable) vector type, and Op_VecA for machine vectora register. In the >>>> meantime, to minimize register allocation code changes, we also take >>>> advantage of one JIT compiler aspect, that is during the compile time we >>>> actually know the real hardware SVE vector register size of current >>>> running machine. So, the register allocator actually knows how many >>>> register slots an Op_VecA ideal reg requires, and could work fine >>>> without much modification. >>>> >>>> Since the bottom 128 bits are shared with the NEON, we extend current >>>> register mask definition of V0-V31 registers. Currently, c2 uses one bit >>>> mask for a 32-bit register slot, so to define at most 2048 bits we will >>>> need to add 64 slots in AD file. That's a really large number, and will >>>> also break current regmask assumption. Considering the SVE vector >>>> register is architecturally scalable for different sizes, we just define >>>> double of original NEON vector register slots, i.e. 8 slots: Vx, Vx_H, >>>> Vx_J ... Vx_O. After adlc, the generated register masks now looks like: >>>> >>>> const RegMask _VECTORA_REG_mask( 0x0, 0x0, 0xffffffff, 0xffffffff, >>>> 0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff, ... 
>>>> >>>> const RegMask _VECTORD_REG_mask( 0x0, 0x0, 0x3030303, 0x3030303, >>>> 0x3030303, 0x3030303, 0x3030303, 0x3030303, ... >>>> >>>> const RegMask _VECTORX_REG_mask( 0x0, 0x0, 0xf0f0f0f, 0xf0f0f0f, >>>> 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, 0xf0f0f0f, ... >>>> >>>> And we use SlotsPerVecA to indicate regmask bit size for a VecA >>>> register. >>>> >>>> Although for physical register allocation, register allocator does not >>>> need to know the real VecA register size, while doing spill/unspill, >>>> current register allocation needs to know actual stack slot size to >>>> store/load VecA registers. SVE is able to do vector size agnostic >>>> spilling, but to minimize the code changes, as I mentioned before, we >>>> just let RA know the actual vector register size in current running >>>> machine, by calling scalable_vector_reg_size(). >>>> >>>> In the meantime, since some vector operations do not have unpredicated >>>> SVE1 instructions, but only predicate version, e.g. vector multiply, >>>> vector load/store. We have also defined predicate registers in this >>>> patch, and c2 register allocator will allocate a temp predicate register >>>> to fulfill the expecting unpredicated operations. And this can also be >>>> used for future predicate-driven vectorizer. This is not efficient for >>>> now, as we can see many ptrue instructions in the generated code. One >>>> possible solution I can see, is to block one predicate register, and >>>> preset it to all true. But to preserve/reinitialize a caller save >>>> register value cross calls seems risky to work in this patch. I decide >>>> to defer it to further optimization work. If anyone has any suggestions >>>> on this, I would appreciate. 
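
To make the mask values quoted above easier to read: with 8 slots defined per vector register, one 32-bit mask word covers 4 registers, and each register class sets the low slots of every register (2 for VecD, 4 for VecX, all 8 for VecA). The snippet below only reproduces that bit pattern; it is not adlc output, and the helper name is made up for illustration.

```java
final class RegMaskSketch {
    /**
     * Builds one 32-bit mask word in which, for every group of
     * slotsPerRegister bits, the lowest slotsUsed bits are set.
     */
    static int maskWord(int slotsUsed, int slotsPerRegister) {
        int word = 0;
        for (int bit = 0; bit < Integer.SIZE; bit++) {
            if (bit % slotsPerRegister < slotsUsed) {
                word |= 1 << bit;
            }
        }
        return word;
    }

    public static void main(String[] args) {
        int slotsPerRegister = 8; // Vx, Vx_H, Vx_J ... Vx_O
        System.out.println("VecD: 0x" + Integer.toHexString(maskWord(2, slotsPerRegister)));
        System.out.println("VecX: 0x" + Integer.toHexString(maskWord(4, slotsPerRegister)));
        System.out.println("VecA: 0x" + Integer.toHexString(maskWord(8, slotsPerRegister)));
        // A full 2048-bit register would need 2048 / 32 = 64 slots instead.
        System.out.println("slots for 2048 bits: " + (2048 / 32));
    }
}
```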
>>>>
>>>> SVE feature detection
>>>> =====================
>>>> Since we may have some compiled code based on the initial detected SVE
>>>> vector register length and the compiled code is compiled only for that
>>>> vector register length, we assume that the SVE vector register length
>>>> will not be changed during the JVM lifetime. However, SVE vector length
>>>> is per-thread and can be changed by system call [3], so we need to make
>>>> sure that each jni call will not change the sve vector length.
>>>>
>>>> Currently, we verify the SVE vector register length on each JNI return,
>>>> and if an SVE vector length change is detected, jvm simply reports error
>>>> and stops running. The VM running vector length can also be set by
>>>> existing VM option MaxVectorSize with c2 enabled. If MaxVectorSize is
>>>> specified not the same as system default sve vector length (in
>>>> /proc/sys/abi/sve_default_vector_length), JVM will set current process
>>>> sve vector length to the specified vector length.
>>>>
>>>> Compiled code
>>>> =============
>>>> We have added all current c2 backend codegen on par with NEON, but only
>>>> for vector length larger than 128-bit.
>>>>
>>>> On a 1024 bit SVE environment, for the following simple loop with int
>>>> array element type:
>>>>
>>>>     for (int i = 0; i < LENGTH; i++) {
>>>>       c[i] = a[i] + b[i];
>>>>     }
>>>>
>>>> c2 generated loop:
>>>>
>>>>     0x0000ffff811c0820:   sbfiz   x11, x10, #2, #32
>>>>     0x0000ffff811c0824:   add     x13, x18, x11
>>>>     0x0000ffff811c0828:   add     x14, x1, x11
>>>>     0x0000ffff811c082c:   add     x13, x13, #0x10
>>>>     0x0000ffff811c0830:   add     x14, x14, #0x10
>>>>     0x0000ffff811c0834:   add     x11, x0, x11
>>>>     0x0000ffff811c0838:   add     x11, x11, #0x10
>>>>     0x0000ffff811c083c:   ptrue   p1.s    // To be optimized
>>>>     0x0000ffff811c0840:   ld1w    {z16.s}, p1/z, [x14]
>>>>     0x0000ffff811c0844:   ptrue   p0.s
>>>>     0x0000ffff811c0848:   ld1w
{z17.s}, p0/z, [x13]
>>>>     0x0000ffff811c084c:   add     z16.s, z17.s, z16.s
>>>>     0x0000ffff811c0850:   ptrue   p1.s
>>>>     0x0000ffff811c0854:   st1w    {z16.s}, p1, [x11]
>>>>     0x0000ffff811c0858:   add     w10, w10, #0x20
>>>>     0x0000ffff811c085c:   cmp     w10, w12
>>>>     0x0000ffff811c0860:   b.lt    0x0000ffff811c0820
>>>>
>>>> Test
>>>> ====
>>>> Currently, we don't have real hardware to verify SVE features (and
>>>> performance). But we have run jtreg tests with SVE in some emulators. On
>>>> QEMU system emulator, which has SVE emulation support, jtreg tier1-3
>>>> passed with different vector sizes. We've also verified it with full
>>>> jtreg tests without SVE on both x86 and AArch64, to make sure that
>>>> there's no regression.
>>>>
>>>> The patch has also been applied to Vector API code base, and verified on
>>>> emulator. In Vector API, there are more vector related tests and is more
>>>> possible to generate vector instructions by intrinsification.
>>>>
>>>> A simple test can also run in QEMU user emulation, e.g.
>>>>
>>>> $ qemu-aarch64 -cpu max,sve-max-vq=2 java -XX:UseSVE=1 SIMD
>>>>
>>>> (
>>>> To run it in user emulation mode, we will need to bypass SVE feature
>>>> detection code in this patch. E.g. apply:
>>>> http://cr.openjdk.java.net/~njian/8231441/user-emulation.patch
>>>> )
>>>>
>>>> Others
>>>> ======
>>>> Since this patch is a bit large, I've also split it into 3 parts, for
>>>> easy review:
>>>>
>>>> 1) SVE feature detection
>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-feature
>>>>
>>>> 2) c2 register allocation
>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-ra
>>>>
>>>> 3) SVE c2 backend
>>>> http://cr.openjdk.java.net/~njian/8231441/webrev.00-c2
>>>>
>>>> Part of this patch has been contributed by Joshua Zhu and Yang Zhang.
>>>>
>>>> Refs
>>>> ====
>>>> [1] https://developer.arm.com/docs/ddi0584/latest
>>>> [2] https://developer.arm.com/docs/ddi0602/latest
>>>> [3] https://www.kernel.org/doc/Documentation/arm64/sve.txt
>>>>
>>>> Thanks,
>>>> Ningsheng
>>>>
>>>
>>
>

From vladimir.kozlov at oracle.com Fri Jul 31 02:54:28 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Thu, 30 Jul 2020 19:54:28 -0700
Subject: [16] RFR(M) 8250233: -XX:+CITime triggers guarantee(events != NULL) in jvmci.cpp:173
Message-ID:

https://cr.openjdk.java.net/~kvn/8250233/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8250233

Main issue was a missing EnableJVMCI flag check when calling JVMCICompiler::print_compilation_timers().

In addition to fixing that, I did the following refactoring.

The code which collects and prints statistics per compiler was guarded by #if INCLUDE_JVMCI but not by any JVMCI flags. As a result, it is the default code used by all JIT compilers since JVMCI was added in JDK 9. I decided to make it not JVMCI specific and use it on all platforms.

I also added statistics per compilation tier, which provide more useful information than combined data for C1.

Removed in CompileBroker::print_times() the code which calculates total values based on data in the compiler's statistics. Such data is already collected in CompileBroker's static fields.

Added checks for 0 values in print statements to avoid division by 0 (which produced NaN values for doubles).

Don't print empty data in JVMCICompiler::print_compilation_timers() but print total compilation time in JVMCICompiler::print_timers().

Tested hs-tier1-3.
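
For readers unfamiliar with the NaN problem mentioned above, here is a minimal illustration. It is written in Java purely for demonstration and is not the HotSpot C++ code from the webrev; the class and method names are assumptions. The idea is that a rate such as bytes/s is only computed when the elapsed time is non-zero, so that an empty row (for example a tier with no compilations) reports 0.000 instead of NaN.

```java
final class CompileRateSketch {
    /** Returns bytes per second, or 0.0 when no time was recorded. */
    static double bytesPerSecond(long bytes, double seconds) {
        return seconds == 0.0 ? 0.0 : bytes / seconds;
    }

    public static void main(String[] args) {
        // Unguarded floating-point division of zero by zero yields NaN.
        System.out.println("unguarded: " + (0 / 0.0));
        // The guarded helper reports 0.0 for the same inputs.
        System.out.printf("guarded:   %.3f bytes/s%n", bytesPerSecond(0, 0.0));
        // A non-empty row is computed normally.
        System.out.printf("non-empty: %.3f bytes/s%n", bytesPerSecond(1842, 0.037));
    }
}
```

Note that rates recomputed from the printed, rounded times will not match the VM's own figures exactly, since the VM works from unrounded timer values.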
Thanks,
Vladimir

Beginning of CITime new output:

Individual compiler times (for compiled methods only)
------------------------------------------------

  C1 {speed: 49626.710 bytes/s; standard: 0.037 s, 1842 bytes, 35 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 51096 bytes; nmethods_code_size: 30880 bytes}
  C2 {speed: 1451.769 bytes/s; standard: 0.001 s, 2 bytes, 2 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes}

Individual compilation Tier times (for compiled methods only)
------------------------------------------------

  Tier1 {speed: 21162.963 bytes/s; standard: 0.002 s, 47 bytes, 10 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 3160 bytes; nmethods_code_size: 1504 bytes}
  Tier2 {speed: 0.000 bytes/s; standard: 0.000 s, 0 bytes, 0 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 0 bytes; nmethods_code_size: 0 bytes}
  Tier3 {speed: 51438.195 bytes/s; standard: 0.035 s, 1795 bytes, 25 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 47936 bytes; nmethods_code_size: 29376 bytes}
  Tier4 {speed: 1451.769 bytes/s; standard: 0.001 s, 2 bytes, 2 methods; osr: 0.000 s, 0 bytes, 0 methods; nmethods_size: 288 bytes; nmethods_code_size: 128 bytes}

Accumulated compiler times
----------------------------------------------------------
  Total compilation time   : 0.038 s
    Standard compilation   : 0.038 s, Average : 0.001 s
    Bailed out compilation : 0.000 s, Average : 0.000 s
    On stack replacement   : 0.000 s, Average : 0.000 s
    Invalidated            : 0.000 s, Average : 0.000 s

From viv.desh at gmail.com Fri Jul 31 04:17:21 2020
From: viv.desh at gmail.com (Vivek Deshpande)
Date: Thu, 30 Jul 2020 21:17:21 -0700
Subject: RFR[M]: Adding MD5 Intrinsic on x86-64
In-Reply-To:
References:
Message-ID:

Hi Ludovic

Your patch looks good to me. Good reuse of existing code for SHA.
You have not added the stub generation for 32 bit. Did you also test with a 32 bit build?
Thank you.
Regards,
Vivek

On Thu, Jul 30, 2020 at 6:26 PM Ludovic Henry wrote:

> JBS: I just got authorship status and I'll create a bug as soon as I have
> access to JBS
> Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.00/
>
> The problem ended up not being with how `ofs` was incremented, but with a
> callee-saved register not being restored properly before returning from the
> intrinsic.
>
> The performance results from running with JMH are very encouraging. I ran
> the `org.openjdk.bench.java.security.MessageDigests` with MD5 only enabled,
> and following are the results with and without the intrinsic.
>
> -XX:-UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  3459.747 ± 10.508  ops/ms
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   446.407 ±  3.383  ops/ms
> MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    30.685 ±  0.676  ops/ms
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.483 ±  0.004  ops/ms
>
> -XX:+UseMD5Intrinsics
> Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
> MessageDigests.digest             md5        64     DEFAULT  thrpt   10  4011.556 ± 10.212  ops/ms
> MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   526.873 ±  2.101  ops/ms
> MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    35.012 ±  0.088  ops/ms
> MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.573 ±  0.002  ops/ms
>
> That's overall a jump from ~483MB/s to ~573MB/s on the 1M chunks, or a
> ~19% speedup.
>
> Thank you,
> Ludovic
>

--
Thanks and Regards,
Vivek Deshpande
viv.desh at gmail.com

From jiefu at tencent.com Fri Jul 31 05:06:24 2020
From: jiefu at tencent.com (=?utf-8?B?amllZnUo5YKF5p2wKQ==?=)
Date: Fri, 31 Jul 2020 05:06:24 +0000
Subject: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field
Message-ID:

Hi Vladimir K,

Thanks for your review.
The test had been extended here: - http://cr.openjdk.java.net/~jiefu/8250825/webrev.01/ Before the patch: The unsafe access (put/get) to static field will crash. The unsafe access (put/get) to instance field is fine. After the patch: All is ok. Thanks a lot. Best regards, Jie ?On 2020/7/31, 2:24 AM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: Hi Jie Nodes generated by make_unsafe_address() are correct. The issue is that Unsafe API allows to genereate unaligned (to fields) offset with arbitrary type. As result C2 type system can't find corresponding field. Did you tried to do unaligned unsafe access to instance fields? Also try to unsafe set value (Store node). There is code in C2 which checks for narrow stores. Would be interesting how it behave in unsafe case. Please, extend your test. Otherwise fix is good. Thanks, Vladimir K On 7/30/20 6:09 AM, jiefu(??) wrote: > Hi all, > > JBS: https://bugs.openjdk.java.net/browse/JDK-8250825 > Webrev: http://cr.openjdk.java.net/~jiefu/8250825/webrev.00/ > > When C2 tries to inline an unsafe-access method, it may generate the following pattern in make_unsafe_address: > ConP ConL > \ | > \ | > AddP > Current implementation of TypeOopPtr::TypeOopPtr(...) failed to recognize it as an unsafe operation, which leads to the crash. > > Testing: > - tier1-3 on Linux/x64 > > Could you please review it and give me some advice? > > Thanks a lot. > Best regards, > Jie > From christian.hagedorn at oracle.com Fri Jul 31 05:56:03 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 31 Jul 2020 07:56:03 +0200 Subject: [16] RFR(S): C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: <66e123b1-35d1-5b96-d0d7-6b4a8cdf2404@oracle.com> Message-ID: Hi Vladimir Thanks a lot for your review! Best regards, Christian On 30.07.20 19:17, Vladimir Kozlov wrote: > Very good. 
> > Thanks, > Vladimir K > > On 7/30/20 6:15 AM, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8249605 >> http://cr.openjdk.java.net/~chagedorn/8249605/webrev.00/ >> >> There is a dead memory loop detected during IGVN. In the testcase, >> many nodes are dying during IGVN because they are not reachable >> anymore. In this process, a (not yet dead) memory phi node (150 Phi) >> with two inputs is processed (see [1]): >> (1) 289 MergeMem, whose base memory is 150 Phi and has one slice for >> 274 StoreD which is again an output of 150 Phi >> (2) 356 MergeMem, whose base memory is top (i.e. dead and would be >> removed when IGVN processes this node) >> >> In PhiNode::Ideal, we check if a phi node is part of a dead loop where >> all its inputs reference itself over a MergeMemNode input whose base >> memory is the phi node itself again [2]. However, in this check we do >> not account for dead MergeMemNodes (like the input 356 MergeMem of 150 >> Phi). Therefore, we do not return top and apply the optimization [3] >> to replace 150 Phi by a new MergeMemNode (380 MergeMem) whose base >> memory is top and now has again one slice which is input and output to >> 274 StoreD [4]. This cycle is later detected and the assertion fails. >> >> The fix accounts additionally for dead MergeMemNodes when trying to >> detect dead loops in PhiNode::Ideal to return top instead of a new >> dead MergeMemNode. 
>> >> Best regards, >> Christian >> >> >> [1] >> https://bugs.openjdk.java.net/secure/attachment/89582/before_PhiNode_Ideal.png >> >> [2] >> http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2234 >> >> [3] >> http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2246 >> >> [4] >> https://bugs.openjdk.java.net/secure/attachment/89583/after_PhiNode_Ideal.png >> From tobias.hartmann at oracle.com Fri Jul 31 07:06:20 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 31 Jul 2020 09:06:20 +0200 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> Message-ID: Hi Evgeny, looks good to me. Best regards, Tobias On 27.07.20 21:38, Evgeny Nikitin wrote: > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 > Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ > > Adjusting the test to current state of the VM. > > ??? - Definition of 'trivial code' does not depend on whether the method has been profiled or not; > ??? - Trivial code does only go level 0 to level 1; > ??? - Some refactoring. > > The change has been checked in mach5 for the 5 platforms (passed). > > Please review, > /Evgeny Nikitin. From tobias.hartmann at oracle.com Fri Jul 31 07:10:01 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 31 Jul 2020 09:10:01 +0200 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> <1595969785292.62158@amazon.com> Message-ID: On 29.07.20 16:34, Volker Simonis wrote: > Because there's also a "read-only" access of the DirectiveSetPtr > which doesn't mutate its content and therefore should clone the > underlying DirectiveSet. 
See my first mail where I proposed to add a > second, `const`-version of "operator->". But that still required const > casts in the places where we didn't want to clone. I've therefore > voted for the new "cloned()" method which makes cloning and mutating > explicit and which is much easier to understand from my point of view > (compared to two overloaded operators). Right, I've missed the "set->LogOption" usage. Best regards, Tobias From tobias.hartmann at oracle.com Fri Jul 31 07:12:11 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 31 Jul 2020 09:12:11 +0200 Subject: RFR[XS] 8249809 avoid calling DirectiveSet::clone(this) in compilecommand_compatibility_init In-Reply-To: <1596137602709.774@amazon.com> References: <1595807197546.52082@amazon.com> <1595907547514.55531@amazon.com> <1595969785292.62158@amazon.com> <1596056152748.75196@amazon.com> <1596137602709.774@amazon.com> Message-ID: On 30.07.20 21:33, Liu, Xin wrote: > https://cr.openjdk.java.net/~xliu/8249809/03/webrev/ Looks good to me. Thanks for making these changes! Best regards, Tobias From tobias.hartmann at oracle.com Fri Jul 31 07:25:26 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 31 Jul 2020 09:25:26 +0200 Subject: [16] RFR(S): C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: <66e123b1-35d1-5b96-d0d7-6b4a8cdf2404@oracle.com> References: <66e123b1-35d1-5b96-d0d7-6b4a8cdf2404@oracle.com> Message-ID: Hi Christian, nice analysis, looks good to me! Small typo in cfgnode.cpp:2239 "reference" -> "references" Best regards, Tobias On 30.07.20 15:15, Christian Hagedorn wrote: > Hi > > Please review the following patch: > https://bugs.openjdk.java.net/browse/JDK-8249605 > http://cr.openjdk.java.net/~chagedorn/8249605/webrev.00/ > > There is a dead memory loop detected during IGVN. In the testcase, many nodes are dying during IGVN > because they are not reachable anymore. 
In this process, a (not yet dead) memory phi node (150 Phi) > with two inputs is processed (see [1]): > (1) 289 MergeMem, whose base memory is 150 Phi and has one slice for 274 StoreD which is again an > output of 150 Phi > (2) 356 MergeMem, whose base memory is top (i.e. dead and would be removed when IGVN processes this > node) > > In PhiNode::Ideal, we check if a phi node is part of a dead loop where all its inputs reference > itself over a MergeMemNode input whose base memory is the phi node itself again [2]. However, in > this check we do not account for dead MergeMemNodes (like the input 356 MergeMem of 150 Phi). > Therefore, we do not return top and apply the optimization [3] to replace 150 Phi by a new > MergeMemNode (380 MergeMem) whose base memory is top and now has again one slice which is input and > output to 274 StoreD [4]. This cycle is later detected and the assertion fails. > > The fix accounts additionally for dead MergeMemNodes when trying to detect dead loops in > PhiNode::Ideal to return top instead of a new dead MergeMemNode. > > Best regards, > Christian > > > [1] https://bugs.openjdk.java.net/secure/attachment/89582/before_PhiNode_Ideal.png > [2] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2234 > [3] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2246 > [4] https://bugs.openjdk.java.net/secure/attachment/89583/after_PhiNode_Ideal.png From tobias.hartmann at oracle.com Fri Jul 31 07:49:46 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 31 Jul 2020 09:49:46 +0200 Subject: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field In-Reply-To: References: Message-ID: Hi Jie, On 31.07.20 07:06, jiefu(??) wrote: > http://cr.openjdk.java.net/~jiefu/8250825/webrev.01/ Looks good to me. 
Some comments regarding TestUnsafeAccess.java: - Maybe rename the test to something more meaningful, for example "TestMisalignedUnsafeAccess" and add a small comment in the @summary tag - Xcomp already implies Xbatch [1] - I don't think you need 'initUnsafe' in the test, you can just use Unsafe.getUnsafe [2] Best regards, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/file/83aeb4b1079b/src/hotspot/share/runtime/arguments.cpp#l1612 [2] http://hg.openjdk.java.net/jdk/jdk/file/83aeb4b1079b/test/hotspot/jtreg/compiler/unsafe/UnsafeGetStableArrayElement.java#l67 From christian.hagedorn at oracle.com Fri Jul 31 08:38:36 2020 From: christian.hagedorn at oracle.com (Christian Hagedorn) Date: Fri, 31 Jul 2020 10:38:36 +0200 Subject: [16] RFR(S): C2: assert(no_dead_loop) failed: dead loop detected In-Reply-To: References: <66e123b1-35d1-5b96-d0d7-6b4a8cdf2404@oracle.com> Message-ID: Hi Tobias Thanks a lot for your review! I updated the typo directly in webrev.00. Best regards, Christian On 31.07.20 09:25, Tobias Hartmann wrote: > Hi Christian, > > nice analysis, looks good to me! > > Small typo in cfgnode.cpp:2239 "reference" -> "references" > > Best regards, > Tobias > > On 30.07.20 15:15, Christian Hagedorn wrote: >> Hi >> >> Please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8249605 >> http://cr.openjdk.java.net/~chagedorn/8249605/webrev.00/ >> >> There is a dead memory loop detected during IGVN. In the testcase, many nodes are dying during IGVN >> because they are not reachable anymore. In this process, a (not yet dead) memory phi node (150 Phi) >> with two inputs is processed (see [1]): >> (1) 289 MergeMem, whose base memory is 150 Phi and has one slice for 274 StoreD which is again an >> output of 150 Phi >> (2) 356 MergeMem, whose base memory is top (i.e. 
dead and would be removed when IGVN processes this >> node) >> >> In PhiNode::Ideal, we check if a phi node is part of a dead loop where all its inputs reference >> itself over a MergeMemNode input whose base memory is the phi node itself again [2]. However, in >> this check we do not account for dead MergeMemNodes (like the input 356 MergeMem of 150 Phi). >> Therefore, we do not return top and apply the optimization [3] to replace 150 Phi by a new >> MergeMemNode (380 MergeMem) whose base memory is top and now has again one slice which is input and >> output to 274 StoreD [4]. This cycle is later detected and the assertion fails. >> >> The fix accounts additionally for dead MergeMemNodes when trying to detect dead loops in >> PhiNode::Ideal to return top instead of a new dead MergeMemNode. >> >> Best regards, >> Christian >> >> >> [1] https://bugs.openjdk.java.net/secure/attachment/89582/before_PhiNode_Ideal.png >> [2] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2234 >> [3] http://hg.openjdk.java.net/jdk/jdk/file/8f7ede592c28/src/hotspot/share/opto/cfgnode.cpp#l2246 >> [4] https://bugs.openjdk.java.net/secure/attachment/89583/after_PhiNode_Ideal.png From sergei.tsypanov at yandex.ru Fri Jul 31 09:11:39 2020 From: sergei.tsypanov at yandex.ru (=?utf-8?B?0KHQtdGA0LPQtdC5INCm0YvQv9Cw0L3QvtCy?=) Date: Fri, 31 Jul 2020 11:11:39 +0200 Subject: Performance degradation due to probable (?) C2 issue References: <925401595926726@mail.yandex.ru> Message-ID: <477011596183779@mail.yandex.ru> Hi, could I ask one more question? You wrote previosuly: > Here, it's detected that CharArrayWriter::toString is large > and has already been compiled so there's no sense inlining another copy of it. So as far as CharArrayWriter::toString is compiled, but not inlined into call site we have to do a real method call and it this call causes degradation, right? 
Regards, Sergey Tsypanov 28.07.2020, 14:12, "Andrew Haley" : > Hi, > > On 28/07/2020 11:35, ?????? ??????? wrote: > >> ?So my question is whether there's something wrong with compier of >> ?the original idea of improvement was wrong? > > No, and (probably) no. > > C2 uses a bunch of of heuristics. Here, it's detected that > CharArrayWriter::toString is large and has already been compiled so > there's no sense inlining another copy of it. This isn't necessarily > true, but it's a good guess. Try playing with InlineSmallCode: start > with =1000, and increases it from there to see if it helps. > > -- > Andrew Haley (he/him) > Java Platform Lead Engineer > Red Hat UK Ltd. > https://keybase.io/andrewhaley > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From jiefu at tencent.com Fri Jul 31 09:40:47 2020 From: jiefu at tencent.com (=?gb2312?B?amllZnUouLW93Ck=?=) Date: Fri, 31 Jul 2020 09:40:47 +0000 Subject: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail) In-Reply-To: References: , Message-ID: <402d0d22a984470483fef9761d01ad64@tencent.com> Hi Tobias, Thanks for your review and comments. Updated: http://cr.openjdk.java.net/~jiefu/8250825/webrev.02/ - Rename the test to TestMisalignedUnsafeAccess.java - Add @summary tag - Remove Xbatch - Remvoe initUnsafe It seems better now. Thanks. Best regards, Jie ________________________________ From: Tobias Hartmann Sent: Friday, July 31, 2020 3:49 PM To: jiefu(??); Vladimir Kozlov; hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail) Hi Jie, On 31.07.20 07:06, jiefu(??) wrote: > http://cr.openjdk.java.net/~jiefu/8250825/webrev.01/ Looks good to me. 
Some comments regarding TestUnsafeAccess.java: - Maybe rename the test to something more meaningful, for example "TestMisalignedUnsafeAccess" and add a small comment in the @summary tag - Xcomp already implies Xbatch [1] - I don't think you need 'initUnsafe' in the test, you can just use Unsafe.getUnsafe [2] Best regards, Tobias [1] http://hg.openjdk.java.net/jdk/jdk/file/83aeb4b1079b/src/hotspot/share/runtime/arguments.cpp#l1612 [2] http://hg.openjdk.java.net/jdk/jdk/file/83aeb4b1079b/test/hotspot/jtreg/compiler/unsafe/UnsafeGetStableArrayElement.java#l67 From tobias.hartmann at oracle.com Fri Jul 31 10:00:03 2020 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Fri, 31 Jul 2020 12:00:03 +0200 Subject: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail) In-Reply-To: <402d0d22a984470483fef9761d01ad64@tencent.com> References: <402d0d22a984470483fef9761d01ad64@tencent.com> Message-ID: <6c860e7b-832f-3583-b282-7ea44e76e00b@oracle.com> Hi Jie, On 31.07.20 11:40, jiefu(??) wrote: > Updated: http://cr.openjdk.java.net/~jiefu/8250825/webrev.02/ Looks good. Best regards, Tobias From forax at univ-mlv.fr Fri Jul 31 12:38:11 2020 From: forax at univ-mlv.fr (Remi Forax) Date: Fri, 31 Jul 2020 14:38:11 +0200 (CEST) Subject: Performance degradation due to probable (?) C2 issue In-Reply-To: <477011596183779@mail.yandex.ru> References: <925401595926726@mail.yandex.ru> <477011596183779@mail.yandex.ru> Message-ID: <1821013837.321064.1596199091093.JavaMail.zimbra@u-pem.fr> ----- Mail original ----- > De: "?????? ???????" > ?: "Andrew Haley" , "hotspot compiler" > Envoy?: Vendredi 31 Juillet 2020 11:11:39 > Objet: Re: Performance degradation due to probable (?) C2 issue > Hi, > > could I ask one more question? > > You wrote previosuly: > >> Here, it's detected that CharArrayWriter::toString is large >> and has already been compiled so there's no sense inlining another copy of it. 
>
> So as far as CharArrayWriter::toString is compiled, but not inlined into call
> site
> we have to do a real method call and it this call causes degradation, right?

Yes. If you never share code, you end up with several gigabytes of assembly code, which destroys performance because you start getting a lot of instruction cache misses.
So there is a trade-off between a theoretical fully inlined program and a never inlined program.

>
> Regards,
> Sergey Tsypanov

regards,
Rémi

>
>
>
> 28.07.2020, 14:12, "Andrew Haley" :
>> Hi,
>>
>> On 28/07/2020 11:35, Sergey Tsypanov wrote:
>>
>>>  So my question is whether there's something wrong with compiler or
>>>  the original idea of improvement was wrong?
>>
>> No, and (probably) no.
>>
>> C2 uses a bunch of heuristics. Here, it's detected that
>> CharArrayWriter::toString is large and has already been compiled so
>> there's no sense inlining another copy of it. This isn't necessarily
>> true, but it's a good guess. Try playing with InlineSmallCode: start
>> with =1000, and increase it from there to see if it helps.
>>
>> --
>> Andrew Haley (he/him)
>> Java Platform Lead Engineer
>> Red Hat UK Ltd.
>> https://keybase.io/andrewhaley
> > EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

From vladimir.kozlov at oracle.com Fri Jul 31 16:45:23 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2020 09:45:23 -0700
Subject: RFR: 8250825: C2 crashes with assert(field != __null) failed: missing field
In-Reply-To:
References:
Message-ID: <83167893-0924-860c-b2eb-fce9348d16eb@oracle.com>

Good.

thanks,
Vladimir K

On 7/30/20 10:06 PM, jiefu(傅杰) wrote:
> Hi Vladimir K,
>
> Thanks for your review.
>
> The test had been extended here:
> - http://cr.openjdk.java.net/~jiefu/8250825/webrev.01/
>
> Before the patch:
> The unsafe access (put/get) to static field will crash.
> The unsafe access (put/get) to instance field is fine.
>
> After the patch:
> All is ok.
>
> Thanks a lot.
> Best regards, > Jie > > ?On 2020/7/31, 2:24 AM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote: > > Hi Jie > > Nodes generated by make_unsafe_address() are correct. The issue is that Unsafe API allows to genereate unaligned (to > fields) offset with arbitrary type. As result C2 type system can't find corresponding field. > > Did you tried to do unaligned unsafe access to instance fields? > Also try to unsafe set value (Store node). There is code in C2 which checks for narrow stores. Would be interesting how > it behave in unsafe case. > > Please, extend your test. > > Otherwise fix is good. > > Thanks, > Vladimir K > > On 7/30/20 6:09 AM, jiefu(??) wrote: > > Hi all, > > > > JBS: https://bugs.openjdk.java.net/browse/JDK-8250825 > > Webrev: http://cr.openjdk.java.net/~jiefu/8250825/webrev.00/ > > > > When C2 tries to inline an unsafe-access method, it may generate the following pattern in make_unsafe_address: > > ConP ConL > > \ | > > \ | > > AddP > > Current implementation of TypeOopPtr::TypeOopPtr(...) failed to recognize it as an unsafe operation, which leads to the crash. > > > > Testing: > > - tier1-3 on Linux/x64 > > > > Could you please review it and give me some advice? > > > > Thanks a lot. > > Best regards, > > Jie > > > > > From igor.ignatyev at oracle.com Fri Jul 31 17:11:39 2020 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 31 Jul 2020 10:11:39 -0700 Subject: RFR(M): 8067651: Fix Trivial code path for LevelTransitionTest.java In-Reply-To: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> References: <58fd3cd5-cdce-8e15-3237-d22a3566b0da@oracle.com> Message-ID: <970076A7-1F18-4E88-994F-802590AF4F9B@oracle.com> Hi Evgeny, in general looks good to me, a couple comments/questions though: - I don't see necessity of move Helper.* methods into the enclosing class, nor do I see it as improving readability of the test. why did you decide to move them? 
- if the test is inapplicable for Xcomp run, you should either throw SkippedException instead of System.err::println at L#67 or use '@requires vm.compMode != "Xcomp"' in jtreg test description. currently, the former provides arguable more clear message that the test wasn't run (as it sets special sub-status which is understood by our test execution system) than the latter (which will just omit test from test results altogether), however @requires is "faster" as jtreg don't need to run any of the test code. in any case, both makes it clean that the test wasn't really performed, while your code will lead to a passed-passed test w/o no automated way to know that the test wasn't run. - from you explanation of the fix it's also unclear why BackgroundCompilation got disabled, could you please explain? Thanks, -- Igor > On Jul 27, 2020, at 12:38 PM, Evgeny Nikitin wrote: > > Hi, > > Bug: https://bugs.openjdk.java.net/browse/JDK-8067651 > Webrev: https://cr.openjdk.java.net/~enikitin/8067651/webrev.00/ > > Adjusting the test to current state of the VM. > > - Definition of 'trivial code' does not depend on whether the method has been profiled or not; > - Trivial code does only go level 0 to level 1; > - Some refactoring. > > The change has been checked in mach5 for the 5 platforms (passed). > > Please review, > /Evgeny Nikitin. From coleen.phillimore at oracle.com Fri Jul 31 19:38:09 2020 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 31 Jul 2020 15:38:09 -0400 Subject: RFR (XXL): 8223347: Integration of Vector API (Incubator): General HotSpot changes In-Reply-To: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> References: <38a7fe74-0c5e-4a28-b128-24c40b8ea01e@oracle.com> Message-ID: <9c538834-903b-5431-bb43-908b58a1b70a@oracle.com> The runtime code still looks good to me. Coleen On 7/28/20 6:29 PM, Vladimir Ivanov wrote: > Hi, > > Thanks for the feedback on webrev.00, Remi, Coleen, Vladimir K., and > Ekaterina! 
> > Here are the latest changes for Vector API support in HotSpot shared > code: > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01 > > > Incremental changes (diff against webrev.00): > > http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.01_00 > > > I decided to post it here and not initiate a new round of reviews > because the changes are mostly limited to minor cleanups / simple bug > fixes. > > Detailed summary: > ? - rebased to jdk/jdk tip; > ? - got rid of NotV, VLShiftV, VRShiftV, VURShiftV nodes; > ? - restore lazy cleanup logic during incremental inlining (see > needs_cleanup in compile.cpp); > ? - got rid of x86-specific changes in shared code; > ? - fix for 8244867 [1]; > ? - fix Graal test failure: enumerate VectorSupport intrinsics in > CheckGraalIntrinsics > ? - numerous minor cleanups > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/panama/dev/rev/dcfc7b6e8977 > ??? http://jbs.oracle.com/browse/JDK-8244867 > ??? 8244867: 2 vector api tests crash with > assert(is_reference_type(basic_type())) failed: wrong type > Summary: Adding safety checks to prevent intrinsification if class > arguments of non-primitive types are uninitialized. > > On 04.04.2020 02:12, Vladimir Ivanov wrote: >> Hi, >> >> Following up on review requests of API [0] and Java implementation >> [1] for Vector API (JEP 338 [2]), here's a request for review of >> general HotSpot changes (in shared code) required for supporting the >> API: >> >> >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/all.00-03/ >> >> >> (First of all, to set proper expectations: since the JEP is still in >> Candidate state, the intention is to initiate preliminary round(s) of >> review to inform the community and gather feedback before sending out >> final/official RFRs once the JEP is Targeted to a release.) 
>> >> Vector API (being developed in Project Panama [3]) relies on JVM >> support to utilize optimal vector hardware instructions at runtime. >> It interacts with JVM through intrinsics (declared in >> jdk.internal.vm.vector.VectorSupport [4]) which expose vector >> operations support in C2 JIT-compiler. >> >> As Paul wrote earlier: "A vector intrinsic is an internal low-level >> vector operation. The last argument to the intrinsic is fall back >> behavior in Java, implementing the scalar operation over the number >> of elements held by the vector.? Thus, If the intrinsic is not >> supported in C2 for the other arguments then the Java implementation >> is executed (the Java implementation is always executed when running >> in the interpreter or for C1)." >> >> The rest of JVM support is about aggressively optimizing vector boxes >> to minimize (ideally eliminate) the overhead of boxing for vector >> values. >> It's a stop-the-gap solution for vector box elimination problem until >> inline classes arrive. Vector classes are value-based and in the >> longer term will be migrated to inline classes once the support >> becomes available. >> >> Vector API talk from JVMLS'18 [5] contains brief overview of JVM >> implementation and some details. >> >> Complete implementation resides in vector-unstable branch of >> panama/dev repository [6]. >> >> Now to gory details (the patch is split in multiple "sub-webrevs"): >> >> =========================================================== >> >> (1) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/00.backend.shared/ >> >> >> Ideal vector nodes for new operations introduced by Vector API. >> >> (Platform-specific back end support will be posted for review >> separately). 
>> >> =========================================================== >> >> (2) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/ >> >> >> JVM Java interface (VectorSupport) and intrinsic support in C2. >> >> Vector instances are initially represented as VectorBox macro nodes >> and "unboxing" is represented by VectorUnbox node. It simplifies >> vector box elimination analysis and the nodes are expanded later >> right before EA pass. >> >> Vectors have 2-level on-heap representation: for the vector value >> primitive array is used as a backing storage and it is encapsulated >> in a typed wrapper (e.g., Int256Vector - vector of 8 ints - contains >> a int[8] instance which is used to store vector value). >> >> Unless VectorBox node goes away, it needs to be expanded into an >> allocation eventually, but it is a pure node and doesn't have any JVM >> state associated with it. The problem is solved by keeping JVM state >> separately in a VectorBoxAllocate node associated with VectorBox node >> and use it during expansion. >> >> Also, to simplify vector box elimination, inlining of vector reboxing >> calls (VectorSupport::maybeRebox) is delayed until the analysis is over. >> >> =========================================================== >> >> (3) >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/02.vbox_elimination/ >> >> >> Vector box elimination analysis implementation. (Brief overview: >> slides #36-42 [5].) >> >> The main part is devoted to scalarization across safepoints and >> rematerialization support during deoptimization. In C2-generated code >> vector operations work with raw vector values which live in registers >> or spilled on the stack and it allows to avoid boxing/unboxing when a >> vector value is alive across a safepoint. 
As with other values,
>> there's just a location of the vector value at the safepoint and
>> vector type information recorded in the relevant nmethod metadata and
>> all the heavy-lifting happens only when rematerialization takes place.
>>
>> The analysis preserves object identity invariants except during
>> aggressive reboxing (guarded by -XX:+EnableAggressiveReboxing).
>>
>> (Aggressive reboxing is crucial for cases when vectors "escape": it
>> allocates a fresh instance at every escape point thus enabling
>> the original instance to go away.)
>>
>> ===========================================================
>>
>> (4)
>> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/03.module.hotspot/
>>
>>
>> HotSpot changes for jdk.incubator.vector module. Vector support is
>> marked experimental and turned off by default. JEP 338 proposes the
>> API to be released as an incubator module, so a user has to specify
>> "--add-modules jdk.incubator.vector" on the command line to be able to
>> use it.
>> When the user does that, the JVM automatically enables Vector API support.
>> It improves usability (the user doesn't need to separately "open" the API
>> and enable JVM support) while minimizing risks of destabilization
>> from new code when the API is not used.
>>
>>
>> That's it! Will be happy to answer any questions.
>>
>> And thanks in advance for any feedback!
>> >> Best regards, >> Vladimir Ivanov >> >> [0] >> https://mail.openjdk.java.net/pipermail/core-libs-dev/2020-March/065345.html >> >> >> [1] >> https://mail.openjdk.java.net/pipermail/hotspot-dev/2020-April/041228.html >> >> [2] https://openjdk.java.net/jeps/338 >> >> [3] https://openjdk.java.net/projects/panama/ >> >> [4] >> http://cr.openjdk.java.net/~vlivanov/panama/vector/jep338/hotspot.shared/webrev.00/01.intrinsics/src/java.base/share/classes/jdk/internal/vm/vector/VectorSupport.java.html >> >> >> [5] http://cr.openjdk.java.net/~vlivanov/talks/2018_JVMLS_VectorAPI.pdf >> >> [6] http://hg.openjdk.java.net/panama/dev/shortlog/92bbd44386e9 >> >> ???? $ hg clone http://hg.openjdk.java.net/panama/dev/ -b >> vector-unstable From luhenry at microsoft.com Fri Jul 31 21:27:25 2020 From: luhenry at microsoft.com (Ludovic Henry) Date: Fri, 31 Jul 2020 21:27:25 +0000 Subject: RFR[M]: Adding MD5 Intrinsic on x86-64 In-Reply-To: References: , Message-ID: Hi Vivek, Thank you for your review. > You have not added the stub generation for 32 bit. > Did you also test with a 32 bit build? I've added and tested it. Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.01 -- Ludovic ________________________________________ From: Vivek Deshpande Sent: Thursday, July 30, 2020 9:17:21 PM To: Ludovic Henry Cc: Dean Long ; Vladimir Ivanov ; mailto:hotspot-compiler-dev at openjdk.java.net Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64 ? Hi?Ludovic Your patch looks good?to me. Good reuse of existing code for SHA. You have not added the stub generation for 32 bit. Did you also test with a 32 bit build? Thank you. 
Regards,
Vivek

On Thu, Jul 30, 2020 at 6:26 PM Ludovic Henry wrote:

JBS: I just got authorship status and I'll create a bug as soon as I have access to JBS
Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.00/

The problem ended up not being with how `ofs` was incremented, but with a callee-saved register not being restored properly before returning from the intrinsic.

The performance results from running with JMH are very encouraging. I ran the `org.openjdk.bench.java.security.MessageDigests` with MD5 only enabled, and following are the results with and without the intrinsic.

-XX:-UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  3459.747 ± 10.508  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   446.407 ±  3.383  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    30.685 ±  0.676  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.483 ±  0.004  ops/ms

-XX:+UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  4011.556 ± 10.212  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   526.873 ±  2.101  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    35.012 ±  0.088  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.573 ±  0.002  ops/ms

That's overall a jump from ~483MB/s to ~573MB/s on the 1M chunks, or a ~19% speedup.
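
As a quick check of the figures quoted above, the following sketch converts the JMH scores for the 1048576-byte case from ops/ms into throughput and computes the relative speedup. It is an illustration only: the class and method names are hypothetical and not part of the patch, and it assumes every operation digests exactly 1048576 bytes, so the result is strictly in MiB/s (the message rounds this to MB/s).

```java
// Illustrative arithmetic only; all names here are hypothetical and do not
// come from the webrev or from JMH itself.
final class Md5ThroughputCheck {

    // Converts a JMH throughput score (operations per millisecond) into MiB/s,
    // assuming every operation processes bytesPerOp bytes.
    static double mibPerSecond(double opsPerMs, long bytesPerOp) {
        double opsPerSecond = opsPerMs * 1000.0;
        return opsPerSecond * bytesPerOp / (1024.0 * 1024.0);
    }

    // Relative improvement of 'improved' over 'baseline', in percent.
    static double speedupPercent(double baseline, double improved) {
        return (improved / baseline - 1.0) * 100.0;
    }

    public static void main(String[] args) {
        long chunkBytes = 1_048_576L;      // the (length) = 1048576 rows
        double withoutIntrinsic = 0.483;   // ops/ms with -XX:-UseMD5Intrinsics
        double withIntrinsic = 0.573;      // ops/ms with -XX:+UseMD5Intrinsics

        System.out.printf("without intrinsic: %.1f MiB/s%n",
                mibPerSecond(withoutIntrinsic, chunkBytes));
        System.out.printf("with intrinsic:    %.1f MiB/s%n",
                mibPerSecond(withIntrinsic, chunkBytes));
        System.out.printf("speedup:           %.1f%%%n",
                speedupPercent(withoutIntrinsic, withIntrinsic));
    }
}
```

The same two helpers can be applied to the smaller lengths in the tables; the relative gain there can be computed the same way from the Score column.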
Thank you,
Ludovic

--
Thanks and Regards,
Vivek Deshpande
mailto:viv.desh at gmail.com

From luhenry at microsoft.com Fri Jul 31 22:05:40 2020
From: luhenry at microsoft.com (Ludovic Henry)
Date: Fri, 31 Jul 2020 22:05:40 +0000
Subject: RFR[M]: Adding MD5 Intrinsic on x86-64
In-Reply-To:
References: ,
Message-ID:

I've just created the bug on JBS.

JBS: https://bugs.openjdk.java.net/browse/JDK-8250902
Webrev: http://cr.openjdk.java.net/~luhenry/8250902/webrev.01/

-----Original Message-----
From: hotspot-compiler-dev On Behalf Of Ludovic Henry
Sent: Friday, July 31, 2020 2:27 PM
To: Vivek Deshpande
Cc: Dean Long ; Vladimir Ivanov ; hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64

Hi Vivek,

Thank you for your review.

> You have not added the stub generation for 32 bit.
> Did you also test with a 32 bit build?

I've added and tested it.

Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.01

--
Ludovic

________________________________________
From: Vivek Deshpande
Sent: Thursday, July 30, 2020 9:17:21 PM
To: Ludovic Henry
Cc: Dean Long ; Vladimir Ivanov ; mailto:hotspot-compiler-dev at openjdk.java.net
Subject: Re: RFR[M]: Adding MD5 Intrinsic on x86-64

Hi Ludovic

Your patch looks good to me. Good reuse of existing code for SHA.
You have not added the stub generation for 32 bit.
Did you also test with a 32 bit build?
Thank you.
Regards,
Vivek

On Thu, Jul 30, 2020 at 6:26 PM Ludovic Henry wrote:

JBS: I just got authorship status and I'll create a bug as soon as I have access to JBS
Webrev: http://cr.openjdk.java.net/~luhenry/md5-intrinsics/webrev.00/

The problem ended up not being with how `ofs` was incremented, but with a callee-saved register not being restored properly before returning from the intrinsic.

The performance results from running with JMH are very encouraging. I ran the `org.openjdk.bench.java.security.MessageDigests` with MD5 only enabled, and following are the results with and without the intrinsic.

-XX:-UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  3459.747 ± 10.508  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   446.407 ±  3.383  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    30.685 ±  0.676  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.483 ±  0.004  ops/ms

-XX:+UseMD5Intrinsics
Benchmark              (digesterName)  (length)  (provider)   Mode  Cnt     Score    Error   Units
MessageDigests.digest             md5        64     DEFAULT  thrpt   10  4011.556 ± 10.212  ops/ms
MessageDigests.digest             md5      1024     DEFAULT  thrpt   10   526.873 ±  2.101  ops/ms
MessageDigests.digest             md5     16384     DEFAULT  thrpt   10    35.012 ±  0.088  ops/ms
MessageDigests.digest             md5   1048576     DEFAULT  thrpt   10     0.573 ±  0.002  ops/ms

That's overall a jump from ~483MB/s to ~573MB/s on the 1M chunks, or a ~19% speedup.
Thank you, Ludovic -- Thanks and Regards, Vivek Deshpande mailto:viv.desh at gmail.com From vladimir.x.ivanov at oracle.com Fri Jul 31 23:19:15 2020 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Sat, 1 Aug 2020 02:19:15 +0300 Subject: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 In-Reply-To: References: <92d97d1b-fc53-e368-b249-1cab7db33964@oracle.com> <5f6a3e52-7854-4613-43f1-32a7423a0db6@oracle.com> Message-ID: <8265e303-0f86-b308-be79-740d6b4710f2@oracle.com> > http://cr.openjdk.java.net/~jbhateja/8248830/webrev.05/ Looks good. Tier5 (where I saw the crashes) passed. Please, incorporate the following minor cleanups in the final version: http://cr.openjdk.java.net/~vlivanov/jbhateja/8248830/webrev.05.cleanup/ (Tested with hs-tier1,hs-tier2.) Best regards, Vladimir Ivanov >> -----Original Message----- >> From: Vladimir Ivanov >> Sent: Thursday, July 30, 2020 3:30 AM >> To: Bhateja, Jatin >> Cc: Viswanathan, Sandhya ; hotspot-compiler- >> dev at openjdk.java.net >> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for X86 >> >> >>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.04/ >>> >>> Looks good. (Testing is in progress.) >> >> FYI test results are clean (tier1-tier5). >> >>>> I have removed RotateLeftNode/RotateRightNode::Ideal routines since >>>> we are anyways doing constant folding in LShiftI/URShiftI value >>>> routines. Since JAVA rotate APIs are no longer intrincified hence >>>> these routines may no longer be useful. >>> >>> Nice observation! Good. >> >> As a second thought, it seems there's still a chance left that Rotate nodes >> get their input type narrowed after the folding happened. For example, as a >> result of incremental inlining or CFG transformations during loop >> optimizations. And it does happen in practice since the testing revealed >> some crashes due to the bug in RotateLeftNode/RotateRightNode::Ideal(). >> >> So, it makes sense to keep the transformations. 
But I'm fine with >> addressing that as a followup enhancement. >> >> Best regards, >> Vladimir Ivanov >> >>> >>>>> It would be really nice to migrate to MacroAssembler along the way >>>>> (as a cleanup). >>>> >>>> I guess you are saying remove opcodes/encoding from patterns and move >>>> then to Assembler, Can we take this cleanup activity separately since >>>> other patterns are also using these matcher directives. >>> >>> I'm perfectly fine with handling it as a separate enhancement. >>> >>>> Other synthetic comments have been taken care of. I have extended the >>>> Test to cover all the newly added scalar transforms. Kindly let me >>>> know if there other comments. >>> >>> Nice! >>> >>> Best regards, >>> Vladimir Ivanov >>> >>>>> -----Original Message----- >>>>> From: Vladimir Ivanov >>>>> Sent: Friday, July 24, 2020 3:21 AM >>>>> To: Bhateja, Jatin >>>>> Cc: Viswanathan, Sandhya ; Andrew >>>>> Haley ; hotspot-compiler-dev at openjdk.java.net >>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification for >>>>> X86 >>>>> >>>>> Hi Jatin, >>>>> >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev.03/ >>>>> >>>>> Much better! Thanks. >>>>> >>>>>> Change Summary: >>>>>> >>>>>> 1) Unified the handling for scalar rotate operation. All scalar >>>>>> rotate >>>>> selection patterns are now dependent on newly created >>>>> RotateLeft/RotateRight nodes. This promotes rotate inferencing. >>>>> Currently >>>>> if DAG nodes corresponding to a sub-pattern are shared (have >>>>> multiple >>>>> users) then existing complex patterns based on Or/LShiftL/URShift >>>>> does not get matched and this prevents inferring rotate nodes. >>>>> Please refer to JIT'ed assembly output with baseline[1] and with >>>>> patch[2] . We can see that generated code size also went done from >>>>> 832 byte to 768 bytes. Also this can cause perf degradation if >>>>> shift-or dependency chain appears inside a hot region. 
>>>>>> >>>>>> 2) Due to enhanced rotate inferencing new patch shows better >>>>>> performance >>>>> even for legacy targets (non AVX-512). Please refer to the perf >>>>> result[3] over AVX2 machine for JMH benchmark part of the patch. >>>>> >>>>> Very nice! >>>>>> 3) As suggested, removed Java API intrinsification changes and >>>>>> scalar >>>>> rotate transformation are done during OrI/OrL node idealizations. >>>>> >>>>> Good. >>>>> >>>>> (Still would be nice to factor the matching code from Ideal() and >>>>> share it between multiple use sites. Especially considering >>>>> OrVNode::Ideal() now does basically the same thing. As an >>>>> example/idea, take a look at >>>>> is_bmi_pattern() in x86.ad.) >>>>> >>>>>> 4) SLP always gets to work on new scalar Rotate nodes and creates >>>>>> vector >>>>> rotate nodes which are degenerated into OrV/LShiftV/URShiftV nodes >>>>> if target does not supports vector rotates(non-AVX512). >>>>> >>>>> Good. >>>>> >>>>>> 5) Added new instruction patterns for vector shift Left/Right >>>>>> operations >>>>> with constant shift operands. This prevents emitting extra moves to >> XMM. >>>>> >>>>> +instruct vshiftI_imm(vec dst, vec src, immI8 shift) %{ >>>>> +? match(Set dst (LShiftVI src shift)); >>>>> >>>>> I'd prefer to see a uniform Ideal IR shape being used irrespective >>>>> of whether the argument is a constant or not. It should also >>>>> simplify the logic in SuperWord and make it easier to support on >>>>> non-x86 architectures. >>>>> >>>>> For example, here's how it is done on AArch64: >>>>> >>>>> instruct vsll4I_imm(vecX dst, vecX src, immI shift) %{ >>>>> ??? predicate(n->as_Vector()->length() == 4); >>>>> ??? match(Set dst (LShiftVI src (LShiftCntV shift))); ... >>>>> >>>>>> 6) Constant folding scenarios are covered in RotateLeft/RotateRight >>>>> idealization, inferencing of vector rotate through OrV idealization >>>>> covers the vector patterns generated though non SLP route i.e. >>>>> VectorAPI. 
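
For readers following the thread, the scalar idiom that the OrI/OrL idealization is meant to recognize can be written in plain Java as an OR of a left shift and an unsigned right shift by the negated count. The sketch below only demonstrates that this hand-written form computes the same value as Integer.rotateLeft, Integer.rotateRight and Long.rotateLeft. The class and method names are hypothetical, and the example says nothing about which IR nodes or machine instructions a particular JDK build actually emits for it.

```java
// A minimal sketch; names are hypothetical and not taken from the webrev.
final class RotateIdiomSketch {

    // Rotate left written as shift-or. Java masks int shift counts to the
    // low 5 bits, so (x >>> -s) behaves like (x >>> (32 - s)) for s in 1..31
    // and like (x >>> 0) for s == 0.
    static int rotlShiftOr(int x, int s) {
        return (x << s) | (x >>> -s);
    }

    // Rotate right written as shift-or, the mirror image of the above.
    static int rotrShiftOr(int x, int s) {
        return (x >>> s) | (x << -s);
    }

    // Long variant; shift counts are masked to the low 6 bits.
    static long rotlShiftOr(long x, int s) {
        return (x << s) | (x >>> -s);
    }

    public static void main(String[] args) {
        int x = 0x12345678;
        boolean same = true;
        for (int s = 0; s < 64; s++) {
            same &= rotlShiftOr(x, s) == Integer.rotateLeft(x, s);
            same &= rotrShiftOr(x, s) == Integer.rotateRight(x, s);
            same &= rotlShiftOr((long) x, s) == Long.rotateLeft((long) x, s);
        }
        System.out.println("shift-or idiom matches the rotate methods: " + same);
    }
}
```

Whether such an expression ends up as a single rotate instruction depends on the JIT and the target CPU, which is exactly what the review above is discussing.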
>>>>> >>>>> I'm fine with keeping OrV::Ideal(), but I'm concerned with the >>>>> general direction here - duplication of scalar transformations to >>>>> lane-wise vector operations. It definitely won't scale and in a >>>>> longer run it risks to diverge. Would be nice to find a way to >>>>> automatically "lift" >>>>> scalar transformations to vectors and apply them uniformly. But >>>>> right now it is just an idea which requires more experimentation. >>>>> >>>>> >>>>> Some other minor comments/suggestions: >>>>> >>>>> +? // Swap the computed left and right shift counts. >>>>> +? if (is_rotate_left) { >>>>> +??? Node* temp = shiftRCnt; >>>>> +??? shiftRCnt? = shiftLCnt; >>>>> +??? shiftLCnt? = temp; >>>>> +? } >>>>> >>>>> Maybe use swap() here (declared in globalDefinitions.hpp)? >>>>> >>>>> >>>>> +? if (Matcher::match_rule_supported_vector(vopc, vlen, bt)) >>>>> +??? return true; >>>>> >>>>> Please, don't omit curly braces (even for simple cases). >>>>> >>>>> >>>>> -// Rotate Right by variable >>>>> -instruct rorI_rReg_Var_C0(no_rcx_RegI dst, rcx_RegI shift, immI0 >>>>> zero, rFlagsReg cr) >>>>> +instruct rorI_immI8_legacy(rRegI dst, immI8 shift, rFlagsReg cr) >>>>> ?? %{ >>>>> -? match(Set dst (OrI (URShiftI dst shift) (LShiftI dst (SubI zero >>>>> shift)))); >>>>> - >>>>> +? predicate(!VM_Version::supports_bmi2() && >>>>> n->bottom_type()->basic_type() == T_INT); >>>>> +? match(Set dst (RotateRight dst shift)); >>>>> +? format %{ "rorl???? $dst, $shift" %} >>>>> ???? expand %{ >>>>> -??? rorI_rReg_CL(dst, shift, cr); >>>>> +??? rorI_rReg_imm8(dst, shift, cr); >>>>> ???? %} >>>>> >>>>> It would be really nice to migrate to MacroAssembler along the way >>>>> (as a cleanup). >>>>> >>>>>> Please push the patch through your testing framework and let me >>>>>> know your >>>>> review feedback. >>>>> >>>>> There's one new assertion failure: >>>>> >>>>> #? Internal Error (.../src/hotspot/share/opto/phaseX.cpp:1238), >>>>> pid=5476, tid=6219 >>>>> #? 
assert((i->_idx >= k->_idx) || i->is_top()) failed: Idealize >>>>> should return new nodes, use Identity to return old nodes >>>>> >>>>> I believe it comes from RotateLeftNode::Ideal/RotateRightNode::Ideal >>>>> which can return pre-contructed constants. I suggest to get rid of >>>>> Ideal() methods and move constant folding logic into Node::Value() >>>>> (as implemented for other bitwise/arithmethic nodes in >>>>> addnode.cpp/subnode.cpp/mulnode.cpp et al). It's a more generic >>>>> approach since it enables richer type information (ranges vs >>>>> constants) and IMO it's more convenient to work with constants >>>>> through Types than ConNodes. >>>>> >>>>> (I suspect that original/expanded IR shape may already provide more >>>>> precise type info for non-constant case which can affect the >>>>> benchmarks.) >>>>> >>>>> Best regards, >>>>> Vladimir Ivanov >>>>> >>>>>> >>>>>> Best Regards, >>>>>> Jatin >>>>>> >>>>>> [1] >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_baseline_avx2_asm. >>>>>> txt [2] >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_new_patch_avx2_ >>>>>> asm >>>>>> .txt [3] >>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/rotate_perf_avx2_new_p >>>>>> atc >>>>>> h.txt >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Vladimir Ivanov >>>>>>> Sent: Saturday, July 18, 2020 12:25 AM >>>>>>> To: Bhateja, Jatin ; Andrew Haley >>>>>>> >>>>>>> Cc: Viswanathan, Sandhya ; >>>>>>> hotspot-compiler- dev at openjdk.java.net >>>>>>> Subject: Re: RFR[S] : 8248830 : C2 : Rotate API intrinsification >>>>>>> for >>>>>>> X86 >>>>>>> >>>>>>> Hi Jatin, >>>>>>> >>>>>>>> http://cr.openjdk.java.net/~jbhateja/8248830/webrev_02/ >>>>>>> >>>>>>> It definitely looks better, but IMO it hasn't reached the sweet >>>>>>> spot >>>>> yet. >>>>>>> It feels like the focus is on auto-vectorizer while the burden is >>>>>>> put on scalar cases. 
>>>>>>> >>>>>>> First of all, considering GVN folds relevant operation patterns >>>>>>> into a single Rotate node now, what's the motivation to introduce >>>>>>> intrinsics? >>>>>>> >>>>>>> Another point is there's still significant duplication for scalar >>>>>>> cases. >>>>>>> >>>>>>> I'd prefer to see the legacy cases which rely on pattern matching >>>>>>> to go away and be substituted with instructions which match Rotate >>>>>>> instructions (migrating ). >>>>>>> >>>>>>> I understand that it will penalize the vectorization >>>>>>> implementation, but IMO reducing overall complexity is worth it. >>>>>>> On auto-vectorizer side, I see >>>>>>> 2 ways to fix it: >>>>>>> >>>>>>> ???? (1) introduce additional AD instructions for >>>>>>> RotateLeftV/RotateRightV specifically for pre-AVX512 hardware; >>>>>>> >>>>>>> ???? (2) in SuperWord::output(), when matcher doesn't support >>>>>>> RotateLeftV/RotateLeftV nodes (Matcher::match_rule_supported()), >>>>>>> generate vectorized version of the original pattern. >>>>>>> >>>>>>> Overall, it looks like more and more focus is made on scalar part. >>>>>>> Considering the main goal of the patch is to enable vectorization, >>>>>>> I'm fine with separating cleanup of scalar part. As an interim >>>>>>> solution, it seems that leaving the scalar part as it is now and >>>>>>> matching scalar bit rotate pattern in VectorNode::is_rotate() >>>>>>> should be enough to keep the vectorization part functioning. Then >>>>>>> scalar Rotate nodes and relevant cleanups can be integrated later. >>>>>>> (Or vice >>>>>>> versa: clean up scalar part first and then follow up with >>>>>>> vectorization.) >>>>>>> >>>>>>> Some other comments: >>>>>>> >>>>>>> * There's a lot of duplication between OrINode::Ideal and >>>>> OrLNode::Ideal. >>>>>>> What do you think about introducing a super type >>>>>>> (OrNode) and put a unified version (OrNode::Ideal) there? 
>>>>>>> >>>>>>> >>>>>>> * src/hotspot/cpu/x86/x86.ad >>>>>>> >>>>>>> +instruct vprotate_immI8(vec dst, vec src, immI8 shift) %{ >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() == >>>>>>> T_INT >>>>> || >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() == >>>>>>> +T_LONG); >>>>>>> >>>>>>> +instruct vprorate(vec dst, vec src, vec shift) %{ >>>>>>> +? predicate(n->bottom_type()->is_vect()->element_basic_type() == >>>>>>> T_INT >>>>> || >>>>>>> +??????????? n->bottom_type()->is_vect()->element_basic_type() == >>>>>>> +T_LONG); >>>>>>> >>>>>>> The predicates are redundant here. >>>>>>> >>>>>>> >>>>>>> * src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp >>>>>>> >>>>>>> +void C2_MacroAssembler::vprotate_imm(int opcode, BasicType etype, >>>>>>> XMMRegister dst, XMMRegister src, >>>>>>> +???????????????????????????????????? int shift, int vector_len) { >>>>>>> +if (opcode == Op_RotateLeftV) { >>>>>>> +??? if (etype == T_INT) { >>>>>>> +????? evprold(dst, src, shift, vector_len); >>>>>>> +??? } else { >>>>>>> +????? evprolq(dst, src, shift, vector_len); >>>>>>> +??? } >>>>>>> >>>>>>> Please, put an assert for the false case (assert(etype == T_LONG, >>>>> "...")). >>>>>>> >>>>>>> >>>>>>> * On testing (with previous version of the patch): -XX:UseAVX is >>>>>>> x86- specific flag, so new/adjusted tests now fail on non-x86 >> platforms. >>>>>>> Either omitting the flag or adding >>>>>>> -XX:+IgnoreUnrecognizedVMOptions will solve the issue. >>>>>>> >>>>>>> Best regards, >>>>>>> Vladimir Ivanov >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Summary of changes: >>>>>>>> 1) Optimization is specifically targeted to exploit vector >>>>>>>> rotation >>>>>>> instruction added for X86 AVX512. A single rotate instruction >>>>>>> encapsulates entire vector OR/SHIFTs pattern thus offers better >>>>>>> latency at reduced instruction count. >>>>>>>> >>>>>>>> 2) There were two approaches to implement this: >>>>>>>> ?????? a)? 
Let everything remain the same and add new wide >>>>>>>> complex >>>>>>> instruction patterns in the matcher for e.g. >>>>>>>> ??????????? set Dst ( OrV (Binary (LShiftVI dst (Binary >>>>>>>> ReplicateI >>>>>>>> shift)) >>>>>>> (URShiftVI dst (Binary (SubI (Binary ReplicateI 32) ( Replicate >>>>>>> shift)) >>>>>>>> ?????? It would have been an overoptimistic assumption to expect >>>>>>>> that graph >>>>>>> shape would be preserved till the matcher for correct inferencing. >>>>>>>> ?????? In addition we would have required multiple such bulky >>>>>>>> patterns. >>>>>>>> ?????? b) Create new RotateLeft/RotateRight scalar nodes, these >>>>>>>> gets >>>>>>> generated during intrinsification as well as during additional >>>>>>> pattern >>>>>>>> ?????? matching during node Idealization, later on these nodes >>>>>>>> are consumed >>>>>>> by SLP for valid vectorization scenarios to emit their vector >>>>>>>> ?????? counterparts which eventually emits vector rotates. >>>>>>>> >>>>>>>> 3) I choose approach 2b) since its cleaner, only problem here was >>>>>>>> that in non-evex mode (UseAVX < 3) new scalar Rotate nodes should >>>>>>>> either be >>>>>>> dismantled back to OR/SHIFT pattern or we penalize the >>>>>>> vectorization which would be very costly, other option would have >>>>>>> been to add additional vector rotate pattern for UseAVX=3 in the >>>>>>> matcher which emit vector OR-SHIFTs instruction but then it will >>>>>>> loose on emitting efficient instruction sequence which node >>>>>>> sharing >>>>>>> (OrV/LShiftV/URShift) offer in current implementation - thus it >>>>>>> will not be beneficial for non-AVX512 targets, only saving will be >>>>>>> in terms of cleanup of few existing scalar rotate matcher >>>>>>> patterns, also old targets does not offer this powerful rotate >> instruction. >>>>>>> Therefore new scalar nodes are created only for AVX512 targets. 
>>>>>>>>
>>>>>>>> As per suggestions, constant folding scenarios have been covered during
>>>>>>>> Idealizations of newly added scalar nodes.
>>>>>>>>
>>>>>>>> Please review the latest version and share your feedback and test results.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Jatin
>>>>>>>>
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Andrew Haley
>>>>>>>>> Sent: Saturday, July 11, 2020 2:24 PM
>>>>>>>>> To: Vladimir Ivanov ; Bhateja, Jatin ;
>>>>>>>>> hotspot-compiler-dev at openjdk.java.net
>>>>>>>>> Cc: Viswanathan, Sandhya
>>>>>>>>> Subject: Re: 8248830 : RFR[S] : C2 : Rotate API intrinsification for X86
>>>>>>>>>
>>>>>>>>> On 10/07/2020 18:32, Vladimir Ivanov wrote:
>>>>>>>>>
>>>>>>>>>> High-level comment: so far, there was no pressing need in
>>>>>>>>>> explicitly marking the methods as intrinsics. ROR/ROL instructions
>>>>>>>>>> were selected during matching [1]. Now the patch introduces
>>>>>>>>>> dedicated nodes (RotateLeft/RotateRight) specifically for intrinsics
>>>>>>>>>> which partly duplicates existing logic.
>>>>>>>>>
>>>>>>>>> The lack of rotate nodes in the IR has always meant that AArch64
>>>>>>>>> doesn't generate optimal code for e.g.
>>>>>>>>>
>>>>>>>>>       (Set dst (XorL reg1 (RotateLeftL reg2 imm)))
>>>>>>>>>
>>>>>>>>> because, with the RotateLeft expanded to its full combination of
>>>>>>>>> ORs and shifts, it's too complicated to match. At the time I put
>>>>>>>>> this to one side because it wasn't urgent. This is a shame
>>>>>>>>> because although such combinations are unusual they are used in
>>>>>>>>> some crypto operations.
>>>>>>>>>
>>>>>>>>> If we can generate immediate-form rotate nodes early by pattern
>>>>>>>>> matching during parsing (rather than depending on intrinsics)
>>>>>>>>> we'll get more value than by depending on programmers calling intrinsics.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Andrew Haley
(he/him)
>>>>>>>>> Java Platform Lead Engineer
>>>>>>>>> Red Hat UK Ltd.
>>>>>>>>> https://keybase.io/andrewhaley
>>>>>>>>> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>>>>>>>>

From vladimir.kozlov at oracle.com  Fri Jul 31 23:54:00 2020
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Fri, 31 Jul 2020 16:54:00 -0700
Subject: 8250825: C2 crashes with assert(field != __null) failed: missing field(Internet mail)
In-Reply-To: <11584C93-EDD5-42A9-A2CD-0738970F3181@tencent.com>
References: <11584C93-EDD5-42A9-A2CD-0738970F3181@tencent.com>
Message-ID: <40d947f8-ebdb-0850-274b-583be9a37aa3@oracle.com>

Yes, it is good.

Thanks,
Vladimir

On 7/31/20 4:43 PM, jiefu(??) wrote:
> Hi Vladimir K,
>
> The latest version for the test case is here: http://cr.openjdk.java.net/~jiefu/8250825/webrev.02/
> Compared with webrev.01, the changes are:
> - Rename the test to TestMisalignedUnsafeAccess.java
> - Add @summary tag
> - Remove Xbatch
> - Remove initUnsafe
>
> Are you still OK with it?
>
> Thanks.
> Best regards,
> Jie
>
> On 2020/8/1, 12:46 AM, "Vladimir Kozlov" wrote:
>
> Good.
>
> thanks,
> Vladimir K
>
> On 7/30/20 10:06 PM, jiefu(??) wrote:
> > Hi Vladimir K,
> >
> > Thanks for your review.
> >
> > The test had been extended here:
> > - http://cr.openjdk.java.net/~jiefu/8250825/webrev.01/
> >
> > Before the patch:
> > The unsafe access (put/get) to static field will crash.
> > The unsafe access (put/get) to instance field is fine.
> >
> > After the patch:
> > All is ok.
> >
> > Thanks a lot.
> > Best regards,
> > Jie
> >
> > On 2020/7/31, 2:24 AM, "hotspot-compiler-dev on behalf of Vladimir Kozlov" wrote:
> >
> > Hi Jie
> >
> > Nodes generated by make_unsafe_address() are correct. The issue is that the Unsafe API allows generating an unaligned (to
> > fields) offset with arbitrary type. As a result the C2 type system can't find the corresponding field.
> >
> > Did you try to do unaligned unsafe access to instance fields?
> > Also try to unsafe set value (Store node).
There is code in C2 which checks for narrow stores. It would be interesting to see how it behaves in the unsafe case.
> >
> > Please, extend your test.
> >
> > Otherwise the fix is good.
> >
> > Thanks,
> > Vladimir K
> >
> > On 7/30/20 6:09 AM, jiefu(??) wrote:
> > > Hi all,
> > >
> > > JBS: https://bugs.openjdk.java.net/browse/JDK-8250825
> > > Webrev: http://cr.openjdk.java.net/~jiefu/8250825/webrev.00/
> > >
> > > When C2 tries to inline an unsafe-access method, it may generate the following pattern in make_unsafe_address:
> > >     ConP   ConL
> > >       \     |
> > >        \    |
> > >         AddP
> > > The current implementation of TypeOopPtr::TypeOopPtr(...) failed to recognize it as an unsafe operation, which leads to the crash.
> > >
> > > Testing:
> > > - tier1-3 on Linux/x64
> > >
> > > Could you please review it and give me some advice?
> > >
> > > Thanks a lot.
> > > Best regards,
> > > Jie
> > >
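
As a rough illustration of the "missing field" situation discussed in this thread, the following toy Java model looks fields up by their exact starting offset. It is not HotSpot code and does not use Unsafe; the class FieldLookupSketch, the method fieldAt and the offsets 16/20/24 are all hypothetical, and the exact-offset lookup is an assumption made only to show why an offset that falls inside a field rather than at its start finds nothing.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldLookupSketch {
    // Hypothetical layout: field name keyed by its starting byte offset.
    // These offsets are invented for the example, not real HotSpot offsets.
    private static final Map<Long, String> FIELDS = new LinkedHashMap<>();
    static {
        FIELDS.put(16L, "intField");   // occupies bytes [16, 20)
        FIELDS.put(20L, "shortField"); // occupies bytes [20, 22)
        FIELDS.put(24L, "longField");  // occupies bytes [24, 32)
    }

    /** Exact-offset lookup; returns null when no field starts at the offset. */
    static String fieldAt(long offset) {
        return FIELDS.get(offset);
    }

    public static void main(String[] args) {
        // An aligned offset names a field.
        System.out.println("offset 16 -> " + fieldAt(16L));
        // A misaligned offset (one byte into intField) names no field,
        // which is the case a lookup has to tolerate instead of asserting.
        System.out.println("offset 17 -> " + fieldAt(17L));
    }
}
```

In this model a caller that requires a non-null result for every offset would fail on offset 17, while one that treats a null result as "unknown field" carries on.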