From matthias.baesken at sap.com Mon Jul 1 06:52:18 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Mon, 1 Jul 2019 06:52:18 +0000 Subject: RFR [XS] : 8226943: compile error in libfollowref003.cpp with XCode 10.2 on macosx In-Reply-To: References: <770df6b7-7588-1030-f13e-e1500a63231e@oracle.com> Message-ID: Hello, thanks for the review ! > I'd suggest to fix it in 13 as it is the test fix. I'll push it then to 13 , fine with me ! Best regards, Matthias > > Hi Matthias, > > The fix is good. > It worked before because both JVMTI_REFERENCE_ARRAY_ELEMENT > and JVMTI_HEAP_REFERENCE_ARRAY_ELEMENT have the same value 3 > as Gary noticed. > > I'd suggest to fix it in 13 as it is the test fix. > I've added labels 'testbug' and 'noreg-self'. > > Thanks, > Serguei > > On 6/28/19 12:04 PM, David Holmes wrote: > > Hi Matthias, > > > > Dropped build-dev and added serviceability-dev as this is a > > servicability test. > > > > On 28/06/2019 7:43 am, Baesken, Matthias wrote: > >> Hello please review this? small fix for a compile issue? on OSX . > >> Today I? compiled?? jdk/jdk?? on a machine? with?? XCode 10.2? . It > >> worked pretty well . > >> However this small issue showed up . > >> > >> > >> In file included from > >> > /open_jdk/jdk_just_clone/jdk/test/hotspot/jtreg/vmTestbase/nsk/jvmti/u > nit/FollowReferences/followref003/libfollowref003.cpp:33: > >> > /open_jdk/jdk_just_clone/jdk/test/hotspot/jtreg/vmTestbase/nsk/jvmti/u > nit/FollowReferences/followref003/followref003.cpp:813:14: > >> error: > >> comparison of two values with different enumeration types in switch > >> statement ('jvmtiHeapReferenceKind' and 'jvmtiObjectReferenceKind') > >> [-Werror,-Wenum-compare-switch] > >> > >> > >> And here XCode 10 is correct , JVMTI_REFERENCE_ARRAY_ELEMENT?? is > >> from a different? enumeration type? and should be replaced? with the > >> value? from the correct enumeration type?? . > >> > >> Bug / webrev : > >> > >> https://bugs.openjdk.java.net/browse/JDK-8226943 > >> > >> http://cr.openjdk.java.net/~mbaesken/webrevs/8226943.0/ > > > > The fix seems reasonable but the issue indicates a further problem > > with the test. If it expected JVMTI_HEAP_REFERENCE_ARRAY_ELEMENT > but > > was checking for JVMTI_REFERENCE_ARRAY_ELEMENT then we should > have hit > > the default clause and failed the test. That suggests the test doesn't > > actually expect JVMTI_HEAP_REFERENCE_ARRAY_ELEMENT in the first > place. > > > > Cheers, > > David > > > >> > >> Thanks, Matthias > >> From martin.doerr at sap.com Mon Jul 1 10:06:05 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 1 Jul 2019 10:06:05 +0000 Subject: RFR(m): 8220351: Cross-modifying code In-Reply-To: <8736k5339e.fsf@oldenburg2.str.redhat.com> References: <4021e6fe-a2e7-7e66-bd54-1ea9d80863ae@oracle.com> <87fto536lo.fsf@oldenburg2.str.redhat.com> <8736k5339e.fsf@oldenburg2.str.redhat.com> Message-ID: Hi Florian, sorry for breaking 32 bit linux. We don't build on this platform so we didn't notice. I believe "Compiler version last used for testing: gcc 4.8.2" is still correct for 64 bit linux. Andrew's proposal looks reasonable. Best regards, Martin > -----Original Message----- > From: Florian Weimer > Sent: Mittwoch, 19. Juni 2019 19:16 > To: Doerr, Martin > Cc: hotspot-dev at openjdk.java.net; aph at redhat.com > Subject: Re: RFR(m): 8220351: Cross-modifying code > > * Florian Weimer: > > > * Martin Doerr: > > > >> Not sure if the inline assembler code on x86 necessarily needs a "clobber > memory" effect. > >> I don't know what a C++ compiler is allowed to do if it doesn't know that > the code has some kind of memory effect. > >> > >> For ebx...edx, you could also use clobber if you want to make it shorter. > >> E.g. with "+a" to use eax as input and output: > >> int idx = 0; > >> __asm__ volatile ("cpuid " : "+a" (idx) : : "ebx", "ecx", "edx", "memory"); > > > > ebx clobbers are not supported on older GCC versions. > > src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp currently says > > this: > > > > // Compiler version last used for testing: gcc 4.8.2 > > > > But this is blatantly not true because GCC 4.8 cannot spill ebx in PIC > > mode. > > I got this patch from Andrew Haley, and the build works again with GCC > 4.8.5 (the system compiler on Red Hat Enterprise Linux 7): > > diff -r d7da94e6c169 > src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp > --- a/src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp Tue Jun 18 > 16:15:15 2019 +0100 > +++ b/src/hotspot/os_cpu/linux_x86/orderAccess_linux_x86.hpp Wed Jun > 19 17:52:26 2019 +0100 > @@ -57,7 +57,11 @@ > > inline void OrderAccess::cross_modify_fence() { > int idx = 0; > +#ifdef AMD64 > __asm__ volatile ("cpuid " : "+a" (idx) : : "ebx", "ecx", "edx", "memory"); > +#else > + __asm__ volatile ("xchg %%esi, %%ebx; cpuid; xchg %%esi, %%ebx " : "+a" > (idx) : : "esi", "ecx", "edx", "memory"); > +#endif > } > > template<> > > GCC can spill %esi without problems since forever, so this should work > everywhere. > > Thanks, > Florian From adam.farley at uk.ibm.com Mon Jul 1 12:27:11 2019 From: adam.farley at uk.ibm.com (Adam Farley8) Date: Mon, 1 Jul 2019 13:27:11 +0100 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN Message-ID: Hi All, The title say it all. If you pass in a value for sun.boot.library.path consisting of one or more paths that are too long, then the vm will fail to start because it can't load one of the libraries it needs (the zip library), despite the fact that the VM automatically prepends the default library path to the sun.boot.library.path property, using the correct separator to divide it from the user-specified path. So we've got the right path, in the right place, at the right time, we just can't *use* it. I've fixed this by changing the relevant os.cpp code to ignore paths that are too long, and to attempt to locate the needed library on the other paths (if any are valid). I've also added functionality to handle the edge case of paths that are neeeeeeearly too long, only for a sub-path (or file name) to push us over the limit *after* the split_path function is done assessing the path length. I've also changed the code we're overriding, on the assumption that someone's still using it somewhere. Bug: https://bugs.openjdk.java.net/browse/JDK-8227021 Webrev: http://cr.openjdk.java.net/~afarley/8227021/webrev/ Thoughts and impressions welcome. Best Regards Adam Farley IBM Runtimes Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From aph at redhat.com Mon Jul 1 13:09:32 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 1 Jul 2019 14:09:32 +0100 Subject: RFR(m): 8220351: Cross-modifying code In-Reply-To: References: <4021e6fe-a2e7-7e66-bd54-1ea9d80863ae@oracle.com> <87fto536lo.fsf@oldenburg2.str.redhat.com> <8736k5339e.fsf@oldenburg2.str.redhat.com> Message-ID: <50cf980f-6ca9-4357-4ca5-5c6f5955162a@redhat.com> On 7/1/19 11:06 AM, Doerr, Martin wrote: > sorry for breaking 32 bit linux. We don't build on this platform so we didn't notice. > I believe "Compiler version last used for testing: gcc 4.8.2" is still correct for 64 bit linux. > > Andrew's proposal looks reasonable. https://bugs.openjdk.java.net/browse/JDK-8226525 I'm on it. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From erik.osterlund at oracle.com Mon Jul 1 13:12:23 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 1 Jul 2019 15:12:23 +0200 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic Message-ID: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> Hi, Today it is up to callers of methods changing state on nmethods like make_not_entrant(), to know all other possible concurrent attempts to transition the nmethod, and know that there are no such attempts trying to make the nmethod more dead. There have been multiple occurrences of issues where the caller got it wrong due to the fragile nature of this code. This specific CR deals with a bug where an OSR nmethod was made not entrant (deopt) and made unloaded concurrently. The result of such a race can be that it is first made unloaded and then made not entrant, making the nmethod go backwards in its state machine, effectively resurrecting dead nmethods, causing a subsequent GC to feel awkward (crash). But I have seen other similar incidents with deopt racing with the sweeper. These non-monotonicity problems are unnecessary to have. So I intend to fix the bug by enforcing monotonicity of the nmethod state machine explicitly, instead of trying to reason about all callers of these make_* functions. I swapped the order of unloaded and zombie in the enum as zombies are strictly more dead than unloaded nmethods. All transitions change in the direction of increasing deadness and fail if the transition is not monotonically increasing. For ZGC I moved OSR nmethod unlinking to before the unlinking (where unlinking code belongs), instead of after the handshake (intended for deleting things safely unlinked). Strictly speaking, moving the OSR nmethod unlinking removes the racing between make_not_entrant and make_unloaded, but I still want the monotonicity guards to make this code more robust. I left AOT methods alone. Since they don't die, they don't have resurrection problems, and hence do not benefit from these guards in the same way. Bug: https://bugs.openjdk.java.net/browse/JDK-8224674 Webrev: http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/ Thanks, /Erik From aph at redhat.com Mon Jul 1 14:22:09 2019 From: aph at redhat.com (Andrew Haley) Date: Mon, 1 Jul 2019 15:22:09 +0100 Subject: RFR: 8226525: HotSpot compile-time error for x86-32 Message-ID: <469d68e3-e153-504a-6412-4bb4cc58dcaa@redhat.com> This asm statement: asm__ volatile ("cpuid " : "+a" (idx) : : "ebx", "ecx", "edx", "memory") ... breaks on 32-bit systems because the GCC we use doesn't allow EBX to be clobbered. Fixed thusly: http://cr.openjdk.java.net/~aph/8226525/ There is some small overhead, but given that we're trashing the pipeline anyway the overhead is insignificant. -- Andrew Haley (he/him) Java Platform Lead Engineer Red Hat UK Ltd. https://keybase.io/andrewhaley EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671 From shade at redhat.com Mon Jul 1 14:46:32 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Mon, 1 Jul 2019 16:46:32 +0200 Subject: RFR: 8226525: HotSpot compile-time error for x86-32 In-Reply-To: <469d68e3-e153-504a-6412-4bb4cc58dcaa@redhat.com> References: <469d68e3-e153-504a-6412-4bb4cc58dcaa@redhat.com> Message-ID: <4988619d-5112-4000-3a9e-eb5554f99e03@redhat.com> On 7/1/19 4:22 PM, Andrew Haley wrote: > This asm statement: > > asm__ volatile ("cpuid " : "+a" (idx) : : "ebx", "ecx", "edx", "memory") > > ... breaks on 32-bit systems because the GCC we use doesn't allow EBX to be > clobbered. Fixed thusly: > > http://cr.openjdk.java.net/~aph/8226525/ Looks okay to me. Put the comment, e.g.: // EBX is a reserved register on 32-bit Linux systems, cannot clobber it. -Aleksey From kim.barrett at oracle.com Mon Jul 1 17:49:07 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 1 Jul 2019 13:49:07 -0400 Subject: RFR: 8226525: HotSpot compile-time error for x86-32 In-Reply-To: <4988619d-5112-4000-3a9e-eb5554f99e03@redhat.com> References: <469d68e3-e153-504a-6412-4bb4cc58dcaa@redhat.com> <4988619d-5112-4000-3a9e-eb5554f99e03@redhat.com> Message-ID: > On Jul 1, 2019, at 10:46 AM, Aleksey Shipilev wrote: > > On 7/1/19 4:22 PM, Andrew Haley wrote: >> This asm statement: >> >> asm__ volatile ("cpuid " : "+a" (idx) : : "ebx", "ecx", "edx", "memory") >> >> ... breaks on 32-bit systems because the GCC we use doesn't allow EBX to be >> clobbered. Fixed thusly: >> >> http://cr.openjdk.java.net/~aph/8226525/ > > Looks okay to me. Put the comment, e.g.: > // EBX is a reserved register on 32-bit Linux systems, cannot clobber it. > > -Aleksey Looks good. +1 on the additional comment. From thomas.stuefe at gmail.com Mon Jul 1 18:56:46 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 1 Jul 2019 20:56:46 +0200 Subject: RFR(xs): 8225200: runtime/memory/RunUnitTestsConcurrently.java has a memory leak Message-ID: Hi all, may I please have reviews and opinions about the following patch: Issue: https://bugs.openjdk.java.net/browse/JDK-8227041 cr: http://cr.openjdk.java.net/~stuefe/webrevs/8227041-rununittestsconcurrently-has-a-mem-leak/webrev.00/webrev/index.html There is a memory leak in test_virtual_space_list_large_chunk(), called as part of the whitebox tests WB_RunMemoryUnitTests(). In this test metaspace allocation is tested by rapidly allocating and subsequently leaking a metachunk of ~512K. This is done by a number of threads in a tight loop for 15 seconds, which usually makes for 10-20GB rss. Test is usually OOM killed. This test seems to be often excluded, which makes sense, since this leak makes its memory usage difficult to predict. It is also earmarked by Oracle for gtest-ification, see 8213269. This leak is not easy to fix, among other things because it is not clear what it is it wants to test. Meanwhile, time moved on and we have quite nice gtests to test metaspace allocation (see e.g. test_metaspace_allocation.cpp) and I rather would run those gtests concurrently. Which could be a future RFE. So I just removed this metaspace related test from WB_RunMemoryUnitTests() altogether, since to me it does nothing useful. Once you remove the leaking allocation, not much is left. Without this part RunUnitTestsConcurrently test runs smoothly through its other parts, and in that form it is still useful. What do you think? Cheers, Thomas From stefan.karlsson at oracle.com Mon Jul 1 19:06:46 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Mon, 1 Jul 2019 21:06:46 +0200 Subject: RFR(xs): 8225200: runtime/memory/RunUnitTestsConcurrently.java has a memory leak In-Reply-To: References: Message-ID: On 2019-07-01 20:56, Thomas St?fe wrote: > Hi all, > > may I please have reviews and opinions about the following patch: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8227041 > cr: > http://cr.openjdk.java.net/~stuefe/webrevs/8227041-rununittestsconcurrently-has-a-mem-leak/webrev.00/webrev/index.html > > There is a memory leak in test_virtual_space_list_large_chunk(), called as > part of the whitebox tests WB_RunMemoryUnitTests(). In this test metaspace > allocation is tested by rapidly allocating and subsequently leaking a > metachunk of ~512K. This is done by a number of threads in a tight loop for > 15 seconds, which usually makes for 10-20GB rss. Test is usually OOM killed. > > This test seems to be often excluded, which makes sense, since this leak > makes its memory usage difficult to predict. > > It is also earmarked by Oracle for gtest-ification, see 8213269. > > This leak is not easy to fix, among other things because it is not clear > what it is it wants to test. Meanwhile, time moved on and we have quite > nice gtests to test metaspace allocation (see e.g. > test_metaspace_allocation.cpp) and I rather would run those gtests > concurrently. Which could be a future RFE. > > So I just removed this metaspace related test from WB_RunMemoryUnitTests() > altogether, since to me it does nothing useful. Once you remove the leaking > allocation, not much is left. > > Without this part RunUnitTestsConcurrently test runs smoothly through its > other parts, and in that form it is still useful. > > What do you think? I think this makes sense and it looks good to me. Thanks, StefanK > > Cheers, Thomas From thomas.stuefe at gmail.com Mon Jul 1 19:07:42 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 1 Jul 2019 21:07:42 +0200 Subject: RFR(xs): 8225200: runtime/memory/RunUnitTestsConcurrently.java has a memory leak In-Reply-To: References:

Message-ID: Thanks Stefan! On Mon, Jul 1, 2019, 21:06 Stefan Karlsson wrote: > On 2019-07-01 20:56, Thomas St?fe wrote: > > Hi all, > > > > may I please have reviews and opinions about the following patch: > > > > Issue: https://bugs.openjdk.java.net/browse/JDK-8227041 > > cr: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8227041-rununittestsconcurrently-has-a-mem-leak/webrev.00/webrev/index.html > > > > There is a memory leak in test_virtual_space_list_large_chunk(), called > as > > part of the whitebox tests WB_RunMemoryUnitTests(). In this test > metaspace > > allocation is tested by rapidly allocating and subsequently leaking a > > metachunk of ~512K. This is done by a number of threads in a tight loop > for > > 15 seconds, which usually makes for 10-20GB rss. Test is usually OOM > killed. > > > > This test seems to be often excluded, which makes sense, since this leak > > makes its memory usage difficult to predict. > > > > It is also earmarked by Oracle for gtest-ification, see 8213269. > > > > This leak is not easy to fix, among other things because it is not clear > > what it is it wants to test. Meanwhile, time moved on and we have quite > > nice gtests to test metaspace allocation (see e.g. > > test_metaspace_allocation.cpp) and I rather would run those gtests > > concurrently. Which could be a future RFE. > > > > So I just removed this metaspace related test from > WB_RunMemoryUnitTests() > > altogether, since to me it does nothing useful. Once you remove the > leaking > > allocation, not much is left. > > > > Without this part RunUnitTestsConcurrently test runs smoothly through its > > other parts, and in that form it is still useful. > > > > What do you think? > > I think this makes sense and it looks good to me. > > Thanks, > StefanK > > > > > Cheers, Thomas > > From coleen.phillimore at oracle.com Mon Jul 1 19:13:21 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 1 Jul 2019 15:13:21 -0400 Subject: RFR(xs): 8225200: runtime/memory/RunUnitTestsConcurrently.java has a memory leak In-Reply-To: References:

Message-ID: +1 Thank you for taking care of this! Coleen On 7/1/19 3:07 PM, Thomas St?fe wrote: > Thanks Stefan! > > On Mon, Jul 1, 2019, 21:06 Stefan Karlsson > wrote: > >> On 2019-07-01 20:56, Thomas St?fe wrote: >>> Hi all, >>> >>> may I please have reviews and opinions about the following patch: >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8227041 >>> cr: >>> >> http://cr.openjdk.java.net/~stuefe/webrevs/8227041-rununittestsconcurrently-has-a-mem-leak/webrev.00/webrev/index.html >>> There is a memory leak in test_virtual_space_list_large_chunk(), called >> as >>> part of the whitebox tests WB_RunMemoryUnitTests(). In this test >> metaspace >>> allocation is tested by rapidly allocating and subsequently leaking a >>> metachunk of ~512K. This is done by a number of threads in a tight loop >> for >>> 15 seconds, which usually makes for 10-20GB rss. Test is usually OOM >> killed. >>> This test seems to be often excluded, which makes sense, since this leak >>> makes its memory usage difficult to predict. >>> >>> It is also earmarked by Oracle for gtest-ification, see 8213269. >>> >>> This leak is not easy to fix, among other things because it is not clear >>> what it is it wants to test. Meanwhile, time moved on and we have quite >>> nice gtests to test metaspace allocation (see e.g. >>> test_metaspace_allocation.cpp) and I rather would run those gtests >>> concurrently. Which could be a future RFE. >>> >>> So I just removed this metaspace related test from >> WB_RunMemoryUnitTests() >>> altogether, since to me it does nothing useful. Once you remove the >> leaking >>> allocation, not much is left. >>> >>> Without this part RunUnitTestsConcurrently test runs smoothly through its >>> other parts, and in that form it is still useful. >>> >>> What do you think? >> I think this makes sense and it looks good to me. >> >> Thanks, >> StefanK >> >>> Cheers, Thomas >> From thomas.stuefe at gmail.com Mon Jul 1 19:18:52 2019 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Mon, 1 Jul 2019 21:18:52 +0200 Subject: RFR(xs): 8225200: runtime/memory/RunUnitTestsConcurrently.java has a memory leak In-Reply-To: References:

Message-ID: Thanks Coleen! On Mon, Jul 1, 2019, 21:14 wrote: > +1 > Thank you for taking care of this! > Coleen > > On 7/1/19 3:07 PM, Thomas St?fe wrote: > > Thanks Stefan! > > > > On Mon, Jul 1, 2019, 21:06 Stefan Karlsson > > wrote: > > > >> On 2019-07-01 20:56, Thomas St?fe wrote: > >>> Hi all, > >>> > >>> may I please have reviews and opinions about the following patch: > >>> > >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8227041 > >>> cr: > >>> > >> > http://cr.openjdk.java.net/~stuefe/webrevs/8227041-rununittestsconcurrently-has-a-mem-leak/webrev.00/webrev/index.html > >>> There is a memory leak in test_virtual_space_list_large_chunk(), called > >> as > >>> part of the whitebox tests WB_RunMemoryUnitTests(). In this test > >> metaspace > >>> allocation is tested by rapidly allocating and subsequently leaking a > >>> metachunk of ~512K. This is done by a number of threads in a tight loop > >> for > >>> 15 seconds, which usually makes for 10-20GB rss. Test is usually OOM > >> killed. > >>> This test seems to be often excluded, which makes sense, since this > leak > >>> makes its memory usage difficult to predict. > >>> > >>> It is also earmarked by Oracle for gtest-ification, see 8213269. > >>> > >>> This leak is not easy to fix, among other things because it is not > clear > >>> what it is it wants to test. Meanwhile, time moved on and we have quite > >>> nice gtests to test metaspace allocation (see e.g. > >>> test_metaspace_allocation.cpp) and I rather would run those gtests > >>> concurrently. Which could be a future RFE. > >>> > >>> So I just removed this metaspace related test from > >> WB_RunMemoryUnitTests() > >>> altogether, since to me it does nothing useful. Once you remove the > >> leaking > >>> allocation, not much is left. > >>> > >>> Without this part RunUnitTestsConcurrently test runs smoothly through > its > >>> other parts, and in that form it is still useful. > >>> > >>> What do you think? > >> I think this makes sense and it looks good to me. > >> > >> Thanks, > >> StefanK > >> > >>> Cheers, Thomas > >> > > From david.holmes at oracle.com Mon Jul 1 21:10:45 2019 From: david.holmes at oracle.com (David Holmes) Date: Tue, 2 Jul 2019 07:10:45 +1000 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: References: Message-ID: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com> Hi Adam, On 1/07/2019 10:27 pm, Adam Farley8 wrote: > Hi All, > > The title say it all. > > If you pass in a value for sun.boot.library.path consisting > of one or more paths that are too long, then the vm will > fail to start because it can't load one of the libraries it > needs (the zip library), despite the fact that the VM > automatically prepends the default library path to the > sun.boot.library.path property, using the correct separator > to divide it from the user-specified path. > > So we've got the right path, in the right place, at the > right time, we just can't *use* it. > > I've fixed this by changing the relevant os.cpp code to > ignore paths that are too long, and to attempt to locate > the needed library on the other paths (if any are valid). As I just added to the bug report I have a different view of "correct" here. If you just ignore the long path and keep processing other short paths you may find the wrong library. There is a user error here and that error should be reported ASAP and in a way that leads to failure ASAP. Perhaps we should be more aggressive in aborting the VM when this is detected? David ----- > I've also added functionality to handle the edge case of > paths that are neeeeeeearly too long, only for a > sub-path (or file name) to push us over the limit *after* > the split_path function is done assessing the path length. > > I've also changed the code we're overriding, on the assumption > that someone's still using it somewhere. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8227021 > Webrev: http://cr.openjdk.java.net/~afarley/8227021/webrev/ > > Thoughts and impressions welcome. > > Best Regards > > Adam Farley > IBM Runtimes > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > From coleen.phillimore at oracle.com Mon Jul 1 21:36:23 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 1 Jul 2019 17:36:23 -0400 Subject: RFR[13]: 8226366: Excessive ServiceThread wakeups for OopStorage cleanup In-Reply-To: References: Message-ID: http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/serviceThread.cpp.frames.html Do you have another bug to add the oopStorage for the ResolvedMethodTable to the list? http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/safepoint.cpp.frames.html I suppose you don't need is_safepoint_needed() to trigger this cleanup in the GuaranteedSafepointInterval because if there is no GC, there won't be any blocks to deallocate. http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.frames.html One nit.? The rest of the implementations that do the same thing as this, are called "trigger_concurrent_work".? This is called differently from the safepoint cleanup tasls, but could you call it trigger_cleanup_if_needed() instead?? Then I know it does the same/similar thing as the others without looking. 818 void OopStorage::request_cleanup_if_needed() { 819 MonitorLocker ml(Service_lock, Monitor::_no_safepoint_check_flag); 820 if (Atomic::load(&needs_cleanup_requested) && 821 !needs_cleanup_notified && 822 (os::javaTimeNanos() > cleanup_permit_time)) { 823 needs_cleanup_notified = true; 824 ml.notify_all(); 825 } 826 } The implementation looks good.? I think it's good that you don't have the safepoint cleanup task timer around this. Thanks, Coleen On 6/25/19 10:38 PM, Kim Barrett wrote: > Please review this change to OopStorage's notifications to the ServiceThread > to perform empty block deletion. The existing mechanism (introduced by > JDK-8210986) is driven by entry allocation, and may arbitrarily delay such > cleanup, or alternatively may be much too enthusiastic about waking up the > ServiceThread. > > The new mechanism does not depend on allocations. Instead, a new safepoint > cleanup task is used to (irregularly) check for pending requests and notify > the ServiceThread. That notification has a time-based throttle, and also > avoids duplicate notifications. Also, requests are now only recorded for > to-empty transitions and not for full to not-full transitions. > > Changed the work limit for delete_empty_blocks to have a small surplus to > avoid some common cases with small number of blocks leading to unnecessarily > spinning the ServiceThread. > > While making these changes, noticed and fixed a problem in block allocation > that could result in a mistaken report of allocation failure. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8226366 > > Webrev: > http://cr.openjdk.java.net/~kbarrett/8226366/open.00/ > > Testing: > mach5 tier1-5 > > Locally ran gc/stress/TestReclaimStringsLeaksMemory.java with some extra > logging and verified that the number of ServiceThread notifications was > reduced by a *lot*, down to something reasonable. > From coppa at di.uniroma1.it Tue Jul 2 08:47:22 2019 From: coppa at di.uniroma1.it (Emilio Coppa) Date: Tue, 2 Jul 2019 10:47:22 +0200 Subject: MPLR 2019 - Deadline Extension Message-ID: =============================================== MPLR 2019 16th International Conference on Managed Programming Languages & Runtimes Co-located with SPLASH 2019 Athens, Greece, Oct 20-25, 2019 https://conf.researchr.org/home/mplr-2019 =============================================== The 16th International Conference on Managed Programming Languages & Runtimes (MPLR, formerly ManLang) is a premier forum for presenting and discussing novel results in all aspects of managed programming languages and runtime systems, which serve as building blocks for some of the most important computing systems around, ranging from small-scale (embedded and real-time systems) to large-scale (cloud-computing and big-data platforms) and anything in between (mobile, IoT, and wearable applications). This year, MPLR is co-located with SPLASH 2019 and sponsored by ACM. For more information, check out the conference website: https://conf.researchr.org/home/mplr-2019 # Topics Topics of interest include but are not limited to: * Languages and Compilers - Managed languages (e.g., Java, Scala, JavaScript, Python, Ruby, C#, F#, Clojure, Groovy, Kotlin, R, Smalltalk, Racket, Rust, Go, etc.) - Domain-specific languages - Language design - Compilers and interpreters - Type systems and program logics - Language interoperability - Parallelism, distribution, and concurrency * Virtual Machines - Managed runtime systems (e.g., JVM, Dalvik VM, Android Runtime (ART), LLVM, .NET CLR, RPython, etc.) - VM design and optimization - VMs for mobile and embedded devices - VMs for real-time applications - Memory management - Hardware/software co-design * Techniques, Tools, and Applications - Static and dynamic program analysis - Testing and debugging - Refactoring - Program understanding - Program synthesis - Security and privacy - Performance analysis and monitoring - Compiler and program verification # Submission Categories MPLR accepts four types of submissions: 1. Regular research papers, which describe novel contributions involving managed language platforms (up to 12 pages excluding bibliography and appendix). Research papers will be evaluated based on their relevance, novelty, technical rigor, and contribution to the state-of-the-art. 2. Work-in-progress research papers, which describe promising new ideas but yet have less maturity than full papers (up to 6 pages excluding bibliography and appendix). When evaluating work-in-progress papers, more emphasis will be placed on novelty and the potential of the new ideas than on technical rigor and experimental results. 3. Industry and tool papers, which present technical challenges and solutions for managed language platforms in the context of deployed applications and systems (up to 6 pages excluding bibliography and appendix). Industry and tool papers will be evaluated on their relevance, usefulness, and results. Suitability for demonstration and availability will also be considered for tool papers. 4. Posters, which can be accompanied by a one-page abstract and will be evaluated on similar criteria as Work-in-progress papers. Posters can accompany any submission as a way to provide additional demonstration and discussion opportunities. MPLR 2019 submissions must conform to the ACM Policy on Prior Publication and Simultaneous Submissions and to the SIGPLAN Republication Policy. # Important Dates and Organization Submission Deadline: ***Jul 15, 2019*** (extended) Author Notification: Aug 24, 2019 Camera Ready: Sep 12, 2019 Conference Dates: Oct 20-25, 2019 General Chair: Tony Hosking, Australian National University / Data61, Australia Program Chair: Irene Finocchi, Sapienza University of Rome, Italy Program Committee: * Edd Barrett, King's College London, United Kingdom * Steve Blackburn, Australian National University, Australia * Lubomir Bulej, Charles University, Czech Republic * Shigeru Chiba, University of Tokyo, Japan * Daniele Cono D'Elia, Sapienza University of Rome, Italy * Ana Lucia de Moura, Pontifical Catholic University of Rio de Janeiro, Brazil * Erik Ernst, Google, Denmark * Matthew Hertz, University at Buffalo, United States * Vivek Kumar, Indraprastha Institute of Information Technology, Delhi * Doug Lea, State University of New York (SUNY) Oswego, United States * Magnus Madsen, Aarhus University, Denmark * Hidehiko Masuhara, Tokyo Institute of Technology, Japan * Ana Milanova, Rensselaer Polytechnic Institute, United States * Matthew Parkinson, Microsoft Research, United Kingdom * Gregor Richards, University of Waterloo, Canada * Manuel Rigger, ETH Zurich, Switzerland * Andrea Rosa, University of Lugano, Switzerland * Guido Salvaneschi, TU Darmstadt, Germany * Lukas Stadler, Oracle Labs, Austria * Ben L. Titzer, Google, Germany From adam.farley at uk.ibm.com Tue Jul 2 09:44:04 2019 From: adam.farley at uk.ibm.com (Adam Farley8) Date: Tue, 2 Jul 2019 10:44:04 +0100 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com> References: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com> Message-ID: Hi David, Thanks for your thoughts. The user should absolutely have immediate feedback, yes, and I agree that "skipping" paths could lead to us loading the wrong library. Perhaps a compromise? We fire off a stderr warning if any of the paths are too long (without killing the VM), we ignore any path *after* (and including) the first too-long path, and we kill the VM if the first path is too long. Warning message example: ---- Warning: One or more sun.boot.library.path paths were too long for this system, and it (along with all subsequent paths) have been ignored. ---- Another addition could be to check the path lengths for the property sooner, thus aborting the VM faster if the default path is too long. Assuming we posit that the VM will always need to load libraries. Best Regards Adam Farley IBM Runtimes David Holmes wrote on 01/07/2019 22:10:45: > From: David Holmes > To: Adam Farley8 , hotspot-dev at openjdk.java.net > Date: 01/07/2019 22:12 > Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > paths are longer than JVM_MAXPATHLEN > > Hi Adam, > > On 1/07/2019 10:27 pm, Adam Farley8 wrote: > > Hi All, > > > > The title say it all. > > > > If you pass in a value for sun.boot.library.path consisting > > of one or more paths that are too long, then the vm will > > fail to start because it can't load one of the libraries it > > needs (the zip library), despite the fact that the VM > > automatically prepends the default library path to the > > sun.boot.library.path property, using the correct separator > > to divide it from the user-specified path. > > > > So we've got the right path, in the right place, at the > > right time, we just can't *use* it. > > > > I've fixed this by changing the relevant os.cpp code to > > ignore paths that are too long, and to attempt to locate > > the needed library on the other paths (if any are valid). > > As I just added to the bug report I have a different view of "correct" > here. If you just ignore the long path and keep processing other short > paths you may find the wrong library. There is a user error here and > that error should be reported ASAP and in a way that leads to failure > ASAP. Perhaps we should be more aggressive in aborting the VM when this > is detected? > > David > ----- > > > I've also added functionality to handle the edge case of > > paths that are neeeeeeearly too long, only for a > > sub-path (or file name) to push us over the limit *after* > > the split_path function is done assessing the path length. > > > > I've also changed the code we're overriding, on the assumption > > that someone's still using it somewhere. > > > > Bug: https://urldefense.proofpoint.com/v2/url? > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8227021&d=DwICaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=P5m8KWUXJf- > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=xZzQCnv68xd9hJyyK1obSim38eWSRmLPfuR__9ddZWg&e= > > Webrev: https://urldefense.proofpoint.com/v2/url? > u=http-3A__cr.openjdk.java.net_-7Eafarley_8227021_webrev_&d=DwICaQ&c=jf_iaSHvJObTbx- > siA1ZOg&r=P5m8KWUXJf- > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=- > hKU0zUd_0LDT08wTilexgI54EeSgt8xUk97i6V63Bk&e= > > > > Thoughts and impressions welcome. > > > > Best Regards > > > > Adam Farley > > IBM Runtimes > > > > Unless stated otherwise above: > > IBM United Kingdom Limited - Registered in England and Wales with number > > 741598. > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > > > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From martin.doerr at sap.com Tue Jul 2 10:19:42 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 2 Jul 2019 10:19:42 +0000 Subject: RFR: 8226238: Improve error output and fix elf issues in os::dll_load In-Reply-To: References: Message-ID: Hi Matthias, thanks for contributing this improvement. Please note that there are endianness macros available. You can use e.g. if (elf_head.e_ident[EI_DATA] != LITTLE_ENDIAN_ONLY(ELFDATA2LSB) BIG_ENDIAN_ONLY(ELFDATA2MSB)) { I don't see why we need a variable "current_endianness". Besides this, change looks good to me. I don't need to see another webrev. Best regards, Martin > -----Original Message----- > From: hotspot-dev On Behalf Of > Baesken, Matthias > Sent: Freitag, 28. Juni 2019 09:06 > To: Langer, Christoph ; 'hotspot- > dev at openjdk.java.net' > Subject: RE: RFR: 8226238: Improve error output and fix elf issues in > os::dll_load > > Hi Christoph, thanks for looking into it. > I did the changes you mentioned, here is my new webrev : > > http://cr.openjdk.java.net/~mbaesken/webrevs/8226238.4/ > > Would be good to get a second review . > > > Thanks and best regards, Matthias > > > > -----Original Message----- > > From: Langer, Christoph > > Sent: Donnerstag, 27. Juni 2019 16:59 > > To: Baesken, Matthias ; 'hotspot- > > dev at openjdk.java.net' > > Subject: RE: RFR: 8226238: Improve error output and fix elf issues in > > os::dll_load > > > > Hi Matthias, > > > > your change looks good overall. > > > > I only have a few style nits: > > > > src/hotspot/os/linux/os_linux.cpp, line 1751 (new): > > > > Can you convert > > > > unsigned char current_endianness = ELFDATA2MSB; // BE > > #if defined(VM_LITTLE_ENDIAN) > > current_endianness = ELFDATA2LSB; // LE > > #endif > > > > to > > > > #if defined(VM_LITTLE_ENDIAN) > > unsigned char current_endianness = ELFDATA2LSB; // LE > > #else > > unsigned char current_endianness = ELFDATA2MSB; // BE > > #endif > > > > And the same in line 1611 of src/hotspot/os/solaris/os_solaris.cpp. > > > > src/hotspot/os/linux/os_linux.cpp, line 1802: you could fix the indentation > of > > ELFDATA2LSB for EM_ARM > > same for line 1580 of src/hotspot/os/solaris/os_solaris.cpp. > > From thomas.schatzl at oracle.com Tue Jul 2 11:00:09 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 02 Jul 2019 13:00:09 +0200 Subject: RFR[13]: 8226366: Excessive ServiceThread wakeups for OopStorage cleanup In-Reply-To: References:

Message-ID: <4b178ed4b7c64e435d05619f95cf0421e5c23d6e.camel@oracle.com> Hi, On Mon, 2019-07-01 at 17:36 -0400, coleen.phillimore at oracle.com wrote: > http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/serviceThread.cpp.frames.html > > Do you have another bug to add the oopStorage for the > ResolvedMethodTable to the list? > > http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/safepoint.cpp.frames.html > > I suppose you don't need is_safepoint_needed() to trigger this > cleanup in the GuaranteedSafepointInterval because if there is no GC, > there won't be any blocks to deallocate. > > http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.frames.html > > One nit. The rest of the implementations that do the same thing as > this, are called "trigger_concurrent_work". This is called > differently from the safepoint cleanup tasls, but could you call it > trigger_cleanup_if_needed() instead? Then I know it does the > same/similar thing as the others without looking. > > 818 void OopStorage::request_cleanup_if_needed() { > 819 MonitorLocker ml(Service_lock, > Monitor::_no_safepoint_check_flag); > 820 if (Atomic::load(&needs_cleanup_requested) && > 821 !needs_cleanup_notified && > 822 (os::javaTimeNanos() > cleanup_permit_time)) { > 823 needs_cleanup_notified = true; > 824 ml.notify_all(); > 825 } > 826 } > Similar in serviceThread.cpp:136, it would be nice if the method were named "has_work()" like others instead of "test_and_clear_cleanup_request()". While the latter is technically better, it raises the question whether it is the correct thing to do here in some way when compared to others. Feel free to ignore this comment though. I *think* otherwise it is good, but I am kind of new to the OopStorage stuff. Thanks, Thomas From matthias.baesken at sap.com Tue Jul 2 12:54:39 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 2 Jul 2019 12:54:39 +0000 Subject: RFR: 8226238: Improve error output and fix elf issues in os::dll_load In-Reply-To: References: Message-ID: Hi Martin , thanks for the review . I followed your advice and removed the current_endianness - variable . http://cr.openjdk.java.net/~mbaesken/webrevs/8226238.5/ Best regards , Matthias > Hi Matthias, > > thanks for contributing this improvement. > > Please note that there are endianness macros available. You can use e.g. > if (elf_head.e_ident[EI_DATA] != LITTLE_ENDIAN_ONLY(ELFDATA2LSB) > BIG_ENDIAN_ONLY(ELFDATA2MSB)) { > I don't see why we need a variable "current_endianness". > > Besides this, change looks good to me. I don't need to see another webrev. > > Best regards, > Martin > From martin.doerr at sap.com Tue Jul 2 12:56:37 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 2 Jul 2019 12:56:37 +0000 Subject: RFR: 8226238: Improve error output and fix elf issues in os::dll_load In-Reply-To: References: Message-ID: Looks good. Thanks, Martin > -----Original Message----- > From: Baesken, Matthias > Sent: Dienstag, 2. Juli 2019 14:55 > To: Doerr, Martin ; Langer, Christoph > ; 'hotspot-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RE: RFR: 8226238: Improve error output and fix elf issues in > os::dll_load > > Hi Martin , thanks for the review . > > I followed your advice and removed the current_endianness - variable . > > http://cr.openjdk.java.net/~mbaesken/webrevs/8226238.5/ > > > Best regards , Matthias > > > > > Hi Matthias, > > > > thanks for contributing this improvement. > > > > Please note that there are endianness macros available. You can use e.g. > > if (elf_head.e_ident[EI_DATA] != LITTLE_ENDIAN_ONLY(ELFDATA2LSB) > > BIG_ENDIAN_ONLY(ELFDATA2MSB)) { > > I don't see why we need a variable "current_endianness". > > > > Besides this, change looks good to me. I don't need to see another webrev. > > > > Best regards, > > Martin > > From kim.barrett at oracle.com Tue Jul 2 18:36:32 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 2 Jul 2019 14:36:32 -0400 Subject: RFR[13]: 8226366: Excessive ServiceThread wakeups for OopStorage cleanup In-Reply-To: References:

Message-ID: <18A29890-7336-485E-97E8-A0E5C6DE93E6@oracle.com> > On Jul 1, 2019, at 5:36 PM, coleen.phillimore at oracle.com wrote: > > > http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/serviceThread.cpp.frames.html > > Do you have another bug to add the oopStorage for the ResolvedMethodTable to the list? JDK-8227053. Also JDK-8227054. > http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/safepoint.cpp.frames.html > > I suppose you don't need is_safepoint_needed() to trigger this cleanup in the GuaranteedSafepointInterval because if there is no GC, there won't be any blocks to deallocate. s/is_safepoint_needed()/is_cleanup_needed()/ There doesn't seem to be a clear theory of what that function should check for. Some of the existing safepoint cleanups have checks there, some don't, and it's not always obvious why. This cleanup doesn't seem so urgent that if there are no other reasons to safepoint for a long(ish) time then we should force one for just this purpose. > http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.frames.html > > One nit. The rest of the implementations that do the same thing as this, are called "trigger_concurrent_work". This is called differently from the safepoint cleanup tasls, but could you call it trigger_cleanup_if_needed() instead? Then I know it does the same/similar thing as the others without looking. Renamed to trigger_cleanup_if_needed(). Also test_and_clear_cleanup_request() => has_cleanup_work_and_reset(). Thomas asked for "has_work", to be consistent with String/Symbol/ResolvedMethodTable, but I think that's too generic here; what kind of work? (In the case of the tables, it's not always "cleanup" work.) Coleen suggested the "and_reset" suffix, to follow an existing convention. I also made a few corresponding internal name changes. > The implementation looks good. I think it's good that you don't have the safepoint cleanup task timer around this. I added a comment about the lack of task timing, so it's clearly intentional and not simply forgotten. New webrevs: full: http://cr.openjdk.java.net/~kbarrett/8226366/open.01/ incr: http://cr.openjdk.java.net/~kbarrett/8226366/open.01.inc/ Testing: Local build and hotspot_tier1. From coleen.phillimore at oracle.com Tue Jul 2 21:12:47 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 2 Jul 2019 17:12:47 -0400 Subject: RFR[13]: 8226366: Excessive ServiceThread wakeups for OopStorage cleanup In-Reply-To: <18A29890-7336-485E-97E8-A0E5C6DE93E6@oracle.com> References:

<18A29890-7336-485E-97E8-A0E5C6DE93E6@oracle.com> Message-ID: <44159f2d-18c0-bf39-1e41-1af3e32972a3@oracle.com> On 7/2/19 2:36 PM, Kim Barrett wrote: >> On Jul 1, 2019, at 5:36 PM, coleen.phillimore at oracle.com wrote: >> >> >> http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/serviceThread.cpp.frames.html >> >> Do you have another bug to add the oopStorage for the ResolvedMethodTable to the list? > JDK-8227053. Also JDK-8227054. Good. > >> http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/runtime/safepoint.cpp.frames.html >> >> I suppose you don't need is_safepoint_needed() to trigger this cleanup in the GuaranteedSafepointInterval because if there is no GC, there won't be any blocks to deallocate. > s/is_safepoint_needed()/is_cleanup_needed()/ > > There doesn't seem to be a clear theory of what that function should > check for. Some of the existing safepoint cleanups have checks there, > some don't, and it's not always obvious why. This cleanup doesn't seem > so urgent that if there are no other reasons to safepoint for a > long(ish) time then we should force one for just this purpose. Yes, it's not well defined.? If something can run forever and need a cleanup without a safepoint, it should go in the list. > >> http://cr.openjdk.java.net/~kbarrett/8226366/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.frames.html >> >> One nit. The rest of the implementations that do the same thing as this, are called "trigger_concurrent_work". This is called differently from the safepoint cleanup tasls, but could you call it trigger_cleanup_if_needed() instead? Then I know it does the same/similar thing as the others without looking. > Renamed to trigger_cleanup_if_needed(). > > Also test_and_clear_cleanup_request() => has_cleanup_work_and_reset(). > Thomas asked for "has_work", to be consistent with > String/Symbol/ResolvedMethodTable, but I think that's too generic here; > what kind of work? (In the case of the tables, it's not always > "cleanup" work.) Coleen suggested the "and_reset" suffix, to follow an > existing convention. > > I also made a few corresponding internal name changes. > >> The implementation looks good. I think it's good that you don't have the safepoint cleanup task timer around this. > I added a comment about the lack of task timing, so it's clearly > intentional and not simply forgotten. > > New webrevs: > full: http://cr.openjdk.java.net/~kbarrett/8226366/open.01/ > incr: http://cr.openjdk.java.net/~kbarrett/8226366/open.01.inc/ > > Testing: Local build and hotspot_tier1. > Nice! Thanks, Coleen From kim.barrett at oracle.com Tue Jul 2 21:43:46 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 2 Jul 2019 17:43:46 -0400 Subject: RFR[13]: 8226366: Excessive ServiceThread wakeups for OopStorage cleanup In-Reply-To: <44159f2d-18c0-bf39-1e41-1af3e32972a3@oracle.com> References:

<18A29890-7336-485E-97E8-A0E5C6DE93E6@oracle.com> <44159f2d-18c0-bf39-1e41-1af3e32972a3@oracle.com> Message-ID: <67CFC501-B51C-4251-8331-15D0E59C32E9@oracle.com> > On Jul 2, 2019, at 5:12 PM, coleen.phillimore at oracle.com wrote: >> >> New webrevs: >> full: http://cr.openjdk.java.net/~kbarrett/8226366/open.01/ >> incr: http://cr.openjdk.java.net/~kbarrett/8226366/open.01.inc/ >> >> Testing: Local build and hotspot_tier1. >> > > Nice! > Thanks, > Coleen Thanks. From david.holmes at oracle.com Wed Jul 3 07:36:36 2019 From: david.holmes at oracle.com (David Holmes) Date: Wed, 3 Jul 2019 17:36:36 +1000 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: References: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com> Message-ID: On 2/07/2019 7:44 pm, Adam Farley8 wrote: > Hi David, > > Thanks for your thoughts. > > The user should absolutely have immediate feedback, yes, and I agree > that "skipping" paths could lead to us loading the wrong library. > > Perhaps a compromise? We fire off a stderr warning if any of the paths > are too long (without killing the VM), we ignore any path *after* > (and including) the first too-long path, and we kill the VM if the > first path is too long. My first though is why be so elaborate and not just fail immediately: Error occurred during initialization of VM One or more sun.boot.library.path elements is too long for this system. --- ? But AFAICS we don't do any sanity checking of the those paths so this would have an impact on startup. I can't locate where we would detect the too-long path element, is it in hostpot or JDK code? Thanks, David ----- > Warning message example: > > ---- > Warning: One or more sun.boot.library.path paths were too long > for this system, and it (along with all subsequent paths) have been > ignored. > ---- > > Another addition could be to check the path lengths for the property > sooner, thus aborting the VM faster if the default path is too long. > > Assuming we posit that the VM will always need to load libraries. > > Best Regards > > Adam Farley > IBM Runtimes > > > David Holmes wrote on 01/07/2019 22:10:45: > >> From: David Holmes >> To: Adam Farley8 , hotspot-dev at openjdk.java.net >> Date: 01/07/2019 22:12 >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path >> paths are longer than JVM_MAXPATHLEN >> >> Hi Adam, >> >> On 1/07/2019 10:27 pm, Adam Farley8 wrote: >> > Hi All, >> > >> > The title say it all. >> > >> > If you pass in a value for sun.boot.library.path consisting >> > of one or more paths that are too long, then the vm will >> > fail to start because it can't load one of the libraries it >> > needs (the zip library), despite the fact that the VM >> > automatically prepends the default library path to the >> > sun.boot.library.path property, using the correct separator >> > to divide it from the user-specified path. >> > >> > So we've got the right path, in the right place, at the >> > right time, we just can't *use* it. >> > >> > I've fixed this by changing the relevant os.cpp code to >> > ignore paths that are too long, and to attempt to locate >> > the needed library on the other paths (if any are valid). >> >> As I just added to the bug report I have a different view of "correct" >> here. If you just ignore the long path and keep processing other short >> paths you may find the wrong library. There is a user error here and >> that error should be reported ASAP and in a way that leads to failure >> ASAP. Perhaps we should be more aggressive in aborting the VM when this >> is detected? >> >> David >> ----- >> >> > I've also added functionality to handle the edge case of >> > paths that are neeeeeeearly too long, only for a >> > sub-path (or file name) to push us over the limit *after* >> > the split_path function is done assessing the path length. >> > >> > I've also changed the code we're overriding, on the assumption >> > that someone's still using it somewhere. >> > >> > Bug: https://urldefense.proofpoint.com/v2/url? >> u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8227021&d=DwICaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=P5m8KWUXJf- >> CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=xZzQCnv68xd9hJyyK1obSim38eWSRmLPfuR__9ddZWg&e= >> > Webrev: https://urldefense.proofpoint.com/v2/url? >> u=http-3A__cr.openjdk.java.net_-7Eafarley_8227021_webrev_&d=DwICaQ&c=jf_iaSHvJObTbx- >> siA1ZOg&r=P5m8KWUXJf- >> CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=- >> hKU0zUd_0LDT08wTilexgI54EeSgt8xUk97i6V63Bk&e= >> > >> > Thoughts and impressions welcome. >> > >> > Best Regards >> > >> > Adam Farley >> > IBM Runtimes >> > >> > Unless stated otherwise above: >> > IBM United Kingdom Limited - Registered in England and Wales with number >> > 741598. >> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >> > >> > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From erik.osterlund at oracle.com Wed Jul 3 10:15:47 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 3 Jul 2019 12:15:47 +0200 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics Message-ID: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> Hi, The heap inspection API performs pointer chasing through dead memory. In particular, it populates a table with classes based on a heap walk using CollectedHeap::object_iterate(). That API can give you dead objects as well, whereas CollectedHeap::safe_object_iterate() gives you only live objects. The possibly dead objects have their possibly dead Klass* recorded. Then we call Klass::collect_statistics on said possibly dead recorded classes. In there, we follow the possibly dead mirror oop of the possibly dead Klass, and subsequently read the Klass* of the possibly dead mirror. Here is an attempted ASCII art explaining this | Klass*????? | -> |??????????? | | hi I am a?? |??? | hi I am a? | | dead object |??? | dead Klass | ?????????????????? |??????????? | ?????????????????? | oop mirror | -> | Klass* ???? | -> | Class klass | ???????????????????????????????????? | hi I am a?? | ???????????????????????????????????? | dead mirror | So as you can see we pointer chase through this chain of possibly dead memory. What could possibly go wrong though? In CMS a crash can manifest in the following way: In a concurrent collection, both the object and its class die. They are dead once we reach final marking. But the memory is kept around and being swept by concurrent sweeping. The sweeping yields to young collections (controllable through the CMSYield flag). So what can happen is that the mirror is swept (which already is a problem in debug builds because we zap the free chunk of memory, but hold on there is a problem in in product builds too) and gets added to a free list. In the yielded safepoint we may perform a young collection that promotes objects to the memory of the freed chunk (where the mirror used to be, except due to coalescing of freed chunks, there might not be a Klass* pointer where there used to be one for the mirror). And then, before sweeping finishes, the heap inspection API is called. That API sometimes tries to perform a STW GC first to get only live objects, but that GC may fail because of the JNI gc locker. And then it just goes ahead calling the unsafe object_iterate API anyway. The object_iterate API will pass in the object to the closure but not the mirror as it has been freed and reused. Buuut... since we pointer chase through the dead object to the stale reference to the dead mirror, we eventually find ourselves in an awkward situation where we try to read and use a Klass* that might really be a primitive vaue now (read: crash). The general rule of thumb is that pointer chasing through dead memory should NOT be done. We allow it in a few rare situations with the following constraints: 1) You have to use AS_NO_KEEPALIVE when reading dead oops, or things can blow up, 2) You may only read dead oops if you are the GC and hence can control that the memory it points at has not and will not be freed until your read finishes (...because you are the GC). Neither of these two constraints hold here. We read the mirrors without AS_NO_KEEPALIVE in the pointer chase, and we can not control that the memory it points at has not been freed. Therefore, this is an invalid use of pointer chasing through dead memory. The fix is simple: use the safe_object_iterate API instead, which only hands out live objects. I also sprinkled in no_keepalive decorators on the mirrors because it's good practice to not use that for such use cases where you clobber the whole heap (causing it to be marked in ZGC) but really just read an some int or something from the oop, without publishing any references to the oop. I tested this with 100 kitchensink iterations without my fix (failed 2 times) and 100 kitchensink iterations with my fix (failed 0 times). Bug: https://bugs.openjdk.java.net/browse/JDK-8224531 Webrev: http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/ Thanks, /Erik From coleen.phillimore at oracle.com Wed Jul 3 12:10:22 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 3 Jul 2019 08:10:22 -0400 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> Message-ID: <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html There's another object_iterate() in this file with a comment to change it to safe_object_iterate().? Should you change that too? Did you run the jvmti tests?? There used to be tests that failed if dead objects weren't found, but the tests may have been fixed. The rest of the change looks good.? Thank you for figuring this out! Coleen On 7/3/19 6:15 AM, Erik ?sterlund wrote: > Hi, > > The heap inspection API performs pointer chasing through dead memory. > In particular, it populates a table with classes based on a heap walk > using CollectedHeap::object_iterate(). That API can give you dead > objects as well, whereas CollectedHeap::safe_object_iterate() gives > you only live objects. > The possibly dead objects have their possibly dead Klass* recorded. > Then we call Klass::collect_statistics on said possibly dead recorded > classes. In there, we follow the possibly dead mirror oop of the > possibly dead Klass, and subsequently read the Klass* of the possibly > dead mirror. > Here is an attempted ASCII art explaining this > > | Klass*????? | -> |??????????? | > | hi I am a?? |??? | hi I am a? | > | dead object |??? | dead Klass | > ?????????????????? |??????????? | > ?????????????????? | oop mirror | -> | Klass* ???? | -> | Class klass | > ???????????????????????????????????? | hi I am a?? | > ???????????????????????????????????? | dead mirror | > > So as you can see we pointer chase through this chain of possibly dead > memory. What could possibly go wrong though? > In CMS a crash can manifest in the following way: > > In a concurrent collection, both the object and its class die. They > are dead once we reach final marking. But the memory is kept around > and being swept by concurrent sweeping. > The sweeping yields to young collections (controllable through the > CMSYield flag). So what can happen is that the mirror is swept (which > already is a problem in debug builds because we zap the free chunk of > memory, but hold on there is a problem in in product builds too) and > gets added to a free list. In the yielded safepoint we may perform a > young collection that promotes objects to the memory of the freed > chunk (where the mirror used to be, except due to coalescing of freed > chunks, there might not be a Klass* pointer where there used to be one > for the mirror). And then, before sweeping finishes, the heap > inspection API is called. That API sometimes tries to perform a STW GC > first to get only live objects, but that GC may fail because of the > JNI gc locker. And then it just goes ahead calling the unsafe > object_iterate API anyway. The object_iterate API will pass in the > object to the closure but not the mirror as it has been freed and > reused. Buuut... since we pointer chase through the dead object to the > stale reference to the dead mirror, we eventually find ourselves in an > awkward situation where we try to read and use a Klass* that might > really be a primitive vaue now (read: crash). > > The general rule of thumb is that pointer chasing through dead memory > should NOT be done. We allow it in a few rare situations with the > following constraints: 1) You have to use AS_NO_KEEPALIVE when reading > dead oops, or things can blow up, 2) You may only read dead oops if > you are the GC and hence can control that the memory it points at has > not and will not be freed until your read finishes (...because you are > the GC). > Neither of these two constraints hold here. We read the mirrors > without AS_NO_KEEPALIVE in the pointer chase, and we can not control > that the memory it points at has not been freed. Therefore, this is an > invalid use of pointer chasing through dead memory. The fix is simple: > use the safe_object_iterate API instead, which only hands out live > objects. I also sprinkled in no_keepalive decorators on the mirrors > because it's good practice to not use that for such use cases where > you clobber the whole heap (causing it to be marked in ZGC) but really > just read an some int or something from the oop, without publishing > any references to the oop. > > I tested this with 100 kitchensink iterations without my fix (failed 2 > times) and 100 kitchensink iterations with my fix (failed 0 times). > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8224531 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/ > > Thanks, > /Erik From erik.osterlund at oracle.com Wed Jul 3 12:48:41 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 3 Jul 2019 14:48:41 +0200 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> Message-ID: <005dde7d-d2bb-fa19-318c-9f130fa4c2de@oracle.com> Hi Coleen, Thanks for the review. On 2019-07-03 14:10, coleen.phillimore at oracle.com wrote: > > http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html > > > There's another object_iterate() in this file with a comment to change > it to safe_object_iterate().? Should you change that too? It probably should change, but it does not seem to have the same behaviour and I don't know if there is a real bug in that unrelated code. But yeah I would prefer that to change too, but I think that should be a separate change. > Did you run the jvmti tests?? There used to be tests that failed if > dead objects weren't found, but the tests may have been fixed. I will take it for a spin. Note though that behaviour relying on always getting dead objects will fail; the caller of the API will race with concurrent sweeping and either get or not get the dead objects depending on whether the heap iteration happened to kick in before or after sweeping. There is just no way you can rely on that. So if there is a test failure because of that, the test is wrong. Nevertheless, I will try to hunt down such tests. Thanks, /Erik > The rest of the change looks good.? Thank you for figuring this out! > Coleen > > On 7/3/19 6:15 AM, Erik ?sterlund wrote: >> Hi, >> >> The heap inspection API performs pointer chasing through dead memory. >> In particular, it populates a table with classes based on a heap walk >> using CollectedHeap::object_iterate(). That API can give you dead >> objects as well, whereas CollectedHeap::safe_object_iterate() gives >> you only live objects. >> The possibly dead objects have their possibly dead Klass* recorded. >> Then we call Klass::collect_statistics on said possibly dead recorded >> classes. In there, we follow the possibly dead mirror oop of the >> possibly dead Klass, and subsequently read the Klass* of the possibly >> dead mirror. >> Here is an attempted ASCII art explaining this >> >> | Klass*????? | -> |??????????? | >> | hi I am a?? |??? | hi I am a? | >> | dead object |??? | dead Klass | >> ?????????????????? |??????????? | >> ?????????????????? | oop mirror | -> | Klass* ???? | -> | Class klass | >> ???????????????????????????????????? | hi I am a?? | >> ???????????????????????????????????? | dead mirror | >> >> So as you can see we pointer chase through this chain of possibly >> dead memory. What could possibly go wrong though? >> In CMS a crash can manifest in the following way: >> >> In a concurrent collection, both the object and its class die. They >> are dead once we reach final marking. But the memory is kept around >> and being swept by concurrent sweeping. >> The sweeping yields to young collections (controllable through the >> CMSYield flag). So what can happen is that the mirror is swept (which >> already is a problem in debug builds because we zap the free chunk of >> memory, but hold on there is a problem in in product builds too) and >> gets added to a free list. In the yielded safepoint we may perform a >> young collection that promotes objects to the memory of the freed >> chunk (where the mirror used to be, except due to coalescing of freed >> chunks, there might not be a Klass* pointer where there used to be >> one for the mirror). And then, before sweeping finishes, the heap >> inspection API is called. That API sometimes tries to perform a STW >> GC first to get only live objects, but that GC may fail because of >> the JNI gc locker. And then it just goes ahead calling the unsafe >> object_iterate API anyway. The object_iterate API will pass in the >> object to the closure but not the mirror as it has been freed and >> reused. Buuut... since we pointer chase through the dead object to >> the stale reference to the dead mirror, we eventually find ourselves >> in an awkward situation where we try to read and use a Klass* that >> might really be a primitive vaue now (read: crash). >> >> The general rule of thumb is that pointer chasing through dead memory >> should NOT be done. We allow it in a few rare situations with the >> following constraints: 1) You have to use AS_NO_KEEPALIVE when >> reading dead oops, or things can blow up, 2) You may only read dead >> oops if you are the GC and hence can control that the memory it >> points at has not and will not be freed until your read finishes >> (...because you are the GC). >> Neither of these two constraints hold here. We read the mirrors >> without AS_NO_KEEPALIVE in the pointer chase, and we can not control >> that the memory it points at has not been freed. Therefore, this is >> an invalid use of pointer chasing through dead memory. The fix is >> simple: use the safe_object_iterate API instead, which only hands out >> live objects. I also sprinkled in no_keepalive decorators on the >> mirrors because it's good practice to not use that for such use cases >> where you clobber the whole heap (causing it to be marked in ZGC) but >> really just read an some int or something from the oop, without >> publishing any references to the oop. >> >> I tested this with 100 kitchensink iterations without my fix (failed >> 2 times) and 100 kitchensink iterations with my fix (failed 0 times). >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8224531 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/ >> >> Thanks, >> /Erik > From coleen.phillimore at oracle.com Wed Jul 3 12:50:02 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 3 Jul 2019 08:50:02 -0400 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: <005dde7d-d2bb-fa19-318c-9f130fa4c2de@oracle.com> References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> <005dde7d-d2bb-fa19-318c-9f130fa4c2de@oracle.com> Message-ID: <43de7b9d-8888-967d-8eeb-510e6b932cf7@oracle.com> On 7/3/19 8:48 AM, Erik ?sterlund wrote: > Hi Coleen, > > Thanks for the review. > > On 2019-07-03 14:10, coleen.phillimore at oracle.com wrote: >> >> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html >> >> >> There's another object_iterate() in this file with a comment to >> change it to safe_object_iterate().? Should you change that too? > > It probably should change, but it does not seem to have the same > behaviour and I don't know if there is a real bug in that unrelated > code. But yeah I would prefer that to change too, but I think that > should be a separate change. That's fine. > >> Did you run the jvmti tests?? There used to be tests that failed if >> dead objects weren't found, but the tests may have been fixed. > > I will take it for a spin. Note though that behaviour relying on > always getting dead objects will fail; the caller of the API will race > with concurrent sweeping and either get or not get the dead objects > depending on whether the heap iteration happened to kick in before or > after sweeping. There is just no way you can rely on that. So if there > is a test failure because of that, the test is wrong. Nevertheless, I > will try to hunt down such tests. > Yes, please. Thanks, Coleen > Thanks, > /Erik > >> The rest of the change looks good.? Thank you for figuring this out! >> Coleen >> >> On 7/3/19 6:15 AM, Erik ?sterlund wrote: >>> Hi, >>> >>> The heap inspection API performs pointer chasing through dead >>> memory. In particular, it populates a table with classes based on a >>> heap walk using CollectedHeap::object_iterate(). That API can give >>> you dead objects as well, whereas >>> CollectedHeap::safe_object_iterate() gives you only live objects. >>> The possibly dead objects have their possibly dead Klass* recorded. >>> Then we call Klass::collect_statistics on said possibly dead >>> recorded classes. In there, we follow the possibly dead mirror oop >>> of the possibly dead Klass, and subsequently read the Klass* of the >>> possibly dead mirror. >>> Here is an attempted ASCII art explaining this >>> >>> | Klass*????? | -> |??????????? | >>> | hi I am a?? |??? | hi I am a? | >>> | dead object |??? | dead Klass | >>> ?????????????????? |??????????? | >>> ?????????????????? | oop mirror | -> | Klass* ???? | -> | Class klass | >>> ???????????????????????????????????? | hi I am a?? | >>> ???????????????????????????????????? | dead mirror | >>> >>> So as you can see we pointer chase through this chain of possibly >>> dead memory. What could possibly go wrong though? >>> In CMS a crash can manifest in the following way: >>> >>> In a concurrent collection, both the object and its class die. They >>> are dead once we reach final marking. But the memory is kept around >>> and being swept by concurrent sweeping. >>> The sweeping yields to young collections (controllable through the >>> CMSYield flag). So what can happen is that the mirror is swept >>> (which already is a problem in debug builds because we zap the free >>> chunk of memory, but hold on there is a problem in in product builds >>> too) and gets added to a free list. In the yielded safepoint we may >>> perform a young collection that promotes objects to the memory of >>> the freed chunk (where the mirror used to be, except due to >>> coalescing of freed chunks, there might not be a Klass* pointer >>> where there used to be one for the mirror). And then, before >>> sweeping finishes, the heap inspection API is called. That API >>> sometimes tries to perform a STW GC first to get only live objects, >>> but that GC may fail because of the JNI gc locker. And then it just >>> goes ahead calling the unsafe object_iterate API anyway. The >>> object_iterate API will pass in the object to the closure but not >>> the mirror as it has been freed and reused. Buuut... since we >>> pointer chase through the dead object to the stale reference to the >>> dead mirror, we eventually find ourselves in an awkward situation >>> where we try to read and use a Klass* that might really be a >>> primitive vaue now (read: crash). >>> >>> The general rule of thumb is that pointer chasing through dead >>> memory should NOT be done. We allow it in a few rare situations with >>> the following constraints: 1) You have to use AS_NO_KEEPALIVE when >>> reading dead oops, or things can blow up, 2) You may only read dead >>> oops if you are the GC and hence can control that the memory it >>> points at has not and will not be freed until your read finishes >>> (...because you are the GC). >>> Neither of these two constraints hold here. We read the mirrors >>> without AS_NO_KEEPALIVE in the pointer chase, and we can not control >>> that the memory it points at has not been freed. Therefore, this is >>> an invalid use of pointer chasing through dead memory. The fix is >>> simple: use the safe_object_iterate API instead, which only hands >>> out live objects. I also sprinkled in no_keepalive decorators on the >>> mirrors because it's good practice to not use that for such use >>> cases where you clobber the whole heap (causing it to be marked in >>> ZGC) but really just read an some int or something from the oop, >>> without publishing any references to the oop. >>> >>> I tested this with 100 kitchensink iterations without my fix (failed >>> 2 times) and 100 kitchensink iterations with my fix (failed 0 times). >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8224531 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/ >>> >>> Thanks, >>> /Erik >> > From stefan.karlsson at oracle.com Wed Jul 3 13:31:27 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Wed, 3 Jul 2019 15:31:27 +0200 Subject: RFR: 8227175: ZGC: ZHeapIterator visits potentially dead objects Message-ID: Hi all, (Sending this RFR to hotspot-dev since it changes CLD claiming.) Please review this patch to fix the ZHeapIterator to not visit potentially dead objects. https://cr.openjdk.java.net/~stefank/8227175/ https://bugs.openjdk.java.net/browse/JDK-8227175 It changes how all heap iterations are done in ZGC. Previously, the marking code visited only the strong CLDs and traced through metadata to find all other CLDs that should be considered alive. The verification code, serviceability heap iterations, and marking without class unloading, skipped the metadata tracing part and visited all CLDs instead. Now, with this patch, all these heap iterations starts with the strong CLDs and trace through the object graph. One complication with that scheme is that non-GC heap iterations might be executing after concurrent marking has started, but before dead CLDs have been unlinked. To allow the GC marking code and one, at a time, non-GC heap iteration to run at the same time, I've introduced a new claim bit in the CLD claiming byte. I've called it "other", so now we have "strong", "finalizable", and "other". The contract is that the "other" bits should only be used in a safepoint operation, and must be cleared before the operation ends. This way we get mutual exclusion between different users of the "other" bits. The patch also adds more precise verification of ZGC references. This patch was written a few weeks ago to make the verification of ZGC references more precise. I've been using it since then to get better verification of other patches and when hunting for bugs. This means I've been running it through tier 1-7 multiple times. The intent was to get this pushed to JDK 14, but now that we've seen that we have an actual bug because of the imprecise nature of the ZHeapIterator, I'd like to get this patch pushed to JDK 13. We considered trying to split this up into two parts, the first part that fixes the heap iterations and the second part that adds the extra ZGC verification, but we thing that would take longer time and be riskier. Thanks, StefanK From zgu at redhat.com Wed Jul 3 14:03:43 2019 From: zgu at redhat.com (Zhengyu Gu) Date: Wed, 3 Jul 2019 10:03:43 -0400 Subject: RFR: 8227175: ZGC: ZHeapIterator visits potentially dead objects In-Reply-To: References: Message-ID: Hi Stefan, Runtime part looks good to me. I had the same thoughts when moving Shenandoah CLDG evacuation to concurrent phase, then heap dump started to interfere concurrent CLDG iteration. Because Shenandoah heap dump uses single-thread to walk CLDG, we used _claimed_none to avoid the problem. The change can potentially allow us to relax the restriction. Thanks, -Zhengyu On 7/3/19 9:31 AM, Stefan Karlsson wrote: > Hi all, > > (Sending this RFR to hotspot-dev since it changes CLD claiming.) > > Please review this patch to fix the ZHeapIterator to not visit > potentially dead objects. > > https://cr.openjdk.java.net/~stefank/8227175/ > https://bugs.openjdk.java.net/browse/JDK-8227175 > > It changes how all heap iterations are done in ZGC. Previously, the > marking code visited only the strong CLDs and traced through metadata to > find all other CLDs that should be considered alive. The verification > code, serviceability heap iterations, and marking without class > unloading, skipped the metadata tracing part and visited all CLDs > instead. Now, with this patch, all these heap iterations starts with the > strong CLDs and trace through the object graph. > > One complication with that scheme is that non-GC heap iterations might > be executing after concurrent marking has started, but before dead CLDs > have been unlinked. To allow the GC marking code and one, at a time, > non-GC heap iteration to run at the same time, I've introduced a new > claim bit in the CLD claiming byte. I've called it "other", so now we > have "strong", "finalizable", and "other". The contract is that the > "other" bits should only be used in a safepoint operation, and must be > cleared before the operation ends. This way we get mutual exclusion > between different users of the "other" bits. > > The patch also adds more precise verification of ZGC references. > > This patch was written a few weeks ago to make the verification of ZGC > references more precise. I've been using it since then to get better > verification of other patches and when hunting for bugs. This means I've > been running it through tier 1-7 multiple times. The intent was to get > this pushed to JDK 14, but now that we've seen that we have an actual > bug because of the imprecise nature of the ZHeapIterator, I'd like to > get this patch pushed to JDK 13. We considered trying to split this up > into two parts, the first part that fixes the heap iterations and the > second part that adds the extra ZGC verification, but we thing that > would take longer time and be riskier. > > Thanks, > StefanK From erik.osterlund at oracle.com Wed Jul 3 14:18:39 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 3 Jul 2019 16:18:39 +0200 Subject: RFR: 8227175: ZGC: ZHeapIterator visits potentially dead objects In-Reply-To: References: Message-ID: <0a8696ee-241f-b9b1-4ee1-1045d0cc21c9@oracle.com> Hi Stefan, Looks good. Thanks, /Erik On 2019-07-03 15:31, Stefan Karlsson wrote: > Hi all, > > (Sending this RFR to hotspot-dev since it changes CLD claiming.) > > Please review this patch to fix the ZHeapIterator to not visit > potentially dead objects. > > https://cr.openjdk.java.net/~stefank/8227175/ > https://bugs.openjdk.java.net/browse/JDK-8227175 > > It changes how all heap iterations are done in ZGC. Previously, the > marking code visited only the strong CLDs and traced through metadata > to find all other CLDs that should be considered alive. The > verification code, serviceability heap iterations, and marking without > class unloading, skipped the metadata tracing part and visited all > CLDs instead. Now, with this patch, all these heap iterations starts > with the strong CLDs and trace through the object graph. > > One complication with that scheme is that non-GC heap iterations might > be executing after concurrent marking has started, but before dead > CLDs have been unlinked. To allow the GC marking code and one, at a > time, non-GC heap iteration to run at the same time, I've introduced a > new claim bit in the CLD claiming byte. I've called it "other", so now > we have "strong", "finalizable", and "other". The contract is that the > "other" bits should only be used in a safepoint operation, and must be > cleared before the operation ends. This way we get mutual exclusion > between different users of the "other" bits. > > The patch also adds more precise verification of ZGC references. > > This patch was written a few weeks ago to make the verification of ZGC > references more precise. I've been using it since then to get better > verification of other patches and when hunting for bugs. This means > I've been running it through tier 1-7 multiple times. The intent was > to get this pushed to JDK 14, but now that we've seen that we have an > actual bug because of the imprecise nature of the ZHeapIterator, I'd > like to get this patch pushed to JDK 13. We considered trying to split > this up into two parts, the first part that fixes the heap iterations > and the second part that adds the extra ZGC verification, but we thing > that would take longer time and be riskier. > > Thanks, > StefanK From adam.farley at uk.ibm.com Wed Jul 3 15:42:29 2019 From: adam.farley at uk.ibm.com (Adam Farley8) Date: Wed, 3 Jul 2019 16:42:29 +0100 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: References: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com>

Message-ID: Hi David, I figured it should be elaborate so we can avoid killing the VM if we don't have to. Ultimately, if we have a list of three paths and the last two are invalid, does it matter so long as all the libraries we need are in the first path? As to your question "is it in hostpot or JDK code", I presume you mean in the change set. I'm primarily referring to the hotspot code. Also, if we end up adopting a "kill the vm if any path is too long" approach, we still need to change the JDK code, as those currently seem to want to fail if the total length of the sub.boot.library.path property is longer than the maximum length of a single path. So if you pass in three 100 character paths on Windows, it'll fail because they add up to more than the 260 character path limit. Best Regards Adam Farley IBM Runtimes David Holmes wrote on 03/07/2019 08:36:36: > From: David Holmes > To: Adam Farley8 > Cc: hotspot-dev at openjdk.java.net > Date: 03/07/2019 08:36 > Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > paths are longer than JVM_MAXPATHLEN > > On 2/07/2019 7:44 pm, Adam Farley8 wrote: > > Hi David, > > > > Thanks for your thoughts. > > > > The user should absolutely have immediate feedback, yes, and I agree > > that "skipping" paths could lead to us loading the wrong library. > > > > Perhaps a compromise? We fire off a stderr warning if any of the paths > > are too long (without killing the VM), we ignore any path *after* > > (and including) the first too-long path, and we kill the VM if the > > first path is too long. > > My first though is why be so elaborate and not just fail immediately: > > Error occurred during initialization of VM > One or more sun.boot.library.path elements is too long for this system. > --- > > ? But AFAICS we don't do any sanity checking of the those paths so this > would have an impact on startup. > > I can't locate where we would detect the too-long path element, is it in > hostpot or JDK code? > > Thanks, > David > ----- > > > Warning message example: > > > > ---- > > Warning: One or more sun.boot.library.path paths were too long > > for this system, and it (along with all subsequent paths) have been > > ignored. > > ---- > > > > Another addition could be to check the path lengths for the property > > sooner, thus aborting the VM faster if the default path is too long. > > > > Assuming we posit that the VM will always need to load libraries. > > > > Best Regards > > > > Adam Farley > > IBM Runtimes > > > > > > David Holmes wrote on 01/07/2019 22:10:45: > > > >> From: David Holmes > >> To: Adam Farley8 , hotspot-dev at openjdk.java.net > >> Date: 01/07/2019 22:12 > >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > >> paths are longer than JVM_MAXPATHLEN > >> > >> Hi Adam, > >> > >> On 1/07/2019 10:27 pm, Adam Farley8 wrote: > >> > Hi All, > >> > > >> > The title say it all. > >> > > >> > If you pass in a value for sun.boot.library.path consisting > >> > of one or more paths that are too long, then the vm will > >> > fail to start because it can't load one of the libraries it > >> > needs (the zip library), despite the fact that the VM > >> > automatically prepends the default library path to the > >> > sun.boot.library.path property, using the correct separator > >> > to divide it from the user-specified path. > >> > > >> > So we've got the right path, in the right place, at the > >> > right time, we just can't *use* it. > >> > > >> > I've fixed this by changing the relevant os.cpp code to > >> > ignore paths that are too long, and to attempt to locate > >> > the needed library on the other paths (if any are valid). > >> > >> As I just added to the bug report I have a different view of "correct" > >> here. If you just ignore the long path and keep processing other short > >> paths you may find the wrong library. There is a user error here and > >> that error should be reported ASAP and in a way that leads to failure > >> ASAP. Perhaps we should be more aggressive in aborting the VM when this > >> is detected? > >> > >> David > >> ----- > >> > >> > I've also added functionality to handle the edge case of > >> > paths that are neeeeeeearly too long, only for a > >> > sub-path (or file name) to push us over the limit *after* > >> > the split_path function is done assessing the path length. > >> > > >> > I've also changed the code we're overriding, on the assumption > >> > that someone's still using it somewhere. > >> > > >> > Bug: https://urldefense.proofpoint.com/v2/url? > >> > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8227021&d=DwICaQ&c=jf_iaSHvJObTbx- > >> siA1ZOg&r=P5m8KWUXJf- > >> > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=xZzQCnv68xd9hJyyK1obSim38eWSRmLPfuR__9ddZWg&e= > >> > Webrev: https://urldefense.proofpoint.com/v2/url? > >> > u=http-3A__cr.openjdk.java.net_-7Eafarley_8227021_webrev_&d=DwICaQ&c=jf_iaSHvJObTbx- > >> siA1ZOg&r=P5m8KWUXJf- > >> > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=- > >> hKU0zUd_0LDT08wTilexgI54EeSgt8xUk97i6V63Bk&e= > >> > > >> > Thoughts and impressions welcome. > >> > > >> > Best Regards > >> > > >> > Adam Farley > >> > IBM Runtimes > >> > > >> > Unless stated otherwise above: > >> > IBM United Kingdom Limited - Registered in England and Wales with number > >> > 741598. > >> > Registered office: PO Box 41, North Harbour, Portsmouth, > Hampshire PO6 3AU > >> > > >> > > > > Unless stated otherwise above: > > IBM United Kingdom Limited - Registered in England and Wales with number > > 741598. > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From kim.barrett at oracle.com Wed Jul 3 18:55:57 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 3 Jul 2019 14:55:57 -0400 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> Message-ID: <39D8C321-D15C-41A4-86D9-403FDB75B224@oracle.com> > On Jul 3, 2019, at 6:15 AM, Erik ?sterlund wrote: > > [?] > > So as you can see we pointer chase through this chain of possibly dead memory. What could possibly go wrong though? Hahaha! Thanks for the detailed description. > Bug: > https://bugs.openjdk.java.net/browse/JDK-8224531 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/ > > Thanks, > /Erik Looks good. From kim.barrett at oracle.com Wed Jul 3 18:57:46 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Wed, 3 Jul 2019 14:57:46 -0400 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: <005dde7d-d2bb-fa19-318c-9f130fa4c2de@oracle.com> References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> <005dde7d-d2bb-fa19-318c-9f130fa4c2de@oracle.com> Message-ID: <9BFB4618-02B1-47CD-9B83-76AC448B82C4@oracle.com> > On Jul 3, 2019, at 8:48 AM, Erik ?sterlund wrote: >> Did you run the jvmti tests? There used to be tests that failed if dead objects weren't found, but the tests may have been fixed. > > I will take it for a spin. Note though that behaviour relying on always getting dead objects will fail; the caller of the API will race with concurrent sweeping and either get or not get the dead objects depending on whether the heap iteration happened to kick in before or after sweeping. There is just no way you can rely on that. So if there is a test failure because of that, the test is wrong. Nevertheless, I will try to hunt down such tests. Good. If there are still such tests, they need to be fixed. That shouldn?t hold back this change. From thomas.schatzl at oracle.com Wed Jul 3 19:31:58 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Wed, 03 Jul 2019 21:31:58 +0200 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> Message-ID: Hi, On Wed, 2019-07-03 at 08:10 -0400, coleen.phillimore at oracle.com wrote: > http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html > > There's another object_iterate() in this file with a comment to > change it to safe_object_iterate(). Should you change that too? I think this is a different issue as Erik pointed out, this is iteration during a safepoint. Not that I think that this is much safer *and* there is already the comment there that this might not work with CMS either. Erik, can you file a CR? > > Did you run the jvmti tests? There used to be tests that failed if > dead objects weren't found, but the tests may have been fixed. It would be nice to at least know which jvmti tests iterate over dead objects before pushing this if possible. > > The rest of the change looks good. Thank you for figuring this out! Change looks good. Thanks, Thomas From david.holmes at oracle.com Thu Jul 4 06:57:14 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 16:57:14 +1000 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: References: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com>

Message-ID: <842ae03e-8574-593e-3ac2-5cc283832be9@oracle.com> Hi Adam, On 4/07/2019 1:42 am, Adam Farley8 wrote: > Hi David, > > I figured it should be elaborate so we can avoid killing the VM > if we don't have to. > > Ultimately, if we have a list of three paths and the last two > are invalid, does it matter so long as all the libraries we need > are in the first path? I prefer not see the users error ignored if we can reasonably detect it. They set the paths for a reason, and if they paths are invalid they probably would like to know. > As to your question "is it in hostpot or JDK code", I presume you > mean in the change set. I'm primarily referring to the hotspot code. No I mean where in the current code will we detect that one of these path elements is too long? > Also, if we end up adopting a "kill the vm if any path is too long" > approach, we still need to change the JDK code, as those currently > seem to want to fail if the total length of the sub.boot.library.path > property is longer than the maximum length of a single path. > > So if you pass in three 100 character paths on Windows, it'll fail > because they add up to more than the 260 character path limit. That seems like a separate bug that should be addressed. :( Thanks, David > Best Regards > > Adam Farley > IBM Runtimes > > > David Holmes wrote on 03/07/2019 08:36:36: > >> From: David Holmes >> To: Adam Farley8 >> Cc: hotspot-dev at openjdk.java.net >> Date: 03/07/2019 08:36 >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path >> paths are longer than JVM_MAXPATHLEN >> >> On 2/07/2019 7:44 pm, Adam Farley8 wrote: >> > Hi David, >> > >> > Thanks for your thoughts. >> > >> > The user should absolutely have immediate feedback, yes, and I agree >> > that "skipping" paths could lead to us loading the wrong library. >> > >> > Perhaps a compromise? We fire off a stderr warning if any of the paths >> > are too long (without killing the VM), we ignore any path *after* >> > (and including) the first too-long path, and we kill the VM if the >> > first path is too long. >> >> My first though is why be so elaborate and not just fail immediately: >> >> Error occurred during initialization of VM >> One or more sun.boot.library.path elements is too long for this system. >> --- >> >> ? But AFAICS we don't do any sanity checking of the those paths so this >> would have an impact on startup. >> >> I can't locate where we would detect the too-long path element, is it in >> hostpot or JDK code? >> >> Thanks, >> David >> ----- >> >> > Warning message example: >> > >> > ---- >> > Warning: One or more sun.boot.library.path paths were too long >> > for this system, and it (along with all subsequent paths) have been >> > ignored. >> > ---- >> > >> > Another addition could be to check the path lengths for the property >> > sooner, thus aborting the VM faster if the default path is too long. >> > >> > Assuming we posit that the VM will always need to load libraries. >> > >> > Best Regards >> > >> > Adam Farley >> > IBM Runtimes >> > >> > >> > David Holmes wrote on 01/07/2019 22:10:45: >> > >> >> From: David Holmes >> >> To: Adam Farley8 , ?hotspot-dev at openjdk.java.net >> >> Date: 01/07/2019 22:12 >> >> Subject: Re: RFR: JDK-8227021: ?VM fails if any sun.boot.library.path >> >> paths are longer than JVM_MAXPATHLEN >> >> >> >> Hi Adam, >> >> >> >> On 1/07/2019 10:27 pm, Adam Farley8 wrote: >> >> > Hi All, >> >> > >> >> > The title say it all. >> >> > >> >> > If you pass in a value for sun.boot.library.path consisting >> >> > of one or more paths that are too long, then the vm will >> >> > fail to start because it can't load one of the libraries it >> >> > needs (the zip library), despite the fact that the VM >> >> > automatically prepends the default library path to the >> >> > sun.boot.library.path property, using the correct separator >> >> > to divide it from the user-specified path. >> >> > >> >> > So we've got the right path, in the right place, at the >> >> > right time, we just can't *use* it. >> >> > >> >> > I've fixed this by changing the relevant os.cpp code to >> >> > ignore paths that are too long, and to attempt to locate >> >> > the needed library on the other paths (if any are valid). >> >> >> >> As I just added to the bug report I have a different view of "correct" >> >> here. If you just ignore the long path and keep processing other short >> >> paths you may find the wrong library. There is a user error here and >> >> that error should be reported ASAP and in a way that leads to failure >> >> ASAP. Perhaps we should be more aggressive in aborting the VM when ?this >> >> is detected? >> >> >> >> David >> >> ----- >> >> >> >> > I've also added functionality to handle the edge case of >> >> > paths that are neeeeeeearly too long, only for a >> >> > sub-path (or file name) to push us over the limit *after* >> >> > the split_path function is done assessing the path length. >> >> > >> >> > I've also changed the code we're overriding, on the assumption >> >> > that someone's still using it somewhere. >> >> > >> >> > Bug: https://urldefense.proofpoint.com/v2/url? >> >> >> u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8227021&d=DwICaQ&c=jf_iaSHvJObTbx- >> >> siA1ZOg&r=P5m8KWUXJf- >> >> >> CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=xZzQCnv68xd9hJyyK1obSim38eWSRmLPfuR__9ddZWg&e= >> >> > Webrev: https://urldefense.proofpoint.com/v2/url? >> >> >> u=http-3A__cr.openjdk.java.net_-7Eafarley_8227021_webrev_&d=DwICaQ&c=jf_iaSHvJObTbx- >> >> siA1ZOg&r=P5m8KWUXJf- >> >> >> CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=- >> >> hKU0zUd_0LDT08wTilexgI54EeSgt8xUk97i6V63Bk&e= >> >> > >> >> > Thoughts and impressions welcome. >> >> > >> >> > Best Regards >> >> > >> >> > Adam Farley >> >> > IBM Runtimes >> >> > >> >> > Unless stated otherwise above: >> >> > IBM United Kingdom Limited - Registered in England and Wales ?with number >> >> > 741598. >> >> > Registered office: PO Box 41, North Harbour, Portsmouth, >> Hampshire ?PO6 3AU >> >> > >> >> >> > >> > Unless stated otherwise above: >> > IBM United Kingdom Limited - Registered in England and Wales with number >> > 741598. >> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >> > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From david.holmes at oracle.com Thu Jul 4 07:38:06 2019 From: david.holmes at oracle.com (David Holmes) Date: Thu, 4 Jul 2019 17:38:06 +1000 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: References: Message-ID: Hi Matthias, On 27/06/2019 6:56 pm, Baesken, Matthias wrote: > Hello, please review the following small patch . > It adds event logging to the UserHandler (user signal handler) calls . That seems reasonable. > (additionally it adds a function os::win32::get_signal_name > to get signal names for signal numbers ; this is similar to what we already had for posix ). If you add this then we don't need distinct POSIX and non-POSIX versions - the existing os::Posix::get_signal_name etc could all be hoisted into os.cpp and the os class - no? Aside: I spotted this in UserHandler: // 4511530 - sem_post is serialized and handled by the manager thread. When // the program is interrupted by Ctrl-C, SIGINT is sent to every thread. We // don't want to flood the manager thread with sem_post requests. if (sig == SIGINT && Atomic::add(1, &sigint_count) > 1) return; That's a LinuxThreads anachronism which has been copied, unnecessarily into the other OS implementations. I will file a RFE to get rid of it. Thanks, David > > > Bug/webrev : > > https://bugs.openjdk.java.net/browse/JDK-8226816 > > http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.0/ > > Thanks, Matthias > From stefan.karlsson at oracle.com Thu Jul 4 09:48:53 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 4 Jul 2019 11:48:53 +0200 Subject: RFR: 8227175: ZGC: ZHeapIterator visits potentially dead objects In-Reply-To: References:

Message-ID: <003b5282-abf0-1820-4fb9-c9856850078b@oracle.com> Thanks for reviewing! StefanK On 2019-07-03 16:03, Zhengyu Gu wrote: > Hi Stefan, > > Runtime part looks good to me. > > I had the same thoughts when moving Shenandoah CLDG evacuation to > concurrent phase, then heap dump started to interfere concurrent CLDG > iteration. Because Shenandoah heap dump uses single-thread to walk CLDG, > we used _claimed_none to avoid the problem. The change can potentially > allow us to relax the restriction. > > Thanks, > > -Zhengyu > > > > On 7/3/19 9:31 AM, Stefan Karlsson wrote: >> Hi all, >> >> (Sending this RFR to hotspot-dev since it changes CLD claiming.) >> >> Please review this patch to fix the ZHeapIterator to not visit >> potentially dead objects. >> >> https://cr.openjdk.java.net/~stefank/8227175/ >> https://bugs.openjdk.java.net/browse/JDK-8227175 >> >> It changes how all heap iterations are done in ZGC. Previously, the >> marking code visited only the strong CLDs and traced through metadata >> to find all other CLDs that should be considered alive. The >> verification code, serviceability heap iterations, and marking without >> class unloading, skipped the metadata tracing part and visited all >> CLDs instead. Now, with this patch, all these heap iterations starts >> with the strong CLDs and trace through the object graph. >> >> One complication with that scheme is that non-GC heap iterations might >> be executing after concurrent marking has started, but before dead >> CLDs have been unlinked. To allow the GC marking code and one, at a >> time, non-GC heap iteration to run at the same time, I've introduced a >> new claim bit in the CLD claiming byte. I've called it "other", so now >> we have "strong", "finalizable", and "other". The contract is that the >> "other" bits should only be used in a safepoint operation, and must be >> cleared before the operation ends. This way we get mutual exclusion >> between different users of the "other" bits. >> >> The patch also adds more precise verification of ZGC references. >> >> This patch was written a few weeks ago to make the verification of ZGC >> references more precise. I've been using it since then to get better >> verification of other patches and when hunting for bugs. This means >> I've been running it through tier 1-7 multiple times. The intent was >> to get this pushed to JDK 14, but now that we've seen that we have an >> actual bug because of the imprecise nature of the ZHeapIterator, I'd >> like to get this patch pushed to JDK 13. We considered trying to split >> this up into two parts, the first part that fixes the heap iterations >> and the second part that adds the extra ZGC verification, but we thing >> that would take longer time and be riskier. >> >> Thanks, >> StefanK From stefan.karlsson at oracle.com Thu Jul 4 09:53:35 2019 From: stefan.karlsson at oracle.com (Stefan Karlsson) Date: Thu, 4 Jul 2019 11:53:35 +0200 Subject: RFR: 8227175: ZGC: ZHeapIterator visits potentially dead objects In-Reply-To: <0a8696ee-241f-b9b1-4ee1-1045d0cc21c9@oracle.com> References: <0a8696ee-241f-b9b1-4ee1-1045d0cc21c9@oracle.com> Message-ID: Thanks for reviewing! StefanK On 2019-07-03 16:18, wrote: > Hi Stefan, > > Looks good. > > Thanks, > /Erik > > On 2019-07-03 15:31, Stefan Karlsson wrote: >> Hi all, >> >> (Sending this RFR to hotspot-dev since it changes CLD claiming.) >> >> Please review this patch to fix the ZHeapIterator to not visit >> potentially dead objects. >> >> https://cr.openjdk.java.net/~stefank/8227175/ >> https://bugs.openjdk.java.net/browse/JDK-8227175 >> >> It changes how all heap iterations are done in ZGC. Previously, the >> marking code visited only the strong CLDs and traced through metadata >> to find all other CLDs that should be considered alive. The >> verification code, serviceability heap iterations, and marking without >> class unloading, skipped the metadata tracing part and visited all >> CLDs instead. Now, with this patch, all these heap iterations starts >> with the strong CLDs and trace through the object graph. >> >> One complication with that scheme is that non-GC heap iterations might >> be executing after concurrent marking has started, but before dead >> CLDs have been unlinked. To allow the GC marking code and one, at a >> time, non-GC heap iteration to run at the same time, I've introduced a >> new claim bit in the CLD claiming byte. I've called it "other", so now >> we have "strong", "finalizable", and "other". The contract is that the >> "other" bits should only be used in a safepoint operation, and must be >> cleared before the operation ends. This way we get mutual exclusion >> between different users of the "other" bits. >> >> The patch also adds more precise verification of ZGC references. >> >> This patch was written a few weeks ago to make the verification of ZGC >> references more precise. I've been using it since then to get better >> verification of other patches and when hunting for bugs. This means >> I've been running it through tier 1-7 multiple times. The intent was >> to get this pushed to JDK 14, but now that we've seen that we have an >> actual bug because of the imprecise nature of the ZHeapIterator, I'd >> like to get this patch pushed to JDK 13. We considered trying to split >> this up into two parts, the first part that fixes the heap iterations >> and the second part that adds the extra ZGC verification, but we thing >> that would take longer time and be riskier. >> >> Thanks, >> StefanK > From erik.osterlund at oracle.com Thu Jul 4 12:55:27 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 4 Jul 2019 14:55:27 +0200 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: <39D8C321-D15C-41A4-86D9-403FDB75B224@oracle.com> References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> <39D8C321-D15C-41A4-86D9-403FDB75B224@oracle.com> Message-ID: Hi Kim, Thanks for the review. /Erik On 2019-07-03 20:55, Kim Barrett wrote: >> On Jul 3, 2019, at 6:15 AM, Erik ?sterlund wrote: >> >> [?] >> >> So as you can see we pointer chase through this chain of possibly dead memory. What could possibly go wrong though? > Hahaha! > > Thanks for the detailed description. > >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8224531 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/ >> >> Thanks, >> /Erik > Looks good. > From erik.osterlund at oracle.com Thu Jul 4 12:59:11 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 4 Jul 2019 14:59:11 +0200 Subject: RFR[13]: 8224531: SEGV while collecting Klass statistics In-Reply-To: References: <479363ab-3157-03c0-c05f-b03b809cde91@oracle.com> <0208491e-37a3-6a84-0f80-d122d28ea4b7@oracle.com> Message-ID: Hi Thomas, Thanks for the review. On 2019-07-03 21:31, Thomas Schatzl wrote: > Hi, > > On Wed, 2019-07-03 at 08:10 -0400, coleen.phillimore at oracle.com wrote: > http://cr.openjdk.java.net/~eosterlund/8224531/webrev.00/src/hotspot/share/memory/heapInspection.cpp.frames.html >> There's another object_iterate() in this file with a comment to >> change it to safe_object_iterate(). Should you change that too? > I think this is a different issue as Erik pointed out, this is > iteration during a safepoint. Not that I think that this is much safer > *and* there is already the comment there that this might not work with > CMS either. > Erik, can you file a CR? Sure, can do. Thanks, /Erik >> Did you run the jvmti tests? There used to be tests that failed if >> dead objects weren't found, but the tests may have been fixed. > It would be nice to at least know which jvmti tests iterate over dead > objects before pushing this if possible. > >> The rest of the change looks good. Thank you for figuring this out! > Change looks good. > > Thanks, > Thomas > > From matthias.baesken at sap.com Thu Jul 4 13:06:27 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Thu, 4 Jul 2019 13:06:27 +0000 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: References:

Message-ID: Hi David, thanks for looking into this . > > If you add this then we don't need distinct POSIX and non-POSIX versions > - the existing os::Posix::get_signal_name etc could all be hoisted into > os.cpp and the os class - no? > Should I go for this ? The coding is still a little different (e.g. is_valid_signal (.. ) call in os_posix ) but I think it could be done without much trouble (maybe with a few small ifdefs ) . > That's a LinuxThreads anachronism which has been copied, unnecessarily > into the other OS implementations. I will file a RFE to get rid of it. Good catch ! Best regards, Matthias > > Hi Matthias, > > On 27/06/2019 6:56 pm, Baesken, Matthias wrote: > > Hello, please review the following small patch . > > It adds event logging to the UserHandler (user signal handler) calls . > > That seems reasonable. > > > (additionally it adds a function os::win32::get_signal_name > > to get signal names for signal numbers ; this is similar to what we already > had for posix ). > > If you add this then we don't need distinct POSIX and non-POSIX versions > - the existing os::Posix::get_signal_name etc could all be hoisted into > os.cpp and the os class - no? > > Aside: I spotted this in UserHandler: > > // 4511530 - sem_post is serialized and handled by the manager > thread. When > // the program is interrupted by Ctrl-C, SIGINT is sent to every > thread. We > // don't want to flood the manager thread with sem_post requests. > if (sig == SIGINT && Atomic::add(1, &sigint_count) > 1) > return; > > That's a LinuxThreads anachronism which has been copied, unnecessarily > into the other OS implementations. I will file a RFE to get rid of it. > > Thanks, > David > > > > > > > Bug/webrev : > > > > https://bugs.openjdk.java.net/browse/JDK-8226816 > > > > http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.0/ > > > > Thanks, Matthias > > From erik.osterlund at oracle.com Thu Jul 4 15:02:52 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 4 Jul 2019 17:02:52 +0200 Subject: RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls Message-ID: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> Hi, The i2c adapter sets a thread-local "callee_target" Method*, which is caught (and cleared) by SharedRuntime::handle_wrong_method if the i2c call is "bad" (e.g. not_entrant). This error handler forwards execution to the callee c2i entry. If the SharedRuntime::handle_wrong_method method is called again due to the i2c2i call being still bad, then we will crash the VM in the following guarantee in SharedRuntime::handle_wrong_method: Method* callee = thread->callee_target(); guarantee(callee != NULL && callee->is_method(), "bad handshake"); Unfortunately, the c2i entry can indeed fail again if it, e.g., hits the new class initialization entry barrier of the c2i adapter. The solution is to simply not clear the thread-local "callee_target" after handling the first failure, as we can't really know there won't be another one. There is no reason to clear this value as nobody else reads it than the SharedRuntime::handle_wrong_method handler (and we really do want it to be able to read the value as many times as it takes until the call goes through). I found some confused clearing of this callee_target in JavaThread::oops_do(), with a comment saying this is a methodOop that we need to clear to make GC happy or something. Seems like old traces of perm gen. So I deleted that too. I caught this in ZGC where the timing window for hitting this issue seems to be wider due to concurrent code cache unloading. But it is equally problematic for all GCs. Bug: https://bugs.openjdk.java.net/browse/JDK-8227260 Webrev: http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/ Thanks, /Erik From adam.farley at uk.ibm.com Thu Jul 4 16:41:23 2019 From: adam.farley at uk.ibm.com (Adam Farley8) Date: Thu, 4 Jul 2019 17:41:23 +0100 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: <842ae03e-8574-593e-3ac2-5cc283832be9@oracle.com> References: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com>

<842ae03e-8574-593e-3ac2-5cc283832be9@oracle.com> Message-ID: Hi David, To detect a too-long path when it's being passed in, the best option I can see is to check it in two places: 1) when it's being set initially with the location of libjvm.so, either: a)in hotspot/os/[os name]/os_[os name].cpp, right before the call to Arguments::set_dll_dir or b), in the Arguments::set_dll_dir function itself (ideally the latter) 2) when/if the extra paths are being passed in as a parameter, as they pass through hotspot/share/runtime/arguments.cpp, right after the line: --- else if (strcmp(key, "sun.boot.library.path") == 0)"); --- You're right in that this could slow down startup a little, with the length checking, and the potential looping over the -D value to check the length of each path. Not a major slowdown though. Best Regards Adam Farley IBM Runtimes David Holmes wrote on 04/07/2019 07:57:14: > From: David Holmes > To: Adam Farley8 > Cc: hotspot-dev at openjdk.java.net > Date: 04/07/2019 07:58 > Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > paths are longer than JVM_MAXPATHLEN > > Hi Adam, > > On 4/07/2019 1:42 am, Adam Farley8 wrote: > > Hi David, > > > > I figured it should be elaborate so we can avoid killing the VM > > if we don't have to. > > > > Ultimately, if we have a list of three paths and the last two > > are invalid, does it matter so long as all the libraries we need > > are in the first path? > > I prefer not see the users error ignored if we can reasonably detect it. > They set the paths for a reason, and if they paths are invalid they > probably would like to know. > > > As to your question "is it in hostpot or JDK code", I presume you > > mean in the change set. I'm primarily referring to the hotspot code. > > No I mean where in the current code will we detect that one of these > path elements is too long? > > > Also, if we end up adopting a "kill the vm if any path is too long" > > approach, we still need to change the JDK code, as those currently > > seem to want to fail if the total length of the sub.boot.library.path > > property is longer than the maximum length of a single path. > > > > So if you pass in three 100 character paths on Windows, it'll fail > > because they add up to more than the 260 character path limit. > > That seems like a separate bug that should be addressed. :( > > Thanks, > David > > > Best Regards > > > > Adam Farley > > IBM Runtimes > > > > > > David Holmes wrote on 03/07/2019 08:36:36: > > > >> From: David Holmes > >> To: Adam Farley8 > >> Cc: hotspot-dev at openjdk.java.net > >> Date: 03/07/2019 08:36 > >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > >> paths are longer than JVM_MAXPATHLEN > >> > >> On 2/07/2019 7:44 pm, Adam Farley8 wrote: > >> > Hi David, > >> > > >> > Thanks for your thoughts. > >> > > >> > The user should absolutely have immediate feedback, yes, and I agree > >> > that "skipping" paths could lead to us loading the wrong library. > >> > > >> > Perhaps a compromise? We fire off a stderr warning if any of the paths > >> > are too long (without killing the VM), we ignore any path *after* > >> > (and including) the first too-long path, and we kill the VM if the > >> > first path is too long. > >> > >> My first though is why be so elaborate and not just fail immediately: > >> > >> Error occurred during initialization of VM > >> One or more sun.boot.library.path elements is too long for this system. > >> --- > >> > >> ? But AFAICS we don't do any sanity checking of the those paths so this > >> would have an impact on startup. > >> > >> I can't locate where we would detect the too-long path element, is it in > >> hostpot or JDK code? > >> > >> Thanks, > >> David > >> ----- > >> > >> > Warning message example: > >> > > >> > ---- > >> > Warning: One or more sun.boot.library.path paths were too long > >> > for this system, and it (along with all subsequent paths) have been > >> > ignored. > >> > ---- > >> > > >> > Another addition could be to check the path lengths for the property > >> > sooner, thus aborting the VM faster if the default path is too long. > >> > > >> > Assuming we posit that the VM will always need to load libraries. > >> > > >> > Best Regards > >> > > >> > Adam Farley > >> > IBM Runtimes > >> > > >> > > >> > David Holmes wrote on 01/07/2019 22:10:45: > >> > > >> >> From: David Holmes > >> >> To: Adam Farley8 , hotspot-dev at openjdk.java.net > >> >> Date: 01/07/2019 22:12 > >> >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > >> >> paths are longer than JVM_MAXPATHLEN > >> >> > >> >> Hi Adam, > >> >> > >> >> On 1/07/2019 10:27 pm, Adam Farley8 wrote: > >> >> > Hi All, > >> >> > > >> >> > The title say it all. > >> >> > > >> >> > If you pass in a value for sun.boot.library.path consisting > >> >> > of one or more paths that are too long, then the vm will > >> >> > fail to start because it can't load one of the libraries it > >> >> > needs (the zip library), despite the fact that the VM > >> >> > automatically prepends the default library path to the > >> >> > sun.boot.library.path property, using the correct separator > >> >> > to divide it from the user-specified path. > >> >> > > >> >> > So we've got the right path, in the right place, at the > >> >> > right time, we just can't *use* it. > >> >> > > >> >> > I've fixed this by changing the relevant os.cpp code to > >> >> > ignore paths that are too long, and to attempt to locate > >> >> > the needed library on the other paths (if any are valid). > >> >> > >> >> As I just added to the bug report I have a different view of "correct" > >> >> here. If you just ignore the long path and keep processing other short > >> >> paths you may find the wrong library. There is a user error here and > >> >> that error should be reported ASAP and in a way that leads to failure > >> >> ASAP. Perhaps we should be more aggressive in aborting the VMwhen this > >> >> is detected? > >> >> > >> >> David > >> >> ----- > >> >> > >> >> > I've also added functionality to handle the edge case of > >> >> > paths that are neeeeeeearly too long, only for a > >> >> > sub-path (or file name) to push us over the limit *after* > >> >> > the split_path function is done assessing the path length. > >> >> > > >> >> > I've also changed the code we're overriding, on the assumption > >> >> > that someone's still using it somewhere. > >> >> > > >> >> > Bug: https://urldefense.proofpoint.com/v2/url? > >> >> > >> > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8227021&d=DwICaQ&c=jf_iaSHvJObTbx- > >> >> siA1ZOg&r=P5m8KWUXJf- > >> >> > >> > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=xZzQCnv68xd9hJyyK1obSim38eWSRmLPfuR__9ddZWg&e= > >> >> > Webrev: https://urldefense.proofpoint.com/v2/url? > >> >> > >> > u=http-3A__cr.openjdk.java.net_-7Eafarley_8227021_webrev_&d=DwICaQ&c=jf_iaSHvJObTbx- > >> >> siA1ZOg&r=P5m8KWUXJf- > >> >> > >> > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=- > >> >> hKU0zUd_0LDT08wTilexgI54EeSgt8xUk97i6V63Bk&e= > >> >> > > >> >> > Thoughts and impressions welcome. > >> >> > > >> >> > Best Regards > >> >> > > >> >> > Adam Farley > >> >> > IBM Runtimes > >> >> > > >> >> > Unless stated otherwise above: > >> >> > IBM United Kingdom Limited - Registered in England and > Wales with number > >> >> > 741598. > >> >> > Registered office: PO Box 41, North Harbour, Portsmouth, > >> Hampshire PO6 3AU > >> >> > > >> >> > >> > > >> > Unless stated otherwise above: > >> > IBM United Kingdom Limited - Registered in England and Wales with number > >> > 741598. > >> > Registered office: PO Box 41, North Harbour, Portsmouth, > Hampshire PO6 3AU > >> > > > > Unless stated otherwise above: > > IBM United Kingdom Limited - Registered in England and Wales with number > > 741598. > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From david.holmes at oracle.com Thu Jul 4 21:14:06 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 Jul 2019 07:14:06 +1000 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: References:

Message-ID: <2ba2c9fa-3faa-2e16-59d6-d76412115515@oracle.com> On 4/07/2019 11:06 pm, Baesken, Matthias wrote: > Hi David, thanks for looking into this . > >> >> If you add this then we don't need distinct POSIX and non-POSIX versions >> - the existing os::Posix::get_signal_name etc could all be hoisted into >> os.cpp and the os class - no? >> > > Should I go for this ? > The coding is still a little different (e.g. is_valid_signal (.. ) call in os_posix ) but I think it could be done without much trouble (maybe with a few small ifdefs ) . I think it's worth trying it. I have to apologize in advance though as I'm about to disappear on two weeks vacation so may not be able to follow through on this. Thanks, David > >> That's a LinuxThreads anachronism which has been copied, unnecessarily >> into the other OS implementations. I will file a RFE to get rid of it. > > Good catch ! > > Best regards, Matthias > >> >> Hi Matthias, >> >> On 27/06/2019 6:56 pm, Baesken, Matthias wrote: >>> Hello, please review the following small patch . >>> It adds event logging to the UserHandler (user signal handler) calls . >> >> That seems reasonable. >> >>> (additionally it adds a function os::win32::get_signal_name >>> to get signal names for signal numbers ; this is similar to what we already >> had for posix ). >> >> If you add this then we don't need distinct POSIX and non-POSIX versions >> - the existing os::Posix::get_signal_name etc could all be hoisted into >> os.cpp and the os class - no? >> >> Aside: I spotted this in UserHandler: >> >> // 4511530 - sem_post is serialized and handled by the manager >> thread. When >> // the program is interrupted by Ctrl-C, SIGINT is sent to every >> thread. We >> // don't want to flood the manager thread with sem_post requests. >> if (sig == SIGINT && Atomic::add(1, &sigint_count) > 1) >> return; >> >> That's a LinuxThreads anachronism which has been copied, unnecessarily >> into the other OS implementations. I will file a RFE to get rid of it. >> >> Thanks, >> David >> >>> >>> >>> Bug/webrev : >>> >>> https://bugs.openjdk.java.net/browse/JDK-8226816 >>> >>> http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.0/ >>> >>> Thanks, Matthias >>> From david.holmes at oracle.com Thu Jul 4 21:21:59 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 Jul 2019 07:21:59 +1000 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: References: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com>

<842ae03e-8574-593e-3ac2-5cc283832be9@oracle.com> Message-ID: <08a6c8a3-bd3e-25db-2460-cea7c8fbb3f3@oracle.com> Hi Adam, On 5/07/2019 2:41 am, Adam Farley8 wrote: > Hi David, > > To detect a too-long path when it's being passed in, the best option > I can see is to check it in two places: Right, but my outstanding question relates to the existing code today. Where will we detect that a path element is too long? I'm still not sure whether the VM has the right to dictate behaviour here or whether this belongs to core-libs. And we need to be very careful about any change in behaviour. > 1) when it's being set initially with the location of libjvm.so, either: > ? ? a)in hotspot/os/[os name]/os_[os name].cpp, right before the call > ?to Arguments::set_dll_dir > ? ? ?or b), in the Arguments::set_dll_dirfunction itself (ideally the > latter) > > 2) when/if the extra paths are being passed in as a parameter, as they > pass through hotspot/share/runtime/arguments.cpp, right after the line: > > --- > else if (_strcmp_(key, "sun.boot.library.path") == 0)"); > --- > > You're right in that this could slow down startup a little, with > the length checking, and the potential looping over the -D value > to check the length of each path. Not a major slowdown though. I'm sure Claes would disagree :) Apologies in advance as I'm about to disappear for two weeks vacation. David ----- > Best Regards > > Adam Farley > IBM Runtimes > > > David Holmes wrote on 04/07/2019 07:57:14: > >> From: David Holmes >> To: Adam Farley8 >> Cc: hotspot-dev at openjdk.java.net >> Date: 04/07/2019 07:58 >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path >> paths are longer than JVM_MAXPATHLEN >> >> Hi Adam, >> >> On 4/07/2019 1:42 am, Adam Farley8 wrote: >> > Hi David, >> > >> > I figured it should be elaborate so we can avoid killing the VM >> > if we don't have to. >> > >> > Ultimately, if we have a list of three paths and the last two >> > are invalid, does it matter so long as all the libraries we need >> > are in the first path? >> >> I prefer not see the users error ignored if we can reasonably detect it. >> They set the paths for a reason, and if they paths are invalid they >> probably would like to know. >> >> > As to your question "is it in hostpot or JDK code", I presume you >> > mean in the change set. I'm primarily referring to the hotspot code. >> >> No I mean where in the current code will we detect that one of these >> path elements is too long? >> >> > Also, if we end up adopting a "kill the vm if any path is too long" >> > approach, we still need to change the JDK code, as those currently >> > seem to want to fail if the total length of the sub.boot.library.path >> > property is longer than the maximum length of a single path. >> > >> > So if you pass in three 100 character paths on Windows, it'll fail >> > because they add up to more than the 260 character path limit. >> >> That seems like a separate bug that should be addressed. :( >> >> Thanks, >> David >> >> > Best Regards >> > >> > Adam Farley >> > IBM Runtimes >> > >> > >> > David Holmes wrote on 03/07/2019 08:36:36: >> > >> >> From: David Holmes >> >> To: Adam Farley8 >> >> Cc: hotspot-dev at openjdk.java.net >> >> Date: 03/07/2019 08:36 >> >> Subject: Re: RFR: JDK-8227021: ?VM fails if any sun.boot.library.path >> >> paths are longer than JVM_MAXPATHLEN >> >> >> >> On 2/07/2019 7:44 pm, Adam Farley8 wrote: >> >> > Hi David, >> >> > >> >> > Thanks for your thoughts. >> >> > >> >> > The user should absolutely have immediate feedback, yes, and ?I agree >> >> > that "skipping" paths could lead to us loading the ?wrong library. >> >> > >> >> > Perhaps a compromise? We fire off a stderr warning if any of ?the paths >> >> > are too long (without killing the VM), we ignore any path *after* >> >> > (and including) the first too-long path, and we kill the VM if ?the >> >> > first path is too long. >> >> >> >> My first though is why be so elaborate and not just fail immediately: >> >> >> >> Error occurred during initialization of VM >> >> One or more sun.boot.library.path elements is too long for this system. >> >> --- >> >> >> >> ? But AFAICS we don't do any sanity checking of the those paths so ?this >> >> would have an impact on startup. >> >> >> >> I can't locate where we would detect the too-long path element, is ?it in >> >> hostpot or JDK code? >> >> >> >> Thanks, >> >> David >> >> ----- >> >> >> >> > Warning message example: >> >> > >> >> > ---- >> >> > Warning: One or more sun.boot.library.path paths were too long >> >> > for this system, and it (along with all subsequent paths) have ?been >> >> > ignored. >> >> > ---- >> >> > >> >> > Another addition could be to check the path lengths for the property >> >> > sooner, thus aborting the VM faster if the default path is too ?long. >> >> > >> >> > Assuming we posit that the VM will always need to load libraries. >> >> > >> >> > Best Regards >> >> > >> >> > Adam Farley >> >> > IBM Runtimes >> >> > >> >> > >> >> > David Holmes wrote on 01/07/2019 ?22:10:45: >> >> > >> >> >> From: David Holmes >> >> >> To: Adam Farley8 , ?hotspot-dev at openjdk.java.net >> >> >> Date: 01/07/2019 22:12 >> >> >> Subject: Re: RFR: JDK-8227021: ?VM fails if any sun.boot.library.path >> >> >> paths are longer than JVM_MAXPATHLEN >> >> >> >> >> >> Hi Adam, >> >> >> >> >> >> On 1/07/2019 10:27 pm, Adam Farley8 wrote: >> >> >> > Hi All, >> >> >> > >> >> >> > The title say it all. >> >> >> > >> >> >> > If you pass in a value for sun.boot.library.path consisting >> >> >> > of one or more paths that are too long, then the vm ?will >> >> >> > fail to start because it can't load one of the libraries ?it >> >> >> > needs (the zip library), despite the fact that the VM >> >> >> > automatically prepends the default library path to the >> >> >> > sun.boot.library.path property, using the correct separator >> >> >> > to divide it from the user-specified path. >> >> >> > >> >> >> > So we've got the right path, in the right place, at ?the >> >> >> > right time, we just can't *use* it. >> >> >> > >> >> >> > I've fixed this by changing the relevant os.cpp code ?to >> >> >> > ignore paths that are too long, and to attempt to locate >> >> >> > the needed library on the other paths (if any are valid). >> >> >> >> >> >> As I just added to the bug report I have a different view ?of "correct" >> >> >> here. If you just ignore the long path and keep processing ?other short >> >> >> paths you may find the wrong library. There is a user error ?here and >> >> >> that error should be reported ASAP and in a way that leads ?to failure >> >> >> ASAP. Perhaps we should be more aggressive in aborting the ?VMwhen ?this >> >> >> is detected? >> >> >> >> >> >> David >> >> >> ----- >> >> >> >> >> >> > I've also added functionality to handle the edge case ?of >> >> >> > paths that are neeeeeeearly too long, only for a >> >> >> > sub-path (or file name) to push us over the limit *after* >> >> >> > the split_path function is done assessing the path length. >> >> >> > >> >> >> > I've also changed the code we're overriding, on the ?assumption >> >> >> > that someone's still using it somewhere. >> >> >> > >> >> >> > Bug: https://urldefense.proofpoint.com/v2/url? >> >> >> >> >> >> u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8227021&d=DwICaQ&c=jf_iaSHvJObTbx- >> >> >> siA1ZOg&r=P5m8KWUXJf- >> >> >> >> >> >> CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=xZzQCnv68xd9hJyyK1obSim38eWSRmLPfuR__9ddZWg&e= >> >> >> > Webrev: https://urldefense.proofpoint.com/v2/url? >> >> >> >> >> >> u=http-3A__cr.openjdk.java.net_-7Eafarley_8227021_webrev_&d=DwICaQ&c=jf_iaSHvJObTbx- >> >> >> siA1ZOg&r=P5m8KWUXJf- >> >> >> >> >> >> CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=- >> >> >> hKU0zUd_0LDT08wTilexgI54EeSgt8xUk97i6V63Bk&e= >> >> >> > >> >> >> > Thoughts and impressions welcome. >> >> >> > >> >> >> > Best Regards >> >> >> > >> >> >> > Adam Farley >> >> >> > IBM Runtimes >> >> >> > >> >> >> > Unless stated otherwise above: >> >> >> > IBM United Kingdom Limited - Registered in England and >> Wales ?with number >> >> >> > 741598. >> >> >> > Registered office: PO Box 41, North Harbour, Portsmouth, >> >> Hampshire ?PO6 3AU >> >> >> > >> >> >> >> >> > >> >> > Unless stated otherwise above: >> >> > IBM United Kingdom Limited - Registered in England and Wales ?with number >> >> > 741598. >> >> > Registered office: PO Box 41, North Harbour, Portsmouth, >> Hampshire ?PO6 3AU >> >> >> > >> > Unless stated otherwise above: >> > IBM United Kingdom Limited - Registered in England and Wales with number >> > 741598. >> > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU >> > > Unless stated otherwise above: > IBM United Kingdom Limited - Registered in England and Wales with number > 741598. > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From erik.osterlund at oracle.com Fri Jul 5 10:19:14 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 5 Jul 2019 12:19:14 +0200 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects Message-ID: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> Hi, In the HeapInspection::find_instances_at_safepoint function, the unsafe heap iteration API (which also walks dead objects) is used to find objects that are instance of a class, used for concurrent lock dumping where we find dead java.util.concurrent.locks.AbstractOwnableSynchronizer objects and pointer chase to its possibly dead owner threadObj. There is a comment saying that if this starts crashing because we use CMS, we should probably change to use the safe_object_iterate() API instead, which does not include dead objects. Arguably, whether CMS is observed to crash or not, we really should not be walking over dead objects and exposing them anyway. It's not safe... and it will crash sooner or later. For example, CMS yields to safepoints (including young GCs) while sweeping. This means that both the AbstractOwnableSynchronizer and its owner thread might have died, but while sweeping, we could yield for a young GC that promotes objects overriding the memory of the dead thread object with random primitives, but not yet freeing the dead AbstractOwnableSynchronizer. A subsequent dumping operation could use the heap walker to find the dead AbstractOwnableSynchronizer, and pointer chase into its dead owner thread, which by now has been freed and had its memory clobbered with primitive data. This will all eventually end up in a glorious crash. So we shouldn't do this. Bug: https://bugs.openjdk.java.net/browse/JDK-8227277 Webrev: http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ Thanks, /Erik From erik.osterlund at oracle.com Fri Jul 5 10:33:55 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 5 Jul 2019 12:33:55 +0200 Subject: RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls Message-ID: Hi, The i2c adapter sets a thread-local "callee_target" Method*, which is caught (and cleared) by SharedRuntime::handle_wrong_method if the i2c call is "bad" (e.g. not_entrant). This error handler forwards execution to the callee c2i entry. If the SharedRuntime::handle_wrong_method method is called again due to the i2c2i call being still bad, then we will crash the VM in the following guarantee in SharedRuntime::handle_wrong_method: Method* callee = thread->callee_target(); guarantee(callee != NULL && callee->is_method(), "bad handshake"); Unfortunately, the c2i entry can indeed fail again if it, e.g., hits the new class initialization entry barrier. I think a solution to this problem should stop making assumptions about how many things can go wrong when calling a method from the interpreter. I caught this in ZGC where the timing window for hitting this issue seems to be wider due to concurrent code cache unloading. But it is equally problematic for all GCs. With ZGC, I could catch this failing in SPECjbb2015 where a static method is called from JNI. I could reliably (25% chance) reproduce it, and with the patch it no longer reproduces after 25 runs. I also tried hs-tier1-5, and it looked good. Webrev: http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/ Bug: https://bugs.openjdk.java.net/browse/JDK-8227260 Thanks, /Erik From vladimir.x.ivanov at oracle.com Fri Jul 5 11:14:11 2019 From: vladimir.x.ivanov at oracle.com (Vladimir Ivanov) Date: Fri, 5 Jul 2019 14:14:11 +0300 Subject: RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls In-Reply-To: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> References: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> Message-ID: Thanks for diagnosing the issue, Erik! Can you elaborate, please, on relation to clinit barrier in c2i? I don't see how it is possible to hit clinit barrier during i2c2i transition. Template interpreter has clinit barrier as part of invokestatic handler [1], so by the time c2i is reached, proper checks should be already performed and all the conditions to pass the barrier should be met. If SR::handle_wrong_method is called from clinit barrier in c2i during i2c2i, it signals about a bug in the new clinit logic: somehow clinit barrier is bypassed in interpreter. Are you sure it is not related to upcalls from native code (caller_frame.is_entry_frame() [1])? Also, SR::handle_wrong_method() calls coming from clinit barriers shouldn't hit the fast path w/ callee_target(), because it bypasses the actual initialization check happening during call site re-resolution. Best regards, Vladimir Ivanov PS: regarding clearing JavaThread::_callee_target in JavaThread::oops_do(), I'd prefer to keep it and limit the exposure of a stale Method*. But it's just a matter of preference and I don't have a strong opinion here. [1] src/hotspot/share/runtime/sharedRuntime.cpp: JRT_BLOCK_ENTRY(address, SharedRuntime::handle_wrong_method(JavaThread* thread)) ... if (caller_frame.is_interpreted_frame() || caller_frame.is_entry_frame()) { On 04/07/2019 18:02, Erik ?sterlund wrote: > Hi, > > The i2c adapter sets a thread-local "callee_target" Method*, which is > caught (and cleared) by SharedRuntime::handle_wrong_method if the i2c > call is "bad" (e.g. not_entrant). This error handler forwards execution > to the callee c2i entry. If the SharedRuntime::handle_wrong_method > method is called again due to the i2c2i call being still bad, then we > will crash the VM in the following guarantee in > SharedRuntime::handle_wrong_method: > > Method* callee = thread->callee_target(); > guarantee(callee != NULL && callee->is_method(), "bad handshake"); > > Unfortunately, the c2i entry can indeed fail again if it, e.g., hits the > new class initialization entry barrier of the c2i adapter. > The solution is to simply not clear the thread-local "callee_target" > after handling the first failure, as we can't really know there won't be > another one. There is no reason to clear this value as nobody else reads > it than the SharedRuntime::handle_wrong_method handler (and we really do > want it to be able to read the value as many times as it takes until the > call goes through). I found some confused clearing of this callee_target > in JavaThread::oops_do(), with a comment saying this is a methodOop that > we need to clear to make GC happy or something. Seems like old traces of > perm gen. So I deleted that too. > > I caught this in ZGC where the timing window for hitting this issue > seems to be wider due to concurrent code cache unloading. But it is > equally problematic for all GCs. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8227260 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/ > > Thanks, > /Erik From david.holmes at oracle.com Fri Jul 5 11:35:59 2019 From: david.holmes at oracle.com (David Holmes) Date: Fri, 5 Jul 2019 21:35:59 +1000 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects In-Reply-To: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> References: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> Message-ID: <5904f2e1-9172-1d4f-aa85-54cf29b6cb52@oracle.com> Hi Erik, On 5/07/2019 8:19 pm, Erik ?sterlund wrote: > Hi, > > In the HeapInspection::find_instances_at_safepoint function, the unsafe > heap iteration API (which also walks dead objects) is used to find > objects that are instance of a class, used for concurrent lock dumping > where we find dead > java.util.concurrent.locks.AbstractOwnableSynchronizer objects and > pointer chase to its possibly dead owner threadObj. There is a comment > saying that if this starts crashing because we use CMS, we should > probably change to use the safe_object_iterate() API instead, which does > not include dead objects. > > Arguably, whether CMS is observed to crash or not, we really should not > be walking over dead objects and exposing them anyway. It's not safe... > and it will crash sooner or later. > > For example, CMS yields to safepoints (including young GCs) while > sweeping. This means that both the AbstractOwnableSynchronizer and its > owner thread might have died, but while sweeping, we could yield for a > young GC that promotes objects overriding the memory of the dead thread > object with random primitives, but not yet freeing the dead > AbstractOwnableSynchronizer. A subsequent dumping operation could use > the heap walker to find the dead AbstractOwnableSynchronizer, and > pointer chase into its dead owner thread, which by now has been freed > and had its memory clobbered with primitive data. > > This will all eventually end up in a glorious crash. So we shouldn't do > this. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8227277 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ That seems eminently reasonable. :) Are there any valid uses for the (unsafe) object_iterate? Cheers, David > Thanks, > /Erik From matthias.baesken at sap.com Fri Jul 5 12:21:57 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Fri, 5 Jul 2019 12:21:57 +0000 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: <2ba2c9fa-3faa-2e16-59d6-d76412115515@oracle.com> References:

<2ba2c9fa-3faa-2e16-59d6-d76412115515@oracle.com> Message-ID: Hello David , here is another webrev with get_signal_name / get_signal_number moved to os.cpp : http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.1/ Best regards, Matthias > > On 4/07/2019 11:06 pm, Baesken, Matthias wrote: > > Hi David, thanks for looking into this . > > > >> > >> If you add this then we don't need distinct POSIX and non-POSIX versions > >> - the existing os::Posix::get_signal_name etc could all be hoisted into > >> os.cpp and the os class - no? > >> > > > > Should I go for this ? > > The coding is still a little different (e.g. is_valid_signal (.. ) call in os_posix ) > but I think it could be done without much trouble (maybe with a few small > ifdefs ) . > > I think it's worth trying it. > > I have to apologize in advance though as I'm about to disappear on two > weeks vacation so may not be able to follow through on this. > > Thanks, > David > From erik.osterlund at oracle.com Fri Jul 5 15:26:09 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 5 Jul 2019 17:26:09 +0200 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects In-Reply-To: <5904f2e1-9172-1d4f-aa85-54cf29b6cb52@oracle.com> References: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> <5904f2e1-9172-1d4f-aa85-54cf29b6cb52@oracle.com> Message-ID: On 2019-07-05 13:35, David Holmes wrote: > Hi Erik, > > On 5/07/2019 8:19 pm, Erik ?sterlund wrote: >> Hi, >> >> In the HeapInspection::find_instances_at_safepoint function, the >> unsafe heap iteration API (which also walks dead objects) is used to >> find objects that are instance of a class, used for concurrent lock >> dumping where we find dead >> java.util.concurrent.locks.AbstractOwnableSynchronizer objects and >> pointer chase to its possibly dead owner threadObj. There is a >> comment saying that if this starts crashing because we use CMS, we >> should probably change to use the safe_object_iterate() API instead, >> which does not include dead objects. >> >> Arguably, whether CMS is observed to crash or not, we really should >> not be walking over dead objects and exposing them anyway. It's not >> safe... and it will crash sooner or later. >> >> For example, CMS yields to safepoints (including young GCs) while >> sweeping. This means that both the AbstractOwnableSynchronizer and >> its owner thread might have died, but while sweeping, we could yield >> for a young GC that promotes objects overriding the memory of the >> dead thread object with random primitives, but not yet freeing the >> dead AbstractOwnableSynchronizer. A subsequent dumping operation >> could use the heap walker to find the dead >> AbstractOwnableSynchronizer, and pointer chase into its dead owner >> thread, which by now has been freed and had its memory clobbered with >> primitive data. >> >> This will all eventually end up in a glorious crash. So we shouldn't >> do this. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8227277 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ > > That seems eminently reasonable. :) Thanks! > Are there any valid uses for the (unsafe) object_iterate? Well... valid might be an overstatement, but I think it probably won't crash if you don't pointer chase through dead references in dead objects. We simply can't do that. Thanks, /Erik > Cheers, > David > >> Thanks, >> /Erik From david.holmes at oracle.com Fri Jul 5 20:54:10 2019 From: david.holmes at oracle.com (David Holmes) Date: Sat, 6 Jul 2019 06:54:10 +1000 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: References:

<2ba2c9fa-3faa-2e16-59d6-d76412115515@oracle.com> Message-ID: <3fc5baee-e709-3294-c79d-f3c4f94c8a02@oracle.com> Hi Matthias, On 5/07/2019 10:21 pm, Baesken, Matthias wrote: > Hello David , here is another webrev with get_signal_name / get_signal_number moved to os.cpp : > > http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.1/ That looks good - thanks. I'm running it through our test system. One query, in os.cpp: + #ifdef _WINDOWS + { SIGBREAK, "SIGBREAK" }, Can that be #ifdef SIGBREAK { SIGBREAK, "SIGBREAK" }, like the other cases? No need for an updated webrev if so. Thanks, David ----- > > Best regards, Matthias > >> >> On 4/07/2019 11:06 pm, Baesken, Matthias wrote: >>> Hi David, thanks for looking into this . >>> >>>> >>>> If you add this then we don't need distinct POSIX and non-POSIX versions >>>> - the existing os::Posix::get_signal_name etc could all be hoisted into >>>> os.cpp and the os class - no? >>>> >>> >>> Should I go for this ? >>> The coding is still a little different (e.g. is_valid_signal (.. ) call in os_posix ) >> but I think it could be done without much trouble (maybe with a few small >> ifdefs ) . >> >> I think it's worth trying it. >> >> I have to apologize in advance though as I'm about to disappear on two >> weeks vacation so may not be able to follow through on this. >> >> Thanks, >> David >> > From dean.long at oracle.com Fri Jul 5 21:46:51 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 5 Jul 2019 14:46:51 -0700 Subject: RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls In-Reply-To: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> References: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> Message-ID: <1fb16eb4-59af-7f24-3fdd-56b4a892f82f@oracle.com> What is callee->is_method() doing?? Like Vladimir, I'm concerned about pointers to stale metadata. dl On 7/4/19 8:02 AM, Erik ?sterlund wrote: > Hi, > > The i2c adapter sets a thread-local "callee_target" Method*, which is > caught (and cleared) by SharedRuntime::handle_wrong_method if the i2c > call is "bad" (e.g. not_entrant). This error handler forwards > execution to the callee c2i entry. If the > SharedRuntime::handle_wrong_method method is called again due to the > i2c2i call being still bad, then we will crash the VM in the following > guarantee in SharedRuntime::handle_wrong_method: > > Method* callee = thread->callee_target(); > guarantee(callee != NULL && callee->is_method(), "bad handshake"); > > Unfortunately, the c2i entry can indeed fail again if it, e.g., hits > the new class initialization entry barrier of the c2i adapter. > The solution is to simply not clear the thread-local "callee_target" > after handling the first failure, as we can't really know there won't > be another one. There is no reason to clear this value as nobody else > reads it than the SharedRuntime::handle_wrong_method handler (and we > really do want it to be able to read the value as many times as it > takes until the call goes through). I found some confused clearing of > this callee_target in JavaThread::oops_do(), with a comment saying > this is a methodOop that we need to clear to make GC happy or > something. Seems like old traces of perm gen. So I deleted that too. > > I caught this in ZGC where the timing window for hitting this issue > seems to be wider due to concurrent code cache unloading. But it is > equally problematic for all GCs. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8227260 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/ > > Thanks, > /Erik From kim.barrett at oracle.com Sat Jul 6 00:31:18 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Fri, 5 Jul 2019 20:31:18 -0400 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects In-Reply-To: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> References: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> Message-ID: <90927B2B-DB56-4EAA-A52F-75DF8D3EF98D@oracle.com> > On Jul 5, 2019, at 6:19 AM, Erik ?sterlund wrote: > > Hi, > > In the HeapInspection::find_instances_at_safepoint function, the unsafe heap iteration API (which also walks dead objects) is used to find objects that are instance of a class, used for concurrent lock dumping where we find dead java.util.concurrent.locks.AbstractOwnableSynchronizer objects and pointer chase to its possibly dead owner threadObj. There is a comment saying that if this starts crashing because we use CMS, we should probably change to use the safe_object_iterate() API instead, which does not include dead objects. > > Arguably, whether CMS is observed to crash or not, we really should not be walking over dead objects and exposing them anyway. It's not safe... and it will crash sooner or later. > > For example, CMS yields to safepoints (including young GCs) while sweeping. This means that both the AbstractOwnableSynchronizer and its owner thread might have died, but while sweeping, we could yield for a young GC that promotes objects overriding the memory of the dead thread object with random primitives, but not yet freeing the dead AbstractOwnableSynchronizer. A subsequent dumping operation could use the heap walker to find the dead AbstractOwnableSynchronizer, and pointer chase into its dead owner thread, which by now has been freed and had its memory clobbered with primitive data. > > This will all eventually end up in a glorious crash. So we shouldn't do this. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8227277 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ > > Thanks, > /Erik Looks good. From david.holmes at oracle.com Sat Jul 6 02:28:08 2019 From: david.holmes at oracle.com (David Holmes) Date: Sat, 6 Jul 2019 12:28:08 +1000 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: <3fc5baee-e709-3294-c79d-f3c4f94c8a02@oracle.com> References:

<2ba2c9fa-3faa-2e16-59d6-d76412115515@oracle.com> <3fc5baee-e709-3294-c79d-f3c4f94c8a02@oracle.com> Message-ID: <3af1d3d2-b243-95e5-9af7-08304ac15860@oracle.com> On 6/07/2019 6:54 am, David Holmes wrote: > Hi Matthias, > > On 5/07/2019 10:21 pm, Baesken, Matthias wrote: >> Hello David , here is another webrev? with? get_signal_name / >> get_signal_number?? moved to os.cpp : >> >> http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.1/ > > That looks good - thanks. I'm running it through our test system. All passed. David ----- > One query, in os.cpp: > > + #ifdef _WINDOWS > +?? {? SIGBREAK,??? "SIGBREAK" }, > > Can that be > > ?#ifdef SIGBREAK > ?? {? SIGBREAK,??? "SIGBREAK" }, > > like the other cases? > > No need for an updated webrev if so. > > Thanks, > David > ----- > >> >> Best regards, Matthias >> >>> >>> On 4/07/2019 11:06 pm, Baesken, Matthias wrote: >>>> Hi David,? thanks for looking into this . >>>> >>>>> >>>>> If you add this then we don't need distinct POSIX and non-POSIX >>>>> versions >>>>> - the existing os::Posix::get_signal_name etc could all be hoisted >>>>> into >>>>> os.cpp and the os class - no? >>>>> >>>> >>>> Should I go for this ? >>>> The coding is still a little different?? (e.g. is_valid_signal (.. ) >>>> call? in os_posix ) >>> but I think it could be done without much trouble (maybe with a few >>> small >>> ifdefs ) . >>> >>> I think it's worth trying it. >>> >>> I have to apologize in advance though as I'm about to disappear on two >>> weeks vacation so may not be able to follow through on this. >>> >>> Thanks, >>> David >>> >> From igor.ignatyev at oracle.com Sat Jul 6 03:09:06 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Fri, 5 Jul 2019 20:09:06 -0700 Subject: RFR(S) [13] : 8226910 : make it possible to use jtreg's -match via run-test framework In-Reply-To: <8B6A5349-A39A-4AE0-980D-5C336C339DE7@oracle.com> References: <8B6A5349-A39A-4AE0-980D-5C336C339DE7@oracle.com> Message-ID: <9DA3B077-FFE6-472E-B3EA-7C4CFFDB45EB@oracle.com> ping? -- Igor > On Jun 27, 2019, at 3:25 PM, Igor Ignatyev wrote: > > http://cr.openjdk.java.net/~iignatyev//8226910/webrev.00/index.html >> 25 lines changed: 18 ins; 3 del; 4 mod; > > Hi all, > > could you please review this small patch which adds JTREG_RUN_PROBLEM_LISTS options to run-test framework? when JTREG_RUN_PROBLEM_LISTS is set to true, jtreg will use problem lists as values of -match: instead of -exclude, which effectively means it will run only problem listed tests. > > doc/building.html got changed when I ran update-build-docs, I can exclude it from the patch, but it seems it will keep changing every time we run update-build-docs, so I decided to at least bring it up. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8226910 > webrev: http://cr.openjdk.java.net/~iignatyev//8226910/webrev.00/index.html > > Thanks, > -- Igor From david.holmes at oracle.com Sat Jul 6 08:58:16 2019 From: david.holmes at oracle.com (David Holmes) Date: Sat, 6 Jul 2019 18:58:16 +1000 Subject: RFR(S) [13] : 8226910 : make it possible to use jtreg's -match via run-test framework In-Reply-To: <9DA3B077-FFE6-472E-B3EA-7C4CFFDB45EB@oracle.com> References: <8B6A5349-A39A-4AE0-980D-5C336C339DE7@oracle.com> <9DA3B077-FFE6-472E-B3EA-7C4CFFDB45EB@oracle.com> Message-ID: <5b10f093-8aa8-4b5f-14bf-a9b7c5704381@oracle.com> Hi Igor, On 6/07/2019 1:09 pm, Igor Ignatyev wrote: > ping? > > -- Igor > >> On Jun 27, 2019, at 3:25 PM, Igor Ignatyev wrote: >> >> http://cr.openjdk.java.net/~iignatyev//8226910/webrev.00/index.html >>> 25 lines changed: 18 ins; 3 del; 4 mod; >> >> Hi all, >> >> could you please review this small patch which adds JTREG_RUN_PROBLEM_LISTS options to run-test framework? when JTREG_RUN_PROBLEM_LISTS is set to true, jtreg will use problem lists as values of -match: instead of -exclude, which effectively means it will run only problem listed tests. doc/testing.md + Set to `true` of `false`. typo: s/of/or/ Build changes seem okay - I can't attest to the operation of the flag. >> doc/building.html got changed when I ran update-build-docs, I can exclude it from the patch, but it seems it will keep changing every time we run update-build-docs, so I decided to at least bring it up. Weird it seems to have removed line-breaks in that paragraph. What platform did you build on? David ----- >> >> JBS: https://bugs.openjdk.java.net/browse/JDK-8226910 >> webrev: http://cr.openjdk.java.net/~iignatyev//8226910/webrev.00/index.html >> >> Thanks, >> -- Igor > From igor.ignatyev at oracle.com Sat Jul 6 18:50:16 2019 From: igor.ignatyev at oracle.com (Igor Ignatyev) Date: Sat, 6 Jul 2019 11:50:16 -0700 Subject: RFR(S) [13] : 8226910 : make it possible to use jtreg's -match via run-test framework In-Reply-To: <5b10f093-8aa8-4b5f-14bf-a9b7c5704381@oracle.com> References: <8B6A5349-A39A-4AE0-980D-5C336C339DE7@oracle.com> <9DA3B077-FFE6-472E-B3EA-7C4CFFDB45EB@oracle.com> <5b10f093-8aa8-4b5f-14bf-a9b7c5704381@oracle.com> Message-ID: Hi David, > On Jul 6, 2019, at 1:58 AM, David Holmes wrote: > > Hi Igor, > > On 6/07/2019 1:09 pm, Igor Ignatyev wrote: >> ping? >> -- Igor >>> On Jun 27, 2019, at 3:25 PM, Igor Ignatyev wrote: >>> >>> http://cr.openjdk.java.net/~iignatyev//8226910/webrev.00/index.html >>>> 25 lines changed: 18 ins; 3 del; 4 mod; >>> >>> Hi all, >>> >>> could you please review this small patch which adds JTREG_RUN_PROBLEM_LISTS options to run-test framework? when JTREG_RUN_PROBLEM_LISTS is set to true, jtreg will use problem lists as values of -match: instead of -exclude, which effectively means it will run only problem listed tests. > > doc/testing.md > > + Set to `true` of `false`. > > typo: s/of/or/ fixed .md, regenerated .html. > > Build changes seem okay - I can't attest to the operation of the flag. here is how I verified that it does that it supposed to: $ make test "JTREG=OPTIONS=-l;RUN_PROBLEM_LISTS=true" TEST=open/test/hotspot/jtreg/:hotspot_all lists 53 tests, the same command w/o RUN_PROBLEM_LISTS (or w/ RUN_PROBLEM_LISTS=false) lists 6698 tests. $ make test "JTREG=OPTIONS=-l;RUN_PROBLEM_LISTS=true;EXTRA_PROBLEM_LISTS=ProblemList-aot.txt lists 81 tests, the same command w/o RUN_PROBLEM_LISTS lists 6670 tests. > >>> doc/building.html got changed when I ran update-build-docs, I can exclude it from the patch, but it seems it will keep changing every time we run update-build-docs, so I decided to at least bring it up. > > Weird it seems to have removed line-breaks in that paragraph. What platform did you build on? I built on macos. now when I wrote that, I remember pandoc used to produce different results on macos. so I've rerun it on linux on the source w/o my change, and doc/building.html still got changed in the exact same way. > David > ----- > >>> >>> JBS: https://bugs.openjdk.java.net/browse/JDK-8226910 >>> webrev: http://cr.openjdk.java.net/~iignatyev//8226910/webrev.00/index.html >>> >>> Thanks, >>> -- Igor From thomas.schatzl at oracle.com Mon Jul 8 07:00:44 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Mon, 08 Jul 2019 09:00:44 +0200 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects In-Reply-To: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> References: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> Message-ID: Hi, On Fri, 2019-07-05 at 12:19 +0200, Erik ?sterlund wrote: > Hi, > > In the HeapInspection::find_instances_at_safepoint function, the > unsafe heap iteration API (which also walks dead objects) is used to > find objects that are instance of a class, used for concurrent lock > dumping where we find > dead java.util.concurrent.locks.AbstractOwnableSynchronizer objects > and pointer chase to its possibly dead owner threadObj. There is a > comment saying that if this starts crashing because we use CMS, we > should probably change to use the safe_object_iterate() API instead, > which does not include dead objects. > > Arguably, whether CMS is observed to crash or not, we really should > not be walking over dead objects and exposing them anyway. It's not > safe... and it will crash sooner or later. [...] > This will all eventually end up in a glorious crash. So we shouldn't > do this. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8227277 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ looks good. Thomas From matthias.baesken at sap.com Mon Jul 8 07:00:32 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Mon, 8 Jul 2019 07:00:32 +0000 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: <3af1d3d2-b243-95e5-9af7-08304ac15860@oracle.com> References:

<2ba2c9fa-3faa-2e16-59d6-d76412115515@oracle.com> <3fc5baee-e709-3294-c79d-f3c4f94c8a02@oracle.com> <3af1d3d2-b243-95e5-9af7-08304ac15860@oracle.com> Message-ID: Thanks for looking into it and running the tests ! Best regards, Matthias > -----Original Message----- > From: David Holmes > Sent: Samstag, 6. Juli 2019 04:28 > To: Baesken, Matthias ; 'hotspot- > dev at openjdk.java.net' > Subject: Re: RFR: 8226816: add UserHandler calls to event log > > On 6/07/2019 6:54 am, David Holmes wrote: > > Hi Matthias, > > > > On 5/07/2019 10:21 pm, Baesken, Matthias wrote: > >> Hello David , here is another webrev? with? get_signal_name / > >> get_signal_number?? moved to os.cpp : > >> > >> http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.1/ > > > > That looks good - thanks. I'm running it through our test system. > > All passed. > > David > ----- > > > One query, in os.cpp: > > > > + #ifdef _WINDOWS > > +?? {? SIGBREAK,??? "SIGBREAK" }, > > > > Can that be > > > > ?#ifdef SIGBREAK > > ?? {? SIGBREAK,??? "SIGBREAK" }, > > > > like the other cases? > > > > No need for an updated webrev if so. > > > > Thanks, > > David > > ----- > > > >> > >> Best regards, Matthias > >> > >>> > >>> On 4/07/2019 11:06 pm, Baesken, Matthias wrote: > >>>> Hi David,? thanks for looking into this . > >>>> > >>>>> > >>>>> If you add this then we don't need distinct POSIX and non-POSIX > >>>>> versions > >>>>> - the existing os::Posix::get_signal_name etc could all be hoisted > >>>>> into > >>>>> os.cpp and the os class - no? > >>>>> > >>>> > >>>> Should I go for this ? > >>>> The coding is still a little different?? (e.g. is_valid_signal (.. ) > >>>> call? in os_posix ) > >>> but I think it could be done without much trouble (maybe with a few > >>> small > >>> ifdefs ) . > >>> > >>> I think it's worth trying it. > >>> > >>> I have to apologize in advance though as I'm about to disappear on two > >>> weeks vacation so may not be able to follow through on this. > >>> > >>> Thanks, > >>> David > >>> > >> From erik.osterlund at oracle.com Mon Jul 8 10:07:52 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 8 Jul 2019 12:07:52 +0200 Subject: RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls In-Reply-To: <1fb16eb4-59af-7f24-3fdd-56b4a892f82f@oracle.com> References: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> <1fb16eb4-59af-7f24-3fdd-56b4a892f82f@oracle.com> Message-ID: <4474b767-b53b-ac22-3d98-b013e1bdbd08@oracle.com> Hi Dean and Vladimir, the callee->is_method() in the guarantee is there probably to find corrupt memory. So the problem is specifically when performing upcalls from JNI. The call wrapper tries to "quack like an interpreter" and performs i2c calls, failing due to the nmethod being not entrant. Then the subsequent c2i attempt fails again due to clinit barriers. In the template interpreter calls, the clinit barriers have already been taken, but in the JNI upcall path, we don't perform that barrier. So as our current i2c calls can't actually deal with blocking at all (and no safepoints), the right solution seems to be sticking in some clinit barriers into the JavaCalls API, so that when the call is performed, we know the clinit barrier won't be hit. I still think that allowing only one thing to go wrong across an i2c2i call is pretty scary, and I'd love to remove that restriction. Anyway, Vladimir offered to find the right place to put the clinit barrier, so I'm handing this one over. :) Thanks, /Erik On 2019-07-05 23:46, dean.long at oracle.com wrote: > What is callee->is_method() doing?? Like Vladimir, I'm concerned about > pointers to stale metadata. > > dl > > On 7/4/19 8:02 AM, Erik ?sterlund wrote: >> Hi, >> >> The i2c adapter sets a thread-local "callee_target" Method*, which is >> caught (and cleared) by SharedRuntime::handle_wrong_method if the i2c >> call is "bad" (e.g. not_entrant). This error handler forwards >> execution to the callee c2i entry. If the >> SharedRuntime::handle_wrong_method method is called again due to the >> i2c2i call being still bad, then we will crash the VM in the >> following guarantee in SharedRuntime::handle_wrong_method: >> >> Method* callee = thread->callee_target(); >> guarantee(callee != NULL && callee->is_method(), "bad handshake"); >> >> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits >> the new class initialization entry barrier of the c2i adapter. >> The solution is to simply not clear the thread-local "callee_target" >> after handling the first failure, as we can't really know there won't >> be another one. There is no reason to clear this value as nobody else >> reads it than the SharedRuntime::handle_wrong_method handler (and we >> really do want it to be able to read the value as many times as it >> takes until the call goes through). I found some confused clearing of >> this callee_target in JavaThread::oops_do(), with a comment saying >> this is a methodOop that we need to clear to make GC happy or >> something. Seems like old traces of perm gen. So I deleted that too. >> >> I caught this in ZGC where the timing window for hitting this issue >> seems to be wider due to concurrent code cache unloading. But it is >> equally problematic for all GCs. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8227260 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/ >> >> Thanks, >> /Erik > From erik.osterlund at oracle.com Mon Jul 8 10:53:28 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 8 Jul 2019 12:53:28 +0200 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> Message-ID: Any takers? /Erik On 2019-07-01 15:12, Erik ?sterlund wrote: > Hi, > > Today it is up to callers of methods changing state on nmethods like > make_not_entrant(), to know all other possible concurrent attempts to > transition the nmethod, and know that there are no such attempts > trying to make the nmethod more dead. > There have been multiple occurrences of issues where the caller got it > wrong due to the fragile nature of this code. This specific CR deals > with a bug where an OSR nmethod was made not entrant (deopt) and made > unloaded concurrently. > The result of such a race can be that it is first made unloaded and > then made not entrant, making the nmethod go backwards in its state > machine, effectively resurrecting dead nmethods, causing a subsequent > GC to feel awkward (crash). > But I have seen other similar incidents with deopt racing with the > sweeper. These non-monotonicity problems are unnecessary to have. So I > intend to fix the bug by enforcing monotonicity of the nmethod state > machine explicitly, instead of trying to reason about all callers of > these make_* functions. > I swapped the order of unloaded and zombie in the enum as zombies are > strictly more dead than unloaded nmethods. All transitions change in > the direction of increasing deadness and fail if the transition is not > monotonically increasing. > > For ZGC I moved OSR nmethod unlinking to before the unlinking (where > unlinking code belongs), instead of after the handshake (intended for > deleting things safely unlinked). > Strictly speaking, moving the OSR nmethod unlinking removes the racing > between make_not_entrant and make_unloaded, but I still want the > monotonicity guards to make this code more robust. > > I left AOT methods alone. Since they don't die, they don't have > resurrection problems, and hence do not benefit from these guards in > the same way. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8224674 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/ > > Thanks, > /Erik From adam.farley at uk.ibm.com Mon Jul 8 12:45:28 2019 From: adam.farley at uk.ibm.com (Adam Farley8) Date: Mon, 8 Jul 2019 13:45:28 +0100 Subject: RFR: JDK-8227021: VM fails if any sun.boot.library.path paths are longer than JVM_MAXPATHLEN In-Reply-To: <08a6c8a3-bd3e-25db-2460-cea7c8fbb3f3@oracle.com> References: <2c9e6acd-0e79-13c0-23ea-2cef402ee125@oracle.com>

<842ae03e-8574-593e-3ac2-5cc283832be9@oracle.com> <08a6c8a3-bd3e-25db-2460-cea7c8fbb3f3@oracle.com> Message-ID: Hi David, David Holmes wrote on 04/07/2019 22:21:59: > From: David Holmes > To: Adam Farley8 > Cc: hotspot-dev at openjdk.java.net > Date: 04/07/2019 22:22 > Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > paths are longer than JVM_MAXPATHLEN > > Hi Adam, > > On 5/07/2019 2:41 am, Adam Farley8 wrote: > > Hi David, > > > > To detect a too-long path when it's being passed in, the best option > > I can see is to check it in two places: > > Right, but my outstanding question relates to the existing code today. > Where will we detect that a path element is too long? Ahh, right. Right now, the place checking the length of the paths is split_path, the method I modified in oc.cpp. Specifically this bit: ---- if (len > JVM_MAXPATHLEN) { return NULL; } ---- This seemed wrong to me at the time, because it meant that is *any* of the paths was too long, the method splitting up the paths string would return null, and fail to check any of the library locations. If we agree that the correct behaviour would be to throw an error and kill the vm if any of the paths is too long, then the simplest option would be to replace the "return NULL" in that "if" with an error. Something like this, perhaps? --- if (len > JVM_MAXPATHLEN) { vm_exit_during_initialization("java.lang.VirtualMachineError", "One or more of the sun.boot.library.path " "paths has exceeded the maximum path length " "for this system."); } --- With fewer new-lines, of course. ;) > > I'm still not sure whether the VM has the right to dictate behaviour > here or whether this belongs to core-libs. And we need to be very > careful about any change in behaviour. > > > 1) when it's being set initially with the location of libjvm.so, either: > > a)in hotspot/os/[os name]/os_[os name].cpp, right before the call > > to Arguments::set_dll_dir > > or b), in the Arguments::set_dll_dirfunction itself (ideally the > > latter) > > > > 2) when/if the extra paths are being passed in as a parameter, as they > > pass through hotspot/share/runtime/arguments.cpp, right after the line: > > > > --- > > else if (_strcmp_(key, "sun.boot.library.path") == 0)"); > > --- > > > > You're right in that this could slow down startup a little, with > > the length checking, and the potential looping over the -D value > > to check the length of each path. Not a major slowdown though. > > I'm sure Claes would disagree :) > > Apologies in advance as I'm about to disappear for two weeks vacation. > > David > ----- No worries. :) - Adam > > > Best Regards > > > > Adam Farley > > IBM Runtimes > > > > > > David Holmes wrote on 04/07/2019 07:57:14: > > > >> From: David Holmes > >> To: Adam Farley8 > >> Cc: hotspot-dev at openjdk.java.net > >> Date: 04/07/2019 07:58 > >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > >> paths are longer than JVM_MAXPATHLEN > >> > >> Hi Adam, > >> > >> On 4/07/2019 1:42 am, Adam Farley8 wrote: > >> > Hi David, > >> > > >> > I figured it should be elaborate so we can avoid killing the VM > >> > if we don't have to. > >> > > >> > Ultimately, if we have a list of three paths and the last two > >> > are invalid, does it matter so long as all the libraries we need > >> > are in the first path? > >> > >> I prefer not see the users error ignored if we can reasonably detect it. > >> They set the paths for a reason, and if they paths are invalid they > >> probably would like to know. > >> > >> > As to your question "is it in hostpot or JDK code", I presume you > >> > mean in the change set. I'm primarily referring to the hotspot code. > >> > >> No I mean where in the current code will we detect that one of these > >> path elements is too long? > >> > >> > Also, if we end up adopting a "kill the vm if any path is too long" > >> > approach, we still need to change the JDK code, as those currently > >> > seem to want to fail if the total length of the sub.boot.library.path > >> > property is longer than the maximum length of a single path. > >> > > >> > So if you pass in three 100 character paths on Windows, it'll fail > >> > because they add up to more than the 260 character path limit. > >> > >> That seems like a separate bug that should be addressed. :( > >> > >> Thanks, > >> David > >> > >> > Best Regards > >> > > >> > Adam Farley > >> > IBM Runtimes > >> > > >> > > >> > David Holmes wrote on 03/07/2019 08:36:36: > >> > > >> >> From: David Holmes > >> >> To: Adam Farley8 > >> >> Cc: hotspot-dev at openjdk.java.net > >> >> Date: 03/07/2019 08:36 > >> >> Subject: Re: RFR: JDK-8227021: VM fails if any sun.boot.library.path > >> >> paths are longer than JVM_MAXPATHLEN > >> >> > >> >> On 2/07/2019 7:44 pm, Adam Farley8 wrote: > >> >> > Hi David, > >> >> > > >> >> > Thanks for your thoughts. > >> >> > > >> >> > The user should absolutely have immediate feedback, yes, and I agree > >> >> > that "skipping" paths could lead to us loading the wrong library. > >> >> > > >> >> > Perhaps a compromise? We fire off a stderr warning if any > of the paths > >> >> > are too long (without killing the VM), we ignore any path *after* > >> >> > (and including) the first too-long path, and we kill the VM if the > >> >> > first path is too long. > >> >> > >> >> My first though is why be so elaborate and not just fail immediately: > >> >> > >> >> Error occurred during initialization of VM > >> >> One or more sun.boot.library.path elements is too long for this system. > >> >> --- > >> >> > >> >> ? But AFAICS we don't do any sanity checking of the those > paths so this > >> >> would have an impact on startup. > >> >> > >> >> I can't locate where we would detect the too-long path > element, is it in > >> >> hostpot or JDK code? > >> >> > >> >> Thanks, > >> >> David > >> >> ----- > >> >> > >> >> > Warning message example: > >> >> > > >> >> > ---- > >> >> > Warning: One or more sun.boot.library.path paths were too long > >> >> > for this system, and it (along with all subsequent paths) have been > >> >> > ignored. > >> >> > ---- > >> >> > > >> >> > Another addition could be to check the path lengths for the property > >> >> > sooner, thus aborting the VM faster if the default path is too long. > >> >> > > >> >> > Assuming we posit that the VM will always need to load libraries. > >> >> > > >> >> > Best Regards > >> >> > > >> >> > Adam Farley > >> >> > IBM Runtimes > >> >> > > >> >> > > >> >> > David Holmes wrote on 01/07/2019 22:10:45: > >> >> > > >> >> >> From: David Holmes > >> >> >> To: Adam Farley8 , hotspot- > dev at openjdk.java.net > >> >> >> Date: 01/07/2019 22:12 > >> >> >> Subject: Re: RFR: JDK-8227021: VM fails if any > sun.boot.library.path > >> >> >> paths are longer than JVM_MAXPATHLEN > >> >> >> > >> >> >> Hi Adam, > >> >> >> > >> >> >> On 1/07/2019 10:27 pm, Adam Farley8 wrote: > >> >> >> > Hi All, > >> >> >> > > >> >> >> > The title say it all. > >> >> >> > > >> >> >> > If you pass in a value for sun.boot.library.path consisting > >> >> >> > of one or more paths that are too long, then the vm will > >> >> >> > fail to start because it can't load one of the libraries it > >> >> >> > needs (the zip library), despite the fact that the VM > >> >> >> > automatically prepends the default library path to the > >> >> >> > sun.boot.library.path property, using the correct separator > >> >> >> > to divide it from the user-specified path. > >> >> >> > > >> >> >> > So we've got the right path, in the right place, at the > >> >> >> > right time, we just can't *use* it. > >> >> >> > > >> >> >> > I've fixed this by changing the relevant os.cpp code to > >> >> >> > ignore paths that are too long, and to attempt to locate > >> >> >> > the needed library on the other paths (if any are valid). > >> >> >> > >> >> >> As I just added to the bug report I have a different view > of "correct" > >> >> >> here. If you just ignore the long path and keep processing > other short > >> >> >> paths you may find the wrong library. There is a user > error here and > >> >> >> that error should be reported ASAP and in a way that leads > to failure > >> >> >> ASAP. Perhaps we should be more aggressive in aborting the > VMwhen this > >> >> >> is detected? > >> >> >> > >> >> >> David > >> >> >> ----- > >> >> >> > >> >> >> > I've also added functionality to handle the edge case of > >> >> >> > paths that are neeeeeeearly too long, only for a > >> >> >> > sub-path (or file name) to push us over the limit *after* > >> >> >> > the split_path function is done assessing the path length. > >> >> >> > > >> >> >> > I've also changed the code we're overriding, on the assumption > >> >> >> > that someone's still using it somewhere. > >> >> >> > > >> >> >> > Bug: https://urldefense.proofpoint.com/v2/url? > >> >> >> > >> >> > >> > u=https-3A__bugs.openjdk.java.net_browse_JDK-2D8227021&d=DwICaQ&c=jf_iaSHvJObTbx- > >> >> >> siA1ZOg&r=P5m8KWUXJf- > >> >> >> > >> >> > >> > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=xZzQCnv68xd9hJyyK1obSim38eWSRmLPfuR__9ddZWg&e= > >> >> >> > Webrev: https://urldefense.proofpoint.com/v2/url? > >> >> >> > >> >> > >> > u=http-3A__cr.openjdk.java.net_-7Eafarley_8227021_webrev_&d=DwICaQ&c=jf_iaSHvJObTbx- > >> >> >> siA1ZOg&r=P5m8KWUXJf- > >> >> >> > >> >> > >> > CeVJc0hDGD9AQ2LkcXDC0PMV9ntVw5Ho&m=cSTGBGkEsu5yl0haJ6it9egPSgixg7mRei6lBDB5Y3k&s=- > >> >> >> hKU0zUd_0LDT08wTilexgI54EeSgt8xUk97i6V63Bk&e= > >> >> >> > > >> >> >> > Thoughts and impressions welcome. > >> >> >> > > >> >> >> > Best Regards > >> >> >> > > >> >> >> > Adam Farley > >> >> >> > IBM Runtimes > >> >> >> > > >> >> >> > Unless stated otherwise above: > >> >> >> > IBM United Kingdom Limited - Registered in England and > >> Wales with number > >> >> >> > 741598. > >> >> >> > Registered office: PO Box 41, North Harbour, Portsmouth, > >> >> Hampshire PO6 3AU > >> >> >> > > >> >> >> > >> >> > > >> >> > Unless stated otherwise above: > >> >> > IBM United Kingdom Limited - Registered in England and > Wales with number > >> >> > 741598. > >> >> > Registered office: PO Box 41, North Harbour, Portsmouth, > >> Hampshire PO6 3AU > >> >> > >> > > >> > Unless stated otherwise above: > >> > IBM United Kingdom Limited - Registered in England and Wales with number > >> > 741598. > >> > Registered office: PO Box 41, North Harbour, Portsmouth, > Hampshire PO6 3AU > >> > > > > Unless stated otherwise above: > > IBM United Kingdom Limited - Registered in England and Wales with number > > 741598. > > Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU > Unless stated otherwise above: IBM United Kingdom Limited - Registered in England and Wales with number 741598. Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU From coleen.phillimore at oracle.com Mon Jul 8 15:11:42 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 11:11:42 -0400 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects In-Reply-To: References: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> <5904f2e1-9172-1d4f-aa85-54cf29b6cb52@oracle.com> Message-ID: <9b63a1dc-4bb2-7a07-a9a5-758f0a36e487@oracle.com> On 7/5/19 11:26 AM, Erik ?sterlund wrote: > > > On 2019-07-05 13:35, David Holmes wrote: >> Hi Erik, >> >> On 5/07/2019 8:19 pm, Erik ?sterlund wrote: >>> Hi, >>> >>> In the HeapInspection::find_instances_at_safepoint function, the >>> unsafe heap iteration API (which also walks dead objects) is used to >>> find objects that are instance of a class, used for concurrent lock >>> dumping where we find dead >>> java.util.concurrent.locks.AbstractOwnableSynchronizer objects and >>> pointer chase to its possibly dead owner threadObj. There is a >>> comment saying that if this starts crashing because we use CMS, we >>> should probably change to use the safe_object_iterate() API instead, >>> which does not include dead objects. >>> >>> Arguably, whether CMS is observed to crash or not, we really should >>> not be walking over dead objects and exposing them anyway. It's not >>> safe... and it will crash sooner or later. >>> >>> For example, CMS yields to safepoints (including young GCs) while >>> sweeping. This means that both the AbstractOwnableSynchronizer and >>> its owner thread might have died, but while sweeping, we could yield >>> for a young GC that promotes objects overriding the memory of the >>> dead thread object with random primitives, but not yet freeing the >>> dead AbstractOwnableSynchronizer. A subsequent dumping operation >>> could use the heap walker to find the dead >>> AbstractOwnableSynchronizer, and pointer chase into its dead owner >>> thread, which by now has been freed and had its memory clobbered >>> with primitive data. >>> >>> This will all eventually end up in a glorious crash. So we shouldn't >>> do this. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8227277 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ >> >> That seems eminently reasonable. :) > > Thanks! > >> Are there any valid uses for the (unsafe) object_iterate? > > Well... valid might be an overstatement, but I think it probably won't > crash if you don't pointer chase through dead references in dead > objects. We simply can't do that. This change looks good, but I have to echo David's question.? It looks like we have the same thing in jvmtiTagMap, with some out of date comments. hare/prims/jvmtiTagMap.cpp:??? // consider using safe_object_iterate() which avoids perm gen share/prims/jvmtiTagMap.cpp: Universe::heap()->object_iterate(_blk); share/prims/jvmtiTagMap.cpp: Universe::heap()->object_iterate(&blk); Should we eliminate all uses of Universe::heap() version of object_iterate?? It looks like the GCs call it, and it's probably safe in those places, so should not be a virtual function for each GC? Thanks, Coleen > > Thanks, > /Erik > >> Cheers, >> David >> >>> Thanks, >>> /Erik > From erik.osterlund at oracle.com Mon Jul 8 16:27:57 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Mon, 8 Jul 2019 18:27:57 +0200 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects In-Reply-To: <9b63a1dc-4bb2-7a07-a9a5-758f0a36e487@oracle.com> References: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> <5904f2e1-9172-1d4f-aa85-54cf29b6cb52@oracle.com> <9b63a1dc-4bb2-7a07-a9a5-758f0a36e487@oracle.com> Message-ID: Hi Coleen, Thanks for the review. I am 100% for not using the unsafe API in shared code at all. Could we make that change for 14 though? Thanks, /Erik > On 8 Jul 2019, at 17:11, coleen.phillimore at oracle.com wrote: > > > >> On 7/5/19 11:26 AM, Erik ?sterlund wrote: >> >> >>> On 2019-07-05 13:35, David Holmes wrote: >>> Hi Erik, >>> >>>> On 5/07/2019 8:19 pm, Erik ?sterlund wrote: >>>> Hi, >>>> >>>> In the HeapInspection::find_instances_at_safepoint function, the unsafe heap iteration API (which also walks dead objects) is used to find objects that are instance of a class, used for concurrent lock dumping where we find dead java.util.concurrent.locks.AbstractOwnableSynchronizer objects and pointer chase to its possibly dead owner threadObj. There is a comment saying that if this starts crashing because we use CMS, we should probably change to use the safe_object_iterate() API instead, which does not include dead objects. >>>> >>>> Arguably, whether CMS is observed to crash or not, we really should not be walking over dead objects and exposing them anyway. It's not safe... and it will crash sooner or later. >>>> >>>> For example, CMS yields to safepoints (including young GCs) while sweeping. This means that both the AbstractOwnableSynchronizer and its owner thread might have died, but while sweeping, we could yield for a young GC that promotes objects overriding the memory of the dead thread object with random primitives, but not yet freeing the dead AbstractOwnableSynchronizer. A subsequent dumping operation could use the heap walker to find the dead AbstractOwnableSynchronizer, and pointer chase into its dead owner thread, which by now has been freed and had its memory clobbered with primitive data. >>>> >>>> This will all eventually end up in a glorious crash. So we shouldn't do this. >>>> >>>> Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8227277 >>>> >>>> Webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ >>> >>> That seems eminently reasonable. :) >> >> Thanks! >> >>> Are there any valid uses for the (unsafe) object_iterate? >> >> Well... valid might be an overstatement, but I think it probably won't crash if you don't pointer chase through dead references in dead objects. We simply can't do that. > > This change looks good, but I have to echo David's question. It looks like we have the same thing in jvmtiTagMap, with some out of date comments. > > hare/prims/jvmtiTagMap.cpp: // consider using safe_object_iterate() which avoids perm gen > share/prims/jvmtiTagMap.cpp: Universe::heap()->object_iterate(_blk); > share/prims/jvmtiTagMap.cpp: Universe::heap()->object_iterate(&blk); > > Should we eliminate all uses of Universe::heap() version of object_iterate? It looks like the GCs call it, and it's probably safe in those places, so should not be a virtual function for each GC? > > Thanks, > Coleen > >> >> Thanks, >> /Erik >> >>> Cheers, >>> David >>> >>>> Thanks, >>>> /Erik >> > From coleen.phillimore at oracle.com Mon Jul 8 18:44:26 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 14:44:26 -0400 Subject: RFR: 8226816: add UserHandler calls to event log In-Reply-To: References:

<2ba2c9fa-3faa-2e16-59d6-d76412115515@oracle.com> <3fc5baee-e709-3294-c79d-f3c4f94c8a02@oracle.com> <3af1d3d2-b243-95e5-9af7-08304ac15860@oracle.com> Message-ID: <2c7786a5-f6c6-609a-1857-e946da054ba4@oracle.com> This change looks fine. Coleen On 7/8/19 3:00 AM, Baesken, Matthias wrote: > Thanks for looking into it and running the tests ! > > Best regards, Matthias > >> -----Original Message----- >> From: David Holmes >> Sent: Samstag, 6. Juli 2019 04:28 >> To: Baesken, Matthias ; 'hotspot- >> dev at openjdk.java.net' >> Subject: Re: RFR: 8226816: add UserHandler calls to event log >> >> On 6/07/2019 6:54 am, David Holmes wrote: >>> Hi Matthias, >>> >>> On 5/07/2019 10:21 pm, Baesken, Matthias wrote: >>>> Hello David , here is another webrev? with? get_signal_name / >>>> get_signal_number?? moved to os.cpp : >>>> >>>> http://cr.openjdk.java.net/~mbaesken/webrevs/8226816.1/ >>> That looks good - thanks. I'm running it through our test system. >> All passed. >> >> David >> ----- >> >>> One query, in os.cpp: >>> >>> + #ifdef _WINDOWS >>> +?? {? SIGBREAK,??? "SIGBREAK" }, >>> >>> Can that be >>> >>> ?#ifdef SIGBREAK >>> ?? {? SIGBREAK,??? "SIGBREAK" }, >>> >>> like the other cases? >>> >>> No need for an updated webrev if so. >>> >>> Thanks, >>> David >>> ----- >>> >>>> Best regards, Matthias >>>> >>>>> On 4/07/2019 11:06 pm, Baesken, Matthias wrote: >>>>>> Hi David,? thanks for looking into this . >>>>>> >>>>>>> If you add this then we don't need distinct POSIX and non-POSIX >>>>>>> versions >>>>>>> - the existing os::Posix::get_signal_name etc could all be hoisted >>>>>>> into >>>>>>> os.cpp and the os class - no? >>>>>>> >>>>>> Should I go for this ? >>>>>> The coding is still a little different?? (e.g. is_valid_signal (.. ) >>>>>> call? in os_posix ) >>>>> but I think it could be done without much trouble (maybe with a few >>>>> small >>>>> ifdefs ) . >>>>> >>>>> I think it's worth trying it. >>>>> >>>>> I have to apologize in advance though as I'm about to disappear on two >>>>> weeks vacation so may not be able to follow through on this. >>>>> >>>>> Thanks, >>>>> David >>>>> From coleen.phillimore at oracle.com Mon Jul 8 18:45:04 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 14:45:04 -0400 Subject: RFR[13]: 8227277: HeapInspection::find_instances_at_safepoint walks dead objects In-Reply-To: References: <8cb16d70-edc5-8aeb-a06c-7384ff6e55a3@oracle.com> <5904f2e1-9172-1d4f-aa85-54cf29b6cb52@oracle.com> <9b63a1dc-4bb2-7a07-a9a5-758f0a36e487@oracle.com> Message-ID: <744e055e-04b4-968a-3a99-704cea6b568a@oracle.com> On 7/8/19 12:27 PM, Erik Osterlund wrote: > Hi Coleen, > > Thanks for the review. I am 100% for not using the unsafe API in shared code at all. Could we make that change for 14 though? Oh, definitely! Coleen > > Thanks, > /Erik > >> On 8 Jul 2019, at 17:11, coleen.phillimore at oracle.com wrote: >> >> >> >>> On 7/5/19 11:26 AM, Erik ?sterlund wrote: >>> >>> >>>> On 2019-07-05 13:35, David Holmes wrote: >>>> Hi Erik, >>>> >>>>> On 5/07/2019 8:19 pm, Erik ?sterlund wrote: >>>>> Hi, >>>>> >>>>> In the HeapInspection::find_instances_at_safepoint function, the unsafe heap iteration API (which also walks dead objects) is used to find objects that are instance of a class, used for concurrent lock dumping where we find dead java.util.concurrent.locks.AbstractOwnableSynchronizer objects and pointer chase to its possibly dead owner threadObj. There is a comment saying that if this starts crashing because we use CMS, we should probably change to use the safe_object_iterate() API instead, which does not include dead objects. >>>>> >>>>> Arguably, whether CMS is observed to crash or not, we really should not be walking over dead objects and exposing them anyway. It's not safe... and it will crash sooner or later. >>>>> >>>>> For example, CMS yields to safepoints (including young GCs) while sweeping. This means that both the AbstractOwnableSynchronizer and its owner thread might have died, but while sweeping, we could yield for a young GC that promotes objects overriding the memory of the dead thread object with random primitives, but not yet freeing the dead AbstractOwnableSynchronizer. A subsequent dumping operation could use the heap walker to find the dead AbstractOwnableSynchronizer, and pointer chase into its dead owner thread, which by now has been freed and had its memory clobbered with primitive data. >>>>> >>>>> This will all eventually end up in a glorious crash. So we shouldn't do this. >>>>> >>>>> Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8227277 >>>>> >>>>> Webrev: >>>>> http://cr.openjdk.java.net/~eosterlund/8227277/webrev.00/ >>>> That seems eminently reasonable. :) >>> Thanks! >>> >>>> Are there any valid uses for the (unsafe) object_iterate? >>> Well... valid might be an overstatement, but I think it probably won't crash if you don't pointer chase through dead references in dead objects. We simply can't do that. >> This change looks good, but I have to echo David's question. It looks like we have the same thing in jvmtiTagMap, with some out of date comments. >> >> hare/prims/jvmtiTagMap.cpp: // consider using safe_object_iterate() which avoids perm gen >> share/prims/jvmtiTagMap.cpp: Universe::heap()->object_iterate(_blk); >> share/prims/jvmtiTagMap.cpp: Universe::heap()->object_iterate(&blk); >> >> Should we eliminate all uses of Universe::heap() version of object_iterate? It looks like the GCs call it, and it's probably safe in those places, so should not be a virtual function for each GC? >> >> Thanks, >> Coleen >> >>> Thanks, >>> /Erik >>> >>>> Cheers, >>>> David >>>> >>>>> Thanks, >>>>> /Erik From coleen.phillimore at oracle.com Mon Jul 8 19:34:27 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 15:34:27 -0400 Subject: RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls In-Reply-To: References: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> Message-ID: One comment On 7/5/19 7:14 AM, Vladimir Ivanov wrote: > Thanks for diagnosing the issue, Erik! > > Can you elaborate, please, on relation to clinit barrier in c2i? > > I don't see how it is possible to hit clinit barrier during i2c2i > transition. Template interpreter has clinit barrier as part of > invokestatic handler [1], so by the time c2i is reached, proper checks > should be already performed and all the conditions to pass the barrier > should be met. > > If SR::handle_wrong_method is called from clinit barrier in c2i during > i2c2i, it signals about a bug in the new clinit logic: somehow clinit > barrier is bypassed in interpreter. > > Are you sure it is not related to upcalls from native code > (caller_frame.is_entry_frame() [1])? > > Also, SR::handle_wrong_method() calls coming from clinit barriers > shouldn't hit the fast path w/ callee_target(), because it bypasses > the actual initialization check happening during call site re-resolution. > > Best regards, > Vladimir Ivanov > > PS: regarding clearing JavaThread::_callee_target in > JavaThread::oops_do(), I'd prefer to keep it and limit the exposure of > a stale Method*. But it's just a matter of preference and I don't have > a strong opinion here. The callee_method in JavaThread::oops_do() won't keep the Method* alive.? I'm not sure what keeps it alive in the callee_method field.?? Can there be a GC now with some Method* that you need there? In that case, you should put the callee_method->method_holder()->java_mirror() in a new field in JavaThread::_callee_mirror or something, and have JavaThread::oops_do walk that.? Also, redefinition might have to keep the callee_method alive in the metadata walk, but you can file a separate bug for that if I'm not too confused. Coleen > > [1] > src/hotspot/share/runtime/sharedRuntime.cpp: > > JRT_BLOCK_ENTRY(address, > SharedRuntime::handle_wrong_method(JavaThread* thread)) > ... > ? if (caller_frame.is_interpreted_frame() || > ????? caller_frame.is_entry_frame()) { > > > On 04/07/2019 18:02, Erik ?sterlund wrote: >> Hi, >> >> The i2c adapter sets a thread-local "callee_target" Method*, which is >> caught (and cleared) by SharedRuntime::handle_wrong_method if the i2c >> call is "bad" (e.g. not_entrant). This error handler forwards >> execution to the callee c2i entry. If the >> SharedRuntime::handle_wrong_method method is called again due to the >> i2c2i call being still bad, then we will crash the VM in the >> following guarantee in SharedRuntime::handle_wrong_method: >> >> Method* callee = thread->callee_target(); >> guarantee(callee != NULL && callee->is_method(), "bad handshake"); >> >> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits >> the new class initialization entry barrier of the c2i adapter. >> The solution is to simply not clear the thread-local "callee_target" >> after handling the first failure, as we can't really know there won't >> be another one. There is no reason to clear this value as nobody else >> reads it than the SharedRuntime::handle_wrong_method handler (and we >> really do want it to be able to read the value as many times as it >> takes until the call goes through). I found some confused clearing of >> this callee_target in JavaThread::oops_do(), with a comment saying >> this is a methodOop that we need to clear to make GC happy or >> something. Seems like old traces of perm gen. So I deleted that too. >> >> I caught this in ZGC where the timing window for hitting this issue >> seems to be wider due to concurrent code cache unloading. But it is >> equally problematic for all GCs. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8227260 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/ >> >> Thanks, >> /Erik From coleen.phillimore at oracle.com Mon Jul 8 19:46:48 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 15:46:48 -0400 Subject: RFR[13]: 8227260: Can't deal with SharedRuntime::handle_wrong_method triggering more than once for interpreter calls In-Reply-To: <4474b767-b53b-ac22-3d98-b013e1bdbd08@oracle.com> References: <8d183958-197c-600d-edda-22121a8eb677@oracle.com> <1fb16eb4-59af-7f24-3fdd-56b4a892f82f@oracle.com> <4474b767-b53b-ac22-3d98-b013e1bdbd08@oracle.com> Message-ID: <228ee794-8f0d-1834-93d5-8ef8decf7811@oracle.com> On 7/8/19 6:07 AM, Erik ?sterlund wrote: > Hi Dean and Vladimir, > > the callee->is_method() in the guarantee is there probably to find > corrupt memory. > > So the problem is specifically when performing upcalls from JNI. The > call wrapper tries to "quack like an interpreter" and performs i2c > calls, failing due to the nmethod being not entrant. Then the > subsequent c2i attempt fails again due to clinit barriers. In the > template interpreter calls, the clinit barriers have already been > taken, but in the JNI upcall path, we don't perform that barrier. > > So as our current i2c calls can't actually deal with blocking at all > (and no safepoints), the right solution seems to be sticking in some > clinit barriers into the JavaCalls API, so that when the call is > performed, we know the clinit barrier won't be hit. Ok, you *cannot* block with callee_method in JavaThread.? Ignore my last mail!? That comment in oops_do was a leftover from permgen. Thanks, Coleen > > I still think that allowing only one thing to go wrong across an i2c2i > call is pretty scary, and I'd love to remove that restriction. > > Anyway, Vladimir offered to find the right place to put the clinit > barrier, so I'm handing this one over. :) > > Thanks, > /Erik > > On 2019-07-05 23:46, dean.long at oracle.com wrote: >> What is callee->is_method() doing? Like Vladimir, I'm concerned about >> pointers to stale metadata. >> >> dl >> >> On 7/4/19 8:02 AM, Erik ?sterlund wrote: >>> Hi, >>> >>> The i2c adapter sets a thread-local "callee_target" Method*, which >>> is caught (and cleared) by SharedRuntime::handle_wrong_method if the >>> i2c call is "bad" (e.g. not_entrant). This error handler forwards >>> execution to the callee c2i entry. If the >>> SharedRuntime::handle_wrong_method method is called again due to the >>> i2c2i call being still bad, then we will crash the VM in the >>> following guarantee in SharedRuntime::handle_wrong_method: >>> >>> Method* callee = thread->callee_target(); >>> guarantee(callee != NULL && callee->is_method(), "bad handshake"); >>> >>> Unfortunately, the c2i entry can indeed fail again if it, e.g., hits >>> the new class initialization entry barrier of the c2i adapter. >>> The solution is to simply not clear the thread-local "callee_target" >>> after handling the first failure, as we can't really know there >>> won't be another one. There is no reason to clear this value as >>> nobody else reads it than the SharedRuntime::handle_wrong_method >>> handler (and we really do want it to be able to read the value as >>> many times as it takes until the call goes through). I found some >>> confused clearing of this callee_target in JavaThread::oops_do(), >>> with a comment saying this is a methodOop that we need to clear to >>> make GC happy or something. Seems like old traces of perm gen. So I >>> deleted that too. >>> >>> I caught this in ZGC where the timing window for hitting this issue >>> seems to be wider due to concurrent code cache unloading. But it is >>> equally problematic for all GCs. >>> >>> Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8227260 >>> >>> Webrev: >>> http://cr.openjdk.java.net/~eosterlund/8227260/webrev.00/ >>> >>> Thanks, >>> /Erik >> > From coleen.phillimore at oracle.com Mon Jul 8 21:19:12 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 8 Jul 2019 17:19:12 -0400 Subject: RFR (S) 8222446: assert(C->env()->system_dictionary_modification_counter_changed()) failed: Must invalidate if TypeFuncs differ In-Reply-To: <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com> References: <703b29a2-71a6-27d7-99e3-d54216332c33@oracle.com> <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com> Message-ID: Hi,? From offline discussions, I updated the code in Parse::do_exits() to make the method not compilable if the return types don't match.? Otherwise it would revert a change that Volker made to prevent infinite compilation loops.? It seems that the compiler code has been changed to no longer exercise this path (ShouldNotReachHere never reached), so keeping the conservative path seemed safest. open webrev at http://cr.openjdk.java.net/~coleenp/2019/8222446.02/webrev I changed the comment Dean, it might need help rewording. Tested with tier1-8. Thanks, Coleen On 6/21/19 4:44 PM, coleen.phillimore at oracle.com wrote: > > Dean,? Thank you for reviewing and for your help and discussion of > this change. > > On 6/21/19 3:48 PM, dean.long at oracle.com wrote: >> For the most part, this looks good.? I only have a couple concerns: >> >> 1) The distinction in both validate_compile_task_dependencies >> functions between "dependencies failed" and "dependencies invalid" is >> even more fuzzy after this change.? I suggest filing an RFE to remove >> this distinction. > > Yes, in jvmciRuntime I had to carefully preserve this logic or some > tests failed.?? I'll file an RFE for you. >> >> 2) In Parse::do_exits(), we don't know that concurrent class loading >> didn't cause the problem.? We should be optimistic and allow the retry: >> C->record_failure(C2Compiler::retry_class_loading_during_parsing()); >> rather than more drastic >> ??? C->record_method_not_compilable >> This is actually what the code did in an earlier revision. > > Erik and I were trying to guess which was the right answer.? It seemed > too lucky that you'd do concurrent class loading in this time period, > so we picked the more drastic answer, but I tested both.? So I'll > change it to the optimistic answer. > > Thanks! > Coleen >> >> dl >> >> On 6/20/19 10:28 AM, coleen.phillimore at oracle.com wrote: >>> Summary: Remove SystemDictionary::modification_counter optimization >>> >>> See bug for more details.? To avoid the assert in the bug report, >>> it's necessary to also increase the modification counter for class >>> unloading, which needs special code for concurrent class unloading. >>> The global counter is used to verify that validate_dependencies() >>> gets the same answer based on the subklass hierarchy, but provides a >>> quick exit in production mode.? Removing it may allow more nmethods >>> to be created that don't depend on the classes that may be loaded >>> while the Method is being compiled. Performance testing was done on >>> this with no change in performance.? Also investigated the >>> breakpoint setting code which incremented the modification counter. >>> Dependent compilations are invalidated using evol_method >>> dependencies, so updating the system dictionary modification counter >>> isn't unnecessary. >>> >>> Tested with hs-tier1-8 testing, and CTW, and local jvmti/jdi/jdwp >>> test runs with -Xcomp. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8222446.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8222446 >>> >>> Thanks, >>> Coleen >> > From tobias.hartmann at oracle.com Tue Jul 9 05:13:06 2019 From: tobias.hartmann at oracle.com (Tobias Hartmann) Date: Tue, 9 Jul 2019 07:13:06 +0200 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> Message-ID: <611a0a23-70ca-6bf7-8b35-ab2c1872e360@oracle.com> Hi Erik, this looks reasonable to me but a second review would be good. Please test thoroughly before pushing. Best regards, Tobias On 01.07.19 15:12, Erik ?sterlund wrote: > Hi, > > Today it is up to callers of methods changing state on nmethods like make_not_entrant(), to know all > other possible concurrent attempts to transition the nmethod, and know that there are no such > attempts trying to make the nmethod more dead. > There have been multiple occurrences of issues where the caller got it wrong due to the fragile > nature of this code. This specific CR deals with a bug where an OSR nmethod was made not entrant > (deopt) and made unloaded concurrently. > The result of such a race can be that it is first made unloaded and then made not entrant, making > the nmethod go backwards in its state machine, effectively resurrecting dead nmethods, causing a > subsequent GC to feel awkward (crash). > But I have seen other similar incidents with deopt racing with the sweeper. These non-monotonicity > problems are unnecessary to have. So I intend to fix the bug by enforcing monotonicity of the > nmethod state machine explicitly, instead of trying to reason about all callers of these make_* > functions. > I swapped the order of unloaded and zombie in the enum as zombies are strictly more dead than > unloaded nmethods. All transitions change in the direction of increasing deadness and fail if the > transition is not monotonically increasing. > > For ZGC I moved OSR nmethod unlinking to before the unlinking (where unlinking code belongs), > instead of after the handshake (intended for deleting things safely unlinked). > Strictly speaking, moving the OSR nmethod unlinking removes the racing between make_not_entrant and > make_unloaded, but I still want the monotonicity guards to make this code more robust. > > I left AOT methods alone. Since they don't die, they don't have resurrection problems, and hence do > not benefit from these guards in the same way. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8224674 > > Webrev: > http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/ > > Thanks, > /Erik From erik.osterlund at oracle.com Tue Jul 9 06:30:49 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Tue, 9 Jul 2019 08:30:49 +0200 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <611a0a23-70ca-6bf7-8b35-ab2c1872e360@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> <611a0a23-70ca-6bf7-8b35-ab2c1872e360@oracle.com> Message-ID: Hi Tobias, Thanks for the review. /Erik > On 9 Jul 2019, at 07:13, Tobias Hartmann wrote: > > Hi Erik, > > this looks reasonable to me but a second review would be good. > > Please test thoroughly before pushing. > > Best regards, > Tobias > >> On 01.07.19 15:12, Erik ?sterlund wrote: >> Hi, >> >> Today it is up to callers of methods changing state on nmethods like make_not_entrant(), to know all >> other possible concurrent attempts to transition the nmethod, and know that there are no such >> attempts trying to make the nmethod more dead. >> There have been multiple occurrences of issues where the caller got it wrong due to the fragile >> nature of this code. This specific CR deals with a bug where an OSR nmethod was made not entrant >> (deopt) and made unloaded concurrently. >> The result of such a race can be that it is first made unloaded and then made not entrant, making >> the nmethod go backwards in its state machine, effectively resurrecting dead nmethods, causing a >> subsequent GC to feel awkward (crash). >> But I have seen other similar incidents with deopt racing with the sweeper. These non-monotonicity >> problems are unnecessary to have. So I intend to fix the bug by enforcing monotonicity of the >> nmethod state machine explicitly, instead of trying to reason about all callers of these make_* >> functions. >> I swapped the order of unloaded and zombie in the enum as zombies are strictly more dead than >> unloaded nmethods. All transitions change in the direction of increasing deadness and fail if the >> transition is not monotonically increasing. >> >> For ZGC I moved OSR nmethod unlinking to before the unlinking (where unlinking code belongs), >> instead of after the handshake (intended for deleting things safely unlinked). >> Strictly speaking, moving the OSR nmethod unlinking removes the racing between make_not_entrant and >> make_unloaded, but I still want the monotonicity guards to make this code more robust. >> >> I left AOT methods alone. Since they don't die, they don't have resurrection problems, and hence do >> not benefit from these guards in the same way. >> >> Bug: >> https://bugs.openjdk.java.net/browse/JDK-8224674 >> >> Webrev: >> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/ >> >> Thanks, >> /Erik From erik.osterlund at oracle.com Tue Jul 9 09:15:18 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Tue, 9 Jul 2019 11:15:18 +0200 Subject: RFR (S) 8222446: assert(C->env()->system_dictionary_modification_counter_changed()) failed: Must invalidate if TypeFuncs differ In-Reply-To: References: <703b29a2-71a6-27d7-99e3-d54216332c33@oracle.com> <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com> Message-ID: Hi Coleen, I like the counter removal. This looks good. Thanks for digging into this and fixing it! /Erik On 2019-07-08 23:19, coleen.phillimore at oracle.com wrote: > > Hi,? From offline discussions, I updated the code in Parse::do_exits() > to make the method not compilable if the return types don't match.? > Otherwise it would revert a change that Volker made to prevent > infinite compilation loops.? It seems that the compiler code has been > changed to no longer exercise this path (ShouldNotReachHere never > reached), so keeping the conservative path seemed safest. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8222446.02/webrev > > I changed the comment Dean, it might need help rewording. > > Tested with tier1-8. > > Thanks, > Coleen > > On 6/21/19 4:44 PM, coleen.phillimore at oracle.com wrote: >> >> Dean,? Thank you for reviewing and for your help and discussion of >> this change. >> >> On 6/21/19 3:48 PM, dean.long at oracle.com wrote: >>> For the most part, this looks good.? I only have a couple concerns: >>> >>> 1) The distinction in both validate_compile_task_dependencies >>> functions between "dependencies failed" and "dependencies invalid" >>> is even more fuzzy after this change.? I suggest filing an RFE to >>> remove this distinction. >> >> Yes, in jvmciRuntime I had to carefully preserve this logic or some >> tests failed.?? I'll file an RFE for you. >>> >>> 2) In Parse::do_exits(), we don't know that concurrent class loading >>> didn't cause the problem.? We should be optimistic and allow the retry: >>> C->record_failure(C2Compiler::retry_class_loading_during_parsing()); >>> rather than more drastic >>> ??? C->record_method_not_compilable >>> This is actually what the code did in an earlier revision. >> >> Erik and I were trying to guess which was the right answer.? It >> seemed too lucky that you'd do concurrent class loading in this time >> period, so we picked the more drastic answer, but I tested both.? So >> I'll change it to the optimistic answer. >> >> Thanks! >> Coleen >>> >>> dl >>> >>> On 6/20/19 10:28 AM, coleen.phillimore at oracle.com wrote: >>>> Summary: Remove SystemDictionary::modification_counter optimization >>>> >>>> See bug for more details.? To avoid the assert in the bug report, >>>> it's necessary to also increase the modification counter for class >>>> unloading, which needs special code for concurrent class unloading. >>>> The global counter is used to verify that validate_dependencies() >>>> gets the same answer based on the subklass hierarchy, but provides >>>> a quick exit in production mode.? Removing it may allow more >>>> nmethods to be created that don't depend on the classes that may be >>>> loaded while the Method is being compiled. Performance testing was >>>> done on this with no change in performance. Also investigated the >>>> breakpoint setting code which incremented the modification counter. >>>> Dependent compilations are invalidated using evol_method >>>> dependencies, so updating the system dictionary modification >>>> counter isn't unnecessary. >>>> >>>> Tested with hs-tier1-8 testing, and CTW, and local jvmti/jdi/jdwp >>>> test runs with -Xcomp. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8222446.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8222446 >>>> >>>> Thanks, >>>> Coleen >>> >> > From matthias.baesken at sap.com Tue Jul 9 11:36:25 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 9 Jul 2019 11:36:25 +0000 Subject: RFR: 8226816: add UserHandler calls to event log Message-ID: Hi Coleen, thanks for the review . We discussed a bit internally about this and still have some concerns in corner cases / "room for improvement" so I'll not push it immediately . Best regards, Matthias > > Message: 1 > Date: Mon, 8 Jul 2019 14:44:26 -0400 > From: coleen.phillimore at oracle.com > To: hotspot-dev at openjdk.java.net > Subject: Re: RFR: 8226816: add UserHandler calls to event log > Message-ID: <2c7786a5-f6c6-609a-1857-e946da054ba4 at oracle.com> > Content-Type: text/plain; charset=utf-8; format=flowed > > This change looks fine. > Coleen > > On 7/8/19 3:00 AM, Baesken, Matthias wrote: > > Thanks for looking into it and running the tests ! > > > > Best regards, Matthias > > From coleen.phillimore at oracle.com Tue Jul 9 14:06:14 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 9 Jul 2019 10:06:14 -0400 Subject: RFR (S) 8222446: assert(C->env()->system_dictionary_modification_counter_changed()) failed: Must invalidate if TypeFuncs differ In-Reply-To: References: <703b29a2-71a6-27d7-99e3-d54216332c33@oracle.com> <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com>

Message-ID: <2f9276e7-5187-7c30-f55e-13f5e44524da@oracle.com> Thanks, Erik! Coleen On 7/9/19 5:15 AM, Erik ?sterlund wrote: > Hi Coleen, > > I like the counter removal. This looks good. Thanks for digging into > this and fixing it! > > /Erik > > On 2019-07-08 23:19, coleen.phillimore at oracle.com wrote: >> >> Hi,? From offline discussions, I updated the code in >> Parse::do_exits() to make the method not compilable if the return >> types don't match.? Otherwise it would revert a change that Volker >> made to prevent infinite compilation loops.? It seems that the >> compiler code has been changed to no longer exercise this path >> (ShouldNotReachHere never reached), so keeping the conservative path >> seemed safest. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8222446.02/webrev >> >> I changed the comment Dean, it might need help rewording. >> >> Tested with tier1-8. >> >> Thanks, >> Coleen >> >> On 6/21/19 4:44 PM, coleen.phillimore at oracle.com wrote: >>> >>> Dean,? Thank you for reviewing and for your help and discussion of >>> this change. >>> >>> On 6/21/19 3:48 PM, dean.long at oracle.com wrote: >>>> For the most part, this looks good.? I only have a couple concerns: >>>> >>>> 1) The distinction in both validate_compile_task_dependencies >>>> functions between "dependencies failed" and "dependencies invalid" >>>> is even more fuzzy after this change.? I suggest filing an RFE to >>>> remove this distinction. >>> >>> Yes, in jvmciRuntime I had to carefully preserve this logic or some >>> tests failed.?? I'll file an RFE for you. >>>> >>>> 2) In Parse::do_exits(), we don't know that concurrent class >>>> loading didn't cause the problem.? We should be optimistic and >>>> allow the retry: >>>> C->record_failure(C2Compiler::retry_class_loading_during_parsing()); >>>> rather than more drastic >>>> ??? C->record_method_not_compilable >>>> This is actually what the code did in an earlier revision. >>> >>> Erik and I were trying to guess which was the right answer. It >>> seemed too lucky that you'd do concurrent class loading in this time >>> period, so we picked the more drastic answer, but I tested both.? So >>> I'll change it to the optimistic answer. >>> >>> Thanks! >>> Coleen >>>> >>>> dl >>>> >>>> On 6/20/19 10:28 AM, coleen.phillimore at oracle.com wrote: >>>>> Summary: Remove SystemDictionary::modification_counter optimization >>>>> >>>>> See bug for more details.? To avoid the assert in the bug report, >>>>> it's necessary to also increase the modification counter for class >>>>> unloading, which needs special code for concurrent class >>>>> unloading. The global counter is used to verify that >>>>> validate_dependencies() gets the same answer based on the subklass >>>>> hierarchy, but provides a quick exit in production mode.? Removing >>>>> it may allow more nmethods to be created that don't depend on the >>>>> classes that may be loaded while the Method is being compiled. >>>>> Performance testing was done on this with no change in >>>>> performance. Also investigated the breakpoint setting code which >>>>> incremented the modification counter. Dependent compilations are >>>>> invalidated using evol_method dependencies, so updating the system >>>>> dictionary modification counter isn't unnecessary. >>>>> >>>>> Tested with hs-tier1-8 testing, and CTW, and local jvmti/jdi/jdwp >>>>> test runs with -Xcomp. >>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/2019/8222446.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8222446 >>>>> >>>>> Thanks, >>>>> Coleen >>>> >>> >> > From dean.long at oracle.com Tue Jul 9 21:06:44 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 9 Jul 2019 14:06:44 -0700 Subject: RFR (S) 8222446: assert(C->env()->system_dictionary_modification_counter_changed()) failed: Must invalidate if TypeFuncs differ In-Reply-To: References: <703b29a2-71a6-27d7-99e3-d54216332c33@oracle.com> <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com> Message-ID: <4db8e49f-36f4-eb8f-2e6b-34f9e532fbdf@oracle.com> The updated comment sounds good.? Now that you have removed the only place that was failing with retry_class_loading_during_parsing(), we should be able to remove that method and its uses.? That gets rid of the only way to "retry forever" vs the remaining and presumably safe "down-grade and retry just once more".? Or you can file an RFE to clean that up. dl On 7/8/19 2:19 PM, coleen.phillimore at oracle.com wrote: > > Hi,? From offline discussions, I updated the code in Parse::do_exits() > to make the method not compilable if the return types don't match.? > Otherwise it would revert a change that Volker made to prevent > infinite compilation loops.? It seems that the compiler code has been > changed to no longer exercise this path (ShouldNotReachHere never > reached), so keeping the conservative path seemed safest. > > open webrev at http://cr.openjdk.java.net/~coleenp/2019/8222446.02/webrev > > I changed the comment Dean, it might need help rewording. > > Tested with tier1-8. > > Thanks, > Coleen > > On 6/21/19 4:44 PM, coleen.phillimore at oracle.com wrote: >> >> Dean,? Thank you for reviewing and for your help and discussion of >> this change. >> >> On 6/21/19 3:48 PM, dean.long at oracle.com wrote: >>> For the most part, this looks good.? I only have a couple concerns: >>> >>> 1) The distinction in both validate_compile_task_dependencies >>> functions between "dependencies failed" and "dependencies invalid" >>> is even more fuzzy after this change.? I suggest filing an RFE to >>> remove this distinction. >> >> Yes, in jvmciRuntime I had to carefully preserve this logic or some >> tests failed.?? I'll file an RFE for you. >>> >>> 2) In Parse::do_exits(), we don't know that concurrent class loading >>> didn't cause the problem.? We should be optimistic and allow the retry: >>> C->record_failure(C2Compiler::retry_class_loading_during_parsing()); >>> rather than more drastic >>> ??? C->record_method_not_compilable >>> This is actually what the code did in an earlier revision. >> >> Erik and I were trying to guess which was the right answer.? It >> seemed too lucky that you'd do concurrent class loading in this time >> period, so we picked the more drastic answer, but I tested both.? So >> I'll change it to the optimistic answer. >> >> Thanks! >> Coleen >>> >>> dl >>> >>> On 6/20/19 10:28 AM, coleen.phillimore at oracle.com wrote: >>>> Summary: Remove SystemDictionary::modification_counter optimization >>>> >>>> See bug for more details.? To avoid the assert in the bug report, >>>> it's necessary to also increase the modification counter for class >>>> unloading, which needs special code for concurrent class unloading. >>>> The global counter is used to verify that validate_dependencies() >>>> gets the same answer based on the subklass hierarchy, but provides >>>> a quick exit in production mode.? Removing it may allow more >>>> nmethods to be created that don't depend on the classes that may be >>>> loaded while the Method is being compiled. Performance testing was >>>> done on this with no change in performance. Also investigated the >>>> breakpoint setting code which incremented the modification counter. >>>> Dependent compilations are invalidated using evol_method >>>> dependencies, so updating the system dictionary modification >>>> counter isn't unnecessary. >>>> >>>> Tested with hs-tier1-8 testing, and CTW, and local jvmti/jdi/jdwp >>>> test runs with -Xcomp. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8222446.01/webrev >>>> bug link https://bugs.openjdk.java.net/browse/JDK-8222446 >>>> >>>> Thanks, >>>> Coleen >>> >> > From coleen.phillimore at oracle.com Tue Jul 9 21:16:10 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Tue, 9 Jul 2019 17:16:10 -0400 Subject: RFR (S) 8222446: assert(C->env()->system_dictionary_modification_counter_changed()) failed: Must invalidate if TypeFuncs differ In-Reply-To: <4db8e49f-36f4-eb8f-2e6b-34f9e532fbdf@oracle.com> References: <703b29a2-71a6-27d7-99e3-d54216332c33@oracle.com> <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com> <4db8e49f-36f4-eb8f-2e6b-34f9e532fbdf@oracle.com> Message-ID: <3616e932-6b11-245e-14fa-94394716fa6d@oracle.com> On 7/9/19 5:06 PM, dean.long at oracle.com wrote: > The updated comment sounds good.? Now that you have removed the only > place that was failing with retry_class_loading_during_parsing(), we > should be able to remove that method and its uses.? That gets rid of > the only way to "retry forever" vs the remaining and presumably safe > "down-grade and retry just once more".? Or you can file an RFE to > clean that up. Thanks Dean.? I noticed that C2Compiler::retry_class_loading_during_parsing()); is now not used with my change but didn't want to clean it up with this change.? I'll file an RFE to clean it up (or find some other use for it in the compiler code).? What is the remaining "downgrade and retry just once more" option? Thanks for the help! Coleen > > dl > > On 7/8/19 2:19 PM, coleen.phillimore at oracle.com wrote: >> >> Hi,? From offline discussions, I updated the code in >> Parse::do_exits() to make the method not compilable if the return >> types don't match.? Otherwise it would revert a change that Volker >> made to prevent infinite compilation loops.? It seems that the >> compiler code has been changed to no longer exercise this path >> (ShouldNotReachHere never reached), so keeping the conservative path >> seemed safest. >> >> open webrev at >> http://cr.openjdk.java.net/~coleenp/2019/8222446.02/webrev >> >> I changed the comment Dean, it might need help rewording. >> >> Tested with tier1-8. >> >> Thanks, >> Coleen >> >> On 6/21/19 4:44 PM, coleen.phillimore at oracle.com wrote: >>> >>> Dean,? Thank you for reviewing and for your help and discussion of >>> this change. >>> >>> On 6/21/19 3:48 PM, dean.long at oracle.com wrote: >>>> For the most part, this looks good.? I only have a couple concerns: >>>> >>>> 1) The distinction in both validate_compile_task_dependencies >>>> functions between "dependencies failed" and "dependencies invalid" >>>> is even more fuzzy after this change.? I suggest filing an RFE to >>>> remove this distinction. >>> >>> Yes, in jvmciRuntime I had to carefully preserve this logic or some >>> tests failed.?? I'll file an RFE for you. >>>> >>>> 2) In Parse::do_exits(), we don't know that concurrent class >>>> loading didn't cause the problem.? We should be optimistic and >>>> allow the retry: >>>> C->record_failure(C2Compiler::retry_class_loading_during_parsing()); >>>> rather than more drastic >>>> ??? C->record_method_not_compilable >>>> This is actually what the code did in an earlier revision. >>> >>> Erik and I were trying to guess which was the right answer. It >>> seemed too lucky that you'd do concurrent class loading in this time >>> period, so we picked the more drastic answer, but I tested both.? So >>> I'll change it to the optimistic answer. >>> >>> Thanks! >>> Coleen >>>> >>>> dl >>>> >>>> On 6/20/19 10:28 AM, coleen.phillimore at oracle.com wrote: >>>>> Summary: Remove SystemDictionary::modification_counter optimization >>>>> >>>>> See bug for more details.? To avoid the assert in the bug report, >>>>> it's necessary to also increase the modification counter for class >>>>> unloading, which needs special code for concurrent class >>>>> unloading. The global counter is used to verify that >>>>> validate_dependencies() gets the same answer based on the subklass >>>>> hierarchy, but provides a quick exit in production mode.? Removing >>>>> it may allow more nmethods to be created that don't depend on the >>>>> classes that may be loaded while the Method is being compiled. >>>>> Performance testing was done on this with no change in >>>>> performance. Also investigated the breakpoint setting code which >>>>> incremented the modification counter. Dependent compilations are >>>>> invalidated using evol_method dependencies, so updating the system >>>>> dictionary modification counter isn't unnecessary. >>>>> >>>>> Tested with hs-tier1-8 testing, and CTW, and local jvmti/jdi/jdwp >>>>> test runs with -Xcomp. >>>>> >>>>> open webrev at >>>>> http://cr.openjdk.java.net/~coleenp/2019/8222446.01/webrev >>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8222446 >>>>> >>>>> Thanks, >>>>> Coleen >>>> >>> >> > From dean.long at oracle.com Tue Jul 9 21:31:54 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 9 Jul 2019 14:31:54 -0700 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> Message-ID: On 7/1/19 6:12 AM, Erik ?sterlund wrote: > For ZGC I moved OSR nmethod unlinking to before the unlinking (where > unlinking code belongs), instead of after the handshake (intended for > deleting things safely unlinked). > Strictly speaking, moving the OSR nmethod unlinking removes the racing > between make_not_entrant and make_unloaded, but I still want the > monotonicity guards to make this code more robust. I see where you added OSR nmethod unlinking, but not where you removed it, so it's not obvious it was a "move". Would it make sense for nmethod::unlink_from_method() to do the OSR unlinking, or to assert that it has already been done? The new bailout in the middle of nmethod::make_not_entrant_or_zombie() worries me a little, because the code up to that point has side-effects, and we could be bailing out in an unexpected state. dl From dean.long at oracle.com Tue Jul 9 21:40:41 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Tue, 9 Jul 2019 14:40:41 -0700 Subject: RFR (S) 8222446: assert(C->env()->system_dictionary_modification_counter_changed()) failed: Must invalidate if TypeFuncs differ In-Reply-To: <3616e932-6b11-245e-14fa-94394716fa6d@oracle.com> References: <703b29a2-71a6-27d7-99e3-d54216332c33@oracle.com> <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com> <4db8e49f-36f4-eb8f-2e6b-34f9e532fbdf@oracle.com> <3616e932-6b11-245e-14fa-94394716fa6d@oracle.com> Message-ID: On 7/9/19 2:16 PM, coleen.phillimore at oracle.com wrote: > > > On 7/9/19 5:06 PM, dean.long at oracle.com wrote: >> The updated comment sounds good.? Now that you have removed the only >> place that was failing with retry_class_loading_during_parsing(), we >> should be able to remove that method and its uses.? That gets rid of >> the only way to "retry forever" vs the remaining and presumably safe >> "down-grade and retry just once more".? Or you can file an RFE to >> clean that up. > > Thanks Dean.? I noticed that > C2Compiler::retry_class_loading_during_parsing()); > > is now not used with my change but didn't want to clean it up with > this change.? I'll file an RFE to clean it up (or find some other use > for it in the compiler code).? What is the remaining "downgrade and > retry just once more" option? > The remaining are retry_no_subsuming_loads(), retry_no_escape_analysis(), and has_boxed_value() here: https://java.se.oracle.com/source/xref/jdk-jdk/open/src/hotspot/share/opto/c2compiler.cpp#112 Notice that they all set some kind of flag to disable the current failure, preventing infinite loops. dl > Thanks for the help! > Coleen > >> >> dl >> >> On 7/8/19 2:19 PM, coleen.phillimore at oracle.com wrote: >>> >>> Hi,? From offline discussions, I updated the code in >>> Parse::do_exits() to make the method not compilable if the return >>> types don't match.? Otherwise it would revert a change that Volker >>> made to prevent infinite compilation loops.? It seems that the >>> compiler code has been changed to no longer exercise this path >>> (ShouldNotReachHere never reached), so keeping the conservative path >>> seemed safest. >>> >>> open webrev at >>> http://cr.openjdk.java.net/~coleenp/2019/8222446.02/webrev >>> >>> I changed the comment Dean, it might need help rewording. >>> >>> Tested with tier1-8. >>> >>> Thanks, >>> Coleen >>> >>> On 6/21/19 4:44 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> Dean,? Thank you for reviewing and for your help and discussion of >>>> this change. >>>> >>>> On 6/21/19 3:48 PM, dean.long at oracle.com wrote: >>>>> For the most part, this looks good. I only have a couple concerns: >>>>> >>>>> 1) The distinction in both validate_compile_task_dependencies >>>>> functions between "dependencies failed" and "dependencies invalid" >>>>> is even more fuzzy after this change.? I suggest filing an RFE to >>>>> remove this distinction. >>>> >>>> Yes, in jvmciRuntime I had to carefully preserve this logic or some >>>> tests failed.?? I'll file an RFE for you. >>>>> >>>>> 2) In Parse::do_exits(), we don't know that concurrent class >>>>> loading didn't cause the problem.? We should be optimistic and >>>>> allow the retry: >>>>> C->record_failure(C2Compiler::retry_class_loading_during_parsing()); >>>>> rather than more drastic >>>>> ??? C->record_method_not_compilable >>>>> This is actually what the code did in an earlier revision. >>>> >>>> Erik and I were trying to guess which was the right answer. It >>>> seemed too lucky that you'd do concurrent class loading in this >>>> time period, so we picked the more drastic answer, but I tested >>>> both.? So I'll change it to the optimistic answer. >>>> >>>> Thanks! >>>> Coleen >>>>> >>>>> dl >>>>> >>>>> On 6/20/19 10:28 AM, coleen.phillimore at oracle.com wrote: >>>>>> Summary: Remove SystemDictionary::modification_counter optimization >>>>>> >>>>>> See bug for more details.? To avoid the assert in the bug report, >>>>>> it's necessary to also increase the modification counter for >>>>>> class unloading, which needs special code for concurrent class >>>>>> unloading. The global counter is used to verify that >>>>>> validate_dependencies() gets the same answer based on the >>>>>> subklass hierarchy, but provides a quick exit in production >>>>>> mode.? Removing it may allow more nmethods to be created that >>>>>> don't depend on the classes that may be loaded while the Method >>>>>> is being compiled. Performance testing was done on this with no >>>>>> change in performance. Also investigated the breakpoint setting >>>>>> code which incremented the modification counter. Dependent >>>>>> compilations are invalidated using evol_method dependencies, so >>>>>> updating the system dictionary modification counter isn't >>>>>> unnecessary. >>>>>> >>>>>> Tested with hs-tier1-8 testing, and CTW, and local jvmti/jdi/jdwp >>>>>> test runs with -Xcomp. >>>>>> >>>>>> open webrev at >>>>>> http://cr.openjdk.java.net/~coleenp/2019/8222446.01/webrev >>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8222446 >>>>>> >>>>>> Thanks, >>>>>> Coleen >>>>> >>>> >>> >> > From erik.osterlund at oracle.com Wed Jul 10 08:28:25 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Wed, 10 Jul 2019 10:28:25 +0200 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com> Message-ID: Hi Dean, On 2019-07-09 23:31, dean.long at oracle.com wrote: > On 7/1/19 6:12 AM, Erik ?sterlund wrote: >> For ZGC I moved OSR nmethod unlinking to before the unlinking (where >> unlinking code belongs), instead of after the handshake (intended for >> deleting things safely unlinked). >> Strictly speaking, moving the OSR nmethod unlinking removes the >> racing between make_not_entrant and make_unloaded, but I still want >> the monotonicity guards to make this code more robust. > > I see where you added OSR nmethod unlinking, but not where you removed > it, so it's not obvious it was a "move". Sorry, bad wording on my part. I added OSR nmethod unlinking before the global handshake is run. After the handshake, we call make_unloaded() on the same is_unloading() nmethods. That function "tries" to unlink the OSR nmethod, but will just not do it as it's already unlinked at that point. So in a way, I didn't remove the call to unlink the OSR nmethod there, it just won't do anything. I preferred structuring it that way instead of trying to optimize away the call to unlink the OSR nmethod when making it unloaded, but only for the concurrent case. It seemed to introduce more conditional magic than it was worth. So in practice, the unlinking of OSR nmethods has moved for concurrent unloading to before the handshake. > Would it make sense for nmethod::unlink_from_method() to do the OSR > unlinking, or to assert that it has already been done? An earlier version of this patch tried to do that. It is indeed possible. But it requires changing lock ranks of the OSR nmethod lock to special - 1 and moving around a bunch of code as this function is also called both when making nmethods not_entrant, zombie, and unlinking them in that case. For the first two, we conditionally unlink the nmethod based on the current state (which is the old state), whereas when I move it, the current state is the new state. So I had to change things around a bit more to figure out the right condition when to unlink it that works for all 3 callers. In the end, since this is going to 13, I thought it's more important to minimize the risk as much as I can, and leave such refactorings to 14. > The new bailout in the middle of nmethod::make_not_entrant_or_zombie() > worries me a little, because the code up to that point has > side-effects, and we could be bailing out in an unexpected state. Correct. In an earlier version of this patch, I moved the transition to before the side effects. But a bunch of code is using the current nmethod state to determine what to do, and that current state changed from the old to the new state. In particular, we conditionally patch in the jump based on the current (old) state, and we conditionally increment decompile count based on the current (old) state. So I ended up having to rewrite more code than I wanted to for a patch going into 13, and convince myself that I had not implicitly messed something up. It felt safer to reason about the 3 side effects up until the transitioning point: 1) Patching in the jump into VEP. Any state more dead than the current transition, would still want that jump to be there. 2) Incrementing decompile count when making it not_entrant. Seems in order to do regardless, as we had an actual request to make the nmethod not entrant because it was bad somehow. 3) Marking it as seen on stack when making it not_entrant. This will only make can_convert_to_zombie start returning false, which is harmless in general. Also, as both transitions to zombie and not_entrant are performed under the Patching_lock, the only possible race is with make_unloaded. And those nmethods are is_unloading(), which also makes can_convert_to_zombie return false (in a not racy fashion). So it would essentially make no observable difference to any single call to can_convert_to_zombie(). In summary, #1 and #3 don't really observably change the state of the system, and #2 is completely harmless and probably wanted. Therefore I found that moving these things around and finding out where we use the current state(), as well as rewriting it, seemed like a slightly scarier change for 13 to me. So in general, there is some refactoring that could be done (and I have tried it) to make this nicer. But I want to minimize the risk for 13 as much as possible, and perform any risky refactorings in 14 instead. If your risk assessment is different and you would prefer moving the transition higher up (and flipping some conditions) instead, I am totally up for that too though, and I do see where you are coming from. BTW, I have tested this change through hs-tier1-7, and it looks good. Thanks a lot Dean for reviewing this code. /Erik > dl > From coleen.phillimore at oracle.com Wed Jul 10 12:24:28 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Wed, 10 Jul 2019 08:24:28 -0400 Subject: RFR (S) 8222446: assert(C->env()->system_dictionary_modification_counter_changed()) failed: Must invalidate if TypeFuncs differ In-Reply-To: References: <703b29a2-71a6-27d7-99e3-d54216332c33@oracle.com> <154ad551-d397-5abe-1b6a-7a3ddd129f3d@oracle.com> <4db8e49f-36f4-eb8f-2e6b-34f9e532fbdf@oracle.com> <3616e932-6b11-245e-14fa-94394716fa6d@oracle.com> Message-ID: <6451d01e-8e15-ce8d-cb34-6460735e13b4@oracle.com> On 7/9/19 5:40 PM, dean.long at oracle.com wrote: > On 7/9/19 2:16 PM, coleen.phillimore at oracle.com wrote: >> >> >> On 7/9/19 5:06 PM, dean.long at oracle.com wrote: >>> The updated comment sounds good.? Now that you have removed the only >>> place that was failing with retry_class_loading_during_parsing(), we >>> should be able to remove that method and its uses.? That gets rid of >>> the only way to "retry forever" vs the remaining and presumably safe >>> "down-grade and retry just once more".? Or you can file an RFE to >>> clean that up. >> >> Thanks Dean.? I noticed that >> C2Compiler::retry_class_loading_during_parsing()); >> >> is now not used with my change but didn't want to clean it up with >> this change.? I'll file an RFE to clean it up (or find some other use >> for it in the compiler code).? What is the remaining "downgrade and >> retry just once more" option? >> > > The remaining are retry_no_subsuming_loads(), > retry_no_escape_analysis(), and has_boxed_value() here: > > https://java.se.oracle.com/source/xref/jdk-jdk/open/src/hotspot/share/opto/c2compiler.cpp#112 > > > Notice that they all set some kind of flag to disable the current > failure, preventing infinite loops. I see.? Thanks for the code pointer.? I'll add this to the RFE. thanks, Coleen > > dl > >> Thanks for the help! >> Coleen >> >>> >>> dl >>> >>> On 7/8/19 2:19 PM, coleen.phillimore at oracle.com wrote: >>>> >>>> Hi,? From offline discussions, I updated the code in >>>> Parse::do_exits() to make the method not compilable if the return >>>> types don't match.? Otherwise it would revert a change that Volker >>>> made to prevent infinite compilation loops.? It seems that the >>>> compiler code has been changed to no longer exercise this path >>>> (ShouldNotReachHere never reached), so keeping the conservative >>>> path seemed safest. >>>> >>>> open webrev at >>>> http://cr.openjdk.java.net/~coleenp/2019/8222446.02/webrev >>>> >>>> I changed the comment Dean, it might need help rewording. >>>> >>>> Tested with tier1-8. >>>> >>>> Thanks, >>>> Coleen >>>> >>>> On 6/21/19 4:44 PM, coleen.phillimore at oracle.com wrote: >>>>> >>>>> Dean,? Thank you for reviewing and for your help and discussion of >>>>> this change. >>>>> >>>>> On 6/21/19 3:48 PM, dean.long at oracle.com wrote: >>>>>> For the most part, this looks good. I only have a couple concerns: >>>>>> >>>>>> 1) The distinction in both validate_compile_task_dependencies >>>>>> functions between "dependencies failed" and "dependencies >>>>>> invalid" is even more fuzzy after this change.? I suggest filing >>>>>> an RFE to remove this distinction. >>>>> >>>>> Yes, in jvmciRuntime I had to carefully preserve this logic or >>>>> some tests failed.?? I'll file an RFE for you. >>>>>> >>>>>> 2) In Parse::do_exits(), we don't know that concurrent class >>>>>> loading didn't cause the problem.? We should be optimistic and >>>>>> allow the retry: >>>>>> C->record_failure(C2Compiler::retry_class_loading_during_parsing()); >>>>>> rather than more drastic >>>>>> ??? C->record_method_not_compilable >>>>>> This is actually what the code did in an earlier revision. >>>>> >>>>> Erik and I were trying to guess which was the right answer. It >>>>> seemed too lucky that you'd do concurrent class loading in this >>>>> time period, so we picked the more drastic answer, but I tested >>>>> both.? So I'll change it to the optimistic answer. >>>>> >>>>> Thanks! >>>>> Coleen >>>>>> >>>>>> dl >>>>>> >>>>>> On 6/20/19 10:28 AM, coleen.phillimore at oracle.com wrote: >>>>>>> Summary: Remove SystemDictionary::modification_counter optimization >>>>>>> >>>>>>> See bug for more details.? To avoid the assert in the bug >>>>>>> report, it's necessary to also increase the modification counter >>>>>>> for class unloading, which needs special code for concurrent >>>>>>> class unloading. The global counter is used to verify that >>>>>>> validate_dependencies() gets the same answer based on the >>>>>>> subklass hierarchy, but provides a quick exit in production >>>>>>> mode.? Removing it may allow more nmethods to be created that >>>>>>> don't depend on the classes that may be loaded while the Method >>>>>>> is being compiled. Performance testing was done on this with no >>>>>>> change in performance. Also investigated the breakpoint setting >>>>>>> code which incremented the modification counter. Dependent >>>>>>> compilations are invalidated using evol_method dependencies, so >>>>>>> updating the system dictionary modification counter isn't >>>>>>> unnecessary. >>>>>>> >>>>>>> Tested with hs-tier1-8 testing, and CTW, and local >>>>>>> jvmti/jdi/jdwp test runs with -Xcomp. >>>>>>> >>>>>>> open webrev at >>>>>>> http://cr.openjdk.java.net/~coleenp/2019/8222446.01/webrev >>>>>>> bug link https://bugs.openjdk.java.net/browse/JDK-8222446 >>>>>>> >>>>>>> Thanks, >>>>>>> Coleen >>>>>> >>>>> >>>> >>> >> > From dean.long at oracle.com Thu Jul 11 04:42:08 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Wed, 10 Jul 2019 21:42:08 -0700 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

Message-ID: <4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> On 7/10/19 1:28 AM, Erik ?sterlund wrote: > Hi Dean, > > On 2019-07-09 23:31, dean.long at oracle.com wrote: >> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>> For ZGC I moved OSR nmethod unlinking to before the unlinking (where >>> unlinking code belongs), instead of after the handshake (intended >>> for deleting things safely unlinked). >>> Strictly speaking, moving the OSR nmethod unlinking removes the >>> racing between make_not_entrant and make_unloaded, but I still want >>> the monotonicity guards to make this code more robust. >> >> I see where you added OSR nmethod unlinking, but not where you >> removed it, so it's not obvious it was a "move". > > Sorry, bad wording on my part. I added OSR nmethod unlinking before > the global handshake is run. After the handshake, we call > make_unloaded() on the same is_unloading() nmethods. That function > "tries" to unlink the OSR nmethod, but will just not do it as it's > already unlinked at that point. So in a way, I didn't remove the call > to unlink the OSR nmethod there, it just won't do anything. I > preferred structuring it that way instead of trying to optimize away > the call to unlink the OSR nmethod when making it unloaded, but only > for the concurrent case. It seemed to introduce more conditional magic > than it was worth. > So in practice, the unlinking of OSR nmethods has moved for concurrent > unloading to before the handshake. > OK, in that case, could you add a little information to the "Invalidate the osr nmethod only once" comment so that in the future someone isn't tempted to remove the code as redundant? >> Would it make sense for nmethod::unlink_from_method() to do the OSR >> unlinking, or to assert that it has already been done? > > An earlier version of this patch tried to do that. It is indeed > possible. But it requires changing lock ranks of the OSR nmethod lock > to special - 1 and moving around a bunch of code as this function is > also called both when making nmethods not_entrant, zombie, and > unlinking them in that case. For the first two, we conditionally > unlink the nmethod based on the current state (which is the old > state), whereas when I move it, the current state is the new state. So > I had to change things around a bit more to figure out the right > condition when to unlink it that works for all 3 callers. In the end, > since this is going to 13, I thought it's more important to minimize > the risk as much as I can, and leave such refactorings to 14. > OK. >> The new bailout in the middle of >> nmethod::make_not_entrant_or_zombie() worries me a little, because >> the code up to that point has side-effects, and we could be bailing >> out in an unexpected state. > > Correct. In an earlier version of this patch, I moved the transition > to before the side effects. But a bunch of code is using the current > nmethod state to determine what to do, and that current state changed > from the old to the new state. In particular, we conditionally patch > in the jump based on the current (old) state, and we conditionally > increment decompile count based on the current (old) state. So I ended > up having to rewrite more code than I wanted to for a patch going into > 13, and convince myself that I had not implicitly messed something up. > It felt safer to reason about the 3 side effects up until the > transitioning point: > > 1) Patching in the jump into VEP. Any state more dead than the current > transition, would still want that jump to be there. > 2) Incrementing decompile count when making it not_entrant. Seems in > order to do regardless, as we had an actual request to make the > nmethod not entrant because it was bad somehow. > 3) Marking it as seen on stack when making it not_entrant. This will > only make can_convert_to_zombie start returning false, which is > harmless in general. Also, as both transitions to zombie and > not_entrant are performed under the Patching_lock, the only possible > race is with make_unloaded. And those nmethods are is_unloading(), > which also makes can_convert_to_zombie return false (in a not racy > fashion). So it would essentially make no observable difference to any > single call to can_convert_to_zombie(). > > In summary, #1 and #3 don't really observably change the state of the > system, and #2 is completely harmless and probably wanted. Therefore I > found that moving these things around and finding out where we use the > current state(), as well as rewriting it, seemed like a slightly > scarier change for 13 to me. > > So in general, there is some refactoring that could be done (and I > have tried it) to make this nicer. But I want to minimize the risk for > 13 as much as possible, and perform any risky refactorings in 14 instead. > If your risk assessment is different and you would prefer moving the > transition higher up (and flipping some conditions) instead, I am > totally up for that too though, and I do see where you are coming from. > So if we fail, it means that we lost a race to a "deader" state, and assuming this is the only path to the deader state, wouldn't that also mean that #1, #2, and #3 would have already been done by the winning thread?? If so, that makes me feel better about bailing out in the middle, but I'm still not 100% convinced, unless we can assert that 1-3 already happened.? Do you have a prototype of what moving the transition higher up would look like? dl > BTW, I have tested this change through hs-tier1-7, and it looks good. > > Thanks a lot Dean for reviewing this code. > > /Erik > >> dl >> > From erik.osterlund at oracle.com Thu Jul 11 13:53:44 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 11 Jul 2019 09:53:44 -0400 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> Message-ID: Hi Dean, On 2019-07-11 00:42, dean.long at oracle.com wrote: > On 7/10/19 1:28 AM, Erik ?sterlund wrote: >> Hi Dean, >> >> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>> For ZGC I moved OSR nmethod unlinking to before the unlinking (where >>>> unlinking code belongs), instead of after the handshake (intended >>>> for deleting things safely unlinked). >>>> Strictly speaking, moving the OSR nmethod unlinking removes the >>>> racing between make_not_entrant and make_unloaded, but I still want >>>> the monotonicity guards to make this code more robust. >>> >>> I see where you added OSR nmethod unlinking, but not where you >>> removed it, so it's not obvious it was a "move". >> >> Sorry, bad wording on my part. I added OSR nmethod unlinking before >> the global handshake is run. After the handshake, we call >> make_unloaded() on the same is_unloading() nmethods. That function >> "tries" to unlink the OSR nmethod, but will just not do it as it's >> already unlinked at that point. So in a way, I didn't remove the call >> to unlink the OSR nmethod there, it just won't do anything. I >> preferred structuring it that way instead of trying to optimize away >> the call to unlink the OSR nmethod when making it unloaded, but only >> for the concurrent case. It seemed to introduce more conditional magic >> than it was worth. >> So in practice, the unlinking of OSR nmethods has moved for concurrent >> unloading to before the handshake. >> > > OK, in that case, could you add a little information to the "Invalidate > the osr nmethod only once" comment so that in the future someone isn't > tempted to remove the code as redundant? Sure. >>> Would it make sense for nmethod::unlink_from_method() to do the OSR >>> unlinking, or to assert that it has already been done? >> >> An earlier version of this patch tried to do that. It is indeed >> possible. But it requires changing lock ranks of the OSR nmethod lock >> to special - 1 and moving around a bunch of code as this function is >> also called both when making nmethods not_entrant, zombie, and >> unlinking them in that case. For the first two, we conditionally >> unlink the nmethod based on the current state (which is the old >> state), whereas when I move it, the current state is the new state. So >> I had to change things around a bit more to figure out the right >> condition when to unlink it that works for all 3 callers. In the end, >> since this is going to 13, I thought it's more important to minimize >> the risk as much as I can, and leave such refactorings to 14. >> > > OK. > >>> The new bailout in the middle of >>> nmethod::make_not_entrant_or_zombie() worries me a little, because >>> the code up to that point has side-effects, and we could be bailing >>> out in an unexpected state. >> >> Correct. In an earlier version of this patch, I moved the transition >> to before the side effects. But a bunch of code is using the current >> nmethod state to determine what to do, and that current state changed >> from the old to the new state. In particular, we conditionally patch >> in the jump based on the current (old) state, and we conditionally >> increment decompile count based on the current (old) state. So I ended >> up having to rewrite more code than I wanted to for a patch going into >> 13, and convince myself that I had not implicitly messed something up. >> It felt safer to reason about the 3 side effects up until the >> transitioning point: >> >> 1) Patching in the jump into VEP. Any state more dead than the current >> transition, would still want that jump to be there. >> 2) Incrementing decompile count when making it not_entrant. Seems in >> order to do regardless, as we had an actual request to make the >> nmethod not entrant because it was bad somehow. >> 3) Marking it as seen on stack when making it not_entrant. This will >> only make can_convert_to_zombie start returning false, which is >> harmless in general. Also, as both transitions to zombie and >> not_entrant are performed under the Patching_lock, the only possible >> race is with make_unloaded. And those nmethods are is_unloading(), >> which also makes can_convert_to_zombie return false (in a not racy >> fashion). So it would essentially make no observable difference to any >> single call to can_convert_to_zombie(). >> >> In summary, #1 and #3 don't really observably change the state of the >> system, and #2 is completely harmless and probably wanted. Therefore I >> found that moving these things around and finding out where we use the >> current state(), as well as rewriting it, seemed like a slightly >> scarier change for 13 to me. >> >> So in general, there is some refactoring that could be done (and I >> have tried it) to make this nicer. But I want to minimize the risk for >> 13 as much as possible, and perform any risky refactorings in 14 instead. >> If your risk assessment is different and you would prefer moving the >> transition higher up (and flipping some conditions) instead, I am >> totally up for that too though, and I do see where you are coming from. >> > > So if we fail, it means that we lost a race to a "deader" state, and > assuming this is the only path to the deader state, wouldn't that also > mean that #1, #2, and #3 would have already been done by the winning > thread?? If so, that makes me feel better about bailing out in the > middle, but I'm still not 100% convinced, unless we can assert that 1-3 > already happened.? Do you have a prototype of what moving the transition > higher up would look like? As a matter of fact I do. Here is a webrev: http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ I kind of like it. What do you think? Thanks, /Erik > dl > >> BTW, I have tested this change through hs-tier1-7, and it looks good. >> >> Thanks a lot Dean for reviewing this code. >> >> /Erik >> >>> dl >>> >> > From coleen.phillimore at oracle.com Thu Jul 11 14:46:31 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 11 Jul 2019 10:46:31 -0400 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> Message-ID: <2fc92bdb-ff1b-ad48-0e5e-9983e619c3c7@oracle.com> Hi Erik, I had a look at this also and it seems reasonable and I like that 'unloaded' is less dead than 'zombie' now. http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/src/hotspot/share/code/nmethod.cpp.frames.html 1230 guarantee(try_transition(unloaded), "Invalid nmethod transition to unloaded"); This line worries me.? Can you explain why another thread could not have made this nmethod zombie already, before the handshake in zgc and the call afterward to make_unloaded() for this nmethod??? And add a comment here? Thanks, Coleen On 7/11/19 9:53 AM, Erik ?sterlund wrote: > Hi Dean, > > On 2019-07-11 00:42, dean.long at oracle.com wrote: >> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>> Hi Dean, >>> >>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking >>>>> (where unlinking code belongs), instead of after the handshake >>>>> (intended for deleting things safely unlinked). >>>>> Strictly speaking, moving the OSR nmethod unlinking removes the >>>>> racing between make_not_entrant and make_unloaded, but I still >>>>> want the monotonicity guards to make this code more robust. >>>> >>>> I see where you added OSR nmethod unlinking, but not where you >>>> removed it, so it's not obvious it was a "move". >>> >>> Sorry, bad wording on my part. I added OSR nmethod unlinking before >>> the global handshake is run. After the handshake, we call >>> make_unloaded() on the same is_unloading() nmethods. That function >>> "tries" to unlink the OSR nmethod, but will just not do it as it's >>> already unlinked at that point. So in a way, I didn't remove the >>> call to unlink the OSR nmethod there, it just won't do anything. I >>> preferred structuring it that way instead of trying to optimize away >>> the call to unlink the OSR nmethod when making it unloaded, but only >>> for the concurrent case. It seemed to introduce more conditional >>> magic than it was worth. >>> So in practice, the unlinking of OSR nmethods has moved for >>> concurrent unloading to before the handshake. >>> >> >> OK, in that case, could you add a little information to the >> "Invalidate the osr nmethod only once" comment so that in the future >> someone isn't tempted to remove the code as redundant? > > Sure. > >>>> Would it make sense for nmethod::unlink_from_method() to do the OSR >>>> unlinking, or to assert that it has already been done? >>> >>> An earlier version of this patch tried to do that. It is indeed >>> possible. But it requires changing lock ranks of the OSR nmethod >>> lock to special - 1 and moving around a bunch of code as this >>> function is also called both when making nmethods not_entrant, >>> zombie, and unlinking them in that case. For the first two, we >>> conditionally unlink the nmethod based on the current state (which >>> is the old state), whereas when I move it, the current state is the >>> new state. So I had to change things around a bit more to figure out >>> the right condition when to unlink it that works for all 3 callers. >>> In the end, since this is going to 13, I thought it's more important >>> to minimize the risk as much as I can, and leave such refactorings >>> to 14. >>> >> >> OK. >> >>>> The new bailout in the middle of >>>> nmethod::make_not_entrant_or_zombie() worries me a little, because >>>> the code up to that point has side-effects, and we could be bailing >>>> out in an unexpected state. >>> >>> Correct. In an earlier version of this patch, I moved the transition >>> to before the side effects. But a bunch of code is using the current >>> nmethod state to determine what to do, and that current state >>> changed from the old to the new state. In particular, we >>> conditionally patch in the jump based on the current (old) state, >>> and we conditionally increment decompile count based on the current >>> (old) state. So I ended up having to rewrite more code than I wanted >>> to for a patch going into 13, and convince myself that I had not >>> implicitly messed something up. It felt safer to reason about the 3 >>> side effects up until the transitioning point: >>> >>> 1) Patching in the jump into VEP. Any state more dead than the >>> current transition, would still want that jump to be there. >>> 2) Incrementing decompile count when making it not_entrant. Seems in >>> order to do regardless, as we had an actual request to make the >>> nmethod not entrant because it was bad somehow. >>> 3) Marking it as seen on stack when making it not_entrant. This will >>> only make can_convert_to_zombie start returning false, which is >>> harmless in general. Also, as both transitions to zombie and >>> not_entrant are performed under the Patching_lock, the only possible >>> race is with make_unloaded. And those nmethods are is_unloading(), >>> which also makes can_convert_to_zombie return false (in a not racy >>> fashion). So it would essentially make no observable difference to >>> any single call to can_convert_to_zombie(). >>> >>> In summary, #1 and #3 don't really observably change the state of >>> the system, and #2 is completely harmless and probably wanted. >>> Therefore I found that moving these things around and finding out >>> where we use the current state(), as well as rewriting it, seemed >>> like a slightly scarier change for 13 to me. >>> >>> So in general, there is some refactoring that could be done (and I >>> have tried it) to make this nicer. But I want to minimize the risk >>> for 13 as much as possible, and perform any risky refactorings in 14 >>> instead. >>> If your risk assessment is different and you would prefer moving the >>> transition higher up (and flipping some conditions) instead, I am >>> totally up for that too though, and I do see where you are coming from. >>> >> >> So if we fail, it means that we lost a race to a "deader" state, and >> assuming this is the only path to the deader state, wouldn't that >> also mean that #1, #2, and #3 would have already been done by the >> winning thread?? If so, that makes me feel better about bailing out >> in the middle, but I'm still not 100% convinced, unless we can assert >> that 1-3 already happened.? Do you have a prototype of what moving >> the transition higher up would look like? > > As a matter of fact I do. Here is a webrev: > http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ > > I kind of like it. What do you think? > > Thanks, > /Erik > >> dl >> >>> BTW, I have tested this change through hs-tier1-7, and it looks good. >>> >>> Thanks a lot Dean for reviewing this code. >>> >>> /Erik >>> >>>> dl >>>> >>> >> From erik.osterlund at oracle.com Thu Jul 11 18:18:30 2019 From: erik.osterlund at oracle.com (Erik Osterlund) Date: Thu, 11 Jul 2019 20:18:30 +0200 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <2fc92bdb-ff1b-ad48-0e5e-9983e619c3c7@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> <2fc92bdb-ff1b-ad48-0e5e-9983e619c3c7@oracle.com> Message-ID: <3D8AC43D-B56B-44AF-AD9F-A0F663A7DBDD@oracle.com> Hi Coleen, > On 11 Jul 2019, at 16:46, coleen.phillimore at oracle.com wrote: > > > Hi Erik, > I had a look at this also and it seems reasonable and I like that 'unloaded' is less dead than 'zombie' now. Glad you like it! > http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/src/hotspot/share/code/nmethod.cpp.frames.html > > 1230 guarantee(try_transition(unloaded), "Invalid nmethod transition to unloaded"); > > > This line worries me. Can you explain why another thread could not have made this nmethod zombie already, before the handshake in zgc and the call afterward to make_unloaded() for this nmethod? And add a comment here? Sure, will add a comment. Like this: ?It is an important invariant that there exists no race between the sweeper and GC thread competing for making the same nmethod zombie and unloaded respectively. This is ensured by can_convert_to_zombie() returning false for any is_unloading() nmethod, informing the sweeper not to step on any GC toes? Does that sound comprehensible? Thanks, /Erik > Thanks, > Coleen > >> On 7/11/19 9:53 AM, Erik ?sterlund wrote: >> Hi Dean, >> >>> On 2019-07-11 00:42, dean.long at oracle.com wrote: >>>> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>>> Hi Dean, >>>> >>>>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking (where unlinking code belongs), instead of after the handshake (intended for deleting things safely unlinked). >>>>>> Strictly speaking, moving the OSR nmethod unlinking removes the racing between make_not_entrant and make_unloaded, but I still want the monotonicity guards to make this code more robust. >>>>> >>>>> I see where you added OSR nmethod unlinking, but not where you removed it, so it's not obvious it was a "move". >>>> >>>> Sorry, bad wording on my part. I added OSR nmethod unlinking before the global handshake is run. After the handshake, we call make_unloaded() on the same is_unloading() nmethods. That function "tries" to unlink the OSR nmethod, but will just not do it as it's already unlinked at that point. So in a way, I didn't remove the call to unlink the OSR nmethod there, it just won't do anything. I preferred structuring it that way instead of trying to optimize away the call to unlink the OSR nmethod when making it unloaded, but only for the concurrent case. It seemed to introduce more conditional magic than it was worth. >>>> So in practice, the unlinking of OSR nmethods has moved for concurrent unloading to before the handshake. >>>> >>> >>> OK, in that case, could you add a little information to the "Invalidate the osr nmethod only once" comment so that in the future someone isn't tempted to remove the code as redundant? >> >> Sure. >> >>>>> Would it make sense for nmethod::unlink_from_method() to do the OSR unlinking, or to assert that it has already been done? >>>> >>>> An earlier version of this patch tried to do that. It is indeed possible. But it requires changing lock ranks of the OSR nmethod lock to special - 1 and moving around a bunch of code as this function is also called both when making nmethods not_entrant, zombie, and unlinking them in that case. For the first two, we conditionally unlink the nmethod based on the current state (which is the old state), whereas when I move it, the current state is the new state. So I had to change things around a bit more to figure out the right condition when to unlink it that works for all 3 callers. In the end, since this is going to 13, I thought it's more important to minimize the risk as much as I can, and leave such refactorings to 14. >>>> >>> >>> OK. >>> >>>>> The new bailout in the middle of nmethod::make_not_entrant_or_zombie() worries me a little, because the code up to that point has side-effects, and we could be bailing out in an unexpected state. >>>> >>>> Correct. In an earlier version of this patch, I moved the transition to before the side effects. But a bunch of code is using the current nmethod state to determine what to do, and that current state changed from the old to the new state. In particular, we conditionally patch in the jump based on the current (old) state, and we conditionally increment decompile count based on the current (old) state. So I ended up having to rewrite more code than I wanted to for a patch going into 13, and convince myself that I had not implicitly messed something up. It felt safer to reason about the 3 side effects up until the transitioning point: >>>> >>>> 1) Patching in the jump into VEP. Any state more dead than the current transition, would still want that jump to be there. >>>> 2) Incrementing decompile count when making it not_entrant. Seems in order to do regardless, as we had an actual request to make the nmethod not entrant because it was bad somehow. >>>> 3) Marking it as seen on stack when making it not_entrant. This will only make can_convert_to_zombie start returning false, which is harmless in general. Also, as both transitions to zombie and not_entrant are performed under the Patching_lock, the only possible race is with make_unloaded. And those nmethods are is_unloading(), which also makes can_convert_to_zombie return false (in a not racy fashion). So it would essentially make no observable difference to any single call to can_convert_to_zombie(). >>>> >>>> In summary, #1 and #3 don't really observably change the state of the system, and #2 is completely harmless and probably wanted. Therefore I found that moving these things around and finding out where we use the current state(), as well as rewriting it, seemed like a slightly scarier change for 13 to me. >>>> >>>> So in general, there is some refactoring that could be done (and I have tried it) to make this nicer. But I want to minimize the risk for 13 as much as possible, and perform any risky refactorings in 14 instead. >>>> If your risk assessment is different and you would prefer moving the transition higher up (and flipping some conditions) instead, I am totally up for that too though, and I do see where you are coming from. >>>> >>> >>> So if we fail, it means that we lost a race to a "deader" state, and assuming this is the only path to the deader state, wouldn't that also mean that #1, #2, and #3 would have already been done by the winning thread? If so, that makes me feel better about bailing out in the middle, but I'm still not 100% convinced, unless we can assert that 1-3 already happened. Do you have a prototype of what moving the transition higher up would look like? >> >> As a matter of fact I do. Here is a webrev: >> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ >> >> I kind of like it. What do you think? >> >> Thanks, >> /Erik >> >>> dl >>> >>>> BTW, I have tested this change through hs-tier1-7, and it looks good. >>>> >>>> Thanks a lot Dean for reviewing this code. >>>> >>>> /Erik >>>> >>>>> dl >>>>> >>>> >>> > From coleen.phillimore at oracle.com Thu Jul 11 18:23:54 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 11 Jul 2019 14:23:54 -0400 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <3D8AC43D-B56B-44AF-AD9F-A0F663A7DBDD@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> <2fc92bdb-ff1b-ad48-0e5e-9983e619c3c7@oracle.com> <3D8AC43D-B56B-44AF-AD9F-A0F663A7DBDD@oracle.com> Message-ID: On 7/11/19 2:18 PM, Erik Osterlund wrote: > Hi Coleen, > >> On 11 Jul 2019, at 16:46, coleen.phillimore at oracle.com wrote: >> >> >> Hi Erik, >> I had a look at this also and it seems reasonable and I like that 'unloaded' is less dead than 'zombie' now. > Glad you like it! > >> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/src/hotspot/share/code/nmethod.cpp.frames.html >> >> 1230 guarantee(try_transition(unloaded), "Invalid nmethod transition to unloaded"); >> >> >> This line worries me. Can you explain why another thread could not have made this nmethod zombie already, before the handshake in zgc and the call afterward to make_unloaded() for this nmethod? And add a comment here? > Sure, will add a comment. Like this: > ?It is an important invariant that there exists no race between the sweeper and GC thread competing for making the same nmethod zombie and unloaded respectively. This is ensured by can_convert_to_zombie() returning false for any is_unloading() nmethod, informing the sweeper not to step on any GC toes? Yes, I like the comment.? Should it be an assert instead though? Thanks, Coleen > > Does that sound comprehensible? > > Thanks, > /Erik > >> Thanks, >> Coleen >> >>> On 7/11/19 9:53 AM, Erik ?sterlund wrote: >>> Hi Dean, >>> >>>> On 2019-07-11 00:42, dean.long at oracle.com wrote: >>>>> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>>>> Hi Dean, >>>>> >>>>>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>>>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking (where unlinking code belongs), instead of after the handshake (intended for deleting things safely unlinked). >>>>>>> Strictly speaking, moving the OSR nmethod unlinking removes the racing between make_not_entrant and make_unloaded, but I still want the monotonicity guards to make this code more robust. >>>>>> I see where you added OSR nmethod unlinking, but not where you removed it, so it's not obvious it was a "move". >>>>> Sorry, bad wording on my part. I added OSR nmethod unlinking before the global handshake is run. After the handshake, we call make_unloaded() on the same is_unloading() nmethods. That function "tries" to unlink the OSR nmethod, but will just not do it as it's already unlinked at that point. So in a way, I didn't remove the call to unlink the OSR nmethod there, it just won't do anything. I preferred structuring it that way instead of trying to optimize away the call to unlink the OSR nmethod when making it unloaded, but only for the concurrent case. It seemed to introduce more conditional magic than it was worth. >>>>> So in practice, the unlinking of OSR nmethods has moved for concurrent unloading to before the handshake. >>>>> >>>> OK, in that case, could you add a little information to the "Invalidate the osr nmethod only once" comment so that in the future someone isn't tempted to remove the code as redundant? >>> Sure. >>> >>>>>> Would it make sense for nmethod::unlink_from_method() to do the OSR unlinking, or to assert that it has already been done? >>>>> An earlier version of this patch tried to do that. It is indeed possible. But it requires changing lock ranks of the OSR nmethod lock to special - 1 and moving around a bunch of code as this function is also called both when making nmethods not_entrant, zombie, and unlinking them in that case. For the first two, we conditionally unlink the nmethod based on the current state (which is the old state), whereas when I move it, the current state is the new state. So I had to change things around a bit more to figure out the right condition when to unlink it that works for all 3 callers. In the end, since this is going to 13, I thought it's more important to minimize the risk as much as I can, and leave such refactorings to 14. >>>>> >>>> OK. >>>> >>>>>> The new bailout in the middle of nmethod::make_not_entrant_or_zombie() worries me a little, because the code up to that point has side-effects, and we could be bailing out in an unexpected state. >>>>> Correct. In an earlier version of this patch, I moved the transition to before the side effects. But a bunch of code is using the current nmethod state to determine what to do, and that current state changed from the old to the new state. In particular, we conditionally patch in the jump based on the current (old) state, and we conditionally increment decompile count based on the current (old) state. So I ended up having to rewrite more code than I wanted to for a patch going into 13, and convince myself that I had not implicitly messed something up. It felt safer to reason about the 3 side effects up until the transitioning point: >>>>> >>>>> 1) Patching in the jump into VEP. Any state more dead than the current transition, would still want that jump to be there. >>>>> 2) Incrementing decompile count when making it not_entrant. Seems in order to do regardless, as we had an actual request to make the nmethod not entrant because it was bad somehow. >>>>> 3) Marking it as seen on stack when making it not_entrant. This will only make can_convert_to_zombie start returning false, which is harmless in general. Also, as both transitions to zombie and not_entrant are performed under the Patching_lock, the only possible race is with make_unloaded. And those nmethods are is_unloading(), which also makes can_convert_to_zombie return false (in a not racy fashion). So it would essentially make no observable difference to any single call to can_convert_to_zombie(). >>>>> >>>>> In summary, #1 and #3 don't really observably change the state of the system, and #2 is completely harmless and probably wanted. Therefore I found that moving these things around and finding out where we use the current state(), as well as rewriting it, seemed like a slightly scarier change for 13 to me. >>>>> >>>>> So in general, there is some refactoring that could be done (and I have tried it) to make this nicer. But I want to minimize the risk for 13 as much as possible, and perform any risky refactorings in 14 instead. >>>>> If your risk assessment is different and you would prefer moving the transition higher up (and flipping some conditions) instead, I am totally up for that too though, and I do see where you are coming from. >>>>> >>>> So if we fail, it means that we lost a race to a "deader" state, and assuming this is the only path to the deader state, wouldn't that also mean that #1, #2, and #3 would have already been done by the winning thread? If so, that makes me feel better about bailing out in the middle, but I'm still not 100% convinced, unless we can assert that 1-3 already happened. Do you have a prototype of what moving the transition higher up would look like? >>> As a matter of fact I do. Here is a webrev: >>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ >>> >>> I kind of like it. What do you think? >>> >>> Thanks, >>> /Erik >>> >>>> dl >>>> >>>>> BTW, I have tested this change through hs-tier1-7, and it looks good. >>>>> >>>>> Thanks a lot Dean for reviewing this code. >>>>> >>>>> /Erik >>>>> >>>>>> dl >>>>>> From erik.osterlund at oracle.com Thu Jul 11 19:02:45 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 11 Jul 2019 15:02:45 -0400 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> <2fc92bdb-ff1b-ad48-0e5e-9983e619c3c7@oracle.com> <3D8AC43D-B56B-44AF-AD9F-A0F663A7DBDD@oracle.com> Message-ID: <92ba7623-5c9c-8808-2e82-1151b356de58@oracle.com> Hi Coleen, On 2019-07-11 14:23, coleen.phillimore at oracle.com wrote: > > > On 7/11/19 2:18 PM, Erik Osterlund wrote: >> Hi Coleen, >> >>> On 11 Jul 2019, at 16:46, coleen.phillimore at oracle.com wrote: >>> >>> >>> Hi Erik, >>> I had a look at this also and it seems reasonable and I like that >>> 'unloaded' is less dead than 'zombie' now. >> Glad you like it! >> >>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/src/hotspot/share/code/nmethod.cpp.frames.html >>> >>> >>> 1230 guarantee(try_transition(unloaded), "Invalid nmethod transition >>> to unloaded"); >>> >>> >>> This line worries me.? Can you explain why another thread could not >>> have made this nmethod zombie already, before the handshake in zgc >>> and the call afterward to make_unloaded() for this nmethod??? And add >>> a comment here? >> Sure, will add a comment. Like this: >> ?It is an important invariant that there exists no race between the >> sweeper and GC thread competing for making the same nmethod zombie and >> unloaded respectively. This is ensured by can_convert_to_zombie() >> returning false for any is_unloading() nmethod, informing the sweeper >> not to step on any GC toes? > > Yes, I like the comment.? Should it be an assert instead though? Sure, why not! The comment addition and assert instead of guarantee: http://cr.openjdk.java.net/~eosterlund/8224674/webrev.02/ Thanks, /Erik > Thanks, > Coleen >> >> Does that sound comprehensible? >> >> Thanks, >> /Erik >> >>> Thanks, >>> Coleen >>> >>>> On 7/11/19 9:53 AM, Erik ?sterlund wrote: >>>> Hi Dean, >>>> >>>>> On 2019-07-11 00:42, dean.long at oracle.com wrote: >>>>>> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>>>>> Hi Dean, >>>>>> >>>>>>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>>>>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking >>>>>>>> (where unlinking code belongs), instead of after the handshake >>>>>>>> (intended for deleting things safely unlinked). >>>>>>>> Strictly speaking, moving the OSR nmethod unlinking removes the >>>>>>>> racing between make_not_entrant and make_unloaded, but I still >>>>>>>> want the monotonicity guards to make this code more robust. >>>>>>> I see where you added OSR nmethod unlinking, but not where you >>>>>>> removed it, so it's not obvious it was a "move". >>>>>> Sorry, bad wording on my part. I added OSR nmethod unlinking >>>>>> before the global handshake is run. After the handshake, we call >>>>>> make_unloaded() on the same is_unloading() nmethods. That function >>>>>> "tries" to unlink the OSR nmethod, but will just not do it as it's >>>>>> already unlinked at that point. So in a way, I didn't remove the >>>>>> call to unlink the OSR nmethod there, it just won't do anything. I >>>>>> preferred structuring it that way instead of trying to optimize >>>>>> away the call to unlink the OSR nmethod when making it unloaded, >>>>>> but only for the concurrent case. It seemed to introduce more >>>>>> conditional magic than it was worth. >>>>>> So in practice, the unlinking of OSR nmethods has moved for >>>>>> concurrent unloading to before the handshake. >>>>>> >>>>> OK, in that case, could you add a little information to the >>>>> "Invalidate the osr nmethod only once" comment so that in the >>>>> future someone isn't tempted to remove the code as redundant? >>>> Sure. >>>> >>>>>>> Would it make sense for nmethod::unlink_from_method() to do the >>>>>>> OSR unlinking, or to assert that it has already been done? >>>>>> An earlier version of this patch tried to do that. It is indeed >>>>>> possible. But it requires changing lock ranks of the OSR nmethod >>>>>> lock to special - 1 and moving around a bunch of code as this >>>>>> function is also called both when making nmethods not_entrant, >>>>>> zombie, and unlinking them in that case. For the first two, we >>>>>> conditionally unlink the nmethod based on the current state (which >>>>>> is the old state), whereas when I move it, the current state is >>>>>> the new state. So I had to change things around a bit more to >>>>>> figure out the right condition when to unlink it that works for >>>>>> all 3 callers. In the end, since this is going to 13, I thought >>>>>> it's more important to minimize the risk as much as I can, and >>>>>> leave such refactorings to 14. >>>>>> >>>>> OK. >>>>> >>>>>>> The new bailout in the middle of >>>>>>> nmethod::make_not_entrant_or_zombie() worries me a little, >>>>>>> because the code up to that point has side-effects, and we could >>>>>>> be bailing out in an unexpected state. >>>>>> Correct. In an earlier version of this patch, I moved the >>>>>> transition to before the side effects. But a bunch of code is >>>>>> using the current nmethod state to determine what to do, and that >>>>>> current state changed from the old to the new state. In >>>>>> particular, we conditionally patch in the jump based on the >>>>>> current (old) state, and we conditionally increment decompile >>>>>> count based on the current (old) state. So I ended up having to >>>>>> rewrite more code than I wanted to for a patch going into 13, and >>>>>> convince myself that I had not implicitly messed something up. It >>>>>> felt safer to reason about the 3 side effects up until the >>>>>> transitioning point: >>>>>> >>>>>> 1) Patching in the jump into VEP. Any state more dead than the >>>>>> current transition, would still want that jump to be there. >>>>>> 2) Incrementing decompile count when making it not_entrant. Seems >>>>>> in order to do regardless, as we had an actual request to make the >>>>>> nmethod not entrant because it was bad somehow. >>>>>> 3) Marking it as seen on stack when making it not_entrant. This >>>>>> will only make can_convert_to_zombie start returning false, which >>>>>> is harmless in general. Also, as both transitions to zombie and >>>>>> not_entrant are performed under the Patching_lock, the only >>>>>> possible race is with make_unloaded. And those nmethods are >>>>>> is_unloading(), which also makes can_convert_to_zombie return >>>>>> false (in a not racy fashion). So it would essentially make no >>>>>> observable difference to any single call to can_convert_to_zombie(). >>>>>> >>>>>> In summary, #1 and #3 don't really observably change the state of >>>>>> the system, and #2 is completely harmless and probably wanted. >>>>>> Therefore I found that moving these things around and finding out >>>>>> where we use the current state(), as well as rewriting it, seemed >>>>>> like a slightly scarier change for 13 to me. >>>>>> >>>>>> So in general, there is some refactoring that could be done (and I >>>>>> have tried it) to make this nicer. But I want to minimize the risk >>>>>> for 13 as much as possible, and perform any risky refactorings in >>>>>> 14 instead. >>>>>> If your risk assessment is different and you would prefer moving >>>>>> the transition higher up (and flipping some conditions) instead, I >>>>>> am totally up for that too though, and I do see where you are >>>>>> coming from. >>>>>> >>>>> So if we fail, it means that we lost a race to a "deader" state, >>>>> and assuming this is the only path to the deader state, wouldn't >>>>> that also mean that #1, #2, and #3 would have already been done by >>>>> the winning thread?? If so, that makes me feel better about bailing >>>>> out in the middle, but I'm still not 100% convinced, unless we can >>>>> assert that 1-3 already happened.? Do you have a prototype of what >>>>> moving the transition higher up would look like? >>>> As a matter of fact I do. Here is a webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ >>>> >>>> I kind of like it. What do you think? >>>> >>>> Thanks, >>>> /Erik >>>> >>>>> dl >>>>> >>>>>> BTW, I have tested this change through hs-tier1-7, and it looks good. >>>>>> >>>>>> Thanks a lot Dean for reviewing this code. >>>>>> >>>>>> /Erik >>>>>> >>>>>>> dl >>>>>>> > From dean.long at oracle.com Thu Jul 11 19:29:38 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Thu, 11 Jul 2019 12:29:38 -0700 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> Message-ID: On 7/11/19 6:53 AM, Erik ?sterlund wrote: > Hi Dean, > > On 2019-07-11 00:42, dean.long at oracle.com wrote: >> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>> Hi Dean, >>> >>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking >>>>> (where unlinking code belongs), instead of after the handshake >>>>> (intended for deleting things safely unlinked). >>>>> Strictly speaking, moving the OSR nmethod unlinking removes the >>>>> racing between make_not_entrant and make_unloaded, but I still >>>>> want the monotonicity guards to make this code more robust. >>>> >>>> I see where you added OSR nmethod unlinking, but not where you >>>> removed it, so it's not obvious it was a "move". >>> >>> Sorry, bad wording on my part. I added OSR nmethod unlinking before >>> the global handshake is run. After the handshake, we call >>> make_unloaded() on the same is_unloading() nmethods. That function >>> "tries" to unlink the OSR nmethod, but will just not do it as it's >>> already unlinked at that point. So in a way, I didn't remove the >>> call to unlink the OSR nmethod there, it just won't do anything. I >>> preferred structuring it that way instead of trying to optimize away >>> the call to unlink the OSR nmethod when making it unloaded, but only >>> for the concurrent case. It seemed to introduce more conditional >>> magic than it was worth. >>> So in practice, the unlinking of OSR nmethods has moved for >>> concurrent unloading to before the handshake. >>> >> >> OK, in that case, could you add a little information to the >> "Invalidate the osr nmethod only once" comment so that in the future >> someone isn't tempted to remove the code as redundant? > > Sure. > I meant the one in zNMethod.cpp :-) >>>> Would it make sense for nmethod::unlink_from_method() to do the OSR >>>> unlinking, or to assert that it has already been done? >>> >>> An earlier version of this patch tried to do that. It is indeed >>> possible. But it requires changing lock ranks of the OSR nmethod >>> lock to special - 1 and moving around a bunch of code as this >>> function is also called both when making nmethods not_entrant, >>> zombie, and unlinking them in that case. For the first two, we >>> conditionally unlink the nmethod based on the current state (which >>> is the old state), whereas when I move it, the current state is the >>> new state. So I had to change things around a bit more to figure out >>> the right condition when to unlink it that works for all 3 callers. >>> In the end, since this is going to 13, I thought it's more important >>> to minimize the risk as much as I can, and leave such refactorings >>> to 14. >>> >> >> OK. >> >>>> The new bailout in the middle of >>>> nmethod::make_not_entrant_or_zombie() worries me a little, because >>>> the code up to that point has side-effects, and we could be bailing >>>> out in an unexpected state. >>> >>> Correct. In an earlier version of this patch, I moved the transition >>> to before the side effects. But a bunch of code is using the current >>> nmethod state to determine what to do, and that current state >>> changed from the old to the new state. In particular, we >>> conditionally patch in the jump based on the current (old) state, >>> and we conditionally increment decompile count based on the current >>> (old) state. So I ended up having to rewrite more code than I wanted >>> to for a patch going into 13, and convince myself that I had not >>> implicitly messed something up. It felt safer to reason about the 3 >>> side effects up until the transitioning point: >>> >>> 1) Patching in the jump into VEP. Any state more dead than the >>> current transition, would still want that jump to be there. >>> 2) Incrementing decompile count when making it not_entrant. Seems in >>> order to do regardless, as we had an actual request to make the >>> nmethod not entrant because it was bad somehow. >>> 3) Marking it as seen on stack when making it not_entrant. This will >>> only make can_convert_to_zombie start returning false, which is >>> harmless in general. Also, as both transitions to zombie and >>> not_entrant are performed under the Patching_lock, the only possible >>> race is with make_unloaded. And those nmethods are is_unloading(), >>> which also makes can_convert_to_zombie return false (in a not racy >>> fashion). So it would essentially make no observable difference to >>> any single call to can_convert_to_zombie(). >>> >>> In summary, #1 and #3 don't really observably change the state of >>> the system, and #2 is completely harmless and probably wanted. >>> Therefore I found that moving these things around and finding out >>> where we use the current state(), as well as rewriting it, seemed >>> like a slightly scarier change for 13 to me. >>> >>> So in general, there is some refactoring that could be done (and I >>> have tried it) to make this nicer. But I want to minimize the risk >>> for 13 as much as possible, and perform any risky refactorings in 14 >>> instead. >>> If your risk assessment is different and you would prefer moving the >>> transition higher up (and flipping some conditions) instead, I am >>> totally up for that too though, and I do see where you are coming from. >>> >> >> So if we fail, it means that we lost a race to a "deader" state, and >> assuming this is the only path to the deader state, wouldn't that >> also mean that #1, #2, and #3 would have already been done by the >> winning thread?? If so, that makes me feel better about bailing out >> in the middle, but I'm still not 100% convinced, unless we can assert >> that 1-3 already happened.? Do you have a prototype of what moving >> the transition higher up would look like? > > As a matter of fact I do. Here is a webrev: > http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ > > I kind of like it. What do you think? > Now the code after the transition that says "Must happen before state change" worries me.? Can you remind me again what kind of race can make the state transition fail here?? Did you happen to draw a state diagram while learning this code? :-) dl > Thanks, > /Erik > >> dl >> >>> BTW, I have tested this change through hs-tier1-7, and it looks good. >>> >>> Thanks a lot Dean for reviewing this code. >>> >>> /Erik >>> >>>> dl >>>> >>> >> From coleen.phillimore at oracle.com Thu Jul 11 19:39:24 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Thu, 11 Jul 2019 15:39:24 -0400 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: <92ba7623-5c9c-8808-2e82-1151b356de58@oracle.com> References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com> <2fc92bdb-ff1b-ad48-0e5e-9983e619c3c7@oracle.com> <3D8AC43D-B56B-44AF-AD9F-A0F663A7DBDD@oracle.com> <92ba7623-5c9c-8808-2e82-1151b356de58@oracle.com> Message-ID: <929f68da-7365-57a7-e659-42d62adfebab@oracle.com> On 7/11/19 3:02 PM, Erik ?sterlund wrote: > Hi Coleen, > > On 2019-07-11 14:23, coleen.phillimore at oracle.com wrote: >> >> >> On 7/11/19 2:18 PM, Erik Osterlund wrote: >>> Hi Coleen, >>> >>>> On 11 Jul 2019, at 16:46, coleen.phillimore at oracle.com wrote: >>>> >>>> >>>> Hi Erik, >>>> I had a look at this also and it seems reasonable and I like that >>>> 'unloaded' is less dead than 'zombie' now. >>> Glad you like it! >>> >>>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.00/src/hotspot/share/code/nmethod.cpp.frames.html >>>> >>>> >>>> 1230 guarantee(try_transition(unloaded), "Invalid nmethod >>>> transition to unloaded"); >>>> >>>> >>>> This line worries me.? Can you explain why another thread could not >>>> have made this nmethod zombie already, before the handshake in zgc >>>> and the call afterward to make_unloaded() for this nmethod??? And >>>> add a comment here? >>> Sure, will add a comment. Like this: >>> ?It is an important invariant that there exists no race between the >>> sweeper and GC thread competing for making the same nmethod zombie >>> and unloaded respectively. This is ensured by >>> can_convert_to_zombie() returning false for any is_unloading() >>> nmethod, informing the sweeper not to step on any GC toes? >> >> Yes, I like the comment.? Should it be an assert instead though? > > Sure, why not! The comment addition and assert instead of guarantee: > http://cr.openjdk.java.net/~eosterlund/8224674/webrev.02/ Looks good! Coleen > > Thanks, > /Erik > >> Thanks, >> Coleen >>> >>> Does that sound comprehensible? >>> >>> Thanks, >>> /Erik >>> >>>> Thanks, >>>> Coleen >>>> >>>>> On 7/11/19 9:53 AM, Erik ?sterlund wrote: >>>>> Hi Dean, >>>>> >>>>>> On 2019-07-11 00:42, dean.long at oracle.com wrote: >>>>>>> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>>>>>> Hi Dean, >>>>>>> >>>>>>>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>>>>>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking >>>>>>>>> (where unlinking code belongs), instead of after the handshake >>>>>>>>> (intended for deleting things safely unlinked). >>>>>>>>> Strictly speaking, moving the OSR nmethod unlinking removes >>>>>>>>> the racing between make_not_entrant and make_unloaded, but I >>>>>>>>> still want the monotonicity guards to make this code more robust. >>>>>>>> I see where you added OSR nmethod unlinking, but not where you >>>>>>>> removed it, so it's not obvious it was a "move". >>>>>>> Sorry, bad wording on my part. I added OSR nmethod unlinking >>>>>>> before the global handshake is run. After the handshake, we call >>>>>>> make_unloaded() on the same is_unloading() nmethods. That >>>>>>> function "tries" to unlink the OSR nmethod, but will just not do >>>>>>> it as it's already unlinked at that point. So in a way, I didn't >>>>>>> remove the call to unlink the OSR nmethod there, it just won't >>>>>>> do anything. I preferred structuring it that way instead of >>>>>>> trying to optimize away the call to unlink the OSR nmethod when >>>>>>> making it unloaded, but only for the concurrent case. It seemed >>>>>>> to introduce more conditional magic than it was worth. >>>>>>> So in practice, the unlinking of OSR nmethods has moved for >>>>>>> concurrent unloading to before the handshake. >>>>>>> >>>>>> OK, in that case, could you add a little information to the >>>>>> "Invalidate the osr nmethod only once" comment so that in the >>>>>> future someone isn't tempted to remove the code as redundant? >>>>> Sure. >>>>> >>>>>>>> Would it make sense for nmethod::unlink_from_method() to do the >>>>>>>> OSR unlinking, or to assert that it has already been done? >>>>>>> An earlier version of this patch tried to do that. It is indeed >>>>>>> possible. But it requires changing lock ranks of the OSR nmethod >>>>>>> lock to special - 1 and moving around a bunch of code as this >>>>>>> function is also called both when making nmethods not_entrant, >>>>>>> zombie, and unlinking them in that case. For the first two, we >>>>>>> conditionally unlink the nmethod based on the current state >>>>>>> (which is the old state), whereas when I move it, the current >>>>>>> state is the new state. So I had to change things around a bit >>>>>>> more to figure out the right condition when to unlink it that >>>>>>> works for all 3 callers. In the end, since this is going to 13, >>>>>>> I thought it's more important to minimize the risk as much as I >>>>>>> can, and leave such refactorings to 14. >>>>>>> >>>>>> OK. >>>>>> >>>>>>>> The new bailout in the middle of >>>>>>>> nmethod::make_not_entrant_or_zombie() worries me a little, >>>>>>>> because the code up to that point has side-effects, and we >>>>>>>> could be bailing out in an unexpected state. >>>>>>> Correct. In an earlier version of this patch, I moved the >>>>>>> transition to before the side effects. But a bunch of code is >>>>>>> using the current nmethod state to determine what to do, and >>>>>>> that current state changed from the old to the new state. In >>>>>>> particular, we conditionally patch in the jump based on the >>>>>>> current (old) state, and we conditionally increment decompile >>>>>>> count based on the current (old) state. So I ended up having to >>>>>>> rewrite more code than I wanted to for a patch going into 13, >>>>>>> and convince myself that I had not implicitly messed something >>>>>>> up. It felt safer to reason about the 3 side effects up until >>>>>>> the transitioning point: >>>>>>> >>>>>>> 1) Patching in the jump into VEP. Any state more dead than the >>>>>>> current transition, would still want that jump to be there. >>>>>>> 2) Incrementing decompile count when making it not_entrant. >>>>>>> Seems in order to do regardless, as we had an actual request to >>>>>>> make the nmethod not entrant because it was bad somehow. >>>>>>> 3) Marking it as seen on stack when making it not_entrant. This >>>>>>> will only make can_convert_to_zombie start returning false, >>>>>>> which is harmless in general. Also, as both transitions to >>>>>>> zombie and not_entrant are performed under the Patching_lock, >>>>>>> the only possible race is with make_unloaded. And those nmethods >>>>>>> are is_unloading(), which also makes can_convert_to_zombie >>>>>>> return false (in a not racy fashion). So it would essentially >>>>>>> make no observable difference to any single call to >>>>>>> can_convert_to_zombie(). >>>>>>> >>>>>>> In summary, #1 and #3 don't really observably change the state >>>>>>> of the system, and #2 is completely harmless and probably >>>>>>> wanted. Therefore I found that moving these things around and >>>>>>> finding out where we use the current state(), as well as >>>>>>> rewriting it, seemed like a slightly scarier change for 13 to me. >>>>>>> >>>>>>> So in general, there is some refactoring that could be done (and >>>>>>> I have tried it) to make this nicer. But I want to minimize the >>>>>>> risk for 13 as much as possible, and perform any risky >>>>>>> refactorings in 14 instead. >>>>>>> If your risk assessment is different and you would prefer moving >>>>>>> the transition higher up (and flipping some conditions) instead, >>>>>>> I am totally up for that too though, and I do see where you are >>>>>>> coming from. >>>>>>> >>>>>> So if we fail, it means that we lost a race to a "deader" state, >>>>>> and assuming this is the only path to the deader state, wouldn't >>>>>> that also mean that #1, #2, and #3 would have already been done >>>>>> by the winning thread?? If so, that makes me feel better about >>>>>> bailing out in the middle, but I'm still not 100% convinced, >>>>>> unless we can assert that 1-3 already happened.? Do you have a >>>>>> prototype of what moving the transition higher up would look like? >>>>> As a matter of fact I do. Here is a webrev: >>>>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ >>>>> >>>>> I kind of like it. What do you think? >>>>> >>>>> Thanks, >>>>> /Erik >>>>> >>>>>> dl >>>>>> >>>>>>> BTW, I have tested this change through hs-tier1-7, and it looks >>>>>>> good. >>>>>>> >>>>>>> Thanks a lot Dean for reviewing this code. >>>>>>> >>>>>>> /Erik >>>>>>> >>>>>>>> dl >>>>>>>> >> From erik.osterlund at oracle.com Thu Jul 11 20:13:06 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Thu, 11 Jul 2019 16:13:06 -0400 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com>

Message-ID: Hi Dean, On 2019-07-11 15:29, dean.long at oracle.com wrote: > On 7/11/19 6:53 AM, Erik ?sterlund wrote: >> Hi Dean, >> >> On 2019-07-11 00:42, dean.long at oracle.com wrote: >>> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>>> Hi Dean, >>>> >>>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking >>>>>> (where unlinking code belongs), instead of after the handshake >>>>>> (intended for deleting things safely unlinked). >>>>>> Strictly speaking, moving the OSR nmethod unlinking removes the >>>>>> racing between make_not_entrant and make_unloaded, but I still >>>>>> want the monotonicity guards to make this code more robust. >>>>> >>>>> I see where you added OSR nmethod unlinking, but not where you >>>>> removed it, so it's not obvious it was a "move". >>>> >>>> Sorry, bad wording on my part. I added OSR nmethod unlinking before >>>> the global handshake is run. After the handshake, we call >>>> make_unloaded() on the same is_unloading() nmethods. That function >>>> "tries" to unlink the OSR nmethod, but will just not do it as it's >>>> already unlinked at that point. So in a way, I didn't remove the >>>> call to unlink the OSR nmethod there, it just won't do anything. I >>>> preferred structuring it that way instead of trying to optimize away >>>> the call to unlink the OSR nmethod when making it unloaded, but only >>>> for the concurrent case. It seemed to introduce more conditional >>>> magic than it was worth. >>>> So in practice, the unlinking of OSR nmethods has moved for >>>> concurrent unloading to before the handshake. >>>> >>> >>> OK, in that case, could you add a little information to the >>> "Invalidate the osr nmethod only once" comment so that in the future >>> someone isn't tempted to remove the code as redundant? >> >> Sure. >> > > I meant the one in zNMethod.cpp :-) Okay, will put another comment in there once we agree on a direction on the next point. > >>>>> Would it make sense for nmethod::unlink_from_method() to do the OSR >>>>> unlinking, or to assert that it has already been done? >>>> >>>> An earlier version of this patch tried to do that. It is indeed >>>> possible. But it requires changing lock ranks of the OSR nmethod >>>> lock to special - 1 and moving around a bunch of code as this >>>> function is also called both when making nmethods not_entrant, >>>> zombie, and unlinking them in that case. For the first two, we >>>> conditionally unlink the nmethod based on the current state (which >>>> is the old state), whereas when I move it, the current state is the >>>> new state. So I had to change things around a bit more to figure out >>>> the right condition when to unlink it that works for all 3 callers. >>>> In the end, since this is going to 13, I thought it's more important >>>> to minimize the risk as much as I can, and leave such refactorings >>>> to 14. >>>> >>> >>> OK. >>> >>>>> The new bailout in the middle of >>>>> nmethod::make_not_entrant_or_zombie() worries me a little, because >>>>> the code up to that point has side-effects, and we could be bailing >>>>> out in an unexpected state. >>>> >>>> Correct. In an earlier version of this patch, I moved the transition >>>> to before the side effects. But a bunch of code is using the current >>>> nmethod state to determine what to do, and that current state >>>> changed from the old to the new state. In particular, we >>>> conditionally patch in the jump based on the current (old) state, >>>> and we conditionally increment decompile count based on the current >>>> (old) state. So I ended up having to rewrite more code than I wanted >>>> to for a patch going into 13, and convince myself that I had not >>>> implicitly messed something up. It felt safer to reason about the 3 >>>> side effects up until the transitioning point: >>>> >>>> 1) Patching in the jump into VEP. Any state more dead than the >>>> current transition, would still want that jump to be there. >>>> 2) Incrementing decompile count when making it not_entrant. Seems in >>>> order to do regardless, as we had an actual request to make the >>>> nmethod not entrant because it was bad somehow. >>>> 3) Marking it as seen on stack when making it not_entrant. This will >>>> only make can_convert_to_zombie start returning false, which is >>>> harmless in general. Also, as both transitions to zombie and >>>> not_entrant are performed under the Patching_lock, the only possible >>>> race is with make_unloaded. And those nmethods are is_unloading(), >>>> which also makes can_convert_to_zombie return false (in a not racy >>>> fashion). So it would essentially make no observable difference to >>>> any single call to can_convert_to_zombie(). >>>> >>>> In summary, #1 and #3 don't really observably change the state of >>>> the system, and #2 is completely harmless and probably wanted. >>>> Therefore I found that moving these things around and finding out >>>> where we use the current state(), as well as rewriting it, seemed >>>> like a slightly scarier change for 13 to me. >>>> >>>> So in general, there is some refactoring that could be done (and I >>>> have tried it) to make this nicer. But I want to minimize the risk >>>> for 13 as much as possible, and perform any risky refactorings in 14 >>>> instead. >>>> If your risk assessment is different and you would prefer moving the >>>> transition higher up (and flipping some conditions) instead, I am >>>> totally up for that too though, and I do see where you are coming from. >>>> >>> >>> So if we fail, it means that we lost a race to a "deader" state, and >>> assuming this is the only path to the deader state, wouldn't that >>> also mean that #1, #2, and #3 would have already been done by the >>> winning thread?? If so, that makes me feel better about bailing out >>> in the middle, but I'm still not 100% convinced, unless we can assert >>> that 1-3 already happened.? Do you have a prototype of what moving >>> the transition higher up would look like? >> >> As a matter of fact I do. Here is a webrev: >> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ >> >> I kind of like it. What do you think? >> > > Now the code after the transition that says "Must happen before state > change" worries me. Yes indeed. This is why I was hesitant to move the transition up. It moves past 3 things that implicitly depends on the current state. This one is extra scary. It actually introduces a race condition that could crash the VM (because can_convert_to_zombie() may observe an nmethod that just turned not_entrant, without being marked on stack). I think this shows (IMO) that trying to move the transition up has 3 problems, and this one is particularly hard to dodge. I think it really has to be before the transition. Would you agree now that keeping the transition where it was is less risky (as I did originally) and convincing ourselves that the 3 "side effects" are not really observable side effects in the system, as I reasoned about earlier? If not, I can try to move the mark-on-stack up above the transition. > Can you remind me again what kind of race can make > the state transition fail here?? Did you happen to draw a state diagram > while learning this code? :-) Yes indeed. Would you like the long story or the short story? Here is the short story: the only known race is between one thread making an nmethod not_entrant and the GC thread making it unloaded. That make_not_entrant is the only transition that can fail. Previously I relied on there never existing any concurrent calls to make_not_entrant() and make_unloaded(). The OSR nmethod was caught as a special case (isn't it always...) where this could happen, violating monotonicity. But I think it feels safer to enforce the monotonicity of transitions in the actual code that performs the transitions, instead of relying on knowledge of the relationships between all state transitioning calls, implicitly ensuring monotonicity. Thanks, /Erik > dl > >> Thanks, >> /Erik >> >>> dl >>> >>>> BTW, I have tested this change through hs-tier1-7, and it looks good. >>>> >>>> Thanks a lot Dean for reviewing this code. >>>> >>>> /Erik >>>> >>>>> dl >>>>> >>>> >>> > From Pengfei.Li at arm.com Fri Jul 12 03:27:55 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Fri, 12 Jul 2019 03:27:55 +0000 Subject: RFR(trivial): 8227512: [TESTBUG] Fix JTReg javac test failures with Graal Message-ID: Hi, Please help review this small fix. JBS: https://bugs.openjdk.java.net/browse/JDK-8227512 Webrev: http://cr.openjdk.java.net/~pli/rfr/8227512/ JTReg javac tests * langtools/tools/javac/modules/InheritRuntimeEnvironmentTest.java * langtools/tools/javac/file/LimitedImage.java failed when Graal is used as JVMCI compiler. These cases test javac behavior with the condition that observable modules are limited. But Graal is unable to be found in the limited module scope. This fixes these two tests by adding "jdk.internal.vm.compiler" into the limited modules. -- Thanks, Pengfei From matthias.baesken at sap.com Fri Jul 12 07:48:32 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Fri, 12 Jul 2019 07:48:32 +0000 Subject: this-pointer NULL-checks in hotspot codebase [-Wtautological-undefined-compare] Message-ID: Hello , when looking into the recent xlc16 / xlclang warnings I came across those 3 : /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:1729:7: warning: 'this' pointer cannot be null in well-defined C++ code; comparison may be assumed to always evaluate to true [-Wtautological-undefined-compare] if( this != NULL ) { ^~~~ ~~~~ /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:3416:7: warning: 'this' pointer cannot be null in well-defined C++ code; comparison may be assumed to always evaluate to false [-Wtautological-undefined-compare] if( this == NULL ) return; /nightly/jdk/src/hotspot/share/libadt/set.cpp:46:7: warning: 'this' pointer cannot be null in well-defined C++ code; comparison may be assumed to always evaluate to false [-Wtautological-undefined-compare] if( this == NULL ) return os::strdup("{no set}"); Do you think the NULL-checks can be removed or is there still some value in doing them ? Best regards, Matthias From erik.osterlund at oracle.com Fri Jul 12 08:22:04 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 12 Jul 2019 10:22:04 +0200 Subject: this-pointer NULL-checks in hotspot codebase [-Wtautological-undefined-compare] In-Reply-To: References: Message-ID: <55e8bddf-3228-0fd7-3639-cc9bc920e2c5@oracle.com> Hi Matthias, Removing such NULL checks seems like a good idea in general due to the undefined behaviour. Worth mentioning though that there are some tricky ones, like in markOopDesc* where this == NULL means that the mark word has the "inflating" value. So we explicitly check if this == NULL and hope the compiler will not elide the check. Just gonna drop that one here and run for it. Thanks, /Erik On 2019-07-12 09:48, Baesken, Matthias wrote: > Hello , when looking into the recent xlc16 / xlclang warnings I came across those 3 : > > /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:1729:7: warning: 'this' pointer cannot be null in well-defined C++ code; > comparison may be assumed to always evaluate to true [-Wtautological-undefined-compare] > if( this != NULL ) { > ^~~~ ~~~~ > > /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:3416:7: warning: 'this' pointer cannot be null in well-defined C++ code; > comparison may be assumed to always evaluate to false [-Wtautological-undefined-compare] > if( this == NULL ) return; > > /nightly/jdk/src/hotspot/share/libadt/set.cpp:46:7: warning: 'this' pointer cannot be null in well-defined C++ code; > comparison may be assumed to always evaluate to false [-Wtautological-undefined-compare] > if( this == NULL ) return os::strdup("{no set}"); > > > Do you think the NULL-checks can be removed or is there still some value in doing them ? > > Best regards, Matthias From matthias.baesken at sap.com Fri Jul 12 10:34:01 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Fri, 12 Jul 2019 10:34:01 +0000 Subject: RFR [XS] : 8227630: adjust format specifiers in loadlib_aix.cpp Message-ID: Hello, please review this very small fix for AIX . Currently we use %llu printf-format specifiers at 2 places in loadlib_aix.cpp where we output size_t variables . This leads to warnings with xlc16/xlclang : /nightly/jdk/src/hotspot/os/aix/loadlib_aix.cpp:210:48: warning: format specifies type 'unsigned long long' but the argument has type 'size_t' (aka 'unsigned long') [-Wformat] trcVerbose("loadquery buffer size is %llu.", buflen); ~~~~ ^~~~~~ %zu We can use correct format specifiers (casting might be another option). Bug/webrev : https://bugs.openjdk.java.net/browse/JDK-8227630 http://cr.openjdk.java.net/~mbaesken/webrevs/8227630.0/ Thanks, Matthias From shade at redhat.com Fri Jul 12 10:48:30 2019 From: shade at redhat.com (Aleksey Shipilev) Date: Fri, 12 Jul 2019 12:48:30 +0200 Subject: RFR [XS] : 8227630: adjust format specifiers in loadlib_aix.cpp In-Reply-To: References: Message-ID: <0f0c1019-c043-9770-24b8-28d89a33bf11@redhat.com> On 7/12/19 12:34 PM, Baesken, Matthias wrote: > http://cr.openjdk.java.net/~mbaesken/webrevs/8227630.0/ Looks fine and trivial. -- Thanks, -Aleksey From matthias.baesken at sap.com Fri Jul 12 11:09:08 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Fri, 12 Jul 2019 11:09:08 +0000 Subject: RFR: 8227631: Adjust AIX version check Message-ID: Hello, please review this small AIX related change . For some time, we do not support AIX 5.3 any more. See (where AIX 7.1 or 7.2 is the supported build platform since OpenJDK11) : https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms The currently used xlc 16.1 (XL C/C++ Compilers) even needs minimum AIX 7.1 to run , see http://www-01.ibm.com/support/docview.wss?uid=swg21326972 (and compiling for older releases on 7.1 / 7.2 would not work easily , at least not "out of the box" to my knowledge .) So we should adjust the minimum OS version check done in os_aix.cpp in os::Aix::initialize_os_info() . Additionally the change removes a couple of warnings [-Wwritable-strings category] . /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4081:22: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] char *name_str = "unknown OS"; ^ /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4089:18: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] name_str = "OS/400 (pase)"; ^ /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4100:18: warning: ISO C++11 does not allow conversion from string literal to 'char *' [-Wwritable-strings] name_str = "AIX"; Bug/webrev : https://bugs.openjdk.java.net/browse/JDK-8227631 http://cr.openjdk.java.net/~mbaesken/webrevs/8227631.0/ Thanks, Matthias From martin.doerr at sap.com Fri Jul 12 11:57:36 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 12 Jul 2019 11:57:36 +0000 Subject: RFR [XS] : 8227630: adjust format specifiers in loadlib_aix.cpp In-Reply-To: References: Message-ID: Hi Matthias, looks good to me. Best regards, Martin > -----Original Message----- > From: hotspot-dev On Behalf Of > Baesken, Matthias > Sent: Freitag, 12. Juli 2019 12:34 > To: 'hotspot-dev at openjdk.java.net' > Subject: RFR [XS] : 8227630: adjust format specifiers in loadlib_aix.cpp > > Hello, please review this very small fix for AIX . > > Currently we use %llu printf-format specifiers at 2 places in loadlib_aix.cpp > where we output size_t variables . > This leads to warnings with xlc16/xlclang : > > /nightly/jdk/src/hotspot/os/aix/loadlib_aix.cpp:210:48: warning: format > specifies type 'unsigned long long' but the argument > has type 'size_t' (aka 'unsigned long') [-Wformat] > trcVerbose("loadquery buffer size is %llu.", buflen); > ~~~~ ^~~~~~ > %zu > > We can use correct format specifiers (casting might be another option). > > > > Bug/webrev : > > https://bugs.openjdk.java.net/browse/JDK-8227630 > > http://cr.openjdk.java.net/~mbaesken/webrevs/8227630.0/ > > > Thanks, Matthias From harold.seigel at oracle.com Fri Jul 12 12:14:55 2019 From: harold.seigel at oracle.com (Harold Seigel) Date: Fri, 12 Jul 2019 08:14:55 -0400 Subject: this-pointer NULL-checks in hotspot codebase [-Wtautological-undefined-compare] In-Reply-To: <55e8bddf-3228-0fd7-3639-cc9bc920e2c5@oracle.com> References: <55e8bddf-3228-0fd7-3639-cc9bc920e2c5@oracle.com> Message-ID: The functions that compare 'this' to NULL could be changed from instance to static functions where 'this' is explicitly passed as a parameter.? Then you could keep the equivalent NULL checks. Harold On 7/12/2019 4:22 AM, Erik ?sterlund wrote: > Hi Matthias, > > Removing such NULL checks seems like a good idea in general due to the > undefined behaviour. > Worth mentioning though that there are some tricky ones, like in > markOopDesc* where this == NULL > means that the mark word has the "inflating" value. So we explicitly > check if this == NULL and > hope the compiler will not elide the check. Just gonna drop that one > here and run for it. > > Thanks, > /Erik > > On 2019-07-12 09:48, Baesken, Matthias wrote: >> Hello , when looking? into? the? recent xlc16 / xlclang?? warnings I >> came? across? those? 3 : >> >> /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:1729:7: warning: >> 'this' pointer cannot be null in well-defined C++ code; >> comparison may be assumed to always evaluate to true >> [-Wtautological-undefined-compare] >> ?? if( this != NULL ) { >> ?????? ^~~~??? ~~~~ >> >> /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:3416:7: warning: >> 'this' pointer cannot be null in well-defined C++ code; >> comparison may be assumed to always evaluate to false >> [-Wtautological-undefined-compare] >> ?? if( this == NULL ) return; >> >> /nightly/jdk/src/hotspot/share/libadt/set.cpp:46:7: warning: 'this' >> pointer cannot be null in well-defined C++ code; >> comparison may be assumed to always evaluate to false >> [-Wtautological-undefined-compare] >> ?? if( this == NULL ) return os::strdup("{no set}"); >> >> >> Do you think the? NULL-checks can be removed or is there still some >> value in doing them ? >> >> Best regards, Matthias > From christoph.langer at sap.com Fri Jul 12 12:17:22 2019 From: christoph.langer at sap.com (Langer, Christoph) Date: Fri, 12 Jul 2019 12:17:22 +0000 Subject: RFR: 8227631: Adjust AIX version check In-Reply-To: References: Message-ID: Hi Matthias, looks good. This might even be something to push to JDK13 still (if you do it within the next few days). Best regards Christoph > -----Original Message----- > From: hotspot-dev On Behalf Of > Baesken, Matthias > Sent: Freitag, 12. Juli 2019 13:09 > To: 'hotspot-dev at openjdk.java.net' ; > 'ppc-aix-port-dev at openjdk.java.net' > Subject: RFR: 8227631: Adjust AIX version check > > Hello, please review this small AIX related change . > > For some time, we do not support AIX 5.3 any more. > See (where AIX 7.1 or 7.2 is the supported build platform since OpenJDK11) : > > https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms > > The currently used xlc 16.1 (XL C/C++ Compilers) even needs minimum AIX > 7.1 to run , see > > http://www-01.ibm.com/support/docview.wss?uid=swg21326972 > > (and compiling for older releases on 7.1 / 7.2 would not work easily , at least > not "out of the box" to my knowledge .) > > So we should adjust the minimum OS version check done in os_aix.cpp in > os::Aix::initialize_os_info() . > > > Additionally the change removes a couple of warnings [-Wwritable-strings > category] . > > /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4081:22: warning: ISO C++11 > does not allow conversion from string literal to 'char *' [-Wwritable-strings] > char *name_str = "unknown OS"; > ^ > /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4089:18: warning: ISO C++11 > does not allow conversion from string literal to 'char *' [-Wwritable-strings] > name_str = "OS/400 (pase)"; > ^ > /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4100:18: warning: ISO C++11 > does not allow conversion from string literal to 'char *' [-Wwritable-strings] > name_str = "AIX"; > > > > Bug/webrev : > > https://bugs.openjdk.java.net/browse/JDK-8227631 > > http://cr.openjdk.java.net/~mbaesken/webrevs/8227631.0/ > > Thanks, Matthias From matthias.baesken at sap.com Fri Jul 12 12:30:35 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Fri, 12 Jul 2019 12:30:35 +0000 Subject: RFR: 8227633: avoid comparing this pointers to NULL - was : RE: this-pointer NULL-checks in hotspot codebase [-Wtautological-undefined-compare] Message-ID: Hello Erik, thanks for the input . We still have a few places in the HS codebase where "this" is compared to NULL. When compiling with xlc16 / xlclang we get these warnings : warning: 'this' pointer cannot be null in well-defined C++ code; comparison may be assumed to always evaluate to false [-Wtautological-undefined-compare] so those places should be removed where possible. I adjusted 3 checks , please review ! Bug/webrev : http://cr.openjdk.java.net/~mbaesken/webrevs/8227633.0/ https://bugs.openjdk.java.net/browse/JDK-8227633 Thanks , Matthias > -----Original Message----- > From: Erik ?sterlund > Sent: Freitag, 12. Juli 2019 10:22 > To: Baesken, Matthias ; 'hotspot- > dev at openjdk.java.net' > Subject: Re: this-pointer NULL-checks in hotspot codebase [-Wtautological- > undefined-compare] > > Hi Matthias, > > Removing such NULL checks seems like a good idea in general due to the > undefined behaviour. > Worth mentioning though that there are some tricky ones, like in > markOopDesc* where this == NULL > means that the mark word has the "inflating" value. So we explicitly > check if this == NULL and > hope the compiler will not elide the check. Just gonna drop that one > here and run for it. > > Thanks, > /Erik > > On 2019-07-12 09:48, Baesken, Matthias wrote: > > Hello , when looking into the recent xlc16 / xlclang warnings I came > across those 3 : > > > > /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:1729:7: warning: 'this' > pointer cannot be null in well-defined C++ code; > > comparison may be assumed to always evaluate to true [-Wtautological- > undefined-compare] > > if( this != NULL ) { > > ^~~~ ~~~~ > > > > /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:3416:7: warning: 'this' > pointer cannot be null in well-defined C++ code; > > comparison may be assumed to always evaluate to false [-Wtautological- > undefined-compare] > > if( this == NULL ) return; > > > > /nightly/jdk/src/hotspot/share/libadt/set.cpp:46:7: warning: 'this' pointer > cannot be null in well-defined C++ code; > > comparison may be assumed to always evaluate to false [-Wtautological- > undefined-compare] > > if( this == NULL ) return os::strdup("{no set}"); > > > > > > Do you think the NULL-checks can be removed or is there still some value > in doing them ? > > > > Best regards, Matthias From coleen.phillimore at oracle.com Fri Jul 12 12:48:45 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Fri, 12 Jul 2019 08:48:45 -0400 Subject: RFR: 8227633: avoid comparing this pointers to NULL - was : RE: this-pointer NULL-checks in hotspot codebase [-Wtautological-undefined-compare] In-Reply-To: References: Message-ID: http://cr.openjdk.java.net/~mbaesken/webrevs/8227633.0/src/hotspot/share/adlc/formssel.cpp.udiff.html + if (mnode) mnode->count_instr_names(names); We also try to avoid implicit checks against null for pointers so change this to: + if (mnode != NULL) mnode->count_instr_names(names); I didn't see that you added a check for NULL in the callers of print_opcodes or setstr.? Can those callers never pass NULL? We've done a few passes to clean up these this == NULL checks. Thank you for doing this! Coleen On 7/12/19 8:30 AM, Baesken, Matthias wrote: > Hello Erik, thanks for the input . > > We still have a few places in the HS codebase where "this" is compared to NULL. > When compiling with xlc16 / xlclang we get these warnings : > > warning: 'this' pointer cannot be null in well-defined C++ code; comparison may be assumed to always evaluate to false [-Wtautological-undefined-compare] > > so those places should be removed where possible. > > > I adjusted 3 checks , please review ! > > > > Bug/webrev : > > http://cr.openjdk.java.net/~mbaesken/webrevs/8227633.0/ > > https://bugs.openjdk.java.net/browse/JDK-8227633 > > Thanks , Matthias > > >> -----Original Message----- >> From: Erik ?sterlund >> Sent: Freitag, 12. Juli 2019 10:22 >> To: Baesken, Matthias ; 'hotspot- >> dev at openjdk.java.net' >> Subject: Re: this-pointer NULL-checks in hotspot codebase [-Wtautological- >> undefined-compare] >> >> Hi Matthias, >> >> Removing such NULL checks seems like a good idea in general due to the >> undefined behaviour. >> Worth mentioning though that there are some tricky ones, like in >> markOopDesc* where this == NULL >> means that the mark word has the "inflating" value. So we explicitly >> check if this == NULL and >> hope the compiler will not elide the check. Just gonna drop that one >> here and run for it. >> >> Thanks, >> /Erik >> >> On 2019-07-12 09:48, Baesken, Matthias wrote: >>> Hello , when looking into the recent xlc16 / xlclang warnings I came >> across those 3 : >>> /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:1729:7: warning: 'this' >> pointer cannot be null in well-defined C++ code; >>> comparison may be assumed to always evaluate to true [-Wtautological- >> undefined-compare] >>> if( this != NULL ) { >>> ^~~~ ~~~~ >>> >>> /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:3416:7: warning: 'this' >> pointer cannot be null in well-defined C++ code; >>> comparison may be assumed to always evaluate to false [-Wtautological- >> undefined-compare] >>> if( this == NULL ) return; >>> >>> /nightly/jdk/src/hotspot/share/libadt/set.cpp:46:7: warning: 'this' pointer >> cannot be null in well-defined C++ code; >>> comparison may be assumed to always evaluate to false [-Wtautological- >> undefined-compare] >>> if( this == NULL ) return os::strdup("{no set}"); >>> >>> >>> Do you think the NULL-checks can be removed or is there still some value >> in doing them ? >>> Best regards, Matthias From matthias.baesken at sap.com Fri Jul 12 13:01:31 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Fri, 12 Jul 2019 13:01:31 +0000 Subject: RFR: 8227633: avoid comparing this pointers to NULL Message-ID: > > + if (mnode) mnode->count_instr_names(names); > > > We also try to avoid implicit checks against null for pointers so change > this to: > Hi Coleen, sure I can change this ; I just found a lot of places in formssel.cpp where if (ptr) { ... } is used . > > I didn't see that you added a check for NULL in the callers of > print_opcodes or setstr.? Can those callers never pass NULL? > It looked to me that the setstr is never really called and void Set::print() const { ... } where it is used is used for debug printing - did I miss something ? Regarding print_opcodes , there probably the NULL checks at caller palces should better be added . Regards, Matthias > ------------------------------ > > Message: 4 > Date: Fri, 12 Jul 2019 08:48:45 -0400 > From: coleen.phillimore at oracle.com > To: hotspot-dev at openjdk.java.net > Subject: Re: RFR: 8227633: avoid comparing this pointers to NULL - was > : RE: this-pointer NULL-checks in hotspot codebase > [-Wtautological-undefined-compare] > Message-ID: > Content-Type: text/plain; charset=utf-8; format=flowed > > > http://cr.openjdk.java.net/~mbaesken/webrevs/8227633.0/src/hotspot/sha > re/adlc/formssel.cpp.udiff.html > > + if (mnode) mnode->count_instr_names(names); > > > We also try to avoid implicit checks against null for pointers so change > this to: > > + if (mnode != NULL) mnode->count_instr_names(names); > > I didn't see that you added a check for NULL in the callers of > print_opcodes or setstr.? Can those callers never pass NULL? > > We've done a few passes to clean up these this == NULL checks. Thank you > for doing this! > > Coleen > > From erik.osterlund at oracle.com Fri Jul 12 14:46:43 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Fri, 12 Jul 2019 16:46:43 +0200 Subject: this-pointer NULL-checks in hotspot codebase [-Wtautological-undefined-compare] In-Reply-To: References: <55e8bddf-3228-0fd7-3639-cc9bc920e2c5@oracle.com> Message-ID: Hi Harold, It's worse than that though, unfortunately. You are not allowed to have "this" equal to NULL, whether you perform such explicit NULL comparisons or not. The implication is that as long as "inflating" is NULL, we kind of can't use any of the functions on markOop and hence? mustrewrite pretty much all uses of markOop to do something else. The same goes for things like Register, where rax == NULL. To be compliant, we would similarly have to rewrite all uses of Register. In other words, if we are to really hunt down uses of this == NULL and remove them, we will find ourselves with a mountain of work. Again, just gonna drop that here and run. /Erik On 2019-07-12 14:14, Harold Seigel wrote: > The functions that compare 'this' to NULL could be changed from > instance to static functions where 'this' is explicitly passed as a > parameter.? Then you could keep the equivalent NULL checks. > > Harold > > On 7/12/2019 4:22 AM, Erik ?sterlund wrote: >> Hi Matthias, >> >> Removing such NULL checks seems like a good idea in general due to >> the undefined behaviour. >> Worth mentioning though that there are some tricky ones, like in >> markOopDesc* where this == NULL >> means that the mark word has the "inflating" value. So we explicitly >> check if this == NULL and >> hope the compiler will not elide the check. Just gonna drop that one >> here and run for it. >> >> Thanks, >> /Erik >> >> On 2019-07-12 09:48, Baesken, Matthias wrote: >>> Hello , when looking? into? the? recent xlc16 / xlclang?? warnings I >>> came? across? those? 3 : >>> >>> /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:1729:7: warning: >>> 'this' pointer cannot be null in well-defined C++ code; >>> comparison may be assumed to always evaluate to true >>> [-Wtautological-undefined-compare] >>> ?? if( this != NULL ) { >>> ?????? ^~~~??? ~~~~ >>> >>> /nightly/jdk/src/hotspot/share/adlc/formssel.cpp:3416:7: warning: >>> 'this' pointer cannot be null in well-defined C++ code; >>> comparison may be assumed to always evaluate to false >>> [-Wtautological-undefined-compare] >>> ?? if( this == NULL ) return; >>> >>> /nightly/jdk/src/hotspot/share/libadt/set.cpp:46:7: warning: 'this' >>> pointer cannot be null in well-defined C++ code; >>> comparison may be assumed to always evaluate to false >>> [-Wtautological-undefined-compare] >>> ?? if( this == NULL ) return os::strdup("{no set}"); >>> >>> >>> Do you think the? NULL-checks can be removed or is there still some >>> value in doing them ? >>> >>> Best regards, Matthias >> From fweimer at redhat.com Fri Jul 12 15:36:32 2019 From: fweimer at redhat.com (Florian Weimer) Date: Fri, 12 Jul 2019 17:36:32 +0200 Subject: this-pointer NULL-checks in hotspot codebase [-Wtautological-undefined-compare] In-Reply-To: (Matthias Baesken's message of "Fri, 12 Jul 2019 07:48:32 +0000") References: Message-ID: <87blxzz2m7.fsf@oldenburg2.str.redhat.com> * Matthias Baesken: > Do you think the NULL-checks can be removed or is there still some > value in doing them ? I believe you need to build OpenJDK in a mode where the compiler assumes that the this pointer can be null: # These flags are required for GCC 6 builds as undefined behaviour in # OpenJDK code runs afoul of the more aggressive versions of these # optimisations. Notably, value range propagation now assumes that # the this pointer of C++ member functions is non-null. Thanks, Florian From sgehwolf at redhat.com Fri Jul 12 18:08:18 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Fri, 12 Jul 2019 20:08:18 +0200 Subject: RFR: 8227642: [TESTBUG] Make docker tests podman compatible Message-ID: <32c8a1934bf07e4c9c6a961e60dcb7abd9931fe1.camel@redhat.com> Hi, There is an alternative container engine which is being used by Fedora and RHEL 8, called podman[1]. It's mostly compatible with docker. It looks like OpenJDK docker tests can be made podman compatible with a few little tweaks. One "interesting" one is to not assert "Successfully built" in the build output but only rely on the exit code, which seems to be OK for my testing. Interestingly the test would be skipped in that case. Bug: https://bugs.openjdk.java.net/browse/JDK-8227642 webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8227642/01/webrev/ Adjustments I've done: * Don't assert "Successfully built" in image build output[2]. * Add /usr/sbin to PATH as the podman binary relies on iptables for it to work which is in /usr/sbin on Fedora * Allow for Metrics.getCpuSystemUsage() and Metrics.getCpuUserUsage() to be equal to the previous value. I've found those counters to be slowly increasing, which made the tests unreliable. Testing: Running docker tests with docker as engine. Did the same with podman as engine via -Djdk.test.docker.command=podman on Linux x86_64. Both passed (non-trivially). Thoughts? Thanks, Severin [1] https://podman.io/ [2] Image builds with podman look like ("COMMIT" over "Successfully built"): STEP 1: FROM fedora:29 STEP 2: RUN dnf install -y java-11-openjdk-devel && dnf clean all --> Using cache 96f8b1a0dfe7dba581a64fc67a27002ddf52e032af55f9ddc765182a690afd9d STEP 3: COPY TestMetrics.class TestMetrics.java /opt/ 269042160f7a4e6a06789cd19640ea658a8f941bc53de0fd40a574dc3bdb49a8 STEP 4: CMD /usr/lib/jvm/java-11-openjdk/bin/java -cp /opt --add-modules java.base --add-exports java.base/jdk.internal.platform=ALL-UNNAMED TestMetrics STEP 5: COMMIT fedora-metrics-11 d749088d6ce4510f212820ad4eca55a9b05e5c5c245f2372b6cfe91926e8cd7e From dean.long at oracle.com Fri Jul 12 21:50:15 2019 From: dean.long at oracle.com (dean.long at oracle.com) Date: Fri, 12 Jul 2019 14:50:15 -0700 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com>

Message-ID: On 7/11/19 1:13 PM, Erik ?sterlund wrote: > Hi Dean, > > On 2019-07-11 15:29, dean.long at oracle.com wrote: >> On 7/11/19 6:53 AM, Erik ?sterlund wrote: >>> Hi Dean, >>> >>> On 2019-07-11 00:42, dean.long at oracle.com wrote: >>>> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>>>> Hi Dean, >>>>> >>>>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking >>>>>>> (where unlinking code belongs), instead of after the handshake >>>>>>> (intended for deleting things safely unlinked). >>>>>>> Strictly speaking, moving the OSR nmethod unlinking removes the >>>>>>> racing between make_not_entrant and make_unloaded, but I still >>>>>>> want the monotonicity guards to make this code more robust. >>>>>> >>>>>> I see where you added OSR nmethod unlinking, but not where you >>>>>> removed it, so it's not obvious it was a "move". >>>>> >>>>> Sorry, bad wording on my part. I added OSR nmethod unlinking >>>>> before the global handshake is run. After the handshake, we call >>>>> make_unloaded() on the same is_unloading() nmethods. That function >>>>> "tries" to unlink the OSR nmethod, but will just not do it as it's >>>>> already unlinked at that point. So in a way, I didn't remove the >>>>> call to unlink the OSR nmethod there, it just won't do anything. I >>>>> preferred structuring it that way instead of trying to optimize >>>>> away the call to unlink the OSR nmethod when making it unloaded, >>>>> but only for the concurrent case. It seemed to introduce more >>>>> conditional magic than it was worth. >>>>> So in practice, the unlinking of OSR nmethods has moved for >>>>> concurrent unloading to before the handshake. >>>>> >>>> >>>> OK, in that case, could you add a little information to the >>>> "Invalidate the osr nmethod only once" comment so that in the >>>> future someone isn't tempted to remove the code as redundant? >>> >>> Sure. >>> >> >> I meant the one in zNMethod.cpp :-) > > Okay, will put another comment in there once we agree on a direction > on the next point. > >> >>>>>> Would it make sense for nmethod::unlink_from_method() to do the >>>>>> OSR unlinking, or to assert that it has already been done? >>>>> >>>>> An earlier version of this patch tried to do that. It is indeed >>>>> possible. But it requires changing lock ranks of the OSR nmethod >>>>> lock to special - 1 and moving around a bunch of code as this >>>>> function is also called both when making nmethods not_entrant, >>>>> zombie, and unlinking them in that case. For the first two, we >>>>> conditionally unlink the nmethod based on the current state (which >>>>> is the old state), whereas when I move it, the current state is >>>>> the new state. So I had to change things around a bit more to >>>>> figure out the right condition when to unlink it that works for >>>>> all 3 callers. In the end, since this is going to 13, I thought >>>>> it's more important to minimize the risk as much as I can, and >>>>> leave such refactorings to 14. >>>>> >>>> >>>> OK. >>>> >>>>>> The new bailout in the middle of >>>>>> nmethod::make_not_entrant_or_zombie() worries me a little, >>>>>> because the code up to that point has side-effects, and we could >>>>>> be bailing out in an unexpected state. >>>>> >>>>> Correct. In an earlier version of this patch, I moved the >>>>> transition to before the side effects. But a bunch of code is >>>>> using the current nmethod state to determine what to do, and that >>>>> current state changed from the old to the new state. In >>>>> particular, we conditionally patch in the jump based on the >>>>> current (old) state, and we conditionally increment decompile >>>>> count based on the current (old) state. So I ended up having to >>>>> rewrite more code than I wanted to for a patch going into 13, and >>>>> convince myself that I had not implicitly messed something up. It >>>>> felt safer to reason about the 3 side effects up until the >>>>> transitioning point: >>>>> >>>>> 1) Patching in the jump into VEP. Any state more dead than the >>>>> current transition, would still want that jump to be there. >>>>> 2) Incrementing decompile count when making it not_entrant. Seems >>>>> in order to do regardless, as we had an actual request to make the >>>>> nmethod not entrant because it was bad somehow. >>>>> 3) Marking it as seen on stack when making it not_entrant. This >>>>> will only make can_convert_to_zombie start returning false, which >>>>> is harmless in general. Also, as both transitions to zombie and >>>>> not_entrant are performed under the Patching_lock, the only >>>>> possible race is with make_unloaded. And those nmethods are >>>>> is_unloading(), which also makes can_convert_to_zombie return >>>>> false (in a not racy fashion). So it would essentially make no >>>>> observable difference to any single call to can_convert_to_zombie(). >>>>> >>>>> In summary, #1 and #3 don't really observably change the state of >>>>> the system, and #2 is completely harmless and probably wanted. >>>>> Therefore I found that moving these things around and finding out >>>>> where we use the current state(), as well as rewriting it, seemed >>>>> like a slightly scarier change for 13 to me. >>>>> >>>>> So in general, there is some refactoring that could be done (and I >>>>> have tried it) to make this nicer. But I want to minimize the risk >>>>> for 13 as much as possible, and perform any risky refactorings in >>>>> 14 instead. >>>>> If your risk assessment is different and you would prefer moving >>>>> the transition higher up (and flipping some conditions) instead, I >>>>> am totally up for that too though, and I do see where you are >>>>> coming from. >>>>> >>>> >>>> So if we fail, it means that we lost a race to a "deader" state, >>>> and assuming this is the only path to the deader state, wouldn't >>>> that also mean that #1, #2, and #3 would have already been done by >>>> the winning thread?? If so, that makes me feel better about bailing >>>> out in the middle, but I'm still not 100% convinced, unless we can >>>> assert that 1-3 already happened.? Do you have a prototype of what >>>> moving the transition higher up would look like? >>> >>> As a matter of fact I do. Here is a webrev: >>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ >>> >>> I kind of like it. What do you think? >>> >> >> Now the code after the transition that says "Must happen before state >> change" worries me. > > Yes indeed. This is why I was hesitant to move the transition up. It > moves past 3 things that implicitly depends on the current state. This > one is extra scary. It actually introduces a race condition that could > crash the VM (because can_convert_to_zombie() may observe an nmethod > that just turned not_entrant, without being marked on stack). > > I think this shows (IMO) that trying to move the transition up has 3 > problems, and this one is particularly hard to dodge. I think it > really has to be before the transition. > > Would you agree now that keeping the transition where it was is less > risky (as I did originally) Yes. > and convincing ourselves that the 3 "side effects" are not really > observable side effects in the system, as I reasoned about earlier? > yes, but I'm hoping we can do more than just reason, like adding asserts.? More below... > If not, I can try to move the mark-on-stack up above the transition. > >> Can you remind me again what kind of race can make the state >> transition fail here?? Did you happen to draw a state diagram while >> learning this code? :-) > > Yes indeed. Would you like the long story or the short story? Here is > the short story: the only known race is between one thread making an > nmethod not_entrant and the GC thread making it unloaded. That > make_not_entrant is the only transition that can fail. Previously I > relied on there never existing any concurrent calls to > make_not_entrant() and make_unloaded(). The OSR nmethod was caught as > a special case (isn't it always...) where this could happen, violating > monotonicity. But I think it feels safer to enforce the monotonicity > of transitions in the actual code that performs the transitions, > instead of relying on knowledge of the relationships between all state > transitioning calls, implicitly ensuring monotonicity. > Can we enforce in_use --> not_entrant --> unloaded --> zombie, and not allow jumps or skipped states?? Then we can assert that cleanup from a less-dead state has already been done.? So if make_not_entrant failed, it would assert that all the cleanup that would have been done by a successful make_not_entrant has already been done. dl > Thanks, > /Erik > >> dl >> >>> Thanks, >>> /Erik >>> >>>> dl >>>> >>>>> BTW, I have tested this change through hs-tier1-7, and it looks good. >>>>> >>>>> Thanks a lot Dean for reviewing this code. >>>>> >>>>> /Erik >>>>> >>>>>> dl >>>>>> >>>>> >>>> >> From mikhailo.seledtsov at oracle.com Fri Jul 12 22:19:49 2019 From: mikhailo.seledtsov at oracle.com (mikhailo.seledtsov at oracle.com) Date: Fri, 12 Jul 2019 15:19:49 -0700 Subject: RFR: 8227642: [TESTBUG] Make docker tests podman compatible In-Reply-To: <32c8a1934bf07e4c9c6a961e60dcb7abd9931fe1.camel@redhat.com> References: <32c8a1934bf07e4c9c6a961e60dcb7abd9931fe1.camel@redhat.com> Message-ID: <5bc3ac00-6ac9-99aa-052d-0a4aa6b04f8f@oracle.com> Hi Severin, ? The change looks good to me. Thank you for adding support for Podman container technology. Testing: I ran both HotSpot and JDK container tests with your patch; tests executed on Oracle Linux 7.6 using default container engine (Docker): ??? test/hotspot/jtreg/containers/?? AND test/jdk/jdk/internal/platform/docker/ All PASS Thanks, Misha On 7/12/19 11:08 AM, Severin Gehwolf wrote: > Hi, > > There is an alternative container engine which is being used by Fedora > and RHEL 8, called podman[1]. It's mostly compatible with docker. It > looks like OpenJDK docker tests can be made podman compatible with a > few little tweaks. One "interesting" one is to not assert "Successfully > built" in the build output but only rely on the exit code, which seems > to be OK for my testing. Interestingly the test would be skipped in > that case. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8227642 > webrev: http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8227642/01/webrev/ > > Adjustments I've done: > * Don't assert "Successfully built" in image build output[2]. > * Add /usr/sbin to PATH as the podman binary relies on iptables for it > to work which is in /usr/sbin on Fedora > * Allow for Metrics.getCpuSystemUsage() and Metrics.getCpuUserUsage() > to be equal to the previous value. I've found those counters to be > slowly increasing, which made the tests unreliable. > > Testing: > > Running docker tests with docker as engine. Did the same with podman as > engine via -Djdk.test.docker.command=podman on Linux x86_64. Both > passed (non-trivially). > > Thoughts? > > Thanks, > Severin > > [1] https://podman.io/ > [2] Image builds with podman look > like ("COMMIT" over "Successfully built"): > STEP 1: FROM fedora:29 > STEP 2: RUN dnf install -y java-11-openjdk-devel && dnf clean all > --> Using cache 96f8b1a0dfe7dba581a64fc67a27002ddf52e032af55f9ddc765182a690afd9d > STEP 3: COPY TestMetrics.class TestMetrics.java /opt/ > 269042160f7a4e6a06789cd19640ea658a8f941bc53de0fd40a574dc3bdb49a8 > STEP 4: CMD /usr/lib/jvm/java-11-openjdk/bin/java -cp /opt --add-modules java.base --add-exports java.base/jdk.internal.platform=ALL-UNNAMED TestMetrics > STEP 5: COMMIT fedora-metrics-11 > d749088d6ce4510f212820ad4eca55a9b05e5c5c245f2372b6cfe91926e8cd7e > From Pengfei.Li at arm.com Mon Jul 15 01:38:33 2019 From: Pengfei.Li at arm.com (Pengfei Li (Arm Technology China)) Date: Mon, 15 Jul 2019 01:38:33 +0000 Subject: RFR(trivial): 8227512: [TESTBUG] Fix JTReg javac test failures with Graal In-Reply-To: References: Message-ID: CC compiler-dev -- Thanks, Pengfei > Hi, > > Please help review this small fix. > JBS: https://bugs.openjdk.java.net/browse/JDK-8227512 > Webrev: http://cr.openjdk.java.net/~pli/rfr/8227512/ > > JTReg javac tests > * langtools/tools/javac/modules/InheritRuntimeEnvironmentTest.java > * langtools/tools/javac/file/LimitedImage.java > failed when Graal is used as JVMCI compiler. > > These cases test javac behavior with the condition that observable modules > are limited. But Graal is unable to be found in the limited module scope. This > fixes these two tests by adding "jdk.internal.vm.compiler" into the limited > modules. > > -- > Thanks, > Pengfei From sgehwolf at redhat.com Mon Jul 15 08:04:17 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Mon, 15 Jul 2019 10:04:17 +0200 Subject: RFR: 8227642: [TESTBUG] Make docker tests podman compatible In-Reply-To: <5bc3ac00-6ac9-99aa-052d-0a4aa6b04f8f@oracle.com> References: <32c8a1934bf07e4c9c6a961e60dcb7abd9931fe1.camel@redhat.com> <5bc3ac00-6ac9-99aa-052d-0a4aa6b04f8f@oracle.com> Message-ID: Hi Misha, On Fri, 2019-07-12 at 15:19 -0700, mikhailo.seledtsov at oracle.com wrote: > Hi Severin, > > The change looks good to me. Thank you for adding support for Podman > container technology. > > Testing: I ran both HotSpot and JDK container tests with your patch; > tests executed on Oracle Linux 7.6 using default container engine (Docker): > > test/hotspot/jtreg/containers/ AND > test/jdk/jdk/internal/platform/docker/ > > All PASS Thanks for the review and check! Cheers, Severin > > Thanks, > > Misha > > > On 7/12/19 11:08 AM, Severin Gehwolf wrote: > > Hi, > > > > There is an alternative container engine which is being used by > > Fedora > > and RHEL 8, called podman[1]. It's mostly compatible with docker. > > It > > looks like OpenJDK docker tests can be made podman compatible with > > a > > few little tweaks. One "interesting" one is to not assert > > "Successfully > > built" in the build output but only rely on the exit code, which > > seems > > to be OK for my testing. Interestingly the test would be skipped in > > that case. > > > > Bug: https://bugs.openjdk.java.net/browse/JDK-8227642 > > webrev: > > http://cr.openjdk.java.net/~sgehwolf/webrevs/JDK-8227642/01/webrev/ > > > > Adjustments I've done: > > * Don't assert "Successfully built" in image build output[2]. > > * Add /usr/sbin to PATH as the podman binary relies on iptables > > for it > > to work which is in /usr/sbin on Fedora > > * Allow for Metrics.getCpuSystemUsage() and > > Metrics.getCpuUserUsage() > > to be equal to the previous value. I've found those counters to > > be > > slowly increasing, which made the tests unreliable. > > > > Testing: > > > > Running docker tests with docker as engine. Did the same with > > podman as > > engine via -Djdk.test.docker.command=podman on Linux x86_64. Both > > passed (non-trivially). > > > > Thoughts? > > > > Thanks, > > Severin > > > > [1] https://podman.io/ > > [2] Image builds with podman look > > like ("COMMIT" over "Successfully built"): > > STEP 1: FROM fedora:29 > > STEP 2: RUN dnf install -y java-11-openjdk-devel && dnf clean > > all > > --> Using cache > > 96f8b1a0dfe7dba581a64fc67a27002ddf52e032af55f9ddc765182a690afd9d > > STEP 3: COPY TestMetrics.class TestMetrics.java /opt/ > > 269042160f7a4e6a06789cd19640ea658a8f941bc53de0fd40a574dc3bdb49a8 > > STEP 4: CMD /usr/lib/jvm/java-11-openjdk/bin/java -cp /opt --add- > > modules java.base --add-exports > > java.base/jdk.internal.platform=ALL-UNNAMED TestMetrics > > STEP 5: COMMIT fedora-metrics-11 > > d749088d6ce4510f212820ad4eca55a9b05e5c5c245f2372b6cfe91926e8cd7e > > From erik.osterlund at oracle.com Mon Jul 15 09:10:02 2019 From: erik.osterlund at oracle.com (=?UTF-8?Q?Erik_=c3=96sterlund?=) Date: Mon, 15 Jul 2019 11:10:02 +0200 Subject: RFR[13]: 8224674: NMethod state machine is not monotonic In-Reply-To: References: <625f018c-4eb1-09bb-e2b3-0a41ba65db19@oracle.com>

<4380063e-f08a-5c0d-5f90-aac4e0fdb570@oracle.com>

Message-ID: <00d16c64-dc06-f0fa-6bd3-2d3fbc3a857c@oracle.com> Hi Dean, On 2019-07-12 23:50, dean.long at oracle.com wrote: > On 7/11/19 1:13 PM, Erik ?sterlund wrote: >> Hi Dean, >> >> On 2019-07-11 15:29, dean.long at oracle.com wrote: >>> On 7/11/19 6:53 AM, Erik ?sterlund wrote: >>>> Hi Dean, >>>> >>>> On 2019-07-11 00:42, dean.long at oracle.com wrote: >>>>> On 7/10/19 1:28 AM, Erik ?sterlund wrote: >>>>>> Hi Dean, >>>>>> >>>>>> On 2019-07-09 23:31, dean.long at oracle.com wrote: >>>>>>> On 7/1/19 6:12 AM, Erik ?sterlund wrote: >>>>>>>> For ZGC I moved OSR nmethod unlinking to before the unlinking >>>>>>>> (where unlinking code belongs), instead of after the handshake >>>>>>>> (intended for deleting things safely unlinked). >>>>>>>> Strictly speaking, moving the OSR nmethod unlinking removes the >>>>>>>> racing between make_not_entrant and make_unloaded, but I still >>>>>>>> want the monotonicity guards to make this code more robust. >>>>>>> >>>>>>> I see where you added OSR nmethod unlinking, but not where you >>>>>>> removed it, so it's not obvious it was a "move". >>>>>> >>>>>> Sorry, bad wording on my part. I added OSR nmethod unlinking >>>>>> before the global handshake is run. After the handshake, we call >>>>>> make_unloaded() on the same is_unloading() nmethods. That >>>>>> function "tries" to unlink the OSR nmethod, but will just not do >>>>>> it as it's already unlinked at that point. So in a way, I didn't >>>>>> remove the call to unlink the OSR nmethod there, it just won't do >>>>>> anything. I preferred structuring it that way instead of trying >>>>>> to optimize away the call to unlink the OSR nmethod when making >>>>>> it unloaded, but only for the concurrent case. It seemed to >>>>>> introduce more conditional magic than it was worth. >>>>>> So in practice, the unlinking of OSR nmethods has moved for >>>>>> concurrent unloading to before the handshake. >>>>>> >>>>> >>>>> OK, in that case, could you add a little information to the >>>>> "Invalidate the osr nmethod only once" comment so that in the >>>>> future someone isn't tempted to remove the code as redundant? >>>> >>>> Sure. >>>> >>> >>> I meant the one in zNMethod.cpp :-) >> >> Okay, will put another comment in there once we agree on a direction >> on the next point. >> >>> >>>>>>> Would it make sense for nmethod::unlink_from_method() to do the >>>>>>> OSR unlinking, or to assert that it has already been done? >>>>>> >>>>>> An earlier version of this patch tried to do that. It is indeed >>>>>> possible. But it requires changing lock ranks of the OSR nmethod >>>>>> lock to special - 1 and moving around a bunch of code as this >>>>>> function is also called both when making nmethods not_entrant, >>>>>> zombie, and unlinking them in that case. For the first two, we >>>>>> conditionally unlink the nmethod based on the current state >>>>>> (which is the old state), whereas when I move it, the current >>>>>> state is the new state. So I had to change things around a bit >>>>>> more to figure out the right condition when to unlink it that >>>>>> works for all 3 callers. In the end, since this is going to 13, I >>>>>> thought it's more important to minimize the risk as much as I >>>>>> can, and leave such refactorings to 14. >>>>>> >>>>> >>>>> OK. >>>>> >>>>>>> The new bailout in the middle of >>>>>>> nmethod::make_not_entrant_or_zombie() worries me a little, >>>>>>> because the code up to that point has side-effects, and we could >>>>>>> be bailing out in an unexpected state. >>>>>> >>>>>> Correct. In an earlier version of this patch, I moved the >>>>>> transition to before the side effects. But a bunch of code is >>>>>> using the current nmethod state to determine what to do, and that >>>>>> current state changed from the old to the new state. In >>>>>> particular, we conditionally patch in the jump based on the >>>>>> current (old) state, and we conditionally increment decompile >>>>>> count based on the current (old) state. So I ended up having to >>>>>> rewrite more code than I wanted to for a patch going into 13, and >>>>>> convince myself that I had not implicitly messed something up. It >>>>>> felt safer to reason about the 3 side effects up until the >>>>>> transitioning point: >>>>>> >>>>>> 1) Patching in the jump into VEP. Any state more dead than the >>>>>> current transition, would still want that jump to be there. >>>>>> 2) Incrementing decompile count when making it not_entrant. Seems >>>>>> in order to do regardless, as we had an actual request to make >>>>>> the nmethod not entrant because it was bad somehow. >>>>>> 3) Marking it as seen on stack when making it not_entrant. This >>>>>> will only make can_convert_to_zombie start returning false, which >>>>>> is harmless in general. Also, as both transitions to zombie and >>>>>> not_entrant are performed under the Patching_lock, the only >>>>>> possible race is with make_unloaded. And those nmethods are >>>>>> is_unloading(), which also makes can_convert_to_zombie return >>>>>> false (in a not racy fashion). So it would essentially make no >>>>>> observable difference to any single call to can_convert_to_zombie(). >>>>>> >>>>>> In summary, #1 and #3 don't really observably change the state of >>>>>> the system, and #2 is completely harmless and probably wanted. >>>>>> Therefore I found that moving these things around and finding out >>>>>> where we use the current state(), as well as rewriting it, seemed >>>>>> like a slightly scarier change for 13 to me. >>>>>> >>>>>> So in general, there is some refactoring that could be done (and >>>>>> I have tried it) to make this nicer. But I want to minimize the >>>>>> risk for 13 as much as possible, and perform any risky >>>>>> refactorings in 14 instead. >>>>>> If your risk assessment is different and you would prefer moving >>>>>> the transition higher up (and flipping some conditions) instead, >>>>>> I am totally up for that too though, and I do see where you are >>>>>> coming from. >>>>>> >>>>> >>>>> So if we fail, it means that we lost a race to a "deader" state, >>>>> and assuming this is the only path to the deader state, wouldn't >>>>> that also mean that #1, #2, and #3 would have already been done by >>>>> the winning thread?? If so, that makes me feel better about >>>>> bailing out in the middle, but I'm still not 100% convinced, >>>>> unless we can assert that 1-3 already happened.? Do you have a >>>>> prototype of what moving the transition higher up would look like? >>>> >>>> As a matter of fact I do. Here is a webrev: >>>> http://cr.openjdk.java.net/~eosterlund/8224674/webrev.01/ >>>> >>>> I kind of like it. What do you think? >>>> >>> >>> Now the code after the transition that says "Must happen before >>> state change" worries me. >> >> Yes indeed. This is why I was hesitant to move the transition up. It >> moves past 3 things that implicitly depends on the current state. >> This one is extra scary. It actually introduces a race condition that >> could crash the VM (because can_convert_to_zombie() may observe an >> nmethod that just turned not_entrant, without being marked on stack). >> >> I think this shows (IMO) that trying to move the transition up has 3 >> problems, and this one is particularly hard to dodge. I think it >> really has to be before the transition. >> >> Would you agree now that keeping the transition where it was is less >> risky (as I did originally) > > Yes. > >> and convincing ourselves that the 3 "side effects" are not really >> observable side effects in the system, as I reasoned about earlier? >> > > yes, but I'm hoping we can do more than just reason, like adding > asserts.? More below... > >> If not, I can try to move the mark-on-stack up above the transition. >> >>> Can you remind me again what kind of race can make the state >>> transition fail here?? Did you happen to draw a state diagram while >>> learning this code? :-) >> >> Yes indeed. Would you like the long story or the short story? Here is >> the short story: the only known race is between one thread making an >> nmethod not_entrant and the GC thread making it unloaded. That >> make_not_entrant is the only transition that can fail. Previously I >> relied on there never existing any concurrent calls to >> make_not_entrant() and make_unloaded(). The OSR nmethod was caught as >> a special case (isn't it always...) where this could happen, >> violating monotonicity. But I think it feels safer to enforce the >> monotonicity of transitions in the actual code that performs the >> transitions, instead of relying on knowledge of the relationships >> between all state transitioning calls, implicitly ensuring monotonicity. >> > > Can we enforce in_use --> not_entrant --> unloaded --> zombie, and not > allow jumps or skipped states?? Then we can assert that cleanup from a > less-dead state has already been done.? So if make_not_entrant failed, > it would assert that all the cleanup that would have been done by a > successful make_not_entrant has already been done. I'm afraid not. The state machine skips states by design. For example, the set of {not_installed, in_use, not_entrant} states are alive and {unloaded, zombie} are not alive. Any nmethod in an "alive" state may transition to the unloaded state due to an oop dying. Actually strictly speaking, only {in_use, not_entrant} may become unloaded, as nmethods are made in_use within the same thread_in_vm critical section that they finalized oops in the nmethod, and hence could not yet have died. Similarly, the "unloaded" state is reserved for unloading by the GC. And not all nmethods that become zombie were unloaded by the GC. I think changing so that all these transitions are taken for all nmethods, sounds like it will break invariants and be quite dangerous. Note though that what all dead (!is_alive()) states have in common is that they can never be called or be on-stack; by the time an nmethod enters a dead state (unloaded or zombie), its inline caches and all other stale pointers to the nmethod have been cleaned out, and either a safepoint or global thread-local handshake with cross-modifying fences has finished, without finding activation records on-stack. That is the unwritten definition of being !is_alive() (e.g. unloaded or zombie). Therefore, if a transition to not_entrant fails due to entering a more dead state (unloaded or zombie), then that implies the following: 1) The jump at VEP is no longer needed because the jump is no longer reachable code, as another thread had enough knowledge to determine it was dead (all references to it have been unlinked, followed by a handshake/safepoint with cross-modifying fencing and stack scanning). So whether another transition performed this step or not is unimportant. Note that for example make_unloaded() does not patch in a jump at VEP, despite transitioning nmethods directly from in_use to unloaded, for this exact reason. By the time the nmethod is killed, that jump better be dead code already. It's only needed for the not_entrant state, where the nmethod may still alive but we want to stop calls into it. 2) The mark_as_seen_on_stack() prevents the sweeper from transitioning not_entrant() nmethods to zombie until it's no longer seen on stack, so it doesn't accidentally kill not_entrant nmethods. But if the transition failed, it's already dead, and the only path that looks at that value, is not taken (looking for not_entrant nmethods that can be made zombie). Again, it is totally fine that another thread killing the nmethod for a different reason did not perform this step. 3) The inc_decompile_count() is still valid, as the caller had a valid reason to deopt the nmethod, regardless of whether there were multiple reasons for discarding the nmethod or not. So in summary, if a make_not_entrant attempt fails due to a make_unloaded (or hypothetically make_zombie even though that race is impossible) attempt, then the presence or lack of presence of the VEP jump and the mark-on-stack value no longer matter, as they are properties that only matter to is_alive() nmethods. And inc_decompile_count is fine to do as well as there was a valid deopt reason for the make_not_entrant() caller. Would it feel better if I wrote this reasoning down in comments in make_not_entrant_or_zombie? Thanks, /Erik > dl > >> Thanks, >> /Erik >> >>> dl >>> >>>> Thanks, >>>> /Erik >>>> >>>>> dl >>>>> >>>>>> BTW, I have tested this change through hs-tier1-7, and it looks >>>>>> good. >>>>>> >>>>>> Thanks a lot Dean for reviewing this code. >>>>>> >>>>>> /Erik >>>>>> >>>>>>> dl >>>>>>> >>>>>> >>>>> >>> > From martin.doerr at sap.com Mon Jul 15 13:06:32 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 15 Jul 2019 13:06:32 +0000 Subject: dbg feature: PrintMallocStatistics still wanted? Message-ID: Hi, I recently noticed that the implementation for PrintMallocStatistics slows down the VM in fastdbg builds even if the feature is not active: https://bugs.openjdk.java.net/browse/JDK-8227597 My current proposal just improves the performance impact: http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.01/ But now, the question has come up, if PrintMallocStatistics is still needed since we have NMT. Note that PrintMallocStatistics is only available in dbg builds. Does anybody still want to use it? Would anybody vote for removing this feature? Best regards, Martin From maurizio.cimadamore at oracle.com Mon Jul 15 15:25:43 2019 From: maurizio.cimadamore at oracle.com (Maurizio Cimadamore) Date: Mon, 15 Jul 2019 16:25:43 +0100 Subject: RFR(trivial): 8227512: [TESTBUG] Fix JTReg javac test failures with Graal In-Reply-To: References: Message-ID: <808dfddf-1a5f-be51-4078-fccfac3f19f8@oracle.com> Looks good! Thanks Maurizio On 15/07/2019 02:38, Pengfei Li (Arm Technology China) wrote: > CC compiler-dev > > -- > Thanks, > Pengfei > >> Hi, >> >> Please help review this small fix. >> JBS: https://bugs.openjdk.java.net/browse/JDK-8227512 >> Webrev: http://cr.openjdk.java.net/~pli/rfr/8227512/ >> >> JTReg javac tests >> * langtools/tools/javac/modules/InheritRuntimeEnvironmentTest.java >> * langtools/tools/javac/file/LimitedImage.java >> failed when Graal is used as JVMCI compiler. >> >> These cases test javac behavior with the condition that observable modules >> are limited. But Graal is unable to be found in the limited module scope. This >> fixes these two tests by adding "jdk.internal.vm.compiler" into the limited >> modules. >> >> -- >> Thanks, >> Pengfei From coleen.phillimore at oracle.com Mon Jul 15 16:37:12 2019 From: coleen.phillimore at oracle.com (coleen.phillimore at oracle.com) Date: Mon, 15 Jul 2019 12:37:12 -0400 Subject: dbg feature: PrintMallocStatistics still wanted? In-Reply-To: References: Message-ID: I didn't realize this was still in the sources.? I think you should remove it. Coleen On 7/15/19 9:06 AM, Doerr, Martin wrote: > Hi, > > I recently noticed that the implementation for PrintMallocStatistics slows down the VM in fastdbg builds even if the feature is not active: > https://bugs.openjdk.java.net/browse/JDK-8227597 > > My current proposal just improves the performance impact: > http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocated/webrev.01/ > > But now, the question has come up, if PrintMallocStatistics is still needed since we have NMT. Note that PrintMallocStatistics is only available in dbg builds. > Does anybody still want to use it? > Would anybody vote for removing this feature? > > Best regards, > Martin > From martin.doerr at sap.com Mon Jul 15 19:48:49 2019 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 15 Jul 2019 19:48:49 +0000 Subject: dbg feature: PrintMallocStatistics still wanted? In-Reply-To: References:

Message-ID: Hi Coleen, thanks for your feedback. I've created JDK-8227692: "Remove develop feature PrintMallocStatistics" and I'll send an RFR, soon. Best regards, Martin > -----Original Message----- > From: hotspot-dev On Behalf Of > coleen.phillimore at oracle.com > Sent: Montag, 15. Juli 2019 18:37 > To: hotspot-dev at openjdk.java.net > Subject: Re: dbg feature: PrintMallocStatistics still wanted? > > > I didn't realize this was still in the sources.? I think you should > remove it. > Coleen > > On 7/15/19 9:06 AM, Doerr, Martin wrote: > > Hi, > > > > I recently noticed that the implementation for PrintMallocStatistics slows > down the VM in fastdbg builds even if the feature is not active: > > https://bugs.openjdk.java.net/browse/JDK-8227597 > > > > My current proposal just improves the performance impact: > > > http://cr.openjdk.java.net/~mdoerr/8227597_DBG_Inline_inc_bytes_allocat > ed/webrev.01/ > > > > But now, the question has come up, if PrintMallocStatistics is still needed > since we have NMT. Note that PrintMallocStatistics is only available in dbg > builds. > > Does anybody still want to use it? > > Would anybody vote for removing this feature? > > > > Best regards, > > Martin > > From kim.barrett at oracle.com Mon Jul 15 19:51:22 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 15 Jul 2019 15:51:22 -0400 Subject: 8227652: SetupOperatorNewDeleteCheck should discuss deleting destructors Message-ID: <40590A26-1A32-4B3F-B1D8-55A56090C5F4@oracle.com> Please review this explanatory comment being added to the description of the check for using global operator new/delete in Hotspot code. The described situation is somewhat obscure, and encountering it for the first time (or again after a long time, as happened to me recently) can be quite puzzling. CR: https://bugs.openjdk.java.net/browse/JDK-8227652 Webrev: http://cr.openjdk.java.net/~kbarrett/8227652/open.00/ From kim.barrett at oracle.com Tue Jul 16 01:18:43 2019 From: kim.barrett at oracle.com (Kim Barrett) Date: Mon, 15 Jul 2019 21:18:43 -0400 Subject: RFR: 8227653: Add VM Global OopStorage Message-ID: Please review this change which adds a VMGlobal OopStorage object. It is initially being used instead of the conditional JVMCI global storage object, which is being removed. Looking for reviewers from all of gc, runtime, and compiler. To keep things simple for now, this new storage object is (optionally) included in the processing done by SystemDictionary::oops_do. Most existing storage processors use that mechanism. For most processors, that's consistent with how JNI global handles are processed. ZGC uses a different approach, and provides enough infrastructure that it was easy to process this new storage object in a way that is consistent with ZGC's handling of JNI globals. This change does not attempt to address the problems around changing the set of OopStorage instance described by JDK-8227054. This change was a useful bit of preparation for the work I'm doing on JDK-8227054, so I split it out as a separate change. This change also includes a minimal update of Shenandoah, using the processing of the new storage object by SystemDictionary::oops_do. It looks like Shenandoah is conceptually similar to ZGC in it's handling of JNI globals, and should be able to handle this new storage object similarly, but I'm leaving that to the Shenandoah developers. You might want to wait for JDK-8227054 though. Note that neither ZGC nor Shenandoah processed the former conditional JVMCI global storage. CR: https://bugs.openjdk.java.net/browse/JDK-8227653 Webrev: http://cr.openjdk.java.net/~kbarrett/8227653/open.00/ Testing: mach5 tier1-5 From matthias.baesken at sap.com Tue Jul 16 07:51:42 2019 From: matthias.baesken at sap.com (Baesken, Matthias) Date: Tue, 16 Jul 2019 07:51:42 +0000 Subject: RFR: 8227631: Adjust AIX version check In-Reply-To: References: Message-ID: > would print the OS, too, as it did before. Hi Goetz I do not think that we currently run in OpenJDK on OS/400 , so printing the OS does not have much value currently ( it is always AIX) . > Didn't the warning stem from the > assert(false, name_str); I think clang does not like the usage of string literals for non-constant strings [-Wwritable-strings] . Best regards, Matthias > -----Original Message----- > From: Lindenmaier, Goetz > Sent: Dienstag, 16. Juli 2019 09:30 > To: Baesken, Matthias ; Doerr, Martin > > Subject: RE: RFR: 8227631: Adjust AIX version check > > Hi Matthias, > > limiting the version is a good idea. > > As this is only a trace and an assertion (debug build), > I don't think it is necessary to push it to 13, but > pushing it there is fine too. > > I would appreciate if the > trcVerbose("We run on %s %s", name_str, ver_str); > would print the OS, too, as it did before. > As the code for OS/400 is in there, also the tracing > should be complete. > > Didn't the warning stem from the > assert(false, name_str); > which correctly could be > assert(false, "%s", name_str); > ? (Your version for the assert is fine, too.) > > Best regards, > Goetz. > > > -----Original Message----- > > From: Langer, Christoph > > Sent: Freitag, 12. Juli 2019 14:17 > > To: Baesken, Matthias ; 'hotspot- > > dev at openjdk.java.net' ; 'ppc-aix-port- > > dev at openjdk.java.net' > > Subject: RE: RFR: 8227631: Adjust AIX version check > > > > Hi Matthias, > > > > looks good. This might even be something to push to JDK13 still (if you do it > > within the next few days). > > > > Best regards > > Christoph > > > > > > > -----Original Message----- > > > From: hotspot-dev On Behalf > Of > > > Baesken, Matthias > > > Sent: Freitag, 12. Juli 2019 13:09 > > > To: 'hotspot-dev at openjdk.java.net' ; > > > 'ppc-aix-port-dev at openjdk.java.net' dev at openjdk.java.net> > > > Subject: RFR: 8227631: Adjust AIX version check > > > > > > Hello, please review this small AIX related change . > > > > > > For some time, we do not support AIX 5.3 any more. > > > See (where AIX 7.1 or 7.2 is the supported build platform since > OpenJDK11) : > > > > > > https://wiki.openjdk.java.net/display/Build/Supported+Build+Platforms > > > > > > The currently used xlc 16.1 (XL C/C++ Compilers) even needs minimum > AIX > > > 7.1 to run , see > > > > > > http://www-01.ibm.com/support/docview.wss?uid=swg21326972 > > > > > > (and compiling for older releases on 7.1 / 7.2 would not work easily , at > least > > > not "out of the box" to my knowledge .) > > > > > > So we should adjust the minimum OS version check done in os_aix.cpp in > > > os::Aix::initialize_os_info() . > > > > > > > > > Additionally the change removes a couple of warnings [-Wwritable- > strings > > > category] . > > > > > > /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4081:22: warning: ISO C++11 > > > does not allow conversion from string literal to 'char *' [-Wwritable- > strings] > > > char *name_str = "unknown OS"; > > > ^ > > > /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4089:18: warning: ISO C++11 > > > does not allow conversion from string literal to 'char *' [-Wwritable- > strings] > > > name_str = "OS/400 (pase)"; > > > ^ > > > /nightly/jdk/src/hotspot/os/aix/os_aix.cpp:4100:18: warning: ISO C++11 > > > does not allow conversion from string literal to 'char *' [-Wwritable- > strings] > > > name_str = "AIX"; > > > > > > > > > > > > Bug/webrev : > > > > > > https://bugs.openjdk.java.net/browse/JDK-8227631 > > > > > > http://cr.openjdk.java.net/~mbaesken/webrevs/8227631.0/ > > > > > > Thanks, Matthias From thomas.schatzl at oracle.com Tue Jul 16 09:25:32 2019 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Tue, 16 Jul 2019 11:25:32 +0200 Subject: RFR: 8227653: Add VM Global OopStorage In-Reply-To: References: Message-ID: Hi, On Mon, 2019-07-15 at 21:18 -0400, Kim Barrett wrote: > Please review this change which adds a VMGlobal OopStorage > object. It is initially being used instead of the conditional JVMCI > global storage object, which is being removed. > > Looking for reviewers from all of gc, runtime, and compiler. > > To keep things simple for now, this new storage object is > (optionally) included in the processing done by > SystemDictionary::oops_do. Most existing storage processors use that > mechanism. For most processors, that's consistent with how JNI > global handles are processed. ZGC uses a different approach, and > provides enough infrastructure that it was easy to process this new > storage object in a way that is consistent with ZGC's handling of JNI > globals. > > This change does not attempt to address the problems around changing > the set of OopStorage instance described by JDK-8227054. This change > was a useful bit of preparation for the work I'm doing on JDK- > 8227054, so I split it out as a separate change. > > This change also includes a minimal update of Shenandoah, using the > processing of the new storage object by SystemDictionary::oops_do. > It looks like Shenandoah is conceptually similar to ZGC in it's > handling of JNI globals, and should be able to handle this new > storage object similarly, but I'm leaving that to the Shenandoah > developers. You might want to wait for JDK-8227054 though. > > Note that neither ZGC nor Shenandoah processed the former conditional > JVMCI global storage. > > CR: > https://bugs.openjdk.java.net/browse/JDK-8227653 > > Webrev: > http://cr.openjdk.java.net/~kbarrett/8227653/open.00/ looks good to me. Thomas From christoph.goettschkes at microdoc.com Tue Jul 16 10:05:59 2019 From: christoph.goettschkes at microdoc.com (christoph.goettschkes at microdoc.com) Date: Tue, 16 Jul 2019 12:05:59 +0200 Subject: [PATCH] Use of an unitialized register in 32-bit ARM template interpreter Message-ID: Hello, while working with the OpenJDK 11 on a 32-bit ARMv7-A platform, I noticed something weird in the template interpreter, regarding the template for the bytecode instruction ldc2_w. The type check for the operand is only done correctly if the ABI is a hard-float one. For soft-float, the check is done wrong, using an uninitialized register Rtemp. Please see the following diff: diff -r 327d5994b2fb src/hotspot/cpu/arm/templateTable_arm.cpp --- a/src/hotspot/cpu/arm/templateTable_arm.cpp Tue Mar 12 11:13:39 2019 -0400 +++ b/src/hotspot/cpu/arm/templateTable_arm.cpp Tue Jul 16 11:22:14 2019 +0200 @@ -515,36 +515,37 @@ void TemplateTable::ldc2_w() { transition(vtos, vtos); const Register Rtags = R2_tmp; const Register Rindex = R3_tmp; const Register Rcpool = R4_tmp; const Register Rbase = R5_tmp; __ get_unsigned_2_byte_index_at_bcp(Rindex, 1); __ get_cpool_and_tags(Rcpool, Rtags); const int base_offset = ConstantPool::header_size() * wordSize; const int tags_offset = Array::base_offset_in_bytes(); __ add(Rbase, Rcpool, AsmOperand(Rindex, lsl, LogBytesPerWord)); + // get type from tags + __ add(Rtemp, Rtags, tags_offset); + __ ldrb(Rtemp, Address(Rtemp, Rindex)); + Label Condy, exit; #ifdef __ABI_HARD__ Label Long; - // get type from tags - __ add(Rtemp, Rtags, tags_offset); - __ ldrb(Rtemp, Address(Rtemp, Rindex)); __ cmp(Rtemp, JVM_CONSTANT_Double); __ b(Long, ne); __ ldr_double(D0_tos, Address(Rbase, base_offset)); __ push(dtos); __ b(exit); __ bind(Long); #endif __ cmp(Rtemp, JVM_CONSTANT_Long); __ b(Condy, ne); #ifdef AARCH64 __ ldr(R0_tos, Address(Rbase, base_offset)); #else __ ldr(R0_tos_lo, Address(Rbase, base_offset + 0 * wordSize)); If the check for the type of the operand is done correctly, the call to InterpreterRuntime::resolve_ldc should never happen. Currently, for 32-bit soft-float arm, InterpreterRuntime::resolve_ldc is called if the operand for ldc2_w is of type long. Also, I find it weird that the "condy_helper" code is genarted for the ldc2_w bytecode instruction on 32-bit hard-float arm (and also on x86). Aren't the only two valid types for ldc2_w long and double? -- Christoph From sgehwolf at redhat.com Tue Jul 16 12:36:05 2019 From: sgehwolf at redhat.com (Severin Gehwolf) Date: Tue, 16 Jul 2019 14:36:05 +0200 Subject: RFR: 8227642: [TESTBUG] Make docker tests podman compatible In-Reply-To: <5bc3ac00-6ac9-99aa-052d-0a4aa6b04f8f@oracle.com> References: <32c8a1934bf07e4c9c6a961e60dcb7abd9931fe1.camel@redhat.com> <5bc3ac00-6ac9-99aa-052d-0a4aa6b04f8f@oracle.com> Message-ID: