From kim.barrett at oracle.com Sat Oct 1 00:08:21 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 30 Sep 2016 20:08:21 -0400
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup
In-Reply-To:
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
Message-ID:

> On Sep 30, 2016, at 9:55 AM, Alan Burlison wrote:
>
> On 30/09/2016 01:03, Kim Barrett wrote:
>
>> The old code used literal integer sizes and indices. The new code
>> assumes avn > AV_HW1_IDX. I don't see anything that guarantees that
>> to be true (other than, perhaps, the source code for getisax). If
>> the array was allocated with a size of the larger of AV_HW1_IDX+1 and
>> AV_HW2_IDX+1 then we'd be guaranteed safe to access
>> av[AV_HW{1,2}_IDX].
>
> They are never going to change value, but if you'd prefer they weren't used let me know and I'll revert to the integer constants.

I like the use of symbolic constants. It's the unnecessary assumptions about their values that I dislike.

I'd prefer something like

  const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
  uint_t* av = alloca(av_size);
  getisax(av, av_size);

So av is known to be big enough to access the desired elements.
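
The sizing idea in the message above can be sketched as follows. This is only an illustration under stated assumptions: kAvHw1Idx and kAvHw2Idx are stand-ins for the Solaris AV_HW1_IDX and AV_HW2_IDX constants, and fake_getisax is a hypothetical stub rather than the real getisax(2). One detail worth keeping in mind when reading the alloca() variant: alloca() takes a byte count, so an alloca-based version would need av_size * sizeof(uint_t) bytes.

```cpp
#include <cassert>

// Stand-ins for the Solaris index constants (assumption for this sketch).
constexpr unsigned kAvHw1Idx = 0;
constexpr unsigned kAvHw2Idx = 1;

// Number of words needed so that both indices are valid, whatever their order.
constexpr unsigned kAvSize =
    (kAvHw1Idx > kAvHw2Idx ? kAvHw1Idx : kAvHw2Idx) + 1;

// Hypothetical stub in place of getisax(2): fills at most n words and
// returns how many words it wrote.
inline unsigned fake_getisax(unsigned* array, unsigned n) {
  const unsigned words[] = {0x1u, 0x2u};
  const unsigned written = n < 2u ? n : 2u;
  for (unsigned i = 0; i < written; ++i) {
    array[i] = words[i];
  }
  return written;
}

inline unsigned read_hw2_word() {
  unsigned av[kAvSize] = {};  // sized from the indices, not from a literal
  const unsigned avn = fake_getisax(av, kAvSize);
  // Only read a word that the call reported as filled in.
  return (avn > kAvHw2Idx) ? av[kAvHw2Idx] : 0u;
}
```

With the array sized this way, both index constants are valid subscripts by construction, and the returned count is still checked before a word is read.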
From HORII at jp.ibm.com Sun Oct 2 14:46:39 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Sun, 2 Oct 2016 23:46:39 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <1475236951.6301.72.camel@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1e40040e-b494-6e1e-0 <1475236951.6301.72.camel@oracle.com>
Message-ID:

Hi, Thomas, and David,

Thank you for your comments.

> I think Hiroshi thinks that since the work stealing itself does a CAS
> with barrier after obtaining "new_obj" in the other thread, it should
> be safe (for other threads consuming an object on the task queue).

Thank you. What Thomas kindly explained is what I wanted to say about why a relaxed CAS can be used in copy_to_survivor.

> I also do not think it is safe as is - for example, at least
> PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> returned new_obj (in another log message :)) regardless of failure.
>
> That method also reads the forwardee if forwarded, and then again uses
> object information in that same log message. A quick look did not show
> other issues, but don't count this as a review.

Thank you for your comments.

As Carsten suggested, I think the size may not be needed in the log message when the CAS fails (the size will be logged by the other thread, the one whose CAS succeeded). If we stop printing the size of new_obj on that path, relaxing the CAS for forwarding pointers becomes safe, I believe.

In my understanding, PSPromotionManager::copy_and_push_safe_barrier() updates a card table for new_obj.
However, this new_obj will not be used for card tables in the same GC as a root of GC, because all of the entries in the card tables were registered as tasks before any calls to copy_and_push_safe_barrier.

I created a new webrev that reduces the print formats used when the CAS fails. Could you review it and give comments on it?
http://cr.openjdk.java.net/~horii/8154736/webrev.00/

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

Thomas Schatzl wrote on 09/30/2016 21:02:31:

> From: Thomas Schatzl
> To: David Holmes , Hiroshi H Horii/Japan/IBM at IBMJP
> Cc: hotspot-compiler-dev ,
> Tim Ellison , Michihiro Horie/Japan/
> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
>
> Date: 09/30/2016 21:04
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Hi,
>
> On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
> > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
> > >
> > > Dear David, and Dan,
> > >
> > > Thank you for your comments.
> > >
> > > >
> > > > In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266
> > > > the log line reads data from the forwardee even when the CAS
> > > > fails. I believe those reads will be unsafe without barriers
> > > > after the copy of the content of the object.
> > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288
> > > > same problem as in line 266
> > > Can we use o->size() or new_obj_size instead of new_obj->size()?
>
> They are not equivalent. Parallel GC and other collectors creatively
> reuse the "length" field of objArrays to indicate progress in the
> scanning them during GC.
>
> new_obj_size is the result of a call to o->size() (and the compiler may
> redo computations at any point), so has the same issue.
>
> > > > If you feel that the use of new_obj->size() is potentially unsafe
> > > > then the fact we return new_obj means that any use of new_obj by
> > > > the caller may also potentially be unsafe.
> > > In my understanding, while copying objects to a survivor space, if
> > > a thread creates a new_obj and sets a pointer with CAS, the other
> > > threads can touch the new_obj after the thread calls
> > > push_contents(new_obj) (Line: 239). In push_contents,
> > > OrderAccess::release_store is called before pushing the object as a
> > > task into a deque of workstealing (taskqueue.inline.hpp). If the
> > > other thread reads the task, all of copy for new_obj is safe.
> > I'm not familiar with the larger picture of the GC protocols here,
> > but just looking at this code fragment in isolation if the CAS fails
> > we read o->forwardee() to set new_obj. That in itself is fine because
> > we're reading the field that we were testing with the CAS. But we
> > could then dereference new_obj before the thread that won the CAS calls
> > push_contents; and even if it is after push_contents we have not done
> > an acquire to pair with the release-store in push_contents.
>
> I think Hiroshi thinks that since the work stealing itself does a CAS
> with barrier after obtaining "new_obj" in the other thread, it should
> be safe (for other threads consuming an object on the task queue).
>
> > So I'm really not seeing how we can use a barrier-less CAS here.
>
> I also do not think it is safe as is - for example, at least
> PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> returned new_obj (in another log message :)) regardless of failure.
>
> That method also reads the forwardee if forwarded, and then again uses
> object information in that same log message. A quick look did not show
> other issues, but don't count this as a review.
>
> Thanks,
> Thomas
>

From varming at gmail.com Mon Oct 3 03:55:25 2016
From: varming at gmail.com (Carsten Varming)
Date: Sun, 2 Oct 2016 23:55:25 -0400
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To:
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com> <201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com>
Message-ID:

Dear Hiroshi,

It looks like psPromotionManager.cpp:509 contains a logging statement that could read data from an oop forwarded by another thread.

I don't see how your new logging in PSPromotionManager::copy_and_push_safe_barrier can be safe. In the two new statements you read data from new_obj, but in both cases it is possible that another thread still hasn't written the data in new_obj (new_obj->klass() reads new_obj->_metadata).

Carsten

On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii wrote:

> Hi, Thomas, and David,
>
> Thank you for your comments.
>
> > I think Hiroshi thinks that since the work stealing itself does a CAS
> > with barrier after obtaining "new_obj" in the other thread, it should
> > be safe (for other threads consuming an object on the task queue).
>
> Thank you. What Thomas thankfully explain is that I wanted to mention why
> relaxed CAS is available for copy_to_survivor.
>
> > I also do not think it is safe as is - for example, at least
> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> > returned new_obj (in another log message :)) regardless of failure.
> >
> > That method also reads the forwardee if forwarded, and then again uses
> > object information in that same log message. A quick look did not show
> > other issues, but don't count this as a review.
> > Thank you for your comments. > > As Carsten suggested, I guess, size may not be necessary for logging when > CAS is failed (the size will be logged by the other thread that > successfully operates the CAS). By reducing printing a size of new_obj, > relaxing CAS for forwarding pointers becomes safe, I believe. > > In my understanding, PSPromotionManager::copy_and_push_safe_barrier() > updates a card table for new_obj. However, this new_obj will not be used > fro card tables in the same GC as a root of GC because all of entries in > card tables were registered as tasks before any calls of > copy_and_push_safe_barrier. > > I created a new webrev that reduces print formats when CAS is failed. > Could you review this and give comments on it? > http://cr.openjdk.java.net/~horii/8154736/webrev.00/ > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. > IBM Research - Tokyo > > > Thomas Schatzl wrote on 09/30/2016 21:02:31: > > > From: Thomas Schatzl > > To: David Holmes , Hiroshi H > Horii/Japan/IBM at IBMJP > > Cc: hotspot-compiler-dev , > > Tim Ellison , Michihiro Horie/Japan/ > > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net" > > > > Date: 09/30/2016 21:04 > > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > > copy_to_survivor for ppc64 > > > > Hi, > > > > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote: > > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote: > > > > > > > > Dear David, and Dan, > > > > > > > > Thank you for your comments. > > > > > > > > > > > > > > In > > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp: > > > > > 266 the log line reads data from the forwardee even when the CAS > > > > > fails. I believe those reads will be unsafe without barriers > > > > > after > > > > > the copy of the content of the object. 
> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28 > > > > > 8 > > > > > same problem as in line 266 > > > > Can we use o->size() or new_obj_size instead of new_obj->size()? > > > > They are not equivalent. Parallel GC and other collectors creatively > > reuse the "length" field of objArrays to indicate progress in the > > scanning them during GC. > > > > new_obj_size is the result of a call to o->size() (and the compiler may > > redo computations at any point), so has the same issue. > > > > > > > If you feel that the use of new_obj->size() is potentially unsafe > > > > > then > > > > > the fact we return new_obj means that any use of new_obj by the > > > > > caller > > > > > may also potentially be unsafe. > > > > In my understanding, while copying objects to a survivor space, if > > > > a thread creates a new_obj and sets a pointer with CAS, the other > > > > threads can touch the new_obj after the thread calls > > > > push_contents(new_obj) (Line: 239). In push_contents, > > > > OrderAccess::release_store is called before pushing the object as a > > > > task into a deque of workstealing (taskqueue.inline.hpp). If the > > > > other thread reads the task, all of copy for new_obj is safe. > > > I'm not familiar with the larger picture of the GC protocols here, > > > but just looking at this code fragment in isolation if the CAS fails > > > we read o->forwardee() to set new_obj. That in itself is fine because > > > we're reading the field that we were testing with the CAS. But we > > > could then deference new_obj before the thread that won the CAS calls > > > push_contents; and even if it is after push_contents we have not done > > > an acquire to pair with the release-store in push_contents. > > > > I think Hiroshi thinks that since the work stealing itself does a CAS > > with barrier after obtaining "new_obj" in the other thread, it should > > be safe (for other threads consuming an object on the task queue). 
> >
> > > So I'm really not seeing how we can use a barrier-less CAS here.
> >
> > I also do not think it is safe as is - for example, at least
> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> > returned new_obj (in another log message :)) regardless of failure.
> >
> > That method also reads the forwardee if forwarded, and then again uses
> > object information in that same log message. A quick look did not show
> > other issues, but don't count this as a review.
> >
> > Thanks,
> > Thomas
> >
>

From igor.ignatyev at oracle.com Mon Oct 3 09:49:34 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 3 Oct 2016 12:49:34 +0300
Subject: RFR(S): 8166804: Convert TestMetachunk_test to GTest
In-Reply-To: <738eb5a4-b049-4510-7ca1-818a3cfcb014@oracle.com>
References: <738eb5a4-b049-4510-7ca1-818a3cfcb014@oracle.com>
Message-ID: <456EDE1E-76CE-41BA-8DFB-7970CD130D2D@oracle.com>

Kirill,

looks good to me, Reviewed.

Thanks,
-- Igor

> On Sep 28, 2016, at 5:35 PM, Kirill Zhaldybin wrote:
>
> Dear all,
>
> Could you please review this fix for 8166804?
>
> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166804/webrev.00/
> CR: https://bugs.openjdk.java.net/browse/JDK-8166804
>
> Thank you.
>
> Regards, Kirill

From igor.ignatyev at oracle.com Mon Oct 3 09:51:18 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 3 Oct 2016 12:51:18 +0300
Subject: RFR(S): 8166563: Convert GuardedMemory_test to Gtest
In-Reply-To: <683982d0-7d2e-eb03-0106-a4f1e729b472@oracle.com>
References: <683982d0-7d2e-eb03-0106-a4f1e729b472@oracle.com>
Message-ID: <398FD3C9-1EF1-44E7-AE2D-2FE667BC4720@oracle.com>

Kirill,

looks good to me, Reviewed.

Thanks,
-- Igor

> On Sep 28, 2016, at 5:26 PM, Kirill Zhaldybin wrote:
>
> Dear all,
>
> Could you please review this fix for 8166563?
>
> The test was separated to a few testcases but the logic of the original test was preserved.
>
> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166563/webrev.00/
> CR: https://bugs.openjdk.java.net/browse/JDK-8166563
>
> Thank you.
>
> Regards, Kirill

From kirill.zhaldybin at oracle.com Mon Oct 3 09:53:15 2016
From: kirill.zhaldybin at oracle.com (Kirill Zhaldybin)
Date: Mon, 3 Oct 2016 12:53:15 +0300
Subject: RFR(S): 8166804: Convert TestMetachunk_test to GTest
In-Reply-To: <456EDE1E-76CE-41BA-8DFB-7970CD130D2D@oracle.com>
References: <738eb5a4-b049-4510-7ca1-818a3cfcb014@oracle.com> <456EDE1E-76CE-41BA-8DFB-7970CD130D2D@oracle.com>
Message-ID: <57F22A8B.1060307@oracle.com>

Igor,

Thank you!

Regards, Kirill

On 03.10.2016 12:49, Igor Ignatyev wrote:
> Kirill,
>
> looks good to me, Reviewed.
>
> Thanks,
> -- Igor
>> On Sep 28, 2016, at 5:35 PM, Kirill Zhaldybin wrote:
>>
>> Dear all,
>>
>> Could you please review this fix for 8166804?
>>
>> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166804/webrev.00/
>> CR: https://bugs.openjdk.java.net/browse/JDK-8166804
>>
>> Thank you.
>>
>> Regards, Kirill
>

From kirill.zhaldybin at oracle.com Mon Oct 3 09:53:40 2016
From: kirill.zhaldybin at oracle.com (Kirill Zhaldybin)
Date: Mon, 3 Oct 2016 12:53:40 +0300
Subject: RFR(S): 8166563: Convert GuardedMemory_test to Gtest
In-Reply-To: <398FD3C9-1EF1-44E7-AE2D-2FE667BC4720@oracle.com>
References: <683982d0-7d2e-eb03-0106-a4f1e729b472@oracle.com> <398FD3C9-1EF1-44E7-AE2D-2FE667BC4720@oracle.com>
Message-ID: <57F22AA4.1040503@oracle.com>

Igor,

Thank you!

Regards, Kirill

On 03.10.2016 12:51, Igor Ignatyev wrote:
> Kirill,
>
> looks good to me, Reviewed.
>
> Thanks,
> -- Igor
>
>> On Sep 28, 2016, at 5:26 PM, Kirill Zhaldybin wrote:
>>
>> Dear all,
>>
>> Could you please review this fix for 8166563?
>>
>> The test was separated to a few testcases but the logic of the original test was preserved.
>>
>> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166563/webrev.00/
>> CR: https://bugs.openjdk.java.net/browse/JDK-8166563
>>
>> Thank you.
>>
>> Regards, Kirill
>

From HORII at jp.ibm.com Mon Oct 3 14:15:10 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Mon, 3 Oct 2016 23:15:10 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To:
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com>
Message-ID:

Dear Carsten,

Thank you for your correction, and I am very sorry about my careless mistakes.

I created the webrev again.
http://cr.openjdk.java.net/~horii/8154736/webrev.01/

I believe all of the unsafe uses of new_obj that have been pointed out in this thread are fixed in this webrev.

Dear all,
May I ask you to review this webrev and share your thoughts and comments again?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

Carsten Varming wrote on 10/03/2016 12:55:25:

> From: Carsten Varming
> To: Hiroshi H Horii/Japan/IBM at IBMJP
> Cc: Thomas Schatzl , David Holmes
> , hotspot-compiler-dev dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
> , Michihiro Horie/Japan/
> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" dev at openjdk.java.net>, Tim Ellison
> Date: 10/03/2016 12:56
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> Dear Hiroshi,
>
> It looks like psPromotionManager.cpp:509 contains a logging
> statement that could read data from an oop forwarded by another thread.
>
> I don't see how your new logging
> in PSPromotionManager::copy_and_push_safe_barrier can be safe. In
> the two new statements you read data from new_obj, but in both cases
> it is possible that another thread still hasn't written the data in
> new_obj (new_obj->klass() reads new_obj->_metadata).
> > Carsten > > On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii wrote: > Hi, Thomas, and David, > > Thank you for your comments. > > > I think Hiroshi thinks that since the work stealing itself does a CAS > > with barrier after obtaining "new_obj" in the other thread, it should > > be safe (for other threads consuming an object on the task queue). > > Thank you. What Thomas thankfully explain is that I wanted to > mention why relaxed CAS is available for copy_to_survivor. > > > I also do not think it is safe as is - for example, at least > > PSPromotionManager::copy_and_push_safe_barrier() reads data from the > > returned new_obj (in another log message :)) regardless of failure. > > > > That method also reads the forwardee if forwarded, and then again uses > > object information in that same log message. A quick look did not show > > other issues, but don't count this as a review. > > Thank you for your comments. > > As Carsten suggested, I guess, size may not be necessary for logging > when CAS is failed (the size will be logged by the other thread that > successfully operates the CAS). By reducing printing a size of > new_obj, relaxing CAS for forwarding pointers becomes safe, I believe. > > In my understanding, PSPromotionManager::copy_and_push_safe_barrier > () updates a card table for new_obj. However, this new_obj will not > be used fro card tables in the same GC as a root of GC because all > of entries in card tables were registered as tasks before any calls > of copy_and_push_safe_barrier. > > I created a new webrev that reduces print formats when CAS is > failed. Could you review this and give comments on it? > http://cr.openjdk.java.net/~horii/8154736/webrev.00/ > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D. 
> IBM Research - Tokyo > > > Thomas Schatzl wrote on 09/30/2016 21:02:31: > > > From: Thomas Schatzl > > To: David Holmes , Hiroshi H Horii/Japan/IBM at IBMJP > > Cc: hotspot-compiler-dev , > > Tim Ellison , Michihiro Horie/Japan/ > > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net" > > > > Date: 09/30/2016 21:04 > > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > > copy_to_survivor for ppc64 > > > > Hi, > > > > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote: > > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote: > > > > > > > > Dear David, and Dan, > > > > > > > > Thank you for your comments. > > > > > > > > > > > > > > In > > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp: > > > > > 266 the log line reads data from the forwardee even when the CAS > > > > > fails. I believe those reads will be unsafe without barriers > > > > > after > > > > > the copy of the content of the object. > > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28 > > > > > 8 > > > > > same problem as in line 266 > > > > Can we use o->size() or new_obj_size instead of new_obj->size()? > > > > They are not equivalent. Parallel GC and other collectors creatively > > reuse the "length" field of objArrays to indicate progress in the > > scanning them during GC. > > > > new_obj_size is the result of a call to o->size() (and the compiler may > > redo computations at any point), so has the same issue. > > > > > > > If you feel that the use of new_obj->size() is potentially unsafe > > > > > then > > > > > the fact we return new_obj means that any use of new_obj by the > > > > > caller > > > > > may also potentially be unsafe. 
> > > > In my understanding, while copying objects to a survivor space, if > > > > a thread creates a new_obj and sets a pointer with CAS, the other > > > > threads can touch the new_obj after the thread calls > > > > push_contents(new_obj) (Line: 239). In push_contents, > > > > OrderAccess::release_store is called before pushing the object as a > > > > task into a deque of workstealing (taskqueue.inline.hpp). If the > > > > other thread reads the task, all of copy for new_obj is safe. > > > I'm not familiar with the larger picture of the GC protocols here, > > > but just looking at this code fragment in isolation if the CAS fails > > > we read o->forwardee() to set new_obj. That in itself is fine because > > > we're reading the field that we were testing with the CAS. But we > > > could then deference new_obj before the thread that won the CAS calls > > > push_contents; and even if it is after push_contents we have not done > > > an acquire to pair with the release-store in push_contents. > > > > I think Hiroshi thinks that since the work stealing itself does a CAS > > with barrier after obtaining "new_obj" in the other thread, it should > > be safe (for other threads consuming an object on the task queue). > > > > > So I'm really not seeing how we can use a barrier-less CAS here. > > > > I also do not think it is safe as is - for example, at least > > PSPromotionManager::copy_and_push_safe_barrier() reads data from the > > returned new_obj (in another log message :)) regardless of failure. > > > > That method also reads the forwardee if forwarded, and then again uses > > object information in that same log message. A quick look did not show > > other issues, but don't count this as a review. 
> >
> > Thanks,
> > Thomas
> >

From Alan.Burlison at oracle.com Mon Oct 3 15:04:51 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Mon, 3 Oct 2016 16:04:51 +0100
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup
In-Reply-To:
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
Message-ID:

On 01/10/2016 01:08, Kim Barrett wrote:

>> They are never going to change value, but if you'd prefer they weren't used let me know and I'll revert to the integer constants.
>
> I like the use of symbolic constants. It's the unnecessary assumptions about their values that I dislike.
>
> I'd prefer something like
>
>   const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
>   uint_t* av = alloca(av_size);
>   getisax(av, av_size);
>
> So av is known to be big enough to access the desired elements.

The values of AV_HW1_IDX and AV_HW2_IDX can't be changed without breaking binary compatibility, and we simply aren't ever going to do that. We can then assume the values of AV_HW1_IDX and AV_HW2_IDX known, namely 0 and 1 respectively, there's no point in comparing them. Plus if/when we get the 3rd capabilities word it will need changing anyway.

I've therefore changed the code to remove the alloca() call and be just:

  // Extract valid instruction set extensions.
  uint_t avs[AV_HW2_IDX + 1];
  uint_t avn = getisax(avs, sizeof(avs));

webrev updated accordingly.

--
Alan Burlison
--

From kim.barrett at oracle.com Mon Oct 3 18:24:14 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 3 Oct 2016 14:24:14 -0400
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup
In-Reply-To:
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
Message-ID: <5FE62FCC-5AA3-47AD-AEBA-C2CFF6D33613@oracle.com>

> On Oct 3, 2016, at 11:04 AM, Alan Burlison wrote:
>
> On 01/10/2016 01:08, Kim Barrett wrote:
>
>>> They are never going to change value, but if you'd prefer they weren't used let me know and I'll revert to the integer constants.
>>
>> I like the use of symbolic constants. It's the unnecessary assumptions about their values that I dislike.
>>
>> I'd prefer something like
>>
>>   const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
>>   uint_t* av = alloca(av_size);
>>   getisax(av, av_size);
>>
>> So av is known to be big enough to access the desired elements.
>
> The values of AV_HW1_IDX and AV_HW2_IDX can't be changed without breaking binary compatibility, and we simply aren't ever going to do that. We can then assume the values of AV_HW1_IDX and AV_HW2_IDX known, namely 0 and 1 respectively, there's no point in comparing them. Plus if/when we get the 3rd capabilities word it will need changing anyway.
>
> I've therefore changed the code to remove the alloca() call and be just:
>
>   // Extract valid instruction set extensions.
>   uint_t avs[AV_HW2_IDX + 1];
>   uint_t avn = getisax(avs, sizeof(avs));
>
> webrev updated accordingly.
>
> --
> Alan Burlison
> --

Looks good.
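
A small sketch of the fixed-size array variant discussed above, with one point made explicit: as far as I can tell from the getisax(2) man page, the second argument is a count of array elements rather than a byte count, so the distinction between sizeof(avs) and the number of elements may matter. Everything here is illustrative: kHw2Idx stands in for AV_HW2_IDX, and stub_getisax and element_count are hypothetical helpers, not Solaris or HotSpot APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for AV_HW2_IDX; the thread gives its value as 1 (assumption here).
constexpr std::size_t kHw2Idx = 1;

// Number of elements in a built-in array, deduced at compile time.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) {
  return N;
}

// Hypothetical stub in place of getisax(2): writes min(n, 2) words and
// returns that count.
inline std::uint32_t stub_getisax(std::uint32_t* array, std::uint32_t n) {
  const std::uint32_t words[] = {0x10u, 0x20u};
  const std::uint32_t count = n < 2u ? n : 2u;
  for (std::uint32_t i = 0; i < count; ++i) {
    array[i] = words[i];
  }
  return count;
}

inline std::uint32_t words_reported() {
  std::uint32_t avs[kHw2Idx + 1] = {};
  // sizeof(avs) is a byte count; the element count is kHw2Idx + 1.
  static_assert(sizeof(avs) == 2 * sizeof(std::uint32_t),
                "sizeof yields bytes, not elements");
  return stub_getisax(avs, static_cast<std::uint32_t>(element_count(avs)));
}
```

Passing the element count keeps the call within the bounds of the array even if a later release reports more capability words than the array holds.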
From david.holmes at oracle.com Tue Oct 4 01:45:03 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 4 Oct 2016 11:45:03 +1000
Subject: (M) RFR: 8081800: AbstractMethodError when evaluating a private method in an interface via debugger
In-Reply-To:
References:
Message-ID:

Hi Coleen,

Thanks for the review.

On 1/10/2016 6:55 AM, Coleen Phillimore wrote:
> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/src/share/vm/oops/klassVtable.cpp.udiff.html
>
>
> + assert(!mh()->is_private(), "private interface method in the default
> method list");
>
>
> Nit, don't need mh() parentheses. methodHandle has an operator ->

Fixed.

> + // private methods in classes always have a new entry in the vtable.
> + // Specification interpretation since classic has private methods not
> overriding.
>
> What does this mean exactly? Does it mean that we add private methods
> to the vtable but we don't have to because they do not override other
> private methods? Why is this compatible with classic? I know this is
> something pre-existing but could you clarify the comment since you
> touched it?

I only "touched it" by deleting the irrelevant:

// JDK8 adds private methods in interfaces which require invokespecial

because we've already bailed out if dealing with a non-abstract interface method. I do not know what the remaining existing comments refer to exactly so I can not clarify them. I have no knowledge of "classic" at all.

> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/src/share/vm/oops/method.cpp.udiff.html
>
> + // private methods in classes get vtable entries for backward class
> compatibility.
>
> This is a bit more clear, and it's not important why. Is this something
> that can be cleaned up in future release? If so, it would be good to
> have the explanation in an RFE.

I only added the "in classes" because this doesn't apply to interface methods.
The whole treatment of private methods is subject to re-examination in the future as private methods should be treated as effectively final - I thought Karen had filed an RFE for that but I can't find it. :( I suspect the backward compatibility rationale is extremely old and probably no longer an issue - private methods should not need vtable entries.

> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/test/runtime/RedefineTests/RedefineInterfaceMethods.java.html
>
> Thank you for adding this test.

Thanks for pointing me to these tests as it made it trivially easy to write the new one!

> Your other test looks good.
>
> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot-rename/
>
> Renaming also looks good.

Thanks! I don't think there is anything preventing this from pushing now so will go ahead and do that.

David
-----

> Thanks,
> Coleen
>
> On 9/28/16 7:50 AM, David Holmes wrote:
>> Warning: long discussion, but in the end relatively simple code
>> change. :)
>>
>> Thanks to Karen for explaining vtables and itables and pointing out
>> various tests to be executed; Coleen for the discussions around
>> interface initialization and terminology, and pointing me to simple
>> redefinition tests; and Stas Lukyanov for indicating the right JCK
>> tests to run.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8081800
>>
>> Background:
>>
>> In JDK 8 default and static interface methods were added to the Java
>> language. Private interface methods were also considered, and support
>> in the VM was added, but were dropped due to schedule pressure. In
>> Java 9 private interface methods have now been enabled at the
>> language-level and because the VM already supported invokespecial for
>> private interface methods, the direct language use, and core
>> reflection use, of these methods works fine.
>> However, what was overlooked (and which the test case in this bug
>> report highlighted) was that the other interfaces to the VM (JNI,
>> JDWP, JDI, JVM TI) had not been updated to account for private
>> interface methods, and such usage did not work.
>>
>> The updates to the specifications, plus some small JDI/JDWP related
>> code changes are being handled under:
>>
>> JDK-8165827 Support private interface methods in JDI, JDWP and JDB
>> https://bugs.openjdk.java.net/browse/JDK-8165827
>>
>> This bug, although originally discovered via JDI/JDB, is being used to
>> fix the underlying mechanics in the VM used by the JNI layer - after
>> which the test in the bug report will run fine.
>>
>> Problem:
>>
>> Because private interface methods are only invocable via invokespecial
>> (the JVMS goes to great lengths to explicitly prohibit all other
>> invocation forms on them) they are in essence always statically bound
>> and don't require lookup in either itables (for invokeinterface) or
>> vtables (for general lookup). However, JNI etc, uses itables/vtables
>> to perform their invocations, and what we got was behaviour where the
>> private interface methods did have an itable entry, which made them
>> appear to be regular abstract interface methods, and so they ended up
>> with initial vtable entries that were set to throw AbstractMethodError
>> on invocation (normally those vtable entries would be replaced by the
>> concrete methods in the implementing class) - and that is what was
>> observed via JDB. It turns out that depending on whether a class
>> method with the same signature existed in a class implementing the
>> interface, that you could also get IllegalAccessError (a path that
>> actually crashes the debug VM due to an assertion failure in jni.cpp!).
>>
>> Solution:
>>
>> Private interface methods do not need, and should not have, an itable
>> entry - they are never invoked via invokeinterface.
(Thanks Karen) >> >> Private interface methods can always be statically bound - >> Method::can_be_statically_bound() should return true, and their vtable >> entry should be Method::nonvirtual_vtable_index. >> >> Private interface methods are not default methods and >> Method::is_default_method() should return false. There is a >> terminology confusion here that I address further down. >> >> See the bug report for a detailed analysis of all the places where >> changing these Method properties may have had an affect. >> >> Main webrev: >> >> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/ >> >> The main changes are in: >> - klassVtable.cpp >> - method.cpp >> >> there are minor changes to comments and assertions in other files (the >> jni.cpp change was due to the crash I encountered that I referred to >> earlier). The change in linkResolver.cpp fixes an error in the tracing >> code as the bytecode need not be "invokeinterface" and clarifies it is >> an interface method (and adds a missing colon in the message) - there >> is a corresponding tweak to the logging/ItablesTest.java test. >> >> I added new tests for JNI invocations of private, interface methods, >> and also to test JVM TI retransformation of private and default >> interface methods. >> >> --- >> >> Terminology problem: >> >> While working on this issue, and helping Coleen with: >> >> 8163969: Cyclic interface initialization causes JVM crash >> >> it became apparent that there was a terminology error in the VM code >> with respect to default methods. A "default method" is very >> specifically a public interface method, marked with the default >> keyword, which has a method body defined. A static interface method >> also has a body, but is not a default method. A private interface >> method also has a body, but is not a default method. The JVMS refers >> to non-static, non-abstract interface methods - which covers default >> methods and private interface methods. 
But the code in the VM, >> primilarly in instanceKlass.cpp and classFileParser.cpp used the term >> "default methods" to mean "non-abstract and non-static" - which is >> wrong and potentially very confusing. So a second part of this change >> is to rename "has_default_methods" (and related variables) to >> "has_nonstatic_concrete_methods". This is somewhat of a mouthful, >> though less so than has_nonstatic_nonabstract_methods. Suggestions to >> abbreviate this to has_nsna_methods, or has_nans_methods, were >> rejected during pre-review. >> >> The renaming webrev is here: >> >> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot-rename/ >> >> and is best viewed via the patch file, where the renaming is more >> obvious. In classFileParser.cpp I also simplified the check for static >> interface methods in pre-java8 classfiles. >> >> --- >> >> Testing: >> - JPRT >> - nsk.jdb/jdi/jdwp/jvmti >> - jtreg: com/sun/jdi (including InterfaceMethodsTest) >> runtime/SelectionResolution/ >> - internal: vm.defmeth >> - JCK: subset of lang and vm tests that cover default/static/private >> interface methods >> - new tests >> >> Together these tests cover interface method invocation at the language >> level, via core reflection, via MethodHandles, via JNI, via >> JDI/JDWP/JDB, and via JVM TI. >> >> Thanks, >> David > From david.holmes at oracle.com Tue Oct 4 07:32:35 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 4 Oct 2016 17:32:35 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com> <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> Message-ID: <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> On 4/10/2016 12:15 AM, Hiroshi H Horii wrote: > Dear Carsten, > > Thank you for your correction. 
> And very sorry about my easy mistakes...
> I created webrev again. http://cr.openjdk.java.net/~horii/8154736/webrev.01/
> I believe, all of the unsafe usages of new_obj, which has been pointed
> in this thread, is fixed with this webrev.

I still am uneasy about this. If it is not safe to access the fields of
new_obj in the tracing statements but we return new_obj to the caller,
then it may not be safe for the caller to access the fields of new_obj!

That aside:

src/share/vm/gc/parallel/psPromotionManager.inline.hpp

  293     if (o->is_forwarded()) {
  294       new_obj = o->forwardee();
  295       // fields in new_obj may not be synchronized.
  296       if (log_develop_is_enabled(Trace, gc, scavenge) && o->is_forwarded()) {

Why the second check of o->is_forwarded() ?

  297         log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> " PTR_FORMAT "}",
  298                                         "forwarding",

Why are you passing "forwarding" as an argument for the first %s instead
of just expressing it directly? I see this is a copy'n'paste from the
existing code - and I'm guessing at one point there was a conditional
around that. I think it should be fixed.

Thanks,
David

> Dear all,
>
> Can I ask a review of this webrev and give thoughts and comments again?
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> Carsten Varming wrote on 10/03/2016 12:55:25:
>
>> From: Carsten Varming
>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>> Date: 10/03/2016 12:56
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> Dear Hiroshi,
>>
>> It looks like psPromotionManager.cpp:509 contains a logging
>> statement that could read data from an oop forwarded by another thread.
>>
>> I don't see how your new logging
>> in PSPromotionManager::copy_and_push_safe_barrier can be safe. In
>> the two new statements you read data from new_obj, but in both cases
>> it is possible that another thread still hasn't written the data in
>> new_obj (new_obj->klass() reads new_obj->_metadata).
>>
>> Carsten
>>
>> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii wrote:
>> Hi, Thomas, and David,
>>
>> Thank you for your comments.
>>
>> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> > with barrier after obtaining "new_obj" in the other thread, it should
>> > be safe (for other threads consuming an object on the task queue).
>>
>> Thank you. What Thomas thankfully explain is that I wanted to
>> mention why relaxed CAS is available for copy_to_survivor.
>>
>> > I also do not think it is safe as is - for example, at least
>> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> > returned new_obj (in another log message :)) regardless of failure.
>> >
>> > That method also reads the forwardee if forwarded, and then again uses
>> > object information in that same log message. A quick look did not show
>> > other issues, but don't count this as a review.
>>
>> Thank you for your comments.
>>
>> As Carsten suggested, I guess, size may not be necessary for logging
>> when CAS is failed (the size will be logged by the other thread that
>> successfully operates the CAS). By reducing printing a size of
>> new_obj, relaxing CAS for forwarding pointers becomes safe, I believe.
>>
>> In my understanding, PSPromotionManager::copy_and_push_safe_barrier()
>> updates a card table for new_obj. However, this new_obj will not
>> be used for card tables in the same GC as a root of GC because all
>> of entries in card tables were registered as tasks before any calls
>> of copy_and_push_safe_barrier.
>>
>> I created a new webrev that reduces print formats when CAS is
>> failed. Could you review this and give comments on it?
>> http://cr.openjdk.java.net/~horii/8154736/webrev.00/
>>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>
>>
>> Thomas Schatzl wrote on 09/30/2016 21:02:31:
>>
>> > From: Thomas Schatzl
>> > To: David Holmes, Hiroshi H Horii/Japan/IBM at IBMJP
>> > Date: 09/30/2016 21:04
>> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> > copy_to_survivor for ppc64
>> >
>> > Hi,
>> >
>> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
>> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
>> > > >
>> > > > Dear David, and Dan,
>> > > >
>> > > > Thank you for your comments.
>> > > >
>> > > > >
>> > > > > In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266
>> > > > > the log line reads data from the forwardee even when the CAS
>> > > > > fails. I believe those reads will be unsafe without barriers
>> > > > > after the copy of the content of the object.
>> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288
>> > > > > same problem as in line 266
>> > > > Can we use o->size() or new_obj_size instead of new_obj->size()?
>> >
>> > They are not equivalent. Parallel GC and other collectors creatively
>> > reuse the "length" field of objArrays to indicate progress in the
>> > scanning them during GC.
>> >
>> > new_obj_size is the result of a call to o->size() (and the compiler may
>> > redo computations at any point), so has the same issue.
>> >
>> > > > > If you feel that the use of new_obj->size() is potentially unsafe
>> > > > > then the fact we return new_obj means that any use of new_obj by
>> > > > > the caller may also potentially be unsafe.
>> > > > In my understanding, while copying objects to a survivor space, if
>> > > > a thread creates a new_obj and sets a pointer with CAS, the other
>> > > > threads can touch the new_obj after the thread calls
>> > > > push_contents(new_obj) (Line: 239). In push_contents,
>> > > > OrderAccess::release_store is called before pushing the object as a
>> > > > task into a deque of workstealing (taskqueue.inline.hpp). If the
>> > > > other thread reads the task, all of copy for new_obj is safe.
>> > > I'm not familiar with the larger picture of the GC protocols here,
>> > > but just looking at this code fragment in isolation if the CAS fails
>> > > we read o->forwardee() to set new_obj. That in itself is fine because
>> > > we're reading the field that we were testing with the CAS. But we
>> > > could then dereference new_obj before the thread that won the CAS calls
>> > > push_contents; and even if it is after push_contents we have not done
>> > > an acquire to pair with the release-store in push_contents.
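
The hazard described in the quoted paragraph above can be sketched with
standard C++ atomics. This is not HotSpot code: Obj, forward_acq_rel and
forward_relaxed are hypothetical names, and std::atomic stands in for
HotSpot's own Atomic/OrderAccess layer. The sketch only illustrates the
general rule that the loser of a CAS may read the winner's data when its
failure path is an acquire that pairs with the winner's release, and may
not assume that when the CAS is relaxed.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a heap object; not HotSpot's oopDesc.
struct Obj {
  int payload = 0;                       // stands in for the copied fields
  std::atomic<Obj*> forwardee{nullptr};  // stands in for the forwarding pointer
};

// Installs `copy` as the forwardee of `o`, or returns the copy that won.
// Success is acq_rel, so the winner's earlier writes to *copy are published.
// Failure is acquire, which pairs with the winner's release, so the loser
// may read the fields of the returned object.
Obj* forward_acq_rel(Obj* o, Obj* copy) {
  assert(o != nullptr && copy != nullptr);
  Obj* expected = nullptr;
  if (o->forwardee.compare_exchange_strong(expected, copy,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire)) {
    return copy;
  }
  return expected;  // winner's copy; its fields are visible to this thread
}

// Same protocol with a relaxed CAS. The returned pointer value is correct,
// but a losing thread has no happens-before edge to the winner's writes,
// so it must not read fields through it without further synchronization.
Obj* forward_relaxed(Obj* o, Obj* copy) {
  assert(o != nullptr && copy != nullptr);
  Obj* expected = nullptr;
  if (o->forwardee.compare_exchange_strong(expected, copy,
                                           std::memory_order_relaxed,
                                           std::memory_order_relaxed)) {
    return copy;
  }
  return expected;  // fine to compare or store, not to dereference
}
```

In this model, logging only the pointer value after a failed relaxed CAS
is fine, while logging a size or class read through that pointer is the
kind of access the reviewers are questioning.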
>> >
>> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> > with barrier after obtaining "new_obj" in the other thread, it should
>> > be safe (for other threads consuming an object on the task queue).
>> >
>> > > So I'm really not seeing how we can use a barrier-less CAS here.
>> >
>> > I also do not think it is safe as is - for example, at least
>> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> > returned new_obj (in another log message :)) regardless of failure.
>> >
>> > That method also reads the forwardee if forwarded, and then again uses
>> > object information in that same log message. A quick look did not show
>> > other issues, but don't count this as a review.
>> >
>> > Thanks,
>> > Thomas
>> >

From david.holmes at oracle.com Tue Oct 4 08:15:20 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 4 Oct 2016 18:15:20 +1000
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup
In-Reply-To:
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
 <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
 <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
 <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
 <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
 <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
Message-ID: <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>

On 4/10/2016 1:04 AM, Alan Burlison wrote:
> On 01/10/2016 01:08, Kim Barrett wrote:
>
>>> They are never going to change value, but if you'd prefer they
>>> weren't used let me know and I'll revert to the integer constants.
>>
>> I like the use of symbolic constants. It's the unnecessary
>> assumptions about their values that I dislike.
>>
>> I'd prefer something like
>>
>> const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
>> uint_t* av = alloca(av_size);
>> getisax(av, av_size);
>>
>> So av is known to be big enough to access the desired elements.
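
The sizing idea in the quoted snippet can be sketched in portable C++.
getisax(2) itself exists only on Solaris, so fake_getisax below is a
hypothetical stand-in with the same shape (an array pointer plus a count
of array elements); its return value and the capability bits are made up
for illustration. The index values 0 and 1 are the ones quoted in this
thread. A fixed-size array is used rather than alloca(), since alloca()
takes a byte count and not an element count.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, matching the 0 and 1 quoted in this thread.
constexpr unsigned AV_HW1_IDX = 0;
constexpr unsigned AV_HW2_IDX = 1;

// Large enough for both indices, whichever one happens to be bigger.
constexpr unsigned AV_SIZE =
    (AV_HW1_IDX > AV_HW2_IDX ? AV_HW1_IDX : AV_HW2_IDX) + 1;

// Hypothetical stand-in for getisax(2). `n` counts array elements, not
// bytes. Returns how many elements were filled (an assumption of this
// sketch, not a statement about the real call).
unsigned fake_getisax(std::uint32_t* array, unsigned n) {
  static const std::uint32_t words[] = {0x1u, 0x2u};  // made-up bits
  const unsigned available = sizeof(words) / sizeof(words[0]);
  const unsigned count = n < available ? n : available;
  for (unsigned i = 0; i < count; ++i) {
    array[i] = words[i];
  }
  return count;
}

struct HwCaps {
  std::uint32_t hw1;
  std::uint32_t hw2;
};

HwCaps read_caps() {
  std::uint32_t av[AV_SIZE] = {0};
  // Pass the element count. sizeof(av) would be the size in bytes,
  // which is four times too large for 32-bit elements.
  const unsigned avn = fake_getisax(av, AV_SIZE);
  assert(avn <= AV_SIZE);
  HwCaps caps = {0, 0};
  if (avn > AV_HW1_IDX) caps.hw1 = av[AV_HW1_IDX];
  if (avn > AV_HW2_IDX) caps.hw2 = av[AV_HW2_IDX];
  return caps;
}
```

Guarding each access with avn > index keeps the reads valid even if the
call fills fewer elements than the array can hold.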
>
> The values of AV_HW1_IDX and AV_HW2_IDX can't be changed without
> breaking binary compatibility, and we simply aren't ever going to do
> that. We can then assume the values of AV_HW1_IDX and AV_HW2_IDX known,
> namely 0 and 1 respectively, there's no point in comparing them. Plus
> if/when we get the 3rd capabilities word it will need changing anyway.
>
> I've therefore changed the code to remove the alloca() call and be just:
>
> // Extract valid instruction set extensions.
> uint_t avs[AV_HW2_IDX + 1];
> uint_t avn = getisax(avs, sizeof(avs));
>
> webrev updated accordingly.

But it shouldn't be passing sizeof(avs), it should be passing
(AV_HW2_IDX + 1)

David
-----

From HORII at jp.ibm.com Tue Oct 4 10:22:49 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Tue, 4 Oct 2016 19:22:49 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
 <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
 <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
 <1475236951.6301.72.camel@oracle.com>
Message-ID:

Dear David,

Thank you for your comments. You are correct. In the previous webrev, a
caller (in copy_and_push_safe_barrier) may use new_obj's fields
unsafely. Very sorry.

I changed the log format in copy_and_push_safe_barrier not to use fields
of new_obj. Could you review this again?
http://cr.openjdk.java.net/~horii/8154736/webrev.02/

The callers of PSPromotionManager::copy_to_survivor_space are here.
  PSPromotionManager::copy_and_push_safe_barrier
  PSScavengeFromKlassClosure::do_oop

I confirmed any fields of new_obj is not used in the two methods in this
webrev.

In addition, I reduced passing a constant literal "forwarding" in
copy_and_push_safe_barrier and added some guards before logging in
PSPromotionManager::copy_to_survivor_space as follows.
  if (log_develop_is_enabled(Trace, gc, scavenge)) {
    log_develop_trace(gc, scavenge)(...);
  }

If copy_to_survivor_space should not return new_obj if its fields are
unsafe, I would like to change the return type of copy_to_survivor_space
to "void" (or allow copy_to_survivor_space to return NULL).

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From david.holmes at oracle.com Tue Oct 4 12:16:33 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 4 Oct 2016 22:16:33 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To:
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
 <347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
 <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
 <1475236951.6301.72.camel@oracle.com>
Message-ID: <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>

On 4/10/2016 8:22 PM, Hiroshi H Horii wrote:
> Dear David,
>
> Thank you for your comments. You are correct. In the previous webrev, a
> caller (in copy_and_push_safe_barrier) may use new_obj's fields
> unsafely. Very sorry.
>
> I changed the log format in copy_and_push_safe_barrier not to use fields
> of new_obj. Could you review this again?
> http://cr.openjdk.java.net/~horii/8154736/webrev.02/

src/share/vm/gc/parallel/psPromotionManager.inline.hpp

  274       new_obj = NULL;
  285       new_obj = NULL;

Sorry but you are losing me here. You've gone from simply removing
barriers on the cmpxchg to changing the functionality of the methods
that use the cmpxchg - instead of return the forwardee() you are now
returning NULL! ??

David
-----

> The callers of PSPromotionManager::copy_to_survivor_space are here.
> PSPromotionManager::copy_and_push_safe_barrier
> PSScavengeFromKlassClosure::do_oop
>
> I confirmed any fields of new_obj is not used in the two methods in this
> webrev.
>
> In addition, I reduced passing a constant literal "forwarding" in
> copy_and_push_safe_barrier and added some guards before logging in
> PSPromotionManager::copy_to_survivor_space as follows.
>
> if (log_develop_is_enabled(Trace, gc, scavenge)) {
>   log_develop_trace(gc, scavenge)(...);
> }
>
> If copy_to_survivor_space should not return new_obj if its fields are
> unsafe, I would like to change the return type of copy_to_survivor_space
> to "void" (or allow copy_to_survivor_space to return NULL).
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo

From martin.doerr at sap.com Tue Oct 4 13:15:46 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 4 Oct 2016 13:15:46 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
References:
Message-ID:

Hi Coleen,

thank you very much. Thomas is currently out.

The idea to use offsetof wouldn't work because it can only be computed
after the layout of the class is computed.
(_name[MONITOR_NAME_LEN] is a field of the Monitor class in the original
implementation. The MONITOR_NAME_LEN is already needed to compute the
class layout.)

Just for information: The sizes appear to be:

debug build on linux PPC64le:
  sizeof(MonitorBase):96, sizeof(Monitor):160, CACHE_LINE_PADDING:32
  (no padding used because _name would get less than 64 characters)

product build on linux PPC64le:
  sizeof(MonitorBase):56, sizeof(Monitor):128, CACHE_LINE_PADDING:72
  (the length of _name gets extended from 64 to 72)

Hence, the change is also relevant for platforms with
DEFAULT_CACHE_LINE_SIZE=128 (like PPC64).
A large amount of padding only gets inserted on s390 where we have
DEFAULT_CACHE_LINE_SIZE=256.

Maybe someone else wants to review the change. (Thomas is not an
official reviewer.)

Thanks and best regards,
Martin

-----Original Message-----
From: Coleen Phillimore [mailto:coleen.phillimore at oracle.com]
Sent: Montag, 3. Oktober 2016 23:33
To: Doerr, Martin
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Hi Martin,

This change was approved for JDK9 so I can sponsor it for you anytime.
I don't know if Thomas Steufe was a reviewer or not. I think I prefer
the way you did it to his suggestion, because I like subclasses.
I think you still need another reviewer though, then commit the change and send me the export file (so it has your comments, etc in it). thanks, Coleen On 9/30/16 11:48 AM, Doerr, Martin wrote: > Hi, > > the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing. > > My proposed change is here: > http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ > > Please review. If will also need a sponsor. > > Thanks and best regards, > Martin > From Alan.Burlison at oracle.com Tue Oct 4 14:14:08 2016 From: Alan.Burlison at oracle.com (Alan Burlison) Date: Tue, 4 Oct 2016 15:14:08 +0100 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> Message-ID: <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> On 04/10/2016 09:15, David Holmes wrote: > But it shouldn't be passing sizeof(avs), it should be passing > (AV_HW2_IDX + 1) You are right, it expects the number of elements rather than the more usual convention of passing buffer length in bytes. Sigh. I've replaced it with: uint_t avn = getisax(avs, sizeof(avs) / sizeof(avs[0])); as that will auto-adapt if the declaration of avs is ever changed. 
-- Alan Burlison -- From kim.barrett at oracle.com Tue Oct 4 16:18:09 2016 From: kim.barrett at oracle.com (Kim Barrett) Date: Tue, 4 Oct 2016 12:18:09 -0400 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> Message-ID: > On Oct 4, 2016, at 10:14 AM, Alan Burlison wrote: > > On 04/10/2016 09:15, David Holmes wrote: > >> But it shouldn't be passing sizeof(avs), it should be passing >> (AV_HW2_IDX + 1) > > You are right, it expects the number of elements rather than the more usual convention of passing buffer length in bytes. Sigh. Yikes! Sorry I missed that. > I've replaced it with: > > uint_t avn = getisax(avs, sizeof(avs) / sizeof(avs[0])); > > as that will auto-adapt if the declaration of avs is ever changed. We have a macro for that - ARRAY_SIZE(avs) It's in globalDefinitions.hpp, on the off chance that's somehow not already being included.
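[Editorial note] The point settled above is that getisax takes an element count, not a byte count, and that the array should be sized from the index constants rather than from assumptions about their values. The following is only a minimal sketch of that pattern, not the code from the webrev. The ARRAY_SIZE macro is redefined locally as a stand-in for the one in globalDefinitions.hpp, fake_getisax is a hypothetical stub replacing the Solaris call so the sketch compiles anywhere, and the values given to AV_HW1_IDX and AV_HW2_IDX are placeholders assumed for illustration.

```cpp
#include <cassert>
#include <cstddef>

typedef unsigned int uint_t;

// Local stand-in for HotSpot's ARRAY_SIZE macro (assumption: the real one
// in globalDefinitions.hpp has the same element-count semantics).
#define ARRAY_SIZE(array) (sizeof(array) / sizeof((array)[0]))

// Placeholder index values, assumed for illustration only.
const uint_t AV_HW1_IDX = 0;
const uint_t AV_HW2_IDX = 1;

// Size the array from the indices so both accesses are known to be in bounds.
const uint_t AV_SIZE = (AV_HW1_IDX > AV_HW2_IDX ? AV_HW1_IDX : AV_HW2_IDX) + 1;

// Hypothetical stub standing in for getisax(2): writes at most n elements
// and returns how many it wrote. The second argument is an element count.
static uint_t fake_getisax(uint_t* array, uint_t n) {
  const uint_t available = 2;
  const uint_t count = (n < available) ? n : available;
  for (uint_t i = 0; i < count; i++) {
    array[i] = 0x100u + i;  // dummy capability bits
  }
  return count;
}

// Queries the (fake) capability words, passing the element count rather
// than sizeof(avs), which would be a byte count and overstate the capacity.
static uint_t query_hw_words(uint_t* hw1, uint_t* hw2) {
  uint_t avs[AV_SIZE] = {0};
  const uint_t avn = fake_getisax(avs, (uint_t)ARRAY_SIZE(avs));
  assert(avn <= ARRAY_SIZE(avs));
  *hw1 = avs[AV_HW1_IDX];
  *hw2 = avs[AV_HW2_IDX];
  return avn;
}
```

Passing sizeof(avs) here would tell the callee it may write four times as many elements as the array holds (with a 4-byte uint_t), which is the bug David pointed out.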
From Alan.Burlison at oracle.com Tue Oct 4 18:37:11 2016 From: Alan.Burlison at oracle.com (Alan Burlison) Date: Tue, 4 Oct 2016 19:37:11 +0100 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> Message-ID: On 04/10/16 17:18, Kim Barrett wrote: >> You are right, it expects the number of elements rather than the more usual convention of passing buffer length in bytes. Sigh. > > Yikes! Sorry I missed that. Dunno what you are apologizing for, it was my bug ;-) >> I've replaced it with: >> >> uint_t avn = getisax(avs, sizeof(avs) / sizeof(avs[0])); >> >> as that will auto-adapt if the declaration of avs is ever changed. > > We have a macro for that - ARRAY_SIZE(avs) > > It's in globalDefinitions.hpp, on the off chance that's somehow not already being included. Cool, I'll pop that in instead - thanks! -- Alan Burlison -- From HORII at jp.ibm.com Wed Oct 5 00:36:37 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Wed, 5 Oct 2016 09:36:37 +0900 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: Dear David, Thank you for your comments.
I had thought it might be better for copy_to_survivor_space not to return the forwardee when the CAS fails, in order to prevent reading fields of the forwardee. But as you pointed out, that extends the scope of this fix. I removed the two NULL assignments from the previous webrev. http://cr.openjdk.java.net/~horii/8154736/webrev.03/ Thank you for reviewing multiple times... Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo David Holmes wrote on 10/04/2016 21:16:33: > From: David Holmes > To: Hiroshi H Horii/Japan/IBM at IBMJP > Cc: hotspot-compiler-dev , > "hotspot-gc-dev at openjdk.java.net" , > "hotspot-runtime-dev at openjdk.java.net" dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix- > port-dev at openjdk.java.net" , > Thomas Schatzl , Tim Ellison > , Carsten Varming > Date: 10/04/2016 21:17 > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > copy_to_survivor for ppc64 > > On 4/10/2016 8:22 PM, Hiroshi H Horii wrote: > > Dear David, > > > > Thank you for your comments. You are correct. In the previous webrev, a > > caller (in copy_and_push_safe_barrier) may use new_obj's fields > > unsafely. Very sorry. > > > > I changed the log format in copy_and_push_safe_barrier not to use fields > > of new_obj. Could you review this again? > > http://cr.openjdk.java.net/~horii/8154736/webrev.02/ > > src/share/vm/gc/parallel/psPromotionManager.inline.hpp > > 274 new_obj = NULL; > 285 new_obj = NULL; > > Sorry but you are losing me here. You've gone from simply removing > barriers on the cmpxchg to changing the functionality of the methods > that use the cmpxchg - instead of return the forwardee() you are now > returning NULL! ?? > > David > ----- > > > The callers of PSPromotionManager::copy_to_survivor_space are here. > > PSPromotionManager::copy_and_push_safe_barrier > > PSScavengeFromKlassClosure::do_oop > > > > I confirmed any fields of new_obj is not used in the two methods in this > > webrev.
> > > > In addition, I reduced passing a constant literal "forwarding" in > > copy_and_push_safe_barrier and added some guards before logging in > > PSPromotionManager::copy_to_survivor_space as follows. > > > > if (log_develop_is_enabled(Trace, gc, scavenge)) { > > log_develop_trace(gc, scavenge)(...); > > } > > > > If copy_to_survivor_space should not return new_obj if its fields are > > unsafe, I would like to change the return type of copy_to_survivor_space > > to "void" (or allow copy_to_survivor_space to return NULL). > > > > Regards, > > Hiroshi > > ----------------------- > > Hiroshi Horii, Ph.D. > > IBM Research - Tokyo > > > > > > David Holmes wrote on 10/04/2016 16:32:35: > > > >> From: David Holmes > >> To: Hiroshi H Horii/Japan/IBM at IBMJP, Carsten Varming > >> Cc: hotspot-compiler-dev , > >> "hotspot-gc-dev at openjdk.java.net" , > >> "hotspot-runtime-dev at openjdk.java.net" >> dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix- > >> port-dev at openjdk.java.net" , > >> Thomas Schatzl , Tim Ellison > >> > >> Date: 10/04/2016 16:33 > >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > >> copy_to_survivor for ppc64 > >> > >> On 4/10/2016 12:15 AM, Hiroshi H Horii wrote: > >> > Dear Carsten, > >> > > >> > Thank you for your correction. And very sorry about my easy mistakes... > >> > I created webrev again. > > http://cr.openjdk.java.net/~horii/8154736/webrev.01/ > >> > I believe, all of the unsafe usages of new_obj, which has been pointed > >> > in this thread, is fixed with this webrev. > >> > >> I still am uneasy about this. If it is not safe to access the fields of > >> new_obj in the tracing statements but we return new_obj to the caller, > >> then it may not be safe for the caller to access the fields of new_obj! 
> >> > >> That aside: > >> > >> src/share/vm/gc/parallel/psPromotionManager.inline.hpp > >> > >> 293 if (o->is_forwarded()) { > >> 294 new_obj = o->forwardee(); > >> 295 // fields in new_obj may not be synchronized. > >> 296 if (log_develop_is_enabled(Trace, gc, scavenge) && > >> o->is_forwarded()) { > >> > >> Why the second check of o->is_forwarded() ? > >> > >> 297 log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> " > >> PTR_FORMAT "}", > >> 298 "forwarding", > >> > >> Why are you passing "forwarding" as an argument for the first %s instead > >> of just expressing it directly? I see this is a copy'n'paste from the > >> existing code - and I'm guessing at one point there was a conditional > >> around that. I think it should be fixed. > >> > >> Thanks, > >> David > >> > >> > Dear all, > >> > > >> > Can I ask a review of this webrev and give thoughts and comments again? > >> > > >> > Regards, > >> > Hiroshi > >> > ----------------------- > >> > Hiroshi Horii, Ph.D. > >> > IBM Research - Tokyo > >> > > >> > > >> > Carsten Varming wrote on 10/03/2016 12:55:25: > >> > > >> >> From: Carsten Varming > >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP > >> >> Cc: Thomas Schatzl , David Holmes > >> >> , hotspot-compiler-dev >> >> dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" >> >> gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net" > >> >> , Michihiro Horie/Japan/ > >> >> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" >> >> dev at openjdk.java.net>, Tim Ellison > >> >> Date: 10/03/2016 12:56 > >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > >> >> copy_to_survivor for ppc64 > >> >> > >> >> Dear Hiroshi, > >> >> > >> >> It looks like psPromotionManager.cpp:509 contains a logging > >> >> statement that could read data from an oop forwarded by another thread. > >> >> > >> >> I don't see how your new logging > >> >> in PSPromotionManager::copy_and_push_safe_barrier can be safe. 
In > >> >> the two new statements you read data from new_obj, but in both cases > >> >> it is possible that another thread still haven't written the data in > >> >> new_obj (new_obj->klass() reads new_obj->_metadata). > >> >> > >> >> Carsten > >> >> > >> >> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii > > wrote: > >> >> Hi, Thomas, and David, > >> >> > >> >> Thank you for your comments. > >> >> > >> >> > I think Hiroshi thinks that since the work stealing itself does a CAS > >> >> > with barrier after obtaining "new_obj" in the other thread, it should > >> >> > be safe (for other threads consuming an object on the task queue). > >> >> > >> >> Thank you. What Thomas thankfully explain is that I wanted to > >> >> mention why relaxed CAS is available for copy_to_survivor. > >> >> > >> >> > I also do not think it is safe as is - for example, at least > >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the > >> >> > returned new_obj (in another log message :)) regardless of failure. > >> >> > > >> >> > That method also reads the forwardee if forwarded, and then again > > uses > >> >> > object information in that same log message. A quick look did not > > show > >> >> > other issues, but don't count this as a review. > >> >> > >> >> Thank you for your comments. > >> >> > >> >> As Carsten suggested, I guess, size may not be necessary for logging > >> >> when CAS is failed (the size will be logged by the other thread that > >> >> successfully operates the CAS). By reducing printing a size of > >> >> new_obj, relaxing CAS for forwarding pointers becomes safe, I believe. > >> >> > >> >> In my understanding, PSPromotionManager::copy_and_push_safe_barrier > >> >> () updates a card table for new_obj. However, this new_obj will not > >> >> be used fro card tables in the same GC as a root of GC because all > >> >> of entries in card tables were registered as tasks before any calls > >> >> of copy_and_push_safe_barrier. 
> >> >> > >> >> I created a new webrev that reduces print formats when CAS is > >> >> failed. Could you review this and give comments on it? > >> >> http://cr.openjdk.java.net/~horii/8154736/webrev.00/ > >> >> > >> >> Regards, > >> >> Hiroshi > >> >> ----------------------- > >> >> Hiroshi Horii, Ph.D. > >> >> IBM Research - Tokyo > >> >> > >> >> > >> >> Thomas Schatzl wrote on 09/30/2016 > > 21:02:31: > >> >> > >> >> > From: Thomas Schatzl > >> >> > To: David Holmes , Hiroshi H > >> > Horii/Japan/IBM at IBMJP > >> >> > Cc: hotspot-compiler-dev , > >> >> > Tim Ellison , Michihiro Horie/Japan/ > >> >> > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" >> >> > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" >> >> > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net" > >> >> > > >> >> > Date: 09/30/2016 21:04 > >> >> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and > >> >> > copy_to_survivor for ppc64 > >> >> > > >> >> > Hi, > >> >> > > >> >> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote: > >> >> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote: > >> >> > > > > >> >> > > > Dear David, and Dan, > >> >> > > > > >> >> > > > Thank you for your comments. > >> >> > > > > >> >> > > > > > >> >> > > > > In > >> >> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp: > >> >> > > > > 266 the log line reads data from the forwardee even when > > the CAS > >> >> > > > > fails. I believe those reads will be unsafe without barriers > >> >> > > > > after > >> >> > > > > the copy of the content of the object. > >> >> > > > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28 > >> >> > > > > 8 > >> >> > > > > same problem as in line 266 > >> >> > > > Can we use o->size() or new_obj_size instead of new_obj->size()? > >> >> > > >> >> > They are not equivalent. 
Parallel GC and other collectors creatively > >> >> > reuse the "length" field of objArrays to indicate progress in the > >> >> > scanning them during GC. > >> >> > > >> >> > new_obj_size is the result of a call to o->size() (and the > > compiler may > >> >> > redo computations at any point), so has the same issue. > >> >> > > >> >> > > > > If you feel that the use of new_obj->size() is potentially > > unsafe > >> >> > > > > then > >> >> > > > > the fact we return new_obj means that any use of new_obj by the > >> >> > > > > caller > >> >> > > > > may also potentially be unsafe. > >> >> > > > In my understanding, while copying objects to a survivor > > space, if > >> >> > > > a thread creates a new_obj and sets a pointer with CAS, the other > >> >> > > > threads can touch the new_obj after the thread calls > >> >> > > > push_contents(new_obj) (Line: 239). In push_contents, > >> >> > > > OrderAccess::release_store is called before pushing the > > object as a > >> >> > > > task into a deque of workstealing (taskqueue.inline.hpp). If the > >> >> > > > other thread reads the task, all of copy for new_obj is safe. > >> >> > > I'm not familiar with the larger picture of the GC protocols here, > >> >> > > but just looking at this code fragment in isolation if the CAS > > fails > >> >> > > we read o->forwardee() to set new_obj. That in itself is fine > > because > >> >> > > we're reading the field that we were testing with the CAS. But we > >> >> > > could then deference new_obj before the thread that won the CAS > > calls > >> >> > > push_contents; and even if it is after push_contents we have > > not done > >> >> > > an acquire to pair with the release-store in push_contents. > >> >> > > >> >> > I think Hiroshi thinks that since the work stealing itself does a CAS > >> >> > with barrier after obtaining "new_obj" in the other thread, it should > >> >> > be safe (for other threads consuming an object on the task queue). 
> >> >> > > >> >> > > So I'm really not seeing how we can use a barrier-less CAS here. > >> >> > > >> >> > I also do not think it is safe as is - for example, at least > >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the > >> >> > returned new_obj (in another log message :)) regardless of failure. > >> >> > > >> >> > That method also reads the forwardee if forwarded, and then again > > uses > >> >> > object information in that same log message. A quick look did not > > show > >> >> > other issues, but don't count this as a review. > >> >> > > >> >> > Thanks, > >> >> > Thomas > >> >> > > >> > > > From robbin.ehn at oracle.com Wed Oct 5 08:09:39 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 5 Oct 2016 10:09:39 +0200 Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388 Message-ID: Hi all, please review! We want to try the below work-around for this closed bug. The closed bug concerns same failure as in: https://bugs.openjdk.java.net/browse/JDK-8079441 Intermittent failures on Windows with "Unexpected exit from test [exit code: 1080890248]" (0x406d1388) EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can be problematic. In this case we do not see any direct issue but still want to change it to EXCEPTION_EXECUTE_HANDLER. Thanks! 
/Robbin diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp --- a/src/os/windows/vm/os_windows.cpp Mon Oct 03 21:48:21 2016 -0400 +++ b/src/os/windows/vm/os_windows.cpp Wed Oct 05 06:24:02 2016 +0100 @@ -786,3 +790,3 @@ RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), (const ULONG_PTR*)&info ); - } __except(EXCEPTION_CONTINUE_EXECUTION) {} + } __except(EXCEPTION_EXECUTE_HANDLER) {} } From david.holmes at oracle.com Wed Oct 5 08:16:51 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 5 Oct 2016 18:16:51 +1000 Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388 In-Reply-To: References: Message-ID: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com> Hi Robbin, This seems fine to me as it is the MSDN way of using this mechanism. https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx Thanks, David On 5/10/2016 6:09 PM, Robbin Ehn wrote: > Hi all, please review! > > We want to try the below work-around for this closed bug. > The closed bug concerns same failure as in: > > https://bugs.openjdk.java.net/browse/JDK-8079441 > Intermittent failures on Windows with "Unexpected exit from test [exit > code: 1080890248]" (0x406d1388) > > EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can > be problematic. > In this case we do not see any direct issue but still want to change it > to EXCEPTION_EXECUTE_HANDLER. > > Thanks! 
> > /Robbin > > diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp > --- a/src/os/windows/vm/os_windows.cpp Mon Oct 03 21:48:21 2016 -0400 > +++ b/src/os/windows/vm/os_windows.cpp Wed Oct 05 06:24:02 2016 +0100 > @@ -786,3 +790,3 @@ > RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), > (const ULONG_PTR*)&info ); > - } __except(EXCEPTION_CONTINUE_EXECUTION) {} > + } __except(EXCEPTION_EXECUTE_HANDLER) {} > } From robbin.ehn at oracle.com Wed Oct 5 11:04:21 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 5 Oct 2016 13:04:21 +0200 Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388 In-Reply-To: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com> References: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com> Message-ID: Thanks David! /Robbin On 10/05/2016 10:16 AM, David Holmes wrote: > Hi Robbin, > > This seems fine to me as it is the MSDN way of using this mechanism. > > https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx > > Thanks, > David > > On 5/10/2016 6:09 PM, Robbin Ehn wrote: >> Hi all, please review! >> >> We want to try the below work-around for this closed bug. >> The closed bug concerns same failure as in: >> >> https://bugs.openjdk.java.net/browse/JDK-8079441 >> Intermittent failures on Windows with "Unexpected exit from test [exit >> code: 1080890248]" (0x406d1388) >> >> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can >> be problematic. >> In this case we do not see any direct issue but still want to change it >> to EXCEPTION_EXECUTE_HANDLER. >> >> Thanks! 
>> >> /Robbin >> >> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp >> --- a/src/os/windows/vm/os_windows.cpp Mon Oct 03 21:48:21 2016 -0400 >> +++ b/src/os/windows/vm/os_windows.cpp Wed Oct 05 06:24:02 2016 +0100 >> @@ -786,3 +790,3 @@ >> RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), >> (const ULONG_PTR*)&info ); >> - } __except(EXCEPTION_CONTINUE_EXECUTION) {} >> + } __except(EXCEPTION_EXECUTE_HANDLER) {} >> } From staffan.larsen at oracle.com Wed Oct 5 11:07:57 2016 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Wed, 5 Oct 2016 13:07:57 +0200 Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388 In-Reply-To: References: Message-ID: Looks good! Thanks, /Staffan > On 5 Oct 2016, at 10:09, Robbin Ehn wrote: > > Hi all, please review! > > We want to try the below work-around for this closed bug. > The closed bug concerns same failure as in: > > https://bugs.openjdk.java.net/browse/JDK-8079441 > Intermittent failures on Windows with "Unexpected exit from test [exit code: 1080890248]" (0x406d1388) > > EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can be problematic. > In this case we do not see any direct issue but still want to change it to EXCEPTION_EXECUTE_HANDLER. > > Thanks! 
> > /Robbin > > diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp > --- a/src/os/windows/vm/os_windows.cpp Mon Oct 03 21:48:21 2016 -0400 > +++ b/src/os/windows/vm/os_windows.cpp Wed Oct 05 06:24:02 2016 +0100 > @@ -786,3 +790,3 @@ > RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), (const ULONG_PTR*)&info ); > - } __except(EXCEPTION_CONTINUE_EXECUTION) {} > + } __except(EXCEPTION_EXECUTE_HANDLER) {} > } From robbin.ehn at oracle.com Wed Oct 5 11:19:40 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 5 Oct 2016 13:19:40 +0200 Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388 In-Reply-To: References: Message-ID: <574bf750-8275-656e-7a15-fdebc93e7433@oracle.com> Thanks Staffan! /Robbin On 10/05/2016 01:07 PM, Staffan Larsen wrote: > Looks good! > > Thanks, > /Staffan > >> On 5 Oct 2016, at 10:09, Robbin Ehn wrote: >> >> Hi all, please review! >> >> We want to try the below work-around for this closed bug. >> The closed bug concerns same failure as in: >> >> https://bugs.openjdk.java.net/browse/JDK-8079441 >> Intermittent failures on Windows with "Unexpected exit from test [exit code: 1080890248]" (0x406d1388) >> >> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can be problematic. >> In this case we do not see any direct issue but still want to change it to EXCEPTION_EXECUTE_HANDLER. >> >> Thanks! 
>> >> /Robbin >> >> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp >> --- a/src/os/windows/vm/os_windows.cpp Mon Oct 03 21:48:21 2016 -0400 >> +++ b/src/os/windows/vm/os_windows.cpp Wed Oct 05 06:24:02 2016 +0100 >> @@ -786,3 +790,3 @@ >> RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), (const ULONG_PTR*)&info ); >> - } __except(EXCEPTION_CONTINUE_EXECUTION) {} >> + } __except(EXCEPTION_EXECUTE_HANDLER) {} >> } > From marcus.larsson at oracle.com Wed Oct 5 13:26:04 2016 From: marcus.larsson at oracle.com (Marcus Larsson) Date: Wed, 5 Oct 2016 15:26:04 +0200 Subject: RFR: 8166117: Add UTC timestamp decorator for UL Message-ID: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com> Hi, Please review the following patch to add a UTC timestamp decorator for UL. os::iso8601_time() has been modified to allow timestamps based on UTC. os::gmtime_pd() has been added to replace os::localtime_pd() when UTC is requested. Patch also includes a unit test for the new decoration. Webrev: http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/ Issue: https://bugs.openjdk.java.net/browse/JDK-8166117 Testing: New unit test through JPRT. Thanks, Marcus From gerard.ziemski at oracle.com Wed Oct 5 14:37:26 2016 From: gerard.ziemski at oracle.com (Gerard Ziemski) Date: Wed, 5 Oct 2016 09:37:26 -0500 Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with ExitCode 0 In-Reply-To: References: Message-ID: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com> Ping. Can I have this simple fix reviewed please? > On Sep 29, 2016, at 11:08 AM, Gerard Ziemski wrote: > > hi all, > > Please review this straightforward fix for a regression caused by JDK-8138760 > > For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue.
That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader". > > In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix. > > The new debug info is refactored into its own method "printPerformanceInfoDetails" > > We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable". > > bug: https://bugs.openjdk.java.net/browse/JDK-8166145 > webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1 > > Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all > From george.triantafillou at oracle.com Wed Oct 5 14:58:33 2016 From: george.triantafillou at oracle.com (George Triantafillou) Date: Wed, 5 Oct 2016 10:58:33 -0400 Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388 In-Reply-To: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com> References: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com> Message-ID: +1 -George On 10/5/2016 4:16 AM, David Holmes wrote: > Hi Robbin, > > This seems fine to me as it is the MSDN way of using this mechanism. > > https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx > > Thanks, > David > > On 5/10/2016 6:09 PM, Robbin Ehn wrote: >> Hi all, please review! >> >> We want to try the below work-around for this closed bug.
>> The closed bug concerns same failure as in: >> >> https://bugs.openjdk.java.net/browse/JDK-8079441 >> Intermittent failures on Windows with "Unexpected exit from test [exit >> code: 1080890248]" (0x406d1388) >> >> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can >> be problematic. >> In this case we do not see any direct issue but still want to change it >> to EXCEPTION_EXECUTE_HANDLER. >> >> Thanks! >> >> /Robbin >> >> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp >> --- a/src/os/windows/vm/os_windows.cpp Mon Oct 03 21:48:21 2016 -0400 >> +++ b/src/os/windows/vm/os_windows.cpp Wed Oct 05 06:24:02 2016 +0100 >> @@ -786,3 +790,3 @@ >> RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), >> (const ULONG_PTR*)&info ); >> - } __except(EXCEPTION_CONTINUE_EXECUTION) {} >> + } __except(EXCEPTION_EXECUTE_HANDLER) {} >> } From coleen.phillimore at oracle.com Wed Oct 5 19:12:47 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 5 Oct 2016 15:12:47 -0400 Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with ExitCode 0 In-Reply-To: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com> References: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com> Message-ID: <2e46d96b-cfd5-7beb-b92f-ea8aab0c143c@oracle.com> http://cr.openjdk.java.net/~gziemski/8166145_rev1/src/share/vm/classfile/dictionary.cpp.udiff.html I was going to suggest that you change 732 void Dictionary::print(bool details) { to pass outputStream so it can be converted to logging, but that's a bigger change than we should do right now. Can you file an RFE for 10 to convert the hashtable printing to UL? This change looks good. Coleen On 10/5/16 10:37 AM, Gerard Ziemski wrote: > Ping. Can I have this simple fix reviewed please? 
> > >> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski wrote: >> >> hi all, >> >> Please review this straightforward fix for a regression caused by JDK-8138760 >> >> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader". >> >> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix. >> >> The new debug info is refactored into its own method "printPerformanceInfoDetails" >> >> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable". >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8166145 >> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1 >> >> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all >> From robbin.ehn at oracle.com Wed Oct 5 20:28:51 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 5 Oct 2016 22:28:51 +0200 Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388 In-Reply-To: References: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com> Message-ID: <93e77f72-a35d-e30b-92f3-f6d60672aef2@oracle.com> Thanks George!
/Robbin On 10/05/2016 04:58 PM, George Triantafillou wrote: > +1 > > -George > > On 10/5/2016 4:16 AM, David Holmes wrote: >> Hi Robbin, >> >> This seems fine to me as it is the MSDN way of using this mechanism. >> >> https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx >> >> Thanks, >> David >> >> On 5/10/2016 6:09 PM, Robbin Ehn wrote: >>> Hi all, please review! >>> >>> We want to try the below work-around for this closed bug. >>> The closed bug concerns same failure as in: >>> >>> https://bugs.openjdk.java.net/browse/JDK-8079441 >>> Intermittent failures on Windows with "Unexpected exit from test [exit >>> code: 1080890248]" (0x406d1388) >>> >>> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can >>> be problematic. >>> In this case we do not see any direct issue but still want to change it >>> to EXCEPTION_EXECUTE_HANDLER. >>> >>> Thanks! >>> >>> /Robbin >>> >>> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp >>> --- a/src/os/windows/vm/os_windows.cpp Mon Oct 03 21:48:21 2016 >>> -0400 >>> +++ b/src/os/windows/vm/os_windows.cpp Wed Oct 05 06:24:02 2016 >>> +0100 >>> @@ -786,3 +790,3 @@ >>> RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), >>> (const ULONG_PTR*)&info ); >>> - } __except(EXCEPTION_CONTINUE_EXECUTION) {} >>> + } __except(EXCEPTION_EXECUTE_HANDLER) {} >>> } > From robbin.ehn at oracle.com Wed Oct 5 20:34:41 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Wed, 5 Oct 2016 22:34:41 +0200 Subject: RFR: 8166117: Add UTC timestamp decorator for UL In-Reply-To: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com> References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com> Message-ID: <50847eea-27db-136a-8192-4c524a1e894a@oracle.com> Hi Marcus, looks good! /Robbin On 10/05/2016 03:26 PM, Marcus Larsson wrote: > Hi, > > Please review the following patch to add a UTC timestamp decorator for > UL. > > os::iso8601_time() has been modified to allow timestamps based on UTC. 
> os::gmtime_pd() has been added to replace os::localtime_pd() when UTC > is requested. Patch also includes a unit test for the new decoration. > > Webrev: > http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/ > > Issue: > https://bugs.openjdk.java.net/browse/JDK-8166117 > > Testing: > New unit test through JPRT. > > Thanks, > Marcus From gerard.ziemski at oracle.com Wed Oct 5 20:45:52 2016 From: gerard.ziemski at oracle.com (Gerard Ziemski) Date: Wed, 5 Oct 2016 15:45:52 -0500 Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with ExitCode 0 In-Reply-To: <2e46d96b-cfd5-7beb-b92f-ea8aab0c143c@oracle.com> References: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com> <2e46d96b-cfd5-7beb-b92f-ea8aab0c143c@oracle.com> Message-ID: Thank you for the review! > On Oct 5, 2016, at 2:12 PM, Coleen Phillimore wrote: > > http://cr.openjdk.java.net/~gziemski/8166145_rev1/src/share/vm/classfile/dictionary.cpp.udiff.html > > I was going to suggest that you change > > 732 void Dictionary::print(bool details) { > > > to pass outputStream so it can be converted to logging, but that's a bigger change than we should do right now. Can you file an RFE for 10 to convert the hashtable printing to UL? Done, please see JDK-8167232 cheers > > This change looks good. > > Coleen > > > On 10/5/16 10:37 AM, Gerard Ziemski wrote: >> Ping. Can I have this simple fix reviewed please? >> >> >>> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski wrote: >>> >>> hi all, >>> >>> Please review this straightforward fix for a regression caused by JDK-8138760 >>> >>> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue.
That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader". >>> >>> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix. >>> >>> The new debug info is refactored into its own method "printPerformanceInfoDetails" >>> >>> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable". >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8166145 >>> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1 >>> >>> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all >>> > From rachel.protacio at oracle.com Wed Oct 5 21:45:48 2016 From: rachel.protacio at oracle.com (Rachel Protacio) Date: Wed, 5 Oct 2016 17:45:48 -0400 Subject: RFR: 8166117: Add UTC timestamp decorator for UL In-Reply-To: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com> References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com> Message-ID: <362c4c6f-a537-4256-7416-dc0c945d4fff@oracle.com> Looks good to me too! Rachel On 10/5/2016 9:26 AM, Marcus Larsson wrote: > Hi, > > Please review the following patch to add a UTC timestamp decorator for > UL. > > os::iso8601_time() has been modified to allow timestamps based on UTC. > os::gmtime_pd() has been added to replace os::localtime_pd() when UTC > is requested. Patch also includes a unit test for the new decoration.
> > Webrev: > http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/ > > Issue: > https://bugs.openjdk.java.net/browse/JDK-8166117 > > Testing: > New unit test through JPRT. > > Thanks, > Marcus From david.holmes at oracle.com Wed Oct 5 23:52:58 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 6 Oct 2016 09:52:58 +1000 Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with ExitCode 0 In-Reply-To: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com> References: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com> Message-ID: Sorry for the delay - takes a while to catch up after a long weekend :) On 6/10/2016 12:37 AM, Gerard Ziemski wrote: > Ping. Can I have this simple fix reviewed please? The changes seem fine to me too. Thanks, David > >> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski wrote: >> >> hi all, >> >> Please review this straightforward fix for a regression caused by JDK-8138760 >> >> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader". >> >> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix. >> >> The new debug info is refactored into its own method "printPerformanceInfoDetails" >> >> We also make a small change to the "verify_lookup_length"
method, which now takes the name of the table, instead of hardcoding it to "SymbolTable". >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8166145 >> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1 >> >> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all >> > From david.holmes at oracle.com Thu Oct 6 01:36:15 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 6 Oct 2016 11:36:15 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: On 5/10/2016 10:36 AM, Hiroshi H Horii wrote: > Dear David, > > Thank you for your comments. > > I used to think that it may be better that copy_to_survivor_space > doesn't return forwardee if the CAS failed, in order to prevent > reading fields in forwardee. But as you pointed out, this extends the fix beyond > this topic. > > I removed two NULL assignments from the previous webrev. > http://cr.openjdk.java.net/~horii/8154736/webrev.03/ Which simply takes us back to where we were. It may not be safe for the caller of those methods to access the fields of the returned "forwardee". Sorry but I'm not seeing anything here that justifies removing the barriers from the CAS in this code. GC lurkers feel free to jump in here - this is your code after all! ;-) David ----- > Thank you for reviewing multiple times... > > Regards, > Hiroshi > ----------------------- > Hiroshi Horii, Ph.D.
> IBM Research - Tokyo > > > David Holmes wrote on 10/04/2016 21:16:33: > >> From: David Holmes >> To: Hiroshi H Horii/Japan/IBM at IBMJP >> Cc: hotspot-compiler-dev , >> "hotspot-gc-dev at openjdk.java.net" , >> "hotspot-runtime-dev at openjdk.java.net" > dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix- >> port-dev at openjdk.java.net" , >> Thomas Schatzl , Tim Ellison >> , Carsten Varming >> Date: 10/04/2016 21:17 >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> copy_to_survivor for ppc64 >> >> On 4/10/2016 8:22 PM, Hiroshi H Horii wrote: >> > Dear David, >> > >> > Thank you for your comments. You are correct. In the previous webrev, a >> > caller (in copy_and_push_safe_barrier) may use new_obj's fields >> > unsafely. Very sorry. >> > >> > I changed the log format in copy_and_push_safe_barrier not to use fields >> > of new_obj. Could you review this again? >> > http://cr.openjdk.java.net/~horii/8154736/webrev.02/ >> >> src/share/vm/gc/parallel/psPromotionManager.inline.hpp >> >> 274 new_obj = NULL; >> 285 new_obj = NULL; >> >> Sorry but you are losing me here. You've gone from simply removing >> barriers on the cmpxchg to changing the functionality of the methods >> that use the cmpxchg - instead of return the forwardee() you are now >> returning NULL! ?? >> >> David >> ----- >> >> > The callers of PSPromotionManager::copy_to_survivor_space are here. >> > PSPromotionManager::copy_and_push_safe_barrier >> > PSScavengeFromKlassClosure::do_oop >> > >> > I confirmed any fields of new_obj is not used in the two methods in this >> > webrev. >> > >> > In addition, I reduced passing a constant literal "forwarding" in >> > copy_and_push_safe_barrier and added some guards before logging in >> > PSPromotionManager::copy_to_survivor_space as follows. 
>> > >> > if (log_develop_is_enabled(Trace, gc, scavenge)) { >> > log_develop_trace(gc, scavenge)(...); >> > } >> > >> > If copy_to_survivor_space should not return new_obj if its fields are >> > unsafe, I would like to change the return type of copy_to_survivor_space >> > to "void" (or allow copy_to_survivor_space to return NULL). >> > >> > Regards, >> > Hiroshi >> > ----------------------- >> > Hiroshi Horii, Ph.D. >> > IBM Research - Tokyo >> > >> > >> > David Holmes wrote on 10/04/2016 16:32:35: >> > >> >> From: David Holmes >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, Carsten Varming > >> >> Cc: hotspot-compiler-dev , >> >> "hotspot-gc-dev at openjdk.java.net" , >> >> "hotspot-runtime-dev at openjdk.java.net" > >> dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix- >> >> port-dev at openjdk.java.net" , >> >> Thomas Schatzl , Tim Ellison >> >> >> >> Date: 10/04/2016 16:33 >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> >> copy_to_survivor for ppc64 >> >> >> >> On 4/10/2016 12:15 AM, Hiroshi H Horii wrote: >> >> > Dear Carsten, >> >> > >> >> > Thank you for your correction. And very sorry about my easy > mistakes... >> >> > I created webrev again. >> > http://cr.openjdk.java.net/~horii/8154736/webrev.01/ >> >> > I believe, all of the unsafe usages of new_obj, which has been > pointed >> >> > in this thread, is fixed with this webrev. >> >> >> >> I still am uneasy about this. If it is not safe to access the fields of >> >> new_obj in the tracing statements but we return new_obj to the caller, >> >> then it may not be safe for the caller to access the fields of new_obj! >> >> >> >> That aside: >> >> >> >> src/share/vm/gc/parallel/psPromotionManager.inline.hpp >> >> >> >> 293 if (o->is_forwarded()) { >> >> 294 new_obj = o->forwardee(); >> >> 295 // fields in new_obj may not be synchronized. 
>> >> 296 if (log_develop_is_enabled(Trace, gc, scavenge) && >> >> o->is_forwarded()) { >> >> >> >> Why the second check of o->is_forwarded() ? >> >> >> >> 297 log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> " >> >> PTR_FORMAT "}", >> >> 298 "forwarding", >> >> >> >> Why are you passing "forwarding" as an argument for the first %s > instead >> >> of just expressing it directly? I see this is a copy'n'paste from the >> >> existing code - and I'm guessing at one point there was a conditional >> >> around that. I think it should be fixed. >> >> >> >> Thanks, >> >> David >> >> >> >> > Dear all, >> >> > >> >> > Can I ask a review of this webrev and give thoughts and comments > again? >> >> > >> >> > Regards, >> >> > Hiroshi >> >> > ----------------------- >> >> > Hiroshi Horii, Ph.D. >> >> > IBM Research - Tokyo >> >> > >> >> > >> >> > Carsten Varming wrote on 10/03/2016 12:55:25: >> >> > >> >> >> From: Carsten Varming >> >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP >> >> >> Cc: Thomas Schatzl , David Holmes >> >> >> , hotspot-compiler-dev > >> >> dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" > >> >> gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net" >> >> >> , Michihiro Horie/Japan/ >> >> >> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" > >> >> dev at openjdk.java.net>, Tim Ellison >> >> >> Date: 10/03/2016 12:56 >> >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> >> >> copy_to_survivor for ppc64 >> >> >> >> >> >> Dear Hiroshi, >> >> >> >> >> >> It looks like psPromotionManager.cpp:509 contains a logging >> >> >> statement that could read data from an oop forwarded by another > thread. >> >> >> >> >> >> I don't see how your new logging >> >> >> in PSPromotionManager::copy_and_push_safe_barrier can be safe. 
In >> >> >> the two new statements you read data from new_obj, but in both cases >> >> >> it is possible that another thread still haven't written the data in >> >> >> new_obj (new_obj->klass() reads new_obj->_metadata). >> >> >> >> >> >> Carsten >> >> >> >> >> >> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii >> > wrote: >> >> >> Hi, Thomas, and David, >> >> >> >> >> >> Thank you for your comments. >> >> >> >> >> >> > I think Hiroshi thinks that since the work stealing itself > does a CAS >> >> >> > with barrier after obtaining "new_obj" in the other thread, it > should >> >> >> > be safe (for other threads consuming an object on the task queue). >> >> >> >> >> >> Thank you. What Thomas thankfully explain is that I wanted to >> >> >> mention why relaxed CAS is available for copy_to_survivor. >> >> >> >> >> >> > I also do not think it is safe as is - for example, at least >> >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data > from the >> >> >> > returned new_obj (in another log message :)) regardless of > failure. >> >> >> > >> >> >> > That method also reads the forwardee if forwarded, and then again >> > uses >> >> >> > object information in that same log message. A quick look did not >> > show >> >> >> > other issues, but don't count this as a review. >> >> >> >> >> >> Thank you for your comments. >> >> >> >> >> >> As Carsten suggested, I guess, size may not be necessary for logging >> >> >> when CAS is failed (the size will be logged by the other thread that >> >> >> successfully operates the CAS). By reducing printing a size of >> >> >> new_obj, relaxing CAS for forwarding pointers becomes safe, I > believe. >> >> >> >> >> >> In my understanding, PSPromotionManager::copy_and_push_safe_barrier >> >> >> () updates a card table for new_obj. 
However, this new_obj will not >> >> >> be used fro card tables in the same GC as a root of GC because all >> >> >> of entries in card tables were registered as tasks before any calls >> >> >> of copy_and_push_safe_barrier. >> >> >> >> >> >> I created a new webrev that reduces print formats when CAS is >> >> >> failed. Could you review this and give comments on it? >> >> >> http://cr.openjdk.java.net/~horii/8154736/webrev.00/ >> >> >> >> >> >> Regards, >> >> >> Hiroshi >> >> >> ----------------------- >> >> >> Hiroshi Horii, Ph.D. >> >> >> IBM Research - Tokyo >> >> >> >> >> >> >> >> >> Thomas Schatzl wrote on 09/30/2016 >> > 21:02:31: >> >> >> >> >> >> > From: Thomas Schatzl >> >> >> > To: David Holmes , Hiroshi H >> >> > Horii/Japan/IBM at IBMJP >> >> >> > Cc: hotspot-compiler-dev , >> >> >> > Tim Ellison , Michihiro Horie/Japan/ >> >> >> > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" > >> >> > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" > >> >> > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net" >> >> >> > >> >> >> > Date: 09/30/2016 21:04 >> >> >> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and >> >> >> > copy_to_survivor for ppc64 >> >> >> > >> >> >> > Hi, >> >> >> > >> >> >> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote: >> >> >> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote: >> >> >> > > > >> >> >> > > > Dear David, and Dan, >> >> >> > > > >> >> >> > > > Thank you for your comments. >> >> >> > > > >> >> >> > > > > >> >> >> > > > > In >> >> >> > > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp: >> >> >> > > > > 266 the log line reads data from the forwardee even when >> > the CAS >> >> >> > > > > fails. I believe those reads will be unsafe without barriers >> >> >> > > > > after >> >> >> > > > > the copy of the content of the object. 
>> >> >> > > > > >> > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28 >> >> >> > > > > 8 >> >> >> > > > > same problem as in line 266 >> >> >> > > > Can we use o->size() or new_obj_size instead of > new_obj->size()? >> >> >> > >> >> >> > They are not equivalent. Parallel GC and other collectors > creatively >> >> >> > reuse the "length" field of objArrays to indicate progress in the >> >> >> > scanning them during GC. >> >> >> > >> >> >> > new_obj_size is the result of a call to o->size() (and the >> > compiler may >> >> >> > redo computations at any point), so has the same issue. >> >> >> > >> >> >> > > > > If you feel that the use of new_obj->size() is potentially >> > unsafe >> >> >> > > > > then >> >> >> > > > > the fact we return new_obj means that any use of new_obj > by the >> >> >> > > > > caller >> >> >> > > > > may also potentially be unsafe. >> >> >> > > > In my understanding, while copying objects to a survivor >> > space, if >> >> >> > > > a thread creates a new_obj and sets a pointer with CAS, > the other >> >> >> > > > threads can touch the new_obj after the thread calls >> >> >> > > > push_contents(new_obj) (Line: 239). In push_contents, >> >> >> > > > OrderAccess::release_store is called before pushing the >> > object as a >> >> >> > > > task into a deque of workstealing (taskqueue.inline.hpp). > If the >> >> >> > > > other thread reads the task, all of copy for new_obj is safe. >> >> >> > > I'm not familiar with the larger picture of the GC protocols > here, >> >> >> > > but just looking at this code fragment in isolation if the CAS >> > fails >> >> >> > > we read o->forwardee() to set new_obj. That in itself is fine >> > because >> >> >> > > we're reading the field that we were testing with the CAS. 
> But we >> >> >> > > could then deference new_obj before the thread that won the CAS >> > calls >> >> >> > > push_contents; and even if it is after push_contents we have >> > not done >> >> >> > > an acquire to pair with the release-store in push_contents. >> >> >> > >> >> >> > I think Hiroshi thinks that since the work stealing itself > does a CAS >> >> >> > with barrier after obtaining "new_obj" in the other thread, it > should >> >> >> > be safe (for other threads consuming an object on the task queue). >> >> >> > >> >> >> > > So I'm really not seeing how we can use a barrier-less CAS here. >> >> >> > >> >> >> > I also do not think it is safe as is - for example, at least >> >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data > from the >> >> >> > returned new_obj (in another log message :)) regardless of > failure. >> >> >> > >> >> >> > That method also reads the forwardee if forwarded, and then again >> > uses >> >> >> > object information in that same log message. A quick look did not >> > show >> >> >> > other issues, but don't count this as a review. >> >> >> > >> >> >> > Thanks, >> >> >> > Thomas >> >> >> > >> >> >> > >> > From david.holmes at oracle.com Thu Oct 6 02:30:46 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 6 Oct 2016 12:30:46 +1000 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: References: Message-ID: On 1/10/2016 1:48 AM, Doerr, Martin wrote: > Hi, > > the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing. > > My proposed change is here: > http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ Not sure I understand the existing padding code. What false sharing are we trying to avoid? 
And if the existing code assumes a cache line size of 64 and declares _name to be 64 chars, then why can't the new code declare name to be DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code is wrong (not just hard-wired). Which platforms will this cause an actual change in Monitor size other than S390? Thanks, David > Please review. If will also need a sponsor. > > Thanks and best regards, > Martin > From marcus.larsson at oracle.com Thu Oct 6 06:58:27 2016 From: marcus.larsson at oracle.com (Marcus Larsson) Date: Thu, 6 Oct 2016 08:58:27 +0200 Subject: RFR: 8166117: Add UTC timestamp decorator for UL In-Reply-To: <50847eea-27db-136a-8192-4c524a1e894a@oracle.com> References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com> <50847eea-27db-136a-8192-4c524a1e894a@oracle.com> Message-ID: Thanks Robbin! On 10/05/2016 10:34 PM, Robbin Ehn wrote: > Hi Marcus, looks good! > > /Robbin > > > On 10/05/2016 03:26 PM, Marcus Larsson wrote: >> Hi, >> >> Please review the following patch to add a UTC timestamp decorator >> for UL. >> >> os::iso8601_time() has been modified to allow timestamps based on >> UTC. os::gmtime_pd() has been added to replace os::localtime_pd() >> when UTC is requested. Patch also includes a unit test for the new >> decoration. >> >> Webrev: >> http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/ >> >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8166117 >> >> Testing: >> New unit test through JPRT. >> >> Thanks, >> Marcus > From marcus.larsson at oracle.com Thu Oct 6 06:59:18 2016 From: marcus.larsson at oracle.com (Marcus Larsson) Date: Thu, 6 Oct 2016 08:59:18 +0200 Subject: RFR: 8166117: Add UTC timestamp decorator for UL In-Reply-To: <362c4c6f-a537-4256-7416-dc0c945d4fff@oracle.com> References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com> <362c4c6f-a537-4256-7416-dc0c945d4fff@oracle.com> Message-ID: <2b070fbd-9529-a0fb-d10f-113160eb9e62@oracle.com> Thanks Rachel! 
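The patch summary quoted in this thread distinguishes timestamps built from UTC from those built from local time. A minimal sketch of that distinction follows, using only the standard C library. The helper name `iso8601_timestamp` and its `use_utc` parameter are hypothetical and are not HotSpot's `os::iso8601_time()` signature; the output format (whole seconds, `Z` suffix for UTC) is likewise an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <ctime>
#include <string>

// Hypothetical helper for illustration only; this is not HotSpot's
// os::iso8601_time(). It formats a time_t as an ISO 8601 timestamp.
// use_utc selects gmtime() (UTC) instead of localtime() (local zone).
std::string iso8601_timestamp(std::time_t t, bool use_utc) {
  // Both calls return a pointer to shared static storage, so this
  // sketch is not thread-safe.
  const std::tm* parts = use_utc ? std::gmtime(&t) : std::localtime(&t);
  if (parts == nullptr) {
    return std::string();  // conversion failed
  }
  char buf[32];
  // "Z" marks UTC; for local time, "%z" appends the numeric UTC offset.
  const char* format =
      use_utc ? "%Y-%m-%dT%H:%M:%SZ" : "%Y-%m-%dT%H:%M:%S%z";
  const std::size_t len = std::strftime(buf, sizeof(buf), format, parts);
  return std::string(buf, len);
}

// Usage sketch: the Unix epoch rendered in UTC.
void demo_epoch() {
  assert(iso8601_timestamp(0, true) == "1970-01-01T00:00:00Z");
}
```

Selecting the conversion function with a flag keeps the formatting code shared between both modes, which appears to be the shape of the change described in the summary.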
On 10/05/2016 11:45 PM, Rachel Protacio wrote: > Looks good to me too! > > Rachel > > > On 10/5/2016 9:26 AM, Marcus Larsson wrote: >> Hi, >> >> Please review the following patch to add a UTC timestamp decorator >> for UL. >> >> os::iso8601_time() has been modified to allow timestamps based on >> UTC. os::gmtime_pd() has been added to replace os::localtime_pd() >> when UTC is requested. Patch also includes a unit test for the new >> decoration. >> >> Webrev: >> http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/ >> >> Issue: >> https://bugs.openjdk.java.net/browse/JDK-8166117 >> >> Testing: >> New unit test through JPRT. >> >> Thanks, >> Marcus > From martin.doerr at sap.com Thu Oct 6 09:09:21 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 6 Oct 2016 09:09:21 +0000 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: References: Message-ID: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap> Hi David, thanks for taking a look at my proposal. Maybe "unnecessary cache line sharing of contended memory" would be more comprehensive. The purpose of the padding is to avoid the following situation: 2 Monitor instances are located behind each other and some fields end up on the same cache line. Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one. The cache line needs to get transferred between all involved processors. If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor don't interfere with those ones competing for the second one any more. The existing padding implementation is not optimal. It's a little too small on some platforms. 
On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size. I have kept the minimum of 64 because _name is not only used for padding and I guess people don't want it too short. x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical configurations. That's like PPC64 so the change also improves the padding on these platforms as well. (On x86_64 we get the same result as on PPC64: The length of _name gets extended from 64 to 72 in product build). The padding increase only gets huge on S390. Best regards, Martin -----Original Message----- From: David Holmes [mailto:david.holmes at oracle.com] Sent: Donnerstag, 6. Oktober 2016 04:31 To: Doerr, Martin ; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE On 1/10/2016 1:48 AM, Doerr, Martin wrote: > Hi, > > the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing. > > My proposed change is here: > http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ Not sure I understand the existing padding code. What false sharing are we trying to avoid? And if the existing code assumes a cache line size of 64 and declares _name to be 64 chars, then why can't the new code declare name to be DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code is wrong (not just hard-wired). Which platforms will this cause an actual change in Monitor size other than S390? Thanks, David > Please review. If will also need a sponsor. 
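As a rough illustration of the padding arithmetic discussed above, here is a minimal sketch. It assumes a hypothetical `PaddedMonitor` struct with made-up field names and an assumed cache line size of 128; it is not the actual layout of HotSpot's Monitor, and `kCacheLineSize` only stands in for DEFAULT_CACHE_LINE_SIZE. The name buffer doubles as trailing padding and keeps a minimum length of 64.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed value for illustration; HotSpot takes DEFAULT_CACHE_LINE_SIZE
// from platform-specific headers.
constexpr std::size_t kCacheLineSize = 128;
// Minimum room kept for the name, since it is not used only as padding.
constexpr std::size_t kMinNameLength = 64;

// Rounds n up to the next multiple of the cache line size.
constexpr std::size_t round_up_to_cache_line(std::size_t n) {
  return (n + kCacheLineSize - 1) / kCacheLineSize * kCacheLineSize;
}

// Smallest name length >= kMinNameLength that makes
// (other_fields_size + name length) a multiple of the cache line size.
constexpr std::size_t padded_name_length(std::size_t other_fields_size) {
  return round_up_to_cache_line(other_fields_size + kMinNameLength) -
         other_fields_size;
}

// Hypothetical stand-in for the contended fields at the start of a Monitor.
struct HotFields {
  std::intptr_t lock_word;
  void* owner;
  void* wait_set;
};

// The padding sits at the end, so the hot fields of consecutive
// instances are a whole number of cache lines apart.
struct PaddedMonitor {
  HotFields hot;
  char name[padded_name_length(sizeof(HotFields))];
};

static_assert(sizeof(PaddedMonitor) % kCacheLineSize == 0,
              "size must be a multiple of the cache line size");

// True if two byte addresses fall on the same cache line.
constexpr bool same_cache_line(std::uintptr_t a, std::uintptr_t b) {
  return a / kCacheLineSize == b / kCacheLineSize;
}
```

With this formula, if the remaining fields occupied 56 bytes (an assumption, not a figure from this thread), the name length would come out as 72 for a 128-byte line. Note that the sketch only guarantees the spacing between instances; it does not align them to a cache line boundary.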
> > Thanks and best regards, > Martin > From david.holmes at oracle.com Thu Oct 6 10:20:31 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 6 Oct 2016 20:20:31 +1000 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap> References: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap> Message-ID: <68676855-6c87-44f5-2030-be3045298f39@oracle.com> On 6/10/2016 7:09 PM, Doerr, Martin wrote: > Hi David, > > thanks for taking a look at my proposal. > > Maybe "unnecessary cache line sharing of contended memory" would be more comprehensive. > > The purpose of the padding is to avoid the following situation: > 2 Monitor instances are located behind each other and some fields end up on the same cache line. > Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one. > The cache line needs to get transferred between all involved processors. > > If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor don't interfere with those ones competing for the second one any more. But that only helps for the case where the two monitors are exactly the wrong distance apart. Two other monitors that previously did not share cache lines may now do so if you make the monitors bigger. This seems completely ad-hoc. ?? David ----- > The existing padding implementation is not optimal. It's a little too small on some platforms. On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size. I have kept the minimum of 64 because _name is not only used for padding and I guess people don't want it too short. 
> > x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical configurations. That's like PPC64 so the change also improves the padding on these platforms as well. > (On x86_64 we get the same result as on PPC64: The length of _name gets extended from 64 to 72 in product build). The padding increase only gets huge on S390. > > Best regards, > Martin > > > -----Original Message----- > From: David Holmes [mailto:david.holmes at oracle.com] > Sent: Donnerstag, 6. Oktober 2016 04:31 > To: Doerr, Martin ; hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE > > On 1/10/2016 1:48 AM, Doerr, Martin wrote: >> Hi, >> >> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing. >> >> My proposed change is here: >> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ > > Not sure I understand the existing padding code. What false sharing are > we trying to avoid? > > And if the existing code assumes a cache line size of 64 and declares > _name to be 64 chars, then why can't the new code declare name to be > DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code > is wrong (not just hard-wired). > > Which platforms will this cause an actual change in Monitor size other > than S390? > > Thanks, > David > >> Please review. If will also need a sponsor. 
>>
>> Thanks and best regards,
>> Martin
>>

From martin.doerr at sap.com Thu Oct 6 11:05:40 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 6 Oct 2016 11:05:40 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <68676855-6c87-44f5-2030-be3045298f39@oracle.com>
References: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap> <68676855-6c87-44f5-2030-be3045298f39@oracle.com>
Message-ID:

Hi David,

there are many Monitor instances behind each other so I think the idea of padding (which was not mine) was not bad in general.
The ideal situation would be to have them cache line aligned and sizeof(Monitor) equals the cache line size (or a multiple). This would completely prevent cache line sharing.

Even without having the cache line alignment, the padding does help:
Please note that the padding is inserted at the end. The critical fields are at the beginning.
Especially _LockWord of 2 Monitors will never be on the same cache line when sizeof(Monitor) equals the cache line size (or a multiple).

Padding = DEFAULT_CACHE_LINE_SIZE could prevent more sharing in case of bad alignment, but I didn't want to waste more space. I'd rather prefer the alignment solution.

Best regards,
Martin

From Alan.Burlison at oracle.com Thu Oct 6 12:10:16 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Thu, 6 Oct 2016 13:10:16 +0100
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup
In-Reply-To:
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
Message-ID: <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>

On 04/10/2016 19:37, Alan Burlison wrote:

>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>> already being included.
>
> Cool, I'll pop that in instead - thanks!

Done, webrev updated, jprt hotspot testset is clean.

--
Alan Burlison
--

From david.holmes at oracle.com Thu Oct 6 13:33:47 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 6 Oct 2016 23:33:47 +1000
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To:
References: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap> <68676855-6c87-44f5-2030-be3045298f39@oracle.com>
Message-ID:

Hi Martin,

Thanks for bearing with me here, these optimizations are not really my forte.

On 6/10/2016 9:05 PM, Doerr, Martin wrote:
> Hi David,
>
> there are many Monitor instances behind each other so I think the idea of padding (which was not mine) was not bad in general.
> The ideal situation would be to have them cache line aligned and sizeof(Monitor) equals the cache line size (or a multiple). This would completely prevent cache line sharing.

So this is for all the mutexes/monitors created in mutex_init() which are assumed to be laid out in a linear fashion. Ok.

Has anyone actually done any metrics on this or is it all theoretical? ie are any adjacent, or otherwise cache-line-aligned, monitors actually contended at the same time? Padding to avoid false-sharing always seems a very local optimization to me - more obvious with hot fields in the same object than with distinct fields in distinct objects.

> Even without having the cache line alignment, the padding does help:
> Please note that the padding is inserted at the end. The critical fields are at the beginning.
> Especially _LockWord of 2 Monitors will never be on the same cache line when sizeof(Monitor) equals the cache line size (or a multiple).

Seems to me the existing code, as it doesn't take into account the size of the rest of the Monitor, isn't really addressing this correctly at all - even on platforms with a 64-byte cache line.

> Padding = DEFAULT_CACHE_LINE_SIZE could prevent more sharing in case of bad alignment, but I didn't want to waste more space. I'd rather prefer the alignment solution.

The other option is an operator new that only allocates on the desired alignment - as we do in some other places. That also avoids wasted space with Monitors embedded in other objects - not that I think we have that many of them.

Thanks,
David

> Best regards,
> Martin
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Thursday, 6 October 2016 12:21
> To: Doerr, Martin ; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 6/10/2016 7:09 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> thanks for taking a look at my proposal.
>>
>> Maybe "unnecessary cache line sharing of contended memory" would be more comprehensive.
>>
>> The purpose of the padding is to avoid the following situation:
>> 2 Monitor instances are located behind each other and some fields end up on the same cache line.
>> Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one.
>> The cache line needs to get transferred between all involved processors.
>>
>> If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor don't interfere with those ones competing for the second one any more.
>
> But that only helps for the case where the two monitors are exactly the
> wrong distance apart. Two other monitors that previously did not share
> cache lines may now do so if you make the monitors bigger.
>
> This seems completely ad-hoc. ??
>
> David
> -----
>
>
>
>> The existing padding implementation is not optimal. It's a little too small on some platforms. On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size.
>>>
>>> Thanks and best regards,
>>> Martin
>>>

From claes.redestad at oracle.com Thu Oct 6 13:55:21 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 6 Oct 2016 15:55:21 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To:
References: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap> <68676855-6c87-44f5-2030-be3045298f39@oracle.com>
Message-ID: <57F657C9.20204@oracle.com>

Hi,

(cc:ing hotspot-gc-dev)

On 2016-10-06 15:33, David Holmes wrote:
> Hi Martin,
>
> Thanks for bearing with me here, these optimizations are not really my
> forte.
>
> On 6/10/2016 9:05 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> there are many Monitor instances behind each other so I think the idea
>> of padding (which was not mine) was not bad in general.
>> The ideal situation would be to have them cache line aligned and
>> sizeof(Monitor) equals the cache line size (or a multiple). This would
>> completely prevent cache line sharing.
>
> So this is for all the mutexes/monitors created in mutex_init() which
> are assumed to be laid out in a linear fashion. Ok.
>
> Has anyone actually done any metrics on this or is it all theoretical?
> ie are any adjacent, or otherwise cache-line-aligned, monitors actually
> contended at the same time? Padding to avoid false-sharing always seems
> a very local optimization to me - more obvious with hot fields in the
> same object than with distinct fields in distinct objects.
>
>> Even without having the cache line alignment, the padding does help:
>> Please note that the padding is inserted at the end. The critical
>> fields are at the beginning.
>> Especially _LockWord of 2 Monitors will never be on the same cache
>> line when sizeof(Monitor) equals the cache line size (or a multiple).
>
> Seems to me the existing code, as it doesn't take into account the size
> of the rest of the Monitor, isn't really addressing this correctly at
> all - even on platforms with a 64-byte cache line.
>
>> Padding = DEFAULT_CACHE_LINE_SIZE could prevent more sharing in case
>> of bad alignment, but I didn't want to waste more space. I'd rather
>> prefer the alignment solution.
>
> The other option is an operator new that only allocates on the desired
> alignment - as we do in some other places. That also avoids wasted space
> with Monitors embedded in other objects - not that I think we have that
> many of them.

IIRC GC code has a number of places where Monitors are created and embedded in other, larger objects and have reported footprint overhead issues with the current anti-sharing solution.

Additionally, if memory serves me, it appears this char[64] name field is only ever set to an actual name for the Monitors that are allocated globally in mutex_list, so...

... wouldn't a possibly better solution be to remove padding altogether from the base Monitor and wrap the mutex_list Monitors in some class that adds the name/padding?

Thanks!

/Claes

From gerard.ziemski at oracle.com Thu Oct 6 14:10:15 2016
From: gerard.ziemski at oracle.com (Gerard Ziemski)
Date: Thu, 6 Oct 2016 09:10:15 -0500
Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with ExitCode 0
In-Reply-To:
References: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>
Message-ID: <8519EAB4-FD98-49F4-A37A-B00129087542@oracle.com>

Thank you for the review!

> On Oct 5, 2016, at 6:52 PM, David Holmes wrote:
>
> Sorry for the delay - takes a while to catch up after a long weekend :)
>
> On 6/10/2016 12:37 AM, Gerard Ziemski wrote:
>> Ping. Can I have this simple fix reviewed please?
>
> The changes seem fine to me too.
>
> Thanks,
> David
>
>>
>>> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski wrote:
>>>
>>> hi all,
>>>
>>> Please review this straightforward fix for a regression caused by JDK-8138760
>>>
>>> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader".
>>>
>>> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix.
>>>
>>> The new debug info is refactored into its own method "printPerformanceInfoDetails"
>>>
>>> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable".
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8166145
>>> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1
>>>
>>> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all
>>>
>>

From martin.doerr at sap.com Thu Oct 6 16:15:29 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 6 Oct 2016 16:15:29 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To:
References: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap> <68676855-6c87-44f5-2030-be3045298f39@oracle.com>
Message-ID: <97c50a5b06cf44e5a8bb8478e3710c3f@DEWDFE13DE14.global.corp.sap>

Hi David,

we have made the change a long time ago when we were looking for concurrency issues. I don't remember if it was a fix for anything specific which we observed. I don't know if the authors of the original code had seen issues or made performance measurements.

I think the current implementation is not too bad for 64 byte cache lines because the _LockWord fields are always on different cache lines (with 64 byte _name[]).
The intention of my proposal was to improve the situation for 128 and especially 256 byte cache lines which I still think gets achieved by my webrev.

Not sure if more sophisticated solutions would be kind of overbuilt.

Thanks for your time and best regards,
Martin

From daniel.daugherty at oracle.com Thu Oct 6 21:13:01 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 6 Oct 2016 15:13:01 -0600
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To:
References:
Message-ID:

On 9/30/16 9:48 AM, Doerr, Martin wrote:
> Hi,
>
> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>
> My proposed change is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/

src/share/vm/runtime/mutex.hpp
    Please update the copyright year before pushing.

    L172:   // The default length of monitor name is chosen to avoid false sharing.
    L173:   enum {
    L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
    L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
    L176:   };
    L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex

        I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
        can vary between platforms; I like that it is a minimum of 64 bytes
        and is still a constant.

        I'm also not happy that the resulting sizeof(Monitor) may not be a
        multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
        that unhappiness with the fact that sizeof(Monitor) hasn't been a
        multiple of the cache line size since at least 2008 and no one
        complained (that I know of).

        So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
        (like it was) and add a pad field that would bring up sizeof(Monitor)
        to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
        unhappy with me and anyone embedding a Monitor into another data
        structure would be unhappy with me, but I'm used to that :-)

        So what you have is fine, especially for JDK9.

    L180:  public:
    L181:   #ifndef PRODUCT
    L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
    L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
    L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
    L185:   #endif
    L186:
    L187:   void set_owner_implementation(Thread* owner) PRODUCT_RETURN;
    L188:   void check_prelock_state (Thread* thread) PRODUCT_RETURN;
    L189:   void check_block_state (Thread* thread)

        These were all "protected" before. Now they are "public".
        Any particular reason?

Thumbs up on the mechanics of this change. I'm interested in the answer to the "protected" versus "public" question, but don't consider that query to be a blocker.

The rest of this isn't code review, but some of this caught my attention.

src/share/vm/runtime/mutex.hpp

    old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
    old L85: static const int MONITOR_NAME_LEN = 64;

        I had to look up the history of this comment:

$ hg log -r 55 src/share/vm/runtime/mutex.hpp
changeset:   55:2a8eb116ebbe
user:        xlu
date:        Tue Feb 05 23:21:57 2008 -0800
summary:     6610420: Debug VM crashes during monitor lock rank checking

$ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
--- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500
+++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800
@@ -82,6 +82,9 @@ class ParkEvent ;
 // *in that order*. If their implementations change such that these
 // assumptions are violated, a whole lot of code will break.

+// The default length of monitor name is choosen to be 64 to avoid false sharing.
+static const int MONITOR_NAME_LEN = 64;
+
 class Monitor : public CHeapObj {

  public:
@@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
   volatile intptr_t _WaitLock [1] ;   // Protects _WaitSet
   ParkEvent * volatile _WaitSet ;     // LL of ParkEvents
   volatile bool _snuck;               // Used for sneaky locking (evil).
-  const char * _name;                 // Name of mutex
   int NotifyCount ;                   // diagnostic assist
-  double pad [8] ;                    // avoid false sharing
+  char _name[MONITOR_NAME_LEN];       // Name of mutex

   // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
 #ifndef PRODUCT
@@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
   int ILocked () ;

  protected:
-  static void ClearMonitor (Monitor * m) ;
+  static void ClearMonitor (Monitor * m, const char* name = NULL) ;
   Monitor() ;

So the original code had an 8-double pad for avoiding false sharing. Sounds very much like the old ObjectMonitor padding. I'm sure at the time that Dice determined that 8-double value, the result was to pad the size of Monitor to an even multiple of a particular cache line size.

Xiobin changed the 'name' field to be an array so that the name chars could serve double duty as the cache line pad... pun intended.
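
For illustration, the size arithmetic behind the name-as-padding scheme can be sketched as below. This is a hypothetical stand-alone snippet, not HotSpot code: name_len, padded_size and kAssumedBase are made-up names, and the 56-byte base size is only an assumption, inferred from the figure quoted in this thread that _name grows from 64 to 72 with a 128-byte line in a product build.

```cpp
// Hypothetical sketch, not HotSpot code: the helper names and the 56-byte
// base size are assumptions made for illustration only.

// Mirrors the rule in the proposed enum: use the bytes left in the cache
// line after the preceding fields, but never fewer than 64.
constexpr long name_len(long cache_line, long base_size) {
  return (cache_line - base_size > 64) ? (cache_line - base_size) : 64;
}

// Bytes from the start of the object through the end of _name. Fields that
// follow _name (for example debug-only members) are deliberately ignored.
constexpr long padded_size(long cache_line, long base_size) {
  return base_size + name_len(cache_line, base_size);
}

// Assumed size of the fields before _name; inferred from 128 - 72.
constexpr long kAssumedBase = 56;

static_assert(name_len(128, kAssumedBase) == 72, "128-byte line: name grows to 72");
static_assert(padded_size(128, kAssumedBase) % 128 == 0, "ends on a line boundary");
static_assert(padded_size(64, kAssumedBase) == 120, "64-byte line: name stays 64");
static_assert(padded_size(64, kAssumedBase) % 64 != 0, "120 is not a multiple of 64");
```

Under those assumptions a 128-byte or 256-byte line gives an object that ends exactly on a line boundary, while a 64-byte line gives 120 bytes, so consecutive objects would drift across line boundaries unless they are also aligned.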

Unfortunately that pad doesn't make sure that the resulting Monitor size is a multiple of the cache line size.

Dan

>
> Please review. If will also need a sponsor.
>
> Thanks and best regards,
> Martin
>

From claes.redestad at oracle.com Thu Oct 6 21:51:36 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 6 Oct 2016 23:51:36 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To:
References:
Message-ID: <57F6C768.7060605@oracle.com>

Hi Dan,

yes, I'm slightly unhappy with this change... :-)

... and would rather see a reuse of PaddedEnd<[Monitor|Mutex]> from share/vm/memory/padded.hpp in the places where padding makes sense, such as the globally allocated lists, rather than perpetuating the wart of dual-purposing the name field for padding.

This is sort of like what you're already suggesting, except that PaddedEnd uses template magic to actually add nothing if we're already cache aligned, as well as allowing us to not add any footprint overhead to existing uses where Monitors and Mutexes are already embedded (and there are a number of existing uses in key places in both GC and compiler code, see, e.g., CompileTask).

Thanks!

/Claes
> CACHE_LINE_PADDING : 64 > L176: }; > L177: char _name[MONITOR_NAME_LEN]; // Name of mutex > > I have to say that I'm not fond of the fact that MONITOR_NAME_LEN > can vary between platforms; I like that it is a minimum of 64 > bytes > and is still a constant. > > I'm also not happy that the resulting sizeof(Monitor) may not > be a multiple > of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate > that unhappiness > with the fact that sizeof(Monitor) hasn't been a multiple of > the cache line > size since at least 2008 and no one complained (that I know of). > > So if I was making this change, I would make MONITOR_NAME_LEN > 64 bytes > (like it was) and add a pad field that would bring up > sizeof(Monitor) > to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes > would be > unhappy with me and anyone embedding a Monitor into another data > structure would be unhappy with me, but I'm used to that :-) > > So what you have is fine, especially for JDK9. > > L180: public: > L181: #ifndef PRODUCT > L182: debug_only(static bool contains(Monitor * locks, Monitor * > lock);) > L183: debug_only(static Monitor * get_least_ranked_lock(Monitor * > locks);) > L184: debug_only(Monitor * > get_least_ranked_lock_besides_this(Monitor * locks);) > L185: #endif > L186: > L187: void set_owner_implementation(Thread* > owner) PRODUCT_RETURN; > L188: void check_prelock_state (Thread* > thread) PRODUCT_RETURN; > L189: void check_block_state (Thread* thread) > > These were all "protected" before. Now they are "public". > Any particular reason? > > Thumbs up on the mechanics of this change. I'm interested in the > answer to the "protected" versus "public" question, but don't > considered that query to be a blocker. > > > The rest of this isn't code review, but some of this caught > my attention. > > src/share/vm/runtime/mutex.hpp > > old L84: // The default length of monitor name is chosen to be 64 > to avoid false sharing. 
> old L85: static const int MONITOR_NAME_LEN = 64; > > I had to look up the history of this comment: > > $ hg log -r 55 src/share/vm/runtime/mutex.hpp > changeset: 55:2a8eb116ebbe > user: xlu > date: Tue Feb 05 23:21:57 2008 -0800 > summary: 6610420: Debug VM crashes during monitor lock rank checking > > $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp > diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp > --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 > +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 > @@ -82,6 +82,9 @@ class ParkEvent ; > // *in that order*. If their implementations change such that these > // assumptions are violated, a whole lot of code will break. > > +// The default length of monitor name is choosen to be 64 to avoid > false sharing. > +static const int MONITOR_NAME_LEN = 64; > + > class Monitor : public CHeapObj { > > public: > @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { > volatile intptr_t _WaitLock [1] ; // Protects _WaitSet > ParkEvent * volatile _WaitSet ; // LL of ParkEvents > volatile bool _snuck; // Used for sneaky locking > (evil). > - const char * _name; // Name of mutex > int NotifyCount ; // diagnostic assist > - double pad [8] ; // avoid false sharing > + char _name[MONITOR_NAME_LEN]; // Name of mutex > > // Debugging fields for naming, deadlock detection, etc. (some only > used in debug mode) > #ifndef PRODUCT > @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { > int ILocked () ; > > protected: > - static void ClearMonitor (Monitor * m) ; > + static void ClearMonitor (Monitor * m, const char* name = NULL) ; > Monitor() ; > > So the original code had an 8-double pad for avoiding false sharing. > Sounds very much like the old ObjectMonitor padding. I'm sure at the > time that Dice determined that 8-double value, the result was to pad > the size of Monitor to an even multiple of a particular cache line > size. 
> > Xiobin changed the 'name' field to be an array so that the name > chars could serve double duty as the cache line pad... pun intended. > Unfortunately that pad doesn't make sure that the resulting Monitor > size is a multiple of the cache line size. > > Dan > > >> >> Please review. If will also need a sponsor. >> >> Thanks and best regards, >> Martin >> > From kim.barrett at oracle.com Thu Oct 6 22:16:28 2016 From: kim.barrett at oracle.com (Kim Barrett) Date: Thu, 6 Oct 2016 18:16:28 -0400 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: > On Oct 5, 2016, at 9:36 PM, David Holmes wrote: > > On 5/10/2016 10:36 AM, Hiroshi H Horii wrote: >> Dear David, >> >> Thank you for your comments. >> >> I just used to think that it may be better that copy_to_survivor_space >> doesn't return forwardee if CAS was failed in order to prevent from >> reading fields in forwardee. But as you pointed, this extends fix for >> this topic. >> >> I removed two NULL assignments from the previous wevrev. >> http://cr.openjdk.java.net/~horii/8154736/webrev.03/ > > Which simply takes us back to where we were. It may not be safe for the caller of those methods to access the fields of the returned "forwardee". > > Sorry but I'm not seeing anything here that justifies removing the barriers from the cas in this code. GC lurkers feel free to jump in here - this is your code afterall! ;-) > > David > ----- Using a CAS with memory_order_relaxed in copy_to_survivor_space seems to me to be extremely fragile and hard to reason about. The places where that copied object might escape to and be examined seem to be myriad. 
And not only do we need to worry about them today, but also
for future maintenance. Even if it can be modified and shown to be
correct today, it would be very easy to introduce a bug later, as
should be obvious from the various issues pointed out so far during
this review.

The key issue here is that we copy obj into new_obj, and then make
new_obj accessible to other threads via the CAS. Those other threads
might attempt to access data in new_obj. This suggests the CAS ought
to have at least a release fence to ensure the copy is complete before
the CAS is performed. No amount of fencing on the read side (such as
in the work stealing) can remove that need.

And that might be all that is needed. On the post-CAS side, we load
the forwardee and then load values from it. I think we can use
implicit consume with dependent loads (except on Alpha) plus the
suggested release fence to get the desired effect. (If not, use an
acquire form of forwardee()?)

I'm not certain that just a release fence is sufficient (I'm less
familiar with ParallelGC than I'd like for looking at something like
this), but I'm pretty sure I wouldn't want to go any weaker than that.

From daniel.daugherty at oracle.com Thu Oct 6 23:02:53 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 6 Oct 2016 17:02:53 -0600
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <57F6C768.7060605@oracle.com>
References: <57F6C768.7060605@oracle.com>
Message-ID: <0f3ed330-11cb-1a9d-70e3-45515309032e@oracle.com>

I was going to bring up PaddedEnd, but decided not to since the example
closest to my fingertips is what we did with ObjectMonitor and
PaddedEnd... I didn't think you liked that one either... maybe I'm just
confused... :-)

Dan

From HORII at jp.ibm.com Fri Oct 7 02:50:51 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Fri, 7 Oct 2016 11:50:51 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To:
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
Message-ID:

Dear Kim, David, and all,

Thank you for your comments.

I created a new webrev. I added memory_order_release as a new enum of
cmpxchg_memory_order (atomic.hpp) and use it to update forwardees.

http://cr.openjdk.java.net/~horii/8154736/webrev.04/

Originally, two syncs were issued on ppc, one before and one after
cmpxchg. With this change, one of them is removed. Though one sync
still remains, performance will be improved.

Could you give your comments on this new webrev?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From david.holmes at oracle.com Fri Oct 7 03:23:03 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 7 Oct 2016 13:23:03 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To:
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
Message-ID: <1ca79f91-4096-f404-349e-0906ce976748@oracle.com>

On 7/10/2016 12:50 PM, Hiroshi H Horii wrote:
> Dear Kim, David, and all,
>
> Thank you for your comments.
>
> I created a new webrev. I added memory_order_release as a new enum of
> cmpxchg_memory_order (atomic.hpp) and use it to update forwardees.
>
> http://cr.openjdk.java.net/~horii/8154736/webrev.04/

I think you intended to modify cmpxchg_pre_membar not
cmpxchg_post_membar! Release semantics require the "post" fence. Though
technically release semantics would put the barrier before the store,
not after. But with no pre-fence you could in theory have a store
before the cas move inside the cas implementation (on ppc/arm) and get
reordered with the store performed by the cas.

src/share/vm/gc/parallel/psPromotionManager.cpp still uses
memory_order_relaxed.

That aside this seems too reactive to me. Kim may be right that release
semantics are sufficient for this code, but that is a claim that needs
some consideration and validation before we just run with it and make
the change. The approach to changes like this needs a lot more
discipline and methodology in my opinion.

David
-----
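
The release-versus-relaxed question in this thread can be illustrated
outside HotSpot. What follows is only a minimal C++11 sketch under
stated assumptions, not the ParallelGC code: Obj, forwardee and
copy_and_forward are hypothetical names, and std::atomic stands in for
HotSpot's Atomic::cmpxchg with a cmpxchg_memory_order argument. The
successful CAS uses release ordering so the copied payload is written
before the forwarding pointer becomes visible. The losing thread
re-reads the pointer with acquire, which corresponds to the more
conservative "acquire form of forwardee()" option rather than relying
on dependent loads.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a heap object; this is not HotSpot's oopDesc.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};  // plays the role of the forwarding pointer
  int payload[4] = {0, 0, 0, 0};
};

// Copies `from` and tries to install the copy as its forwardee.
// Returns whichever copy won the race.
Obj* copy_and_forward(Obj* from) {
  Obj* copy = new Obj();
  // Plain (non-atomic) stores that must be visible before publication.
  std::memcpy(copy->payload, from->payload, sizeof(copy->payload));

  Obj* expected = nullptr;
  // Release on success: the payload stores above are ordered before the
  // store that makes `copy` reachable by other threads.
  if (from->forwardee.compare_exchange_strong(expected, copy,
                                              std::memory_order_release,
                                              std::memory_order_relaxed)) {
    return copy;
  }

  // Lost the race: our copy was never published, so it can be discarded.
  delete copy;
  // Acquire load pairs with the winner's release CAS, so reading the
  // winner's payload afterwards is safe.
  return from->forwardee.load(std::memory_order_acquire);
}
```

With memory_order_relaxed on the successful CAS instead, nothing would
order the memcpy before the pointer store, which is the escape problem
Kim describes.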

From jiangli.zhou at Oracle.COM Fri Oct 7 04:39:00 2016
From: jiangli.zhou at Oracle.COM (Jiangli Zhou)
Date: Thu, 6 Oct 2016 21:39:00 -0700
Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases
Message-ID:

Hi,

Please review the following fix for JDK-8167333:

webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/

When a shared class is transformed by a JVMTI agent during initial
loading (via CFLH), the VM creates a new ClassFileStream using the
transformed class data. The source path info from the class's
associated SharedClassPathEntry is passed as the "source" argument to
ClassFileStream. However, some shared classes may not have an
associated SharedClassPathEntry and the class_path_index is -1. The VM
needs to detect such a case and not pass an invalid source path info.

Tested with all existing class data sharing tests.

Thanks,
Jiangli

From david.holmes at oracle.com Fri Oct 7 05:33:13 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 7 Oct 2016 15:33:13 +1000
Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases
In-Reply-To:
References:
Message-ID: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>

Hi Jiangli,

On 7/10/2016 2:39 PM, Jiangli Zhou wrote:
> Hi,
>
> Please review the following fix for JDK-8167333:
>
> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>
> When a shared class is transformed by a JVMTI agent during initial
> loading (via CFLH), the VM creates a new ClassFileStream using the
> transformed class data. The source path info from the class's
> associated SharedClassPathEntry is passed as the "source" argument to
> ClassFileStream. However, some shared classes may not have an
> associated SharedClassPathEntry and the class_path_index is -1. The VM
> needs to detect such a case and not pass an invalid source path info.

It isn't obvious to me that all callers of CFS::source()/clone_source()
will handle getting a NULL. Of course I can't tell which of those
callers may be involved in this particular use-case.

Thanks,
David

> Tested with all existing class data sharing tests.
>
> Thanks,
> Jiangli
>

From dmitry.samersoff at oracle.com Fri Oct 7 08:36:57 2016
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Fri, 7 Oct 2016 11:36:57 +0300
Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases
In-Reply-To:
References:
Message-ID: <09ec6b8e-f071-e12a-bbc8-8c45bab3b9a8@oracle.com>

Jiangli,

I see a couple of places in hotspot where the result of
FileMapInfo::shared_classpath() is dereferenced without an additional
null check.

Could you insert check/assert/comments as appropriate in these places?

-Dmitry

--
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.
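
The concern in the two replies above is that a missing class path entry
(class_path_index of -1) turns into a NULL source that some callers may
not expect. As a rough illustration of the pattern only, here is a
small C++ sketch. PathEntry, PathTable, source_for and describe_source
are hypothetical names invented for this example; they are not the
HotSpot types, and the real code goes through
FileMapInfo::shared_classpath() and ClassFileStream.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are not the HotSpot classes.
struct PathEntry {
  std::string name;
};

struct PathTable {
  std::vector<PathEntry> entries;

  // Returns nullptr for any index outside the table, including the
  // "no associated entry" value -1.
  const PathEntry* entry_at(int index) const {
    if (index < 0 || static_cast<std::size_t>(index) >= entries.size()) {
      return nullptr;
    }
    return &entries[static_cast<std::size_t>(index)];
  }
};

// Producer side: never dereference a missing entry; hand back nullptr.
const char* source_for(const PathTable& table, int class_path_index) {
  const PathEntry* entry = table.entry_at(class_path_index);
  return entry == nullptr ? nullptr : entry->name.c_str();
}

// Consumer side: every caller has to tolerate a null source.
std::string describe_source(const char* source) {
  return source == nullptr ? std::string("<unknown source>")
                           : std::string(source);
}
```

The producer-side check alone is not enough; the second function shows
the kind of caller-side handling David and Dmitry are asking to have
verified.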

From martin.doerr at sap.com Fri Oct 7 09:34:10 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 7 Oct 2016 09:34:10 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To:
References:
Message-ID:

Hi Dan,

thank you very much for reviewing and for investigating the history.

It was not intended to make the functions you mentioned public. I've
fixed that. I also updated the copyright information.

New webrev is here:
http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/

@Coleen: Please use this one. I have also added reviewer attribution.

Thanks and best regards,
Martin

From claes.redestad at oracle.com Fri Oct 7 09:59:30 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 7 Oct 2016 11:59:30 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To:
References:
Message-ID: <57F77202.8070201@oracle.com>

Hi,

after due consideration I strongly consider this change unacceptable
since it adds footprint overhead to performance-critical compiler and
GC code with little to no data to support that this won't cause
regressions.

Changes to Monitor/Mutex need to be done with more surgical precision
than this.

If I do have a veto on the matter, here it is.

Thanks!

/Claes

On 2016-10-07 11:34, Doerr, Martin wrote:
> Hi Dan,
>
> thank you very much for reviewing and for investigating the history.
>
> It was not intended to make the functions you mentioned public. I've fixed that.
> I also updated the copyright information.
>
> New webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>
> @Coleen: Please use this one. I have also added reviewer attribution.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> Sent: Thursday, 6 October 2016 23:13
> To: Doerr, Martin ; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>> Hi,
>>
>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>> >> My proposed change is here: >> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ > > src/share/vm/runtime/mutex.hpp > Please update the copyright year before pushing. > > L172: // The default length of monitor name is chosen to avoid > false sharing. > L173: enum { > L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - > sizeof(MonitorBase), > L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? > CACHE_LINE_PADDING : 64 > L176: }; > L177: char _name[MONITOR_NAME_LEN]; // Name of mutex > > I have to say that I'm not fond of the fact that MONITOR_NAME_LEN > can vary between platforms; I like that it is a minimum of 64 bytes > and is still a constant. > > I'm also not happy that the resulting sizeof(Monitor) may not > be a multiple > of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate > that unhappiness > with the fact that sizeof(Monitor) hasn't been a multiple of > the cache line > size since at least 2008 and no one complained (that I know of). > > So if I was making this change, I would make MONITOR_NAME_LEN > 64 bytes > (like it was) and add a pad field that would bring up > sizeof(Monitor) > to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes > would be > unhappy with me and anyone embedding a Monitor into another data > structure would be unhappy with me, but I'm used to that :-) > > So what you have is fine, especially for JDK9. > > L180: public: > L181: #ifndef PRODUCT > L182: debug_only(static bool contains(Monitor * locks, Monitor * > lock);) > L183: debug_only(static Monitor * get_least_ranked_lock(Monitor * > locks);) > L184: debug_only(Monitor * > get_least_ranked_lock_besides_this(Monitor * locks);) > L185: #endif > L186: > L187: void set_owner_implementation(Thread* > owner) PRODUCT_RETURN; > L188: void check_prelock_state (Thread* > thread) PRODUCT_RETURN; > L189: void check_block_state (Thread* thread) > > These were all "protected" before. Now they are "public". > Any particular reason? 
> > Thumbs up on the mechanics of this change. I'm interested in the > answer to the "protected" versus "public" question, but don't > consider that query to be a blocker. > > > The rest of this isn't code review, but some of this caught > my attention. > > src/share/vm/runtime/mutex.hpp > > old L84: // The default length of monitor name is chosen to be 64 > to avoid false sharing. > old L85: static const int MONITOR_NAME_LEN = 64; > > I had to look up the history of this comment: > > $ hg log -r 55 src/share/vm/runtime/mutex.hpp > changeset: 55:2a8eb116ebbe > user: xlu > date: Tue Feb 05 23:21:57 2008 -0800 > summary: 6610420: Debug VM crashes during monitor lock rank checking > > $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp > diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp > --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 > +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 > @@ -82,6 +82,9 @@ class ParkEvent ; > // *in that order*. If their implementations change such that these > // assumptions are violated, a whole lot of code will break. > > +// The default length of monitor name is choosen to be 64 to avoid > false sharing. > +static const int MONITOR_NAME_LEN = 64; > + > class Monitor : public CHeapObj { > > public: > @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { > volatile intptr_t _WaitLock [1] ; // Protects _WaitSet > ParkEvent * volatile _WaitSet ; // LL of ParkEvents > volatile bool _snuck; // Used for sneaky locking > (evil). > - const char * _name; // Name of mutex > int NotifyCount ; // diagnostic assist > - double pad [8] ; // avoid false sharing > + char _name[MONITOR_NAME_LEN]; // Name of mutex > > // Debugging fields for naming, deadlock detection, etc.
(some only > used in debug mode) > #ifndef PRODUCT > @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { > int ILocked () ; > > protected: > - static void ClearMonitor (Monitor * m) ; > + static void ClearMonitor (Monitor * m, const char* name = NULL) ; > Monitor() ; > > So the original code had an 8-double pad for avoiding false sharing. > Sounds very much like the old ObjectMonitor padding. I'm sure at the > time that Dice determined that 8-double value, the result was to pad > the size of Monitor to an even multiple of a particular cache line > size. > > Xiobin changed the 'name' field to be an array so that the name > chars could serve double duty as the cache line pad... pun intended. > Unfortunately that pad doesn't make sure that the resulting Monitor > size is a multiple of the cache line size. > > Dan > > >> >> Please review. It will also need a sponsor. >> >> Thanks and best regards, >> Martin >> >
From martin.doerr at sap.com Fri Oct 7 10:18:56 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Fri, 7 Oct 2016 10:18:56 +0000 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <57F77202.8070201@oracle.com> References: <57F77202.8070201@oracle.com> Message-ID: <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> Hi Claes, what the change basically does is enlarge the _name[] field by 8 bytes on platforms with a 128-byte DEFAULT_CACHE_LINE_SIZE. The logic behind it is computed entirely by the C++ compiler. What exactly is your concern about the footprint overhead? Are you not concerned about the risk of false sharing? Best regards, Martin
From claes.redestad at oracle.com Fri Oct 7 10:34:51 2016 From: claes.redestad at oracle.com (Claes Redestad) Date: Fri, 7 Oct 2016 12:34:51 +0200 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> Message-ID: <57F77A4B.6060604@oracle.com> Hi, I'm concerned that this might be an easy-but-wrong fix to a complex problem, and I'd note that there are already use cases where the _name field is counterproductive. This change adds complexity that makes it even less likely that such uses will be optimized for in the future. There are Padded* types put in place to deal with these concerns explicitly rather than implicitly *where it matters*, which allows us the choice of applying padding or not on a per-use-case basis (which means we can also remove the _name field for those use cases that don't care about either, which might be most outside of the global lists). I am very concerned about false sharing, but I have no data to support that this change has any measurable benefit in practice: I even did an experiment years ago where I turned _name into a pointer so that there was no padding at all, and saw nothing exceeding noise levels on any benchmark. Thanks!
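The arithmetic behind the two positions in this thread can be checked with a small standalone sketch. It is not HotSpot code: `MonitorBase` below is a hypothetical seven-word stand-in (56 bytes on a 64-bit build, an assumption chosen only because it reproduces the "8 bytes" figure mentioned above), the cache line size is a template parameter rather than `DEFAULT_CACHE_LINE_SIZE`, and `PaddedMonitor` merely imitates the idea of an explicit padded wrapper with C++11 `alignas`; it is not the VM's Padded* implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the lock-state fields of Monitor.
// Seven pointer-sized words is an assumption, not the real layout.
struct MonitorBase {
  std::intptr_t words[7];
};

// Mirrors the enum quoted from the webrev: the name buffer doubles as
// padding and grows only when the cache line leaves more than 64 bytes.
// Assumes CacheLine >= sizeof(MonitorBase), otherwise the subtraction wraps.
template <std::size_t CacheLine>
struct NameLen {
  static constexpr std::size_t padding = CacheLine - sizeof(MonitorBase);
  static constexpr std::size_t value = padding > 64 ? padding : 64;
};

// Implicit padding: the name array is the pad.
template <std::size_t CacheLine>
struct ImplicitlyPaddedMonitor : MonitorBase {
  char _name[NameLen<CacheLine>::value];
};

// Alternative: keep a fixed 64-byte name and pad explicitly, only where a
// use case asks for it. alignas rounds sizeof up to a multiple of CacheLine.
struct PlainMonitor : MonitorBase {
  char _name[64];
};

template <std::size_t CacheLine>
struct alignas(CacheLine) PaddedMonitor : PlainMonitor {};

// True if objects of type T occupy a whole number of cache lines.
template <typename T>
bool fills_whole_lines(std::size_t cache_line) {
  return sizeof(T) % cache_line == 0;
}
```

Under these assumptions the implicit scheme fills a 128-byte line exactly but leaves the 64-byte-line variant short of a whole number of lines, which is the gap the review comments point at; the explicit wrapper always rounds up, at the cost of extra footprint wherever it is applied.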
/Claes
From thomas.schatzl at oracle.com Fri Oct 7 10:37:52 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 07 Oct 2016 12:37:52 +0200 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <1ca79f91-4096-f404-349e-0906ce976748@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> <1ca79f91-4096-f404-349e-0906ce976748@oracle.com> Message-ID: <1475836672.2622.81.camel@oracle.com> Hi, On Fri, 2016-10-07 at 13:23 +1000, David Holmes wrote: > On 7/10/2016 12:50 PM, Hiroshi H Horii wrote: > > > > Dear Kim, David, and all, > > > > Thank you for your comments. > > > > I created a new webrev. I added memory_order_release as a new enum > > of > > cmpxchg_memory_order (atomic.hpp) and use it to update forwardees. > > > > http://cr.openjdk.java.net/~horii/8154736/webrev.04/ > I think you intended to modify cmpxchg_pre_membar not > cmpxchg_post_membar! Release semantics require the "post" fence. > Though technically release semantics would put the barrier before the > store, not after. But with no pre-fence you could in theory have a > store before the cas move inside the cas implementation (on ppc/arm) > and get reordered with the store performed by the cas. > > src/share/vm/gc/parallel/psPromotionManager.cpp still uses > memory_order_relaxed. > > That aside this seems too reactive to me. Kim may be right that > release semantics are sufficient for this code, but that is a claim > that needs some consideration and validation before we just run with > it and make the change. The approach to changes like this needs a lot > more discipline and methodology in my opinion.
There are some other small issues with the suggested change:

- the idiom used to print trace log messages

  244 if (log_develop_is_enabled(Trace, gc, scavenge)) {
  245   log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> " PTR_FORMAT " (%d)}",

  does not require the first line. log_develop_trace() will only generate code when compiled in debug mode anyway, so the check before it is superfluous. I saw that in several places.

- could you explain what the advantage of

  298 if (!o->is_forwarded()) {
  299   copy_to_survivor_space(o);
  300 }
  301 oop new_obj = o->forwardee();

  compared to

  281 oop new_obj = o->is_forwarded()
  282     ? o->forwardee()
  283     : copy_to_survivor_space(o);

  in PSPromotionManager::copy_and_push_safe_barrier() is? This seems to introduce a superfluous forced reload (forwardee() accesses a volatile variable), as copy_to_survivor_space already reloads and returns the forwardee even with all these changes. I may be overlooking something crucial (and it's Friday), but I do not see a difference in behavior (and problems) compared to the old code, just the additional load.

- the new assert at

  302 assert(forwardee != NULL, "forwardee should not be NULL");

  seems superfluous. At this point, after the CAS has been executed, we assume that there must be a forwardee. Either copy_to_survivor_space returns it, or there has already been a forwardee.
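The forwarding pattern under discussion can be sketched outside the VM with `std::atomic`. This is only an illustration under stated assumptions: `Obj`, `copy_to_survivor` and `copy_and_push` are hypothetical names, heap allocation stands in for survivor-space allocation, and the `acq_rel`/`acquire` orderings are a deliberately conservative choice for the sketch, not a claim about the weakest ordering that ParallelGC could safely use.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with a forwarding pointer; not HotSpot's oop layout.
struct Obj {
  int payload;
  std::atomic<Obj*> forwardee;
  explicit Obj(int p) : payload(p), forwardee(nullptr) {}
};

// Copies o and tries to install the copy as its forwardee.
// Always returns the winning copy, whether or not this thread installed it.
Obj* copy_to_survivor(Obj* o) {
  Obj* copy = new Obj(o->payload);  // plain stores initialise the copy
  Obj* expected = nullptr;
  if (o->forwardee.compare_exchange_strong(
          expected, copy,
          std::memory_order_acq_rel,    // success: publish the initialised copy
          std::memory_order_acquire)) { // failure: observe the winner's stores
    return copy;
  }
  delete copy;      // lost the race; discard our copy
  return expected;  // the failed CAS already loaded the winner's forwardee
}

// Ternary shape from the older code: no second load of the forwardee is
// needed, because copy_to_survivor() already returns it.
Obj* copy_and_push(Obj* o) {
  Obj* f = o->forwardee.load(std::memory_order_acquire);
  return f != nullptr ? f : copy_to_survivor(o);
}
```

The point of returning `expected` on failure is that the caller never has to reload the forwarding pointer; whether the caller may then read fields of the returned object depends on the ordering chosen for the CAS, which is the open question in this thread.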
Thanks, Thomas
From thomas.schatzl at oracle.com Fri Oct 7 10:38:55 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 07 Oct 2016 12:38:55 +0200 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: <1475836735.2622.82.camel@oracle.com> Hi, On Thu, 2016-10-06 at 18:16 -0400, Kim Barrett wrote: > > > > On Oct 5, 2016, at 9:36 PM, David Holmes > > wrote: > > > > On 5/10/2016 10:36 AM, Hiroshi H Horii wrote: > > > > > > Dear David, > > > > > > Thank you for your comments. > > > > > > I just used to think that it may be better that > > > copy_to_survivor_space > > > doesn't return forwardee if the CAS failed in order to prevent > > > from > > > reading fields in forwardee. But as you pointed out, this extends fix > > > for > > > this topic. > > > > > > I removed two NULL assignments from the previous webrev. > > > http://cr.openjdk.java.net/~horii/8154736/webrev.03/ > > Which simply takes us back to where we were. It may not be safe for > > the caller of those methods to access the fields of the returned > > "forwardee". > > > > Sorry but I'm not seeing anything here that justifies removing the > > barriers from the cas in this code. GC lurkers feel free to jump in > > here - this is your code after all!
;-) > > > > David > > ----- > Using a CAS with memory_order_relaxed in copy_to_survivor_space seems > to me to be extremely fragile and hard to reason about. The places > where that copied object might escape to and be examined seem to be > myriad. And not only do we need to worry about them today, but also > for future maintenance. Even if it can be modified and shown to be > correct today, it would be very easy to introduce a bug later, as > should be obvious from the various issues pointed out so far during > this review. > > The key issue here is that we copy obj into new_obj, and then make > new_obj accessible to other threads via the CAS. Those other threads > might attempt to access data in new_obj. This suggests the CAS ought > to have at least a release fence to ensure the copy is complete > before the CAS is performed. No amount of fencing on the read side > (such as in the work stealing) can remove that need. Depending on what "other threads" means. The thread that pops the reference should be okay (as it does a fence), because the thread pushing the entry on the mark stack also releases all stores. Threads not participating in this protocol are problematic, and this is indeed worrying me a bit as well. I have not seen any so far, but there is always a risk of overlooking some place. > And that might be all that is needed. On the post-CAS side, we load > the forwardee and then load values from it. I think we can use > implicit consume with dependent loads (except on Alpha) plus the > suggested release fence to get the desired effect. (If not, use an > acquire form of forwardee()?) > > I'm not certain that just a release fence is sufficient (I'm less > familiar with ParallelGC than I'd like for looking at something like > this), but I'm pretty sure I wouldn't want to go any weaker than > that. This change "only" impacts ppc64 at this time. Thanks,
Thomas
From david.holmes at oracle.com Fri Oct 7 12:08:25 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 7 Oct 2016 22:08:25 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <1475836735.2622.82.camel@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> <1475836735.2622.82.camel@oracle.com> Message-ID: Thomas, > This change "only" impacts ppc64 at this time. This is a dangerous stance. The changes are to shared code. It just happens that only the PPC atomic implementations support anything other than conservative barriers today. If someone adds additional forms on other platforms, their GC code suddenly has new behaviour! Changes in shared code must be algorithmically correct on all platforms, not just "it will work fine today". Given all the work being done to add missing barriers, removing them must come with a detailed analysis establishing the safety of doing so. And I am not seeing that here. David
From harold.seigel at oracle.com Fri Oct 7 15:20:17 2016 From: harold.seigel at oracle.com (harold seigel) Date: Fri, 7 Oct 2016 11:20:17 -0400 Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out of order with lock Module_lock/6 -- possible deadlock Message-ID: Hi, Please review this fix for JDK-8166364. This fix moves the setting of the module fields in the class mirrors of the fixup_module_list outside of the Module_lock. The determination of whether a mirror should be added to the fixup_module_list is still done under Module_lock, as is the defining of module java.base. This prevents any synchronization issues with a mirror being erroneously added to the fixup_module_list after module java.base is defined. The other piece is that the VM, in Modules::define_javabase_module(), guarantees under Module_lock that only one thread will ever successfully define module java.base. Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/ JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364 The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, java/io, java/lang, and java/util tests using both fastdebug and slowdebug builds. The nsk colocated and non-colocated quick tests were also run against a slowdebug build.
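The locking shape described in this fix (decide under the lock, patch outside it) can be sketched in portable C++. Everything here is hypothetical and simplified: `ModuleRegistry`, `Mirror` and the member names are invented for illustration, `std::mutex` stands in for Module_lock, and the sketch does not model which other VM locks the real patching code may need to take.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical class mirror with a module field to be filled in.
struct Mirror {
  const char* module;
  Mirror() : module(nullptr) {}
};

class ModuleRegistry {
 public:
  // Called when a mirror is created. The decision (queue for fixup or not)
  // is made under the lock; the field itself is written outside it.
  void register_mirror(Mirror* m) {
    bool defined;
    {
      std::lock_guard<std::mutex> guard(module_lock_);
      defined = javabase_defined_;
      if (!defined) fixup_list_.push_back(m);
    }
    if (defined) m->module = "java.base";
  }

  // Returns true only for the single caller that defines java.base.
  bool define_javabase() {
    std::vector<Mirror*> pending;
    {
      std::lock_guard<std::mutex> guard(module_lock_);
      if (javabase_defined_) return false;  // someone else already won
      javabase_defined_ = true;
      pending.swap(fixup_list_);  // take ownership of the fixup list
    }
    // Patch the queued mirrors with the lock released, so this loop cannot
    // participate in a lock-order violation involving module_lock_.
    for (Mirror* m : pending) m->module = "java.base";
    return true;
  }

 private:
  std::mutex module_lock_;
  bool javabase_defined_ = false;
  std::vector<Mirror*> fixup_list_;
};
```

Because the flag flip and the list swap happen in one critical section, no mirror can be queued after the definition has been recorded, which corresponds to the guarantee stated in the review request.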
Thanks, Harold From jiangli.zhou at Oracle.COM Fri Oct 7 18:23:46 2016 From: jiangli.zhou at Oracle.COM (Jiangli Zhou) Date: Fri, 7 Oct 2016 11:23:46 -0700 Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases In-Reply-To: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com> References: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com> Message-ID: <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com> Hi David, Thanks for taking a look. > On Oct 6, 2016, at 10:33 PM, David Holmes wrote: > > Hi Jiangli, > > On 7/10/2016 2:39 PM, Jiangli Zhou wrote: >> Hi, >> >> Please review the following fix for JDK-8167333 : >> >> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/ >> >> When a shared class is transformed by a JVMTI agent during initial loading (via CFLH), the VM creates a new ClassFileStream using the transformed class data. The source path info from the class's associated SharedClassPathEntry is passed as the "source" argument to ClassFileStream. However, some shared classes may not have an associated SharedClassPathEntry and the class_path_index is -1. The VM needs to detect such case and not passing an invalid source path info. > > It isn't obvious to me that all callers of CFS::source()/clone_source() will handle getting a NULL. Of course I can't tell which of those callers may be involved in this particular use-case. I took a look of all the code that calls CFS::source()/clone_source(). They all handle the NULL case with explicit NULL check. For our specific case, the particular caller involved is InstanceKlass::print_loading_log. Before the fix, it crashed when trying to print the invalid cfs->source after (cfs->source() != NULL) check. Thanks, Jiangli > > Thanks, > David > >> Tested with all existing class data sharing tests. 
>> >> Thanks, >> Jiangli >> From jiangli.zhou at oracle.com Fri Oct 7 22:35:19 2016 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Fri, 7 Oct 2016 15:35:19 -0700 Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases In-Reply-To: <09ec6b8e-f071-e12a-bbc8-8c45bab3b9a8@oracle.com> References: <09ec6b8e-f071-e12a-bbc8-8c45bab3b9a8@oracle.com> Message-ID: <6FCF7FDC-BFDD-43F1-A311-3F7B72CA8D2C@oracle.com> Hi Dmitry, Thanks for the review. > On Oct 7, 2016, at 1:36 AM, Dmitry Samersoff wrote: > > Jiangli, > > I see couple of places in hotspot where result of > FileMapInfo::shared_classpath() is de-referenced without additional null > check. > > Could you insert check/assert/comments as appropriate to these places? That's a very good point. I double-checked all other places that call FileMapInfo::shared_classpath(). They all have valid non-NULL shared class path entry when the entry field is accessed. Just being cautious, I added some asserts to make sure the shared class path entry is not NULL. Here is updated webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.01/ I've rerun all related tests. Thanks, Jiangli > > -Dmitry > > On 2016-10-07 07:39, Jiangli Zhou wrote: >> Hi, >> >> Please review the following fix for JDK-8167333 >> : >> >> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/ >> >> >> When a shared class is transformed by a JVMTI agent during initial >> loading (via CFLH), the VM creates a new ClassFileStream using the >> transformed class data. The source path info from the class's >> associated SharedClassPathEntry is passed as the "source" argument to >> ClassFileStream. However, some shared classes may not have an >> associated SharedClassPathEntry and the class_path_index is -1. The >> VM needs to detect such case and not passing an invalid source path >> info. >> >> Tested with all existing class data sharing tests. 
>> >> Thanks, Jiangli >> > > > -- > Dmitry Samersoff > Oracle Java development team, Saint Petersburg, Russia > * I would love to change the world, but they won't give me the sources. From dmitry.samersoff at oracle.com Sat Oct 8 16:15:34 2016 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Sat, 8 Oct 2016 19:15:34 +0300 Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out of order with lock Module_lock/6 -- possible deadlock In-Reply-To: References: Message-ID: <2bc43aca-ea7b-25f7-6dc0-0b026c6811f5@oracle.com> Harold, I'd tried your fix in my kitchensync setup and can confirm, that VM doesn't crash anymore. The fix looks good for me. -Dmitry On 2016-10-07 18:20, harold seigel wrote: > Hi, > > Please review this fix for JDK-8166364. > > This fix moves the setting of the module fields in the class mirrors of > the fixup_module_list outside of the Module_lock. The determination of > whether a mirror should be added to the fixup_module_list is still done > under Module_lock as is the defining of module java.base. This prevents > any synchronization issues with a mirror being erroneously added to the > fixup_module_list after module java.base is defined. The other piece is > that the VM, in Modules::define_javabase_module(), guarantees under > Module_lock that only one thread will ever successfully define module > java.base. > > Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/ > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364 > > The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, > java/io, java/lang, and java/util tests using both fastdebug and > slowdebug builds. The nsk cololocated and the non-colocated quick tests > were also run against a slowdebug build. > > Thanks, Harold > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. 
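The locking shape described in the JDK-8166364 fix quoted above (decide under the lock, patch outside it) can be illustrated with a small C++ sketch. It is not the HotSpot code: `Mirror`, `ModuleState`, `register_mirror` and `define_base` are hypothetical names, and `std::mutex` merely stands in for Module_lock. The assumption being illustrated is that only the decision "queue this mirror or patch it directly" and the one-time "define java.base" transition need the lock, while the patching itself, which may need other locks, runs after the lock is released.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a class mirror whose module field gets patched.
struct Mirror {
  const char* module = nullptr;
};

class ModuleState {
  std::mutex lock_;                  // stand-in for Module_lock
  bool base_defined_ = false;        // has java.base been defined yet?
  std::vector<Mirror*> fixup_list_;  // mirrors created before java.base existed

 public:
  // Called when a mirror is created.
  void register_mirror(Mirror* m) {
    bool patch_now;
    {
      std::lock_guard<std::mutex> guard(lock_);
      patch_now = base_defined_;
      if (!patch_now) fixup_list_.push_back(m);  // decision made under the lock
    }
    if (patch_now) m->module = "java.base";      // patching done outside the lock
  }

  // Returns true only for the single caller that actually defines the base module.
  bool define_base() {
    std::vector<Mirror*> to_patch;
    {
      std::lock_guard<std::mutex> guard(lock_);
      if (base_defined_) return false;           // someone else already defined it
      base_defined_ = true;
      to_patch.swap(fixup_list_);                // take ownership of the queued mirrors
    }
    // Only the defining thread gets here, so the list needs no further locking.
    for (Mirror* m : to_patch) m->module = "java.base";
    return true;
  }
};
```

Because the flag and the list are updated in the same critical section, a mirror cannot be queued after the list has been handed to the defining thread, which is the race the quoted description says the fix avoids.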
From david.holmes at oracle.com Sun Oct 9 20:59:08 2016 From: david.holmes at oracle.com (David Holmes) Date: Mon, 10 Oct 2016 06:59:08 +1000 Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out of order with lock Module_lock/6 -- possible deadlock In-Reply-To: References: Message-ID: <988ef919-dbf0-092e-be72-622dbc1c663f@oracle.com> Hi Harold, Change looks good. A couple of suggestions re comments below. On 8/10/2016 1:20 AM, harold seigel wrote: > Hi, > > Please review this fix for JDK-8166364. > > This fix moves the setting of the module fields in the class mirrors of > the fixup_module_list outside of the Module_lock. The determination of > whether a mirror should be added to the fixup_module_list is still done > under Module_lock as is the defining of module java.base. This prevents > any synchronization issues with a mirror being erroneously added to the > fixup_module_list after module java.base is defined. The other piece is > that the VM, in Modules::define_javabase_module(), guarantees under > Module_lock that only one thread will ever successfully define module > java.base. > > Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/ src/share/vm/classfile/modules.cpp Can you add a comment here: 246 + // Only the thread that actually defined the base module will get here, + // so no locking is needed. + 247 // Patch any previously loaded class's module field with java.base's java.lang.reflect.Module. 248 ModuleEntryTable::patch_javabase_entries(module_handle); --- src/share/vm/classfile/javaClasses.cpp This comment no longer reads quite right now that it is not in the else clause: 801 // java.base was defined at some point between calling create_mirror() 802 // and obtaining the Module_lock, patch this particular class with java.base. suggest: // If java.base was already defined then patch this particular class with java.base. 
Thanks, David > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364 > > The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, > java/io, java/lang, and java/util tests using both fastdebug and > slowdebug builds. The nsk cololocated and the non-colocated quick tests > were also run against a slowdebug build. > > Thanks, Harold > From david.holmes at oracle.com Sun Oct 9 21:10:37 2016 From: david.holmes at oracle.com (David Holmes) Date: Mon, 10 Oct 2016 07:10:37 +1000 Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases In-Reply-To: <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com> References: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com> <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com> Message-ID: On 8/10/2016 4:23 AM, Jiangli Zhou wrote: > Hi David, > > Thanks for taking a look. > >> On Oct 6, 2016, at 10:33 PM, David Holmes > > wrote: >> >> Hi Jiangli, >> >> On 7/10/2016 2:39 PM, Jiangli Zhou wrote: >>> Hi, >>> >>> Please review the following fix for JDK-8167333 >>> : >>> >>> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/ >>> >>> >>> When a shared class is transformed by a JVMTI agent during initial >>> loading (via CFLH), the VM creates a new ClassFileStream using the >>> transformed class data. The source path info from the class? >>> associated SharedClassPathEntry is passed as the ?source? argument to >>> ClassFileStream. However, some shared classes may not have an >>> associated SharedClassPathEntry and the class_path_index is -1. The >>> VM needs to detect such case and not passing an invalid source path info. >> >> It isn't obvious to me that all callers of >> CFS::source()/clone_source() will handle getting a NULL. Of course I >> can't tell which of those callers may be involved in this particular >> use-case. > > I took a look of all the code that calls CFS::source()/clone_source(). 
> They all handle the NULL case with explicit NULL check. For our specific > case, the particular caller involved > is InstanceKlass::print_loading_log. Before the fix, it crashed when > trying to print the invalid cfs->source after (cfs->source() != NULL) check. Thanks for verifying. I've looked at the latest webrev with the additional asserts - all looks good. David > Thanks, > Jiangli > >> >> Thanks, >> David >> >>> Tested with all existing class data sharing tests. >>> >>> Thanks, >>> Jiangli >>> > From ioi.lam at oracle.com Mon Oct 10 06:27:56 2016 From: ioi.lam at oracle.com (Ioi Lam) Date: Sun, 09 Oct 2016 23:27:56 -0700 Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases In-Reply-To: References: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com> <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com> Message-ID: <57FB34EC.2070606@oracle.com> On 10/9/16 2:10 PM, David Holmes wrote: > On 8/10/2016 4:23 AM, Jiangli Zhou wrote: >> Hi David, >> >> Thanks for taking a look. >> >>> On Oct 6, 2016, at 10:33 PM, David Holmes >> > wrote: >>> >>> Hi Jiangli, >>> >>> On 7/10/2016 2:39 PM, Jiangli Zhou wrote: >>>> Hi, >>>> >>>> Please review the following fix for JDK-8167333 >>>> : >>>> >>>> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/ >>>> >>>> >>>> When a shared class is transformed by a JVMTI agent during initial >>>> loading (via CFLH), the VM creates a new ClassFileStream using the >>>> transformed class data. The source path info from the class? >>>> associated SharedClassPathEntry is passed as the ?source? argument to >>>> ClassFileStream. However, some shared classes may not have an >>>> associated SharedClassPathEntry and the class_path_index is -1. The >>>> VM needs to detect such case and not passing an invalid source path >>>> info. >>> >>> It isn't obvious to me that all callers of >>> CFS::source()/clone_source() will handle getting a NULL. 
Of course I >>> can't tell which of those callers may be involved in this particular >>> use-case. >> >> I took a look of all the code that calls CFS::source()/clone_source(). >> They all handle the NULL case with explicit NULL check. For our specific >> case, the particular caller involved >> is InstanceKlass::print_loading_log. Before the fix, it crashed when >> trying to print the invalid cfs->source after (cfs->source() != NULL) >> check. > > Thanks for verifying. I've looked at the latest webrev with the > additional asserts - all looks good. > Looks good to me, too. Thanks - Ioi > David > >> Thanks, >> Jiangli >> >>> >>> Thanks, >>> David >>> >>>> Tested with all existing class data sharing tests. >>>> >>>> Thanks, >>>> Jiangli >>>> >> From robbin.ehn at oracle.com Mon Oct 10 07:06:04 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 10 Oct 2016 09:06:04 +0200 Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out of order with lock Module_lock/6 -- possible deadlock In-Reply-To: References: Message-ID: Thanks for fixing, looks good and works fine! /Robbin On 10/07/2016 05:20 PM, harold seigel wrote: > Hi, > > Please review this fix for JDK-8166364. > > This fix moves the setting of the module fields in the class mirrors of the fixup_module_list outside of the Module_lock. The determination of whether a mirror should be > added to the fixup_module_list is still done under Module_lock as is the defining of module java.base. This prevents any synchronization issues with a mirror being > erroneously added to the fixup_module_list after module java.base is defined. The other piece is that the VM, in Modules::define_javabase_module(), guarantees under > Module_lock that only one thread will ever successfully define module java.base. 
> > Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/ > > JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364 > > The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, java/io, java/lang, and java/util tests using both fastdebug and slowdebug builds. The nsk > cololocated and the non-colocated quick tests were also run against a slowdebug build. > > Thanks, Harold > From shafi.s.ahmad at oracle.com Mon Oct 10 07:24:37 2016 From: shafi.s.ahmad at oracle.com (Shafi Ahmad) Date: Mon, 10 Oct 2016 00:24:37 -0700 (PDT) Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' Message-ID: Hi All, Please review the simple change for the fix of bug '' JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'. Summary: In the current implementation there are few scenarios where we are not obeying the jvm option -XX:+CrashOnOutOfMemoryError. While I was analysis this issue I found there are two jvm state where OOM can happen: 1. OOM during VM initialization - as per our internal discussion for this case it is not worth for dumping core file, so this is left as it is. 2. OOM once VM is initialized - For this scenario most of the place code is already added but few place corresponding code changes are missing so this change covers it. Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 Testing: jprt and jtreg (on Linux x86_64) Regards, Shafi From robbin.ehn at oracle.com Mon Oct 10 09:07:46 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Mon, 10 Oct 2016 11:07:46 +0200 Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: References: Message-ID: <6e5229df-7012-4fec-eaa9-cd80ec7bbb4b@oracle.com> Hi Shafi, Looks good and works fine (tested with repro from bug), thanks for fixing! 
/Robbin (not a Reviewer) On 10/10/2016 09:24 AM, Shafi Ahmad wrote: > Hi All, > > Please review the simple change for the fix of bug '' JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'. > > Summary: > In the current implementation there are few scenarios where we are not obeying the jvm option -XX:+CrashOnOutOfMemoryError. > While I was analysis this issue I found there are two jvm state where OOM can happen: > 1. OOM during VM initialization - as per our internal discussion for this case it is not worth for dumping core file, so this is left as it is. > 2. OOM once VM is initialized - For this scenario most of the place code is already added but few place corresponding code changes are missing so this change covers it. > > Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ > Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 > > Testing: jprt and jtreg (on Linux x86_64) > > Regards, > Shafi > From christian.tornqvist at oracle.com Mon Oct 10 11:43:29 2016 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Mon, 10 Oct 2016 07:43:29 -0400 Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: References: Message-ID: <20e601d222eb$85dbf710$9193e530$@oracle.com> Hi Shafi, Note that this bug is targeted for JDK 10, you need to wait with pushing this until the repository for that release is open. Thanks, Christian -----Original Message----- From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Shafi Ahmad Sent: Monday, October 10, 2016 3:25 AM To: hotspot-runtime-dev at openjdk.java.net Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' Hi All, Please review the simple change for the fix of bug '' JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'. 
Summary: In the current implementation there are few scenarios where we are not obeying the jvm option -XX:+CrashOnOutOfMemoryError. While I was analysis this issue I found there are two jvm state where OOM can happen: 1. OOM during VM initialization - as per our internal discussion for this case it is not worth for dumping core file, so this is left as it is. 2. OOM once VM is initialized - For this scenario most of the place code is already added but few place corresponding code changes are missing so this change covers it. Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 Testing: jprt and jtreg (on Linux x86_64) Regards, Shafi From harold.seigel at oracle.com Mon Oct 10 12:29:51 2016 From: harold.seigel at oracle.com (harold seigel) Date: Mon, 10 Oct 2016 08:29:51 -0400 Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out of order with lock Module_lock/6 -- possible deadlock In-Reply-To: <2bc43aca-ea7b-25f7-6dc0-0b026c6811f5@oracle.com> References: <2bc43aca-ea7b-25f7-6dc0-0b026c6811f5@oracle.com> Message-ID: Hi Dmitry, Thanks for the review. Harold On 10/8/2016 12:15 PM, Dmitry Samersoff wrote: > Harold, > > I'd tried your fix in my kitchensync setup and can confirm, > that VM doesn't crash anymore. > > The fix looks good for me. > > -Dmitry > > > On 2016-10-07 18:20, harold seigel wrote: >> Hi, >> >> Please review this fix for JDK-8166364. >> >> This fix moves the setting of the module fields in the class mirrors of >> the fixup_module_list outside of the Module_lock. The determination of >> whether a mirror should be added to the fixup_module_list is still done >> under Module_lock as is the defining of module java.base. This prevents >> any synchronization issues with a mirror being erroneously added to the >> fixup_module_list after module java.base is defined. 
The other piece is >> that the VM, in Modules::define_javabase_module(), guarantees under >> Module_lock that only one thread will ever successfully define module >> java.base. >> >> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/ >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364 >> >> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, >> java/io, java/lang, and java/util tests using both fastdebug and >> slowdebug builds. The nsk cololocated and the non-colocated quick tests >> were also run against a slowdebug build. >> >> Thanks, Harold >> > From harold.seigel at oracle.com Mon Oct 10 12:33:13 2016 From: harold.seigel at oracle.com (harold seigel) Date: Mon, 10 Oct 2016 08:33:13 -0400 Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out of order with lock Module_lock/6 -- possible deadlock In-Reply-To: <988ef919-dbf0-092e-be72-622dbc1c663f@oracle.com> References: <988ef919-dbf0-092e-be72-622dbc1c663f@oracle.com> Message-ID: Hi David, Thanks for the review. I'll fix the comments before pushing the fix. Harold On 10/9/2016 4:59 PM, David Holmes wrote: > Hi Harold, > > Change looks good. A couple of suggestions re comments below. > > On 8/10/2016 1:20 AM, harold seigel wrote: >> Hi, >> >> Please review this fix for JDK-8166364. >> >> This fix moves the setting of the module fields in the class mirrors of >> the fixup_module_list outside of the Module_lock. The determination of >> whether a mirror should be added to the fixup_module_list is still done >> under Module_lock as is the defining of module java.base. This prevents >> any synchronization issues with a mirror being erroneously added to the >> fixup_module_list after module java.base is defined. The other piece is >> that the VM, in Modules::define_javabase_module(), guarantees under >> Module_lock that only one thread will ever successfully define module >> java.base. 
>> >> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/ > > src/share/vm/classfile/modules.cpp > > Can you add a comment here: > > 246 > + // Only the thread that actually defined the base module will > get here, > + // so no locking is needed. > + > 247 // Patch any previously loaded class's module field with > java.base's java.lang.reflect.Module. > 248 ModuleEntryTable::patch_javabase_entries(module_handle); > > --- > > src/share/vm/classfile/javaClasses.cpp > > This comment is no longer quite reads right now it is not the else > clause: > > 801 // java.base was defined at some point between calling > create_mirror() > 802 // and obtaining the Module_lock, patch this particular class > with java.base. > > suggest: > > // If java.base was already defined then patch this particular class > with java.base. > > > Thanks, > David > > >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364 >> >> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, >> java/io, java/lang, and java/util tests using both fastdebug and >> slowdebug builds. The nsk cololocated and the non-colocated quick tests >> were also run against a slowdebug build. >> >> Thanks, Harold >> From harold.seigel at oracle.com Mon Oct 10 12:33:38 2016 From: harold.seigel at oracle.com (harold seigel) Date: Mon, 10 Oct 2016 08:33:38 -0400 Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out of order with lock Module_lock/6 -- possible deadlock In-Reply-To: References: Message-ID: <307d74fc-6abe-8376-adb9-ed21601655a1@oracle.com> Hi Robin, Thanks for the review and testing it. Harold On 10/10/2016 3:06 AM, Robbin Ehn wrote: > Thanks for fixing, looks good and works fine! > > /Robbin > > On 10/07/2016 05:20 PM, harold seigel wrote: >> Hi, >> >> Please review this fix for JDK-8166364. >> >> This fix moves the setting of the module fields in the class mirrors >> of the fixup_module_list outside of the Module_lock. 
The >> determination of whether a mirror should be >> added to the fixup_module_list is still done under Module_lock as is >> the defining of module java.base. This prevents any synchronization >> issues with a mirror being >> erroneously added to the fixup_module_list after module java.base is >> defined. The other piece is that the VM, in >> Modules::define_javabase_module(), guarantees under >> Module_lock that only one thread will ever successfully define module >> java.base. >> >> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/ >> >> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364 >> >> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, >> java/io, java/lang, and java/util tests using both fastdebug and >> slowdebug builds. The nsk >> cololocated and the non-colocated quick tests were also run against a >> slowdebug build. >> >> Thanks, Harold >> From mikael.gerdin at oracle.com Mon Oct 10 13:59:41 2016 From: mikael.gerdin at oracle.com (Mikael Gerdin) Date: Mon, 10 Oct 2016 15:59:41 +0200 Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: References: Message-ID: Hi, On 2016-10-10 09:24, Shafi Ahmad wrote: > Hi All, > > Please review the simple change for the fix of bug '' JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'. > > Summary: > In the current implementation there are few scenarios where we are not obeying the jvm option -XX:+CrashOnOutOfMemoryError. > While I was analysis this issue I found there are two jvm state where OOM can happen: > 1. OOM during VM initialization - as per our internal discussion for this case it is not worth for dumping core file, so this is left as it is. > 2. OOM once VM is initialized - For this scenario most of the place code is already added but few place corresponding code changes are missing so this change covers it. 
> > Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ There is a lot of confusion in the VM code with the term "out of memory error". In some places it refers to code throwing a java.lang.OutOfMemoryError and expecting running java code to be able to potentially catch that Error and continue running. In other places, such as callers of report_vm_out_of_memory, the situation is much more dire and the calling thread may not even be a JavaThread and as such cannot "throw" an exception. report_vm_out_of_memory is only invoked through the macro vm_exit_out_of_memory, which of course implies that the condition is fatal and we are about to terminate the JVM process altogether. I think that it's incorrect to call code related to java.lang.OutOfMemoryError in report_vm_out_of_memory since the condition may not even be correlated with Java level application behavior. /Mikael > Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 > > Testing: jprt and jtreg (on Linux x86_64) > > Regards, > Shafi > From jiangli.zhou at oracle.com Mon Oct 10 17:04:53 2016 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 10 Oct 2016 10:04:53 -0700 Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases In-Reply-To: References: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com> <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com> Message-ID: <0C4A4742-70F4-419E-8C96-B8311C0D98CE@oracle.com> Thanks, David! Jiangli > On Oct 9, 2016, at 2:10 PM, David Holmes wrote: > > On 8/10/2016 4:23 AM, Jiangli Zhou wrote: >> Hi David, >> >> Thanks for taking a look. 
>> >>> On Oct 6, 2016, at 10:33 PM, David Holmes >> > wrote: >>> >>> Hi Jiangli, >>> >>> On 7/10/2016 2:39 PM, Jiangli Zhou wrote: >>>> Hi, >>>> >>>> Please review the following fix for JDK-8167333 >>>> : >>>> >>>> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/ >>>> >>>> >>>> When a shared class is transformed by a JVMTI agent during initial >>>> loading (via CFLH), the VM creates a new ClassFileStream using the >>>> transformed class data. The source path info from the class? >>>> associated SharedClassPathEntry is passed as the ?source? argument to >>>> ClassFileStream. However, some shared classes may not have an >>>> associated SharedClassPathEntry and the class_path_index is -1. The >>>> VM needs to detect such case and not passing an invalid source path info. >>> >>> It isn't obvious to me that all callers of >>> CFS::source()/clone_source() will handle getting a NULL. Of course I >>> can't tell which of those callers may be involved in this particular >>> use-case. >> >> I took a look of all the code that calls CFS::source()/clone_source(). >> They all handle the NULL case with explicit NULL check. For our specific >> case, the particular caller involved >> is InstanceKlass::print_loading_log. Before the fix, it crashed when >> trying to print the invalid cfs->source after (cfs->source() != NULL) check. > > Thanks for verifying. I've looked at the latest webrev with the additional asserts - all looks good. > > David > >> Thanks, >> Jiangli >> >>> >>> Thanks, >>> David >>> >>>> Tested with all existing class data sharing tests. 
>>>> >>>> Thanks, >>>> Jiangli >>>> >> From jiangli.zhou at oracle.com Mon Oct 10 17:10:27 2016 From: jiangli.zhou at oracle.com (Jiangli Zhou) Date: Mon, 10 Oct 2016 10:10:27 -0700 Subject: RFR: 8167333: Invalid source path info might be used when creating ClassFileStream after CFLH transforms a shared classes in some cases In-Reply-To: <57FB34EC.2070606@oracle.com> References: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com> <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com> <57FB34EC.2070606@oracle.com> Message-ID: <5E913B80-34CA-4744-906E-DC4C4E576E2B@oracle.com> Hi Ioi, Thanks for the review! Jiangli > On Oct 9, 2016, at 11:27 PM, Ioi Lam wrote: > > > > On 10/9/16 2:10 PM, David Holmes wrote: >> On 8/10/2016 4:23 AM, Jiangli Zhou wrote: >>> Hi David, >>> >>> Thanks for taking a look. >>> >>>> On Oct 6, 2016, at 10:33 PM, David Holmes >>> > wrote: >>>> >>>> Hi Jiangli, >>>> >>>> On 7/10/2016 2:39 PM, Jiangli Zhou wrote: >>>>> Hi, >>>>> >>>>> Please review the following fix for JDK-8167333 >>>>> : >>>>> >>>>> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/ >>>>> >>>>> >>>>> When a shared class is transformed by a JVMTI agent during initial >>>>> loading (via CFLH), the VM creates a new ClassFileStream using the >>>>> transformed class data. The source path info from the class? >>>>> associated SharedClassPathEntry is passed as the ?source? argument to >>>>> ClassFileStream. However, some shared classes may not have an >>>>> associated SharedClassPathEntry and the class_path_index is -1. The >>>>> VM needs to detect such case and not passing an invalid source path info. >>>> >>>> It isn't obvious to me that all callers of >>>> CFS::source()/clone_source() will handle getting a NULL. Of course I >>>> can't tell which of those callers may be involved in this particular >>>> use-case. >>> >>> I took a look of all the code that calls CFS::source()/clone_source(). >>> They all handle the NULL case with explicit NULL check. 
For our specific >>> case, the particular caller involved >>> is InstanceKlass::print_loading_log. Before the fix, it crashed when >>> trying to print the invalid cfs->source after (cfs->source() != NULL) check. >> >> Thanks for verifying. I've looked at the latest webrev with the additional asserts - all looks good. >> > > Looks good to me, too. Thanks > > - Ioi >> David >> >>> Thanks, >>> Jiangli >>> >>>> >>>> Thanks, >>>> David >>>> >>>>> Tested with all existing class data sharing tests. >>>>> >>>>> Thanks, >>>>> Jiangli >>>>> >>> > From martin.doerr at sap.com Mon Oct 10 18:00:19 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Mon, 10 Oct 2016 18:00:19 +0000 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <57F77A4B.6060604@oracle.com> References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> Message-ID: <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> Hi Claes, thank you very much for your explanations. I agree with you that it would be better to pad where the Monitors are used. It would still fulfill the purpose of this RFE without disturbing other usages. So I could introduce: class PaddedMonitor : public Monitor { enum { CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - (int)sizeof(Monitor), PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0 }; char _padding[PADDING_LEN]; }; and similarly PaddedMutex and replace all of the ones which get allocated in a linear fashion (mutexLocker.cpp mutex_init()). Would you agree with this change? Thanks and best regards, Martin -----Original Message----- From: Claes Redestad [mailto:claes.redestad at oracle.com] Sent: Freitag, 7. 
Oktober 2016 12:35 To: Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) ; Coleen Phillimore (coleen.phillimore at oracle.com) Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE Hi, I'm concerned that this might be an easy-but-wrong fix to a complex problem, and acknowledging that there are already use cases where the _name field is contra-productive. This change adds complexity that makes it even less likely such uses will be optimized for in the future. There are Padded* types put in place to deal with these concerns explicitly rather than implicitly *where it matters*, which allows us the choice of applying padding or not on a per use-case basis (which means we can also remove the _name field for those use cases that don't care about either, which might be most outside of the global lists). I am very concerned about false sharing, but I have no data to support that this change has any measurable benefit in practice: I even did an experiment years ago now where I turned _name into a pointer to not pad at all and saw nothing exceeding noise levels on any benchmark. Thanks! /Claes On 2016-10-07 12:18, Doerr, Martin wrote: > Hi Claes, > > what the change basically does is that the _name[] field gets enlarged by 8 bytes on platforms with 128 byte DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed by the C++ compiler. > What exactly is your concern about the footprint overhead? > Are you not concerned about the risk of false sharing? > > Best regards, > Martin > > -----Original Message----- > From: Claes Redestad [mailto:claes.redestad at oracle.com] > Sent: Freitag, 7. 
Oktober 2016 12:00 > To: Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) ; Coleen Phillimore (coleen.phillimore at oracle.com) > Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE > > Hi, > > after due consideration I strongly consider this change unacceptable > since it adds footprint overhead to performance critcial compiler and > GC code with little to no data to support this won't cause regressions. > > Changes to Monitor/Mutex needs to be done with more surgical precision > than this. > > If I do have a veto on the matter, here it is. > > Thanks! > > /Claes > > On 2016-10-07 11:34, Doerr, Martin wrote: >> Hi Dan, >> >> thank you very much for reviewing and for investigating the history. >> >> It was not intended to make the functions you mentioned public. I've fixed that. >> I also updated the copyright information. >> >> New webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >> >> @Coleen: Please use this one. I have also added reviewer attribution. >> >> Thanks and best regards, >> Martin >> >> >> -----Original Message----- >> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >> Sent: Donnerstag, 6. Oktober 2016 23:13 >> To: Doerr, Martin ; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE >> >> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>> Hi, >>> >>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing. >>> >>> My proposed change is here: >>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >> >> src/share/vm/runtime/mutex.hpp >> Please update the copyright year before pushing. 
>> >> L172: // The default length of monitor name is chosen to avoid >> false sharing. >> L173: enum { >> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >> sizeof(MonitorBase), >> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? >> CACHE_LINE_PADDING : 64 >> L176: }; >> L177: char _name[MONITOR_NAME_LEN]; // Name of mutex >> >> I have to say that I'm not fond of the fact that MONITOR_NAME_LEN >> can vary between platforms; I like that it is a minimum of 64 bytes >> and is still a constant. >> >> I'm also not happy that the resulting sizeof(Monitor) may not >> be a multiple >> of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate >> that unhappiness >> with the fact that sizeof(Monitor) hasn't been a multiple of >> the cache line >> size since at least 2008 and no one complained (that I know of). >> >> So if I was making this change, I would make MONITOR_NAME_LEN >> 64 bytes >> (like it was) and add a pad field that would bring up >> sizeof(Monitor) >> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes >> would be >> unhappy with me and anyone embedding a Monitor into another data >> structure would be unhappy with me, but I'm used to that :-) >> >> So what you have is fine, especially for JDK9. >> >> L180: public: >> L181: #ifndef PRODUCT >> L182: debug_only(static bool contains(Monitor * locks, Monitor * >> lock);) >> L183: debug_only(static Monitor * get_least_ranked_lock(Monitor * >> locks);) >> L184: debug_only(Monitor * >> get_least_ranked_lock_besides_this(Monitor * locks);) >> L185: #endif >> L186: >> L187: void set_owner_implementation(Thread* >> owner) PRODUCT_RETURN; >> L188: void check_prelock_state (Thread* >> thread) PRODUCT_RETURN; >> L189: void check_block_state (Thread* thread) >> >> These were all "protected" before. Now they are "public". >> Any particular reason? >> >> Thumbs up on the mechanics of this change. 
I'm interested in the >> answer to the "protected" versus "public" question, but don't >> considered that query to be a blocker. >> >> >> The rest of this isn't code review, but some of this caught >> my attention. >> >> src/share/vm/runtime/mutex.hpp >> >> old L84: // The default length of monitor name is chosen to be 64 >> to avoid false sharing. >> old L85: static const int MONITOR_NAME_LEN = 64; >> >> I had to look up the history of this comment: >> >> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >> changeset: 55:2a8eb116ebbe >> user: xlu >> date: Tue Feb 05 23:21:57 2008 -0800 >> summary: 6610420: Debug VM crashes during monitor lock rank checking >> >> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >> @@ -82,6 +82,9 @@ class ParkEvent ; >> // *in that order*. If their implementations change such that these >> // assumptions are violated, a whole lot of code will break. >> >> +// The default length of monitor name is choosen to be 64 to avoid >> false sharing. >> +static const int MONITOR_NAME_LEN = 64; >> + >> class Monitor : public CHeapObj { >> >> public: >> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >> volatile bool _snuck; // Used for sneaky locking >> (evil). >> - const char * _name; // Name of mutex >> int NotifyCount ; // diagnostic assist >> - double pad [8] ; // avoid false sharing >> + char _name[MONITOR_NAME_LEN]; // Name of mutex >> >> // Debugging fields for naming, deadlock detection, etc. 
(some only >> used in debug mode) >> #ifndef PRODUCT >> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >> int ILocked () ; >> >> protected: >> - static void ClearMonitor (Monitor * m) ; >> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >> Monitor() ; >> >> So the original code had an 8-double pad for avoiding false sharing. >> Sounds very much like the old ObjectMonitor padding. I'm sure at the >> time that Dice determined that 8-double value, the result was to pad >> the size of Monitor to an even multiple of a particular cache line >> size. >> >> Xiobin changed the 'name' field to be an array so that the name >> chars could serve double duty as the cache line pad... pun intended. >> Unfortunately that pad doesn't make sure that the resulting Monitor >> size is a multiple of the cache line size. >> >> Dan >> >> >>> >>> Please review. If will also need a sponsor. >>> >>> Thanks and best regards, >>> Martin >>> >> From Derek.White at cavium.com Fri Oct 7 17:48:26 2016 From: Derek.White at cavium.com (White, Derek) Date: Fri, 7 Oct 2016 17:48:26 +0000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> <1475836735.2622.82.camel@oracle.com> Message-ID: FYI, On the aarch64 side, this change would turn a CAS+acquire/release semantics (CASAL) into a naked CAS. In v8.1, or removes a post write barrier after doing a series of load/store -exclusives. 
 - Derek -----Original Message----- From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of David Holmes Sent: Friday, October 07, 2016 8:08 AM To: Thomas Schatzl ; Kim Barrett Cc: hotspot-compiler-dev ; Hiroshi H Horii ; Tim Ellison ; ppc-aix-port-dev at openjdk.java.net; Michihiro Horie ; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 Thomas, > This change "only" impacts ppc64 at this time. This is a dangerous stance. The changes are to shared code. It just happens that only the PPC atomic implementations support anything other than conservative barriers today. If someone adds additional forms on other platforms their GC code suddenly has new behaviour! Changes in shared code must be algorithmically correct on all platforms. Not just "it will work fine today". Given all the work being done to add missing barriers, removing them must come with a detailed analysis establishing the safety of doing so. And I am not seeing that here. David On 7/10/2016 8:38 PM, Thomas Schatzl wrote: > Hi, > > On Thu, 2016-10-06 at 18:16 -0400, Kim Barrett wrote: >>> >>> On Oct 5, 2016, at 9:36 PM, David Holmes >>> wrote: >>> >>> On 5/10/2016 10:36 AM, Hiroshi H Horii wrote: >>>> >>>> Dear David, >>>> >>>> Thank you for your comments. >>>> >>>> I used to think that it might be better if >>>> copy_to_survivor_space didn't return the forwardee when the CAS failed, >>>> in order to prevent callers from reading fields in the forwardee. But as you >>>> pointed out, this extends the fix beyond this topic. >>>> >>>> I removed two NULL assignments from the previous webrev. >>>> http://cr.openjdk.java.net/~horii/8154736/webrev.03/ >>> Which simply takes us back to where we were. It may not be safe for >>> the caller of those methods to access the fields of the returned >>> "forwardee".
>>> >>> Sorry but I'm not seeing anything here that justifies removing the >>> barriers from the cas in this code. GC lurkers feel free to jump in >>> here - this is your code afterall! ;-) >>> >>> David >>> ----- >> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems >> to me to be extremely fragile and hard to reason about. The places >> where that copied object might escape to and be examined seem to be >> myriad. And not only do we need to worry about them today, but also >> for future maintenance. Even if it can modified and shown to be >> correct today, it would be very easy to intoduce a bug later, as >> should be obvious from the various issues pointed out so far during >> this review. >> >> The key issue here is that we copy obj into new_obj, and then make >> new_obj accessible to other threads via the CAS. Those other threads >> might attempt to access data in new_obj. This suggests the CAS ought >> to have at least a release fence to ensure the copy is complete >> before the CAS is performed. No amount of fencing on the read side >> (such a in the work stealing) can remove that need. > > Depending on what "other threads" means. > > The thread that pops the reference should be okay (as it does a > fence), because the thread pushing the entry on the mark stack also > releases all stores. > > Threads not participating in this protocol are problematic, and this > is indeed worrying me as well a bit. > I have not seen any so far, but there is always a risk of overlooking > some place. > >> And that might be all that is needed. On the post-CAS side, we load >> the forwardee and then load values from it. I thik we can use >> implicit consume with dependent loads (except on Alpha) plus the >> suggested release fence to get the desired effect. (If not, use an >> acquire form of forwardee()?) 
>> >> I'm not certain that just a release fence is sufficient (I'm less >> familiar with ParallelGC than I'd like for looking at something like >> this), but I'm pretty sure I wouldn't want to go any weaker than >> that. > > This change "only" impacts ppc64 at this time. > > Thanks, > Thomas > From HORII at jp.ibm.com Mon Oct 10 14:30:47 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Mon, 10 Oct 2016 23:30:47 +0900 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: Hi Thomas, David, and all, > I think you intended to modify cmpxchg_pre_membar not > cmpxchg_post_membar! The previous patch will change only behavior of cmpxchg_pre_membar. But the patch is not good to be reviewed (it was not obvious) and Martin suggested me to use lwsync rather than sync. I created a new webrev. This webrev includes all points that David and Thomas pointed also. http://cr.openjdk.java.net/~horii/8154736/webrev.05/ With this change, callers of copy_to_survivor_space can safely touch fields of returned obj because OrderAccess::acquire() is called in copy_to_survivor_space when CAS fails. > Changes in shared code must be algorithmically correct on all platforms. > Not just "it will work fine today". > > Given all then work being done to add missing barriers, removing them > must come with a detailed analysis establishing the safety of doing so. > And I am not seeing that here. The latest codes in the repository are missing some calls of OrderAccess::acquire() before touching fileds of new_obj or o->forwardee() in PSPromotionManager::copy_and_push_safe_barrier and copy_to_survivor_space respectivey. I believe, this webrev correct them, also. Some methods call forwardee(). 
However, they don't toruch fields of forwardee while copying survived objects to a survivor space. PSMarkSweepDecorator::compact() PSPromotionManager::process_array_chunk() PSPromotionManager::claim_or_forward_internal_depth() Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D. IBM Research - Tokyo From coleen.phillimore at oracle.com Tue Oct 11 00:03:20 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Mon, 10 Oct 2016 20:03:20 -0400 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> Message-ID: Hi, Was the linear allocation in mutex.cpp the cause of the false sharing that you observed? I think I like this change better than the original, because I've wondered myself why the name string was so long. So with this, we could make Monitor's smaller if they're embedded in metadata or other structures. Thanks, Coleen On 10/10/16 2:00 PM, Doerr, Martin wrote: > Hi Claes, > > thank you very much for your explanations. > > I agree with you that it would be better to pad where the Monitors are used. It would still fulfill the purpose of this RFE without disturbing other usages. > > So I could introduce: > class PaddedMonitor : public Monitor { > enum { > CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - (int)sizeof(Monitor), > PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0 > }; > char _padding[PADDING_LEN]; > }; > and similarly PaddedMutex and replace all of the ones which get allocated in a linear fashion (mutexLocker.cpp mutex_init()). > > Would you agree with this change? > > Thanks and best regards, > Martin > > > -----Original Message----- > From: Claes Redestad [mailto:claes.redestad at oracle.com] > Sent: Freitag, 7. 
Oktober 2016 12:35 > To: Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) ; Coleen Phillimore (coleen.phillimore at oracle.com) > Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE > > Hi, > > I'm concerned that this might be an easy-but-wrong fix to a complex > problem, and acknowledging that there are already use cases where the > _name field is contra-productive. This change adds complexity that > makes it even less likely such uses will be optimized for in the > future. > > There are Padded* types put in place to deal with these concerns > explicitly rather than implicitly *where it matters*, which allows us > the choice of applying padding or not on a per use-case basis (which > means we can also remove the _name field for those use cases that don't > care about either, which might be most outside of the global lists). > > I am very concerned about false sharing, but I have no data to support > that this change has any measurable benefit in practice: I even did an > experiment years ago now where I turned _name into a pointer to not pad > at all and saw nothing exceeding noise levels on any benchmark. > > Thanks! > > /Claes > > On 2016-10-07 12:18, Doerr, Martin wrote: >> Hi Claes, >> >> what the change basically does is that the _name[] field gets enlarged by 8 bytes on platforms with 128 byte DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed by the C++ compiler. >> What exactly is your concern about the footprint overhead? >> Are you not concerned about the risk of false sharing? >> >> Best regards, >> Martin >> >> -----Original Message----- >> From: Claes Redestad [mailto:claes.redestad at oracle.com] >> Sent: Freitag, 7. 
Oktober 2016 12:00 >> To: Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) ; Coleen Phillimore (coleen.phillimore at oracle.com) >> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE >> >> Hi, >> >> after due consideration I strongly consider this change unacceptable >> since it adds footprint overhead to performance critcial compiler and >> GC code with little to no data to support this won't cause regressions. >> >> Changes to Monitor/Mutex needs to be done with more surgical precision >> than this. >> >> If I do have a veto on the matter, here it is. >> >> Thanks! >> >> /Claes >> >> On 2016-10-07 11:34, Doerr, Martin wrote: >>> Hi Dan, >>> >>> thank you very much for reviewing and for investigating the history. >>> >>> It was not intended to make the functions you mentioned public. I've fixed that. >>> I also updated the copyright information. >>> >>> New webrev is here: >>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>> >>> @Coleen: Please use this one. I have also added reviewer attribution. >>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>> Sent: Donnerstag, 6. Oktober 2016 23:13 >>> To: Doerr, Martin ; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE >>> >>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>> Hi, >>>> >>>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing. 
>>>> >>>> My proposed change is here: >>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>> src/share/vm/runtime/mutex.hpp >>> Please update the copyright year before pushing. >>> >>> L172: // The default length of monitor name is chosen to avoid >>> false sharing. >>> L173: enum { >>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>> sizeof(MonitorBase), >>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? >>> CACHE_LINE_PADDING : 64 >>> L176: }; >>> L177: char _name[MONITOR_NAME_LEN]; // Name of mutex >>> >>> I have to say that I'm not fond of the fact that MONITOR_NAME_LEN >>> can vary between platforms; I like that it is a minimum of 64 bytes >>> and is still a constant. >>> >>> I'm also not happy that the resulting sizeof(Monitor) may not >>> be a multiple >>> of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate >>> that unhappiness >>> with the fact that sizeof(Monitor) hasn't been a multiple of >>> the cache line >>> size since at least 2008 and no one complained (that I know of). >>> >>> So if I was making this change, I would make MONITOR_NAME_LEN >>> 64 bytes >>> (like it was) and add a pad field that would bring up >>> sizeof(Monitor) >>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes >>> would be >>> unhappy with me and anyone embedding a Monitor into another data >>> structure would be unhappy with me, but I'm used to that :-) >>> >>> So what you have is fine, especially for JDK9. 
>>> >>> L180: public: >>> L181: #ifndef PRODUCT >>> L182: debug_only(static bool contains(Monitor * locks, Monitor * >>> lock);) >>> L183: debug_only(static Monitor * get_least_ranked_lock(Monitor * >>> locks);) >>> L184: debug_only(Monitor * >>> get_least_ranked_lock_besides_this(Monitor * locks);) >>> L185: #endif >>> L186: >>> L187: void set_owner_implementation(Thread* >>> owner) PRODUCT_RETURN; >>> L188: void check_prelock_state (Thread* >>> thread) PRODUCT_RETURN; >>> L189: void check_block_state (Thread* thread) >>> >>> These were all "protected" before. Now they are "public". >>> Any particular reason? >>> >>> Thumbs up on the mechanics of this change. I'm interested in the >>> answer to the "protected" versus "public" question, but don't >>> considered that query to be a blocker. >>> >>> >>> The rest of this isn't code review, but some of this caught >>> my attention. >>> >>> src/share/vm/runtime/mutex.hpp >>> >>> old L84: // The default length of monitor name is chosen to be 64 >>> to avoid false sharing. >>> old L85: static const int MONITOR_NAME_LEN = 64; >>> >>> I had to look up the history of this comment: >>> >>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>> changeset: 55:2a8eb116ebbe >>> user: xlu >>> date: Tue Feb 05 23:21:57 2008 -0800 >>> summary: 6610420: Debug VM crashes during monitor lock rank checking >>> >>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >>> @@ -82,6 +82,9 @@ class ParkEvent ; >>> // *in that order*. If their implementations change such that these >>> // assumptions are violated, a whole lot of code will break. >>> >>> +// The default length of monitor name is choosen to be 64 to avoid >>> false sharing. 
>>> +static const int MONITOR_NAME_LEN = 64; >>> + >>> class Monitor : public CHeapObj { >>> >>> public: >>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>> volatile bool _snuck; // Used for sneaky locking >>> (evil). >>> - const char * _name; // Name of mutex >>> int NotifyCount ; // diagnostic assist >>> - double pad [8] ; // avoid false sharing >>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>> >>> // Debugging fields for naming, deadlock detection, etc. (some only >>> used in debug mode) >>> #ifndef PRODUCT >>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>> int ILocked () ; >>> >>> protected: >>> - static void ClearMonitor (Monitor * m) ; >>> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >>> Monitor() ; >>> >>> So the original code had an 8-double pad for avoiding false sharing. >>> Sounds very much like the old ObjectMonitor padding. I'm sure at the >>> time that Dice determined that 8-double value, the result was to pad >>> the size of Monitor to an even multiple of a particular cache line >>> size. >>> >>> Xiobin changed the 'name' field to be an array so that the name >>> chars could serve double duty as the cache line pad... pun intended. >>> Unfortunately that pad doesn't make sure that the resulting Monitor >>> size is a multiple of the cache line size. >>> >>> Dan >>> >>> >>>> Please review. If will also need a sponsor. 
>>>> >>>> Thanks and best regards, >>>> Martin >>>> From david.holmes at oracle.com Tue Oct 11 00:35:05 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Oct 2016 10:35:05 +1000 Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: References: Message-ID: <3cec9797-64dc-d767-255c-8ce3fb66b7bb@oracle.com> Hi Shafi, On 10/10/2016 11:59 PM, Mikael Gerdin wrote: > Hi, > > On 2016-10-10 09:24, Shafi Ahmad wrote: >> Hi All, >> >> Please review the simple change for the fix of bug '' JDK-8155004: >> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to >> create threads'. >> >> Summary: >> In the current implementation there are few scenarios where we are not >> obeying the jvm option -XX:+CrashOnOutOfMemoryError. >> While I was analysis this issue I found there are two jvm state where >> OOM can happen: >> 1. OOM during VM initialization - as per our internal discussion for >> this case it is not worth for dumping core file, so this is left as it >> is. >> 2. OOM once VM is initialized - For this scenario most of the place >> code is already added but few place corresponding code changes are >> missing so this change covers it. >> >> Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ > > > There is a lot of confusion in the VM code with the term "out of memory > error". > In some places it refers to code throwing a java.lang.OutOfMemoryError > and expecting running java code to be able to potentially catch that > Error and continue running. > > In other places, such as callers of report_vm_out_of_memory, the > situation is much more dire and the calling thread may not even be a > JavaThread and as such cannot "throw" an exception. > report_vm_out_of_memory is only invoked through the macro > vm_exit_out_of_memory, which of course implies that the condition is > fatal and we are about to terminate the JVM process altogether. 
> > I think that it's incorrect to call code related to > java.lang.OutOfMemoryError in report_vm_out_of_memory since the > condition may not even be correlated with Java level application behavior. I totally agree with Mikael. A call to report_java_out_of_memory should only be made on a code path that will throw an OOME. There is a lot of contention over how things like HeapDumpOnOutOfMemory and CrashOnOutOfMemory should behave given the various reasons why we can run out of memory. I see little point in doing a heap dump, for example, if we did not exhaust the heap. I think there are a lot of issues with this mechanism and the placement of some of the calls to report_java_out_of_memory are questionable (eg should it come before or after posting JVMTI resource exhaustion events? should it come before or after vm initialization checks? [I think after, but that isn't always so!]). In the context of this fix the change to jvm.cpp, in JVM_StartThread, is acceptable. And it addresses the request made by the bug submitter. 
Thanks, David > /Mikael > >> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 >> >> Testing: jprt and jtreg (on Linux x86_64) >> >> Regards, >> Shafi >> From david.holmes at oracle.com Tue Oct 11 01:12:16 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Oct 2016 11:12:16 +1000 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> Message-ID: <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> Ok. I will sponsor this once hs is open again. Thanks, David On 6/10/2016 10:10 PM, Alan Burlison wrote: > On 04/10/2016 19:37, Alan Burlison wrote: > >>> It's in globalDefinitions.hpp, on the off chance that's somehow not >>> already being included. >> >> Cool, I'll pop that in instead - thanks! > > Done, webrev updated, jprt hotspot testset is clean. > From david.holmes at oracle.com Tue Oct 11 01:55:12 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Oct 2016 11:55:12 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB Message-ID: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> Turns out the only place changes were needed was in JDI. Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ The spec change in ObjectReference is very simple and there is a CCC request in progress to ratify that change.
The implementation change in ObjectReferenceImpl mirrors the updated spec and uses the same format as already present in the class version of the check method. The test is a little more complex. This is obviously an extension to what is already tested in InterfaceMethodsTest. However IMT has a number of problems with the way it is currently written [1] - specifically it doesn't properly separate method lookup from method invocation. So I've added the capability to separate lookup and invocation for use with the private interface methods - I have not tried to address shortcomings of the existing tests. Though I did fix the return value checking logic! And I added some clarifying comments and did some renaming in a couple of places. Still on the test: I can't add the negative tests I would like to add because they actually pass due to a different long-standing bug in JDI - [2]. So the actual private interface method testing is very simple: can I get the Method from the InterfaceType for the interface declaring the method? Can I then invoke that method on an instance of a class that implements the interface? Thanks, David [1] https://bugs.openjdk.java.net/browse/JDK-8166453 [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From calvin.cheung at oracle.com Tue Oct 11 03:59:40 2016 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Mon, 10 Oct 2016 20:59:40 -0700 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file Message-ID: <57FC63AC.3020809@oracle.com> Please review this small fix, which stops classes that are unusable during run time from being included in the classlist file.
bug: https://bugs.openjdk.java.net/browse/JDK-8166931 webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ Testing: JPRT with -testset hotspot jtreg tests under hotspot/runtime on all supported platforms (in progress) thanks, Calvin From aph at redhat.com Tue Oct 11 09:25:52 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 11 Oct 2016 10:25:52 +0100 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: On 06/10/16 23:16, Kim Barrett wrote: > The key issue here is that we copy obj into new_obj, and then make > new_obj accessible to other threads via the CAS. Those other > threads might attempt to access data in new_obj. This suggests the > CAS ought to have at least a release fence to ensure the copy is > complete before the CAS is performed. No amount of fencing on the > read side (such as in the work stealing) can remove that need. I agree. > And that might be all that is needed. On the post-CAS side, we load > the forwardee and then load values from it. I think we can use > implicit consume with dependent loads (except on Alpha) plus the > suggested release fence to get the desired effect. That's probably true, except that there's not really any such thing as "implicit consume" in C++. While all of the hardware we use respects address dependencies, it's not something that the compiler knows about, and it's explicitly undefined behaviour in the C++ memory model. If we're depending on memory_order_consume, perhaps we ought to think about adding it to Atomic, even though it's just a volatile load in older compilers. Andrew.
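[Editorial sketch, not part of the original message.] The ordering argument above (copy the object, publish it with a CAS, let the losing thread read the winner's copy) can be illustrated with standard C++11 atomics. This is only a minimal sketch under stated assumptions: it uses std::atomic rather than HotSpot's Atomic/OrderAccess API, and Obj, forwardee, payload and forward() are hypothetical names invented for illustration, not the ParallelGC code under review. It uses an explicit acquire on the failure path instead of relying on an address dependency, which sidesteps the "implicit consume" concern.

```cpp
#include <atomic>
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a heap object; this is NOT HotSpot's oopDesc.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};  // forwarding pointer, null until installed
  int payload[4] = {0, 0, 0, 0};         // object fields that get copied
};

// Copy `from` into `copy`, then try to install `copy` as the forwardee.
// Returns whichever copy won the race (ours or another thread's).
inline Obj* forward(Obj* from, Obj* copy) {
  // Plain stores: these must be visible before the pointer is published.
  std::memcpy(copy->payload, from->payload, sizeof(from->payload));

  Obj* expected = nullptr;
  if (from->forwardee.compare_exchange_strong(
          expected, copy,
          std::memory_order_acq_rel,    // success: release publishes the finished copy
          std::memory_order_acquire)) { // failure: acquire pairs with the winner's release
    return copy;
  }
  // The CAS failed, so `expected` now holds the winner's copy. Because the
  // failed CAS was an acquire load, reading the winner's payload is safe.
  return expected;
}
```

Calling forward() twice on the same object shows the intended contract: the first call installs its copy and the second call returns that same copy. With a relaxed CAS instead, nothing in the C++ memory model would order the loser's reads of payload after the winner's memcpy.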
From Alan.Burlison at oracle.com Tue Oct 11 09:31:54 2016 From: Alan.Burlison at oracle.com (Alan Burlison) Date: Tue, 11 Oct 2016 10:31:54 +0100 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> Message-ID: <870f60a7-b0c0-eaae-7a59-7bea05c323af@oracle.com> On 11/10/2016 02:12, David Holmes wrote: > Ok. I will sponsor this once hs is open again. Thanks, is there a schedule somewhere for that which I can go look at? -- Alan Burlison -- From claes.redestad at oracle.com Tue Oct 11 10:05:15 2016 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 11 Oct 2016 12:05:15 +0200 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> Message-ID: <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> Hi, On 2016-10-11 02:03, Coleen Phillimore wrote: > > Hi, > > Was the linear allocation in mutex.cpp the cause of the false sharing > that you observed? I think I like this change better than the > original, because I've wondered myself why the name string was so > long. So with this, we could make Monitor's smaller if they're > embedded in metadata or other structures. Music to my ears! 
I even think most embedded uses would see improvements if _name was removed entirely (or "simply" turned into a const char * so that it's not copied and embedded into the Monitor/Mutex)

>
> Thanks,
> Coleen
>
> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>> Hi Claes,
>>
>> thank you very much for your explanations.
>>
>> I agree with you that it would be better to pad where the Monitors
>> are used. It would still fulfill the purpose of this RFE without
>> disturbing other usages.
>>
>> So I could introduce:
>> class PaddedMonitor : public Monitor {
>> enum {
>> CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE -
>> (int)sizeof(Monitor),
>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>> };
>> char _padding[PADDING_LEN];
>> };
>> and similarly PaddedMutex and replace all of the ones which get
>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).

Sure!

Some compilers may take issue with cases where PADDING_LEN == 0 (since char _padding[0] is technically illegal C++, but works on gcc etc) so maybe that special case will have to be (somewhat excessively):

PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : DEFAULT_CACHE_LINE_SIZE

We took a look at whether it'd be feasible to express class PaddedMonitor : public PaddedEnd<Monitor>, but it appears that'd require variadic template arguments (C++11) to get right (since we'd need PaddedEnd to transitively publish constructors of Monitor).

Thanks!

/Claes

>>
>> Would you agree with this change?
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>> Sent: Freitag, 7.
Oktober 2016 12:35 >> To: Doerr, Martin ; >> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >> David Holmes (david.holmes at oracle.com) ; >> Coleen Phillimore (coleen.phillimore at oracle.com) >> >> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >> DEFAULT_CACHE_LINE_SIZE >> >> Hi, >> >> I'm concerned that this might be an easy-but-wrong fix to a complex >> problem, and acknowledging that there are already use cases where the >> _name field is contra-productive. This change adds complexity that >> makes it even less likely such uses will be optimized for in the >> future. >> >> There are Padded* types put in place to deal with these concerns >> explicitly rather than implicitly *where it matters*, which allows us >> the choice of applying padding or not on a per use-case basis (which >> means we can also remove the _name field for those use cases that don't >> care about either, which might be most outside of the global lists). >> >> I am very concerned about false sharing, but I have no data to support >> that this change has any measurable benefit in practice: I even did an >> experiment years ago now where I turned _name into a pointer to not pad >> at all and saw nothing exceeding noise levels on any benchmark. >> >> Thanks! >> >> /Claes >> >> On 2016-10-07 12:18, Doerr, Martin wrote: >>> Hi Claes, >>> >>> what the change basically does is that the _name[] field gets >>> enlarged by 8 bytes on platforms with 128 byte >>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed >>> by the C++ compiler. >>> What exactly is your concern about the footprint overhead? >>> Are you not concerned about the risk of false sharing? >>> >>> Best regards, >>> Martin >>> >>> -----Original Message----- >>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>> Sent: Freitag, 7. 
Oktober 2016 12:00 >>> To: Doerr, Martin ; >>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >>> David Holmes (david.holmes at oracle.com) ; >>> Coleen Phillimore (coleen.phillimore at oracle.com) >>> >>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>> DEFAULT_CACHE_LINE_SIZE >>> >>> Hi, >>> >>> after due consideration I strongly consider this change unacceptable >>> since it adds footprint overhead to performance critcial compiler and >>> GC code with little to no data to support this won't cause regressions. >>> >>> Changes to Monitor/Mutex needs to be done with more surgical precision >>> than this. >>> >>> If I do have a veto on the matter, here it is. >>> >>> Thanks! >>> >>> /Claes >>> >>> On 2016-10-07 11:34, Doerr, Martin wrote: >>>> Hi Dan, >>>> >>>> thank you very much for reviewing and for investigating the history. >>>> >>>> It was not intended to make the functions you mentioned public. >>>> I've fixed that. >>>> I also updated the copyright information. >>>> >>>> New webrev is here: >>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>>> >>>> @Coleen: Please use this one. I have also added reviewer attribution. >>>> >>>> Thanks and best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>>> Sent: Donnerstag, 6. Oktober 2016 23:13 >>>> To: Doerr, Martin ; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>> DEFAULT_CACHE_LINE_SIZE >>>> >>>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>>> Hi, >>>>> >>>>> the current implementation of Monitor padding (mutex.cpp) assumes >>>>> that cache lines are 64 Bytes. There's a platform dependent define >>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of >>>>> padding is to avoid false sharing. 
>>>>> >>>>> My proposed change is here: >>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>>> src/share/vm/runtime/mutex.hpp >>>> Please update the copyright year before pushing. >>>> >>>> L172: // The default length of monitor name is chosen to >>>> avoid >>>> false sharing. >>>> L173: enum { >>>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>>> sizeof(MonitorBase), >>>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? >>>> CACHE_LINE_PADDING : 64 >>>> L176: }; >>>> L177: char _name[MONITOR_NAME_LEN]; // Name of >>>> mutex >>>> >>>> I have to say that I'm not fond of the fact that >>>> MONITOR_NAME_LEN >>>> can vary between platforms; I like that it is a minimum >>>> of 64 bytes >>>> and is still a constant. >>>> >>>> I'm also not happy that the resulting sizeof(Monitor) >>>> may not >>>> be a multiple >>>> of the DEFAULT_CACHE_LINE_SIZE. However, I have to >>>> mitigate >>>> that unhappiness >>>> with the fact that sizeof(Monitor) hasn't been a >>>> multiple of >>>> the cache line >>>> size since at least 2008 and no one complained (that I >>>> know of). >>>> >>>> So if I was making this change, I would make >>>> MONITOR_NAME_LEN >>>> 64 bytes >>>> (like it was) and add a pad field that would bring up >>>> sizeof(Monitor) >>>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, >>>> Claes >>>> would be >>>> unhappy with me and anyone embedding a Monitor into >>>> another data >>>> structure would be unhappy with me, but I'm used to >>>> that :-) >>>> >>>> So what you have is fine, especially for JDK9. 
>>>> >>>> L180: public: >>>> L181: #ifndef PRODUCT >>>> L182: debug_only(static bool contains(Monitor * locks, >>>> Monitor * >>>> lock);) >>>> L183: debug_only(static Monitor * >>>> get_least_ranked_lock(Monitor * >>>> locks);) >>>> L184: debug_only(Monitor * >>>> get_least_ranked_lock_besides_this(Monitor * locks);) >>>> L185: #endif >>>> L186: >>>> L187: void set_owner_implementation(Thread* >>>> owner) PRODUCT_RETURN; >>>> L188: void check_prelock_state (Thread* >>>> thread) PRODUCT_RETURN; >>>> L189: void check_block_state (Thread* thread) >>>> >>>> These were all "protected" before. Now they are "public". >>>> Any particular reason? >>>> >>>> Thumbs up on the mechanics of this change. I'm interested in the >>>> answer to the "protected" versus "public" question, but don't >>>> considered that query to be a blocker. >>>> >>>> >>>> The rest of this isn't code review, but some of this caught >>>> my attention. >>>> >>>> src/share/vm/runtime/mutex.hpp >>>> >>>> old L84: // The default length of monitor name is chosen to >>>> be 64 >>>> to avoid false sharing. >>>> old L85: static const int MONITOR_NAME_LEN = 64; >>>> >>>> I had to look up the history of this comment: >>>> >>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>>> changeset: 55:2a8eb116ebbe >>>> user: xlu >>>> date: Tue Feb 05 23:21:57 2008 -0800 >>>> summary: 6610420: Debug VM crashes during monitor lock rank >>>> checking >>>> >>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >>>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >>>> @@ -82,6 +82,9 @@ class ParkEvent ; >>>> // *in that order*. If their implementations change such that >>>> these >>>> // assumptions are violated, a whole lot of code will break. >>>> >>>> +// The default length of monitor name is choosen to be 64 to avoid >>>> false sharing. 
>>>> +static const int MONITOR_NAME_LEN = 64; >>>> + >>>> class Monitor : public CHeapObj { >>>> >>>> public: >>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>>> volatile bool _snuck; // Used for sneaky >>>> locking >>>> (evil). >>>> - const char * _name; // Name of mutex >>>> int NotifyCount ; // diagnostic assist >>>> - double pad [8] ; // avoid false sharing >>>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>>> >>>> // Debugging fields for naming, deadlock detection, etc. >>>> (some only >>>> used in debug mode) >>>> #ifndef PRODUCT >>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>>> int ILocked () ; >>>> >>>> protected: >>>> - static void ClearMonitor (Monitor * m) ; >>>> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >>>> Monitor() ; >>>> >>>> So the original code had an 8-double pad for avoiding false sharing. >>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the >>>> time that Dice determined that 8-double value, the result was to pad >>>> the size of Monitor to an even multiple of a particular cache line >>>> size. >>>> >>>> Xiobin changed the 'name' field to be an array so that the name >>>> chars could serve double duty as the cache line pad... pun intended. >>>> Unfortunately that pad doesn't make sure that the resulting Monitor >>>> size is a multiple of the cache line size. >>>> >>>> Dan >>>> >>>> >>>>> Please review. If will also need a sponsor. 
>>>>> >>>>> Thanks and best regards, >>>>> Martin >>>>> > From lois.foltan at oracle.com Tue Oct 11 11:38:06 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 11 Oct 2016 07:38:06 -0400 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file In-Reply-To: <57FC63AC.3020809@oracle.com> References: <57FC63AC.3020809@oracle.com> Message-ID: <57FCCF1E.1080703@oracle.com> On 10/10/2016 11:59 PM, Calvin Cheung wrote: > > Please review this small fix for not including classes in the > classlist file which are unusable during run time. > > bug: https://bugs.openjdk.java.net/browse/JDK-8166931 > > webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ Hi Calvin, src/share/vm/classfile/classFileParser.cpp - line #5781, I find the if statement logic to be somewhat confusing. This check seems to be only for classes defined to the boot and platform class loader. I am assuming it does not apply to the application class loader because there is no way to differentiate a class defined to the application class loader from being on the --patch-module list and the -classpath? Is that why the if statement logic does not include the application class loader? Maybe it is enough to improve the comment to something like: // For the boot and platform class loaders, check if the class is not found in the java runtime image // or the boot loader's appended entries. This indicates that the class must be located on the --patch-module list and // is not useable during run time, so should be skipped. Then please indent the start of line #5782 by one space to show that the check for the platform class loader is part of that first || expression. test/runtime/modules/PatchModule/PatchModuleClassList.java - good test! 
Thanks, Lois > > Testing: > JPRT with -testset hotspot > jtreg tests under hotspot/runtime on all supported platforms (in > progress) > > thanks, > Calvin From lois.foltan at oracle.com Tue Oct 11 11:48:10 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 11 Oct 2016 07:48:10 -0400 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> Message-ID: <57FCD17A.6000501@oracle.com> Hi David, This looks good and I like the improvements you made to the test. Lois On 10/10/2016 9:55 PM, David Holmes wrote: > Turns out the only place changes were needed were in JDI. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 > > webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ > > The spec change in ObjectReference is very simple and there is a CCC > request in progress to ratify that change. > > The implementation change in ObjectReferenceImpl mirrors the updated > spec and use the same format as already present in the class version > of the check method. > > The test is a little more complex. This is obviously an extension to > what is already tested in InterfaceMethodsTest. However IMT has a > number of problem with the way it is currently written [1] - > specifically it doesn't properly separate method lookup from method > invocation. So I've added the capability to separate lookup and > invocation for use with the private interface methods - I have not > tried to address shortcomings of the existing tests. Though I did fix > the return value checking logic! And did some clarifying comments and > renaming in a couple of place. > > Still on the test I can't add the negative tests I would like to add > because they actually pass due to a different long standing bug in JDI > - [2]. 
So the actual private interface method testing is very simple: > can I get the Method from the InterfaceType for the interface > declaring the method? Can I then invoke that method on an instance of > a class that implements the interface. > > Thanks, > David > > [1] https://bugs.openjdk.java.net/browse/JDK-8166453 > [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From david.holmes at oracle.com Tue Oct 11 13:33:35 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 11 Oct 2016 23:33:35 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <57FCD17A.6000501@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <57FCD17A.6000501@oracle.com> Message-ID: Thanks for looking at this Lois! David On 11/10/2016 9:48 PM, Lois Foltan wrote: > Hi David, > This looks good and I like the improvements you made to the test. > Lois > > On 10/10/2016 9:55 PM, David Holmes wrote: >> Turns out the only place changes were needed were in JDI. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >> >> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >> >> The spec change in ObjectReference is very simple and there is a CCC >> request in progress to ratify that change. >> >> The implementation change in ObjectReferenceImpl mirrors the updated >> spec and use the same format as already present in the class version >> of the check method. >> >> The test is a little more complex. This is obviously an extension to >> what is already tested in InterfaceMethodsTest. However IMT has a >> number of problem with the way it is currently written [1] - >> specifically it doesn't properly separate method lookup from method >> invocation. So I've added the capability to separate lookup and >> invocation for use with the private interface methods - I have not >> tried to address shortcomings of the existing tests. Though I did fix >> the return value checking logic! 
And did some clarifying comments and >> renaming in a couple of place. >> >> Still on the test I can't add the negative tests I would like to add >> because they actually pass due to a different long standing bug in JDI >> - [2]. So the actual private interface method testing is very simple: >> can I get the Method from the InterfaceType for the interface >> declaring the method? Can I then invoke that method on an instance of >> a class that implements the interface. >> >> Thanks, >> David >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 > From jiangli.zhou at Oracle.COM Tue Oct 11 15:53:18 2016 From: jiangli.zhou at Oracle.COM (Jiangli Zhou) Date: Tue, 11 Oct 2016 08:53:18 -0700 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file In-Reply-To: <57FC63AC.3020809@oracle.com> References: <57FC63AC.3020809@oracle.com> Message-ID: Looks good. Thanks, Jiangli > On Oct 10, 2016, at 8:59 PM, Calvin Cheung wrote: > > > Please review this small fix for not including classes in the classlist file which are unusable during run time. 
> > bug: https://bugs.openjdk.java.net/browse/JDK-8166931 > > webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ > > Testing: > JPRT with -testset hotspot > jtreg tests under hotspot/runtime on all supported platforms (in progress) > > thanks, > Calvin From martin.doerr at sap.com Tue Oct 11 16:26:29 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Tue, 11 Oct 2016 16:26:29 +0000 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> Message-ID: Hi all, I came to the same conclusion regarding inheritance from PaddingEnd. Unfortunately, you're also right, Claes, that we should better not use 0 as minimal padding length because some compilers may have trouble with 0 length arrays. I hope 1 is ok as minimal padding length because the new operator does not allocate cache line aligned at the moment. So I don't see any benefit in more padding. (Padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.) I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid. Here's a new webrev: http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/ It also enables changing the _name[] field to a pointer or a smaller array. I guess this should better be done in a separate change (jdk10?). Please take a look. Thanks and best regards, Martin -----Original Message----- From: Claes Redestad [mailto:claes.redestad at oracle.com] Sent: Dienstag, 11. 
Oktober 2016 12:05 To: Coleen Phillimore ; Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE Hi, On 2016-10-11 02:03, Coleen Phillimore wrote: > > Hi, > > Was the linear allocation in mutex.cpp the cause of the false sharing > that you observed? I think I like this change better than the > original, because I've wondered myself why the name string was so > long. So with this, we could make Monitor's smaller if they're > embedded in metadata or other structures. Music to my ears! I even think most embedded uses would see improvements if _name was removed entirely (or "simply" turned into a const char * so that it's not copied and embedded into the Monitor/Mutex) > > Thanks, > Coleen > > On 10/10/16 2:00 PM, Doerr, Martin wrote: >> Hi Claes, >> >> thank you very much for your explanations. >> >> I agree with you that it would be better to pad where the Monitors >> are used. It would still fulfill the purpose of this RFE without >> disturbing other usages. >> >> So I could introduce: >> class PaddedMonitor : public Monitor { >> enum { >> CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - >> (int)sizeof(Monitor), >> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0 >> }; >> char _padding[PADDING_LEN]; >> }; >> and similarly PaddedMutex and replace all of the ones which get >> allocated in a linear fashion (mutexLocker.cpp mutex_init()). Sure! Some compilers may take issue with cases where PADDING_LEN == 0 (since char _padding[0] is technically illegal C++, but works on gcc etc) so maybe that special case will have to be (somewhat excessively): PADDING_LEN = CACHE_LINE_PADDING > 0 ? 
CACHE_LINE_PADDING : DEFAULT_CACHE_LINE_SIZE We took a look at if it'd be feasible to express class PaddedMonitor : public PaddedEnd, but it appears that'd require variadic template arguments (C++11) to get right (since we'd need PaddedEnd to transitively publish constructors of Monitor). Thanks! /Claes >> >> Would you agree with this change? >> >> Thanks and best regards, >> Martin >> >> >> -----Original Message----- >> From: Claes Redestad [mailto:claes.redestad at oracle.com] >> Sent: Freitag, 7. Oktober 2016 12:35 >> To: Doerr, Martin ; >> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >> David Holmes (david.holmes at oracle.com) ; >> Coleen Phillimore (coleen.phillimore at oracle.com) >> >> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >> DEFAULT_CACHE_LINE_SIZE >> >> Hi, >> >> I'm concerned that this might be an easy-but-wrong fix to a complex >> problem, and acknowledging that there are already use cases where the >> _name field is contra-productive. This change adds complexity that >> makes it even less likely such uses will be optimized for in the >> future. >> >> There are Padded* types put in place to deal with these concerns >> explicitly rather than implicitly *where it matters*, which allows us >> the choice of applying padding or not on a per use-case basis (which >> means we can also remove the _name field for those use cases that don't >> care about either, which might be most outside of the global lists). >> >> I am very concerned about false sharing, but I have no data to support >> that this change has any measurable benefit in practice: I even did an >> experiment years ago now where I turned _name into a pointer to not pad >> at all and saw nothing exceeding noise levels on any benchmark. >> >> Thanks! 
>> >> /Claes >> >> On 2016-10-07 12:18, Doerr, Martin wrote: >>> Hi Claes, >>> >>> what the change basically does is that the _name[] field gets >>> enlarged by 8 bytes on platforms with 128 byte >>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed >>> by the C++ compiler. >>> What exactly is your concern about the footprint overhead? >>> Are you not concerned about the risk of false sharing? >>> >>> Best regards, >>> Martin >>> >>> -----Original Message----- >>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>> Sent: Freitag, 7. Oktober 2016 12:00 >>> To: Doerr, Martin ; >>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >>> David Holmes (david.holmes at oracle.com) ; >>> Coleen Phillimore (coleen.phillimore at oracle.com) >>> >>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>> DEFAULT_CACHE_LINE_SIZE >>> >>> Hi, >>> >>> after due consideration I strongly consider this change unacceptable >>> since it adds footprint overhead to performance critcial compiler and >>> GC code with little to no data to support this won't cause regressions. >>> >>> Changes to Monitor/Mutex needs to be done with more surgical precision >>> than this. >>> >>> If I do have a veto on the matter, here it is. >>> >>> Thanks! >>> >>> /Claes >>> >>> On 2016-10-07 11:34, Doerr, Martin wrote: >>>> Hi Dan, >>>> >>>> thank you very much for reviewing and for investigating the history. >>>> >>>> It was not intended to make the functions you mentioned public. >>>> I've fixed that. >>>> I also updated the copyright information. >>>> >>>> New webrev is here: >>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>>> >>>> @Coleen: Please use this one. I have also added reviewer attribution. >>>> >>>> Thanks and best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>>> Sent: Donnerstag, 6. 
Oktober 2016 23:13 >>>> To: Doerr, Martin ; >>>> hotspot-runtime-dev at openjdk.java.net >>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>> DEFAULT_CACHE_LINE_SIZE >>>> >>>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>>> Hi, >>>>> >>>>> the current implementation of Monitor padding (mutex.cpp) assumes >>>>> that cache lines are 64 Bytes. There's a platform dependent define >>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of >>>>> padding is to avoid false sharing. >>>>> >>>>> My proposed change is here: >>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>>> src/share/vm/runtime/mutex.hpp >>>> Please update the copyright year before pushing. >>>> >>>> L172: // The default length of monitor name is chosen to >>>> avoid >>>> false sharing. >>>> L173: enum { >>>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>>> sizeof(MonitorBase), >>>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? >>>> CACHE_LINE_PADDING : 64 >>>> L176: }; >>>> L177: char _name[MONITOR_NAME_LEN]; // Name of >>>> mutex >>>> >>>> I have to say that I'm not fond of the fact that >>>> MONITOR_NAME_LEN >>>> can vary between platforms; I like that it is a minimum >>>> of 64 bytes >>>> and is still a constant. >>>> >>>> I'm also not happy that the resulting sizeof(Monitor) >>>> may not >>>> be a multiple >>>> of the DEFAULT_CACHE_LINE_SIZE. However, I have to >>>> mitigate >>>> that unhappiness >>>> with the fact that sizeof(Monitor) hasn't been a >>>> multiple of >>>> the cache line >>>> size since at least 2008 and no one complained (that I >>>> know of). >>>> >>>> So if I was making this change, I would make >>>> MONITOR_NAME_LEN >>>> 64 bytes >>>> (like it was) and add a pad field that would bring up >>>> sizeof(Monitor) >>>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. 
Of course, >>>> Claes >>>> would be >>>> unhappy with me and anyone embedding a Monitor into >>>> another data >>>> structure would be unhappy with me, but I'm used to >>>> that :-) >>>> >>>> So what you have is fine, especially for JDK9. >>>> >>>> L180: public: >>>> L181: #ifndef PRODUCT >>>> L182: debug_only(static bool contains(Monitor * locks, >>>> Monitor * >>>> lock);) >>>> L183: debug_only(static Monitor * >>>> get_least_ranked_lock(Monitor * >>>> locks);) >>>> L184: debug_only(Monitor * >>>> get_least_ranked_lock_besides_this(Monitor * locks);) >>>> L185: #endif >>>> L186: >>>> L187: void set_owner_implementation(Thread* >>>> owner) PRODUCT_RETURN; >>>> L188: void check_prelock_state (Thread* >>>> thread) PRODUCT_RETURN; >>>> L189: void check_block_state (Thread* thread) >>>> >>>> These were all "protected" before. Now they are "public". >>>> Any particular reason? >>>> >>>> Thumbs up on the mechanics of this change. I'm interested in the >>>> answer to the "protected" versus "public" question, but don't >>>> considered that query to be a blocker. >>>> >>>> >>>> The rest of this isn't code review, but some of this caught >>>> my attention. >>>> >>>> src/share/vm/runtime/mutex.hpp >>>> >>>> old L84: // The default length of monitor name is chosen to >>>> be 64 >>>> to avoid false sharing. >>>> old L85: static const int MONITOR_NAME_LEN = 64; >>>> >>>> I had to look up the history of this comment: >>>> >>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>>> changeset: 55:2a8eb116ebbe >>>> user: xlu >>>> date: Tue Feb 05 23:21:57 2008 -0800 >>>> summary: 6610420: Debug VM crashes during monitor lock rank >>>> checking >>>> >>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >>>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >>>> @@ -82,6 +82,9 @@ class ParkEvent ; >>>> // *in that order*. 
If their implementations change such that >>>> these >>>> // assumptions are violated, a whole lot of code will break. >>>> >>>> +// The default length of monitor name is choosen to be 64 to avoid >>>> false sharing. >>>> +static const int MONITOR_NAME_LEN = 64; >>>> + >>>> class Monitor : public CHeapObj { >>>> >>>> public: >>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>>> volatile bool _snuck; // Used for sneaky >>>> locking >>>> (evil). >>>> - const char * _name; // Name of mutex >>>> int NotifyCount ; // diagnostic assist >>>> - double pad [8] ; // avoid false sharing >>>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>>> >>>> // Debugging fields for naming, deadlock detection, etc. >>>> (some only >>>> used in debug mode) >>>> #ifndef PRODUCT >>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>>> int ILocked () ; >>>> >>>> protected: >>>> - static void ClearMonitor (Monitor * m) ; >>>> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >>>> Monitor() ; >>>> >>>> So the original code had an 8-double pad for avoiding false sharing. >>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the >>>> time that Dice determined that 8-double value, the result was to pad >>>> the size of Monitor to an even multiple of a particular cache line >>>> size. >>>> >>>> Xiobin changed the 'name' field to be an array so that the name >>>> chars could serve double duty as the cache line pad... pun intended. >>>> Unfortunately that pad doesn't make sure that the resulting Monitor >>>> size is a multiple of the cache line size. >>>> >>>> Dan >>>> >>>> >>>>> Please review. If will also need a sponsor. 
>>>>> >>>>> Thanks and best regards, >>>>> Martin >>>>> > From calvin.cheung at oracle.com Tue Oct 11 17:19:35 2016 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 11 Oct 2016 10:19:35 -0700 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file In-Reply-To: <57FCCF1E.1080703@oracle.com> References: <57FC63AC.3020809@oracle.com> <57FCCF1E.1080703@oracle.com> Message-ID: <57FD1F27.5090404@oracle.com> Hi Lois, Thanks for your review. On 10/11/16, 4:38 AM, Lois Foltan wrote: > > On 10/10/2016 11:59 PM, Calvin Cheung wrote: >> >> Please review this small fix for not including classes in the >> classlist file which are unusable during run time. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8166931 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ > > Hi Calvin, > > src/share/vm/classfile/classFileParser.cpp > - line #5781, I find the if statement logic to be somewhat confusing. > This check seems to be only for classes defined to the boot and > platform class loader. I am assuming it does not apply to the > application class loader because there is no way to differentiate a > class defined to the application class loader from being on the > --patch-module list and the -classpath? Is that why the if statement > logic does not include the application class loader? Yes. We do want to include the classes defined to the app class loader. > Maybe it is enough to improve the comment to something like: > > // For the boot and platform class loaders, check if the class is > not found in the java runtime image > // or the boot loader's appended entries. This indicates that the > class must be located on the --patch-module list and > // is not useable during run time, so should be skipped. I've modified it a little. How about the following? // For the boot and platform class loaders, check if the class is not found in the java runtime image. 
// Additional check for the boot class loader is if the class is not found in the boot loader's appended
// entries. This indicates that the class is not useable during run time, such as the ones found in the
// --patch-module entries, so it should not be included in the classlist file.
>
> Then please indent the start of line #5782 by one space to show that
> the check for the platform class loader is part of that first ||
> expression.

I'll fix it.

>
> test/runtime/modules/PatchModule/PatchModuleClassList.java
> - good test!

Let me know if you want to see another webrev.

thanks,
Calvin

>
> Thanks,
> Lois
>
>>
>> Testing:
>> JPRT with -testset hotspot
>> jtreg tests under hotspot/runtime on all supported platforms (in
>> progress)
>>
>> thanks,
>> Calvin
>

From calvin.cheung at oracle.com Tue Oct 11 17:24:14 2016
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Tue, 11 Oct 2016 10:24:14 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file
In-Reply-To:
References: <57FC63AC.3020809@oracle.com>
Message-ID: <57FD203E.1090101@oracle.com>

Thanks, Jiangli.

Calvin

On 10/11/16, 8:53 AM, Jiangli Zhou wrote:
> Looks good.
>
> Thanks,
> Jiangli
>
>> On Oct 10, 2016, at 8:59 PM, Calvin Cheung wrote:
>>
>>
>> Please review this small fix for not including classes in the classlist file which are unusable during run time.
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>>
>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
>>
>> Testing:
>> JPRT with -testset hotspot
>> jtreg tests under hotspot/runtime on all supported platforms (in progress)
>>
>> thanks,
>> Calvin

From daniel.daugherty at oracle.com Tue Oct 11 17:30:05 2016
From: daniel.daugherty at oracle.com (Daniel D.
Daugherty)
Date: Tue, 11 Oct 2016 11:30:05 -0600
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB
In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
Message-ID:

On 10/10/16 7:55 PM, David Holmes wrote:
> Turns out the only place changes were needed were in JDI.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>
> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/

src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java
    No comments. (Thanks for also fixing the typo.)

src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java
    L352:     if (isNonVirtual(options)) {
    L353:         if (method.isAbstract()) {
    L354:             throw new IllegalArgumentException("Abstract method");
    L355:         }

    Any particular reason for breaking the logic into two distinct
    if-statements? Perhaps:

        if (isNonVirtual(options) && method.isAbstract()) {
            throw new IllegalArgumentException("Abstract method");
        }

    Also, perhaps "unexpected Abstract method" is more clear?

test/com/sun/jdi/InterfaceMethodsTest.java
    L526:     if (t.getClass() != expectedException) {
    L527:         System.err.println("--- FAILED");
    L528:         failure("FAILED: " + t);
    L529:         return null;
    L530:     }

    You should also report the expectedException value here to aid
    in failure analysis.

Thumbs up! I don't need to see another webrev if you decide to make
the above small tweaks.

Dan

>
> The spec change in ObjectReference is very simple and there is a CCC
> request in progress to ratify that change.
>
> The implementation change in ObjectReferenceImpl mirrors the updated
> spec and uses the same format as already present in the class version
> of the check method.
>
> The test is a little more complex. This is obviously an extension to
> what is already tested in InterfaceMethodsTest. However IMT has a
> number of problems with the way it is currently written [1] -
> specifically it doesn't properly separate method lookup from method
> invocation.
So I've added the capability to separate lookup and > invocation for use with the private interface methods - I have not > tried to address shortcomings of the existing tests. Though I did fix > the return value checking logic! And did some clarifying comments and > renaming in a couple of place. > > Still on the test I can't add the negative tests I would like to add > because they actually pass due to a different long standing bug in JDI > - [2]. So the actual private interface method testing is very simple: > can I get the Method from the InterfaceType for the interface > declaring the method? Can I then invoke that method on an instance of > a class that implements the interface. > > Thanks, > David > > [1] https://bugs.openjdk.java.net/browse/JDK-8166453 > [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From lois.foltan at oracle.com Tue Oct 11 17:33:49 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 11 Oct 2016 13:33:49 -0400 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file In-Reply-To: <57FD1F27.5090404@oracle.com> References: <57FC63AC.3020809@oracle.com> <57FCCF1E.1080703@oracle.com> <57FD1F27.5090404@oracle.com> Message-ID: <57FD227D.4070307@oracle.com> On 10/11/2016 1:19 PM, Calvin Cheung wrote: > Hi Lois, > > Thanks for your review. > > On 10/11/16, 4:38 AM, Lois Foltan wrote: >> >> On 10/10/2016 11:59 PM, Calvin Cheung wrote: >>> >>> Please review this small fix for not including classes in the >>> classlist file which are unusable during run time. >>> >>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931 >>> >>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ >> >> Hi Calvin, >> >> src/share/vm/classfile/classFileParser.cpp >> - line #5781, I find the if statement logic to be somewhat >> confusing. This check seems to be only for classes defined to the >> boot and platform class loader. 
I am assuming it does not apply to
>> the application class loader because there is no way to differentiate
>> a class defined to the application class loader from being on the
>> --patch-module list and the -classpath? Is that why the if statement
>> logic does not include the application class loader?
> Yes. We do want to include the classes defined to the app class loader.

Even if those classes defined to the app class loader are located in
--patch-module entries?

>> Maybe it is enough to improve the comment to something like:
>>
>> // For the boot and platform class loaders, check if the class is
>> not found in the java runtime image
>> // or the boot loader's appended entries. This indicates that the
>> class must be located on the --patch-module list and
>> // is not useable during run time, so should be skipped.
> I've modified it a little. How about the following?
> // For the boot and platform class loaders, check if the class is not
> found in the java runtime image.
> // Additional check for the boot class loader is if the class is not
> found in the boot loader's appended
> // entries. This indicates that the class is not useable during run
> time, such as the ones found in the
> // --patch-module entries, so it should not be included in the
> classlist file.

Looks good, thanks for rewording!

>
>>
>> Then please indent the start of line #5782 by one space to show that
>> the check for the platform class loader is part of that first ||
>> expression.
> I'll fix it.
>>
>> test/runtime/modules/PatchModule/PatchModuleClassList.java
>> - good test!
> Let me know if you want to see another webrev.

No, I'm all set.
Thanks, Lois > > thanks, > Calvin >> >> Thanks, >> Lois >> >>> >>> Testing: >>> JPRT with -testset hotspot >>> jtreg tests under hotspot/runtime on all supported platforms (in >>> progress) >>> >>> thanks, >>> Calvin >> From coleen.phillimore at oracle.com Tue Oct 11 17:35:11 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 11 Oct 2016 13:35:11 -0400 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> Message-ID: <1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com> I am fine with this change. Maybe a one line comment here, something like: // Using Padded subclasses to prevent false sharing of these global monitors and mutexes. 172 void mutex_init() { 173 def(tty_lock , PaddedMutex , event, true, Monitor::_safepoint_check_never); // allow to lock in VM On 10/11/16 12:26 PM, Doerr, Martin wrote: > Hi all, > > I came to the same conclusion regarding inheritance from PaddingEnd. > Unfortunately, you're also right, Claes, that we should better not use 0 as minimal padding length because some compilers may have trouble with 0 length arrays. I hope 1 is ok as minimal padding length because the new operator does not allocate cache line aligned at the moment. So I don't see any benefit in more padding. (Padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.) > > I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid. > > Here's a new webrev: > http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/ > > It also enables changing the _name[] field to a pointer or a smaller array. 
I guess this should better be done in a separate change (jdk10?). > > Please take a look. > > Thanks and best regards, > Martin > > > -----Original Message----- > From: Claes Redestad [mailto:claes.redestad at oracle.com] > Sent: Dienstag, 11. Oktober 2016 12:05 > To: Coleen Phillimore ; Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) > Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE > > Hi, > > On 2016-10-11 02:03, Coleen Phillimore wrote: >> Hi, >> >> Was the linear allocation in mutex.cpp the cause of the false sharing >> that you observed? I think I like this change better than the >> original, because I've wondered myself why the name string was so >> long. So with this, we could make Monitor's smaller if they're >> embedded in metadata or other structures. > Music to my ears! > > I even think most embedded uses would see improvements if _name was > removed entirely (or "simply" turned into a const char * so that it's > not copied and embedded into the Monitor/Mutex) > >> Thanks, >> Coleen >> >> On 10/10/16 2:00 PM, Doerr, Martin wrote: >>> Hi Claes, >>> >>> thank you very much for your explanations. >>> >>> I agree with you that it would be better to pad where the Monitors >>> are used. It would still fulfill the purpose of this RFE without >>> disturbing other usages. >>> >>> So I could introduce: >>> class PaddedMonitor : public Monitor { >>> enum { >>> CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - >>> (int)sizeof(Monitor), >>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0 >>> }; >>> char _padding[PADDING_LEN]; >>> }; >>> and similarly PaddedMutex and replace all of the ones which get >>> allocated in a linear fashion (mutexLocker.cpp mutex_init()). > Sure! 
> > Some compilers may take issue with cases where PADDING_LEN == 0 (since > char _padding[0] is technically illegal C++, but works on gcc etc) so > maybe that special case will have to be (somewhat excessively): > > PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : > DEFAULT_CACHE_LINE_SIZE > > We took a look at if it'd be feasible to express class PaddedMonitor : > public PaddedEnd, but it appears that'd require variadic > template arguments (C++11) to get right (since we'd need PaddedEnd to > transitively publish constructors of Monitor). > > Thanks! > > /Claes > >>> Would you agree with this change? >>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>> Sent: Freitag, 7. Oktober 2016 12:35 >>> To: Doerr, Martin ; >>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >>> David Holmes (david.holmes at oracle.com) ; >>> Coleen Phillimore (coleen.phillimore at oracle.com) >>> >>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>> DEFAULT_CACHE_LINE_SIZE >>> >>> Hi, >>> >>> I'm concerned that this might be an easy-but-wrong fix to a complex >>> problem, and acknowledging that there are already use cases where the >>> _name field is contra-productive. This change adds complexity that >>> makes it even less likely such uses will be optimized for in the >>> future. >>> >>> There are Padded* types put in place to deal with these concerns >>> explicitly rather than implicitly *where it matters*, which allows us >>> the choice of applying padding or not on a per use-case basis (which >>> means we can also remove the _name field for those use cases that don't >>> care about either, which might be most outside of the global lists). 
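The idea of applying padding explicitly, per use case, can be sketched as a small wrapper template. This is only an illustration under stated assumptions: the 64-byte line size and the names PaddedImplSketch, PaddedToLineSketch, SmallLock, ExactLine and OverLine are hypothetical and are not HotSpot's Padded* types. The zero-padding case is handled by a specialization, which avoids the zero-length array problem without needing C++11.

```cpp
#include <cstddef>

// Assumed cache line size for this sketch only.
static const std::size_t kLineSketch = 64;

// Adds 'pad' trailing bytes to T.
template <typename T, std::size_t pad>
struct PaddedImplSketch : public T {
  char _pad[pad];
};

// No padding needed: avoid declaring a zero-length array.
template <typename T>
struct PaddedImplSketch<T, 0> : public T {};

// Rounds sizeof(T) up to the next multiple of the cache line size.
template <typename T>
struct PaddedToLineSketch
  : public PaddedImplSketch<T, (kLineSketch - sizeof(T) % kLineSketch) % kLineSketch> {};

// Hypothetical payload types used to exercise the wrapper.
struct SmallLock { volatile long _LockWord; };  // smaller than one line
struct ExactLine { char bytes[64]; };           // exactly one line
struct OverLine  { char bytes[65]; };           // one byte over a line
```

A use site that cares about false sharing would declare PaddedToLineSketch<SmallLock>, while an embedded use that cares about footprint would keep the plain SmallLock.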
>>> >>> I am very concerned about false sharing, but I have no data to support >>> that this change has any measurable benefit in practice: I even did an >>> experiment years ago now where I turned _name into a pointer to not pad >>> at all and saw nothing exceeding noise levels on any benchmark. >>> >>> Thanks! >>> >>> /Claes >>> >>> On 2016-10-07 12:18, Doerr, Martin wrote: >>>> Hi Claes, >>>> >>>> what the change basically does is that the _name[] field gets >>>> enlarged by 8 bytes on platforms with 128 byte >>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed >>>> by the C++ compiler. >>>> What exactly is your concern about the footprint overhead? >>>> Are you not concerned about the risk of false sharing? >>>> >>>> Best regards, >>>> Martin >>>> >>>> -----Original Message----- >>>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>>> Sent: Freitag, 7. Oktober 2016 12:00 >>>> To: Doerr, Martin ; >>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >>>> David Holmes (david.holmes at oracle.com) ; >>>> Coleen Phillimore (coleen.phillimore at oracle.com) >>>> >>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>> DEFAULT_CACHE_LINE_SIZE >>>> >>>> Hi, >>>> >>>> after due consideration I strongly consider this change unacceptable >>>> since it adds footprint overhead to performance critcial compiler and >>>> GC code with little to no data to support this won't cause regressions. >>>> >>>> Changes to Monitor/Mutex needs to be done with more surgical precision >>>> than this. >>>> >>>> If I do have a veto on the matter, here it is. >>>> >>>> Thanks! >>>> >>>> /Claes >>>> >>>> On 2016-10-07 11:34, Doerr, Martin wrote: >>>>> Hi Dan, >>>>> >>>>> thank you very much for reviewing and for investigating the history. >>>>> >>>>> It was not intended to make the functions you mentioned public. >>>>> I've fixed that. >>>>> I also updated the copyright information. 
>>>>> >>>>> New webrev is here: >>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>>>> >>>>> @Coleen: Please use this one. I have also added reviewer attribution. >>>>> >>>>> Thanks and best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>>>> Sent: Donnerstag, 6. Oktober 2016 23:13 >>>>> To: Doerr, Martin ; >>>>> hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>> DEFAULT_CACHE_LINE_SIZE >>>>> >>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>>>> Hi, >>>>>> >>>>>> the current implementation of Monitor padding (mutex.cpp) assumes >>>>>> that cache lines are 64 Bytes. There's a platform dependent define >>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of >>>>>> padding is to avoid false sharing. >>>>>> >>>>>> My proposed change is here: >>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>>>> src/share/vm/runtime/mutex.hpp >>>>> Please update the copyright year before pushing. >>>>> >>>>> L172: // The default length of monitor name is chosen to >>>>> avoid >>>>> false sharing. >>>>> L173: enum { >>>>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>>>> sizeof(MonitorBase), >>>>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? >>>>> CACHE_LINE_PADDING : 64 >>>>> L176: }; >>>>> L177: char _name[MONITOR_NAME_LEN]; // Name of >>>>> mutex >>>>> >>>>> I have to say that I'm not fond of the fact that >>>>> MONITOR_NAME_LEN >>>>> can vary between platforms; I like that it is a minimum >>>>> of 64 bytes >>>>> and is still a constant. >>>>> >>>>> I'm also not happy that the resulting sizeof(Monitor) >>>>> may not >>>>> be a multiple >>>>> of the DEFAULT_CACHE_LINE_SIZE. 
However, I have to >>>>> mitigate >>>>> that unhappiness >>>>> with the fact that sizeof(Monitor) hasn't been a >>>>> multiple of >>>>> the cache line >>>>> size since at least 2008 and no one complained (that I >>>>> know of). >>>>> >>>>> So if I was making this change, I would make >>>>> MONITOR_NAME_LEN >>>>> 64 bytes >>>>> (like it was) and add a pad field that would bring up >>>>> sizeof(Monitor) >>>>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, >>>>> Claes >>>>> would be >>>>> unhappy with me and anyone embedding a Monitor into >>>>> another data >>>>> structure would be unhappy with me, but I'm used to >>>>> that :-) >>>>> >>>>> So what you have is fine, especially for JDK9. >>>>> >>>>> L180: public: >>>>> L181: #ifndef PRODUCT >>>>> L182: debug_only(static bool contains(Monitor * locks, >>>>> Monitor * >>>>> lock);) >>>>> L183: debug_only(static Monitor * >>>>> get_least_ranked_lock(Monitor * >>>>> locks);) >>>>> L184: debug_only(Monitor * >>>>> get_least_ranked_lock_besides_this(Monitor * locks);) >>>>> L185: #endif >>>>> L186: >>>>> L187: void set_owner_implementation(Thread* >>>>> owner) PRODUCT_RETURN; >>>>> L188: void check_prelock_state (Thread* >>>>> thread) PRODUCT_RETURN; >>>>> L189: void check_block_state (Thread* thread) >>>>> >>>>> These were all "protected" before. Now they are "public". >>>>> Any particular reason? >>>>> >>>>> Thumbs up on the mechanics of this change. I'm interested in the >>>>> answer to the "protected" versus "public" question, but don't >>>>> considered that query to be a blocker. >>>>> >>>>> >>>>> The rest of this isn't code review, but some of this caught >>>>> my attention. >>>>> >>>>> src/share/vm/runtime/mutex.hpp >>>>> >>>>> old L84: // The default length of monitor name is chosen to >>>>> be 64 >>>>> to avoid false sharing. 
>>>>> old L85: static const int MONITOR_NAME_LEN = 64; >>>>> >>>>> I had to look up the history of this comment: >>>>> >>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>>>> changeset: 55:2a8eb116ebbe >>>>> user: xlu >>>>> date: Tue Feb 05 23:21:57 2008 -0800 >>>>> summary: 6610420: Debug VM crashes during monitor lock rank >>>>> checking >>>>> >>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>>>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >>>>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >>>>> @@ -82,6 +82,9 @@ class ParkEvent ; >>>>> // *in that order*. If their implementations change such that >>>>> these >>>>> // assumptions are violated, a whole lot of code will break. >>>>> >>>>> +// The default length of monitor name is choosen to be 64 to avoid >>>>> false sharing. >>>>> +static const int MONITOR_NAME_LEN = 64; >>>>> + >>>>> class Monitor : public CHeapObj { >>>>> >>>>> public: >>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>>>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>>>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>>>> volatile bool _snuck; // Used for sneaky >>>>> locking >>>>> (evil). >>>>> - const char * _name; // Name of mutex >>>>> int NotifyCount ; // diagnostic assist >>>>> - double pad [8] ; // avoid false sharing >>>>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>>>> >>>>> // Debugging fields for naming, deadlock detection, etc. >>>>> (some only >>>>> used in debug mode) >>>>> #ifndef PRODUCT >>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>>>> int ILocked () ; >>>>> >>>>> protected: >>>>> - static void ClearMonitor (Monitor * m) ; >>>>> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >>>>> Monitor() ; >>>>> >>>>> So the original code had an 8-double pad for avoiding false sharing. 
>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the >>>>> time that Dice determined that 8-double value, the result was to pad >>>>> the size of Monitor to an even multiple of a particular cache line >>>>> size. >>>>> >>>>> Xiobin changed the 'name' field to be an array so that the name >>>>> chars could serve double duty as the cache line pad... pun intended. >>>>> Unfortunately that pad doesn't make sure that the resulting Monitor >>>>> size is a multiple of the cache line size. >>>>> >>>>> Dan >>>>> >>>>> >>>>>> Please review. If will also need a sponsor. >>>>>> >>>>>> Thanks and best regards, >>>>>> Martin >>>>>> From calvin.cheung at oracle.com Tue Oct 11 17:43:51 2016 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 11 Oct 2016 10:43:51 -0700 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file In-Reply-To: <57FD227D.4070307@oracle.com> References: <57FC63AC.3020809@oracle.com> <57FCCF1E.1080703@oracle.com> <57FD1F27.5090404@oracle.com> <57FD227D.4070307@oracle.com> Message-ID: <57FD24D7.1090903@oracle.com> On 10/11/16, 10:33 AM, Lois Foltan wrote: > > On 10/11/2016 1:19 PM, Calvin Cheung wrote: >> Hi Lois, >> >> Thanks for your review. >> >> On 10/11/16, 4:38 AM, Lois Foltan wrote: >>> >>> On 10/10/2016 11:59 PM, Calvin Cheung wrote: >>>> >>>> Please review this small fix for not including classes in the >>>> classlist file which are unusable during run time. >>>> >>>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931 >>>> >>>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ >>> >>> Hi Calvin, >>> >>> src/share/vm/classfile/classFileParser.cpp >>> - line #5781, I find the if statement logic to be somewhat >>> confusing. This check seems to be only for classes defined to the >>> boot and platform class loader. 
I am assuming it does not apply to
>>> the application class loader because there is no way to
>>> differentiate a class defined to the application class loader from
>>> being on the --patch-module list and the -classpath? Is that why
>>> the if statement logic does not include the application class loader?
>> Yes. We do want to include the classes defined to the app class loader.
> Even if those classes defined to the app class loader are located in
> --patch-module entries?

Yes. With the fix for JDK-8164011, we currently don't archive any
classes found in the --patch-module entries.

thanks,
Calvin

>
>>> Maybe it is enough to improve the comment to something like:
>>>
>>> // For the boot and platform class loaders, check if the class is
>>> not found in the java runtime image
>>> // or the boot loader's appended entries. This indicates that the
>>> class must be located on the --patch-module list and
>>> // is not useable during run time, so should be skipped.
>> I've modified it a little. How about the following?
>> // For the boot and platform class loaders, check if the class is not
>> found in the java runtime image.
>> // Additional check for the boot class loader is if the class is not
>> found in the boot loader's appended
>> // entries. This indicates that the class is not useable during run
>> time, such as the ones found in the
>> // --patch-module entries, so it should not be included in the
>> classlist file.
> Looks good, thanks for rewording!
>
>>
>>>
>>> Then please indent the start of line #5782 by one space to show that
>>> the check for the platform class loader is part of that first ||
>>> expression.
>> I'll fix it.
>>>
>>> test/runtime/modules/PatchModule/PatchModuleClassList.java
>>> - good test!
>> Let me know if you want to see another webrev.
>
> No, I'm all set.
> Thanks, > Lois > >> >> thanks, >> Calvin >>> >>> Thanks, >>> Lois >>> >>>> >>>> Testing: >>>> JPRT with -testset hotspot >>>> jtreg tests under hotspot/runtime on all supported platforms >>>> (in progress) >>>> >>>> thanks, >>>> Calvin >>> > From claes.redestad at oracle.com Tue Oct 11 17:44:05 2016 From: claes.redestad at oracle.com (Claes Redestad) Date: Tue, 11 Oct 2016 19:44:05 +0200 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com> References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> <1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com> Message-ID: <57FD24E5.2080506@oracle.com> I am also happy with this, thanks! /Claes On 2016-10-11 19:35, Coleen Phillimore wrote: > > I am fine with this change. Maybe a one line comment here, something like: > > // Using Padded subclasses to prevent false sharing of these global monitors and mutexes. > 172 void mutex_init() { > 173 def(tty_lock , PaddedMutex , event, true, Monitor::_safepoint_check_never); // allow to lock in VM > > > > On 10/11/16 12:26 PM, Doerr, Martin wrote: >> Hi all, >> >> I came to the same conclusion regarding inheritance from PaddingEnd. >> Unfortunately, you're also right, Claes, that we should better not use 0 as minimal padding length because some compilers may have trouble with 0 length arrays. I hope 1 is ok as minimal padding length because the new operator does not allocate cache line aligned at the moment. So I don't see any benefit in more padding. (Padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.) >> >> I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid. 
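The padding rule described above (fill the rest of the cache line, but never go below 1 byte) can be sketched roughly as follows. This is not the code from the webrev: the 64-byte line size, the reduced MonitorSketch type and the names PaddedMonitorSketch, CACHE_LINE_SIZE_SKETCH and lock_word_distance are assumptions made for illustration.

```cpp
#include <cstddef>

// Assumed cache line size; HotSpot would use the platform-dependent
// DEFAULT_CACHE_LINE_SIZE define instead.
static const int CACHE_LINE_SIZE_SKETCH = 64;

// Hypothetical stand-in for Monitor, reduced to its lock word.
struct MonitorSketch {
  volatile long _LockWord;
};

// Padded subclass: fill the rest of the cache line, but fall back to
// 1 byte so that no zero-length array is ever declared.
struct PaddedMonitorSketch : public MonitorSketch {
  enum {
    CACHE_LINE_PADDING = CACHE_LINE_SIZE_SKETCH - (int)sizeof(MonitorSketch),
    PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 1
  };
  char _padding[PADDING_LEN];
};

// Byte distance between the lock words of two objects of type T that are
// laid out back to back, as with linear allocation.
template <typename T>
std::ptrdiff_t lock_word_distance() {
  static T pair[2];
  return (const volatile char*)&pair[1]._LockWord -
         (const volatile char*)&pair[0]._LockWord;
}
```

With the unpadded type, neighbouring lock words are only sizeof(long) bytes apart and can share a line. With the padded type they are a full line apart, so they land on different lines even when the allocation itself is not cache line aligned.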
>> >> Here's a new webrev: >> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/ >> >> It also enables changing the _name[] field to a pointer or a smaller array. I guess this should better be done in a separate change (jdk10?). >> >> Please take a look. >> >> Thanks and best regards, >> Martin >> >> >> -----Original Message----- >> From: Claes Redestad [mailto:claes.redestad at oracle.com] >> Sent: Dienstag, 11. Oktober 2016 12:05 >> To: Coleen Phillimore; Doerr, Martin;daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) >> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE >> >> Hi, >> >> On 2016-10-11 02:03, Coleen Phillimore wrote: >>> Hi, >>> >>> Was the linear allocation in mutex.cpp the cause of the false sharing >>> that you observed? I think I like this change better than the >>> original, because I've wondered myself why the name string was so >>> long. So with this, we could make Monitor's smaller if they're >>> embedded in metadata or other structures. >> Music to my ears! >> >> I even think most embedded uses would see improvements if _name was >> removed entirely (or "simply" turned into a const char * so that it's >> not copied and embedded into the Monitor/Mutex) >> >>> Thanks, >>> Coleen >>> >>> On 10/10/16 2:00 PM, Doerr, Martin wrote: >>>> Hi Claes, >>>> >>>> thank you very much for your explanations. >>>> >>>> I agree with you that it would be better to pad where the Monitors >>>> are used. It would still fulfill the purpose of this RFE without >>>> disturbing other usages. >>>> >>>> So I could introduce: >>>> class PaddedMonitor : public Monitor { >>>> enum { >>>> CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - >>>> (int)sizeof(Monitor), >>>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? 
CACHE_LINE_PADDING : 0 >>>> }; >>>> char _padding[PADDING_LEN]; >>>> }; >>>> and similarly PaddedMutex and replace all of the ones which get >>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()). >> Sure! >> >> Some compilers may take issue with cases where PADDING_LEN == 0 (since >> char _padding[0] is technically illegal C++, but works on gcc etc) so >> maybe that special case will have to be (somewhat excessively): >> >> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : >> DEFAULT_CACHE_LINE_SIZE >> >> We took a look at if it'd be feasible to express class PaddedMonitor : >> public PaddedEnd, but it appears that'd require variadic >> template arguments (C++11) to get right (since we'd need PaddedEnd to >> transitively publish constructors of Monitor). >> >> Thanks! >> >> /Claes >> >>>> Would you agree with this change? >>>> >>>> Thanks and best regards, >>>> Martin >>>> >>>> >>>> -----Original Message----- >>>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>>> Sent: Freitag, 7. Oktober 2016 12:35 >>>> To: Doerr, Martin; >>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; >>>> David Holmes (david.holmes at oracle.com); >>>> Coleen Phillimore (coleen.phillimore at oracle.com) >>>> >>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>> DEFAULT_CACHE_LINE_SIZE >>>> >>>> Hi, >>>> >>>> I'm concerned that this might be an easy-but-wrong fix to a complex >>>> problem, and acknowledging that there are already use cases where the >>>> _name field is contra-productive. This change adds complexity that >>>> makes it even less likely such uses will be optimized for in the >>>> future. 
>>>> >>>> There are Padded* types put in place to deal with these concerns >>>> explicitly rather than implicitly *where it matters*, which allows us >>>> the choice of applying padding or not on a per use-case basis (which >>>> means we can also remove the _name field for those use cases that don't >>>> care about either, which might be most outside of the global lists). >>>> >>>> I am very concerned about false sharing, but I have no data to support >>>> that this change has any measurable benefit in practice: I even did an >>>> experiment years ago now where I turned _name into a pointer to not pad >>>> at all and saw nothing exceeding noise levels on any benchmark. >>>> >>>> Thanks! >>>> >>>> /Claes >>>> >>>> On 2016-10-07 12:18, Doerr, Martin wrote: >>>>> Hi Claes, >>>>> >>>>> what the change basically does is that the _name[] field gets >>>>> enlarged by 8 bytes on platforms with 128 byte >>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed >>>>> by the C++ compiler. >>>>> What exactly is your concern about the footprint overhead? >>>>> Are you not concerned about the risk of false sharing? >>>>> >>>>> Best regards, >>>>> Martin >>>>> >>>>> -----Original Message----- >>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>>>> Sent: Freitag, 7. Oktober 2016 12:00 >>>>> To: Doerr, Martin; >>>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; >>>>> David Holmes (david.holmes at oracle.com); >>>>> Coleen Phillimore (coleen.phillimore at oracle.com) >>>>> >>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>> DEFAULT_CACHE_LINE_SIZE >>>>> >>>>> Hi, >>>>> >>>>> after due consideration I strongly consider this change unacceptable >>>>> since it adds footprint overhead to performance critcial compiler and >>>>> GC code with little to no data to support this won't cause regressions. >>>>> >>>>> Changes to Monitor/Mutex needs to be done with more surgical precision >>>>> than this. 
>>>>> >>>>> If I do have a veto on the matter, here it is. >>>>> >>>>> Thanks! >>>>> >>>>> /Claes >>>>> >>>>> On 2016-10-07 11:34, Doerr, Martin wrote: >>>>>> Hi Dan, >>>>>> >>>>>> thank you very much for reviewing and for investigating the history. >>>>>> >>>>>> It was not intended to make the functions you mentioned public. >>>>>> I've fixed that. >>>>>> I also updated the copyright information. >>>>>> >>>>>> New webrev is here: >>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>>>>> >>>>>> @Coleen: Please use this one. I have also added reviewer attribution. >>>>>> >>>>>> Thanks and best regards, >>>>>> Martin >>>>>> >>>>>> >>>>>> -----Original Message----- >>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>>>>> Sent: Donnerstag, 6. Oktober 2016 23:13 >>>>>> To: Doerr, Martin; >>>>>> hotspot-runtime-dev at openjdk.java.net >>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>>> DEFAULT_CACHE_LINE_SIZE >>>>>> >>>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>>>>> Hi, >>>>>>> >>>>>>> the current implementation of Monitor padding (mutex.cpp) assumes >>>>>>> that cache lines are 64 Bytes. There's a platform dependent define >>>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of >>>>>>> padding is to avoid false sharing. >>>>>>> >>>>>>> My proposed change is here: >>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>>>>> src/share/vm/runtime/mutex.hpp >>>>>> Please update the copyright year before pushing. >>>>>> >>>>>> L172: // The default length of monitor name is chosen to >>>>>> avoid >>>>>> false sharing. >>>>>> L173: enum { >>>>>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>>>>> sizeof(MonitorBase), >>>>>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? 
>>>>>> CACHE_LINE_PADDING : 64 >>>>>> L176: }; >>>>>> L177: char _name[MONITOR_NAME_LEN]; // Name of >>>>>> mutex >>>>>> >>>>>> I have to say that I'm not fond of the fact that >>>>>> MONITOR_NAME_LEN >>>>>> can vary between platforms; I like that it is a minimum >>>>>> of 64 bytes >>>>>> and is still a constant. >>>>>> >>>>>> I'm also not happy that the resulting sizeof(Monitor) >>>>>> may not >>>>>> be a multiple >>>>>> of the DEFAULT_CACHE_LINE_SIZE. However, I have to >>>>>> mitigate >>>>>> that unhappiness >>>>>> with the fact that sizeof(Monitor) hasn't been a >>>>>> multiple of >>>>>> the cache line >>>>>> size since at least 2008 and no one complained (that I >>>>>> know of). >>>>>> >>>>>> So if I was making this change, I would make >>>>>> MONITOR_NAME_LEN >>>>>> 64 bytes >>>>>> (like it was) and add a pad field that would bring up >>>>>> sizeof(Monitor) >>>>>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, >>>>>> Claes >>>>>> would be >>>>>> unhappy with me and anyone embedding a Monitor into >>>>>> another data >>>>>> structure would be unhappy with me, but I'm used to >>>>>> that :-) >>>>>> >>>>>> So what you have is fine, especially for JDK9. >>>>>> >>>>>> L180: public: >>>>>> L181: #ifndef PRODUCT >>>>>> L182: debug_only(static bool contains(Monitor * locks, >>>>>> Monitor * >>>>>> lock);) >>>>>> L183: debug_only(static Monitor * >>>>>> get_least_ranked_lock(Monitor * >>>>>> locks);) >>>>>> L184: debug_only(Monitor * >>>>>> get_least_ranked_lock_besides_this(Monitor * locks);) >>>>>> L185: #endif >>>>>> L186: >>>>>> L187: void set_owner_implementation(Thread* >>>>>> owner) PRODUCT_RETURN; >>>>>> L188: void check_prelock_state (Thread* >>>>>> thread) PRODUCT_RETURN; >>>>>> L189: void check_block_state (Thread* thread) >>>>>> >>>>>> These were all "protected" before. Now they are "public". >>>>>> Any particular reason? >>>>>> >>>>>> Thumbs up on the mechanics of this change. 
I'm interested in the >>>>>> answer to the "protected" versus "public" question, but don't >>>>>> considered that query to be a blocker. >>>>>> >>>>>> >>>>>> The rest of this isn't code review, but some of this caught >>>>>> my attention. >>>>>> >>>>>> src/share/vm/runtime/mutex.hpp >>>>>> >>>>>> old L84: // The default length of monitor name is chosen to >>>>>> be 64 >>>>>> to avoid false sharing. >>>>>> old L85: static const int MONITOR_NAME_LEN = 64; >>>>>> >>>>>> I had to look up the history of this comment: >>>>>> >>>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>>>>> changeset: 55:2a8eb116ebbe >>>>>> user: xlu >>>>>> date: Tue Feb 05 23:21:57 2008 -0800 >>>>>> summary: 6610420: Debug VM crashes during monitor lock rank >>>>>> checking >>>>>> >>>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>>>>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >>>>>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >>>>>> @@ -82,6 +82,9 @@ class ParkEvent ; >>>>>> // *in that order*. If their implementations change such that >>>>>> these >>>>>> // assumptions are violated, a whole lot of code will break. >>>>>> >>>>>> +// The default length of monitor name is choosen to be 64 to avoid >>>>>> false sharing. >>>>>> +static const int MONITOR_NAME_LEN = 64; >>>>>> + >>>>>> class Monitor : public CHeapObj { >>>>>> >>>>>> public: >>>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>>>>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>>>>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>>>>> volatile bool _snuck; // Used for sneaky >>>>>> locking >>>>>> (evil). >>>>>> - const char * _name; // Name of mutex >>>>>> int NotifyCount ; // diagnostic assist >>>>>> - double pad [8] ; // avoid false sharing >>>>>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>>>>> >>>>>> // Debugging fields for naming, deadlock detection, etc. 
>>>>>> (some only >>>>>> used in debug mode) >>>>>> #ifndef PRODUCT >>>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>>>>> int ILocked () ; >>>>>> >>>>>> protected: >>>>>> - static void ClearMonitor (Monitor * m) ; >>>>>> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >>>>>> Monitor() ; >>>>>> >>>>>> So the original code had an 8-double pad for avoiding false sharing. >>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the >>>>>> time that Dice determined that 8-double value, the result was to pad >>>>>> the size of Monitor to an even multiple of a particular cache line >>>>>> size. >>>>>> >>>>>> Xiobin changed the 'name' field to be an array so that the name >>>>>> chars could serve double duty as the cache line pad... pun intended. >>>>>> Unfortunately that pad doesn't make sure that the resulting Monitor >>>>>> size is a multiple of the cache line size. >>>>>> >>>>>> Dan >>>>>> >>>>>> >>>>>>> Please review. If will also need a sponsor. >>>>>>> >>>>>>> Thanks and best regards, >>>>>>> Martin >>>>>>> > From daniel.daugherty at oracle.com Tue Oct 11 17:51:10 2016 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 11 Oct 2016 11:51:10 -0600 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: <57FD24E5.2080506@oracle.com> References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> <1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com> <57FD24E5.2080506@oracle.com> Message-ID: Thumbs up on this version! Dan On 10/11/16 11:44 AM, Claes Redestad wrote: > I am also happy with this, thanks! > > /Claes > > On 2016-10-11 19:35, Coleen Phillimore wrote: >> >> I am fine with this change. 
Maybe a one line comment here, something >> like: >> >> // Using Padded subclasses to prevent false sharing of these global >> monitors and mutexes. >> 172 void mutex_init() { >> 173 def(tty_lock , PaddedMutex , >> event, true, Monitor::_safepoint_check_never); // allow >> to lock in VM >> >> >> >> On 10/11/16 12:26 PM, Doerr, Martin wrote: >>> Hi all, >>> >>> I came to the same conclusion regarding inheritance from PaddingEnd. >>> Unfortunately, you're also right, Claes, that we should better not >>> use 0 as minimal padding length because some compilers may have >>> trouble with 0 length arrays. I hope 1 is ok as minimal padding >>> length because the new operator does not allocate cache line aligned >>> at the moment. So I don't see any benefit in more padding. (Padding >>> length of 1 byte has the advantage that it may not enlarge the >>> object size if the previous field leaves some space due to its type.) >>> >>> I believe 2 _LockWord fields on one cache line was basically the >>> problem we wanted to avoid. >>> >>> Here's a new webrev: >>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/ >>> >>> It also enables changing the _name[] field to a pointer or a smaller >>> array. I guess this should better be done in a separate change >>> (jdk10?). >>> >>> Please take a look. >>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>> Sent: Dienstag, 11. Oktober 2016 12:05 >>> To: Coleen Phillimore; Doerr, >>> Martin;daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; >>> David Holmes (david.holmes at oracle.com) >>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>> DEFAULT_CACHE_LINE_SIZE >>> >>> Hi, >>> >>> On 2016-10-11 02:03, Coleen Phillimore wrote: >>>> Hi, >>>> >>>> Was the linear allocation in mutex.cpp the cause of the false sharing >>>> that you observed? 
I think I like this change better than the >>>> original, because I've wondered myself why the name string was so >>>> long. So with this, we could make Monitors smaller if they're >>>> embedded in metadata or other structures. >>> Music to my ears! >>> >>> I even think most embedded uses would see improvements if _name was >>> removed entirely (or "simply" turned into a const char * so that it's >>> not copied and embedded into the Monitor/Mutex) >>> >>>> Thanks, >>>> Coleen >>>> >>>> On 10/10/16 2:00 PM, Doerr, Martin wrote: >>>>> Hi Claes, >>>>> >>>>> thank you very much for your explanations. >>>>> >>>>> I agree with you that it would be better to pad where the Monitors >>>>> are used. It would still fulfill the purpose of this RFE without >>>>> disturbing other usages. >>>>> >>>>> So I could introduce: >>>>> class PaddedMonitor : public Monitor { >>>>> enum { >>>>> CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - >>>>> (int)sizeof(Monitor), >>>>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0 >>>>> }; >>>>> char _padding[PADDING_LEN]; >>>>> }; >>>>> and similarly PaddedMutex and replace all of the ones which get >>>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()). >>> Sure! >>> >>> Some compilers may take issue with cases where PADDING_LEN == 0 (since >>> char _padding[0] is technically illegal C++, but works on gcc etc) so >>> maybe that special case will have to be (somewhat excessively): >>> >>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : >>> DEFAULT_CACHE_LINE_SIZE >>> >>> We took a look at whether it'd be feasible to express class PaddedMonitor : >>> public PaddedEnd<Monitor>, but it appears that'd require variadic >>> template arguments (C++11) to get right (since we'd need PaddedEnd to >>> transitively publish constructors of Monitor). >>> >>> Thanks! >>> >>> /Claes >>> >>>>> Would you agree with this change? 
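Editor's sketch of the footprint remark above (turning _name into a const char * instead of an embedded array). The two structs are hypothetical and much smaller than the real Monitor; they only show how much an embedded 64-byte name costs per object compared with a pointer.

```cpp
#include <cassert>
#include <cstddef>

// Mirrors the 64-byte MONITOR_NAME_LEN quoted in this thread.
const std::size_t kNameLen = 64;

// Hypothetical layout A: the name is copied into every object.
struct EmbeddedName {
  void* _lock_word;
  char  _name[kNameLen];
};

// Hypothetical layout B: the object only refers to a name stored elsewhere.
struct PointerName {
  void*       _lock_word;
  const char* _name;
};

// Bytes saved per object by referring to the name instead of embedding it.
inline std::size_t footprint_saving() {
  return sizeof(EmbeddedName) - sizeof(PointerName);
}
```

The trade-off is lifetime: with the pointer layout the string must outlive the object, which is straightforward for string literals but needs care for names built at run time.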
>>>>> >>>>> Thanks and best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>>>> Sent: Freitag, 7. Oktober 2016 12:35 >>>>> To: Doerr, Martin; >>>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; >>>>> David Holmes (david.holmes at oracle.com); >>>>> Coleen Phillimore (coleen.phillimore at oracle.com) >>>>> >>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>> DEFAULT_CACHE_LINE_SIZE >>>>> >>>>> Hi, >>>>> >>>>> I'm concerned that this might be an easy-but-wrong fix to a complex >>>>> problem, and acknowledging that there are already use cases where the >>>>> _name field is contra-productive. This change adds complexity that >>>>> makes it even less likely such uses will be optimized for in the >>>>> future. >>>>> >>>>> There are Padded* types put in place to deal with these concerns >>>>> explicitly rather than implicitly *where it matters*, which allows us >>>>> the choice of applying padding or not on a per use-case basis (which >>>>> means we can also remove the _name field for those use cases that >>>>> don't >>>>> care about either, which might be most outside of the global lists). >>>>> >>>>> I am very concerned about false sharing, but I have no data to >>>>> support >>>>> that this change has any measurable benefit in practice: I even >>>>> did an >>>>> experiment years ago now where I turned _name into a pointer to >>>>> not pad >>>>> at all and saw nothing exceeding noise levels on any benchmark. >>>>> >>>>> Thanks! >>>>> >>>>> /Claes >>>>> >>>>> On 2016-10-07 12:18, Doerr, Martin wrote: >>>>>> Hi Claes, >>>>>> >>>>>> what the change basically does is that the _name[] field gets >>>>>> enlarged by 8 bytes on platforms with 128 byte >>>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed >>>>>> by the C++ compiler. >>>>>> What exactly is your concern about the footprint overhead? 
>>>>>> Are you not concerned about the risk of false sharing? >>>>>> >>>>>> Best regards, >>>>>> Martin >>>>>> >>>>>> -----Original Message----- >>>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>>>>> Sent: Freitag, 7. Oktober 2016 12:00 >>>>>> To: Doerr, Martin; >>>>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; >>>>>> David Holmes (david.holmes at oracle.com); >>>>>> Coleen Phillimore (coleen.phillimore at oracle.com) >>>>>> >>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>>> DEFAULT_CACHE_LINE_SIZE >>>>>> >>>>>> Hi, >>>>>> >>>>>> after due consideration I strongly consider this change unacceptable >>>>>> since it adds footprint overhead to performance critcial compiler >>>>>> and >>>>>> GC code with little to no data to support this won't cause >>>>>> regressions. >>>>>> >>>>>> Changes to Monitor/Mutex needs to be done with more surgical >>>>>> precision >>>>>> than this. >>>>>> >>>>>> If I do have a veto on the matter, here it is. >>>>>> >>>>>> Thanks! >>>>>> >>>>>> /Claes >>>>>> >>>>>> On 2016-10-07 11:34, Doerr, Martin wrote: >>>>>>> Hi Dan, >>>>>>> >>>>>>> thank you very much for reviewing and for investigating the >>>>>>> history. >>>>>>> >>>>>>> It was not intended to make the functions you mentioned public. >>>>>>> I've fixed that. >>>>>>> I also updated the copyright information. >>>>>>> >>>>>>> New webrev is here: >>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>>>>>> >>>>>>> @Coleen: Please use this one. I have also added reviewer >>>>>>> attribution. >>>>>>> >>>>>>> Thanks and best regards, >>>>>>> Martin >>>>>>> >>>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>>>>>> Sent: Donnerstag, 6. 
Oktober 2016 23:13 >>>>>>> To: Doerr, Martin; >>>>>>> hotspot-runtime-dev at openjdk.java.net >>>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>>>> DEFAULT_CACHE_LINE_SIZE >>>>>>> >>>>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> the current implementation of Monitor padding (mutex.cpp) assumes >>>>>>>> that cache lines are 64 Bytes. There's a platform dependent define >>>>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of >>>>>>>> padding is to avoid false sharing. >>>>>>>> >>>>>>>> My proposed change is here: >>>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>>>>>>> >>>>>>> src/share/vm/runtime/mutex.hpp >>>>>>> Please update the copyright year before pushing. >>>>>>> >>>>>>> L172: // The default length of monitor name is chosen to >>>>>>> avoid >>>>>>> false sharing. >>>>>>> L173: enum { >>>>>>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>>>>>> sizeof(MonitorBase), >>>>>>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? >>>>>>> CACHE_LINE_PADDING : 64 >>>>>>> L176: }; >>>>>>> L177: char _name[MONITOR_NAME_LEN]; // Name of >>>>>>> mutex >>>>>>> >>>>>>> I have to say that I'm not fond of the fact that >>>>>>> MONITOR_NAME_LEN >>>>>>> can vary between platforms; I like that it is a >>>>>>> minimum >>>>>>> of 64 bytes >>>>>>> and is still a constant. >>>>>>> >>>>>>> I'm also not happy that the resulting sizeof(Monitor) >>>>>>> may not >>>>>>> be a multiple >>>>>>> of the DEFAULT_CACHE_LINE_SIZE. However, I have to >>>>>>> mitigate >>>>>>> that unhappiness >>>>>>> with the fact that sizeof(Monitor) hasn't been a >>>>>>> multiple of >>>>>>> the cache line >>>>>>> size since at least 2008 and no one complained (that I >>>>>>> know of). 
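Editor's sketch of the size concern raised above, namely that sizeof(Monitor) may not be a multiple of the cache line size. The helper name pad_to_line_multiple is hypothetical; it only shows the remainder arithmetic needed to round an object size up to the next multiple of a line size.

```cpp
#include <cassert>
#include <cstddef>

// Returns how many trailing pad bytes are needed so that (size + pad) is a
// multiple of line. Returns 0 when size is already a multiple of line.
inline std::size_t pad_to_line_multiple(std::size_t size, std::size_t line) {
  std::size_t rem = size % line;
  return rem == 0 ? 0 : line - rem;
}
```

Note that a size that is a multiple of the line size only prevents neighbouring objects from sharing a line if the first object also starts on a line boundary.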
>>>>>>> >>>>>>> So if I was making this change, I would make >>>>>>> MONITOR_NAME_LEN >>>>>>> 64 bytes >>>>>>> (like it was) and add a pad field that would bring up >>>>>>> sizeof(Monitor) >>>>>>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of >>>>>>> course, >>>>>>> Claes >>>>>>> would be >>>>>>> unhappy with me and anyone embedding a Monitor into >>>>>>> another data >>>>>>> structure would be unhappy with me, but I'm used to >>>>>>> that :-) >>>>>>> >>>>>>> So what you have is fine, especially for JDK9. >>>>>>> >>>>>>> L180: public: >>>>>>> L181: #ifndef PRODUCT >>>>>>> L182: debug_only(static bool contains(Monitor * locks, >>>>>>> Monitor * >>>>>>> lock);) >>>>>>> L183: debug_only(static Monitor * >>>>>>> get_least_ranked_lock(Monitor * >>>>>>> locks);) >>>>>>> L184: debug_only(Monitor * >>>>>>> get_least_ranked_lock_besides_this(Monitor * locks);) >>>>>>> L185: #endif >>>>>>> L186: >>>>>>> L187: void set_owner_implementation(Thread* >>>>>>> owner) PRODUCT_RETURN; >>>>>>> L188: void check_prelock_state (Thread* >>>>>>> thread) PRODUCT_RETURN; >>>>>>> L189: void check_block_state (Thread* thread) >>>>>>> >>>>>>> These were all "protected" before. Now they are >>>>>>> "public". >>>>>>> Any particular reason? >>>>>>> >>>>>>> Thumbs up on the mechanics of this change. I'm interested in the >>>>>>> answer to the "protected" versus "public" question, but don't >>>>>>> considered that query to be a blocker. >>>>>>> >>>>>>> >>>>>>> The rest of this isn't code review, but some of this caught >>>>>>> my attention. >>>>>>> >>>>>>> src/share/vm/runtime/mutex.hpp >>>>>>> >>>>>>> old L84: // The default length of monitor name is >>>>>>> chosen to >>>>>>> be 64 >>>>>>> to avoid false sharing. 
>>>>>>> old L85: static const int MONITOR_NAME_LEN = 64; >>>>>>> >>>>>>> I had to look up the history of this comment: >>>>>>> >>>>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>>>>>> changeset: 55:2a8eb116ebbe >>>>>>> user: xlu >>>>>>> date: Tue Feb 05 23:21:57 2008 -0800 >>>>>>> summary: 6610420: Debug VM crashes during monitor lock rank >>>>>>> checking >>>>>>> >>>>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>>>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>>>>>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 >>>>>>> -0500 >>>>>>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 >>>>>>> -0800 >>>>>>> @@ -82,6 +82,9 @@ class ParkEvent ; >>>>>>> // *in that order*. If their implementations change such >>>>>>> that >>>>>>> these >>>>>>> // assumptions are violated, a whole lot of code will break. >>>>>>> >>>>>>> +// The default length of monitor name is choosen to be 64 to avoid >>>>>>> false sharing. >>>>>>> +static const int MONITOR_NAME_LEN = 64; >>>>>>> + >>>>>>> class Monitor : public CHeapObj { >>>>>>> >>>>>>> public: >>>>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>>>>>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>>>>>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>>>>>> volatile bool _snuck; // Used for sneaky >>>>>>> locking >>>>>>> (evil). >>>>>>> - const char * _name; // Name of mutex >>>>>>> int NotifyCount ; // diagnostic assist >>>>>>> - double pad [8] ; // avoid false sharing >>>>>>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>>>>>> >>>>>>> // Debugging fields for naming, deadlock detection, etc. 
>>>>>>> (some only >>>>>>> used in debug mode) >>>>>>> #ifndef PRODUCT >>>>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>>>>>> int ILocked () ; >>>>>>> >>>>>>> protected: >>>>>>> - static void ClearMonitor (Monitor * m) ; >>>>>>> + static void ClearMonitor (Monitor * m, const char* name = >>>>>>> NULL) ; >>>>>>> Monitor() ; >>>>>>> >>>>>>> So the original code had an 8-double pad for avoiding false >>>>>>> sharing. >>>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at >>>>>>> the >>>>>>> time that Dice determined that 8-double value, the result was to >>>>>>> pad >>>>>>> the size of Monitor to an even multiple of a particular cache line >>>>>>> size. >>>>>>> >>>>>>> Xiobin changed the 'name' field to be an array so that the name >>>>>>> chars could serve double duty as the cache line pad... pun >>>>>>> intended. >>>>>>> Unfortunately that pad doesn't make sure that the resulting Monitor >>>>>>> size is a multiple of the cache line size. >>>>>>> >>>>>>> Dan >>>>>>> >>>>>>> >>>>>>>> Please review. If will also need a sponsor. >>>>>>>> >>>>>>>> Thanks and best regards, >>>>>>>> Martin >>>>>>>> >> From ioi.lam at oracle.com Tue Oct 11 20:14:42 2016 From: ioi.lam at oracle.com (Ioi Lam) Date: Tue, 11 Oct 2016 13:14:42 -0700 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file In-Reply-To: <57FC63AC.3020809@oracle.com> References: <57FC63AC.3020809@oracle.com> Message-ID: <57FD4832.9030705@oracle.com> Looks good. Thanks - Ioi On 10/10/16 8:59 PM, Calvin Cheung wrote: > > Please review this small fix for not including classes in the > classlist file which are unusable during run time. 
> > bug: https://bugs.openjdk.java.net/browse/JDK-8166931 > > webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ > > Testing: > JPRT with -testset hotspot > jtreg tests under hotspot/runtime on all supported platforms (in > progress) > > thanks, > Calvin From calvin.cheung at oracle.com Tue Oct 11 20:46:07 2016 From: calvin.cheung at oracle.com (Calvin Cheung) Date: Tue, 11 Oct 2016 13:46:07 -0700 Subject: RFR(S): 8166931: Do not include classes which are unusable during run time in the classlist file In-Reply-To: <57FD4832.9030705@oracle.com> References: <57FC63AC.3020809@oracle.com> <57FD4832.9030705@oracle.com> Message-ID: <57FD4F8F.1090606@oracle.com> Ioi, Thanks for your review! Calvin On 10/11/16, 1:14 PM, Ioi Lam wrote: > Looks good. Thanks > > - Ioi > > On 10/10/16 8:59 PM, Calvin Cheung wrote: >> >> Please review this small fix for not including classes in the >> classlist file which are unusable during run time. >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8166931 >> >> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/ >> >> Testing: >> JPRT with -testset hotspot >> jtreg tests under hotspot/runtime on all supported platforms (in >> progress) >> >> thanks, >> Calvin > From david.holmes at oracle.com Tue Oct 11 21:11:24 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 12 Oct 2016 07:11:24 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> Message-ID: Hi Dan, Thanks for looking at this. On 12/10/2016 3:30 AM, Daniel D. Daugherty wrote: > On 10/10/16 7:55 PM, David Holmes wrote: >> Turns out the only place changes were needed were in JDI. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >> >> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ > > src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java > No comments. (Thanks for also fixing the typo.) 
> > src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java > L352: if (isNonVirtual(options)) { > L353: if (method.isAbstract()) { > L354: throw new IllegalArgumentException("Abstract > method"); > L355: } > Any particular reason for breaking the logic into two > distinct if-statements? > > Perhaps: > > if (isNonVirtual(options) && method.isAbstract()) { > throw new IllegalArgumentException("Abstract > method"); > } > > Also, perhaps "unexpected Abstract method" is more clear? :) I tried to forestall this comment by saying "use the same format as already present in the class version of the check method. " The code I have now is a copy of what is prior: 312 /* 313 * For nonvirtual invokes, method must have a body 314 */ 315 if (isNonVirtual(options)) { 316 if (method.isAbstract()) { 317 throw new IllegalArgumentException("Abstract method"); 318 } 319 } While I personally prefer the conjunctive form I went for consistency whilst minimizing changes. > test/com/sun/jdi/InterfaceMethodsTest.java > L526: if (t.getClass() != expectedException) { > L527: System.err.println("--- FAILED"); > L528: failure("FAILED: " + t); > L529: return null; > L530: } > You should also report the expectedException value here to > aid in failure analysis. Good point - will update. > Thumbs up! I don't need to see another webrev if you decide to > make the above small tweaks. Great - thanks again. David > Dan > > >> >> The spec change in ObjectReference is very simple and there is a CCC >> request in progress to ratify that change. >> >> The implementation change in ObjectReferenceImpl mirrors the updated >> spec and use the same format as already present in the class version >> of the check method. >> >> The test is a little more complex. This is obviously an extension to >> what is already tested in InterfaceMethodsTest. 
However IMT has a >> number of problems with the way it is currently written [1] - >> specifically it doesn't properly separate method lookup from method >> invocation. So I've added the capability to separate lookup and >> invocation for use with the private interface methods - I have not >> tried to address shortcomings of the existing tests. Though I did fix >> the return value checking logic! And did some clarifying comments and >> renaming in a couple of places. >> >> Still on the test I can't add the negative tests I would like to add >> because they actually pass due to a different long-standing bug in JDI >> - [2]. So the actual private interface method testing is very simple: >> can I get the Method from the InterfaceType for the interface >> declaring the method? Can I then invoke that method on an instance of >> a class that implements the interface? >> >> Thanks, >> David >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 > From mandy.chung at oracle.com Tue Oct 11 21:14:52 2016 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 11 Oct 2016 14:14:52 -0700 Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java needs update for JDK-8162401 Message-ID: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com> Harold, Can you review this test update: diff --git a/test/runtime/modules/IgnoreModulePropertiesTest.java b/test/runtime/modules/IgnoreModulePropertiesTest.java --- a/test/runtime/modules/IgnoreModulePropertiesTest.java +++ b/test/runtime/modules/IgnoreModulePropertiesTest.java @@ -69,8 +69,9 @@ public static void main(String[] args) throws Exception { testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException"); testOption("--limit-modules", "java.sqlx", "jdk.module.limitmods", "java.lang.module.ResolutionException"); - testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "java.lang.RuntimeException"); - 
 testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0", "java.lang.RuntimeException"); + testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "WARNING: Unknown module: xyzz"); + testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0", + "WARNING: package xyzz not in java.base"); testOption("--patch-module", "=d", "jdk.module.patch.0", "IllegalArgumentException"); } } --add-modules is now a repeating option. Should this line: testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException"); be changed to "jdk.module.addmods.0", as in addreads, addexports property? Mandy From daniel.daugherty at oracle.com Tue Oct 11 21:27:52 2016 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 11 Oct 2016 15:27:52 -0600 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> Message-ID: <765b6111-54c6-01b6-7288-6fd4a1578afb@oracle.com> On 10/11/16 3:11 PM, David Holmes wrote: > Hi Dan, > > Thanks for looking at this. > > On 12/10/2016 3:30 AM, Daniel D. Daugherty wrote: >> On 10/10/16 7:55 PM, David Holmes wrote: >>> Turns out the only place changes were needed were in JDI. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >>> >>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >> >> src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java >> No comments. (Thanks for also fixing the typo.) >> >> src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java >> L352: if (isNonVirtual(options)) { >> L353: if (method.isAbstract()) { >> L354: throw new IllegalArgumentException("Abstract >> method"); >> L355: } >> Any particular reason for breaking the logic into two >> distinct if-statements? 
>> >> Perhaps: >> >> if (isNonVirtual(options) && method.isAbstract()) { >> throw new IllegalArgumentException("Abstract >> method"); >> } >> >> Also, perhaps "unexpected Abstract method" is more clear? > > :) I tried to forestall this comment by saying "use the same format as > already present in the class version of the check method. " The code I > have now is a copy of what is prior: > > 312 /* > 313 * For nonvirtual invokes, method must have a body > 314 */ > 315 if (isNonVirtual(options)) { > 316 if (method.isAbstract()) { > 317 throw new IllegalArgumentException("Abstract > method"); > 318 } > 319 } > > While I personally prefer the conjunctive form I went for consistency > whilst minimizing changes. I'm okay with your choice. Dan > >> test/com/sun/jdi/InterfaceMethodsTest.java >> L526: if (t.getClass() != expectedException) { >> L527: System.err.println("--- FAILED"); >> L528: failure("FAILED: " + t); >> L529: return null; >> L530: } >> You should also report the expectedException value here to >> aid in failure analysis. > > Good point - will update. > >> Thumbs up! I don't need to see another webrev if you decide to >> make the above small tweaks. > > Great - thanks again. > > David > >> Dan >> >> >>> >>> The spec change in ObjectReference is very simple and there is a CCC >>> request in progress to ratify that change. >>> >>> The implementation change in ObjectReferenceImpl mirrors the updated >>> spec and use the same format as already present in the class version >>> of the check method. >>> >>> The test is a little more complex. This is obviously an extension to >>> what is already tested in InterfaceMethodsTest. However IMT has a >>> number of problem with the way it is currently written [1] - >>> specifically it doesn't properly separate method lookup from method >>> invocation. 
So I've added the capability to separate lookup and >>> invocation for use with the private interface methods - I have not >>> tried to address shortcomings of the existing tests. Though I did fix >>> the return value checking logic! And did some clarifying comments and >>> renaming in a couple of place. >>> >>> Still on the test I can't add the negative tests I would like to add >>> because they actually pass due to a different long standing bug in JDI >>> - [2]. So the actual private interface method testing is very simple: >>> can I get the Method from the InterfaceType for the interface >>> declaring the method? Can I then invoke that method on an instance of >>> a class that implements the interface. >>> >>> Thanks, >>> David >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 >> From david.holmes at oracle.com Wed Oct 12 02:21:56 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 12 Oct 2016 12:21:56 +1000 Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE In-Reply-To: References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> Message-ID: <10c46800-15d1-fedc-f64f-b8a85e9ef635@oracle.com> Looks good to me too! Only comment is do we want to change this comment: 84 // The default length of monitor name is chosen to be 64 to avoid false sharing. 85 static const int MONITOR_NAME_LEN = 64; and do we even want to change the value here? Thanks, David On 12/10/2016 2:26 AM, Doerr, Martin wrote: > Hi all, > > I came to the same conclusion regarding inheritance from PaddingEnd. > Unfortunately, you're also right, Claes, that we should better not use 0 as minimal padding length because some compilers may have trouble with 0 length arrays. 
> I hope 1 is ok as the minimal padding length because the new operator does not allocate cache-line-aligned memory at the moment. So I don't see any benefit in more padding. (A padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.)
>
> I believe having 2 _LockWord fields on one cache line was basically the problem we wanted to avoid.
>
> Here's a new webrev:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/
>
> It also enables changing the _name[] field to a pointer or a smaller array. I guess this would be better done in a separate change (jdk10?).
>
> Please take a look.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Claes Redestad [mailto:claes.redestad at oracle.com]
> Sent: Tuesday, 11 October 2016 12:05
> To: Coleen Phillimore ; Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com)
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> Hi,
>
> On 2016-10-11 02:03, Coleen Phillimore wrote:
>>
>> Hi,
>>
>> Was the linear allocation in mutex.cpp the cause of the false sharing
>> that you observed? I think I like this change better than the
>> original, because I've wondered myself why the name string was so
>> long. So with this, we could make Monitors smaller if they're
>> embedded in metadata or other structures.
>
> Music to my ears!
>
> I even think most embedded uses would see improvements if _name was
> removed entirely (or "simply" turned into a const char * so that it's
> not copied and embedded into the Monitor/Mutex)
>
>>
>> Thanks,
>> Coleen
>>
>> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>>> Hi Claes,
>>>
>>> thank you very much for your explanations.
>>>
>>> I agree with you that it would be better to pad where the Monitors
>>> are used.
>>> It would still fulfill the purpose of this RFE without
>>> disturbing other usages.
>>>
>>> So I could introduce:
>>> class PaddedMonitor : public Monitor {
>>>   enum {
>>>     CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - (int)sizeof(Monitor),
>>>     PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>>   };
>>>   char _padding[PADDING_LEN];
>>> };
>>> and similarly PaddedMutex and replace all of the ones which get
>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).
>
> Sure!
>
> Some compilers may take issue with cases where PADDING_LEN == 0 (since
> char _padding[0] is technically illegal C++, but works on gcc etc) so
> maybe that special case will have to be (somewhat excessively):
>
> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING :
> DEFAULT_CACHE_LINE_SIZE
>
> We looked at whether it'd be feasible to express class PaddedMonitor :
> public PaddedEnd, but it appears that'd require variadic
> template arguments (C++11) to get right (since we'd need PaddedEnd to
> transitively publish constructors of Monitor).
>
> Thanks!
>
> /Claes
>
>>>
>>> Would you agree with this change?
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Friday, 7 October 2016 12:35
>>> To: Doerr, Martin ;
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>> David Holmes (david.holmes at oracle.com) ;
>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> I'm concerned that this might be an easy-but-wrong fix to a complex
>>> problem, and I acknowledge that there are already use cases where the
>>> _name field is counterproductive. This change adds complexity that
>>> makes it even less likely such uses will be optimized for in the
>>> future.
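The opt-in padding idea quoted above (pad only where Monitors are allocated linearly, and never declare a zero-length array) could be sketched roughly as follows. This is only an illustration under stated assumptions: the Monitor struct is a hypothetical stand-in, PaddedTo and kCacheLineSize are made-up names, and 64 bytes is an assumed line size; HotSpot itself would use the real Monitor class and DEFAULT_CACHE_LINE_SIZE.

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size for this sketch only; HotSpot uses the
// platform-dependent DEFAULT_CACHE_LINE_SIZE define instead.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical stand-in for HotSpot's Monitor; the real class has more
// fields and non-trivial constructors.
struct Monitor {
  volatile std::intptr_t _LockWord;  // the word we want alone on its line
  void*                  _owner;
  const char*            _name;      // pointer instead of an embedded array
};

// Pads T up to at least LineSize bytes. When T already fills a line the
// padding falls back to 1 byte, so the array length is never 0
// (zero-length arrays are not legal C++).
template <typename T, std::size_t LineSize>
struct PaddedTo : public T {
  static constexpr std::size_t kPaddingLen =
      sizeof(T) < LineSize ? LineSize - sizeof(T) : 1;
  char _padding[kPaddingLen];
};

// Only code that allocates monitors back to back opts in to the padded
// type; embedded uses keep the smaller, unpadded Monitor.
using PaddedMonitor = PaddedTo<Monitor, kCacheLineSize>;

static_assert(sizeof(PaddedMonitor) >= kCacheLineSize,
              "a padded monitor must span at least one cache line");
```

Note that padding the size alone does not align the object: two adjacent PaddedMonitor instances cannot have their _LockWord fields on the same line, but an instance may still share a line with a neighbouring object unless the allocation itself is cache-line aligned.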
>>>
>>> There are Padded* types put in place to deal with these concerns
>>> explicitly rather than implicitly *where it matters*, which allows us
>>> the choice of applying padding or not on a per use-case basis (which
>>> means we can also remove the _name field for those use cases that don't
>>> care about either, which might be most outside of the global lists).
>>>
>>> I am very concerned about false sharing, but I have no data to support
>>> that this change has any measurable benefit in practice: I even did an
>>> experiment years ago now where I turned _name into a pointer to not pad
>>> at all and saw nothing exceeding noise levels on any benchmark.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>>> Hi Claes,
>>>>
>>>> what the change basically does is that the _name[] field gets
>>>> enlarged by 8 bytes on platforms with 128 byte
>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed
>>>> by the C++ compiler.
>>>> What exactly is your concern about the footprint overhead?
>>>> Are you not concerned about the risk of false sharing?
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>> -----Original Message-----
>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>> Sent: Friday, 7 October 2016 12:00
>>>> To: Doerr, Martin ;
>>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>>> David Holmes (david.holmes at oracle.com) ;
>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>>
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> Hi,
>>>>
>>>> after due consideration I strongly consider this change unacceptable
>>>> since it adds footprint overhead to performance-critical compiler and
>>>> GC code with little to no data to support this won't cause regressions.
>>>>
>>>> Changes to Monitor/Mutex need to be done with more surgical precision
>>>> than this.
>>>>
>>>> If I do have a veto on the matter, here it is.
>>>> >>>> Thanks! >>>> >>>> /Claes >>>> >>>> On 2016-10-07 11:34, Doerr, Martin wrote: >>>>> Hi Dan, >>>>> >>>>> thank you very much for reviewing and for investigating the history. >>>>> >>>>> It was not intended to make the functions you mentioned public. >>>>> I've fixed that. >>>>> I also updated the copyright information. >>>>> >>>>> New webrev is here: >>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>>>> >>>>> @Coleen: Please use this one. I have also added reviewer attribution. >>>>> >>>>> Thanks and best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>>>> Sent: Donnerstag, 6. Oktober 2016 23:13 >>>>> To: Doerr, Martin ; >>>>> hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>> DEFAULT_CACHE_LINE_SIZE >>>>> >>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>>>> Hi, >>>>>> >>>>>> the current implementation of Monitor padding (mutex.cpp) assumes >>>>>> that cache lines are 64 Bytes. There's a platform dependent define >>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of >>>>>> padding is to avoid false sharing. >>>>>> >>>>>> My proposed change is here: >>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>>>> src/share/vm/runtime/mutex.hpp >>>>> Please update the copyright year before pushing. >>>>> >>>>> L172: // The default length of monitor name is chosen to >>>>> avoid >>>>> false sharing. >>>>> L173: enum { >>>>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>>>> sizeof(MonitorBase), >>>>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? 
>>>>> CACHE_LINE_PADDING : 64 >>>>> L176: }; >>>>> L177: char _name[MONITOR_NAME_LEN]; // Name of >>>>> mutex >>>>> >>>>> I have to say that I'm not fond of the fact that >>>>> MONITOR_NAME_LEN >>>>> can vary between platforms; I like that it is a minimum >>>>> of 64 bytes >>>>> and is still a constant. >>>>> >>>>> I'm also not happy that the resulting sizeof(Monitor) >>>>> may not >>>>> be a multiple >>>>> of the DEFAULT_CACHE_LINE_SIZE. However, I have to >>>>> mitigate >>>>> that unhappiness >>>>> with the fact that sizeof(Monitor) hasn't been a >>>>> multiple of >>>>> the cache line >>>>> size since at least 2008 and no one complained (that I >>>>> know of). >>>>> >>>>> So if I was making this change, I would make >>>>> MONITOR_NAME_LEN >>>>> 64 bytes >>>>> (like it was) and add a pad field that would bring up >>>>> sizeof(Monitor) >>>>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, >>>>> Claes >>>>> would be >>>>> unhappy with me and anyone embedding a Monitor into >>>>> another data >>>>> structure would be unhappy with me, but I'm used to >>>>> that :-) >>>>> >>>>> So what you have is fine, especially for JDK9. >>>>> >>>>> L180: public: >>>>> L181: #ifndef PRODUCT >>>>> L182: debug_only(static bool contains(Monitor * locks, >>>>> Monitor * >>>>> lock);) >>>>> L183: debug_only(static Monitor * >>>>> get_least_ranked_lock(Monitor * >>>>> locks);) >>>>> L184: debug_only(Monitor * >>>>> get_least_ranked_lock_besides_this(Monitor * locks);) >>>>> L185: #endif >>>>> L186: >>>>> L187: void set_owner_implementation(Thread* >>>>> owner) PRODUCT_RETURN; >>>>> L188: void check_prelock_state (Thread* >>>>> thread) PRODUCT_RETURN; >>>>> L189: void check_block_state (Thread* thread) >>>>> >>>>> These were all "protected" before. Now they are "public". >>>>> Any particular reason? >>>>> >>>>> Thumbs up on the mechanics of this change. 
I'm interested in the >>>>> answer to the "protected" versus "public" question, but don't >>>>> considered that query to be a blocker. >>>>> >>>>> >>>>> The rest of this isn't code review, but some of this caught >>>>> my attention. >>>>> >>>>> src/share/vm/runtime/mutex.hpp >>>>> >>>>> old L84: // The default length of monitor name is chosen to >>>>> be 64 >>>>> to avoid false sharing. >>>>> old L85: static const int MONITOR_NAME_LEN = 64; >>>>> >>>>> I had to look up the history of this comment: >>>>> >>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>>>> changeset: 55:2a8eb116ebbe >>>>> user: xlu >>>>> date: Tue Feb 05 23:21:57 2008 -0800 >>>>> summary: 6610420: Debug VM crashes during monitor lock rank >>>>> checking >>>>> >>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>>>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >>>>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >>>>> @@ -82,6 +82,9 @@ class ParkEvent ; >>>>> // *in that order*. If their implementations change such that >>>>> these >>>>> // assumptions are violated, a whole lot of code will break. >>>>> >>>>> +// The default length of monitor name is choosen to be 64 to avoid >>>>> false sharing. >>>>> +static const int MONITOR_NAME_LEN = 64; >>>>> + >>>>> class Monitor : public CHeapObj { >>>>> >>>>> public: >>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>>>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>>>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>>>> volatile bool _snuck; // Used for sneaky >>>>> locking >>>>> (evil). >>>>> - const char * _name; // Name of mutex >>>>> int NotifyCount ; // diagnostic assist >>>>> - double pad [8] ; // avoid false sharing >>>>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>>>> >>>>> // Debugging fields for naming, deadlock detection, etc. 
>>>>> (some only >>>>> used in debug mode) >>>>> #ifndef PRODUCT >>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>>>> int ILocked () ; >>>>> >>>>> protected: >>>>> - static void ClearMonitor (Monitor * m) ; >>>>> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >>>>> Monitor() ; >>>>> >>>>> So the original code had an 8-double pad for avoiding false sharing. >>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the >>>>> time that Dice determined that 8-double value, the result was to pad >>>>> the size of Monitor to an even multiple of a particular cache line >>>>> size. >>>>> >>>>> Xiobin changed the 'name' field to be an array so that the name >>>>> chars could serve double duty as the cache line pad... pun intended. >>>>> Unfortunately that pad doesn't make sure that the resulting Monitor >>>>> size is a multiple of the cache line size. >>>>> >>>>> Dan >>>>> >>>>> >>>>>> Please review. If will also need a sponsor. >>>>>> >>>>>> Thanks and best regards, >>>>>> Martin >>>>>> >> > From serguei.spitsyn at oracle.com Wed Oct 12 02:37:52 2016 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 11 Oct 2016 19:37:52 -0700 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> Message-ID: Hi David, It looks good, thank you for test improvements. One minor comment. http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html 511 private Method testLookup(ReferenceType targetClass, String methodName, String methodSig, 512 boolean declaredOnly, Class expectedException) { 513 514 System.err.println("Looking up " + targetClass.name() + "." + methodName + methodSig); 515 try { 516 Method m = declaredOnly ? 
 517                 lookupDeclaredMethod(targetClass, methodName, methodSig) :
 518                 lookupMethod(targetClass, methodName, methodSig);
 519
 520             if (expectedException == null) {
 521                 System.err.println("--- PASSED");
 522                 return m;
 523             }
 524         }
 525         catch (Throwable t) {
 526             if (t.getClass() != expectedException) {
 527                 System.err.println("--- FAILED");
 528                 failure("FAILED: got exception " + t + " but expected exception "
 529                         + expectedException.getSimpleName());
 530                 return null;
 531             }
 532             else {
 533                 System.err.println("--- PASSED");
 534                 return null;
 535             }
 536         }
 537         System.err.println("--- FAILED");
 538         failure("FAILED: lookup succeeded but expected exception "
 539                 + expectedException.getSimpleName());
 540         return null;
 541     }

It'd be better to keep the fragments 520-523 and 537-540 together, as they are logically bound.
Perhaps it is better to move lines 520-523 to just before L537.

There are more cases in this test that could use testLookup(), but that is probably for future improvements.

Thanks,
Serguei

On 10/10/16 18:55, David Holmes wrote:
> Turns out the only place changes were needed were in JDI.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>
> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>
> The spec change in ObjectReference is very simple and there is a CCC
> request in progress to ratify that change.
>
> The implementation change in ObjectReferenceImpl mirrors the updated
> spec and use the same format as already present in the class version
> of the check method.
>
> The test is a little more complex. This is obviously an extension to
> what is already tested in InterfaceMethodsTest. However IMT has a
> number of problem with the way it is currently written [1] -
> specifically it doesn't properly separate method lookup from method
> invocation. So I've added the capability to separate lookup and
> invocation for use with the private interface methods - I have not
> tried to address shortcomings of the existing tests.
Though I did fix > the return value checking logic! And did some clarifying comments and > renaming in a couple of place. > > Still on the test I can't add the negative tests I would like to add > because they actually pass due to a different long standing bug in JDI > - [2]. So the actual private interface method testing is very simple: > can I get the Method from the InterfaceType for the interface > declaring the method? Can I then invoke that method on an instance of > a class that implements the interface. > > Thanks, > David > > [1] https://bugs.openjdk.java.net/browse/JDK-8166453 > [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From david.holmes at oracle.com Wed Oct 12 02:50:23 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 12 Oct 2016 12:50:23 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> Message-ID: <25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com> Hi Serguei, Thanks for looking at this. On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote: > Hi David, > > It looks good, thank you for test improvements. > > One minor comment. > > http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html > > 511 private Method testLookup(ReferenceType targetClass, String > methodName, String methodSig, > 512 boolean declaredOnly, Class expectedException) { > 513 > 514 System.err.println("Looking up " + targetClass.name() + "." + > methodName + methodSig); > 515 try { > 516 Method m = declaredOnly ? 
> 517 lookupDeclaredMethod(targetClass, methodName, methodSig) : > 518 lookupMethod(targetClass, methodName, methodSig); > 519 > 520 if (expectedException == null) { > 521 System.err.println("--- PASSED"); > 522 return m; > 523 } > 524 } > 525 catch (Throwable t) { > 526 if (t.getClass() != expectedException) { > 527 System.err.println("--- FAILED"); > 528 failure("FAILED: got exception " + t + " but expected exception " > 529 + expectedException.getSimpleName()); > 530 return null; > 531 } > 532 else { > 533 System.err.println("--- PASSED"); > 534 return null; > 535 } > 536 } > 537 System.err.println("--- FAILED"); > 538 failure("FAILED: lookup succeeded but expected exception " > 539 + expectedException.getSimpleName()); > 540 return null; > 541 } > > I'd be better to keep the fragments 520-523 and 537-540 together as > they are logically bound. > Perhaps, it is better to move the 520-523 to move before the L537. You're right - but I prefer to move the code from L537 into an else for the if at L520. Webrev updated in place. > There are more cases to use the testLookup() in this test but it is > probably for future improvements. Yes - see the bugs I linked as [1] and [2]. There are even more bugs related to static interface method handling that impact this test. Bit of a can-of-worms. Thanks, David ----- > > Thanks, > Serguei > > > > On 10/10/16 18:55, David Holmes wrote: >> Turns out the only place changes were needed were in JDI. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >> >> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >> >> The spec change in ObjectReference is very simple and there is a CCC >> request in progress to ratify that change. >> >> The implementation change in ObjectReferenceImpl mirrors the updated >> spec and use the same format as already present in the class version >> of the check method. >> >> The test is a little more complex. 
This is obviously an extension to >> what is already tested in InterfaceMethodsTest. However IMT has a >> number of problem with the way it is currently written [1] - >> specifically it doesn't properly separate method lookup from method >> invocation. So I've added the capability to separate lookup and >> invocation for use with the private interface methods - I have not >> tried to address shortcomings of the existing tests. Though I did fix >> the return value checking logic! And did some clarifying comments and >> renaming in a couple of place. >> >> Still on the test I can't add the negative tests I would like to add >> because they actually pass due to a different long standing bug in JDI >> - [2]. So the actual private interface method testing is very simple: >> can I get the Method from the InterfaceType for the interface >> declaring the method? Can I then invoke that method on an instance of >> a class that implements the interface. >> >> Thanks, >> David >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 > > From serguei.spitsyn at oracle.com Wed Oct 12 03:02:22 2016 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 11 Oct 2016 20:02:22 -0700 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com> Message-ID: On 10/11/16 19:50, David Holmes wrote: > Hi Serguei, > > Thanks for looking at this. > > On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote: >> Hi David, >> >> It looks good, thank you for test improvements. >> >> One minor comment. 
>> >> http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html >> >> >> 511 private Method testLookup(ReferenceType targetClass, String >> methodName, String methodSig, >> 512 boolean declaredOnly, Class expectedException) { >> 513 >> 514 System.err.println("Looking up " + targetClass.name() + "." + >> methodName + methodSig); >> 515 try { >> 516 Method m = declaredOnly ? >> 517 lookupDeclaredMethod(targetClass, methodName, methodSig) : >> 518 lookupMethod(targetClass, methodName, methodSig); >> 519 >> 520 if (expectedException == null) { >> 521 System.err.println("--- PASSED"); >> 522 return m; >> 523 } >> 524 } >> 525 catch (Throwable t) { >> 526 if (t.getClass() != expectedException) { >> 527 System.err.println("--- FAILED"); >> 528 failure("FAILED: got exception " + t + " but expected exception " >> 529 + expectedException.getSimpleName()); >> 530 return null; >> 531 } >> 532 else { >> 533 System.err.println("--- PASSED"); >> 534 return null; >> 535 } >> 536 } >> 537 System.err.println("--- FAILED"); >> 538 failure("FAILED: lookup succeeded but expected exception " >> 539 + expectedException.getSimpleName()); >> 540 return null; >> 541 } >> >> I'd be better to keep the fragments 520-523 and 537-540 together as >> they are logically bound. >> Perhaps, it is better to move the 520-523 to move before the L537. > > You're right - but I prefer to move the code from L537 into an else > for the if at L520. Webrev updated in place. It's up to you. > >> There are more cases to use the testLookup() in this test but it is >> probably for future improvements. > > Yes - see the bugs I linked as [1] and [2]. Right. Perhaps, the it is a part of the JDK-8166453. > > There are even more bugs related to static interface method handling > that impact this test. Bit of a can-of-worms. BTW, would it make sense to consider one more test case ? 
private void testImplementationClass(ReferenceType targetClass, ObjectReference thisObject) { . . . testInvokeNeg(targetClass,thisObject, "privateMethodB", "()I", vm().mirrorOf(RESULT_B), "private interface methods are not inheritable"); Thanks, Serguei > Thanks, David ----- >> Thanks, Serguei On 10/10/16 18:55, David Holmes wrote: >>> Turns out the only place changes were needed were in JDI. Bug: >>> https://bugs.openjdk.java.net/browse/JDK-8165827 webrev: >>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/ The spec change >>> in ObjectReference is very simple and there is a CCC request in >>> progress to ratify that change. The implementation change in >>> ObjectReferenceImpl mirrors the updated spec and use the same format >>> as already present in the class version of the check method. The >>> test is a little more complex. This is obviously an extension to >>> what is already tested in InterfaceMethodsTest. However IMT has a >>> number of problem with the way it is currently written [1] - >>> specifically it doesn't properly separate method lookup from method >>> invocation. So I've added the capability to separate lookup and >>> invocation for use with the private interface methods - I have not >>> tried to address shortcomings of the existing tests. Though I did >>> fix the return value checking logic! And did some clarifying >>> comments and renaming in a couple of place. Still on the test I >>> can't add the negative tests I would like to add because they >>> actually pass due to a different long standing bug in JDI - [2]. So >>> the actual private interface method testing is very simple: can I >>> get the Method from the InterfaceType for the interface declaring >>> the method? Can I then invoke that method on an instance of a class >>> that implements the interface. 
Thanks, David [1] >>> https://bugs.openjdk.java.net/browse/JDK-8166453 [2] >>> https://bugs.openjdk.java.net/browse/JDK-8167416 From david.holmes at oracle.com Wed Oct 12 03:28:37 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 12 Oct 2016 13:28:37 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com> Message-ID: On 12/10/2016 1:02 PM, serguei.spitsyn at oracle.com wrote: > On 10/11/16 19:50, David Holmes wrote: >> Hi Serguei, >> >> Thanks for looking at this. >> >> On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote: >>> Hi David, >>> >>> It looks good, thank you for test improvements. >>> >>> One minor comment. >>> >>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html >>> >>> >>> 511 private Method testLookup(ReferenceType targetClass, String >>> methodName, String methodSig, >>> 512 boolean declaredOnly, Class expectedException) { >>> 513 >>> 514 System.err.println("Looking up " + targetClass.name() + "." + >>> methodName + methodSig); >>> 515 try { >>> 516 Method m = declaredOnly ? 
>>> 517 lookupDeclaredMethod(targetClass, methodName, methodSig) : >>> 518 lookupMethod(targetClass, methodName, methodSig); >>> 519 >>> 520 if (expectedException == null) { >>> 521 System.err.println("--- PASSED"); >>> 522 return m; >>> 523 } >>> 524 } >>> 525 catch (Throwable t) { >>> 526 if (t.getClass() != expectedException) { >>> 527 System.err.println("--- FAILED"); >>> 528 failure("FAILED: got exception " + t + " but expected exception " >>> 529 + expectedException.getSimpleName()); >>> 530 return null; >>> 531 } >>> 532 else { >>> 533 System.err.println("--- PASSED"); >>> 534 return null; >>> 535 } >>> 536 } >>> 537 System.err.println("--- FAILED"); >>> 538 failure("FAILED: lookup succeeded but expected exception " >>> 539 + expectedException.getSimpleName()); >>> 540 return null; >>> 541 } >>> >>> I'd be better to keep the fragments 520-523 and 537-540 together as >>> they are logically bound. >>> Perhaps, it is better to move the 520-523 to move before the L537. >> >> You're right - but I prefer to move the code from L537 into an else >> for the if at L520. Webrev updated in place. > > It's up to you. > > >> >>> There are more cases to use the testLookup() in this test but it is >>> probably for future improvements. >> >> Yes - see the bugs I linked as [1] and [2]. > > Right. Perhaps, the it is a part of the JDK-8166453. > >> >> There are even more bugs related to static interface method handling >> that impact this test. Bit of a can-of-worms. > > > > > BTW, would it make sense to consider one more test case ? > > private void testImplementationClass(ReferenceType targetClass, > ObjectReference thisObject) { > . . . > > testInvokeNeg(targetClass,thisObject, "privateMethodB", "()I", > vm().mirrorOf(RESULT_B), > "private interface methods are not inheritable"); Such a test will presently fail. It will do a lookup of privateInterfaceMethodB from targetClass, which will succeed because of the way the local getMethods is implemented. 
The invocation will then be successful. The real test for the above would be a lookup of the private interface method in the implementation class, but that will succeed when it should not because of bug [2]. Within the current constraints of the test and the JDI implementation only the simple positive tests for private interface methods are possible. Thanks, David > Thanks, Serguei >> Thanks, David ----- >>> Thanks, Serguei On 10/10/16 18:55, David Holmes wrote: >>>> Turns out the only place changes were needed were in JDI. Bug: >>>> https://bugs.openjdk.java.net/browse/JDK-8165827 webrev: >>>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/ The spec change >>>> in ObjectReference is very simple and there is a CCC request in >>>> progress to ratify that change. The implementation change in >>>> ObjectReferenceImpl mirrors the updated spec and use the same format >>>> as already present in the class version of the check method. The >>>> test is a little more complex. This is obviously an extension to >>>> what is already tested in InterfaceMethodsTest. However IMT has a >>>> number of problem with the way it is currently written [1] - >>>> specifically it doesn't properly separate method lookup from method >>>> invocation. So I've added the capability to separate lookup and >>>> invocation for use with the private interface methods - I have not >>>> tried to address shortcomings of the existing tests. Though I did >>>> fix the return value checking logic! And did some clarifying >>>> comments and renaming in a couple of place. Still on the test I >>>> can't add the negative tests I would like to add because they >>>> actually pass due to a different long standing bug in JDI - [2]. So >>>> the actual private interface method testing is very simple: can I >>>> get the Method from the InterfaceType for the interface declaring >>>> the method? Can I then invoke that method on an instance of a class >>>> that implements the interface. 
>>>> Thanks, David [1]
>>>> https://bugs.openjdk.java.net/browse/JDK-8166453 [2]
>>>> https://bugs.openjdk.java.net/browse/JDK-8167416
>

From ioi.lam at oracle.com Wed Oct 12 04:47:48 2016
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 11 Oct 2016 21:47:48 -0700
Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if class is in_error_state
Message-ID: <57FDC074.1070900@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8166203
http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/

Summary:

Kudos to Coleen for noticing the bug.

When dumping the CDS archive, we would throw NoClassDefFoundError inside InstanceKlass::link_class_impl() if the current class is in_error_state. This was only intended to be a convenient way to deal with verification errors during CDS dumping time. However, if the code is executed in normal VM execution time, it would violate the JLS.

The fix is to throw the NoClassDefFoundError only when DumpSharedSpaces==true, to avoid affecting normal VM execution.

Thanks
- Ioi

From david.holmes at oracle.com Wed Oct 12 05:08:38 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 12 Oct 2016 15:08:38 +1000
Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant
Message-ID:

Bug: https://bugs.openjdk.java.net/browse/JDK-8166197

webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/

In IUnlock we have the following succession code to wakeup the "onDeck" thread:

  ParkEvent * List = _EntryList;
  if (List != NULL) {
    // Transfer the head of the EntryList to the OnDeck position.
    // Once OnDeck, a thread stays OnDeck until it acquires the lock.
    // For a given lock there is at most OnDeck thread at any one instant.
  WakeOne:
    assert(List == _EntryList, "invariant");
    ParkEvent * const w = List;
    assert(RelaxAssert || w != Thread::current()->_MutexEvent, "invariant");
    _EntryList = w->ListNext;
    // as a diagnostic measure consider setting w->_ListNext = BAD
    assert(UNS(_OnDeck) == _LBIT, "invariant");
    _OnDeck = w;  // pass OnDeck to w.

It is critical that the update to _EntryList happens before we set _OnDeck, because as soon as _OnDeck is set the selected thread (which need not yet have parked) can acquire the mutex, complete its critical section and proceed to unlock the mutex, and so execute IUnlock in parallel with the original thread. If the write to _EntryList has not yet happened, that second thread finds itself still at the head of _EntryList and so the assert fires. If the write to _EntryList happens after the load "List = _EntryList", then the first assert can also fire.

Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w) with a matching load_acquire(&_OnDeck) in the ILock code:

  while (_OnDeck != ESelf) {
    ParkCommon(ESelf, 0);
  }

and corresponding "raw" lock code.

Also fixed a couple of typos.

Thanks,
David

From serguei.spitsyn at oracle.com Wed Oct 12 05:57:32 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 11 Oct 2016 22:57:32 -0700
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB
In-Reply-To:
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com>
Message-ID:

On 10/11/16 20:28, David Holmes wrote:
> On 12/10/2016 1:02 PM, serguei.spitsyn at oracle.com wrote:
>> On 10/11/16 19:50, David Holmes wrote:
>>> Hi Serguei,
>>>
>>> Thanks for looking at this.
>>>
>>> On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote:
>>>> Hi David,
>>>>
>>>> It looks good, thank you for test improvements.
>>>>
>>>> One minor comment.
>>>>
>>>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html
>>>>
>>>>
>>>>
>>>>  511     private Method testLookup(ReferenceType targetClass, String methodName, String methodSig,
>>>>  512                               boolean declaredOnly, Class expectedException) {
>>>>  513
>>>>  514         System.err.println("Looking up " + targetClass.name() + "." + methodName + methodSig);
>>>>  515         try {
>>>>  516             Method m = declaredOnly ?
>>>>  517                 lookupDeclaredMethod(targetClass, methodName, methodSig) :
>>>>  518                 lookupMethod(targetClass, methodName, methodSig);
>>>>  519
>>>>  520             if (expectedException == null) {
>>>>  521                 System.err.println("--- PASSED");
>>>>  522                 return m;
>>>>  523             }
>>>>  524         }
>>>>  525         catch (Throwable t) {
>>>>  526             if (t.getClass() != expectedException) {
>>>>  527                 System.err.println("--- FAILED");
>>>>  528                 failure("FAILED: got exception " + t + " but expected exception "
>>>>  529                         + expectedException.getSimpleName());
>>>>  530                 return null;
>>>>  531             }
>>>>  532             else {
>>>>  533                 System.err.println("--- PASSED");
>>>>  534                 return null;
>>>>  535             }
>>>>  536         }
>>>>  537         System.err.println("--- FAILED");
>>>>  538         failure("FAILED: lookup succeeded but expected exception "
>>>>  539                 + expectedException.getSimpleName());
>>>>  540         return null;
>>>>  541     }
>>>>
>>>> It'd be better to keep the fragments 520-523 and 537-540 together as
>>>> they are logically bound.
>>>> Perhaps it is better to move 520-523 before L537.
>>>
>>> You're right - but I prefer to move the code from L537 into an else
>>> for the if at L520. Webrev updated in place.
>>
>> It's up to you.
>>
>>
>>>
>>>> There are more cases to use the testLookup() in this test but it is
>>>> probably for future improvements.
>>>
>>> Yes - see the bugs I linked as [1] and [2].
>>
>> Right. Perhaps it is a part of JDK-8166453.
>>
>>>
>>> There are even more bugs related to static interface method handling
>>> that impact this test. Bit of a can-of-worms.
>> >> >> >> >> BTW, would it make sense to consider one more test case ? >> >> private void testImplementationClass(ReferenceType targetClass, >> ObjectReference thisObject) { >> . . . >> >> testInvokeNeg(targetClass,thisObject, "privateMethodB", "()I", >> vm().mirrorOf(RESULT_B), >> "private interface methods are not inheritable"); > > Such a test will presently fail. It will do a lookup of > privateInterfaceMethodB from targetClass, which will succeed because > of the way the local getMethods is implemented. The invocation will > then be successful. The real test for the above would be a lookup of > the private interface method in the implementation class, but that > will succeed when it should not because of bug [2]. > > Within the current constraints of the test and the JDI implementation > only the simple positive tests for private interface methods are possible. Got it, thanks. Thanks, Serguei > > Thanks, > David > >> Thanks, Serguei >>> Thanks, David ----- >>>> Thanks, Serguei On 10/10/16 18:55, David Holmes wrote: >>>>> Turns out the only place changes were needed were in JDI. Bug: >>>>> https://bugs.openjdk.java.net/browse/JDK-8165827 webrev: >>>>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/ The spec change >>>>> in ObjectReference is very simple and there is a CCC request in >>>>> progress to ratify that change. The implementation change in >>>>> ObjectReferenceImpl mirrors the updated spec and use the same format >>>>> as already present in the class version of the check method. The >>>>> test is a little more complex. This is obviously an extension to >>>>> what is already tested in InterfaceMethodsTest. However IMT has a >>>>> number of problem with the way it is currently written [1] - >>>>> specifically it doesn't properly separate method lookup from method >>>>> invocation. 
So I've added the capability to separate lookup and >>>>> invocation for use with the private interface methods - I have not >>>>> tried to address shortcomings of the existing tests. Though I did >>>>> fix the return value checking logic! And did some clarifying >>>>> comments and renaming in a couple of place. Still on the test I >>>>> can't add the negative tests I would like to add because they >>>>> actually pass due to a different long standing bug in JDI - [2]. So >>>>> the actual private interface method testing is very simple: can I >>>>> get the Method from the InterfaceType for the interface declaring >>>>> the method? Can I then invoke that method on an instance of a class >>>>> that implements the interface. Thanks, David [1] >>>>> https://bugs.openjdk.java.net/browse/JDK-8166453 [2] >>>>> https://bugs.openjdk.java.net/browse/JDK-8167416 >> From david.holmes at oracle.com Wed Oct 12 05:58:59 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 12 Oct 2016 15:58:59 +1000 Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if class is in_error_state In-Reply-To: <57FDC074.1070900@oracle.com> References: <57FDC074.1070900@oracle.com> Message-ID: Hi Ioi, On 12/10/2016 2:47 PM, Ioi Lam wrote: > https://bugs.openjdk.java.net/browse/JDK-8166203 > http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/ > > Summary: > > Kudos to Coleen for noticing the bug. > > When dumping the CDS archive, we would throw NoClassDefFoundError inside > InstanceKlass::link_class_impl() if the current class is in_error_state. > This was only intended to be a convenient way to deal with verification > errors during CDS dumping time. However, if the code is executed in > normal VM execution time, it would violate the JLS. > > The fix is to throw the NoClassDefFoundError only when > DumpSharedSpaces==true, to avoid affecting normal VM execution. Fix looks fine. Test change is somewhat confusing. What bug does this still refer to? 

 160         try {
 161             boolean bb = Iunlinked.v;
 162         } catch(NoClassDefFoundError e) {
 163             System.out.println("NoClassDefFoundError thrown because of bug");
 164         }

Either the try block should complete exceptionally or the catch block,
to indicate a failure.

Thanks,
David

> Thanks
> - Ioi
>

From shafi.s.ahmad at oracle.com Wed Oct 12 07:12:17 2016
From: shafi.s.ahmad at oracle.com (Shafi Ahmad)
Date: Wed, 12 Oct 2016 00:12:17 -0700 (PDT)
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'
In-Reply-To:
References:
Message-ID: <5eb7474b-a72e-41c0-b389-bfad82270f18@default>

Hi Mikael,

Thanks for reviewing it.

Once the VM is initialized, there are two OOME scenarios:
1) OOME due to unavailability of Java memory [mainly due to the Java application].
2) OOME due to unavailability of native memory.

Please correct me if I am wrong, but for 1) java.lang.OutOfMemoryError is correct.

Consider the following scenarios:
1) A Java application uses JNI code, the JNI code allocates and frees native memory, and we hit OOME.
2) A Java application uses JNI code, the JNI code has a memory leak, and this leads to an OOME situation.
3) We use the JVM options -Xms and -Xmx in such a way that very little native memory is available and the VM hits OOME.

I am not sure whether the above scenarios are feasible in the JVM, but if any of them is possible in the VM, should we consider it an OOME due to the Java application or not?
I consider cases 1) and 2) as OOME due to the Java application and added code for java.lang.OutOfMemoryError inside report_vm_out_of_memory.

My assumption that an OOME after the VM is completely initialized is due to the Java application [directly or indirectly] may not always hold.
-XX:+CrashOnOutOfMemoryError is mostly for debugging purposes, so I added the related code change inside report_vm_out_of_memory.
Yes, I must not use 'java.lang.OutOfMemoryError' for such a case.
Please let me know whether I should remove the code change inside report_vm_out_of_memory or keep it by adding appropriate reason of OutOfMemoryError. Regards, Shafi > -----Original Message----- > From: Mikael Gerdin > Sent: Monday, October 10, 2016 7:30 PM > To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't > work for OOM caused by inability to create threads' > > Hi, > > On 2016-10-10 09:24, Shafi Ahmad wrote: > > Hi All, > > > > Please review the simple change for the fix of bug '' JDK-8155004: > CrashOnOutOfMemoryError doesn't work for OOM caused by inability to > create threads'. > > > > Summary: > > In the current implementation there are few scenarios where we are not > obeying the jvm option -XX:+CrashOnOutOfMemoryError. > > While I was analysis this issue I found there are two jvm state where OOM > can happen: > > 1. OOM during VM initialization - as per our internal discussion for this case > it is not worth for dumping core file, so this is left as it is. > > 2. OOM once VM is initialized - For this scenario most of the place code is > already added but few place corresponding code changes are missing so this > change covers it. > > > > Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ > > > There is a lot of confusion in the VM code with the term "out of memory > error". > In some places it refers to code throwing a java.lang.OutOfMemoryError and > expecting running java code to be able to potentially catch that Error and > continue running. > > In other places, such as callers of report_vm_out_of_memory, the situation > is much more dire and the calling thread may not even be a JavaThread and > as such cannot "throw" an exception. > report_vm_out_of_memory is only invoked through the macro > vm_exit_out_of_memory, which of course implies that the condition is fatal > and we are about to terminate the JVM process altogether. 
> > I think that it's incorrect to call code related to java.lang.OutOfMemoryError > in report_vm_out_of_memory since the condition may not even be > correlated with Java level application behavior. > > /Mikael > > > Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 > > > > Testing: jprt and jtreg (on Linux x86_64) > > > > Regards, > > Shafi > > From serguei.spitsyn at oracle.com Wed Oct 12 08:08:04 2016 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 12 Oct 2016 01:08:04 -0700 Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if class is in_error_state In-Reply-To: References: <57FDC074.1070900@oracle.com> Message-ID: Hi Ioi, The fix looks good to me. But I agree with David below that the catch statement is somewhat confusing. The test needs to fail in such a case with a message like "Unexpected NoClassDefFoundError <...>". Thanks, Serguei On 10/11/16 22:58, David Holmes wrote: > Hi Ioi, > > On 12/10/2016 2:47 PM, Ioi Lam wrote: >> https://bugs.openjdk.java.net/browse/JDK-8166203 >> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/ >> >> Summary: >> >> Kudos to Coleen for noticing the bug. >> >> When dumping the CDS archive, we would throw NoClassDefFoundError inside >> InstanceKlass::link_class_impl() if the current class is in_error_state. >> This was only intended to be a convenient way to deal with verification >> errors during CDS dumping time. However, if the code is executed in >> normal VM execution time, it would violate the JLS. >> >> The fix is to throw the NoClassDefFoundError only when >> DumpSharedSpaces==true, to avoid affecting normal VM execution. > > Fix looks fine. > > Test change is somewhat confusing. What bug does this still refer to? 
>
> 160         try {
> 161             boolean bb = Iunlinked.v;
> 162         } catch(NoClassDefFoundError e) {
> 163             System.out.println("NoClassDefFoundError thrown because of bug");
> 164         }
>
> Either the try block should complete exceptionally or the catch block,
> to indicate a failure.
>
> Thanks,
> David
>
>
>> Thanks
>> - Ioi
>>

From martin.doerr at sap.com Wed Oct 12 08:53:01 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 12 Oct 2016 08:53:01 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <10c46800-15d1-fedc-f64f-b8a85e9ef635@oracle.com>
References: <57F77202.8070201@oracle.com> <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap> <57F77A4B.6060604@oracle.com> <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap> <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com> <10c46800-15d1-fedc-f64f-b8a85e9ef635@oracle.com>
Message-ID:

Thanks everybody for reviewing.

The webrev with additional comments is here:
http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.04/

I have added a TODO to check if the _name[] array should better get replaced by a const char*.
Would you like me to open a new bug for jdk 10 so we have a reminder?

Thank you very much for sponsoring, Coleen.

Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com]
Sent: Mittwoch, 12. Oktober 2016 04:22
To: Doerr, Martin ; Claes Redestad ; Coleen Phillimore ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Looks good to me too!

Only comment is do we want to change this comment:

  84 // The default length of monitor name is chosen to be 64 to avoid false sharing.
  85 static const int MONITOR_NAME_LEN = 64;

and do we even want to change the value here?
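
A minimal sketch of the padding approach discussed in this thread, not the code from the webrev: a subclass adds trailing bytes so that each object fills a whole cache line, with a minimum of one byte because a zero-length array is not valid C++. DemoMonitor, PaddedDemoMonitor, DEMO_CACHE_LINE_SIZE and fills_whole_cache_lines() are hypothetical stand-ins for Monitor, its padded variant and DEFAULT_CACHE_LINE_SIZE; the value 128 is assumed only for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the platform-dependent DEFAULT_CACHE_LINE_SIZE.
static const std::size_t DEMO_CACHE_LINE_SIZE = 128;

// Hypothetical stand-in for Monitor: a lock word, an owner and an embedded name.
struct DemoMonitor {
  volatile long lock_word;
  void*         owner;
  char          name[64];
};

// Adds trailing padding so that a DemoMonitor fills a whole cache line.
// The minimum is 1 byte because a zero-length array is not valid C++.
struct PaddedDemoMonitor : public DemoMonitor {
  enum {
    CACHE_LINE_PADDING = (int)DEMO_CACHE_LINE_SIZE - (int)sizeof(DemoMonitor),
    PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 1
  };
  char padding[PADDING_LEN];
};

// Returns true when the padded size is a multiple of the assumed cache line
// size, so that elements of a cache-line-aligned array never share a line.
inline bool fills_whole_cache_lines() {
  return sizeof(PaddedDemoMonitor) % DEMO_CACHE_LINE_SIZE == 0;
}
```

Padding of this kind only separates neighbouring objects if the allocation itself starts on a cache line boundary, which, as noted elsewhere in the thread, operator new does not currently guarantee.
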
Thanks, David On 12/10/2016 2:26 AM, Doerr, Martin wrote: > Hi all, > > I came to the same conclusion regarding inheritance from PaddingEnd. > Unfortunately, you're also right, Claes, that we should better not use 0 as minimal padding length because some compilers may have trouble with 0 length arrays. I hope 1 is ok as minimal padding length because the new operator does not allocate cache line aligned at the moment. So I don't see any benefit in more padding. (Padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.) > > I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid. > > Here's a new webrev: > http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/ > > It also enables changing the _name[] field to a pointer or a smaller array. I guess this should better be done in a separate change (jdk10?). > > Please take a look. > > Thanks and best regards, > Martin > > > -----Original Message----- > From: Claes Redestad [mailto:claes.redestad at oracle.com] > Sent: Dienstag, 11. Oktober 2016 12:05 > To: Coleen Phillimore ; Doerr, Martin ; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) > Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE > > Hi, > > On 2016-10-11 02:03, Coleen Phillimore wrote: >> >> Hi, >> >> Was the linear allocation in mutex.cpp the cause of the false sharing >> that you observed? I think I like this change better than the >> original, because I've wondered myself why the name string was so >> long. So with this, we could make Monitor's smaller if they're >> embedded in metadata or other structures. > > Music to my ears! 
> > I even think most embedded uses would see improvements if _name was > removed entirely (or "simply" turned into a const char * so that it's > not copied and embedded into the Monitor/Mutex) > >> >> Thanks, >> Coleen >> >> On 10/10/16 2:00 PM, Doerr, Martin wrote: >>> Hi Claes, >>> >>> thank you very much for your explanations. >>> >>> I agree with you that it would be better to pad where the Monitors >>> are used. It would still fulfill the purpose of this RFE without >>> disturbing other usages. >>> >>> So I could introduce: >>> class PaddedMonitor : public Monitor { >>> enum { >>> CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - >>> (int)sizeof(Monitor), >>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0 >>> }; >>> char _padding[PADDING_LEN]; >>> }; >>> and similarly PaddedMutex and replace all of the ones which get >>> allocated in a linear fashion (mutexLocker.cpp mutex_init()). > > Sure! > > Some compilers may take issue with cases where PADDING_LEN == 0 (since > char _padding[0] is technically illegal C++, but works on gcc etc) so > maybe that special case will have to be (somewhat excessively): > > PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : > DEFAULT_CACHE_LINE_SIZE > > We took a look at if it'd be feasible to express class PaddedMonitor : > public PaddedEnd, but it appears that'd require variadic > template arguments (C++11) to get right (since we'd need PaddedEnd to > transitively publish constructors of Monitor). > > Thanks! > > /Claes > >>> >>> Would you agree with this change? >>> >>> Thanks and best regards, >>> Martin >>> >>> >>> -----Original Message----- >>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>> Sent: Freitag, 7. 
Oktober 2016 12:35 >>> To: Doerr, Martin ; >>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >>> David Holmes (david.holmes at oracle.com) ; >>> Coleen Phillimore (coleen.phillimore at oracle.com) >>> >>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>> DEFAULT_CACHE_LINE_SIZE >>> >>> Hi, >>> >>> I'm concerned that this might be an easy-but-wrong fix to a complex >>> problem, and acknowledging that there are already use cases where the >>> _name field is contra-productive. This change adds complexity that >>> makes it even less likely such uses will be optimized for in the >>> future. >>> >>> There are Padded* types put in place to deal with these concerns >>> explicitly rather than implicitly *where it matters*, which allows us >>> the choice of applying padding or not on a per use-case basis (which >>> means we can also remove the _name field for those use cases that don't >>> care about either, which might be most outside of the global lists). >>> >>> I am very concerned about false sharing, but I have no data to support >>> that this change has any measurable benefit in practice: I even did an >>> experiment years ago now where I turned _name into a pointer to not pad >>> at all and saw nothing exceeding noise levels on any benchmark. >>> >>> Thanks! >>> >>> /Claes >>> >>> On 2016-10-07 12:18, Doerr, Martin wrote: >>>> Hi Claes, >>>> >>>> what the change basically does is that the _name[] field gets >>>> enlarged by 8 bytes on platforms with 128 byte >>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed >>>> by the C++ compiler. >>>> What exactly is your concern about the footprint overhead? >>>> Are you not concerned about the risk of false sharing? >>>> >>>> Best regards, >>>> Martin >>>> >>>> -----Original Message----- >>>> From: Claes Redestad [mailto:claes.redestad at oracle.com] >>>> Sent: Freitag, 7. 
Oktober 2016 12:00 >>>> To: Doerr, Martin ; >>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; >>>> David Holmes (david.holmes at oracle.com) ; >>>> Coleen Phillimore (coleen.phillimore at oracle.com) >>>> >>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>> DEFAULT_CACHE_LINE_SIZE >>>> >>>> Hi, >>>> >>>> after due consideration I strongly consider this change unacceptable >>>> since it adds footprint overhead to performance critcial compiler and >>>> GC code with little to no data to support this won't cause regressions. >>>> >>>> Changes to Monitor/Mutex needs to be done with more surgical precision >>>> than this. >>>> >>>> If I do have a veto on the matter, here it is. >>>> >>>> Thanks! >>>> >>>> /Claes >>>> >>>> On 2016-10-07 11:34, Doerr, Martin wrote: >>>>> Hi Dan, >>>>> >>>>> thank you very much for reviewing and for investigating the history. >>>>> >>>>> It was not intended to make the functions you mentioned public. >>>>> I've fixed that. >>>>> I also updated the copyright information. >>>>> >>>>> New webrev is here: >>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/ >>>>> >>>>> @Coleen: Please use this one. I have also added reviewer attribution. >>>>> >>>>> Thanks and best regards, >>>>> Martin >>>>> >>>>> >>>>> -----Original Message----- >>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] >>>>> Sent: Donnerstag, 6. Oktober 2016 23:13 >>>>> To: Doerr, Martin ; >>>>> hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to >>>>> DEFAULT_CACHE_LINE_SIZE >>>>> >>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote: >>>>>> Hi, >>>>>> >>>>>> the current implementation of Monitor padding (mutex.cpp) assumes >>>>>> that cache lines are 64 Bytes. There's a platform dependent define >>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of >>>>>> padding is to avoid false sharing. 
>>>>>> >>>>>> My proposed change is here: >>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ >>>>> src/share/vm/runtime/mutex.hpp >>>>> Please update the copyright year before pushing. >>>>> >>>>> L172: // The default length of monitor name is chosen to >>>>> avoid >>>>> false sharing. >>>>> L173: enum { >>>>> L174: CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - >>>>> sizeof(MonitorBase), >>>>> L175: MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? >>>>> CACHE_LINE_PADDING : 64 >>>>> L176: }; >>>>> L177: char _name[MONITOR_NAME_LEN]; // Name of >>>>> mutex >>>>> >>>>> I have to say that I'm not fond of the fact that >>>>> MONITOR_NAME_LEN >>>>> can vary between platforms; I like that it is a minimum >>>>> of 64 bytes >>>>> and is still a constant. >>>>> >>>>> I'm also not happy that the resulting sizeof(Monitor) >>>>> may not >>>>> be a multiple >>>>> of the DEFAULT_CACHE_LINE_SIZE. However, I have to >>>>> mitigate >>>>> that unhappiness >>>>> with the fact that sizeof(Monitor) hasn't been a >>>>> multiple of >>>>> the cache line >>>>> size since at least 2008 and no one complained (that I >>>>> know of). >>>>> >>>>> So if I was making this change, I would make >>>>> MONITOR_NAME_LEN >>>>> 64 bytes >>>>> (like it was) and add a pad field that would bring up >>>>> sizeof(Monitor) >>>>> to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, >>>>> Claes >>>>> would be >>>>> unhappy with me and anyone embedding a Monitor into >>>>> another data >>>>> structure would be unhappy with me, but I'm used to >>>>> that :-) >>>>> >>>>> So what you have is fine, especially for JDK9. 
>>>>> >>>>> L180: public: >>>>> L181: #ifndef PRODUCT >>>>> L182: debug_only(static bool contains(Monitor * locks, >>>>> Monitor * >>>>> lock);) >>>>> L183: debug_only(static Monitor * >>>>> get_least_ranked_lock(Monitor * >>>>> locks);) >>>>> L184: debug_only(Monitor * >>>>> get_least_ranked_lock_besides_this(Monitor * locks);) >>>>> L185: #endif >>>>> L186: >>>>> L187: void set_owner_implementation(Thread* >>>>> owner) PRODUCT_RETURN; >>>>> L188: void check_prelock_state (Thread* >>>>> thread) PRODUCT_RETURN; >>>>> L189: void check_block_state (Thread* thread) >>>>> >>>>> These were all "protected" before. Now they are "public". >>>>> Any particular reason? >>>>> >>>>> Thumbs up on the mechanics of this change. I'm interested in the >>>>> answer to the "protected" versus "public" question, but don't >>>>> considered that query to be a blocker. >>>>> >>>>> >>>>> The rest of this isn't code review, but some of this caught >>>>> my attention. >>>>> >>>>> src/share/vm/runtime/mutex.hpp >>>>> >>>>> old L84: // The default length of monitor name is chosen to >>>>> be 64 >>>>> to avoid false sharing. >>>>> old L85: static const int MONITOR_NAME_LEN = 64; >>>>> >>>>> I had to look up the history of this comment: >>>>> >>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp >>>>> changeset: 55:2a8eb116ebbe >>>>> user: xlu >>>>> date: Tue Feb 05 23:21:57 2008 -0800 >>>>> summary: 6610420: Debug VM crashes during monitor lock rank >>>>> checking >>>>> >>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp >>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp >>>>> --- a/src/share/vm/runtime/mutex.hpp Thu Jan 31 14:56:50 2008 -0500 >>>>> +++ b/src/share/vm/runtime/mutex.hpp Tue Feb 05 23:21:57 2008 -0800 >>>>> @@ -82,6 +82,9 @@ class ParkEvent ; >>>>> // *in that order*. If their implementations change such that >>>>> these >>>>> // assumptions are violated, a whole lot of code will break. 
>>>>> >>>>> +// The default length of monitor name is choosen to be 64 to avoid >>>>> false sharing. >>>>> +static const int MONITOR_NAME_LEN = 64; >>>>> + >>>>> class Monitor : public CHeapObj { >>>>> >>>>> public: >>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj { >>>>> volatile intptr_t _WaitLock [1] ; // Protects _WaitSet >>>>> ParkEvent * volatile _WaitSet ; // LL of ParkEvents >>>>> volatile bool _snuck; // Used for sneaky >>>>> locking >>>>> (evil). >>>>> - const char * _name; // Name of mutex >>>>> int NotifyCount ; // diagnostic assist >>>>> - double pad [8] ; // avoid false sharing >>>>> + char _name[MONITOR_NAME_LEN]; // Name of mutex >>>>> >>>>> // Debugging fields for naming, deadlock detection, etc. >>>>> (some only >>>>> used in debug mode) >>>>> #ifndef PRODUCT >>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj { >>>>> int ILocked () ; >>>>> >>>>> protected: >>>>> - static void ClearMonitor (Monitor * m) ; >>>>> + static void ClearMonitor (Monitor * m, const char* name = NULL) ; >>>>> Monitor() ; >>>>> >>>>> So the original code had an 8-double pad for avoiding false sharing. >>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the >>>>> time that Dice determined that 8-double value, the result was to pad >>>>> the size of Monitor to an even multiple of a particular cache line >>>>> size. >>>>> >>>>> Xiobin changed the 'name' field to be an array so that the name >>>>> chars could serve double duty as the cache line pad... pun intended. >>>>> Unfortunately that pad doesn't make sure that the resulting Monitor >>>>> size is a multiple of the cache line size. >>>>> >>>>> Dan >>>>> >>>>> >>>>>> Please review. If will also need a sponsor. 
>>>>>> >>>>>> Thanks and best regards, >>>>>> Martin >>>>>> >> > From coleen.phillimore at oracle.com Wed Oct 12 12:17:52 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 12 Oct 2016 08:17:52 -0400 Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if class is in_error_state In-Reply-To: References: <57FDC074.1070900@oracle.com> Message-ID: <3979349a-edb5-16be-af19-33c2aa808209@oracle.com> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/test/runtime/lambda-features/InterfaceInitializationStates.java.udiff.html Same comment as the others. Just take out the try block for Iunlinked. Thank you for fixing this! Coleen On 10/12/16 4:08 AM, serguei.spitsyn at oracle.com wrote: > Hi Ioi, > > The fix looks good to me. > But I agree with David below that the catch statement is somewhat > confusing. > The test needs to fail in such a case with a message like "Unexpected > NoClassDefFoundError <...>". > > Thanks, > Serguei > > > > On 10/11/16 22:58, David Holmes wrote: >> Hi Ioi, >> >> On 12/10/2016 2:47 PM, Ioi Lam wrote: >>> https://bugs.openjdk.java.net/browse/JDK-8166203 >>> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/ >>> >>> Summary: >>> >>> Kudos to Coleen for noticing the bug. >>> >>> When dumping the CDS archive, we would throw NoClassDefFoundError >>> inside >>> InstanceKlass::link_class_impl() if the current class is >>> in_error_state. >>> This was only intended to be a convenient way to deal with verification >>> errors during CDS dumping time. However, if the code is executed in >>> normal VM execution time, it would violate the JLS. >>> >>> The fix is to throw the NoClassDefFoundError only when >>> DumpSharedSpaces==true, to avoid affecting normal VM execution. >> >> Fix looks fine. >> >> Test change is somewhat confusing. What bug does this still refer to? 
>>
>> 160         try {
>> 161             boolean bb = Iunlinked.v;
>> 162         } catch(NoClassDefFoundError e) {
>> 163             System.out.println("NoClassDefFoundError thrown because of bug");
>> 164         }
>>
>> Either the try block should complete exceptionally or the catch
>> block, to indicate a failure.
>>
>> Thanks,
>> David
>>
>>
>>> Thanks
>>> - Ioi
>>>
>

From varming at gmail.com Wed Oct 12 13:21:44 2016
From: varming at gmail.com (Carsten Varming)
Date: Wed, 12 Oct 2016 09:21:44 -0400
Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant
In-Reply-To:
References:
Message-ID:

Dear David,

In line 590 "Pass onDeck to w". I don't understand this part of the
comment. Should it say something like "Pass ownership of _OnDeck and
_EntryList to w"?

In line 532 there is a read of _OnDeck and in line 553 there is a read of
_EntryList. Why is it safe not to do a load_acquire of _OnDeck in line 532?

Carsten

On Wed, Oct 12, 2016 at 1:08 AM, David Holmes wrote:

> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>
> In IUnlock we have the following succession code to wakeup the "onDeck"
> thread:
>
>   ParkEvent * List = _EntryList;
>   if (List != NULL) {
>     // Transfer the head of the EntryList to the OnDeck position.
>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>     // For a given lock there is at most OnDeck thread at any one instant.
>    WakeOne:
>     assert(List == _EntryList, "invariant");
>     ParkEvent * const w = List;
>     assert(RelaxAssert || w != Thread::current()->_MutexEvent,
> "invariant");
>     _EntryList = w->ListNext;
>     // as a diagnostic measure consider setting w->_ListNext = BAD
>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>     _OnDeck = w; // pass OnDeck to w.
>
> It is critical that the update to _EntryList happens before we set
> _OnDeck, as as soon as _OnDeck is set the selected thread (which need not
> yet have parked) can acquire the mutex, complete its critical section and
> proceed to unlock the mutex, and so execute IUnlock in parallel with the
> original thread. If the write to _EntryList has not yet happened that
> second thread finds itself still at the head of _EntryList and so the
> assert fires. If the write to _EntryList happens after the load "List =
> _EntryList", then the first assert can also fire.
>
> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
> with a matching load_acquire(&_OnDeck) in the ILock code:
>
>     while (_OnDeck != ESelf) {
>       ParkCommon(ESelf, 0);
>     }
>
> and corresponding "raw" lock code. Also fixed a couple of typos.
>
> Thanks,
> David
>

From daniel.daugherty at oracle.com Wed Oct 12 15:03:50 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 12 Oct 2016 09:03:50 -0600
Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant
In-Reply-To:
References:
Message-ID:

On 10/11/16 11:08 PM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/

Very nice catch! We should check the ObjectMonitor succession
code for similar issues (my task).

src/share/vm/runtime/mutex.cpp
    L466:   if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, UNS(ESelf)) == 0) {
        Thanks for fixing this bug also!

    L477:   while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) {
        So you've changed this load of _OnDeck to use load-acquire
        which matches the new store-release on L595:

            OrderAccess::release_store_ptr(&_OnDeck, w);

        What about the other loads of _OnDeck or stores to _OnDeck?
        There should at least be a new comment explaining why we
        don't need an OrderAccess operation for those.

        Update: I see you changed one other load of _OnDeck on L1061.
        Now I'm really wanting comments for the other _OnDeck loads
        and stores. :-)

        Update: I see Carsten V. asked about this in a slightly
        different way.

    L590:     // Pass onDeck to w, ensuring that _EntryList has been set first.
        Typo: 'onDeck' -> 'OnDeck'

        I suspect you don't want to fix all this CamelCase usage to
        meet HotSpot style. I did that for most of the ObjectMonitor
        code and it was painful. We could clean it up early in JDK10.

        Update: I see Carsten has a comment about this comment also.
        I don't think I quite agree that we're "passing" _EntryList
        to w, but I can be convinced otherwise...

Again, very nice catch! I'd like to see another webrev with the
other _OnDeck loads and stores either updated for OrderAccess ops
or some comment explaining why it's not needed.

Dan

>
> In IUnlock we have the following succession code to wakeup the
> "onDeck" thread:
>
>   ParkEvent * List = _EntryList;
>   if (List != NULL) {
>     // Transfer the head of the EntryList to the OnDeck position.
>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>     // For a given lock there is at most OnDeck thread at any one
> instant.
>    WakeOne:
>     assert(List == _EntryList, "invariant");
>     ParkEvent * const w = List;
>     assert(RelaxAssert || w != Thread::current()->_MutexEvent,
> "invariant");
>     _EntryList = w->ListNext;
>     // as a diagnostic measure consider setting w->_ListNext = BAD
>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>     _OnDeck = w; // pass OnDeck to w.
>
> It is critical that the update to _EntryList happens before we set
> _OnDeck, as as soon as _OnDeck is set the selected thread (which need
> not yet have parked) can acquire the mutex, complete its critical
> section and proceed to unlock the mutex, and so execute IUnlock in
> parallel with the original thread. If the write to _EntryList has not
> yet happened that second thread finds itself still at the head of
> _EntryList and so the assert fires.
If the write to _EntryList happens > after the load "List = _EntryList", then the first assert can also fire. > > Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w) > with a matching load_acquire(&_OnDeck) in the ILock code: > > while (_OnDeck != ESelf) { > ParkCommon(ESelf, 0); > } > > and corresponding "raw" lock code. Also fixed a couple of typos. > > Thanks, > David From ioi.lam at oracle.com Wed Oct 12 16:09:23 2016 From: ioi.lam at oracle.com (Ioi Lam) Date: Wed, 12 Oct 2016 09:09:23 -0700 Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if class is in_error_state In-Reply-To: <3979349a-edb5-16be-af19-33c2aa808209@oracle.com> References: <57FDC074.1070900@oracle.com> <3979349a-edb5-16be-af19-33c2aa808209@oracle.com> Message-ID: <57FE6033.40805@oracle.com> David, Serguei & Coleen, Thanks for the comments. I will fix the test by removing the try .. catch block. - Ioi On 10/12/16 5:17 AM, Coleen Phillimore wrote: > http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/test/runtime/lambda-features/InterfaceInitializationStates.java.udiff.html > > > Same comment as the others. > > Just take out the try block for Iunlinked. > > Thank you for fixing this! > Coleen > > > On 10/12/16 4:08 AM, serguei.spitsyn at oracle.com wrote: >> Hi Ioi, >> >> The fix looks good to me. >> But I agree with David below that the catch statement is somewhat >> confusing. >> The test needs to fail in such a case with a message like "Unexpected >> NoClassDefFoundError <...>". >> >> Thanks, >> Serguei >> >> >> >> On 10/11/16 22:58, David Holmes wrote: >>> Hi Ioi, >>> >>> On 12/10/2016 2:47 PM, Ioi Lam wrote: >>>> https://bugs.openjdk.java.net/browse/JDK-8166203 >>>> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/ >>>> >>>> Summary: >>>> >>>> Kudos to Coleen for noticing the bug. 
>>>> >>>> When dumping the CDS archive, we would throw NoClassDefFoundError >>>> inside >>>> InstanceKlass::link_class_impl() if the current class is >>>> in_error_state. >>>> This was only intended to be a convenient way to deal with >>>> verification >>>> errors during CDS dumping time. However, if the code is executed in >>>> normal VM execution time, it would violate the JLS. >>>> >>>> The fix is to throw the NoClassDefFoundError only when >>>> DumpSharedSpaces==true, to avoid affecting normal VM execution. >>> >>> Fix looks fine. >>> >>> Test change is somewhat confusing. What bug does this still refer to? >>> >>> 160 try { >>> 161 boolean bb = Iunlinked.v; >>> 162 } catch(NoClassDefFoundError e) { >>> 163 System.out.println("NoClassDefFoundError thrown >>> because of bug"); >>> 164 } >>> >>> Either the try block should complete exceptionally or the catch >>> block, to indicate a failure. >>> >>> Thanks, >>> David >>> >>> >>>> Thanks >>>> - Ioi >>>> >> > From coleen.phillimore at oracle.com Wed Oct 12 21:10:45 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 12 Oct 2016 17:10:45 -0400 Subject: [8u-dev] Request for approval: 8163969: Cyclic interface initialization causes JVM crash Message-ID: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> Summary: Backport change to correct interface initialization. There were too many changes to instanceKlass.cpp for a clean backport. Also in JDK8, this corrects interface initialization to not initialize the whole interface hierarchy if an interface, not class, initializes initialization. This is to correctly follow JLS 12.4.2 step 7. I filed a compatibility request (in review) to document the difference in behavior, which I believe will not be noticed. Tested with JPRT, including runtime jtreg lambda-features tests, and JCK tests. 
open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 Thanks, Coleen From coleen.phillimore at oracle.com Wed Oct 12 22:51:04 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 12 Oct 2016 18:51:04 -0400 Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> Message-ID: <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com> Note, this is also an RFR since the backport wasn't clean. thanks, Coleen On 10/12/16 5:10 PM, Coleen Phillimore wrote: > Summary: Backport change to correct interface initialization. > > There were too many changes to instanceKlass.cpp for a clean > backport. Also in JDK8, this corrects interface initialization to not > initialize the whole interface hierarchy if an interface, not class, > initializes initialization. This is to correctly follow JLS 12.4.2 > step 7. I filed a compatibility request (in review) to document the > difference in behavior, which I believe will not be noticed. > > Tested with JPRT, including runtime jtreg lambda-features tests, and > JCK tests. 
> > open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 > > Thanks, > Coleen From george.triantafillou at oracle.com Wed Oct 12 23:25:01 2016 From: george.triantafillou at oracle.com (George Triantafillou) Date: Wed, 12 Oct 2016 19:25:01 -0400 Subject: [8u-dev] Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> Message-ID: <53c9f169-df27-a193-ba34-d7319b61dd31@oracle.com> Hi Coleen, Small typo in src/share/vm/oops/instanceKlass.cpp: 889 // Next, if C is a class rather than an interface, initialize it's super class and super change to 889 // Next, if C is a class rather than an interface, initialize its super class and super Otherwise, looks good. -George On 10/12/2016 5:10 PM, Coleen Phillimore wrote: > Summary: Backport change to correct interface initialization. > > There were too many changes to instanceKlass.cpp for a clean > backport. Also in JDK8, this corrects interface initialization to not > initialize the whole interface hierarchy if an interface, not class, > initializes initialization. This is to correctly follow JLS 12.4.2 > step 7. I filed a compatibility request (in review) to document the > difference in behavior, which I believe will not be noticed. > > Tested with JPRT, including runtime jtreg lambda-features tests, and > JCK tests. 
> > open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev > bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 > > Thanks, > Coleen From serguei.spitsyn at oracle.com Wed Oct 12 23:35:45 2016 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Wed, 12 Oct 2016 16:35:45 -0700 Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com> Message-ID: <7f3f7665-2771-e471-e7d8-20c9672c996b@oracle.com> Coleen, The backport looks good to me. Minor questions to the test. http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev/test/runtime/lambda-features/TestInterfaceInit.java.frames.html 2 * Copyright (c) 2014, 2015, Oracle and/or its affiliates. All rights reserved. 2015 => 2016 28 * @bug 8098557 Why the new bug number is not 8163969? Thanks, Serguei On 10/12/16 15:51, Coleen Phillimore wrote: > > Note, this is also an RFR since the backport wasn't clean. > thanks, > Coleen > > > On 10/12/16 5:10 PM, Coleen Phillimore wrote: >> Summary: Backport change to correct interface initialization. >> >> There were too many changes to instanceKlass.cpp for a clean >> backport. Also in JDK8, this corrects interface initialization to >> not initialize the whole interface hierarchy if an interface, not >> class, initializes initialization. This is to correctly follow JLS >> 12.4.2 step 7. I filed a compatibility request (in review) to >> document the difference in behavior, which I believe will not be >> noticed. >> >> Tested with JPRT, including runtime jtreg lambda-features tests, and >> JCK tests. 
>> >> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 >> >> Thanks, >> Coleen > From david.holmes at oracle.com Thu Oct 13 00:54:31 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Oct 2016 10:54:31 +1000 Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant In-Reply-To: References: Message-ID: <504e9148-54b2-6203-465c-fbda63a19375@oracle.com> Hi Carsten, Thanks for looking at this. On 12/10/2016 11:21 PM, Carsten Varming wrote: > Dear David, > > In line 590 "Pass onDeck to w". I don't understand this part of the > comment. Should it say something like "Pass ownership of _OnDeck and > _EntryList to w". First note that that part of the comment already existed in the old code: 589 _OnDeck = w; // pass OnDeck to w. This is making 'w' the OnDeck thread - nothing to do with _EntryList. Prior to executing this line the current thread holds the "OnDeck lock" - which is just a logical lock obtained by CASing _OnDeck from 0 to 1. The current thread selects 'w' (as the head of _EntryList) to be the new OnDeck thread, and passes that role to it through the assignment - and of course by doing so drops the "OnDeck lock". I changed to read: // Pass OnDeck role to w, ensuring that _EntryList has been set first. > In line 532 there is a read of _OnDeck and in line 553 there is a read > of _EntryList. Why is it safe not to do a load_acquire of _OnDeck in > line 532? 
I knew I should have just stuck in a storestore() then we could all be blissfully ignorant :)

Okay here's that code fragment with comments elided

514 void Monitor::IUnlock(bool RelaxAssert) {
529   OrderAccess::release_store(&_LockWord.Bytes[_LSBINDEX], 0); // drop outer lock
530
531   OrderAccess::storeload();
532   ParkEvent * const w = _OnDeck;
533   assert(RelaxAssert || w != Thread::current()->_MutexEvent, "invariant");
534   if (w != NULL) {
548     if ((UNS(w) & _LBIT) == 0) w->unpark();
549     return;
550   }

A release-store to X is used to ensure that shared data written prior to the store to X actually occurs prior to that store. A load-acquire of X ensures that if the load sees the value written by the store-release then it also sees the updates to the shared data. In the current case the release-store to _OnDeck writes the non-NULL value 'w', and if the code above sees a non-NULL value (ie it sees 'w') then at most it unparks 'w' and returns. It never accesses the shared data ie _EntryList. So no load-acquire is needed as we are not trying to "synchronize" with the changes to the shared state made by the thread that did the release-store. I've added a comment as per my response to Dan's email.

Now let's look at all the other loads of _OnDeck:

- there are a bunch of asserts such as:

444 assert(_OnDeck != Self->_MutexEvent, "invariant");
588 assert(UNS(_OnDeck) == _LBIT, "invariant");
836 assert(_OnDeck != ESelf, "invariant");
1197 assert((UNS(_owner)|UNS(_LockWord.FullWord)|UNS(_EntryList)|UNS(_WaitSet)|UNS(_OnDeck)) == 0, "");

and a load that is only used for an assert:

1161 uintptr_t ondeck = UNS(_OnDeck);

These are all logical checks on the current state and are not attempting to synchronize with any other changes to shared state, so no need for load-acquire. I hope there is no controversy on that point.
- there are some CAS operations which already embody full bi-directional fences which subsume the load-acquire (whether directly needed to synchronize with the store-release or not) 466 if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, UNS(ESelf)) == 0) { 573 if (CASPTR (&_OnDeck, NULL, _LBIT) != UNS(NULL)) { - then we have: 843 if (_OnDeck == ESelf && TrySpin(Self)) break; I missed this one - it needs the load-acquire for the same reason as the code in ILock - which should have been obvious given: 841 // The following fragment is extracted from Monitor::ILock() :) Webrev updated in place - also see my response to Dan. Thanks, David > Carsten > > > On Wed, Oct 12, 2016 at 1:08 AM, David Holmes > wrote: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166197 > > webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ > > > In IUnlock we have the following succession code to wakeup the > "onDeck" thread: > > ParkEvent * List = _EntryList; > if (List != NULL) { > // Transfer the head of the EntryList to the OnDeck position. > // Once OnDeck, a thread stays OnDeck until it acquires the lock. > // For a given lock there is at most OnDeck thread at any one > instant. > WakeOne: > assert(List == _EntryList, "invariant"); > ParkEvent * const w = List; > assert(RelaxAssert || w != Thread::current()->_MutexEvent, > "invariant"); > _EntryList = w->ListNext; > // as a diagnostic measure consider setting w->_ListNext = BAD > assert(UNS(_OnDeck) == _LBIT, "invariant"); > _OnDeck = w; // pass OnDeck to w. > > It is critical that the update to _EntryList happens before we set > _OnDeck, as as soon as _OnDeck is set the selected thread (which > need not yet have parked) can acquire the mutex, complete its > critical section and proceed to unlock the mutex, and so execute > IUnlock in parallel with the original thread. If the write to > _EntryList has not yet happened that second thread finds itself > still at the head of _EntryList and so the assert fires. 
If the > write to _EntryList happens after the load "List = _EntryList", then > the first assert can also fire. > > Preferred fix today is to use OrderAccess::release_store(&_OnDeck, > w) with a matching load_acquire(&_OnDeck) in the ILock code: > > while (_OnDeck != ESelf) { > ParkCommon(ESelf, 0); > } > > and corresponding "raw" lock code. Also fixed a couple of typos. > > Thanks, > David > > From david.holmes at oracle.com Thu Oct 13 01:18:05 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Oct 2016 11:18:05 +1000 Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant In-Reply-To: References: Message-ID: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com> Hi Dan, Thanks for looking at this. On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote: > On 10/11/16 11:08 PM, David Holmes wrote: >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197 >> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ > > Very nice catch! We should check the ObjectMonitor succession code for > similar issues (my task). Yes. As I said in email I did a quick check through but the succession logic is sufficiently different that nothing was obviously wrong in a similar way. > > src/share/vm/runtime/mutex.cpp > L466: if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, > UNS(ESelf)) == 0) { > Thanks for fixing this bug also! > > L477: while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) { > So you've changed this load of _OnDeck to use load-acquire > which matches the new store-release on L595: > > OrderAccess::release_store_ptr(&_OnDeck, w); Right. > What about the other loads of _OnDeck or stores to _OnDeck? > There should at least be a new comment explaining why we don't > need an OrderAccess operation for those. Update: I see you > changed one other load of _OnDeck on L1061. Now I'm really > wanting comments for the other _OnDeck loads and stores. :-) > > Update: I see Carsten V. 
asked about this in a slightly different > way. See my reply to Carsten re the load's. I did miss one as we have three "locking" paths that need to synchronize with the IUnlock code. As for documenting ... for line 532 I can add something simple like: 532 ParkEvent * const w = _OnDeck; // raw load as we will just return if non-NULL For the other stores to _OnDeck ... CAS should be obvious. The setting to NULL should also be quite clear as only the _OnDeck thread sets to NULL to relinquish being _OnDeck once it has acquired the mutex, which happens via CAS which has full barriers. None of the plain stores are in the context of: some_var = y; // write some shared-state _OnDeck = NULL; // signal some_var has been updated > L590: // Pass onDeck to w, ensuring that _EntryList has been set > first. > Typo: 'onDeck' -> 'OnDeck' > > I suspect you don't want to fix all this CamelCase usage to meet > HotSpot style. I did that for most of the ObjectMonitor code and > it was painful. We could clean it up early in JDK10. I fixed the typo and also changed ONDECK to OnDeck so that we generally refer to OnDeck in commentary unless specifically referring to the _OnDeck field. > Update: I see Carsten has a comment about this comment also. I > don't think I quite agree that we're "passing" _EntryList to w, > but I can be convinced otherwise... Right, nothing to do with _EntryList just making w the OnDeck thread. > Again, very nice catch! I'd like to see another webrev with the other > _OnDeck loads and stores either updated for OrderAccess ops or some > comment explaining why it's not needed. webrev updated in place with one comment and one new use of load-acquire. Plus some cosmetic changes. Thanks again, David > Dan > > >> >> In IUnlock we have the following succession code to wakeup the >> "onDeck" thread: >> >> ParkEvent * List = _EntryList; >> if (List != NULL) { >> // Transfer the head of the EntryList to the OnDeck position. 
>> // Once OnDeck, a thread stays OnDeck until it acquires the lock. >> // For a given lock there is at most OnDeck thread at any one >> instant. >> WakeOne: >> assert(List == _EntryList, "invariant"); >> ParkEvent * const w = List; >> assert(RelaxAssert || w != Thread::current()->_MutexEvent, >> "invariant"); >> _EntryList = w->ListNext; >> // as a diagnostic measure consider setting w->_ListNext = BAD >> assert(UNS(_OnDeck) == _LBIT, "invariant"); >> _OnDeck = w; // pass OnDeck to w. >> >> It is critical that the update to _EntryList happens before we set >> _OnDeck, as as soon as _OnDeck is set the selected thread (which need >> not yet have parked) can acquire the mutex, complete its critical >> section and proceed to unlock the mutex, and so execute IUnlock in >> parallel with the original thread. If the write to _EntryList has not >> yet happened that second thread finds itself still at the head of >> _EntryList and so the assert fires. If the write to _EntryList happens >> after the load "List = _EntryList", then the first assert can also fire. >> >> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w) >> with a matching load_acquire(&_OnDeck) in the ILock code: >> >> while (_OnDeck != ESelf) { >> ParkCommon(ESelf, 0); >> } >> >> and corresponding "raw" lock code. Also fixed a couple of typos. >> >> Thanks, >> David > From david.holmes at oracle.com Thu Oct 13 01:47:39 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 13 Oct 2016 11:47:39 +1000 Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com> Message-ID: <67f302ee-98b8-e1f9-aa1f-14e905a13b12@oracle.com> Hi Coleen, On 13/10/2016 8:51 AM, Coleen Phillimore wrote: > > Note, this is also an RFR since the backport wasn't clean. 
> thanks,
> Coleen

Backport of fix itself is good.

I'm assuming you simply copied across the existing test from the JDK9 repo. The reference in the test to 8098557 is confusing because 8098557 was never backported and 8163969 effectively replaces it. So as Serguei alluded to I'd replace the @bug 8098557 with 8163969.

Thanks,
David

>
> On 10/12/16 5:10 PM, Coleen Phillimore wrote:
>> Summary: Backport change to correct interface initialization.
>>
>> There were too many changes to instanceKlass.cpp for a clean
>> backport. Also in JDK8, this corrects interface initialization to not
>> initialize the whole interface hierarchy if an interface, not class,
>> initializes initialization. This is to correctly follow JLS 12.4.2
>> step 7. I filed a compatibility request (in review) to document the
>> difference in behavior, which I believe will not be noticed.
>>
>> Tested with JPRT, including runtime jtreg lambda-features tests, and
>> JCK tests.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>>
>> Thanks,
>> Coleen
>

From thomas.stuefe at gmail.com Thu Oct 13 04:55:55 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 13 Oct 2016 06:55:55 +0200
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs.
Message-ID:

Dear all,

please take a look at the following fix:

Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html

---

In short, this fix provides the ability to cancel hanging error reporting steps. This uses the same code paths secondary error handling uses during error reporting. With this patch, steps which take too long will be canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this:

4 [timeout occurred during error reporting in step ""] after xxxx ms.
5 and we now also get a finish message in the hs-err file if we hit the ErrorLogTimeout and error reporting will stop altogether:

6 ------ Timout during error reporting after xxx ms. ------

(in addition to the "time expired, abort" message the WatcherThread writes to stderr)

---

This is something which bugged us for a long time, because we rely heavily on the hs_err files for error analysis at customer sites, and there are a number of reasons why one step may hang and prevent the follow-up steps from running.

It works like this:

Before, when error reporting started, the WatcherThread was waiting for ErrorLogTimeout seconds, then would stop the VM. Now, the WatcherThread periodically pings error reporting, which checks whether the last step has timed out. If it has, it sends a signal to the reporting thread, and the thread will continue with the next step. This follows the same path as secondary crash handling.

Some implementation details:

On Posix platforms, to interrupt the thread, I use pthread_kill. This means I must know the pthread id of the reporting thread, which I now store at the beginning of error reporting. We already store the reporting thread id in first_error_tid, but that I cannot use, because it gets set by os::current_thread_id(), which is not always the pthread id. Should we ever switch to only using pthread id for Posix platforms, this code can be simplified.

On Windows, there is unfortunately no easy way to interrupt a non-cooperative thread. I would need a way to cause a SEH inside the target thread, which then would get handled by secondary error handling like on Posix platforms, but that is not easy. It is doable - one can suspend the thread, modify the thread context in a way that it will crash upon resume. But that felt a bit heavyweight for this problem. So on Windows, timeout handling still works (after ErrorLogTimeout the VM gets shut down), but error reporting steps are not interruptible.
If we feel this is important, this can be added later.

Kind Regards, Thomas

From thomas.stuefe at gmail.com Thu Oct 13 05:49:33 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 13 Oct 2016 07:49:33 +0200
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
Message-ID:

Hi all,

may I have please a review for this tiny change? It just adds some assert to NMT.

Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_
MEMFLAGS/webrev.00/webrev/

We had an ugly memory overwrite caused by this - ultimately our fault, because we fed an invalid memory flag to NMT - but it was difficult to find. An assert would have saved some time.

Thank you!

Thomas

From david.holmes at oracle.com Thu Oct 13 10:08:24 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Oct 2016 20:08:24 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To:
References:
Message-ID: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>

Hi Thomas,

On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
> Hi all,
>
> may I have please a review for this tiny change? It just adds some assert to
> NMT.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_
> MEMFLAGS/webrev.00/webrev/
>
> We had an ugly memory overwrite caused by this - ultimately our fault,
> because we fed an invalid memory flag to NMT - but it was difficult to
> find. An assert would have saved some time.

I'm a little perplexed with asserting that something of MEMFLAGS type must be an actual MEMFLAGS value - it implies the caller is coercing plain int to MEMFLAGS, and I don't have much sympathy if they mess that up. Can't help wondering if there is some clever C++ trick to flag bad conversions at compile-time?

The function that takes the index should validate the index, so that is fine.
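
For illustration only, here is a minimal sketch of the two ideas in the paragraphs above: letting the compiler reject plain-int-to-flag conversions, and validating the index in the one function that accepts an index. It assumes a scoped enum could be used; `MemTag`, `is_valid_tag`, `tag_from_index` and `index_of` are hypothetical names invented for this sketch and are not the NMT API or the change under review.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical stand-in for an NMT-style memory tag; not the real MEMFLAGS.
enum class MemTag : unsigned char {
  JavaHeap = 0,
  Class    = 1,
  Thread   = 2,
  Count    = 3   // number of valid tags, not itself a valid tag
};

// A scoped enum does not convert implicitly from int, so passing a plain
// integer where a MemTag is expected is rejected at compile time.
static_assert(!std::is_convertible<int, MemTag>::value,
              "plain ints must not silently become tags");

// True only for the named tags; false for Count and anything beyond it.
constexpr bool is_valid_tag(MemTag t) {
  return static_cast<unsigned>(t) < static_cast<unsigned>(MemTag::Count);
}

// The single place where an index becomes a tag, so the range check sits
// next to the only cast.
inline MemTag tag_from_index(std::size_t index) {
  assert(index < static_cast<std::size_t>(MemTag::Count) && "tag index out of range");
  return static_cast<MemTag>(index);
}

// Converts a tag to a table index, asserting that no bad explicit cast
// elsewhere produced an out-of-range value.
inline std::size_t index_of(MemTag t) {
  assert(is_valid_tag(t) && "invalid memory tag");
  return static_cast<std::size_t>(t);
}
```

With this shape, a call such as `index_of(7)` does not compile, while an explicit `static_cast<MemTag>(7)` still compiles and is only caught by the runtime assert, which is why both checks are shown.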
Which one were you actually passing the bad value to? :)

This isn't a strong objection, just musing if we can do better. And as the hs repos are still closed, and likely to remain so till early next week, we have some slack time :)

Cheers,
David

> Thank you!
>
> Thomas
>

From david.holmes at oracle.com Thu Oct 13 10:25:13 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Oct 2016 20:25:13 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
Message-ID:

In the interests of fairness I should also point out this is technically an enhancement not a bug fix.

David

On 13/10/2016 8:08 PM, David Holmes wrote:
> Hi Thomas,
>
> On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>> Hi all,
>>
>> may I have please a review for this tiny change? It just adds some
>> assert to
>> NMT.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_
>> MEMFLAGS/webrev.00/webrev/
>>
>> We had an ugly memory overwrite caused by this - ultimately our fault,
>> because we fed an invalid memory flag to NMT - but it was difficult to
>> find. An assert would have saved some time.
>
> I'm a little perplexed with asserting that something of MEMFLAGS type
> must be an actual MEMFLAGS value - it implies the caller is coercing
> plain int to MEMFLAGS, and I don't have much sympathy if they mess that
> up. Can't help wondering if there is some clever C++ trick to flag bad
> conversions at compile-time?
>
> The function that takes the index should validate the index, so that is
> fine.
>
> Which one were you actually passing the bad value to? :)
>
> This isn't a strong objection, just musing if we can do better. And as
> the hs repos are still closed, and likely to remain so till early next
> week, we have some slack time :)
>
> Cheers,
> David
>
>> Thank you!
>> >> Thomas >> From george.triantafillou at oracle.com Thu Oct 13 11:40:14 2016 From: george.triantafillou at oracle.com (George Triantafillou) Date: Thu, 13 Oct 2016 07:40:14 -0400 Subject: RFR(XS) 8166155: Create tests for VM module option handling In-Reply-To: <3d7981cb-7d27-a086-e46c-ec8f82f23849@oracle.com> References: <3d7981cb-7d27-a086-e46c-ec8f82f23849@oracle.com> Message-ID: <5a0075a3-0a36-d3f5-6ed3-2c04d3f7cda3@oracle.com> After offline feedback from Dmitry Dmitriev, here's an updated webrev: http://cr.openjdk.java.net/~gtriantafill/8166155/webrev.01/ The test was moved to a separate test for VM module option handling. Thanks. -George On 9/15/2016 2:25 PM, George Triantafillou wrote: > Please review this change that adds test coverage for the new VM > module option handling implemented in > https://bugs.openjdk.java.net/browse/JDK-8157038. > > JBS: https://bugs.openjdk.java.net/browse/JDK-8166155 > webrev: http://cr.openjdk.java.net/~gtriantafill/8166155/webrev/ > > > Tested locally on Linux. > > Thanks. 
>
> -George
>

From lois.foltan at oracle.com Thu Oct 13 11:40:24 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Thu, 13 Oct 2016 07:40:24 -0400
Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java needs update for JDK-8162401
In-Reply-To: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com>
References: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com>
Message-ID: <57FF72A8.1030002@oracle.com>

On 10/11/2016 5:14 PM, Mandy Chung wrote:
> Harold,
>
> Can you review this test update:
>
> diff --git a/test/runtime/modules/IgnoreModulePropertiesTest.java b/test/runtime/modules/IgnoreModulePropertiesTest.java
> --- a/test/runtime/modules/IgnoreModulePropertiesTest.java
> +++ b/test/runtime/modules/IgnoreModulePropertiesTest.java
> @@ -69,8 +69,9 @@
>      public static void main(String[] args) throws Exception {
>          testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException");
>          testOption("--limit-modules", "java.sqlx", "jdk.module.limitmods", "java.lang.module.ResolutionException");
> -        testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "java.lang.RuntimeException");
> -        testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0", "java.lang.RuntimeException");
> +        testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "WARNING: Unknown module: xyzz");
> +        testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0",
> +                   "WARNING: package xyzz not in java.base");
>          testOption("--patch-module", "=d", "jdk.module.patch.0", "IllegalArgumentException");
>      }
>  }

Hi Mandy,

Looks good.

>
> --add-modules is now a repeating option. Should this line:
>     testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException");
>
> be changed to "jdk.module.addmods.0", as in addreads, addexports property?

Yes, I think it should.
Lois > > Mandy From coleen.phillimore at oracle.com Thu Oct 13 12:26:34 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 13 Oct 2016 08:26:34 -0400 Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <7f3f7665-2771-e471-e7d8-20c9672c996b@oracle.com> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com> <7f3f7665-2771-e471-e7d8-20c9672c996b@oracle.com> Message-ID: Thank you Serguei. On 10/12/16 7:35 PM, serguei.spitsyn at oracle.com wrote: > Coleen, > > The backport looks good to me. > > Minor questions to the test. > > > http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev/test/runtime/lambda-features/TestInterfaceInit.java.frames.html > > > 2 * Copyright (c) 2014, 2015, Oracle and/or its affiliates. All rights > reserved. 2015 => 2016 > My commit script fixes copyrights, so I'll change that in the commit. > > 28 * @bug 8098557 > > Why the new bug number is not 8163969? > I fixed the bug number as you and David suggested. Thanks again! Coleen > > Thanks, > Serguei > > > > On 10/12/16 15:51, Coleen Phillimore wrote: >> >> Note, this is also an RFR since the backport wasn't clean. >> thanks, >> Coleen >> >> >> On 10/12/16 5:10 PM, Coleen Phillimore wrote: >>> Summary: Backport change to correct interface initialization. >>> >>> There were too many changes to instanceKlass.cpp for a clean >>> backport. Also in JDK8, this corrects interface initialization to >>> not initialize the whole interface hierarchy if an interface, not >>> class, initializes initialization. This is to correctly follow JLS >>> 12.4.2 step 7. I filed a compatibility request (in review) to >>> document the difference in behavior, which I believe will not be >>> noticed. >>> >>> Tested with JPRT, including runtime jtreg lambda-features tests, and >>> JCK tests. 
>>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 >>> >>> Thanks, >>> Coleen >> > From coleen.phillimore at oracle.com Thu Oct 13 12:27:11 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 13 Oct 2016 08:27:11 -0400 Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <67f302ee-98b8-e1f9-aa1f-14e905a13b12@oracle.com> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com> <67f302ee-98b8-e1f9-aa1f-14e905a13b12@oracle.com> Message-ID: <6beff22e-62a5-5277-dad4-ffb5c8bcbc91@oracle.com> On 10/12/16 9:47 PM, David Holmes wrote: > Hi Coleen, > > On 13/10/2016 8:51 AM, Coleen Phillimore wrote: >> >> Note, this is also an RFR since the backport wasn't clean. >> thanks, >> Coleen > > Backport of fix itself is good. > > I'm assuming you simply copied across the existing test from the JDK9 > repo. The reference in the test to 8098557 is confusing because > 8098557 was never backported and 8163969 effectively replaces it. So > as Serguei alluded to I'd replace the @bug 8098557 with 8163969. Thanks, yes, I fixed it. Thanks for you and Serguei noticing it. Coleen > > Thanks, > David > >> >> On 10/12/16 5:10 PM, Coleen Phillimore wrote: >>> Summary: Backport change to correct interface initialization. >>> >>> There were too many changes to instanceKlass.cpp for a clean >>> backport. Also in JDK8, this corrects interface initialization to not >>> initialize the whole interface hierarchy if an interface, not class, >>> initializes initialization. This is to correctly follow JLS 12.4.2 >>> step 7. I filed a compatibility request (in review) to document the >>> difference in behavior, which I believe will not be noticed. >>> >>> Tested with JPRT, including runtime jtreg lambda-features tests, and >>> JCK tests. 
>>> >>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev >>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 >>> >>> Thanks, >>> Coleen >> From coleen.phillimore at oracle.com Thu Oct 13 12:45:36 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 13 Oct 2016 08:45:36 -0400 Subject: [8u-dev] Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <53c9f169-df27-a193-ba34-d7319b61dd31@oracle.com> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> <53c9f169-df27-a193-ba34-d7319b61dd31@oracle.com> Message-ID: Thanks George. On 10/12/16 7:25 PM, George Triantafillou wrote: > Hi Coleen, > > Small typo in src/share/vm/oops/instanceKlass.cpp: > > 889 // Next, if C is a class rather than an interface, initialize > it's super class and super > > change to > > 889 // Next, if C is a class rather than an interface, initialize > its super class and super > > Otherwise, looks good. I always want to type the ' in it's for some reason. I fixed it. Coleen > > -George > > On 10/12/2016 5:10 PM, Coleen Phillimore wrote: >> Summary: Backport change to correct interface initialization. >> >> There were too many changes to instanceKlass.cpp for a clean >> backport. Also in JDK8, this corrects interface initialization to >> not initialize the whole interface hierarchy if an interface, not >> class, initializes initialization. This is to correctly follow JLS >> 12.4.2 step 7. I filed a compatibility request (in review) to >> document the difference in behavior, which I believe will not be >> noticed. >> >> Tested with JPRT, including runtime jtreg lambda-features tests, and >> JCK tests. 
>> >> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 >> >> Thanks, >> Coleen > From thomas.stuefe at gmail.com Thu Oct 13 12:53:01 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 13 Oct 2016 14:53:01 +0200 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> Message-ID: Hi David, On Thu, Oct 13, 2016 at 12:08 PM, David Holmes wrote: > Hi Thomas, > > On 13/10/2016 3:49 PM, Thomas Stüfe wrote: > >> Hi all, >> >> may I please have a review for this tiny change? It just adds some assert >> to >> NMT. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650 >> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/ >> >> We had an ugly memory overwrite caused by this - ultimately our fault, >> because we fed an invalid memory flag to NMT - but it was difficult to >> find. An assert would have saved some time. >> > > I'm a little perplexed with asserting that something of MEMFLAGS type must > be an actual MEMFLAGS value - it implies the caller is coercing plain int > to MEMFLAGS, and I don't have much sympathy if they mess that up. Can't > help wondering if there is some clever C++ trick to flag bad conversions at > compile-time? > > The error was caused by an uninitialized variable of type MEMFLAGS. This was our fault, we have heavily modified allocation.hpp and introduced an error when merging changes from upstream. Due to a merging error this led to a case where Arena::_flags was not initialized and contained a very large value. I admit it looks funny. If it bothers you, I could instead check the returned index to be in the range for the size of the _malloc array in MallocMemorySnapshot::by_type(). Technically, it would mean the same. 
> The function that takes the index should validate the index, so that is > fine. > > Which one were you actually passing the bad value to? :) > > This isn't a strong objection just musing if we can do better. And as the > hs repos are still closed, and likely to remain so till early next week, we > have some slack time :) > > :) Sure. Kind Regards, Thomas > Cheers, > David > > Thank you! >> >> Thomas >> >> From thomas.stuefe at gmail.com Thu Oct 13 12:57:51 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 13 Oct 2016 14:57:51 +0200 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> Message-ID: On Thu, Oct 13, 2016 at 12:25 PM, David Holmes wrote: > In the interests of fairness I should also point out this is technically > an enhancement not a bug fix. > > David You are right, I changed this to an enhancement in Jira. From harold.seigel at oracle.com Thu Oct 13 13:00:15 2016 From: harold.seigel at oracle.com (harold seigel) Date: Thu, 13 Oct 2016 09:00:15 -0400 Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java needs update for JDK-8162401 In-Reply-To: <57FF72A8.1030002@oracle.com> References: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com> <57FF72A8.1030002@oracle.com> Message-ID: <2ae090cf-39ff-9e0b-cce3-33f93ee46f65@oracle.com> Hi Mandy, Sorry, I was off yesterday. Your changes look good. 
Harold On 10/13/2016 7:40 AM, Lois Foltan wrote: > > On 10/11/2016 5:14 PM, Mandy Chung wrote: >> Harold, >> >> Can you review this test update: >> >> diff --git a/test/runtime/modules/IgnoreModulePropertiesTest.java >> b/test/runtime/modules/IgnoreModulePropertiesTest.java >> --- a/test/runtime/modules/IgnoreModulePropertiesTest.java >> +++ b/test/runtime/modules/IgnoreModulePropertiesTest.java >> @@ -69,8 +69,9 @@ >> public static void main(String[] args) throws Exception { >> testOption("--add-modules", "java.sqlx", >> "jdk.module.addmods", "java.lang.module.ResolutionException"); >> testOption("--limit-modules", "java.sqlx", >> "jdk.module.limitmods", "java.lang.module.ResolutionException"); >> - testOption("--add-reads", "xyzz=yyzd", >> "jdk.module.addreads.0", "java.lang.RuntimeException"); >> - testOption("--add-exports", "java.base/xyzz=yyzd", >> "jdk.module.addexports.0", "java.lang.RuntimeException"); >> + testOption("--add-reads", "xyzz=yyzd", >> "jdk.module.addreads.0", "WARNING: Unknown module: xyzz"); >> + testOption("--add-exports", "java.base/xyzz=yyzd", >> "jdk.module.addexports.0", >> + "WARNING: package xyzz not in java.base"); >> testOption("--patch-module", "=d", "jdk.module.patch.0", >> "IllegalArgumentException"); >> } >> } > > Hi Mandy, > Looks good. > >> >> --add-modules is now a repeating option. Should this line: >> testOption("--add-modules", "java.sqlx", "jdk.module.addmods", >> "java.lang.module.ResolutionException"); >> >> be changed to "jdk.module.addmods.0", as in addreads, addexports >> property? > > Yes, I think it should. 
> Lois > >> >> Mandy > From varming at gmail.com Thu Oct 13 14:20:34 2016 From: varming at gmail.com (Carsten Varming) Date: Thu, 13 Oct 2016 10:20:34 -0400 Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant In-Reply-To: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com> References: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com> Message-ID: Dear David, The updated webrev looks good to me. Carsten On Wed, Oct 12, 2016 at 9:18 PM, David Holmes wrote: > Hi Dan, > > Thanks for looking at this. > > On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote: > >> On 10/11/16 11:08 PM, David Holmes wrote: >> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197 >>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ >>> >> >> Very nice catch! We should check the ObjectMonitor succession code for >> similar issues (my task). >> > > Yes. As I said in email I did a quick check through but the succession > logic is sufficiently different that nothing was obviously wrong in a > similar way. > > >> src/share/vm/runtime/mutex.cpp >> L466: if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, >> UNS(ESelf)) == 0) { >> Thanks for fixing this bug also! >> >> L477: while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) { >> So you've changed this load of _OnDeck to use load-acquire >> which matches the new store-release on L595: >> >> OrderAccess::release_store_ptr(&_OnDeck, w); >> > > Right. > > What about the other loads of _OnDeck or stores to _OnDeck? >> There should at least be a new comment explaining why we don't >> need an OrderAccess operation for those. Update: I see you >> changed one other load of _OnDeck on L1061. Now I'm really >> wanting comments for the other _OnDeck loads and stores. :-) >> >> Update: I see Carsten V. asked about this in a slightly different >> way. >> > > See my reply to Carsten re the load's. I did miss one as we have three > "locking" paths that need to synchronize with the IUnlock code. 
> > As for documenting ... for line 532 I can add something simple like: > > 532 ParkEvent * const w = _OnDeck; // raw load as we will just return > if non-NULL > > For the other stores to _OnDeck ... CAS should be obvious. The setting to > NULL should also be quite clear as only the _OnDeck thread sets to NULL to > relinquish being _OnDeck once it has acquired the mutex, which happens via > CAS which has full barriers. None of the plain stores are in the context of: > > some_var = y; // write some shared-state > _OnDeck = NULL; // signal some_var has been updated > > L590: // Pass onDeck to w, ensuring that _EntryList has been set >> first. >> Typo: 'onDeck' -> 'OnDeck' >> >> I suspect you don't want to fix all this CamelCase usage to meet >> HotSpot style. I did that for most of the ObjectMonitor code and >> it was painful. We could clean it up early in JDK10. >> > > I fixed the typo and also changed ONDECK to OnDeck so that we generally > refer to OnDeck in commentary unless specifically referring to the _OnDeck > field. > > Update: I see Carsten has a comment about this comment also. I >> don't think I quite agree that we're "passing" _EntryList to w, >> but I can be convinced otherwise... >> > > Right, nothing to do with _EntryList just making w the OnDeck thread. > > Again, very nice catch! I'd like to see another webrev with the other >> _OnDeck loads and stores either updated for OrderAccess ops or some >> comment explaining why it's not needed. >> > > webrev updated in place with one comment and one new use of load-acquire. > Plus some cosmetic changes. > > Thanks again, > David > > > Dan >> >> >> >>> In IUnlock we have the following succession code to wakeup the >>> "onDeck" thread: >>> >>> ParkEvent * List = _EntryList; >>> if (List != NULL) { >>> // Transfer the head of the EntryList to the OnDeck position. >>> // Once OnDeck, a thread stays OnDeck until it acquires the lock. >>> // For a given lock there is at most OnDeck thread at any one >>> instant. 
>>> WakeOne: >>> assert(List == _EntryList, "invariant"); >>> ParkEvent * const w = List; >>> assert(RelaxAssert || w != Thread::current()->_MutexEvent, >>> "invariant"); >>> _EntryList = w->ListNext; >>> // as a diagnostic measure consider setting w->_ListNext = BAD >>> assert(UNS(_OnDeck) == _LBIT, "invariant"); >>> _OnDeck = w; // pass OnDeck to w. >>> >>> It is critical that the update to _EntryList happens before we set >>> _OnDeck, as as soon as _OnDeck is set the selected thread (which need >>> not yet have parked) can acquire the mutex, complete its critical >>> section and proceed to unlock the mutex, and so execute IUnlock in >>> parallel with the original thread. If the write to _EntryList has not >>> yet happened that second thread finds itself still at the head of >>> _EntryList and so the assert fires. If the write to _EntryList happens >>> after the load "List = _EntryList", then the first assert can also fire. >>> >>> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w) >>> with a matching load_acquire(&_OnDeck) in the ILock code: >>> >>> while (_OnDeck != ESelf) { >>> ParkCommon(ESelf, 0); >>> } >>> >>> and corresponding "raw" lock code. Also fixed a couple of typos. >>> >>> Thanks, >>> David >>> >> >> From mandy.chung at oracle.com Thu Oct 13 14:58:06 2016 From: mandy.chung at oracle.com (Mandy Chung) Date: Thu, 13 Oct 2016 07:58:06 -0700 Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java needs update for JDK-8162401 In-Reply-To: <57FF72A8.1030002@oracle.com> References: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com> <57FF72A8.1030002@oracle.com> Message-ID: <5AFFF1F1-4B2D-407E-A95C-3F977AAF9FA0@oracle.com> > On Oct 13, 2016, at 4:40 AM, Lois Foltan wrote: > > >> >> --add-modules is now a repeating option. Should this line: >> testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException"); >> >> be changed to "jdk.module.addmods.0", as in addreads, addexports property? 
> > Yes, I think it should. I can change this since I'm on this file. Thanks Mandy From max.ockner at oracle.com Thu Oct 13 15:35:05 2016 From: max.ockner at oracle.com (Max Ockner) Date: Thu, 13 Oct 2016 11:35:05 -0400 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> Message-ID: <57FFA9A9.50401@oracle.com> Hi Thomas, (Comments below. ) Max On 10/13/2016 8:53 AM, Thomas Stüfe wrote: > Hi David, > > On Thu, Oct 13, 2016 at 12:08 PM, David Holmes > wrote: > >> Hi Thomas, >> >> On 13/10/2016 3:49 PM, Thomas Stüfe wrote: >> >>> Hi all, >>> >>> may I please have a review for this tiny change? It just adds some assert >>> to >>> NMT. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650 >>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/ >>> >>> We had an ugly memory overwrite caused by this - ultimately our fault, >>> because we fed an invalid memory flag to NMT - but it was difficult to >>> find. An assert would have saved some time. It is alarming that a bug in NMT could cause a problem in memory management, since it was my understanding that memory allocation decisions are not informed by the NMT state. >> I'm a little perplexed with asserting that something of MEMFLAGS type must >> be an actual MEMFLAGS value - it implies the caller is coercing plain int >> to MEMFLAGS, and I don't have much sympathy if they mess that up. Can't >> help wondering if there is some clever C++ trick to flag bad conversions at >> compile-time? >> >> > The error was caused by an uninitialized variable of type MEMFLAGS. This > was our fault, we have heavily modified allocation.hpp and introduced an > error when merging changes from upstream. Due to a merging error this led > to a case where Arena::_flags was not initialized and contained a very > large value. > > I admit it looks funny. 
If it bothers you, I could instead check the > returned index to be in the range for the size of the _malloc array in > MallocMemorySnapshot::by_type(). Technically, it would mean the same. > > > >> The function that takes the index should validate the index, so that is >> fine. I agree with this. I think the decision on whether to access a slot should occur as close to memory accessing code as possible. Another note - If you are validating the index immediately before consumption, then it looks like there is a second place where you need to add an assert. In virtualMemoryTracker.hpp we have: inline VirtualMemory* by_type(MEMFLAGS flag) { int index = NMTUtil::flag_to_index(flag); return &_virtual_memory[index]; } >> Which one were you actually passing the bad value to? :) >> >> This isn't a strong objection just musing if we can do better. And as the >> hs repos are still closed, and likely to remain so till early next week, we >> have some slack time :) > :) Sure. > > Kind Regards, Thomas > > >> Cheers, >> David >> >> Thank you! >>> Thomas >>> >>> From coleen.phillimore at oracle.com Thu Oct 13 15:56:50 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Thu, 13 Oct 2016 11:56:50 -0400 Subject: [8u-dev] Request for approval: 8163969: Cyclic interface initialization causes JVM crash In-Reply-To: <20161013141505.GC3354@vimes> References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com> <20161013141505.GC3354@vimes> Message-ID: Thank you! Coleen On 10/13/16 10:15 AM, Rob McKenna wrote: > Approved. > > Review thread: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-October/021575.html > > -Rob > > On 12/10/16 05:10, Coleen Phillimore wrote: >> Summary: Backport change to correct interface initialization. >> >> There were too many changes to instanceKlass.cpp for a clean backport. 
Also >> in JDK8, this corrects interface initialization to not initialize the whole >> interface hierarchy if an interface, not class, initializes initialization. >> This is to correctly follow JLS 12.4.2 step 7. I filed a compatibility >> request (in review) to document the difference in behavior, which I believe >> will not be noticed. >> >> Tested with JPRT, including runtime jtreg lambda-features tests, and JCK >> tests. >> >> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev >> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8 >> >> Thanks, >> Coleen From daniel.daugherty at oracle.com Thu Oct 13 16:24:51 2016 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Thu, 13 Oct 2016 10:24:51 -0600 Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant In-Reply-To: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com> References: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com> Message-ID: <60e91874-853b-ecdd-01f2-e40fc84b6275@oracle.com> > webrev updated in place with one comment and one new use of load-acquire. > Plus some cosmetic changes. > > webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ src/share/vm/runtime/mutex.cpp No comments. Thumbs up! Dan On 10/12/16 7:18 PM, David Holmes wrote: > Hi Dan, > > Thanks for looking at this. > > On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote: >> On 10/11/16 11:08 PM, David Holmes wrote: >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197 >>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ >> >> Very nice catch! We should check the ObjectMonitor succession code for >> similar issues (my task). > > Yes. As I said in email I did a quick check through but the succession > logic is sufficiently different that nothing was obviously wrong in a > similar way. 
> >> >> src/share/vm/runtime/mutex.cpp >> L466: if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, >> UNS(ESelf)) == 0) { >> Thanks for fixing this bug also! >> >> L477: while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) { >> So you've changed this load of _OnDeck to use load-acquire >> which matches the new store-release on L595: >> >> OrderAccess::release_store_ptr(&_OnDeck, w); > > Right. > >> What about the other loads of _OnDeck or stores to _OnDeck? >> There should at least be a new comment explaining why we don't >> need an OrderAccess operation for those. Update: I see you >> changed one other load of _OnDeck on L1061. Now I'm really >> wanting comments for the other _OnDeck loads and stores. :-) >> >> Update: I see Carsten V. asked about this in a slightly >> different >> way. > > See my reply to Carsten re the load's. I did miss one as we have three > "locking" paths that need to synchronize with the IUnlock code. > > As for documenting ... for line 532 I can add something simple like: > > 532 ParkEvent * const w = _OnDeck; // raw load as we will just > return if non-NULL > > For the other stores to _OnDeck ... CAS should be obvious. The setting > to NULL should also be quite clear as only the _OnDeck thread sets to > NULL to relinquish being _OnDeck once it has acquired the mutex, which > happens via CAS which has full barriers. None of the plain stores are > in the context of: > > some_var = y; // write some shared-state > _OnDeck = NULL; // signal some_var has been updated > >> L590: // Pass onDeck to w, ensuring that _EntryList has been set >> first. >> Typo: 'onDeck' -> 'OnDeck' >> >> I suspect you don't want to fix all this CamelCase usage to meet >> HotSpot style. I did that for most of the ObjectMonitor code and >> it was painful. We could clean it up early in JDK10. > > I fixed the typo and also changed ONDECK to OnDeck so that we > generally refer to OnDeck in commentary unless specifically referring > to the _OnDeck field. 
> >> Update: I see Carsten has a comment about this comment also. I >> don't think I quite agree that we're "passing" _EntryList to w, >> but I can be convinced otherwise... > > Right, nothing to do with _EntryList just making w the OnDeck thread. > >> Again, very nice catch! I'd like to see another webrev with the other >> _OnDeck loads and stores either updated for OrderAccess ops or some >> comment explaining why it's not needed. > > webrev updated in place with one comment and one new use of > load-acquire. Plus some cosmetic changes. > > Thanks again, > David > >> Dan >> >> >>> >>> In IUnlock we have the following succession code to wakeup the >>> "onDeck" thread: >>> >>> ParkEvent * List = _EntryList; >>> if (List != NULL) { >>> // Transfer the head of the EntryList to the OnDeck position. >>> // Once OnDeck, a thread stays OnDeck until it acquires the lock. >>> // For a given lock there is at most OnDeck thread at any one >>> instant. >>> WakeOne: >>> assert(List == _EntryList, "invariant"); >>> ParkEvent * const w = List; >>> assert(RelaxAssert || w != Thread::current()->_MutexEvent, >>> "invariant"); >>> _EntryList = w->ListNext; >>> // as a diagnostic measure consider setting w->_ListNext = BAD >>> assert(UNS(_OnDeck) == _LBIT, "invariant"); >>> _OnDeck = w; // pass OnDeck to w. >>> >>> It is critical that the update to _EntryList happens before we set >>> _OnDeck, as as soon as _OnDeck is set the selected thread (which need >>> not yet have parked) can acquire the mutex, complete its critical >>> section and proceed to unlock the mutex, and so execute IUnlock in >>> parallel with the original thread. If the write to _EntryList has not >>> yet happened that second thread finds itself still at the head of >>> _EntryList and so the assert fires. If the write to _EntryList happens >>> after the load "List = _EntryList", then the first assert can also >>> fire. 
>>> >>> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w) >>> with a matching load_acquire(&_OnDeck) in the ILock code: >>> >>> while (_OnDeck != ESelf) { >>> ParkCommon(ESelf, 0); >>> } >>> >>> and corresponding "raw" lock code. Also fixed a couple of typos. >>> >>> Thanks, >>> David >> From christian.tornqvist at oracle.com Thu Oct 13 18:09:08 2016 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Thu, 13 Oct 2016 14:09:08 -0400 Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows Message-ID: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> Hi everyone, Please review this small fix for an intermittent issue we've seen when running tests concurrently that use jcmd/jstack. When running jcmd, we enumerate the perfdata files and then open them one by one to read things like main class names etc. If the perfdata file disappears (because the Java process exited) before we get to it, we end up with different exceptions depending on where in the code we are. The code at: http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/share/ classes/sun/tools/common/ProcessArgumentMatcher.java#l88 handles this, the problem is that if we get all the way to open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception which isn't caught by this. The fix is to throw a NPE instead of Exception and let the existing code handle this. Fix has been tested locally and with 30 JPRT runs (with concurrency patch applied), also managed to reproduce and verify this fix locally using a debugger to trigger the race. 
Webrev: http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/ Bug (unfortunately not visible): https://bugs.openjdk.java.net/browse/JDK-8159799 Thanks, Christian From david.holmes at oracle.com Thu Oct 13 22:45:19 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 08:45:19 +1000 Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows In-Reply-To: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> Message-ID: Hi Christian, Great find on getting to the bottom of this! However ... On 14/10/2016 4:09 AM, Christian Tornqvist wrote: > Hi everyone, > > Please review this small fix for an intermittent issue we've seen when > running tests concurrently that use jcmd/jstack. > > When running jcmd, we enumerate the perfdata files and then open them one by > one to read things like main class names etc. If the perfdata file > disappears (because the Java process exited) before we get to it, we end up > with different exceptions depending on where in the code we are. > > The code at: > > http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/share/ > classes/sun/tools/common/ProcessArgumentMatcher.java#l88 > > handles this, the problem is that if we get all the way to > open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception which > isn't caught by this. The fix is to throw a NPE instead of Exception and let > the existing code handle this. ... that seems the wrong fix. NPE is a very specific exception with a very clear meaning. I'm not at all sure where the existing NPE may come from, but it seems to me that there should be a more specific exception defined for this condition that is thrown by the VM and anticipated by the Java code. Why is there not a FileNotFoundException for example ?? The current NPE seems incidental. 
As a quick fix to improve test stability I can agree to this but I'd like to see a RFE to properly coordinate the VM and Java sides of this with a well defined (set of) exception(s). Thanks, David > Fix has been tested locally and with 30 JPRT runs (with concurrency patch > applied), also managed to reproduce and verify this fix locally using a > debugger to trigger the race. > > > > Webrev: > > http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/ > > > > Bug (unfortunately not visible): > > https://bugs.openjdk.java.net/browse/JDK-8159799 > > > > Thanks, > > Christian > > > > > From david.holmes at oracle.com Thu Oct 13 22:50:34 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 08:50:34 +1000 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> Message-ID: <18ccfdd5-503f-934b-6ff4-b1ae7237b4dd@oracle.com> On 13/10/2016 10:57 PM, Thomas Stüfe wrote: > On Thu, Oct 13, 2016 at 12:25 PM, David Holmes > wrote: > > In the interests of fairness I should also point out this is > technically an enhancement not a bug fix. > > David > > > You are right, I changed this to an enhancement in Jira. Which means this has to wait for 10, or else go through FC Extension process. David From david.holmes at oracle.com Thu Oct 13 22:57:25 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 08:57:25 +1000 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> Message-ID: <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com> On 13/10/2016 10:53 PM, Thomas Stüfe wrote: > Hi David, > > On Thu, Oct 13, 2016 at 12:08 PM, David Holmes > wrote: > > Hi Thomas, > > On 13/10/2016 3:49 PM, Thomas Stüfe wrote: > > Hi all, > > may I please have a review for this tiny change? It just adds > some assert to NMT. 
> > Bug: https://bugs.openjdk.java.net/browse/JDK-8167650 > > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_ > > MEMFLAGS/webrev.00/webrev/ > > We had an ugly memory overwrite caused by this - ultimately our > fault, because we fed an invalid memory flag to NMT - but it was > difficult to find. An assert would have saved some time. > > > I'm a little perplexed with asserting that something of MEMFLAGS > type must be an actual MEMFLAGS value - it implies the caller is > coercing plain int to MEMFLAGS, and I don't have much sympathy if > they mess that up. Can't help wondering if there is some clever C++ > trick to flag bad conversions at compile-time? > > > The error was caused by an uninitialized variable of type MEMFLAGS. This > was our fault, we have heavily modified allocation.hpp and introduced an > error then merging changes from upstream. Due to a merging error this > lead to a case where Arena::_flags was not initialized and contained a > very large value. Ah I see. Lack of default initialization can be annoying :) > I admit it looks funny. If it bothers you, I could instead check the > returned index to be in the range for the size of the _malloc array in > MallocMemorySnapshot::by_type(). Technically, it would mean the same. So I just realized that here: 62 // Map memory type to human readable name 63 static const char* flag_to_name(MEMFLAGS flag) { 64 assert(flag >= 0 && flag < mt_number_of_types, "Invalid flag value %d.", (int)flag); 65 return _memory_type_names[flag_to_index(flag)]; 66 } we call flag_to_index, so the assert is redundant as it is already in flag_to_index. Then presumably we change flag_to_index to something like this: static inline int flag_to_index(MEMFLAGS flag) { int index = (flag & 0xff); assert(index >= 0 && index < mt_number_of_types, "Invalid flag value %d.", (int)flag); return index; } so we're validating the index rather than the flag. 
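[Editorial sketch, not part of the original message.] The suggestion above, validating the index derived from the flag rather than the flag itself, might look roughly like the following outside of HotSpot. This is only a sketch under stated assumptions: MemFlags, its three entries, memory_type_names and is_valid_index are hypothetical stand-ins (the real MEMFLAGS enum and name table are much larger), and the standard assert macro is used in place of HotSpot's own assert.

```cpp
#include <cassert>

// Hypothetical, cut-down stand-in for HotSpot's MEMFLAGS enum.
enum MemFlags {
  mtJavaHeap = 0x00,
  mtClass    = 0x01,
  mtThread   = 0x02,
  mt_number_of_types = 0x03  // number of valid entries, not a flag itself
};

// Hypothetical helper: true if 'index' names a slot in the per-type tables.
inline bool is_valid_index(int index) {
  return index >= 0 && index < mt_number_of_types;
}

// Same shape as the flag_to_index proposed in the thread:
// mask the flag down to an index, then check the index against the table size.
inline int flag_to_index(MemFlags flag) {
  const int index = static_cast<int>(flag) & 0xff;
  assert(is_valid_index(index) && "Invalid flag value");
  return index;
}

// Hypothetical name table, indexed by the validated index.
static const char* const memory_type_names[mt_number_of_types] = {
  "Java Heap", "Class", "Thread"
};

// flag_to_name needs no assert of its own: flag_to_index already checks.
inline const char* flag_to_name(MemFlags flag) {
  return memory_type_names[flag_to_index(flag)];
}
```

One property of this shape worth keeping in mind: because the value is masked with 0xff before the check, a garbage flag whose low byte happens to fall in range still passes, so the check guards the array access rather than proving that the flag was ever initialized.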
Cheers, David > > > The function that takes the index should validate the index, so that > is fine. > > Which one were you actually passing the bad value to? :) > > This isn't a strong objection just musing if we can do better. And > as the hs repos are still closed, and likely to remain so till early > next week, we have some slack time :) > > > :) Sure. > > Kind Regards, Thomas > > > Cheers, > David > > Thank you! > > Thomas > > From david.holmes at oracle.com Thu Oct 13 23:13:04 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 09:13:04 +1000 Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant In-Reply-To: References: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com> Message-ID: Thanks Carsten. David On 14/10/2016 12:20 AM, Carsten Varming wrote: > Dear David, > > The updated webrev looks good to me. > > Carsten > > On Wed, Oct 12, 2016 at 9:18 PM, David Holmes > wrote: > > Hi Dan, > > Thanks for looking at this. > > On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote: > > On 10/11/16 11:08 PM, David Holmes wrote: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166197 > > webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ > > > > Very nice catch! We should check the ObjectMonitor succession > code for > similar issues (my task). > > > Yes. As I said in email I did a quick check through but the > succession logic is sufficiently different that nothing was > obviously wrong in a similar way. > > > src/share/vm/runtime/mutex.cpp > L466: if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, > UNS(ESelf)) == 0) { > Thanks for fixing this bug also! > > L477: while (OrderAccess::load_ptr_acquire(&_OnDeck) != > ESelf) { > So you've changed this load of _OnDeck to use load-acquire > which matches the new store-release on L595: > > OrderAccess::release_store_ptr(&_OnDeck, w); > > > Right. > > What about the other loads of _OnDeck or stores to _OnDeck? 
> There should at least be a new comment explaining why we > don't > need an OrderAccess operation for those. Update: I see you > changed one other load of _OnDeck on L1061. Now I'm really > wanting comments for the other _OnDeck loads and stores. :-) > > Update: I see Carsten V. asked about this in a slightly > different > way. > > > See my reply to Carsten re the load's. I did miss one as we have > three "locking" paths that need to synchronize with the IUnlock code. > > As for documenting ... for line 532 I can add something simple like: > > 532 ParkEvent * const w = _OnDeck; // raw load as we will just > return if non-NULL > > For the other stores to _OnDeck ... CAS should be obvious. The > setting to NULL should also be quite clear as only the _OnDeck > thread sets to NULL to relinquish being _OnDeck once it has acquired > the mutex, which happens via CAS which has full barriers. None of > the plain stores are in the context of: > > some_var = y; // write some shared-state > _OnDeck = NULL; // signal some_var has been updated > > L590: // Pass onDeck to w, ensuring that _EntryList has > been set > first. > Typo: 'onDeck' -> 'OnDeck' > > I suspect you don't want to fix all this CamelCase usage > to meet > HotSpot style. I did that for most of the ObjectMonitor > code and > it was painful. We could clean it up early in JDK10. > > > I fixed the typo and also changed ONDECK to OnDeck so that we > generally refer to OnDeck in commentary unless specifically > referring to the _OnDeck field. > > Update: I see Carsten has a comment about this comment > also. I > don't think I quite agree that we're "passing" > _EntryList to w, > but I can be convinced otherwise... > > > Right, nothing to do with _EntryList just making w the OnDeck thread. > > Again, very nice catch! I'd like to see another webrev with the > other > _OnDeck loads and stores either updated for OrderAccess ops or some > comment explaining why it's not needed. 
>
>
> webrev updated in place with one comment and one new use of
> load-acquire. Plus some cosmetic changes.
>
> Thanks again,
> David
>
>
> Dan
>
>
>
> In IUnlock we have the following succession code to wakeup the
> "onDeck" thread:
>
>   ParkEvent * List = _EntryList;
>   if (List != NULL) {
>     // Transfer the head of the EntryList to the OnDeck position.
>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>     // For a given lock there is at most OnDeck thread at any one instant.
>    WakeOne:
>     assert(List == _EntryList, "invariant");
>     ParkEvent * const w = List;
>     assert(RelaxAssert || w != Thread::current()->_MutexEvent, "invariant");
>     _EntryList = w->ListNext;
>     // as a diagnostic measure consider setting w->_ListNext = BAD
>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>     _OnDeck = w; // pass OnDeck to w.
>
> It is critical that the update to _EntryList happens before we set
> _OnDeck, as as soon as _OnDeck is set the selected thread (which need
> not yet have parked) can acquire the mutex, complete its critical
> section and proceed to unlock the mutex, and so execute IUnlock in
> parallel with the original thread. If the write to _EntryList has not
> yet happened that second thread finds itself still at the head of
> _EntryList and so the assert fires. If the write to _EntryList happens
> after the load "List = _EntryList", then the first assert can also fire.
>
> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
> with a matching load_acquire(&_OnDeck) in the ILock code:
>
>   while (_OnDeck != ESelf) {
>     ParkCommon(ESelf, 0);
>   }
>
> and corresponding "raw" lock code. Also fixed a couple of typos.
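For readers more used to standard C++ than OrderAccess, the release-store / load-acquire pairing described in the quoted text can be sketched with std::atomic. This is only an illustration under assumed names (Handoff, pass_on_deck and wait_until_on_deck are hypothetical, and the bare spin loop stands in for ParkCommon); it is not the mutex.cpp code:

```cpp
#include <atomic>
#include <cassert>

// Hypothetical miniature of the hand-off; not HotSpot's Monitor class.
struct Handoff {
  int entry_list_head = 0;        // plain shared state, playing the role of _EntryList
  std::atomic<int> on_deck{0};    // plays the role of _OnDeck; 0 means "nobody"
};

// Unlocking side: update the list first, then publish the successor.
// The release store keeps the list write from being reordered after it.
inline void pass_on_deck(Handoff& h, int successor, int next_head) {
  assert(successor != 0);                        // 0 is reserved for "nobody"
  h.entry_list_head = next_head;                 // dequeue the successor
  h.on_deck.store(successor, std::memory_order_release);
}

// Successor side: wait until selected, then read the list head.
// The acquire load pairs with the release store above.
inline int wait_until_on_deck(Handoff& h, int self) {
  while (h.on_deck.load(std::memory_order_acquire) != self) {
    // A real implementation would park here instead of spinning.
  }
  return h.entry_list_head;
}
```

If pass_on_deck runs on one thread and wait_until_on_deck on another, the acquire load that observes the release store also guarantees that the earlier write to entry_list_head is visible, which is the ordering the fix relies on.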
> > Thanks, > David > > > From david.holmes at oracle.com Thu Oct 13 23:13:32 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 09:13:32 +1000 Subject: RFR: 8166197: assert(RelaxAssert || w != Thread::current()->_MutexEvent) failed: invariant In-Reply-To: <60e91874-853b-ecdd-01f2-e40fc84b6275@oracle.com> References: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com> <60e91874-853b-ecdd-01f2-e40fc84b6275@oracle.com> Message-ID: Thanks Dan! David On 14/10/2016 2:24 AM, Daniel D. Daugherty wrote: >> webrev updated in place with one comment and one new use of load-acquire. >> Plus some cosmetic changes. >> >> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ > > src/share/vm/runtime/mutex.cpp > No comments. > > Thumbs up! > > Dan > > > > On 10/12/16 7:18 PM, David Holmes wrote: >> Hi Dan, >> >> Thanks for looking at this. >> >> On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote: >>> On 10/11/16 11:08 PM, David Holmes wrote: >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197 >>>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/ >>> >>> Very nice catch! We should check the ObjectMonitor succession code for >>> similar issues (my task). >> >> Yes. As I said in email I did a quick check through but the succession >> logic is sufficiently different that nothing was obviously wrong in a >> similar way. >> >>> >>> src/share/vm/runtime/mutex.cpp >>> L466: if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, >>> UNS(ESelf)) == 0) { >>> Thanks for fixing this bug also! >>> >>> L477: while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) { >>> So you've changed this load of _OnDeck to use load-acquire >>> which matches the new store-release on L595: >>> >>> OrderAccess::release_store_ptr(&_OnDeck, w); >> >> Right. >> >>> What about the other loads of _OnDeck or stores to _OnDeck? >>> There should at least be a new comment explaining why we don't >>> need an OrderAccess operation for those. 
Update: I see you >>> changed one other load of _OnDeck on L1061. Now I'm really >>> wanting comments for the other _OnDeck loads and stores. :-) >>> >>> Update: I see Carsten V. asked about this in a slightly >>> different >>> way. >> >> See my reply to Carsten re the load's. I did miss one as we have three >> "locking" paths that need to synchronize with the IUnlock code. >> >> As for documenting ... for line 532 I can add something simple like: >> >> 532 ParkEvent * const w = _OnDeck; // raw load as we will just >> return if non-NULL >> >> For the other stores to _OnDeck ... CAS should be obvious. The setting >> to NULL should also be quite clear as only the _OnDeck thread sets to >> NULL to relinquish being _OnDeck once it has acquired the mutex, which >> happens via CAS which has full barriers. None of the plain stores are >> in the context of: >> >> some_var = y; // write some shared-state >> _OnDeck = NULL; // signal some_var has been updated >> >>> L590: // Pass onDeck to w, ensuring that _EntryList has been set >>> first. >>> Typo: 'onDeck' -> 'OnDeck' >>> >>> I suspect you don't want to fix all this CamelCase usage to meet >>> HotSpot style. I did that for most of the ObjectMonitor code and >>> it was painful. We could clean it up early in JDK10. >> >> I fixed the typo and also changed ONDECK to OnDeck so that we >> generally refer to OnDeck in commentary unless specifically referring >> to the _OnDeck field. >> >>> Update: I see Carsten has a comment about this comment also. I >>> don't think I quite agree that we're "passing" _EntryList to w, >>> but I can be convinced otherwise... >> >> Right, nothing to do with _EntryList just making w the OnDeck thread. >> >>> Again, very nice catch! I'd like to see another webrev with the other >>> _OnDeck loads and stores either updated for OrderAccess ops or some >>> comment explaining why it's not needed. >> >> webrev updated in place with one comment and one new use of >> load-acquire. 
Plus some cosmetic changes.
>>
>> Thanks again,
>> David
>>
>>> Dan
>>>
>>>
>>>>
>>>> In IUnlock we have the following succession code to wakeup the
>>>> "onDeck" thread:
>>>>
>>>>   ParkEvent * List = _EntryList;
>>>>   if (List != NULL) {
>>>>     // Transfer the head of the EntryList to the OnDeck position.
>>>>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>>>>     // For a given lock there is at most OnDeck thread at any one instant.
>>>>    WakeOne:
>>>>     assert(List == _EntryList, "invariant");
>>>>     ParkEvent * const w = List;
>>>>     assert(RelaxAssert || w != Thread::current()->_MutexEvent, "invariant");
>>>>     _EntryList = w->ListNext;
>>>>     // as a diagnostic measure consider setting w->_ListNext = BAD
>>>>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>>>>     _OnDeck = w; // pass OnDeck to w.
>>>>
>>>> It is critical that the update to _EntryList happens before we set
>>>> _OnDeck, as as soon as _OnDeck is set the selected thread (which need
>>>> not yet have parked) can acquire the mutex, complete its critical
>>>> section and proceed to unlock the mutex, and so execute IUnlock in
>>>> parallel with the original thread. If the write to _EntryList has not
>>>> yet happened that second thread finds itself still at the head of
>>>> _EntryList and so the assert fires. If the write to _EntryList happens
>>>> after the load "List = _EntryList", then the first assert can also
>>>> fire.
>>>>
>>>> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
>>>> with a matching load_acquire(&_OnDeck) in the ILock code:
>>>>
>>>>   while (_OnDeck != ESelf) {
>>>>     ParkCommon(ESelf, 0);
>>>>   }
>>>>
>>>> and corresponding "raw" lock code. Also fixed a couple of typos.
>>>> >>>> Thanks, >>>> David >>> > From shafi.s.ahmad at oracle.com Fri Oct 14 05:28:45 2016 From: shafi.s.ahmad at oracle.com (Shafi Ahmad) Date: Thu, 13 Oct 2016 22:28:45 -0700 (PDT) Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: <5eb7474b-a72e-41c0-b389-bfad82270f18@default> References: <5eb7474b-a72e-41c0-b389-bfad82270f18@default> Message-ID: <0fe3deb1-594c-46e6-829b-fe70315d3496@default> Hi, May I get some comment on this. Regards, Shafi > -----Original Message----- > From: Shafi Ahmad > Sent: Wednesday, October 12, 2016 12:42 PM > To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net > Cc: David Holmes > Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't > work for OOM caused by inability to create threads' > > Hi Mikael, > > Thanks for reviewing it. > > Once VM is initialized, following are two OOME scenarios: > 1) OOME due to unavailability of java memory [Mainly due to java > application]. > 2) OOME due to unavailability of native memory. > > Please correct me if I am wrong, for 1) java.lang.OutOfMemoryError is > correct. > > Consider the following scenarios: > 1) Let there is java application which uses JNI code and inside JNI code there > is native memory allocation/free and we hit OOME. > 2) Let there is java application which uses JNI code and inside JNI code there > is memory leak error and due to this OOME situation occurs. > 3) We use jvm option Xms and -Xmx in such a way that the available native > memory is very less and VM hit OOME. > > I am not sure above scenario is feasible in JVM or not but if any of the above > scenario is possible in VM then should we consider it as OOME due java > application or not? > I consider case 1) and 2) as OOME due to java application and added code for > java.lang.OutOfMemoryError inside report_vm_out_of_memory. 
> > My assumption of OOME once VM is initialized completely is due to java > application[directly or indirectly] may not hold true always. > -XX:+CrashOnOutOfMemoryError is mostly for debugging purpose so I added > the related code change inside report_vm_out_of_memory. > Yes, I must not use ' java.lang.OutOfMemoryError' for such case. > > Please let me know whether I should remove the code change inside > report_vm_out_of_memory or keep it by adding appropriate reason of > OutOfMemoryError. > > Regards, > Shafi > > > -----Original Message----- > > From: Mikael Gerdin > > Sent: Monday, October 10, 2016 7:30 PM > > To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net > > Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't > > work for OOM caused by inability to create threads' > > > > Hi, > > > > On 2016-10-10 09:24, Shafi Ahmad wrote: > > > Hi All, > > > > > > Please review the simple change for the fix of bug '' JDK-8155004: > > CrashOnOutOfMemoryError doesn't work for OOM caused by inability to > > create threads'. > > > > > > Summary: > > > In the current implementation there are few scenarios where we are > > > not > > obeying the jvm option -XX:+CrashOnOutOfMemoryError. > > > While I was analysis this issue I found there are two jvm state > > > where OOM > > can happen: > > > 1. OOM during VM initialization - as per our internal discussion > > > for this case > > it is not worth for dumping core file, so this is left as it is. > > > 2. OOM once VM is initialized - For this scenario most of the > > > place code is > > already added but few place corresponding code changes are missing so > > this change covers it. > > > > > > Webrev link: > http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ > > > > > > There is a lot of confusion in the VM code with the term "out of > > memory error". 
> > In some places it refers to code throwing a java.lang.OutOfMemoryError > > and expecting running java code to be able to potentially catch that > > Error and continue running. > > > > In other places, such as callers of report_vm_out_of_memory, the > > situation is much more dire and the calling thread may not even be a > > JavaThread and as such cannot "throw" an exception. > > report_vm_out_of_memory is only invoked through the macro > > vm_exit_out_of_memory, which of course implies that the condition is > > fatal and we are about to terminate the JVM process altogether. > > > > I think that it's incorrect to call code related to > > java.lang.OutOfMemoryError in report_vm_out_of_memory since the > > condition may not even be correlated with Java level application behavior. > > > > /Mikael > > > > > Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 > > > > > > Testing: jprt and jtreg (on Linux x86_64) > > > > > > Regards, > > > Shafi > > > From staffan.larsen at oracle.com Fri Oct 14 06:18:35 2016 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Fri, 14 Oct 2016 08:18:35 +0200 Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows In-Reply-To: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> Message-ID: <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com> Thanks for getting to the bottom of this! The fix looks good. Perhaps, as David points out, FileNotFoundException is a better choice but that requires more changes as FileNotFoundException is not used anywhere else in the JVM. Thanks, /Staffan > On 13 Oct 2016, at 20:09, Christian Tornqvist wrote: > > Hi everyone, > > > > Please review this small fix for an intermittent issue we've seen when > running tests concurrently that use jcmd/jstack. > > When running jcmd, we enumerate the perfdata files and then open them one by > one to read things like main class names etc. 
If the perfdata file
> disappears (because the Java process exited) before we get to it, we end up
> with different exceptions depending on where in the code we are.
>
>
>
> The code at:
>
> http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/share/classes/sun/tools/common/ProcessArgumentMatcher.java#l88
>
>
>
> handles this, the problem is that if we get all the way to
> open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception which
> isn't caught by this. The fix is to throw a NPE instead of Exception and let
> the existing code handle this.
>
>
>
> Fix has been tested locally and with 30 JPRT runs (with concurrency patch
> applied), also managed to reproduce and verify this fix locally using a
> debugger to trigger the race.
>
>
>
> Webrev:
>
> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/
>
>
>
> Bug (unfortunately not visible):
>
> https://bugs.openjdk.java.net/browse/JDK-8159799
>
>
>
> Thanks,
>
> Christian
>
>
>
>

From david.holmes at oracle.com Fri Oct 14 06:31:36 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 16:31:36 +1000
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'
In-Reply-To: <0fe3deb1-594c-46e6-829b-fe70315d3496@default>
References: <5eb7474b-a72e-41c0-b389-bfad82270f18@default> <0fe3deb1-594c-46e6-829b-fe70315d3496@default>
Message-ID: <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>

Hi Shafi,

I stand by my previous comment - in the context of this bug, in relation
to failure to create a native thread "A call to report_java_out_of_memory
should only be made on a code path that will throw an OOME."

Does this mean we have all OOM (not OOME)** situations covered? Nope.
But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError seem specific
to OutOfMemoryError to me - and not that useful for dealing with the JNI
leak you describe.
Feel free to file a RFE to look into more elaborate/extensive OOM handling. I'm not sure if NMT hooks into JNI. Thanks, David ** OOM - out-of-memory OOME - OutOfMemoryError - a Java exception thrown in response to a detected OOM condition On 14/10/2016 3:28 PM, Shafi Ahmad wrote: > Hi, > > May I get some comment on this. > > Regards, > Shafi > >> -----Original Message----- >> From: Shafi Ahmad >> Sent: Wednesday, October 12, 2016 12:42 PM >> To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net >> Cc: David Holmes >> Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't >> work for OOM caused by inability to create threads' >> >> Hi Mikael, >> >> Thanks for reviewing it. >> >> Once VM is initialized, following are two OOME scenarios: >> 1) OOME due to unavailability of java memory [Mainly due to java >> application]. >> 2) OOME due to unavailability of native memory. >> >> Please correct me if I am wrong, for 1) java.lang.OutOfMemoryError is >> correct. >> >> Consider the following scenarios: >> 1) Let there is java application which uses JNI code and inside JNI code there >> is native memory allocation/free and we hit OOME. >> 2) Let there is java application which uses JNI code and inside JNI code there >> is memory leak error and due to this OOME situation occurs. >> 3) We use jvm option Xms and -Xmx in such a way that the available native >> memory is very less and VM hit OOME. >> >> I am not sure above scenario is feasible in JVM or not but if any of the above >> scenario is possible in VM then should we consider it as OOME due java >> application or not? >> I consider case 1) and 2) as OOME due to java application and added code for >> java.lang.OutOfMemoryError inside report_vm_out_of_memory. >> >> My assumption of OOME once VM is initialized completely is due to java >> application[directly or indirectly] may not hold true always. 
>> -XX:+CrashOnOutOfMemoryError is mostly for debugging purpose so I added >> the related code change inside report_vm_out_of_memory. >> Yes, I must not use ' java.lang.OutOfMemoryError' for such case. >> >> Please let me know whether I should remove the code change inside >> report_vm_out_of_memory or keep it by adding appropriate reason of >> OutOfMemoryError. >> >> Regards, >> Shafi >> >>> -----Original Message----- >>> From: Mikael Gerdin >>> Sent: Monday, October 10, 2016 7:30 PM >>> To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't >>> work for OOM caused by inability to create threads' >>> >>> Hi, >>> >>> On 2016-10-10 09:24, Shafi Ahmad wrote: >>>> Hi All, >>>> >>>> Please review the simple change for the fix of bug '' JDK-8155004: >>> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to >>> create threads'. >>>> >>>> Summary: >>>> In the current implementation there are few scenarios where we are >>>> not >>> obeying the jvm option -XX:+CrashOnOutOfMemoryError. >>>> While I was analysis this issue I found there are two jvm state >>>> where OOM >>> can happen: >>>> 1. OOM during VM initialization - as per our internal discussion >>>> for this case >>> it is not worth for dumping core file, so this is left as it is. >>>> 2. OOM once VM is initialized - For this scenario most of the >>>> place code is >>> already added but few place corresponding code changes are missing so >>> this change covers it. >>>> >>>> Webrev link: >> http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ >>> >>> >>> There is a lot of confusion in the VM code with the term "out of >>> memory error". >>> In some places it refers to code throwing a java.lang.OutOfMemoryError >>> and expecting running java code to be able to potentially catch that >>> Error and continue running. 
>>> >>> In other places, such as callers of report_vm_out_of_memory, the >>> situation is much more dire and the calling thread may not even be a >>> JavaThread and as such cannot "throw" an exception. >>> report_vm_out_of_memory is only invoked through the macro >>> vm_exit_out_of_memory, which of course implies that the condition is >>> fatal and we are about to terminate the JVM process altogether. >>> >>> I think that it's incorrect to call code related to >>> java.lang.OutOfMemoryError in report_vm_out_of_memory since the >>> condition may not even be correlated with Java level application behavior. >>> >>> /Mikael >>> >>>> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 >>>> >>>> Testing: jprt and jtreg (on Linux x86_64) >>>> >>>> Regards, >>>> Shafi >>>> From shafi.s.ahmad at oracle.com Fri Oct 14 07:21:14 2016 From: shafi.s.ahmad at oracle.com (Shafi Ahmad) Date: Fri, 14 Oct 2016 00:21:14 -0700 (PDT) Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com> References: <5eb7474b-a72e-41c0-b389-bfad82270f18@default> <0fe3deb1-594c-46e6-829b-fe70315d3496@default> <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com> Message-ID: Hi David, Thanks for the clarification. I will send the updated webrev. Regards, Shafi > -----Original Message----- > From: David Holmes > Sent: Friday, October 14, 2016 12:02 PM > To: Shafi Ahmad; Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't > work for OOM caused by inability to create threads' > > Hi Shafi, > > I stand by my previous comment - in the context of this bug, in relation to > failure to create a native thread "A call to report_java_out_of_memory > should only be made on a code path that will throw an OOME." > > Does this mean we have all OOM (not OOME)** situations covered? Nope. 
> But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError > seem specific to OutOfMemoryError to me - and not that useful for dealing > with the JNI leak you describe. > > Feel free to file a RFE to look into more elaborate/extensive OOM handling. > I'm not sure if NMT hooks into JNI. > > Thanks, > David > > > ** OOM - out-of-memory > OOME - OutOfMemoryError - a Java exception thrown in response to a > detected OOM condition > > On 14/10/2016 3:28 PM, Shafi Ahmad wrote: > > Hi, > > > > May I get some comment on this. > > > > Regards, > > Shafi > > > >> -----Original Message----- > >> From: Shafi Ahmad > >> Sent: Wednesday, October 12, 2016 12:42 PM > >> To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net > >> Cc: David Holmes > >> Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't > >> work for OOM caused by inability to create threads' > >> > >> Hi Mikael, > >> > >> Thanks for reviewing it. > >> > >> Once VM is initialized, following are two OOME scenarios: > >> 1) OOME due to unavailability of java memory [Mainly due to java > >> application]. > >> 2) OOME due to unavailability of native memory. > >> > >> Please correct me if I am wrong, for 1) java.lang.OutOfMemoryError is > >> correct. > >> > >> Consider the following scenarios: > >> 1) Let there is java application which uses JNI code and inside JNI > >> code there is native memory allocation/free and we hit OOME. > >> 2) Let there is java application which uses JNI code and inside JNI > >> code there is memory leak error and due to this OOME situation occurs. > >> 3) We use jvm option Xms and -Xmx in such a way that the available > >> native memory is very less and VM hit OOME. > >> > >> I am not sure above scenario is feasible in JVM or not but if any of > >> the above scenario is possible in VM then should we consider it as > >> OOME due java application or not? 
> >> I consider case 1) and 2) as OOME due to java application and added > >> code for java.lang.OutOfMemoryError inside > report_vm_out_of_memory. > >> > >> My assumption of OOME once VM is initialized completely is due to > >> java application[directly or indirectly] may not hold true always. > >> -XX:+CrashOnOutOfMemoryError is mostly for debugging purpose so I > >> added the related code change inside report_vm_out_of_memory. > >> Yes, I must not use ' java.lang.OutOfMemoryError' for such case. > >> > >> Please let me know whether I should remove the code change inside > >> report_vm_out_of_memory or keep it by adding appropriate reason of > >> OutOfMemoryError. > >> > >> Regards, > >> Shafi > >> > >>> -----Original Message----- > >>> From: Mikael Gerdin > >>> Sent: Monday, October 10, 2016 7:30 PM > >>> To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net > >>> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError > >>> doesn't work for OOM caused by inability to create threads' > >>> > >>> Hi, > >>> > >>> On 2016-10-10 09:24, Shafi Ahmad wrote: > >>>> Hi All, > >>>> > >>>> Please review the simple change for the fix of bug '' JDK-8155004: > >>> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to > >>> create threads'. > >>>> > >>>> Summary: > >>>> In the current implementation there are few scenarios where we are > >>>> not > >>> obeying the jvm option -XX:+CrashOnOutOfMemoryError. > >>>> While I was analysis this issue I found there are two jvm state > >>>> where OOM > >>> can happen: > >>>> 1. OOM during VM initialization - as per our internal discussion > >>>> for this case > >>> it is not worth for dumping core file, so this is left as it is. > >>>> 2. OOM once VM is initialized - For this scenario most of the > >>>> place code is > >>> already added but few place corresponding code changes are missing > >>> so this change covers it. 
> >>>> > >>>> Webrev link: > >> http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ > >>> > >>> > >>> There is a lot of confusion in the VM code with the term "out of > >>> memory error". > >>> In some places it refers to code throwing a > >>> java.lang.OutOfMemoryError and expecting running java code to be > >>> able to potentially catch that Error and continue running. > >>> > >>> In other places, such as callers of report_vm_out_of_memory, the > >>> situation is much more dire and the calling thread may not even be a > >>> JavaThread and as such cannot "throw" an exception. > >>> report_vm_out_of_memory is only invoked through the macro > >>> vm_exit_out_of_memory, which of course implies that the condition is > >>> fatal and we are about to terminate the JVM process altogether. > >>> > >>> I think that it's incorrect to call code related to > >>> java.lang.OutOfMemoryError in report_vm_out_of_memory since the > >>> condition may not even be correlated with Java level application > behavior. > >>> > >>> /Mikael > >>> > >>>> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 > >>>> > >>>> Testing: jprt and jtreg (on Linux x86_64) > >>>> > >>>> Regards, > >>>> Shafi > >>>> From shafi.s.ahmad at oracle.com Fri Oct 14 08:55:50 2016 From: shafi.s.ahmad at oracle.com (Shafi Ahmad) Date: Fri, 14 Oct 2016 01:55:50 -0700 (PDT) Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com> References: <5eb7474b-a72e-41c0-b389-bfad82270f18@default> <0fe3deb1-594c-46e6-829b-fe70315d3496@default> <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com> Message-ID: <442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default> Please find updated webrev. 
http://cr.openjdk.java.net/~shshahma/8155004/webrev.01/ Regards, Shafi > -----Original Message----- > From: David Holmes > Sent: Friday, October 14, 2016 12:02 PM > To: Shafi Ahmad; Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net > Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't > work for OOM caused by inability to create threads' > > Hi Shafi, > > I stand by my previous comment - in the context of this bug, in relation to > failure to create a native thread "A call to report_java_out_of_memory > should only be made on a code path that will throw an OOME." > > Does this mean we have all OOM (not OOME)** situations covered? Nope. > But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError > seem specific to OutOfMemoryError to me - and not that useful for dealing > with the JNI leak you describe. > > Feel free to file a RFE to look into more elaborate/extensive OOM handling. > I'm not sure if NMT hooks into JNI. > > Thanks, > David > > > ** OOM - out-of-memory > OOME - OutOfMemoryError - a Java exception thrown in response to a > detected OOM condition > > On 14/10/2016 3:28 PM, Shafi Ahmad wrote: > > Hi, > > > > May I get some comment on this. > > > > Regards, > > Shafi > > > >> -----Original Message----- > >> From: Shafi Ahmad > >> Sent: Wednesday, October 12, 2016 12:42 PM > >> To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net > >> Cc: David Holmes > >> Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't > >> work for OOM caused by inability to create threads' > >> > >> Hi Mikael, > >> > >> Thanks for reviewing it. > >> > >> Once VM is initialized, following are two OOME scenarios: > >> 1) OOME due to unavailability of java memory [Mainly due to java > >> application]. > >> 2) OOME due to unavailability of native memory. > >> > >> Please correct me if I am wrong, for 1) java.lang.OutOfMemoryError is > >> correct. 
> >> > >> Consider the following scenarios: > >> 1) Let there is java application which uses JNI code and inside JNI > >> code there is native memory allocation/free and we hit OOME. > >> 2) Let there is java application which uses JNI code and inside JNI > >> code there is memory leak error and due to this OOME situation occurs. > >> 3) We use jvm option Xms and -Xmx in such a way that the available > >> native memory is very less and VM hit OOME. > >> > >> I am not sure above scenario is feasible in JVM or not but if any of > >> the above scenario is possible in VM then should we consider it as > >> OOME due java application or not? > >> I consider case 1) and 2) as OOME due to java application and added > >> code for java.lang.OutOfMemoryError inside > report_vm_out_of_memory. > >> > >> My assumption of OOME once VM is initialized completely is due to > >> java application[directly or indirectly] may not hold true always. > >> -XX:+CrashOnOutOfMemoryError is mostly for debugging purpose so I > >> added the related code change inside report_vm_out_of_memory. > >> Yes, I must not use ' java.lang.OutOfMemoryError' for such case. > >> > >> Please let me know whether I should remove the code change inside > >> report_vm_out_of_memory or keep it by adding appropriate reason of > >> OutOfMemoryError. > >> > >> Regards, > >> Shafi > >> > >>> -----Original Message----- > >>> From: Mikael Gerdin > >>> Sent: Monday, October 10, 2016 7:30 PM > >>> To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net > >>> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError > >>> doesn't work for OOM caused by inability to create threads' > >>> > >>> Hi, > >>> > >>> On 2016-10-10 09:24, Shafi Ahmad wrote: > >>>> Hi All, > >>>> > >>>> Please review the simple change for the fix of bug '' JDK-8155004: > >>> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to > >>> create threads'. 
> >>>> > >>>> Summary: > >>>> In the current implementation there are few scenarios where we are > >>>> not > >>> obeying the jvm option -XX:+CrashOnOutOfMemoryError. > >>>> While I was analysis this issue I found there are two jvm state > >>>> where OOM > >>> can happen: > >>>> 1. OOM during VM initialization - as per our internal discussion > >>>> for this case > >>> it is not worth for dumping core file, so this is left as it is. > >>>> 2. OOM once VM is initialized - For this scenario most of the > >>>> place code is > >>> already added but few place corresponding code changes are missing > >>> so this change covers it. > >>>> > >>>> Webrev link: > >> http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ > >>> > >>> > >>> There is a lot of confusion in the VM code with the term "out of > >>> memory error". > >>> In some places it refers to code throwing a > >>> java.lang.OutOfMemoryError and expecting running java code to be > >>> able to potentially catch that Error and continue running. > >>> > >>> In other places, such as callers of report_vm_out_of_memory, the > >>> situation is much more dire and the calling thread may not even be a > >>> JavaThread and as such cannot "throw" an exception. > >>> report_vm_out_of_memory is only invoked through the macro > >>> vm_exit_out_of_memory, which of course implies that the condition is > >>> fatal and we are about to terminate the JVM process altogether. > >>> > >>> I think that it's incorrect to call code related to > >>> java.lang.OutOfMemoryError in report_vm_out_of_memory since the > >>> condition may not even be correlated with Java level application > behavior. 
> >>> > >>> /Mikael > >>> > >>>> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 > >>>> > >>>> Testing: jprt and jtreg (on Linux x86_64) > >>>> > >>>> Regards, > >>>> Shafi > >>>> From david.holmes at oracle.com Fri Oct 14 11:26:21 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 21:26:21 +1000 Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: <442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default> References: <5eb7474b-a72e-41c0-b389-bfad82270f18@default> <0fe3deb1-594c-46e6-829b-fe70315d3496@default> <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com> <442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default> Message-ID: On 14/10/2016 6:55 PM, Shafi Ahmad wrote: > Please find updated webrev. > > http://cr.openjdk.java.net/~shshahma/8155004/webrev.01/ Ok. Thanks, David > Regards, > Shafi > >> -----Original Message----- >> From: David Holmes >> Sent: Friday, October 14, 2016 12:02 PM >> To: Shafi Ahmad; Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net >> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't >> work for OOM caused by inability to create threads' >> >> Hi Shafi, >> >> I stand by my previous comment - in the context of this bug, in relation to >> failure to create a native thread "A call to report_java_out_of_memory >> should only be made on a code path that will throw an OOME." >> >> Does this mean we have all OOM (not OOME)** situations covered? Nope. >> But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError >> seem specific to OutOfMemoryError to me - and not that useful for dealing >> with the JNI leak you describe. >> >> Feel free to file a RFE to look into more elaborate/extensive OOM handling. >> I'm not sure if NMT hooks into JNI. 
>> >> Thanks, >> David >> >> >> ** OOM - out-of-memory >> OOME - OutOfMemoryError - a Java exception thrown in response to a >> detected OOM condition >> >> On 14/10/2016 3:28 PM, Shafi Ahmad wrote: >>> Hi, >>> >>> May I get some comment on this. >>> >>> Regards, >>> Shafi >>> >>>> -----Original Message----- >>>> From: Shafi Ahmad >>>> Sent: Wednesday, October 12, 2016 12:42 PM >>>> To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net >>>> Cc: David Holmes >>>> Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't >>>> work for OOM caused by inability to create threads' >>>> >>>> Hi Mikael, >>>> >>>> Thanks for reviewing it. >>>> >>>> Once VM is initialized, following are two OOME scenarios: >>>> 1) OOME due to unavailability of java memory [Mainly due to java >>>> application]. >>>> 2) OOME due to unavailability of native memory. >>>> >>>> Please correct me if I am wrong, for 1) java.lang.OutOfMemoryError is >>>> correct. >>>> >>>> Consider the following scenarios: >>>> 1) Let there is java application which uses JNI code and inside JNI >>>> code there is native memory allocation/free and we hit OOME. >>>> 2) Let there is java application which uses JNI code and inside JNI >>>> code there is memory leak error and due to this OOME situation occurs. >>>> 3) We use jvm option Xms and -Xmx in such a way that the available >>>> native memory is very less and VM hit OOME. >>>> >>>> I am not sure above scenario is feasible in JVM or not but if any of >>>> the above scenario is possible in VM then should we consider it as >>>> OOME due java application or not? >>>> I consider case 1) and 2) as OOME due to java application and added >>>> code for java.lang.OutOfMemoryError inside >> report_vm_out_of_memory. >>>> >>>> My assumption of OOME once VM is initialized completely is due to >>>> java application[directly or indirectly] may not hold true always. 
>>>> -XX:+CrashOnOutOfMemoryError is mostly for debugging purpose so I >>>> added the related code change inside report_vm_out_of_memory. >>>> Yes, I must not use ' java.lang.OutOfMemoryError' for such case. >>>> >>>> Please let me know whether I should remove the code change inside >>>> report_vm_out_of_memory or keep it by adding appropriate reason of >>>> OutOfMemoryError. >>>> >>>> Regards, >>>> Shafi >>>> >>>>> -----Original Message----- >>>>> From: Mikael Gerdin >>>>> Sent: Monday, October 10, 2016 7:30 PM >>>>> To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net >>>>> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError >>>>> doesn't work for OOM caused by inability to create threads' >>>>> >>>>> Hi, >>>>> >>>>> On 2016-10-10 09:24, Shafi Ahmad wrote: >>>>>> Hi All, >>>>>> >>>>>> Please review the simple change for the fix of bug '' JDK-8155004: >>>>> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to >>>>> create threads'. >>>>>> >>>>>> Summary: >>>>>> In the current implementation there are few scenarios where we are >>>>>> not >>>>> obeying the jvm option -XX:+CrashOnOutOfMemoryError. >>>>>> While I was analysis this issue I found there are two jvm state >>>>>> where OOM >>>>> can happen: >>>>>> 1. OOM during VM initialization - as per our internal discussion >>>>>> for this case >>>>> it is not worth for dumping core file, so this is left as it is. >>>>>> 2. OOM once VM is initialized - For this scenario most of the >>>>>> place code is >>>>> already added but few place corresponding code changes are missing >>>>> so this change covers it. >>>>>> >>>>>> Webrev link: >>>> http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ >>>>> >>>>> >>>>> There is a lot of confusion in the VM code with the term "out of >>>>> memory error". >>>>> In some places it refers to code throwing a >>>>> java.lang.OutOfMemoryError and expecting running java code to be >>>>> able to potentially catch that Error and continue running. 
>>>>> >>>>> In other places, such as callers of report_vm_out_of_memory, the >>>>> situation is much more dire and the calling thread may not even be a >>>>> JavaThread and as such cannot "throw" an exception. >>>>> report_vm_out_of_memory is only invoked through the macro >>>>> vm_exit_out_of_memory, which of course implies that the condition is >>>>> fatal and we are about to terminate the JVM process altogether. >>>>> >>>>> I think that it's incorrect to call code related to >>>>> java.lang.OutOfMemoryError in report_vm_out_of_memory since the >>>>> condition may not even be correlated with Java level application >> behavior. >>>>> >>>>> /Mikael >>>>> >>>>>> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 >>>>>> >>>>>> Testing: jprt and jtreg (on Linux x86_64) >>>>>> >>>>>> Regards, >>>>>> Shafi >>>>>> From mikael.gerdin at oracle.com Fri Oct 14 11:54:48 2016 From: mikael.gerdin at oracle.com (Mikael Gerdin) Date: Fri, 14 Oct 2016 13:54:48 +0200 Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads' In-Reply-To: References: <5eb7474b-a72e-41c0-b389-bfad82270f18@default> <0fe3deb1-594c-46e6-829b-fe70315d3496@default> <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com> <442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default> Message-ID: <68331d88-4c52-c737-1b09-bf3cfe81720e@oracle.com> On 2016-10-14 13:26, David Holmes wrote: > On 14/10/2016 6:55 PM, Shafi Ahmad wrote: >> Please find updated webrev. >> >> http://cr.openjdk.java.net/~shshahma/8155004/webrev.01/ > > Ok. 
+1 /Mikael > > Thanks, > David > >> Regards, >> Shafi >> >>> -----Original Message----- >>> From: David Holmes >>> Sent: Friday, October 14, 2016 12:02 PM >>> To: Shafi Ahmad; Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net >>> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't >>> work for OOM caused by inability to create threads' >>> >>> Hi Shafi, >>> >>> I stand by my previous comment - in the context of this bug, in >>> relation to >>> failure to create a native thread "A call to report_java_out_of_memory >>> should only be made on a code path that will throw an OOME." >>> >>> Does this mean we have all OOM (not OOME)** situations covered? Nope. >>> But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError >>> seem specific to OutOfMemoryError to me - and not that useful for >>> dealing >>> with the JNI leak you describe. >>> >>> Feel free to file a RFE to look into more elaborate/extensive OOM >>> handling. >>> I'm not sure if NMT hooks into JNI. >>> >>> Thanks, >>> David >>> >>> >>> ** OOM - out-of-memory >>> OOME - OutOfMemoryError - a Java exception thrown in response to a >>> detected OOM condition >>> >>> On 14/10/2016 3:28 PM, Shafi Ahmad wrote: >>>> Hi, >>>> >>>> May I get some comment on this. >>>> >>>> Regards, >>>> Shafi >>>> >>>>> -----Original Message----- >>>>> From: Shafi Ahmad >>>>> Sent: Wednesday, October 12, 2016 12:42 PM >>>>> To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net >>>>> Cc: David Holmes >>>>> Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't >>>>> work for OOM caused by inability to create threads' >>>>> >>>>> Hi Mikael, >>>>> >>>>> Thanks for reviewing it. >>>>> >>>>> Once VM is initialized, following are two OOME scenarios: >>>>> 1) OOME due to unavailability of java memory [Mainly due to java >>>>> application]. >>>>> 2) OOME due to unavailability of native memory. >>>>> >>>>> Please correct me if I am wrong, for 1) java.lang.OutOfMemoryError is >>>>> correct. 
>>>>> >>>>> Consider the following scenarios: >>>>> 1) Let there is java application which uses JNI code and inside JNI >>>>> code there is native memory allocation/free and we hit OOME. >>>>> 2) Let there is java application which uses JNI code and inside JNI >>>>> code there is memory leak error and due to this OOME situation occurs. >>>>> 3) We use jvm option Xms and -Xmx in such a way that the available >>>>> native memory is very less and VM hit OOME. >>>>> >>>>> I am not sure above scenario is feasible in JVM or not but if any of >>>>> the above scenario is possible in VM then should we consider it as >>>>> OOME due java application or not? >>>>> I consider case 1) and 2) as OOME due to java application and added >>>>> code for java.lang.OutOfMemoryError inside >>> report_vm_out_of_memory. >>>>> >>>>> My assumption of OOME once VM is initialized completely is due to >>>>> java application[directly or indirectly] may not hold true always. >>>>> -XX:+CrashOnOutOfMemoryError is mostly for debugging purpose so I >>>>> added the related code change inside report_vm_out_of_memory. >>>>> Yes, I must not use ' java.lang.OutOfMemoryError' for such case. >>>>> >>>>> Please let me know whether I should remove the code change inside >>>>> report_vm_out_of_memory or keep it by adding appropriate reason of >>>>> OutOfMemoryError. >>>>> >>>>> Regards, >>>>> Shafi >>>>> >>>>>> -----Original Message----- >>>>>> From: Mikael Gerdin >>>>>> Sent: Monday, October 10, 2016 7:30 PM >>>>>> To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net >>>>>> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError >>>>>> doesn't work for OOM caused by inability to create threads' >>>>>> >>>>>> Hi, >>>>>> >>>>>> On 2016-10-10 09:24, Shafi Ahmad wrote: >>>>>>> Hi All, >>>>>>> >>>>>>> Please review the simple change for the fix of bug '' JDK-8155004: >>>>>> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to >>>>>> create threads'. 
>>>>>>> >>>>>>> Summary: >>>>>>> In the current implementation there are few scenarios where we are >>>>>>> not >>>>>> obeying the jvm option -XX:+CrashOnOutOfMemoryError. >>>>>>> While I was analysis this issue I found there are two jvm state >>>>>>> where OOM >>>>>> can happen: >>>>>>> 1. OOM during VM initialization - as per our internal discussion >>>>>>> for this case >>>>>> it is not worth for dumping core file, so this is left as it is. >>>>>>> 2. OOM once VM is initialized - For this scenario most of the >>>>>>> place code is >>>>>> already added but few place corresponding code changes are missing >>>>>> so this change covers it. >>>>>>> >>>>>>> Webrev link: >>>>> http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/ >>>>>> >>>>>> >>>>>> There is a lot of confusion in the VM code with the term "out of >>>>>> memory error". >>>>>> In some places it refers to code throwing a >>>>>> java.lang.OutOfMemoryError and expecting running java code to be >>>>>> able to potentially catch that Error and continue running. >>>>>> >>>>>> In other places, such as callers of report_vm_out_of_memory, the >>>>>> situation is much more dire and the calling thread may not even be a >>>>>> JavaThread and as such cannot "throw" an exception. >>>>>> report_vm_out_of_memory is only invoked through the macro >>>>>> vm_exit_out_of_memory, which of course implies that the condition is >>>>>> fatal and we are about to terminate the JVM process altogether. >>>>>> >>>>>> I think that it's incorrect to call code related to >>>>>> java.lang.OutOfMemoryError in report_vm_out_of_memory since the >>>>>> condition may not even be correlated with Java level application >>> behavior. 
>>>>>> >>>>>> /Mikael >>>>>> >>>>>>> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004 >>>>>>> >>>>>>> Testing: jprt and jtreg (on Linux x86_64) >>>>>>> >>>>>>> Regards, >>>>>>> Shafi >>>>>>> From christian.tornqvist at oracle.com Fri Oct 14 12:11:27 2016 From: christian.tornqvist at oracle.com (Christian Tornqvist) Date: Fri, 14 Oct 2016 08:11:27 -0400 Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows In-Reply-To: <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com> References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com> Message-ID: <186a01d22614$17a3d520$46eb7f60$@oracle.com> Hi Staffan/David, I looked at how the other platforms deal with it, they throw an IllegalArgumentException when they fail to open the file. I think it makes sense for Windows to do the same, please see the updated webrev with IAE instead of a NPE: http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/ Thanks, Christian -----Original Message----- From: Staffan Larsen [mailto:staffan.larsen at oracle.com] Sent: Friday, October 14, 2016 2:19 AM To: Christian Tornqvist Cc: hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows Thanks for getting to the bottom of this! The fix looks good. Perhaps, as David points out, FileNotFoundException is a better choice but that requires more changes as FileNotFoundException is not used anywhere else in the JVM. Thanks, /Staffan > On 13 Oct 2016, at 20:09, Christian Tornqvist wrote: > > Hi everyone, > > > > Please review this small fix for an intermittent issue we've seen when > running tests concurrently that use jcmd/jstack. > > When running jcmd, we enumerate the perfdata files and then open them > one by one to read things like main class names etc. 
If the perfdata > file disappears (because the Java process exited) before we get to it, > we end up with different exceptions depending on where in the code we are. > > > > The code at: > > http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/ > share/ > classes/sun/tools/common/ProcessArgumentMatcher.java#l88 > > > > handles this, the problem is that if we get all the way to > open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception > which isn't caught by this. The fix is to throw a NPE instead of > Exception and let the existing code handle this. > > > > Fix has been tested locally and with 30 JPRT runs (with concurrency > patch applied), also managed to reproduce and verify this fix locally > using a debugger to trigger the race. > > > > Webrev: > > http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/ > > > > Bug (unfortunately not visible): > > https://bugs.openjdk.java.net/browse/JDK-8159799 > > > > Thanks, > > Christian > > > > > From george.triantafillou at oracle.com Fri Oct 14 12:12:51 2016 From: george.triantafillou at oracle.com (George Triantafillou) Date: Fri, 14 Oct 2016 08:12:51 -0400 Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows In-Reply-To: <186a01d22614$17a3d520$46eb7f60$@oracle.com> References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com> <186a01d22614$17a3d520$46eb7f60$@oracle.com> Message-ID: Hi Christian, This looks good! -George On 10/14/2016 8:11 AM, Christian Tornqvist wrote: > Hi Staffan/David, > > I looked at how the other platforms deal with it, they throw an > IllegalArgumentException when they fail to open the file. 
I think it makes > sense for Windows to do the same, please see the updated webrev with IAE > instead of a NPE: > > http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/ > > Thanks, > Christian > > -----Original Message----- > From: Staffan Larsen [mailto:staffan.larsen at oracle.com] > Sent: Friday, October 14, 2016 2:19 AM > To: Christian Tornqvist > Cc: hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(XS): 8159799 - Tests using jcmd fails intermittently with > Could not open PerfMemory on Windows > > Thanks for getting to the bottom of this! > > The fix looks good. Perhaps, as David points out, FileNotFoundException is a > better choice but that requires more changes as FileNotFoundException is not > used anywhere else in the JVM. > > Thanks, > /Staffan > >> On 13 Oct 2016, at 20:09, Christian Tornqvist > wrote: >> Hi everyone, >> >> >> >> Please review this small fix for an intermittent issue we've seen when >> running tests concurrently that use jcmd/jstack. >> >> When running jcmd, we enumerate the perfdata files and then open them >> one by one to read things like main class names etc. If the perfdata >> file disappears (because the Java process exited) before we get to it, >> we end up with different exceptions depending on where in the code we are. >> >> >> >> The code at: >> >> http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/ >> share/ >> classes/sun/tools/common/ProcessArgumentMatcher.java#l88 >> >> >> >> handles this, the problem is that if we get all the way to >> open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception >> which isn't caught by this. The fix is to throw a NPE instead of >> Exception and let the existing code handle this. >> >> >> >> Fix has been tested locally and with 30 JPRT runs (with concurrency >> patch applied), also managed to reproduce and verify this fix locally >> using a debugger to trigger the race. 
>> >> >> >> Webrev: >> >> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/ >> >> >> >> Bug (unfortunately not visible): >> >> https://bugs.openjdk.java.net/browse/JDK-8159799 >> >> >> >> Thanks, >> >> Christian >> >> >> >> >> > From frederic.parain at oracle.com Fri Oct 14 12:26:49 2016 From: frederic.parain at oracle.com (Frederic Parain) Date: Fri, 14 Oct 2016 08:26:49 -0400 Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows In-Reply-To: <186a01d22614$17a3d520$46eb7f60$@oracle.com> References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com> <186a01d22614$17a3d520$46eb7f60$@oracle.com> Message-ID: <456f7cc0-2f4c-90f8-ea26-2800d934bcdb@oracle.com> Christian, Great work to find the root cause of the issue! The fix with the IAE looks good to me. Thanks, Fred On 10/14/2016 08:11 AM, Christian Tornqvist wrote: > Hi Staffan/David, > > I looked at how the other platforms deal with it, they throw an > IllegalArgumentException when they fail to open the file. I think it makes > sense for Windows to do the same, please see the updated webrev with IAE > instead of a NPE: > > http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/ > > Thanks, > Christian > > -----Original Message----- > From: Staffan Larsen [mailto:staffan.larsen at oracle.com] > Sent: Friday, October 14, 2016 2:19 AM > To: Christian Tornqvist > Cc: hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(XS): 8159799 - Tests using jcmd fails intermittently with > Could not open PerfMemory on Windows > > Thanks for getting to the bottom of this! > > The fix looks good. Perhaps, as David points out, FileNotFoundException is a > better choice but that requires more changes as FileNotFoundException is not > used anywhere else in the JVM. 
> > Thanks, > /Staffan > >> On 13 Oct 2016, at 20:09, Christian Tornqvist > wrote: >> >> Hi everyone, >> >> >> >> Please review this small fix for an intermittent issue we've seen when >> running tests concurrently that use jcmd/jstack. >> >> When running jcmd, we enumerate the perfdata files and then open them >> one by one to read things like main class names etc. If the perfdata >> file disappears (because the Java process exited) before we get to it, >> we end up with different exceptions depending on where in the code we are. >> >> >> >> The code at: >> >> http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/ >> share/ >> classes/sun/tools/common/ProcessArgumentMatcher.java#l88 >> >> >> >> handles this, the problem is that if we get all the way to >> open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception >> which isn't caught by this. The fix is to throw a NPE instead of >> Exception and let the existing code handle this. >> >> >> >> Fix has been tested locally and with 30 JPRT runs (with concurrency >> patch applied), also managed to reproduce and verify this fix locally >> using a debugger to trigger the race. >> >> >> >> Webrev: >> >> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/ >> >> >> >> Bug (unfortunately not visible): >> >> https://bugs.openjdk.java.net/browse/JDK-8159799 >> >> >> >> Thanks, >> >> Christian >> >> >> >> >> > > From dmitry.dmitriev at oracle.com Fri Oct 14 12:46:09 2016 From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev) Date: Fri, 14 Oct 2016 15:46:09 +0300 Subject: RFR(XS) 8166155: Create tests for VM module option handling In-Reply-To: <5a0075a3-0a36-d3f5-6ed3-2c04d3f7cda3@oracle.com> References: <3d7981cb-7d27-a086-e46c-ec8f82f23849@oracle.com> <5a0075a3-0a36-d3f5-6ed3-2c04d3f7cda3@oracle.com> Message-ID: <9e03ff3e-684f-6b12-8589-2ffc4de24ec2@oracle.com> Hi George, Thank you for moving the test to the separate file. 
Few comments:

1) Why do you not use createJavaProcessBuilder from the
jdk.test.lib.process.ProcessTools testlibrary class?

2) You can reduce the size of the test by introducing a common test function:

    private static void checkInvalidModuleOption(String VMOptionFile,
                                                 String expectedOutput) throws Exception {
        ProcessBuilder pb = createJavaProcessBuilder(
            "-XX:VMOptionsFile=" + getAbsolutePathFromSource(VMOptionFile));
        OutputAnalyzer output = new OutputAnalyzer(pb.start());
        output.shouldContain(expectedOutput);
        output.shouldHaveExitValue(1);
    }

In this case the main test function will look like this:

    checkInvalidModuleOption(ADD_MODULES_BAD1, "Usage");
    checkInvalidModuleOption(ADD_MODULES_BAD2, "Unrecognized option");
    ...

Thanks,
Dmitry

On 13.10.2016 14:40, George Triantafillou wrote:
> After offline feedback from Dmitry Dmitriev, here's an updated webrev:
>
> http://cr.openjdk.java.net/~gtriantafill/8166155/webrev.01/
>
> The test was moved to a separate test for VM module option handling.
> Thanks.
>
> -George
>
> On 9/15/2016 2:25 PM, George Triantafillou wrote:
>> Please review this change that adds test coverage for the new VM
>> module option handling implemented in
>> https://bugs.openjdk.java.net/browse/JDK-8157038.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166155
>> webrev: http://cr.openjdk.java.net/~gtriantafill/8166155/webrev/
>>
>> Tested locally on Linux.
>>
>> Thanks.
>> >> -George >> > From david.holmes at oracle.com Fri Oct 14 13:10:37 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 23:10:37 +1000 Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could not open PerfMemory on Windows In-Reply-To: <186a01d22614$17a3d520$46eb7f60$@oracle.com> References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com> <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com> <186a01d22614$17a3d520$46eb7f60$@oracle.com> Message-ID: On 14/10/2016 10:11 PM, Christian Tornqvist wrote: > Hi Staffan/David, > > I looked at how the other platforms deal with it, they throw an > IllegalArgumentException when they fail to open the file. I think it makes > sense for Windows to do the same, please see the updated webrev with IAE > instead of a NPE: > > http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/ I still find IAE a little odd in this context, but okay - consistency counts for a lot. :) Thanks, David > Thanks, > Christian > > -----Original Message----- > From: Staffan Larsen [mailto:staffan.larsen at oracle.com] > Sent: Friday, October 14, 2016 2:19 AM > To: Christian Tornqvist > Cc: hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(XS): 8159799 - Tests using jcmd fails intermittently with > Could not open PerfMemory on Windows > > Thanks for getting to the bottom of this! > > The fix looks good. Perhaps, as David points out, FileNotFoundException is a > better choice but that requires more changes as FileNotFoundException is not > used anywhere else in the JVM. > > Thanks, > /Staffan > >> On 13 Oct 2016, at 20:09, Christian Tornqvist > wrote: >> >> Hi everyone, >> >> >> >> Please review this small fix for an intermittent issue we've seen when >> running tests concurrently that use jcmd/jstack. >> >> When running jcmd, we enumerate the perfdata files and then open them >> one by one to read things like main class names etc. 
If the perfdata >> file disappears (because the Java process exited) before we get to it, >> we end up with different exceptions depending on where in the code we are. >> >> >> >> The code at: >> >> http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/ >> share/ >> classes/sun/tools/common/ProcessArgumentMatcher.java#l88 >> >> >> >> handles this, the problem is that if we get all the way to >> open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception >> which isn't caught by this. The fix is to throw a NPE instead of >> Exception and let the existing code handle this. >> >> >> >> Fix has been tested locally and with 30 JPRT runs (with concurrency >> patch applied), also managed to reproduce and verify this fix locally >> using a debugger to trigger the race. >> >> >> >> Webrev: >> >> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/ >> >> >> >> Bug (unfortunately not visible): >> >> https://bugs.openjdk.java.net/browse/JDK-8159799 >> >> >> >> Thanks, >> >> Christian >> >> >> >> >> > > From david.holmes at oracle.com Fri Oct 14 06:15:43 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 14 Oct 2016 16:15:43 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: <9bffd66d-abe0-8d3d-262a-4be55b81b9a4@oracle.com> On 11/10/2016 12:30 AM, Hiroshi H Horii wrote: > Hi Thomas, David, and all, > >> I think you intended to modify cmpxchg_pre_membar not >> cmpxchg_post_membar! > > The previous patch will change only behavior of cmpxchg_pre_membar. But No it changed the post-membar: http://cr.openjdk.java.net/~horii/8154736/webrev.04/src/os_cpu/linux_ppc/vm/atomic_linux_ppc.hpp.cdiff.html inline void cmpxchg_post_membar(cmpxchg_memory_order order) { ! 
    if (order == memory_order_conservative) {
      __asm__ __volatile__ (
        /* fence */
        strasm_sync
        );
    }

but I did get confused in what I wrote previously. Given the release must
come before the store, the pre-barrier must be the one that does that - as
per the latest code.

> the patch is not good to be reviewed (it was not obvious) and Martin
> suggested me to use lwsync rather than sync.
> I created a new webrev. This webrev includes all points that David and
> Thomas pointed out also.
>
> http://cr.openjdk.java.net/~horii/8154736/webrev.05/
>
> With this change, callers of copy_to_survivor_space can safely touch
> fields of returned obj because OrderAccess::acquire() is called in
> copy_to_survivor_space when CAS fails.

So the intent is that the acquire() pairs with the release() semantics
of the cmpxchg store that succeeded in the other thread. That makes
sense, though I really have to question the trade-off in code complexity
and understandability against any performance gain due to a slight
reduction in the barrier strengths. Do you have any metrics on this
latest version?

>> Changes in shared code must be algorithmically correct on all platforms.
>> Not just "it will work fine today".
>>
>> Given all the work being done to add missing barriers, removing them
>> must come with a detailed analysis establishing the safety of doing so.
>> And I am not seeing that here.
>
> The latest code in the repository is missing some calls of
> OrderAccess::acquire() before touching fields of new_obj or
> o->forwardee() in PSPromotionManager::copy_and_push_safe_barrier and
> copy_to_survivor_space respectively. I believe this webrev corrects them,
> also.
>
> Some methods call forwardee(). However, they don't touch fields of
> forwardee while copying survived objects to a survivor space.
> PSMarkSweepDecorator::compact()
> PSPromotionManager::process_array_chunk()
> PSPromotionManager::claim_or_forward_internal_depth()

Focusing on the code for now ...
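A minimal sketch of the release/acquire pairing described above, written
in standard C++ (std::atomic) rather than HotSpot's own Atomic and
OrderAccess API. The names Copy, forward_to and race_once are hypothetical
and exist only for this illustration; the atomic slot stands in for the
forwarding pointer. The winner's CAS uses release ordering so that its
earlier writes to the copy are published; the loser's CAS fails with
relaxed ordering and is followed by an acquire fence before it reads any
field of the winner's copy.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for a copied object; not a HotSpot type.
struct Copy {
    int payload;
};

// Try to install `mine` in `slot` (a stand-in for the forwarding pointer).
// Returns whichever copy won the race.
Copy* forward_to(std::atomic<Copy*>& slot, Copy* mine) {
    Copy* expected = nullptr;
    // Winner: release ordering publishes writes made to *mine before the CAS.
    // On failure the load is only relaxed, so the loser fences explicitly.
    if (slot.compare_exchange_strong(expected, mine,
                                     std::memory_order_release,
                                     std::memory_order_relaxed)) {
        return mine;
    }
    // Loser: pair with the winner's release before touching *expected.
    std::atomic_thread_fence(std::memory_order_acquire);
    return expected;
}

// Race two threads once; returns the payload that both threads observed.
int race_once() {
    std::atomic<Copy*> slot{nullptr};
    Copy a{0}, b{0};
    int seen_a = 0, seen_b = 0;
    std::thread ta([&] { a.payload = 1; seen_a = forward_to(slot, &a)->payload; });
    std::thread tb([&] { b.payload = 2; seen_b = forward_to(slot, &b)->payload; });
    ta.join();
    tb.join();
    // Both threads must agree on the winner and see its initialized payload.
    assert(seen_a == seen_b);
    return seen_a;
}
```

Calling race_once() returns 1 or 2 depending on which thread wins. Without
the acquire fence, the loser's read of payload would race with the winner's
plain write to it. This says nothing about whether the weaker ordering is
sufficient for the actual GC code paths under review.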
src/os_cpu/linux_ppc/vm/atomic_linux_ppc.hpp
src/os_cpu/aix_ppc/vm/atomic_aix_ppc.hpp

In cmpxchg_post_membar I think it is preferable to maintain the existing
default of a full fence if not specifically a "release" or "relaxed"
operation, i.e.:

    inline void cmpxchg_post_membar(cmpxchg_memory_order order) {
      if (order == memory_order_release) {
        // no post membar
      } else if (order != memory_order_relaxed) {
        __asm__ __volatile__ (
          /* fence */
          strasm_sync
          );
      }
    }

as is done for the pre-membar. If nothing else, pre-membar and post-membar
should be consistent in their approach.

---

src/share/vm/gc/parallel/psPromotionManager.cpp

 507 // call acquire for reading fields of obj in callers

May I suggest:

 // acquire() by cas loser is needed to pair with 'release' of cas winner
 // so we can safely access data (eg. fields of obj)

---

src/share/vm/gc/parallel/psPromotionManager.inline.hpp

 258 // call acquire for reading fields of new_obj in callers
 264 // call acquire for reading fields of new_obj in callers

Same as above.

 264 // call acquire for reading fields of new_obj in callers
 265 OrderAccess::acquire();

If I'm reading this right this is the else part of:

 119 // The same test as "o->is_forwarded()"
 120 if (!test_mark->is_marked()) {

and it is less clear what the acquire() is pairing with, but presumably it
is still the release of a successful cas_forward_to. But given the
isolation of this code from the modified cas operation I have to wonder
about performance again - how often will we take this path with the new
barrier, compared to the paths with the now modified cas operations?

---

Overall the way this proposal has been presented does not instill me
with great confidence about its correctness, or performance benefit:

1. Replace strong cas with a relaxed cas
   Issue: accessing obj fields in logging statement may not be safe.

2. Remove access to obj fields in logging statements
   Issue: callee access to obj fields is also unsafe

3. Return NULL on failed cas paths
   Issue: changes overall semantics and correctness of returning NULL is
   very unclear.

4. Go back to previous code.
   Issue: callee access to obj fields is also unsafe
   Suggestion from Kim: fully relaxed seems unsafe but using "release"
   semantics might be okay [this hypothesis warranted detailed discussion
   but there was none]

5. Replace relaxed cas with release cas

6. Add in acquire() on cas-losing paths, and general access to forwardee

Aside: we can presumably add back the logging statement in the losing
path now we have the acquire() in place.

So far the only justification for making these changes to the GC code
comes from the April discussion [1] where it was stated simply that:

"We've looked at the proposed changes and we are pretty sure that the
cmpxchg done during copy_to_survivor_space in the parallel GC doesn't
require the full fence/acquire semantics." [Martin & Volker]

Reading back through all the emails, including the ones in April, I
_think_ part of the reasoning here is that we're not doing a CAS that
publishes a new object that was just created, but that we have
previously created that object using a full CAS and are now only
updating the markword of another object with a forwarding pointer. The
second cas would not need full fence semantics as the other object is
already visible. However I am not a GC expert and other comments by GC
folk suggest that is not in fact the case, or at least is not
necessarily always the case. So I can not establish that what is being
proposed is correct. I think the GC experts need to have a discussion to
resolve things to their mutual satisfaction.

Thanks,
David

[1] http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html

-------

> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>

From hui.shi at linaro.org  Sat Oct 15 10:34:19 2016
From: hui.shi at linaro.org (Hui Shi)
Date: Sat, 15 Oct 2016 18:34:19 +0800
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered
Message-ID:

Hi all,

Could someone help review this fix?

JIRA: https://bugs.openjdk.java.net/browse/JDK-8167421
webrev: http://cr.openjdk.java.net/~hshi/8167421/webrev/

The JVM crashes with an illegal threadstate when running on a single-core
machine (for example, a single-core VM running on an aarch64 box).
The current JNI wrapper generator is missing the store of
_thread_in_native_trans into the thread state when the machine has only
one CPU core.

#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (safepoint.cpp:716), pid=4329, tid=0x0000ffff89204200
# fatal error: Illegal threadstate encountered: 4
#
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

--------------- T H R E A D ---------------

Current thread (0x0000ffff684fe000): JavaThread "localhost-startStop-1" daemon [_thread_in_native, id=4341, stack(0x0000ffff89005000, 0x0000ffff89205000)]

Stack: [0x0000ffff89005000,0x0000ffff89205000], sp=0x0000ffff89201f60, free space=2035k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x95ed3c] VMError::report_and_die()+0x130
V [libjvm.so+0x42c04c] report_fatal(char const*, int, char const*)+0x60
V [libjvm.so+0x85bae4] SafepointSynchronize::block(JavaThread*) [clone .part.24]+0x50
V [libjvm.so+0x90381c] JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*)+0x1c8
V [libjvm.so+0x903f58] JavaThread::check_special_condition_for_native_trans(JavaThread*)+0x14
J 236 java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes) @ 0x0000ffff7c1e64f0 [0x0000ffff7c1e63c0+0x130]
J 1167 C1 java.util.jar.JarFile$JarEntryIterator.hasMoreElements()Z (5 bytes) @ 0x0000ffff7c4f1320 [0x0000ffff7c4f1180+0x1a0]
J 840 C1 java.util.jar.JarFile.getInputStream(Ljava/util/zip/ZipEntry;)Ljava/io/InputStream; (89 bytes) @ 0x0000ffff7c402b54 [0x0000ffff7c402180+0x9d4]
J 1187 C1 org.apache.tomcat.util.scan.FileUrlJar.getEntryInputStream()Ljava/io/InputStream; (21 bytes) @ 0x0000ffff7c52a640 [0x0000ffff7c52a4c0+0x180]

Regards
Hui

From aph at redhat.com  Sun Oct 16 09:19:27 2016
From: aph at redhat.com (Andrew Haley)
Date: Sun, 16 Oct 2016 10:19:27 +0100
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered
In-Reply-To:
References:
Message-ID:

Hi,

On 15/10/16 11:34, Hui Shi wrote:
> Could someone help review this fix?
>
> JIRA: https://bugs.openjdk.java.net/browse/JDK-8167421
> webrev: http://cr.openjdk.java.net/~hshi/8167421/webrev/
>
> JVM crashes with illegal threadstate when running on single core machine
> (for example with single core VM running on aarch64 box).
> Current JNI wrapper generator missing store _thread_in_native_trans into
> thread state when machine has only one CPU core.

Oh, yuck. Thanks.

I'd accept (and prefer) a patch which got rid of the is_MP() check
altogether. These days systems are often virtualized, and running
processes can be migrated from one system to another. In such
circumstances, is_MP() is just a bug.

But that needs wider discussion because it affects more systems than
just AArch64, so your patch is OK for now.

Andrew.

From david.holmes at oracle.com  Sun Oct 16 20:50:12 2016
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 17 Oct 2016 06:50:12 +1000
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered
In-Reply-To:
References:
Message-ID: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>

On 15/10/2016 8:34 PM, Hui Shi wrote:
> Hi all,
>
> Could someone help review this fix?
>
> JIRA: https://bugs.openjdk.java.net/browse/JDK-8167421
> webrev: http://cr.openjdk.java.net/~hshi/8167421/webrev/
>
> JVM crashes with illegal threadstate when running on single core machine
> (for example with single core VM running on aarch64 box).
> Current JNI wrapper generator missing store _thread_in_native_trans into
> thread state when machine has only one CPU core.

Fix seems okay - though I'm not expert on aarch64 assembler. But I have
to wonder why this chunk of code is different to the functionally
equivalent code in templateInterpreterGenerator_aarch64.cpp - including
the difference between using DSB and DMB for the barrier?

   // change thread state
   __ mov(rscratch1, _thread_in_native_trans);
   __ lea(rscratch2, Address(rthread, JavaThread::thread_state_offset()));
   __ stlrw(rscratch1, rscratch2);

   if (os::is_MP()) {
     if (UseMembar) {
       // Force this write out before the read below
       __ dsb(Assembler::SY);
     } else {
       // Write serialization page so VM thread can do a pseudo remote membar.
       // We use the current thread pointer to calculate a thread specific
       // offset to write to within the page. This minimizes bus traffic
       // due to cache line collision.
       __ serialize_memory(rthread, rscratch2);
     }
   }

David

> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> # Internal Error (safepoint.cpp:716), pid=4329, tid=0x0000ffff89204200
> # fatal error: Illegal threadstate encountered: 4
> #
> #
> # If you would like to submit a bug report, please visit:
> # http://bugreport.java.com/bugreport/crash.jsp
> # The crash happened outside the Java Virtual Machine in native code.
> # See problematic frame for where to report the bug.
> #
>
> --------------- T H R E A D ---------------
>
> Current thread (0x0000ffff684fe000): JavaThread "localhost-startStop-1"
> daemon [_thread_in_native, id=4341, stack(0x0000ffff89005000,
> 0x0000ffff89205000)]
>
> Stack: [0x0000ffff89005000,0x0000ffff89205000], sp=0x0000ffff89201f60, free
> space=2035k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> V [libjvm.so+0x95ed3c] VMError::report_and_die()+0x130
> V [libjvm.so+0x42c04c] report_fatal(char const*, int, char const*)+0x60
> V [libjvm.so+0x85bae4] SafepointSynchronize::block(JavaThread*) [clone
> .part.24]+0x50
> V [libjvm.so+0x90381c] JavaThread::check_safepoint_
> and_suspend_for_native_trans(JavaThread*)+0x1c8
> V [libjvm.so+0x903f58] JavaThread::check_special_condition_for_native_trans(
> JavaThread*)+0x14
> J 236 java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes) @ 0x0000ffff7c1e64f0
> [0x0000ffff7c1e63c0+0x130]
> J 1167 C1 java.util.jar.JarFile$JarEntryIterator.hasMoreElements()Z (5
> bytes) @ 0x0000ffff7c4f1320 [0x0000ffff7c4f1180+0x1a0]
> J 840 C1 java.util.jar.JarFile.getInputStream(Ljava/util/zip/
> ZipEntry;)Ljava/io/InputStream; (89 bytes) @ 0x0000ffff7c402b54
> [0x0000ffff7c402180+0x9d4]
> J 1187 C1 org.apache.tomcat.util.scan.FileUrlJar.
> getEntryInputStream()Ljava/io/InputStream; (21 bytes) @ 0x0000ffff7c52a640 > [0x0000ffff7c52a4c0+0x180] > > Regards > Hui > From HORII at jp.ibm.com Mon Oct 17 01:44:53 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Mon, 17 Oct 2016 10:44:53 +0900 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <9bffd66d-abe0-8d3d-262a-4be55b81b9a4@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: Hi David, Thank you for your comments. > Do you have any metrics on this latest version? Pause time of Young GC (3rd-10th in evaluation period) in SPECjbb2013 was shorten 5.4% and Critical jOPS (which highly depends on GC pause time) was improved 9.2%. CPU was POWER8 (8247-22L) and two cores were enabled. 24GB for mx and 20GB for mn. > So far the only justification for making these changes to the GC code > come from the April discussion [1] where it was stated simply that: > "We've looked at the proposed changes and we are pretty sure that the > cmpxchg done during copy_to_survivor_space in the parallel GC doesn't > require the full fence/acquire semantics." [Martin & Volker] > > Reading back through all the emails, including the ones in April, I > _think_ part of the reasoning here is that we're not doing a CAS that > publishes a new object that was just created, but that we have > previously created that object using a full CAS and are now only > updating the markword of another object with a forwarding pointer. The > second cas would not need full fence semantics as the other object is > already visible. However I am not a GC expert and other comments by GC > folk suggest that is not in fact the case, or at least is not > necessarily always the case. So I can not establish that what is being > proposed is correct. 
>
> I think the GC experts need to have a discussion to resolve things to
> their mutual satisfaction.

Thank you for your many comments and suggestions. My many mistakes made
the discussion long, and I am very sorry for that. I would like to hear
the comments of the GC experts.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From aph at redhat.com  Mon Oct 17 08:41:34 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 17 Oct 2016 09:41:34 +0100
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered
In-Reply-To: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
References: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
Message-ID: <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com>

On 16/10/16 21:50, David Holmes wrote:
> including the difference between using DSB and DMB for the barrier?

DSB was a mistake. I wrote this code before I understood the difference
between DSB and DMB; only DMB is needed here. The documentation we had
was rather thin on detail.

Also, the line above which changes thread_state uses STLRW, a fully
sequentially-consistent store, so I don't think that any of the code
within os::is_MP() is needed at all. I have noticed these anomalies
before, but didn't do anything because it's delicate code and very
difficult to test. This might be a good time to correct both versions.

Andrew.

From martin.doerr at sap.com  Mon Oct 17 16:38:09 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 17 Oct 2016 16:38:09 +0000
Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867
Message-ID: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>

Hi,

I'd like to clean up the template interpreter on PPC64 a little bit
after changes which were pushed into jdk9:

8154580 introduced copying the java mirror into the interpreter frame.
Some code can be written more compactly. Before this change, the size of
the ijava state was designed to be a multiple of 16.
We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed). 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev. Webrev is here: http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/ Please review. Thanks and best regards, Martin From david.holmes at oracle.com Tue Oct 18 03:59:17 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Oct 2016 13:59:17 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> Message-ID: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> Hi Lois, Dan, Serguei, Went to push this today and realized I had left off the updated JNI method lookup tests. As I said in the bug report JNI behaves as expected, but there weren't any testcases so I added them: http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/ Thanks, David On 11/10/2016 11:55 AM, David Holmes wrote: > Turns out the only place changes were needed were in JDI. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 > > webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ > > The spec change in ObjectReference is very simple and there is a CCC > request in progress to ratify that change. > > The implementation change in ObjectReferenceImpl mirrors the updated > spec and use the same format as already present in the class version of > the check method. > > The test is a little more complex. This is obviously an extension to > what is already tested in InterfaceMethodsTest. 
However IMT has a number > of problem with the way it is currently written [1] - specifically it > doesn't properly separate method lookup from method invocation. So I've > added the capability to separate lookup and invocation for use with the > private interface methods - I have not tried to address shortcomings of > the existing tests. Though I did fix the return value checking logic! > And did some clarifying comments and renaming in a couple of place. > > Still on the test I can't add the negative tests I would like to add > because they actually pass due to a different long standing bug in JDI - > [2]. So the actual private interface method testing is very simple: can > I get the Method from the InterfaceType for the interface declaring the > method? Can I then invoke that method on an instance of a class that > implements the interface. > > Thanks, > David > > [1] https://bugs.openjdk.java.net/browse/JDK-8166453 > [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From thomas.stuefe at gmail.com Tue Oct 18 05:39:20 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 18 Oct 2016 07:39:20 +0200 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com> References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com> Message-ID: Hi David, Max, I changed the asserts according to Max' suggestion. Instead of checking inside flag_to_index, now I check before callers of this function use this value to access memory. http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.01/webrev/index.html As David correctly writes, this is technically not a bug, so I guess this will have to wait until java 10. 
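
For readers following the NMT change, the idea of checking at the point where the index is used to access memory can be sketched roughly as follows. This is a self-contained illustration, not the code in the webrev: MemFlag, Snapshot and is_valid_index are hypothetical names, and the enum is a three-entry miniature of the real MEMFLAGS.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical miniature of an NMT-style memory-type enum. The fixed
// underlying type lets the enum hold any int, including garbage values.
enum MemFlag : int { mtA = 0, mtB = 1, mtC = 2, mt_count = 3 };

// Narrow the flag to an index; no validation here.
inline int flag_to_index(MemFlag flag) {
  return static_cast<int>(flag) & 0xff;
}

// True when the index is safe to use against a table of mt_count slots.
inline bool is_valid_index(int index) {
  return index >= 0 && index < mt_count;
}

struct Snapshot {
  size_t bytes[mt_count] = {0, 0, 0};

  // Validate where the index is about to be used for a memory access, so
  // an uninitialized flag trips the assert instead of overwriting memory.
  size_t* by_type(MemFlag flag) {
    int index = flag_to_index(flag);
    assert(is_valid_index(index) && "invalid memory flag");
    return &bytes[index];
  }
};
```

One caveat with this shape: because the value is masked before it is checked, a garbage flag whose low byte happens to fall in range would still pass, so the check catches most but not all uninitialized values.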
Kind Regards, Thomas On Fri, Oct 14, 2016 at 12:57 AM, David Holmes wrote: > On 13/10/2016 10:53 PM, Thomas St?fe wrote: > >> Hi David, >> >> On Thu, Oct 13, 2016 at 12:08 PM, David Holmes > > wrote: >> >> Hi Thomas, >> >> On 13/10/2016 3:49 PM, Thomas St?fe wrote: >> >> Hi all, >> >> may I have plase a review for this tiny change? It just adds >> some assert to NMT. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650 >> >> webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-shoul >> d-check_ >> > ld-check_> >> MEMFLAGS/webrev.00/webrev/ >> >> We had an ugly memory overwrite caused by this - ultimately our >> fault, because we fed an invalid memory flag to NMT - but it was >> difficult to find. An assert would have saved some time. >> >> >> I'm a little perplexed with asserting that something of MEMFLAGS >> type must be an actual MEMFLAGS value - it implies the caller is >> coercing plain int to MEMFLAGS, and I don't have much sympathy if >> they mess that up. Can't help wondering if there is some clever C++ >> trick to flag bad conversions at compile-time? >> >> >> The error was caused by an uninitialized variable of type MEMFLAGS. This >> was our fault, we have heavily modified allocation.hpp and introduced an >> error then merging changes from upstream. Due to a merging error this >> lead to a case where Arena::_flags was not initialized and contained a >> very large value. >> > > Ah I see. Lack of default initialization can be annoying :) > > I admit it looks funny. If it bothers you, I could instead check the >> returned index to be in the range for the size of the _malloc array in >> MallocMemorySnapshot::by_type(). Technically, it would mean the same. 
>> > > So I just realized that here: > > 62 // Map memory type to human readable name > 63 static const char* flag_to_name(MEMFLAGS flag) { > 64 assert(flag >= 0 && flag < mt_number_of_types, "Invalid flag > value %d.", (int)flag); > 65 return _memory_type_names[flag_to_index(flag)]; > 66 } > > we call flag_to_index, so the assert is redundant as it is already in > flag_to_index. Then presumably we change flag_to_index to something like > this: > > static inline int flag_to_index(MEMFLAGS flag) { > int index = (flag & 0xff); > assert(index >= 0 && index < mt_number_of_types, "Invalid flag > value %d.", (int)flag); > return index; > } > > so we're validating the index rather than the flag. > > Cheers, > David > > > >> >> The function that takes the index should validate the index, so that >> is fine. >> >> Which one were you actually passing the bad value to? :) >> >> This isn't a strong objection just musing if we can do better. And >> as the hs repos are still closed, and likely to remain so till early >> next week, we have some slack time :) >> >> >> :) Sure. >> >> Kind Regards, Thomas >> >> >> Cheers, >> David >> >> Thank you! >> >> Thomas >> >> >> From thomas.stuefe at gmail.com Tue Oct 18 06:22:08 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 18 Oct 2016 08:22:08 +0200 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: References: Message-ID: Ping. On Thu, Oct 13, 2016 at 6:55 AM, Thomas St?fe wrote: > Dear all, > > please take a look at the following fix: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 > webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944- > Hanging-Error-Reporting/webrev.00/webrev/index.html > > --- > > In short, this fix provides the ability to cancel hanging error reporting > steps. This uses the same code paths secondary error handling uses during > error reporting. 
With this patch, steps which take too long will be > canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this: > > 4 [timeout occurred during error reporting in step ""] after > xxxx ms. > 5 > > and we now also get a finish message in the hs-err file if we hit the > ErrorLogTimeout and error reporting will stop altogether: > > 6 ------ Timout during error reporting after xxx ms. ------ > > (in addition to the "time expired, abort" message the WatcherThread writes > to stderr) > > --- > > This is something which bugged us for a long time, because we rely heavily > on the hs_err files for error analysis at customer sites, and there are a > number of reasons why one step may hang and prevent the follow-up steps > from running. > > It works like this: > > Before, when error reporting started, the WatcherThread was waiting for > ErrorLogTimeout seconds, then would stop the VM. > > Now, the WatcherThread periodically pings error reporting, which checks if > the last step did timeout. If it does, it sends a signal to the reporting > thread, and the thread will continue with the next step. This follows the > same path as secondary crash handling. > > Some implementation details: > > On Posix platforms, to interrupt the thread, I use pthread_kill. This > means I must know the pthread id of the reporting thread, which I now store > at the beginning of error reporting. We already store the reporting thread > id in first_error_tid, but that I cannot use, because it gets set by > os::current_thread_id(), which is not always the pthread id. Should we ever > switch to only using pthread id for posix platforms, this coding can be > simplified. > > On Windows, there is unfortunately no easy way to interrupt a > non-cooperative thread. I would need a way to cause a SEH inside the target > thread, which then would get handled by secondary error handling like on > Posix platforms, but that is not easy. 
It is doable - one can suspend the > thread, modify the thread context in a way that it will crash upon resume. > But that felt a bit heavyweight for this problem. So on windows, timeout > handling still works (after ErrorLogTimeout the VM gets shut down), but > error reporting steps are not interruptable. If we feel this is important, > this can be added later. > > Kind Regards, Thomas > > > > > > > > > > > From serguei.spitsyn at oracle.com Tue Oct 18 07:10:37 2016 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Tue, 18 Oct 2016 00:10:37 -0700 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> Message-ID: <7c5cb57d-4554-ee14-2aec-a80eec99d9a9@oracle.com> David, It looks good. Thanks, Serguei On 10/17/16 20:59, David Holmes wrote: > Hi Lois, Dan, Serguei, > > Went to push this today and realized I had left off the updated JNI > method lookup tests. As I said in the bug report JNI behaves as > expected, but there weren't any testcases so I added them: > > http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/ > > Thanks, > David > > On 11/10/2016 11:55 AM, David Holmes wrote: >> Turns out the only place changes were needed were in JDI. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >> >> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >> >> The spec change in ObjectReference is very simple and there is a CCC >> request in progress to ratify that change. >> >> The implementation change in ObjectReferenceImpl mirrors the updated >> spec and use the same format as already present in the class version of >> the check method. >> >> The test is a little more complex. This is obviously an extension to >> what is already tested in InterfaceMethodsTest. 
However IMT has a number >> of problem with the way it is currently written [1] - specifically it >> doesn't properly separate method lookup from method invocation. So I've >> added the capability to separate lookup and invocation for use with the >> private interface methods - I have not tried to address shortcomings of >> the existing tests. Though I did fix the return value checking logic! >> And did some clarifying comments and renaming in a couple of place. >> >> Still on the test I can't add the negative tests I would like to add >> because they actually pass due to a different long standing bug in JDI - >> [2]. So the actual private interface method testing is very simple: can >> I get the Method from the InterfaceType for the interface declaring the >> method? Can I then invoke that method on an instance of a class that >> implements the interface. >> >> Thanks, >> David >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From david.holmes at oracle.com Tue Oct 18 07:16:19 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Oct 2016 17:16:19 +1000 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: References: Message-ID: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com> Hi Thomas, I took an initial look but am still mulling over things. Note that as an enhancement this will need to wait for Java 10 repos to open - unless you go through the FC extension process. Thanks, David On 18/10/2016 4:22 PM, Thomas St?fe wrote: > Ping. > > On Thu, Oct 13, 2016 at 6:55 AM, Thomas St?fe > wrote: > >> Dear all, >> >> please take a look at the following fix: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 >> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944- >> Hanging-Error-Reporting/webrev.00/webrev/index.html >> >> --- >> >> In short, this fix provides the ability to cancel hanging error reporting >> steps. 
This uses the same code paths secondary error handling uses during >> error reporting. With this patch, steps which take too long will be >> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this: >> >> 4 [timeout occurred during error reporting in step ""] after >> xxxx ms. >> 5 >> >> and we now also get a finish message in the hs-err file if we hit the >> ErrorLogTimeout and error reporting will stop altogether: >> >> 6 ------ Timout during error reporting after xxx ms. ------ >> >> (in addition to the "time expired, abort" message the WatcherThread writes >> to stderr) >> >> --- >> >> This is something which bugged us for a long time, because we rely heavily >> on the hs_err files for error analysis at customer sites, and there are a >> number of reasons why one step may hang and prevent the follow-up steps >> from running. >> >> It works like this: >> >> Before, when error reporting started, the WatcherThread was waiting for >> ErrorLogTimeout seconds, then would stop the VM. >> >> Now, the WatcherThread periodically pings error reporting, which checks if >> the last step did timeout. If it does, it sends a signal to the reporting >> thread, and the thread will continue with the next step. This follows the >> same path as secondary crash handling. >> >> Some implementation details: >> >> On Posix platforms, to interrupt the thread, I use pthread_kill. This >> means I must know the pthread id of the reporting thread, which I now store >> at the beginning of error reporting. We already store the reporting thread >> id in first_error_tid, but that I cannot use, because it gets set by >> os::current_thread_id(), which is not always the pthread id. Should we ever >> switch to only using pthread id for posix platforms, this coding can be >> simplified. >> >> On Windows, there is unfortunately no easy way to interrupt a >> non-cooperative thread. 
I would need a way to cause a SEH inside the target >> thread, which then would get handled by secondary error handling like on >> Posix platforms, but that is not easy. It is doable - one can suspend the >> thread, modify the thread context in a way that it will crash upon resume. >> But that felt a bit heavyweight for this problem. So on windows, timeout >> handling still works (after ErrorLogTimeout the VM gets shut down), but >> error reporting steps are not interruptable. If we feel this is important, >> this can be added later. >> >> Kind Regards, Thomas >> >> >> >> >> >> >> >> >> >> >> From david.holmes at oracle.com Tue Oct 18 07:16:45 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 18 Oct 2016 17:16:45 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <7c5cb57d-4554-ee14-2aec-a80eec99d9a9@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> <7c5cb57d-4554-ee14-2aec-a80eec99d9a9@oracle.com> Message-ID: Thanks Serguei! David On 18/10/2016 5:10 PM, serguei.spitsyn at oracle.com wrote: > David, > > It looks good. > > Thanks, > Serguei > > > On 10/17/16 20:59, David Holmes wrote: >> Hi Lois, Dan, Serguei, >> >> Went to push this today and realized I had left off the updated JNI >> method lookup tests. As I said in the bug report JNI behaves as >> expected, but there weren't any testcases so I added them: >> >> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/ >> >> Thanks, >> David >> >> On 11/10/2016 11:55 AM, David Holmes wrote: >>> Turns out the only place changes were needed were in JDI. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >>> >>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >>> >>> The spec change in ObjectReference is very simple and there is a CCC >>> request in progress to ratify that change. 
>>> >>> The implementation change in ObjectReferenceImpl mirrors the updated >>> spec and use the same format as already present in the class version of >>> the check method. >>> >>> The test is a little more complex. This is obviously an extension to >>> what is already tested in InterfaceMethodsTest. However IMT has a number >>> of problem with the way it is currently written [1] - specifically it >>> doesn't properly separate method lookup from method invocation. So I've >>> added the capability to separate lookup and invocation for use with the >>> private interface methods - I have not tried to address shortcomings of >>> the existing tests. Though I did fix the return value checking logic! >>> And did some clarifying comments and renaming in a couple of place. >>> >>> Still on the test I can't add the negative tests I would like to add >>> because they actually pass due to a different long standing bug in JDI - >>> [2]. So the actual private interface method testing is very simple: can >>> I get the Method from the InterfaceType for the interface declaring the >>> method? Can I then invoke that method on an instance of a class that >>> implements the interface. >>> >>> Thanks, >>> David >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 > > From thomas.stuefe at gmail.com Tue Oct 18 07:49:49 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Tue, 18 Oct 2016 09:49:49 +0200 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com> References: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com> Message-ID: Hi David, thanks! On Tue, Oct 18, 2016 at 9:16 AM, David Holmes wrote: > Hi Thomas, > > I took an initial look but am still mulling over things. > > Note that as an enhancement this will need to wait for Java 10 repos to > open - unless you go through the FC extension process. 
> > I was afraid that would be the case. Oh well. Kind Regards, Thomas > Thanks, > David > > > On 18/10/2016 4:22 PM, Thomas St?fe wrote: > >> Ping. >> >> On Thu, Oct 13, 2016 at 6:55 AM, Thomas St?fe >> wrote: >> >> Dear all, >>> >>> please take a look at the following fix: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 >>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944- >>> Hanging-Error-Reporting/webrev.00/webrev/index.html >>> >>> --- >>> >>> In short, this fix provides the ability to cancel hanging error reporting >>> steps. This uses the same code paths secondary error handling uses during >>> error reporting. With this patch, steps which take too long will be >>> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like >>> this: >>> >>> 4 [timeout occurred during error reporting in step ""] after >>> xxxx ms. >>> 5 >>> >>> and we now also get a finish message in the hs-err file if we hit the >>> ErrorLogTimeout and error reporting will stop altogether: >>> >>> 6 ------ Timout during error reporting after xxx ms. ------ >>> >>> (in addition to the "time expired, abort" message the WatcherThread >>> writes >>> to stderr) >>> >>> --- >>> >>> This is something which bugged us for a long time, because we rely >>> heavily >>> on the hs_err files for error analysis at customer sites, and there are a >>> number of reasons why one step may hang and prevent the follow-up steps >>> from running. >>> >>> It works like this: >>> >>> Before, when error reporting started, the WatcherThread was waiting for >>> ErrorLogTimeout seconds, then would stop the VM. >>> >>> Now, the WatcherThread periodically pings error reporting, which checks >>> if >>> the last step did timeout. If it does, it sends a signal to the reporting >>> thread, and the thread will continue with the next step. This follows the >>> same path as secondary crash handling. 
>>> >>> Some implementation details: >>> >>> On Posix platforms, to interrupt the thread, I use pthread_kill. This >>> means I must know the pthread id of the reporting thread, which I now >>> store >>> at the beginning of error reporting. We already store the reporting >>> thread >>> id in first_error_tid, but that I cannot use, because it gets set by >>> os::current_thread_id(), which is not always the pthread id. Should we >>> ever >>> switch to only using pthread id for posix platforms, this coding can be >>> simplified. >>> >>> On Windows, there is unfortunately no easy way to interrupt a >>> non-cooperative thread. I would need a way to cause a SEH inside the >>> target >>> thread, which then would get handled by secondary error handling like on >>> Posix platforms, but that is not easy. It is doable - one can suspend the >>> thread, modify the thread context in a way that it will crash upon >>> resume. >>> But that felt a bit heavyweight for this problem. So on windows, timeout >>> handling still works (after ErrorLogTimeout the VM gets shut down), but >>> error reporting steps are not interruptable. If we feel this is >>> important, >>> this can be added later. >>> >>> Kind Regards, Thomas >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> From goetz.lindenmaier at sap.com Tue Oct 18 11:49:01 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Tue, 18 Oct 2016 11:49:01 +0000 Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 In-Reply-To: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap> References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap> Message-ID: <3a833908865b4e2c975ec51c672b68a6@DEWDFE13DE50.global.corp.sap> Hi Martin, thanks for doing this change, it looks good. Maybe you want to add comment to load_mirror_from_const_method(): // As load_mirror() on other platforms just that const_method is passed // in instead of method (saving one indirection). Best regards, Goetz. 
> -----Original Message----- > From: hotspot-runtime-dev [mailto:hotspot-runtime-dev- > bounces at openjdk.java.net] On Behalf Of Doerr, Martin > Sent: Montag, 17. Oktober 2016 18:38 > To: 'hotspot-runtime-dev at openjdk.java.net' dev at openjdk.java.net> > Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after > 8154580 and 8154867 > > Hi, > > I'd like to clean up the template interpreter on PPC64 a little bit after changes > which were pushed into jdk9: > > 8154580 introduced copying the java mirror into the interpreter frame. Some > code can be implemented shorter. Before this change, the size of the ijava > state was designed to be a multiple of 16. We should remove the comment > as this is no longer true. I have checked that this is not really required > (generate_fixed_frame inserts frame padding if needed). > > 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states > are not treated appropriately (which is not critical because the template > interpreter also uses itos for shorter types). This part of the change was > requested by Coleen, but it didn't make it into the original webrev. > > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webre > v.00/ > > Please review. > > Thanks and best regards, > Martin From lois.foltan at oracle.com Tue Oct 18 12:06:37 2016 From: lois.foltan at oracle.com (Lois Foltan) Date: Tue, 18 Oct 2016 08:06:37 -0400 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> Message-ID: <5806104D.2070605@oracle.com> Looks good David! Lois On 10/17/2016 11:59 PM, David Holmes wrote: > Hi Lois, Dan, Serguei, > > Went to push this today and realized I had left off the updated JNI > method lookup tests. 
As I said in the bug report JNI behaves as > expected, but there weren't any testcases so I added them: > > http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/ > > Thanks, > David > > On 11/10/2016 11:55 AM, David Holmes wrote: >> Turns out the only place changes were needed were in JDI. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >> >> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >> >> The spec change in ObjectReference is very simple and there is a CCC >> request in progress to ratify that change. >> >> The implementation change in ObjectReferenceImpl mirrors the updated >> spec and use the same format as already present in the class version of >> the check method. >> >> The test is a little more complex. This is obviously an extension to >> what is already tested in InterfaceMethodsTest. However IMT has a number >> of problem with the way it is currently written [1] - specifically it >> doesn't properly separate method lookup from method invocation. So I've >> added the capability to separate lookup and invocation for use with the >> private interface methods - I have not tried to address shortcomings of >> the existing tests. Though I did fix the return value checking logic! >> And did some clarifying comments and renaming in a couple of place. >> >> Still on the test I can't add the negative tests I would like to add >> because they actually pass due to a different long standing bug in JDI - >> [2]. So the actual private interface method testing is very simple: can >> I get the Method from the InterfaceType for the interface declaring the >> method? Can I then invoke that method on an instance of a class that >> implements the interface. 
>> >> Thanks, >> David >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From felix.yang at linaro.org Tue Oct 18 12:51:55 2016 From: felix.yang at linaro.org (Felix Yang) Date: Tue, 18 Oct 2016 20:51:55 +0800 Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered In-Reply-To: <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com> References: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com> <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com> Message-ID: Hi, Thanks for fixing the bug. Is it OK to push this patch into repo: http://hg.openjdk.java.net/jdk9/hs/hotspot for now? Thanks, Felix On 17 October 2016 at 16:41, Andrew Haley wrote: > On 16/10/16 21:50, David Holmes wrote: > > > including the difference between using DSB and DMB for the barrier? > > DSB was a mistake. I wrote this code before I understood the > difference between DSB and DMB; only DMB is needed here. The > documentation we had was rather thin on detail. Also, the line above > which changes thread_state uses STLRW, a fully sequentially-consistent > store, so I don't think that any of the code within os::is_MP() is > needed at all. > > I have noticed these anomalies before, but didn't do anything because > it's delicate code and very difficult to test. This might be a good > time to correct both versions. > > Andrew. > From aph at redhat.com Tue Oct 18 13:03:28 2016 From: aph at redhat.com (Andrew Haley) Date: Tue, 18 Oct 2016 14:03:28 +0100 Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered In-Reply-To: References: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com> <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com> Message-ID: <4561489f-c037-7252-bb36-b3446db5b62e@redhat.com> On 18/10/16 13:51, Felix Yang wrote: > Is it OK to push this patch into repo: http://hg.openjdk.java.net/ > jdk9/hs/hotspot for now?
Yes, but whoever does this should also apply it to http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/ and http://icedtea.classpath.org/hg/icedtea7-forest/hotspot/. Andrew. From daniel.daugherty at oracle.com Tue Oct 18 14:27:09 2016 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Tue, 18 Oct 2016 08:27:09 -0600 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> Message-ID: On 10/17/16 9:59 PM, David Holmes wrote: > Hi Lois, Dan, Serguei, > > Went to push this today and realized I had left off the updated JNI > method lookup tests. As I said in the bug report JNI behaves as > expected, but there weren't any testcases so I added them: > > http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/ test/runtime/jni/PrivateInterfaceMethods/PrivateInterfaceMethods.java L74: lookup(A.class.getName(), "onlyA", null); //should succeed : : L90: lookup(Impl2.class.getName(), "onlyC", NoSuchMethodError.class); //should fail nit: please add a space after '//' L138: String desc = " Lookup of " + definingClass + "." + method; nit: any particular reason for the space before Lookup? test/runtime/jni/PrivateInterfaceMethods/libPrivateInterfaceMethods.c L78: blank line at the end of the file. jcheck will probably complain. Thumbs up! Feel free to ignore the nits. No need to see a new webrev if you fix them. Dan > > Thanks, > David > > On 11/10/2016 11:55 AM, David Holmes wrote: >> Turns out the only place changes were needed were in JDI. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >> >> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >> >> The spec change in ObjectReference is very simple and there is a CCC >> request in progress to ratify that change. 
>> >> The implementation change in ObjectReferenceImpl mirrors the updated >> spec and use the same format as already present in the class version of >> the check method. >> >> The test is a little more complex. This is obviously an extension to >> what is already tested in InterfaceMethodsTest. However IMT has a number >> of problem with the way it is currently written [1] - specifically it >> doesn't properly separate method lookup from method invocation. So I've >> added the capability to separate lookup and invocation for use with the >> private interface methods - I have not tried to address shortcomings of >> the existing tests. Though I did fix the return value checking logic! >> And did some clarifying comments and renaming in a couple of place. >> >> Still on the test I can't add the negative tests I would like to add >> because they actually pass due to a different long standing bug in JDI - >> [2]. So the actual private interface method testing is very simple: can >> I get the Method from the InterfaceType for the interface declaring the >> method? Can I then invoke that method on an instance of a class that >> implements the interface. >> >> Thanks, >> David >> >> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 From coleen.phillimore at oracle.com Tue Oct 18 21:56:12 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Tue, 18 Oct 2016 17:56:12 -0400 Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 In-Reply-To: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap> References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap> Message-ID: <7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com> This seems good. 
I think it's a shame to change load_mirror() to load_mirror_from_const_method() though because there's load_mirror() with the same parameters on all the other platforms and it makes platform development a little easier. But that's up to you to because you can generate shorter sequences. Coleen On 10/17/16 12:38 PM, Doerr, Martin wrote: > Hi, > > I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9: > > 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed). > > 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev. > > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/ > > Please review. > > Thanks and best regards, > Martin > From david.holmes at oracle.com Tue Oct 18 23:55:32 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Oct 2016 09:55:32 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: <5806104D.2070605@oracle.com> References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> <5806104D.2070605@oracle.com> Message-ID: <0958e1c9-250b-17e2-69f4-3b7ad9303ea9@oracle.com> Thanks Lois! David On 18/10/2016 10:06 PM, Lois Foltan wrote: > Looks good David! 
> Lois > > On 10/17/2016 11:59 PM, David Holmes wrote: >> Hi Lois, Dan, Serguei, >> >> Went to push this today and realized I had left off the updated JNI >> method lookup tests. As I said in the bug report JNI behaves as >> expected, but there weren't any testcases so I added them: >> >> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/ >> >> Thanks, >> David >> >> On 11/10/2016 11:55 AM, David Holmes wrote: >>> Turns out the only place changes were needed were in JDI. >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >>> >>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >>> >>> The spec change in ObjectReference is very simple and there is a CCC >>> request in progress to ratify that change. >>> >>> The implementation change in ObjectReferenceImpl mirrors the updated >>> spec and use the same format as already present in the class version of >>> the check method. >>> >>> The test is a little more complex. This is obviously an extension to >>> what is already tested in InterfaceMethodsTest. However IMT has a number >>> of problem with the way it is currently written [1] - specifically it >>> doesn't properly separate method lookup from method invocation. So I've >>> added the capability to separate lookup and invocation for use with the >>> private interface methods - I have not tried to address shortcomings of >>> the existing tests. Though I did fix the return value checking logic! >>> And did some clarifying comments and renaming in a couple of place. >>> >>> Still on the test I can't add the negative tests I would like to add >>> because they actually pass due to a different long standing bug in JDI - >>> [2]. So the actual private interface method testing is very simple: can >>> I get the Method from the InterfaceType for the interface declaring the >>> method? Can I then invoke that method on an instance of a class that >>> implements the interface. 
>>> >>> Thanks, >>> David >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 > From david.holmes at oracle.com Tue Oct 18 23:56:52 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Oct 2016 09:56:52 +1000 Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and JDB In-Reply-To: References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com> <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com> Message-ID: <4a476eaa-07be-d029-6fef-c7ebbb357708@oracle.com> On 19/10/2016 12:27 AM, Daniel D. Daugherty wrote: > On 10/17/16 9:59 PM, David Holmes wrote: >> Hi Lois, Dan, Serguei, >> >> Went to push this today and realized I had left off the updated JNI >> method lookup tests. As I said in the bug report JNI behaves as >> expected, but there weren't any testcases so I added them: >> >> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/ > > test/runtime/jni/PrivateInterfaceMethods/PrivateInterfaceMethods.java > L74: lookup(A.class.getName(), "onlyA", null); //should succeed > : > : > L90: lookup(Impl2.class.getName(), "onlyC", > NoSuchMethodError.class); //should fail > nit: please add a space after '//' > > L138: String desc = " Lookup of " + definingClass + "." + > method; > nit: any particular reason for the space before Lookup? Just checking your powers of observation :) > > > test/runtime/jni/PrivateInterfaceMethods/libPrivateInterfaceMethods.c > L78: blank line at the end of the file. jcheck will probably complain. Yeah I deal with that at commit time. > > Thumbs up! Feel free to ignore the nits. No need to see a new > webrev if you fix them. Thanks. Will fix the nits. David > Dan > > >> >> Thanks, >> David >> >> On 11/10/2016 11:55 AM, David Holmes wrote: >>> Turns out the only place changes were needed were in JDI. 
>>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827 >>> >>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/ >>> >>> The spec change in ObjectReference is very simple and there is a CCC >>> request in progress to ratify that change. >>> >>> The implementation change in ObjectReferenceImpl mirrors the updated >>> spec and use the same format as already present in the class version of >>> the check method. >>> >>> The test is a little more complex. This is obviously an extension to >>> what is already tested in InterfaceMethodsTest. However IMT has a number >>> of problem with the way it is currently written [1] - specifically it >>> doesn't properly separate method lookup from method invocation. So I've >>> added the capability to separate lookup and invocation for use with the >>> private interface methods - I have not tried to address shortcomings of >>> the existing tests. Though I did fix the return value checking logic! >>> And did some clarifying comments and renaming in a couple of place. >>> >>> Still on the test I can't add the negative tests I would like to add >>> because they actually pass due to a different long standing bug in JDI - >>> [2]. So the actual private interface method testing is very simple: can >>> I get the Method from the InterfaceType for the interface declaring the >>> method? Can I then invoke that method on an instance of a class that >>> implements the interface. >>> >>> Thanks, >>> David >>> >>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453 >>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416 > From david.holmes at oracle.com Wed Oct 19 01:21:25 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Oct 2016 11:21:25 +1000 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. 
In-Reply-To: References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com> Message-ID: On 18/10/2016 3:39 PM, Thomas St?fe wrote: > Hi David, Max, > > I changed the asserts according to Max' suggestion. Instead of checking > inside flag_to_index, now I check before callers of this function use > this value to access memory. I don't see where Max suggested that?? It doesn't make sense to me to have all the callers of flag_to_index check what it returned instead of doing it inside flag_to_index. > http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.01/webrev/index.html > > As David correctly writes, this is technically not a bug, so I guess > this will have to wait until java 10. Yes, afraid so. Thanks, David > Kind Regards, Thomas > > > > On Fri, Oct 14, 2016 at 12:57 AM, David Holmes > wrote: > > On 13/10/2016 10:53 PM, Thomas St?fe wrote: > > Hi David, > > On Thu, Oct 13, 2016 at 12:08 PM, David Holmes > > >> wrote: > > Hi Thomas, > > On 13/10/2016 3:49 PM, Thomas St?fe wrote: > > Hi all, > > may I have plase a review for this tiny change? It just adds > some assert to NMT. > > Bug: https://bugs.openjdk.java.net/browse/JDK-8167650 > > > > webrev: > > http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_ > > > > > MEMFLAGS/webrev.00/webrev/ > > We had an ugly memory overwrite caused by this - > ultimately our > fault, because we fed an invalid memory flag to NMT - > but it was > difficult to find. An assert would have saved some time. > > > I'm a little perplexed with asserting that something of MEMFLAGS > type must be an actual MEMFLAGS value - it implies the caller is > coercing plain int to MEMFLAGS, and I don't have much > sympathy if > they mess that up. Can't help wondering if there is some > clever C++ > trick to flag bad conversions at compile-time? > > > The error was caused by an uninitialized variable of type > MEMFLAGS. 
This > was our fault, we have heavily modified allocation.hpp and > introduced an > error when merging changes from upstream. Due to a merging error > this > led to a case where Arena::_flags was not initialized and > contained a > very large value. > > > Ah I see. Lack of default initialization can be annoying :) > > I admit it looks funny. If it bothers you, I could instead check the > returned index to be in the range for the size of the _malloc > array in > MallocMemorySnapshot::by_type(). Technically, it would mean the > same. > > > So I just realized that here: > > // Map memory type to human readable name > static const char* flag_to_name(MEMFLAGS flag) { > assert(flag >= 0 && flag < mt_number_of_types, "Invalid > flag value %d.", (int)flag); > return _memory_type_names[flag_to_index(flag)]; > } > > we call flag_to_index, so the assert is redundant as it is already > in flag_to_index. Then presumably we change flag_to_index to > something like this: > > static inline int flag_to_index(MEMFLAGS flag) { > int index = (flag & 0xff); > assert(index >= 0 && index < mt_number_of_types, "Invalid > flag value %d.", (int)flag); > return index; > } > > so we're validating the index rather than the flag. > > Cheers, > David > > > > > The function that takes the index should validate the index, > so that > is fine. > > Which one were you actually passing the bad value to? :) > > This isn't a strong objection just musing if we can do > better. And > as the hs repos are still closed, and likely to remain so > till early > next week, we have some slack time :) > > > :) Sure. > > Kind Regards, Thomas > > > Cheers, > David > > Thank you!
> > Thomas > > > From david.holmes at oracle.com Wed Oct 19 02:01:57 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Oct 2016 12:01:57 +1000 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> Message-ID: <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com> Pushed. David On 11/10/2016 11:12 AM, David Holmes wrote: > Ok. I will sponsor this once hs is open again. > > Thanks, > David > > On 6/10/2016 10:10 PM, Alan Burlison wrote: >> On 04/10/2016 19:37, Alan Burlison wrote: >> >>>> It?s in globalDefinitions.hpp, on the off chance that?s somehow not >>>> already being included. >>> >>> Cool, I'll pop that in instead - thanks! >> >> Done, webrev updated, jprt hotspot testset is clean. >> From thomas.stuefe at gmail.com Wed Oct 19 05:17:01 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 19 Oct 2016 07:17:01 +0200 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com> Message-ID: On Wed, Oct 19, 2016 at 3:21 AM, David Holmes wrote: > On 18/10/2016 3:39 PM, Thomas St?fe wrote: > >> Hi David, Max, >> >> I changed the asserts according to Max' suggestion. Instead of checking >> inside flag_to_index, now I check before callers of this function use >> this value to access memory. 
>> > > I don't see where Max suggested that?? Max wrote: " I think the decision on whether to access a slot should occur as close to memory accessing code as possible." and proceeded to suggest fixing VirtualMemorySnapshot::by_type() as well. > It doesn't make sense to me to have all the callers of flag_to_index check > what it returned instead of doing it inside flag_to_index. > > I disagree. Imho it makes sense to either check the Memflags enumeration input argument in flag_to_index() or the returned index before consumption. In both cases one knows the valid value range. Strictly speaking checking the index in flag_to_index() cannot be done because it is a faceless int type whose valid values are not yet known. It is all academical and mostly a matter of taste. > http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-shoul >> d-check_MEMFLAGS/webrev.01/webrev/index.html >> >> As David correctly writes, this is technically not a bug, so I guess >> this will have to wait until java 10. >> > > Yes, afraid so. > The fix is trivial and I will try to get fc extension for this (now that Goetz explained to me how to do this :). It seems this is done for many other non-bug issues as well. ..Thomas > Thanks, > David > > Kind Regards, Thomas >> >> >> >> On Fri, Oct 14, 2016 at 12:57 AM, David Holmes > > wrote: >> >> On 13/10/2016 10:53 PM, Thomas St?fe wrote: >> >> Hi David, >> >> On Thu, Oct 13, 2016 at 12:08 PM, David Holmes >> >> > >> >> wrote: >> >> Hi Thomas, >> >> On 13/10/2016 3:49 PM, Thomas St?fe wrote: >> >> Hi all, >> >> may I have plase a review for this tiny change? It just >> adds >> some assert to NMT. 
>> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650 >> >> > > >> webrev: >> >> http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-shoul >> d-check_ >> > ld-check_> >> >> > ld-check_ >> > ld-check_>> >> MEMFLAGS/webrev.00/webrev/ >> >> We had an ugly memory overwrite caused by this - >> ultimately our >> fault, because we fed an invalid memory flag to NMT - >> but it was >> difficult to find. An assert would have saved some time. >> >> >> I'm a little perplexed with asserting that something of >> MEMFLAGS >> type must be an actual MEMFLAGS value - it implies the caller >> is >> coercing plain int to MEMFLAGS, and I don't have much >> sympathy if >> they mess that up. Can't help wondering if there is some >> clever C++ >> trick to flag bad conversions at compile-time? >> >> >> The error was caused by an uninitialized variable of type >> MEMFLAGS. This >> was our fault, we have heavily modified allocation.hpp and >> introduced an >> error then merging changes from upstream. Due to a merging error >> this >> lead to a case where Arena::_flags was not initialized and >> contained a >> very large value. >> >> >> Ah I see. Lack of default initialization can be annoying :) >> >> I admit it looks funny. If it bothers you, I could instead check >> the >> returned index to be in the range for the size of the _malloc >> array in >> MallocMemorySnapshot::by_type(). Technically, it would mean the >> same. >> >> >> So I just realized that here: >> >> 62 // Map memory type to human readable name >> 63 static const char* flag_to_name(MEMFLAGS flag) { >> 64 assert(flag >= 0 && flag < mt_number_of_types, "Invalid >> flag value %d.", (int)flag); >> 65 return _memory_type_names[flag_to_index(flag)]; >> 66 } >> >> we call flag_to_index, so the assert is redundant as it is already >> in flag_to_index. 
Then presumably we change flag_to_index to >> something like this: >> >> static inline int flag_to_index(MEMFLAGS flag) { >> int index = (flag & 0xff); >> assert(index >= 0 && index < mt_number_of_types, "Invalid >> flag value %d.", (int)flag); >> return index; >> } >> >> so we're validating the index rather than the flag. >> >> Cheers, >> David >> >> >> >> >> The function that takes the index should validate the index, >> so that >> is fine. >> >> Which one were you actually passing the bad value to? :) >> >> This isn't a strong objection just musing if we can do >> better. And >> as the hs repos are still closed, and likely to remain so >> till early >> next week, we have some slack time :) >> >> >> :) Sure. >> >> Kind Regards, Thomas >> >> >> Cheers, >> David >> >> Thank you! >> >> Thomas >> >> >> >> From felix.yang at linaro.org Wed Oct 19 05:48:59 2016 From: felix.yang at linaro.org (Felix Yang) Date: Wed, 19 Oct 2016 13:48:59 +0800 Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered In-Reply-To: <4561489f-c037-7252-bb36-b3446db5b62e@redhat.com> References: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com> <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com> <4561489f-c037-7252-bb36-b3446db5b62e@redhat.com> Message-ID: Hi, I have pushed the patch to jdk9/hs/hotspot repo and also backported to aarch64-port/jdk8u/hotspot repo. I checked the code of icedtea7-forest/hotspot and it seems to me that it does not have the issue, please take a look. Thanks, Felix On 18 October 2016 at 21:03, Andrew Haley wrote: > On 18/10/16 13:51, Felix Yang wrote: > > Is it OK to push this patch into repo: http://hg.openjdk.java.net/ > > jdk9/hs/hotspot for now? > > Yes, but whoever does this should also apply it to > http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/ and > http://icedtea.classpath.org/hg/icedtea7-forest/hotspot/. > > Andrew. 
> > From thomas.stuefe at gmail.com Wed Oct 19 06:10:04 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 19 Oct 2016 08:10:04 +0200 Subject: "os" - make this a real namespace? Message-ID: Hi all, a small question. I sometimes stumble over the fact that "os" is a class, not a namespace. And that we include the platform dependent additions into the middle of this class. This has a number of repercussions, like not being able to include the platform dependent files (os__) directly, not being able to forward declare functions from the "os" namespace (e.g. os::malloc) etc. I also cannot split the implementations of "os" functions across different implementation files without problems. It seems to me all compilers nowadays support namespaces; would it not make sense to convert "os" to a real namespace? While we are at it, what is the reason for the "" sub classes? e.g. os::Bsd, os::Aix etc? It makes integrating patches between platforms difficult and, to me, does not seem to serve any clear purpose. If the purpose is to be a very low-level wrapper around OS particularities, it makes no sense to have them in the "os" namespace and to make them visible to the shared sections of the VM. E.g. there should be no reason to access "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" functions outside implementations specific for Posix platforms. Thanks, and Kind Regards, Thomas From david.holmes at oracle.com Wed Oct 19 06:49:35 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Oct 2016 16:49:35 +1000 Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS. In-Reply-To: References: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com> <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com> Message-ID: On 19/10/2016 3:17 PM, Thomas Stüfe wrote: > > > On Wed, Oct 19, 2016 at 3:21 AM, David Holmes > wrote: > > On 18/10/2016 3:39 PM, Thomas Stüfe wrote: > > Hi David, Max, > > I changed the asserts according to Max' suggestion.
Instead of > checking > inside flag_to_index, now I check before callers of this > function use > this value to access memory. > > > I don't see where Max suggested that?? > > > Max wrote: " I think the decision on whether to access a slot should > occur as close to memory accessing code as possible." and proceeded to > suggest fixing VirtualMemorySnapshot::by_type() as well. I did not interpret that comment that way, and was puzzled by the reference to by_type. > > It doesn't make sense to me to have all the callers of flag_to_index > check what it returned instead of doing it inside flag_to_index. > > > I disagree. Imho it makes sense to either check the Memflags enumeration > input argument in flag_to_index() or the returned index before > consumption. In both cases one knows the valid value range. Strictly > speaking checking the index in flag_to_index() cannot be done because it > is a faceless int type whose valid values are not yet known. The index has to fall in the range 0 <= index < mt_number_of_types, and I was suggesting that it makes more sense to verify this once in flag_to_index() than in all the callers of flag_to_index. David > It is all academic and mostly a matter of taste. > > http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.01/webrev/index.html > > > As David correctly writes, this is technically not a bug, so I guess > this will have to wait until java 10. > > > Yes, afraid so. > > > The fix is trivial and I will try to get fc extension for this (now that > Goetz explained to me how to do this :). It seems this is done for many > other non-bug issues as well.
> > ..Thomas > > > Thanks, > David > > Kind Regards, Thomas > > > > On Fri, Oct 14, 2016 at 12:57 AM, David Holmes > > >> wrote: > > On 13/10/2016 10:53 PM, Thomas St?fe wrote: > > Hi David, > > On Thu, Oct 13, 2016 at 12:08 PM, David Holmes > > > > > >>> wrote: > > Hi Thomas, > > On 13/10/2016 3:49 PM, Thomas St?fe wrote: > > Hi all, > > may I have plase a review for this tiny change? > It just adds > some assert to NMT. > > Bug: > https://bugs.openjdk.java.net/browse/JDK-8167650 > > > > > > >> > webrev: > > > http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_ > > > > > > > > > >> > MEMFLAGS/webrev.00/webrev/ > > We had an ugly memory overwrite caused by this - > ultimately our > fault, because we fed an invalid memory flag to > NMT - > but it was > difficult to find. An assert would have saved > some time. > > > I'm a little perplexed with asserting that something > of MEMFLAGS > type must be an actual MEMFLAGS value - it implies > the caller is > coercing plain int to MEMFLAGS, and I don't have much > sympathy if > they mess that up. Can't help wondering if there is some > clever C++ > trick to flag bad conversions at compile-time? > > > The error was caused by an uninitialized variable of type > MEMFLAGS. This > was our fault, we have heavily modified allocation.hpp and > introduced an > error then merging changes from upstream. Due to a > merging error > this > lead to a case where Arena::_flags was not initialized and > contained a > very large value. > > > Ah I see. Lack of default initialization can be annoying :) > > I admit it looks funny. If it bothers you, I could > instead check the > returned index to be in the range for the size of the > _malloc > array in > MallocMemorySnapshot::by_type(). Technically, it would > mean the > same. 
> > > So I just realized that here: > > 62 // Map memory type to human readable name > 63 static const char* flag_to_name(MEMFLAGS flag) { > 64 assert(flag >= 0 && flag < mt_number_of_types, "Invalid > flag value %d.", (int)flag); > 65 return _memory_type_names[flag_to_index(flag)]; > 66 } > > we call flag_to_index, so the assert is redundant as it is > already > in flag_to_index. Then presumably we change flag_to_index to > something like this: > > static inline int flag_to_index(MEMFLAGS flag) { > int index = (flag & 0xff); > assert(index >= 0 && index < mt_number_of_types, "Invalid > flag value %d.", (int)flag); > return index; > } > > so we're validating the index rather than the flag. > > Cheers, > David > > > > > The function that takes the index should validate > the index, > so that > is fine. > > Which one were you actually passing the bad value to? :) > > This isn't a strong objection just musing if we can do > better. And > as the hs repos are still closed, and likely to > remain so > till early > next week, we have some slack time :) > > > :) Sure. > > Kind Regards, Thomas > > > Cheers, > David > > Thank you! > > Thomas > > > > From david.holmes at oracle.com Wed Oct 19 07:02:53 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Oct 2016 17:02:53 +1000 Subject: "os" - make this a real namespace? In-Reply-To: References: Message-ID: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com> Hi Thomas, On 19/10/2016 4:10 PM, Thomas St?fe wrote: > Hi all, > > a small question. > > I sometimes stumble over the fact that "os" is a class, not a namespace. ?? AFAIK everything in hotspot is a class not a namespace - we don't use "namespaces". > And that we include the platform dependent additions into the middle of > this class. Build-time specialization. It allows for the os API to actually be different on different platforms, as opposed to just being implemented differently. 
> This has a number of repercussions, like not being able to include the > platform dependent files (os__) directly, not being able to I'd call that a feature - they are not intended to be standalone APIs. > forward declare functions from the "os" namespace (e.g. os::malloc) etc. I > also cannot split implementations from "os" functions to different > implementation files without problems. > > It seems to me all compiler nowadays support namespaces, would it not make > sense to convert "os" to a real namespace? Not being a C++ aficionado I'm not sure exactly what that would entail - as far as I know we don't use C++ namespaces anywhere in hotspot. > While we are at it, what is the reason for the "" sub classes? e.g. > os::Bsd, os::Aix etc? It makes integrating patches between platforms > difficult and, to me, does not seem to serve any clear purpose. Must admit this arrangement has also had me confused at times. I think it is way to add a per-OS helper class for the main os API implementation. > If the purpose is to be a very low wrapper around OS particularities, it > makes no sense to have them in the "os" namespace and to make them visible > to the shared sections of the VM. E.g. there should be no reason to access > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" > functions outside implementations specific for Posix platforms. Not sure how you make, for example os::BSD accessible from all classes in os/bsd/vm yet not be visible anywhere else ?? Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm. There is a lot of cleanup in this area slated for the future - hopefully Java 10. POSIX refactoring etc. 
Cheers, David > Thanks, and Kind Regards, Thomas > From aph at redhat.com Wed Oct 19 07:50:55 2016 From: aph at redhat.com (Andrew Haley) Date: Wed, 19 Oct 2016 08:50:55 +0100 Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal threadstate encountered In-Reply-To: References: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com> <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com> <4561489f-c037-7252-bb36-b3446db5b62e@redhat.com> Message-ID: <7cec6ce2-68b0-b899-9344-bb4738d13f12@redhat.com> On 19/10/16 06:48, Felix Yang wrote: > I have pushed the patch to jdk9/hs/hotspot repo and also backported to > aarch64-port/jdk8u/hotspot repo. OK, thanks. > I checked the code of icedtea7-forest/hotspot and it seems to me that > it does not have the issue, please take a look. No, it doesn't, you are right. I wonder how that happened. Andrew. From thomas.stuefe at gmail.com Wed Oct 19 09:07:03 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 19 Oct 2016 11:07:03 +0200 Subject: "os" - make this a real namespace? In-Reply-To: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com> References: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com> Message-ID: Hi David! On Wed, Oct 19, 2016 at 9:02 AM, David Holmes wrote: > Hi Thomas, > > On 19/10/2016 4:10 PM, Thomas Stüfe wrote: > >> Hi all, >> >> a small question. >> >> I sometimes stumble over the fact that "os" is a class, not a namespace. >> > > ?? AFAIK everything in hotspot is a class not a namespace - we don't use > "namespaces". > > I meant that it is used as one would use a C++ namespace, not a class. As far as I see any class derived from AllStatic is actually a namespace in the sense that it serves as bracket for a number of related static functions. > And that we include the platform dependent additions into the middle of >> this class. >> > > Build-time specialization.
It allows for the os API to actually be > different on different platforms, as opposed to just being implemented > differently. > > The "os" API is a shared, platform independent API, so there should be no difference between platforms. There should be no need for it to export platform specifics - its whole intent is to hide those specifics. Looking into the various os_.hpp files, I see: 1) things where the declaration does differ between oses and therefore they cannot be called in shared code without #ifdefs (e.g. all subclasses). 2)Things where the declaration is shared but the implementation differs. Again, two cases: a) Either implementation is not time critical. In that case the declaration should live in os.hpp and the implementation should live in some platform dependent C++ file. b) Or implementation is time critical and must be inline, in which case a separate platform dependent header would be needed. For (1) and (2b), C++ namespaces would be more convenient. Now, you are forced to include the platform specific file into the class os{} declaration, because there can just be one: os.hpp class os { ... #include }; . With a namespace, you can add functions to the namespace in various disjunct places and hence could write: os.hpp namespace os { ... functions ... } os_xxx.hpp namespace os { ... functions ... } which would be more natural and > This has a number of repercussions, like not being able to include the >> platform dependent files (os__) directly, not being able to >> > > I'd call that a feature - they are not intended to be standalone APIs. Right now os_.hpp exports the os:: api, which one may want to use separately from "os" because they expose platform dependent APIs which are conceptionally lower than the os namespace. At least that is how I always did interpret the intention behind os::. 
But actually, because the "Aix" class is part of "os", cannot be used separately and is exposed to the whole of the VM, I always avoided putting anything os::Aix if it could be helped. Hence, for AIX, we added porting_aix.hpp for AIX specific functions which are not to be used outside os/aix/vm. Or mostly just plain left functions to be file scope static inside os_aix.cpp. So, os::Aix was pretty useless for me as a porter. > > > forward declare functions from the "os" namespace (e.g. os::malloc) etc. I >> also cannot split implementations from "os" functions to different >> implementation files without problems. >> >> It seems to me all compiler nowadays support namespaces, would it not make >> sense to convert "os" to a real namespace? >> > > Not being a C++ aficionado I'm not sure exactly what that would entail - > as far as I know we don't use C++ namespaces anywhere in hotspot. > > We start using all kinds of modern C++ features. Templates pop up all over the place. Namespaces in contrast are an old and easily understood feature. We already use it inside our own port. > While we are at it, what is the reason for the "" sub classes? e.g. >> os::Bsd, os::Aix etc? It makes integrating patches between platforms >> difficult and, to me, does not seem to serve any clear purpose. >> > > Must admit this arrangement has also had me confused at times. I think it > is way to add a per-OS helper class for the main os API implementation. > > If the purpose is to be a very low wrapper around OS particularities, it >> makes no sense to have them in the "os" namespace and to make them visible >> to the shared sections of the VM. E.g. there should be no reason to access >> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" >> functions outside implementations specific for Posix platforms. >> > > Not sure how you make, for example os::BSD accessible from all classes in > os/bsd/vm yet not be visible anywhere else ?? 
> > I think there is no real reason for os::Bsd to exist at all. Either we have shared functions with platform dependent implementation, then they should be declared in "os". Or they are completely platform specific, then they can be moved to a platform specific header outside of "os" like we did with porting_aix.hpp. Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm. > > There is a lot of cleanup in this area slated for the future - hopefully > Java 10. POSIX refactoring etc. > > Sure! Kind Regards, Thomas > Cheers, > David > > > Thanks, and Kind Regards, Thomas >> >> From david.holmes at oracle.com Wed Oct 19 12:19:25 2016 From: david.holmes at oracle.com (David Holmes) Date: Wed, 19 Oct 2016 22:19:25 +1000 Subject: "os" - make this a real namespace? In-Reply-To: References: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com> Message-ID: <4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com> Hi Thomas, On 19/10/2016 7:07 PM, Thomas St?fe wrote: > Hi David! > > On Wed, Oct 19, 2016 at 9:02 AM, David Holmes > wrote: > > Hi Thomas, > > On 19/10/2016 4:10 PM, Thomas St?fe wrote: > > Hi all, > > a small question. > > I sometimes stumble over the fact that "os" is a class, not a > namespace. > > > ?? AFAIK everything in hotspot is a class not a namespace - we don't > use "namespaces". > > > I meant that it is used as one would use a C++ namespace, not a class. > As far as I see any class derived from AllStatic is actually a namespace > in the sense that it serves as bracket for a number of related static > functions. Okay ... if that is what you mean by a namespace ... I always thought namespaces were like packages, a level up from classes. > > And that we include the platform dependent additions into the > middle of this class. > > > Build-time specialization. It allows for the os API to actually be > different on different platforms, as opposed to just being > implemented differently. 
> > > The "os" API is a shared, platform independent API, so there should be > no difference between platforms. There should be no need for it to > export platform specifics - its whole intent is to hide those specifics. But that is your current design perspective of what the os API should be. What it is is something with a very long history and which has had to accommodate different things over time. At one point a lot of the JDK native code would call into the VM for functionality that is now directly implemented in JDK native code. There's a lot of historical baggage here. > Looking into the various os_.hpp files, I see: > 1) things where the declaration does differ between oses and therefore > they cannot be called in shared code without #ifdefs (e.g. all > subclasses). > 2)Things where the declaration is shared but the implementation differs. > Again, two cases: > a) Either implementation is not time critical. In that case the > declaration should live in os.hpp and the implementation should live in > some platform dependent C++ file. > b) Or implementation is time critical and must be inline, in which > case a separate platform dependent header would be needed. > > For (1) and (2b), C++ namespaces would be more convenient. Now, you are > forced to include the platform specific file into the class os{} > declaration, because there can just be one: > > os.hpp > class os { > ... > #include > > }; > > . With a namespace, you can add functions to the namespace in various > disjunct places and hence could write: > > os.hpp > namespace os { ... functions ... } > > os_xxx.hpp > namespace os { ... functions ... } > > which would be more natural and Yes I can see that as an alternative way to expand the os API. Though I still prefer to group functionality in a class. 
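A minimal sketch of the namespace alternative discussed above, assuming hypothetical names (`os_sketch`, `platform_name`) that are not HotSpot's actual API. It only illustrates that a namespace can be reopened in a second place, so a platform header could add declarations on its own instead of being textually included inside the body of `class os`:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Stand-in for a shared header such as os.hpp (contents are hypothetical).
namespace os_sketch {
  // Shared, platform-independent declaration.
  void* malloc(std::size_t size);
}

// Stand-in for a platform header such as os_xxx.hpp: the same namespace is
// simply reopened. A second `class os { ... };` block would instead be a
// redefinition error, which is why the class-based layout has to #include
// the platform file inside the class body.
namespace os_sketch {
  // Hypothetical platform-specific addition.
  const char* platform_name();
}

// Stand-in for the implementation files.
namespace os_sketch {
  void* malloc(std::size_t size) { return std::malloc(size); }
  const char* platform_name() { return "sketch-platform"; }
}
```

Callers use `os_sketch::malloc(n)` and `os_sketch::platform_name()` exactly as they would use static members of a class; only the way the declarations are spread over headers changes.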
> This has a number of repercussions, like not being able to > include the > platform dependent files (os__) directly, not being able to > > > I'd call that a feature - they are not intended to be standalone APIs. > > > Right now os_.hpp exports the os:: api, which one may want to > use separately from "os" because they expose platform dependent APIs > which are conceptionally lower than the os namespace. At least that is > how I always did interpret the intention behind os::. I wouldn't say lower - they extend the os API with platform specific functionality and concepts. The idea is that specific code that wants to use OS facilities that are specific to that access them through the os:: class. > But actually, because the "Aix" class is part of "os", cannot be used > separately and is exposed to the whole of the VM, I always avoided > putting anything os::Aix if it could be helped. Hence, for AIX, we added > porting_aix.hpp for AIX specific functions which are not to be used > outside os/aix/vm. Or mostly just plain left functions to be file scope > static inside os_aix.cpp. So, os::Aix was pretty useless for me as a porter. Seems you made a decision that the os::AIX class didn't meet your ideas as to how platform specifics should be handled and so went with an alternative design. Wouldn't that make it "useless" because you chose not to use it? > > forward declare functions from the "os" namespace (e.g. > os::malloc) etc. I > also cannot split implementations from "os" functions to different > implementation files without problems. > > It seems to me all compiler nowadays support namespaces, would > it not make sense to convert "os" to a real namespace? > > > Not being a C++ aficionado I'm not sure exactly what that would > entail - as far as I know we don't use C++ namespaces anywhere in > hotspot. > > > We start using all kinds of modern C++ features. Templates pop up all > over the place. Namespaces in contrast are an old and easily understood > feature. 
We already use it inside our own port. > > > While we are at it, what is the reason for the "" sub > classes? e.g. > os::Bsd, os::Aix etc? It makes integrating patches between platforms > difficult and, to me, does not seem to serve any clear purpose. > > > Must admit this arrangement has also had me confused at times. I > think it is way to add a per-OS helper class for the main os API > implementation. > > If the purpose is to be a very low wrapper around OS > particularities, it > makes no sense to have them in the "os" namespace and to make > them visible > to the shared sections of the VM. E.g. there should be no reason > to access > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" > functions outside implementations specific for Posix platforms. > > > Not sure how you make, for example os::BSD accessible from all > classes in os/bsd/vm yet not be visible anywhere else ?? > > > I think there is no real reason for os::Bsd to exist at all. Either we > have shared functions with platform dependent implementation, then they > should be declared in "os". Or they are completely platform specific, > then they can be moved to a platform specific header outside of "os" > like we did with porting_aix.hpp. Sure you could do that. But 20 years ago that wasn't how things were designed and we have what we have today. As I said a lot of baggage. Personally I find the nesting of the concrete os API quite natural: os::win32 to me is better than unrelated os and win32 classes or namespaces. Cheers, David ----- > Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm. > > > There is a lot of cleanup in this area slated for the future - > hopefully Java 10. POSIX refactoring etc. > > > Sure! 
> > Kind Regards, Thomas > > > Cheers, > David > > > Thanks, and Kind Regards, Thomas > > From thomas.stuefe at gmail.com Wed Oct 19 13:54:11 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Wed, 19 Oct 2016 15:54:11 +0200 Subject: "os" - make this a real namespace? In-Reply-To: <4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com> References: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com> <4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com> Message-ID: Hi David, my intent was not to attack the existing code, but to ask about the original design intentions and possibly come up with ideas to improve it. See my further answers inline. On Wed, Oct 19, 2016 at 2:19 PM, David Holmes wrote: > Hi Thomas, > > On 19/10/2016 7:07 PM, Thomas Stüfe wrote: > >> Hi David! >> >> On Wed, Oct 19, 2016 at 9:02 AM, David Holmes > > wrote: >> >> Hi Thomas, >> >> On 19/10/2016 4:10 PM, Thomas Stüfe wrote: >> >> Hi all, >> >> a small question. >> >> I sometimes stumble over the fact that "os" is a class, not a >> namespace. >> >> >> ?? AFAIK everything in hotspot is a class not a namespace - we don't >> use "namespaces". >> >> >> I meant that it is used as one would use a C++ namespace, not a class. >> As far as I see any class derived from AllStatic is actually a namespace >> in the sense that it serves as bracket for a number of related static >> functions. >> > > Okay ... if that is what you mean by a namespace ... I always thought > namespaces were like packages, a level up from classes. > > I always used namespace as a common scope for declarations which belong together, be that classes, global functions, variables. I always thought this is how class "os" is used in the hotspot. I also use C++ namespace to isolate coding in large projects I do not own but where my symbols are, for technical reasons, visible anywhere but I want to avoid name clashes. A typical porter headache. E.g.
we have a namespace "sap" in our coding, just to keep our stuff separate from global symbols. > >> And that we include the platform dependent additions into the >> middle of this class. >> >> >> Build-time specialization. It allows for the os API to actually be >> different on different platforms, as opposed to just being >> implemented differently. >> >> >> The "os" API is a shared, platform independent API, so there should be >> no difference between platforms. There should be no need for it to >> export platform specifics - its whole intent is to hide those specifics. >> > > But that is your current design perspective of what the os API should be. > What it is is something with a very long history and which has had to > accommodate different things over time. At one point a lot of the JDK > native code would call into the VM for functionality that is now directly > implemented in JDK native code. There's a lot of historical baggage here. > > Looking into the various os_.hpp files, I see: >> 1) things where the declaration does differ between oses and therefore >> they cannot be called in shared code without #ifdefs (e.g. all >> subclasses). >> 2)Things where the declaration is shared but the implementation differs. >> Again, two cases: >> a) Either implementation is not time critical. In that case the >> declaration should live in os.hpp and the implementation should live in >> some platform dependent C++ file. >> b) Or implementation is time critical and must be inline, in which >> case a separate platform dependent header would be needed. >> >> For (1) and (2b), C++ namespaces would be more convenient. Now, you are >> forced to include the platform specific file into the class os{} >> declaration, because there can just be one: >> >> os.hpp >> class os { >> ... >> #include >> >> }; >> >> . With a namespace, you can add functions to the namespace in various >> disjunct places and hence could write: >> >> os.hpp >> namespace os { ... functions ... 
} >> >> os_xxx.hpp >> namespace os { ... functions ... } >> >> which would be more natural and >> > > Yes I can see that as an alternative way to expand the os API. Though I > still prefer to group functionality in a class. > > I would argue that the advantage of namespaces here is that the special handling of platform specific headers is not needed anymore. Now, when reading any os_/ header, I need to keep in mind that the content of this header gets inserted into the middle of a class definition. That is just rather exotic and unexpected and tripped me over a few times already. > This has a number of repercussions, like not being able to >> include the >> platform dependent files (os__) directly, not being able >> to >> >> >> I'd call that a feature - they are not intended to be standalone APIs. >> > There are a number of useful "os" APIs which I would sometimes like to use without the bagage of including the whole os.hpp header. For instance, os::malloc(). Normally, I would forward declare them, but this is not possible for class functions. > >> >> Right now os_.hpp exports the os:: api, which one may want to >> use separately from "os" because they expose platform dependent APIs >> which are conceptionally lower than the os namespace. At least that is >> how I always did interpret the intention behind os::. >> > > I wouldn't say lower - they extend the os API with platform specific > functionality and concepts. The idea is that specific code that wants > to use OS facilities that are specific to that access them through the > os:: class. > > But actually, because the "Aix" class is part of "os", cannot be used >> separately and is exposed to the whole of the VM, I always avoided >> putting anything os::Aix if it could be helped. Hence, for AIX, we added >> porting_aix.hpp for AIX specific functions which are not to be used >> outside os/aix/vm. Or mostly just plain left functions to be file scope >> static inside os_aix.cpp. 
So, os::Aix was pretty useless for me as a >> porter. >> > > Seems you made a decision that the os::AIX class didn't meet your ideas as > to how platform specifics should be handled and so went with an alternative > design. Wouldn't that make it "useless" because you chose not to use it? When the AIX port started, the os interface was not well documented. Nor was there anyone I could ask because OpenJDK did not yet exist. So I had to deduce the intent of the original authors from the code and try to fill it with life as best as possible. > >> forward declare functions from the "os" namespace (e.g. >> os::malloc) etc. I >> also cannot split implementations from "os" functions to different >> implementation files without problems. >> >> It seems to me all compiler nowadays support namespaces, would >> it not make sense to convert "os" to a real namespace? >> >> >> Not being a C++ aficionado I'm not sure exactly what that would >> entail - as far as I know we don't use C++ namespaces anywhere in >> hotspot. >> >> >> We start using all kinds of modern C++ features. Templates pop up all >> over the place. Namespaces in contrast are an old and easily understood >> feature. We already use it inside our own port. >> >> >> While we are at it, what is the reason for the "" sub >> classes? e.g. >> os::Bsd, os::Aix etc? It makes integrating patches between >> platforms >> difficult and, to me, does not seem to serve any clear purpose. >> >> >> Must admit this arrangement has also had me confused at times. I >> think it is way to add a per-OS helper class for the main os API >> implementation. >> >> If the purpose is to be a very low wrapper around OS >> particularities, it >> makes no sense to have them in the "os" namespace and to make >> them visible >> to the shared sections of the VM. E.g. there should be no reason >> to access >> "os::Bsd" functions from outside os/bsd/vm, or to access >> "os::Posix" >> functions outside implementations specific for Posix platforms. 
>> >> >> Not sure how you make, for example os::BSD accessible from all >> classes in os/bsd/vm yet not be visible anywhere else ?? >> >> >> I think there is no real reason for os::Bsd to exist at all. Either we >> have shared functions with platform dependent implementation, then they >> should be declared in "os". Or they are completely platform specific, >> then they can be moved to a platform specific header outside of "os" >> like we did with porting_aix.hpp. >> > > Sure you could do that. But 20 years ago that wasn't how things were > designed and we have what we have today. As I said a lot of baggage. > > Personally I find the nesting of the concrete os API quite natural: > os::win32 to me is better than unrelated os and win32 classes or namespaces. > > I think there is code structure, and then there is exposure. Both are separate things but currently interwoven in the hotspot. Putting Win32 specifics into os::win32 makes sense. But there is no reason to expose os::win32 to shared parts of the VM, considering how much the hotspot programmers at Oracle dislike platform specific #ifdefs in shared code. I understand that this is historical? Before namespaces, you had to define a class as common bracket, and a class definition must be complete to be valid, so os:: is automatically visible everywhere. But with namespaces this could be disentangled. We could have a globally visible "os" namespace with platform independent shared functions, as well as an "os::win32" namespace which is only visible in windows specific implementation files. Kind Regards, Thomas > Cheers, > David > ----- > > > Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm. >> >> >> There is a lot of cleanup in this area slated for the future - >> hopefully Java 10. POSIX refactoring etc. >> >> >> Sure! 
>> >> Kind Regards, Thomas >> >> >> Cheers, >> David >> >> >> Thanks, and Kind Regards, Thomas >> >> >> From vladimir.kozlov at oracle.com Wed Oct 19 17:43:50 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Oct 2016 10:43:50 -0700 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com> References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com> Message-ID: <9619324d-5360-b3f0-684a-e7f1069656db@oracle.com> I missed all this review fun :) Thank you, Alan, for cleaning this up. The only concern I have is removal of conditional macros. > I've also taken the opportunity to strip out most of the '#ifndef(FOO)' > probes for the HW capability bit macros in vm_version_solaris_sparc.cpp. > They are now redundant as the macros are in the system header files > from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related > ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and > AV2_SPARC_SPARC5. For those I've left the macro probes in place. Most likely people will try to run JDK 9 on Solaris 10. Or in some kind of VM environment which may not have Solaris 11.1 headers. We had a lot of such cases before, which is why those macros were added. "JDK 9 Platform Support" lists only Solaris 11.x and 12.x. Maybe it is fine, but the original code would cover more running cases. Sorry for rambling. Regards, Vladimir On 10/18/16 7:01 PM, David Holmes wrote: > Pushed.
> > David > > On 11/10/2016 11:12 AM, David Holmes wrote: >> Ok. I will sponsor this once hs is open again. >> >> Thanks, >> David >> >> On 6/10/2016 10:10 PM, Alan Burlison wrote: >>> On 04/10/2016 19:37, Alan Burlison wrote: >>> >>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not >>>>> already being included. >>>> >>>> Cool, I'll pop that in instead - thanks! >>> >>> Done, webrev updated, jprt hotspot testset is clean. >>> From david.holmes at oracle.com Wed Oct 19 23:59:04 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 20 Oct 2016 09:59:04 +1000 Subject: "os" - make this a real namespace? In-Reply-To: References: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com> <4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com> Message-ID: On 19/10/2016 11:54 PM, Thomas Stüfe wrote: > Hi David, > > my intent was not to attack the existing code, but to ask about the > original design intentions and possibly come up with ideas to improve > it. See my further answers inline. Sure - no problem. It is hard to understand the intent of the design when the remaining code doesn't even reflect the original design intentions (nor is there anyone around involved in that!). And there is always scope to redo this as part of the "big OS code cleanup". The "exposure" concern is not one that I've heard expressed previously in relation to the hotspot code. Cheers, David > On Wed, Oct 19, 2016 at 2:19 PM, David Holmes > wrote: > > Hi Thomas, > > On 19/10/2016 7:07 PM, Thomas Stüfe wrote: > > Hi David! > > On Wed, Oct 19, 2016 at 9:02 AM, David Holmes > > >> wrote: > > Hi Thomas, > > On 19/10/2016 4:10 PM, Thomas Stüfe wrote: > > Hi all, > > a small question. > > I sometimes stumble over the fact that "os" is a class, > not a > namespace. > > > ?? AFAIK everything in hotspot is a class not a namespace - > we don't > use "namespaces". > > > I meant that it is used as one would use a C++ namespace, not a > class.
> As far as I see any class derived from AllStatic is actually a > namespace > in the sense that it serves as bracket for a number of related > static > functions. > > > Okay ... if that is what you mean by a namespace ... I always > thought namespaces were like packages, a level up from classes. > > > I always used namespace as a common scope for declarations which belong > together, be that classes, global functions, variables. I always thought > this is how class "os" is used in the hotspot. > > I also use C++ namespace to isolate coding in large projects I do not > own but where my symbols are, for technical reasons, visible anywhere > but I want to avoid name clashes. A typical porter headache. E.g. we > have a namespace "sap" in our coding, just to keep our stuff separate > from global symbols. > > > > And that we include the platform dependent additions > into the > middle of this class. > > > Build-time specialization. It allows for the os API to > actually be > different on different platforms, as opposed to just being > implemented differently. > > > The "os" API is a shared, platform independent API, so there > should be > no difference between platforms. There should be no need for it to > export platform specifics - its whole intent is to hide those > specifics. > > > But that is your current design perspective of what the os API > should be. What it is is something with a very long history and > which has had to accommodate different things over time. At one > point a lot of the JDK native code would call into the VM for > functionality that is now directly implemented in JDK native code. > There's a lot of historical baggage here. > > > Looking into the various os_.hpp files, I see: > 1) things where the declaration does differ between oses and > therefore > they cannot be called in shared code without #ifdefs (e.g. all > subclasses). > 2)Things where the declaration is shared but the implementation > differs. 
> Again, two cases: > a) Either implementation is not time critical. In that case the > declaration should live in os.hpp and the implementation should > live in > some platform dependent C++ file. > b) Or implementation is time critical and must be inline, in > which > case a separate platform dependent header would be needed. > > For (1) and (2b), C++ namespaces would be more convenient. Now, > you are > forced to include the platform specific file into the class os{} > declaration, because there can just be one: > > os.hpp > class os { > ... > #include > > }; > > . With a namespace, you can add functions to the namespace in > various > disjunct places and hence could write: > > os.hpp > namespace os { ... functions ... } > > os_xxx.hpp > namespace os { ... functions ... } > > which would be more natural and > > > Yes I can see that as an alternative way to expand the os API. > Though I still prefer to group functionality in a class. > > > I would argue that the advantage of namespaces here is that the special > handling of platform specific headers is not needed anymore. Now, when > reading any os_/ header, I need to keep in mind that the > content of this header gets inserted into the middle of a class > definition. That is just rather exotic and unexpected and tripped me > over a few times already. > > > > This has a number of repercussions, like not being able to > include the > platform dependent files (os__) directly, not > being able to > > > I'd call that a feature - they are not intended to be > standalone APIs. > > > There are a number of useful "os" APIs which I would sometimes like to > use without the bagage of including the whole os.hpp header. For > instance, os::malloc(). Normally, I would forward declare them, but this > is not possible for class functions. 
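The difference described in the paragraph above can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not HotSpot code: `os_demo`, `aix_demo`, `os_class_demo` and `page_size_hint` are hypothetical names, and the value 4096 is a placeholder.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a shared header. The namespace is called os_demo
// so that it cannot be mistaken for HotSpot's real "os" class.
namespace os_demo {
  void* malloc(std::size_t size);   // plain declaration, repeatable in any file
}

// Hypothetical stand-in for a platform header: the same namespace is simply
// reopened, so nothing has to be #included into the middle of a class body.
namespace os_demo {
  namespace aix_demo {
    inline int page_size_hint() { return 4096; }  // placeholder value
  }
}

// The definition could live in any implementation file.
void* os_demo::malloc(std::size_t size) { return std::malloc(size); }

// By contrast, a static member function can only be declared inside the
// class definition, so a user of os_class_demo::malloc needs the whole class.
class os_class_demo {
 public:
  static void* malloc(std::size_t size) { return std::malloc(size); }
};
```

A caller that only needs `os_demo::malloc` can repeat the one-line declaration from the first block instead of including a large header; no equivalent exists for `os_class_demo::malloc`.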
> > > > > Right now os_.hpp exports the os:: api, which one may > want to > use separately from "os" because they expose platform dependent APIs > which are conceptionally lower than the os namespace. At least > that is > how I always did interpret the intention behind os::. > > > I wouldn't say lower - they extend the os API with platform specific > functionality and concepts. The idea is that specific code that > wants to use OS facilities that are specific to that access > them through the os:: class. > > > > But actually, because the "Aix" class is part of "os", cannot be > used > separately and is exposed to the whole of the VM, I always avoided > putting anything os::Aix if it could be helped. Hence, for AIX, > we added > porting_aix.hpp for AIX specific functions which are not to be used > outside os/aix/vm. Or mostly just plain left functions to be > file scope > static inside os_aix.cpp. So, os::Aix was pretty useless for me > as a porter. > > > Seems you made a decision that the os::AIX class didn't meet your > ideas as to how platform specifics should be handled and so went > with an alternative design. Wouldn't that make it "useless" because > you chose not to use it? > > > When the AIX port started, the os interface was not well documented. Nor > was there anyone I could ask because OpenJDK did not yet exist. So I had > to deduce the intent of the original authors from the code and try to > fill it with life as best as possible. > > > > forward declare functions from the "os" namespace (e.g. > os::malloc) etc. I > also cannot split implementations from "os" functions to > different > implementation files without problems. > > It seems to me all compiler nowadays support namespaces, > would > it not make sense to convert "os" to a real namespace? > > > Not being a C++ aficionado I'm not sure exactly what that would > entail - as far as I know we don't use C++ namespaces > anywhere in > hotspot. > > > We start using all kinds of modern C++ features. 
Templates pop > up all > over the place. Namespaces in contrast are an old and easily > understood > feature. We already use it inside our own port. > > > While we are at it, what is the reason for the "" sub > classes? e.g. > os::Bsd, os::Aix etc? It makes integrating patches > between platforms > difficult and, to me, does not seem to serve any clear > purpose. > > > Must admit this arrangement has also had me confused at times. I > think it is way to add a per-OS helper class for the main os API > implementation. > > If the purpose is to be a very low wrapper around OS > particularities, it > makes no sense to have them in the "os" namespace and to > make > them visible > to the shared sections of the VM. E.g. there should be > no reason > to access > "os::Bsd" functions from outside os/bsd/vm, or to access > "os::Posix" > functions outside implementations specific for Posix > platforms. > > > Not sure how you make, for example os::BSD accessible from all > classes in os/bsd/vm yet not be visible anywhere else ?? > > > I think there is no real reason for os::Bsd to exist at all. > Either we > have shared functions with platform dependent implementation, > then they > should be declared in "os". Or they are completely platform > specific, > then they can be moved to a platform specific header outside of "os" > like we did with porting_aix.hpp. > > > Sure you could do that. But 20 years ago that wasn't how things were > designed and we have what we have today. As I said a lot of baggage. > > Personally I find the nesting of the concrete os API quite natural: > os::win32 to me is better than unrelated os and win32 classes or > namespaces. > > > I think there is code structure, and then there is exposure. Both are > separate things but currently interwoven in the hotspot. > > Putting Win32 specifics into os::win32 makes sense. 
But there is no > reason to expose os::win32 to shared parts of the VM, considering how > much the hotspot programmers at Oracle dislike platform specific #ifdefs > in shared code. > > I understand that this is historical? Before namespaces, you had to > define a class as common bracket, and a class definition must be > complete to be valid, so os:: is automatically visible everywhere. > But with namespaces this could be disentangled. We could have a globally > visible "os" namespace with platform independent shared functions, as > well as an "os::win32" namespace which is only visible in windows > specific implementation files. > > Kind Regards, Thomas > > > Cheers, > David > ----- > > > Plus it also needs to potentially be visible from > os_cpu/bsd_XXX/vm. > > > There is a lot of cleanup in this area slated for the future - > hopefully Java 10. POSIX refactoring etc. > > > Sure! > > Kind Regards, Thomas > > > Cheers, > David > > > Thanks, and Kind Regards, Thomas > > > From david.holmes at oracle.com Thu Oct 20 00:17:13 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 20 Oct 2016 10:17:13 +1000 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: <9619324d-5360-b3f0-684a-e7f1069656db@oracle.com> References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com> <9619324d-5360-b3f0-684a-e7f1069656db@oracle.com> Message-ID: On 20/10/2016 3:43 AM, Vladimir Kozlov wrote: > I missed all this review fun :) > > Thank you, Alan, for cleaning this up. 
> > The only concern I have is removal of conditional macros. > >> I've also taken the opportunity to strip out most of the '#ifndef(FOO)' >> probes for the HW capability bit macros in vm_version_solaris_sparc.cpp. >> They are now redundant as the macros are are in the system header files >> from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related >> ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and >> AV2_SPARC_SPARC5. For those I've left the macro probes in place. > > Most likely people will try to run JDK 9 on Solaris 10. Or in some kind > of VM environment which may not have Solaris 11.1 headers. We have a lot > such cases before that is why those macros were added. run or build? running should not be a problem. Building on S10 without a devkit has not worked for a while AFAIK. David > "JDK 9 Platform Support" list only Solaris 11.x and 12.x. May be it is > fine but original code would cover more running cases. > > Sorry for rumbling. > > Regards, > Vladimir > > On 10/18/16 7:01 PM, David Holmes wrote: >> Pushed. >> >> David >> >> On 11/10/2016 11:12 AM, David Holmes wrote: >>> Ok. I will sponsor this once hs is open again. >>> >>> Thanks, >>> David >>> >>> On 6/10/2016 10:10 PM, Alan Burlison wrote: >>>> On 04/10/2016 19:37, Alan Burlison wrote: >>>> >>>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not >>>>>> already being included. >>>>> >>>>> Cool, I'll pop that in instead - thanks! >>>> >>>> Done, webrev updated, jprt hotspot testset is clean.
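The run-versus-build distinction raised in this message can be illustrated with a minimal C++ sketch of an '#ifndef' probe. It is not taken from vm_version_solaris_sparc.cpp: `DEMO_AV_FEATURE_X`, its value and `has_feature` are hypothetical, and the real capability macros and their values come from the Solaris system headers.

```cpp
#include <cassert>
#include <cstdint>

// Build-time probe: if the headers used for compilation do not define the
// macro, supply a fallback. DEMO_AV_FEATURE_X and 0x0100u are placeholders.
#ifndef DEMO_AV_FEATURE_X
#define DEMO_AV_FEATURE_X 0x0100u
#endif

// Run-time check: tests whether the capability word reported by the system
// has the feature bit set. This part does not depend on which headers the
// binary was built against.
inline bool has_feature(std::uint32_t hwcap, std::uint32_t feature_bit) {
  return (hwcap & feature_bit) != 0;
}
```

The fallback definition only matters while compiling against headers that lack the macro; a binary that is already built carries the constant with it, which would be consistent with the probes being a build concern rather than a run concern.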
>>>> From vladimir.kozlov at oracle.com Thu Oct 20 02:56:58 2016 From: vladimir.kozlov at oracle.com (Vladimir Kozlov) Date: Wed, 19 Oct 2016 19:56:58 -0700 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com> <9619324d-5360-b3f0-684a-e7f1069656db@oracle.com> Message-ID: On 10/19/16 5:17 PM, David Holmes wrote: > On 20/10/2016 3:43 AM, Vladimir Kozlov wrote: >> I missed all this review fun :) >> >> Thank you, Alan, for cleaning this up. >> >> The only concern I have is removal of conditional macros. >> >>> I've also taken the opportunity to strip out most of the '#ifndef(FOO)' >>> probes for the HW capability bit macros in vm_version_solaris_sparc.cpp. >>> They are now redundant as the macros are are in the system header files >>> from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related >>> ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and >>> AV2_SPARC_SPARC5. For those I've left the macro probes in place. >> >> Most likely people will try to run JDK 9 on Solaris 10. Or in some kind >> of VM environment which may not have Solaris 11.1 headers. We have a lot >> such cases before that is why those macros were added. > > run or build? running should not be a problem. Building on S10 without a > devkit has not worked for a while AFAIK. Ooh yes, you are right - it was build problem. Those macros were for time when we did not use devkit yet. Everything is good then. 
Thanks, Vladimir > > David > >> "JDK 9 Platform Support" list only Solaris 11.x and 12.x. May be it is >> fine but original code would cover more running cases. >> >> Sorry for rumbling. >> >> Regards, >> Vladimir >> >> On 10/18/16 7:01 PM, David Holmes wrote: >>> Pushed. >>> >>> David >>> >>> On 11/10/2016 11:12 AM, David Holmes wrote: >>>> Ok. I will sponsor this once hs is open again. >>>> >>>> Thanks, >>>> David >>>> >>>> On 6/10/2016 10:10 PM, Alan Burlison wrote: >>>>> On 04/10/2016 19:37, Alan Burlison wrote: >>>>> >>>>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not >>>>>>> already being included. >>>>>> >>>>>> Cool, I'll pop that in instead - thanks! >>>>> >>>>> Done, webrev updated, jprt hotspot testset is clean. >>>>> From thomas.stuefe at gmail.com Thu Oct 20 07:22:36 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 20 Oct 2016 09:22:36 +0200 Subject: "os" - make this a real namespace? In-Reply-To: References: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com> <4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com> Message-ID: On Thu, Oct 20, 2016 at 1:59 AM, David Holmes wrote: > On 19/10/2016 11:54 PM, Thomas Stüfe wrote: > >> Hi David, >> >> my intent was not to attack the existing code, but to ask about the >> original design intentions and possibly come up with ideas to improve >> it. See my further answers inline. >> > > Sure - no problem. It is hard to understand the intent of the design when > the remaining code doesn't even reflect the original design intentions (nor > is there anyone around involved in that!). And there is always scope to > redo this as part of the "big OS code cleanup". > > The "exposure" concern is not one that I've heard expressed previously in > relation to the hotspot code. > > Cheers, > David > > When there is time, I may just whip up an example patch. This may be simpler than talking.
I understand this would be something for java10, so it has to wait until there is at least a repo. Kind Regards, Thomas > On Wed, Oct 19, 2016 at 2:19 PM, David Holmes > > wrote: >> >> Hi Thomas, >> >> On 19/10/2016 7:07 PM, Thomas Stüfe wrote: >> >> Hi David! >> >> On Wed, Oct 19, 2016 at 9:02 AM, David Holmes >> >> > >> >> wrote: >> >> Hi Thomas, >> >> On 19/10/2016 4:10 PM, Thomas Stüfe wrote: >> >> Hi all, >> >> a small question. >> >> I sometimes stumble over the fact that "os" is a class, >> not a >> namespace. >> >> >> ?? AFAIK everything in hotspot is a class not a namespace - >> we don't >> use "namespaces". >> >> >> I meant that it is used as one would use a C++ namespace, not a >> class. >> As far as I see any class derived from AllStatic is actually a >> namespace >> in the sense that it serves as bracket for a number of related >> static >> functions. >> >> >> Okay ... if that is what you mean by a namespace ... I always >> thought namespaces were like packages, a level up from classes. >> >> >> I always used namespace as a common scope for declarations which belong >> together, be that classes, global functions, variables. I always thought >> this is how class "os" is used in the hotspot. >> >> I also use C++ namespace to isolate coding in large projects I do not >> own but where my symbols are, for technical reasons, visible anywhere >> but I want to avoid name clashes. A typical porter headache. E.g. we >> have a namespace "sap" in our coding, just to keep our stuff separate >> from global symbols. >> >> >> >> And that we include the platform dependent additions >> into the >> middle of this class. >> >> >> Build-time specialization. It allows for the os API to >> actually be >> different on different platforms, as opposed to just being >> implemented differently. >> >> >> The "os" API is a shared, platform independent API, so there >> should be >> no difference between platforms.
There should be no need for it to >> export platform specifics - its whole intent is to hide those >> specifics. >> >> >> But that is your current design perspective of what the os API >> should be. What it is is something with a very long history and >> which has had to accommodate different things over time. At one >> point a lot of the JDK native code would call into the VM for >> functionality that is now directly implemented in JDK native code. >> There's a lot of historical baggage here. >> >> >> Looking into the various os_.hpp files, I see: >> 1) things where the declaration does differ between oses and >> therefore >> they cannot be called in shared code without #ifdefs (e.g. all >> >> subclasses). >> 2)Things where the declaration is shared but the implementation >> differs. >> Again, two cases: >> a) Either implementation is not time critical. In that case >> the >> declaration should live in os.hpp and the implementation should >> live in >> some platform dependent C++ file. >> b) Or implementation is time critical and must be inline, in >> which >> case a separate platform dependent header would be needed. >> >> For (1) and (2b), C++ namespaces would be more convenient. Now, >> you are >> forced to include the platform specific file into the class os{} >> declaration, because there can just be one: >> >> os.hpp >> class os { >> ... >> #include >> >> }; >> >> . With a namespace, you can add functions to the namespace in >> various >> disjunct places and hence could write: >> >> os.hpp >> namespace os { ... functions ... } >> >> os_xxx.hpp >> namespace os { ... functions ... } >> >> which would be more natural and >> >> >> Yes I can see that as an alternative way to expand the os API. >> Though I still prefer to group functionality in a class. >> >> >> I would argue that the advantage of namespaces here is that the special >> handling of platform specific headers is not needed anymore. 
Now, when >> reading any os_/ header, I need to keep in mind that the >> content of this header gets inserted into the middle of a class >> definition. That is just rather exotic and unexpected and tripped me >> over a few times already. >> >> >> >> This has a number of repercussions, like not being able to >> include the >> platform dependent files (os__) directly, not >> being able to >> >> >> I'd call that a feature - they are not intended to be >> standalone APIs. >> >> >> There are a number of useful "os" APIs which I would sometimes like to >> use without the bagage of including the whole os.hpp header. For >> instance, os::malloc(). Normally, I would forward declare them, but this >> is not possible for class functions. >> >> >> >> >> Right now os_.hpp exports the os:: api, which one may >> want to >> use separately from "os" because they expose platform dependent >> APIs >> which are conceptionally lower than the os namespace. At least >> that is >> how I always did interpret the intention behind os::. >> >> >> I wouldn't say lower - they extend the os API with platform specific >> functionality and concepts. The idea is that specific code that >> wants to use OS facilities that are specific to that access >> them through the os:: class. >> >> >> >> But actually, because the "Aix" class is part of "os", cannot be >> used >> separately and is exposed to the whole of the VM, I always avoided >> putting anything os::Aix if it could be helped. Hence, for AIX, >> we added >> porting_aix.hpp for AIX specific functions which are not to be >> used >> outside os/aix/vm. Or mostly just plain left functions to be >> file scope >> static inside os_aix.cpp. So, os::Aix was pretty useless for me >> as a porter. >> >> >> Seems you made a decision that the os::AIX class didn't meet your >> ideas as to how platform specifics should be handled and so went >> with an alternative design. Wouldn't that make it "useless" because >> you chose not to use it? 
>> >> >> When the AIX port started, the os interface was not well documented. Nor >> was there anyone I could ask because OpenJDK did not yet exist. So I had >> to deduce the intent of the original authors from the code and try to >> fill it with life as best as possible. >> >> >> >> forward declare functions from the "os" namespace (e.g. >> os::malloc) etc. I >> also cannot split implementations from "os" functions to >> different >> implementation files without problems. >> >> It seems to me all compiler nowadays support namespaces, >> would >> it not make sense to convert "os" to a real namespace? >> >> >> Not being a C++ aficionado I'm not sure exactly what that >> would >> entail - as far as I know we don't use C++ namespaces >> anywhere in >> hotspot. >> >> >> We start using all kinds of modern C++ features. Templates pop >> up all >> over the place. Namespaces in contrast are an old and easily >> understood >> feature. We already use it inside our own port. >> >> >> While we are at it, what is the reason for the "" sub >> classes? e.g. >> os::Bsd, os::Aix etc? It makes integrating patches >> between platforms >> difficult and, to me, does not seem to serve any clear >> purpose. >> >> >> Must admit this arrangement has also had me confused at >> times. I >> think it is way to add a per-OS helper class for the main os >> API >> implementation. >> >> If the purpose is to be a very low wrapper around OS >> particularities, it >> makes no sense to have them in the "os" namespace and to >> make >> them visible >> to the shared sections of the VM. E.g. there should be >> no reason >> to access >> "os::Bsd" functions from outside os/bsd/vm, or to access >> "os::Posix" >> functions outside implementations specific for Posix >> platforms. >> >> >> Not sure how you make, for example os::BSD accessible from all >> classes in os/bsd/vm yet not be visible anywhere else ?? >> >> >> I think there is no real reason for os::Bsd to exist at all. 
>> Either we >> have shared functions with platform dependent implementation, >> then they >> should be declared in "os". Or they are completely platform >> specific, >> then they can be moved to a platform specific header outside of >> "os" >> like we did with porting_aix.hpp. >> >> >> Sure you could do that. But 20 years ago that wasn't how things were >> designed and we have what we have today. As I said a lot of baggage. >> >> Personally I find the nesting of the concrete os API quite natural: >> os::win32 to me is better than unrelated os and win32 classes or >> namespaces. >> >> >> I think there is code structure, and then there is exposure. Both are >> separate things but currently interwoven in the hotspot. >> >> Putting Win32 specifics into os::win32 makes sense. But there is no >> reason to expose os::win32 to shared parts of the VM, considering how >> much the hotspot programmers at Oracle dislike platform specific #ifdefs >> in shared code. >> >> I understand that this is historical? Before namespaces, you had to >> define a class as common bracket, and a class definition must be >> complete to be valid, so os:: is automatically visible everywhere. >> But with namespaces this could be disentangled. We could have a globally >> visible "os" namespace with platform independent shared functions, as >> well as an "os::win32" namespace which is only visible in windows >> specific implementation files. >> >> Kind Regards, Thomas >> >> >> Cheers, >> David >> ----- >> >> >> Plus it also needs to potentially be visible from >> os_cpu/bsd_XXX/vm. >> >> >> There is a lot of cleanup in this area slated for the future - >> hopefully Java 10. POSIX refactoring etc. >> >> >> Sure! 
>> >> Kind Regards, Thomas >> >> >> Cheers, >> David >> >> >> Thanks, and Kind Regards, Thomas >> >> >> >> From thomas.stuefe at gmail.com Thu Oct 20 08:00:04 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 20 Oct 2016 10:00:04 +0200 Subject: Question about WatcherThreadCrashProtection Message-ID: Hi all, a small question. WatcherThreadCrashProtection is a small stack object wrapping setjmp/longjmp. But I cannot find any place where WatcherThreadCrashProtection is actually used. Am I overlooking something or is this dead code? Thank you, Thomas From staffan.larsen at oracle.com Thu Oct 20 08:04:23 2016 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Thu, 20 Oct 2016 10:04:23 +0200 Subject: Question about WatcherThreadCrashProtection In-Reply-To: References: Message-ID: <2C8F7BF2-D608-4C5D-87ED-C8B7C6780651@oracle.com> It is used in some closed code (JFR) that you aren't seeing. ;) /Staffan > On 20 Oct 2016, at 10:00, Thomas Stüfe wrote: > > Hi all, > > a small question. > > WatcherThreadCrashProtection is a small stack object wrapping > setjmp/longjmp. But I cannot find any place where > WatcherThreadCrashProtection is actually used. Am I overlooking something > or is this dead code? > > Thank you, > > Thomas From thomas.stuefe at gmail.com Thu Oct 20 08:07:41 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 20 Oct 2016 10:07:41 +0200 Subject: Question about WatcherThreadCrashProtection In-Reply-To: <2C8F7BF2-D608-4C5D-87ED-C8B7C6780651@oracle.com> References: <2C8F7BF2-D608-4C5D-87ED-C8B7C6780651@oracle.com> Message-ID: :) Ok, thank you! On Thu, Oct 20, 2016 at 10:04 AM, Staffan Larsen wrote: > It is used in some closed code (JFR) that you aren't seeing. ;) > > /Staffan > > > On 20 Oct 2016, at 10:00, Thomas Stüfe wrote: > > > > Hi all, > > > > a small question. > > > > WatcherThreadCrashProtection is a small stack object wrapping > > setjmp/longjmp.
But I cannot find any place where > > WatcherThreadCrashProtection is actually used. Am I overlooking something > > or is this dead code? > > > > Thank you, > > > > Thomas > > From rickard.backman at oracle.com Thu Oct 20 08:27:03 2016 From: rickard.backman at oracle.com (Rickard Bäckman) Date: Thu, 20 Oct 2016 10:27:03 +0200 Subject: "os" - make this a real namespace? In-Reply-To: References: Message-ID: <20161020082703.GA29006@rbackman> Hi Thomas, I tried something like that a couple of years ago and still think it is a good idea. Link to the discussion and patches: http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html /R On 10/19, Thomas Stüfe wrote: > Hi all, > > a small question. > > I sometimes stumble over the fact that "os" is a class, not a namespace. > And that we include the platform dependent additions into the middle of > this class. > > This has a number of repercussions, like not being able to include the > platform dependent files (os__) directly, not being able to > forward declare functions from the "os" namespace (e.g. os::malloc) etc. I > also cannot split implementations from "os" functions to different > implementation files without problems. > > It seems to me all compiler nowadays support namespaces, would it not make > sense to convert "os" to a real namespace? > > While we are at it, what is the reason for the "" sub classes? e.g. > os::Bsd, os::Aix etc? It makes integrating patches between platforms > difficult and, to me, does not seem to serve any clear purpose. > > If the purpose is to be a very low wrapper around OS particularities, it > makes no sense to have them in the "os" namespace and to make them visible > to the shared sections of the VM. E.g. there should be no reason to access > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" > functions outside implementations specific for Posix platforms.
> > Thanks, and Kind Regards, Thomas From martin.doerr at sap.com Thu Oct 20 08:58:24 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 20 Oct 2016 08:58:24 +0000 Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 In-Reply-To: <7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com> References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap> <7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com> Message-ID: Hi Coleen, thank you very much for reviewing my PPC change. We had originally spent a lot of effort to get the template interpreter fast. I think startup performance is still important. A large amount of less optimized changes will make it slower over time. That's why we have reduced reloading constMethod in the PPC implementation. I think this would be good for other platforms as well. Maybe we should improve them in 10. Best regards, Martin -----Original Message----- From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen Phillimore Sent: Tuesday, 18 October 2016 23:56 To: hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 This seems good. I think it's a shame to change load_mirror() to load_mirror_from_const_method() though because there's load_mirror() with the same parameters on all the other platforms and it makes platform development a little easier. But that's up to you because you can generate shorter sequences. Coleen On 10/17/16 12:38 PM, Doerr, Martin wrote: > Hi, > > I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9: > > 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true.
I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed). > > 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev. > > Webrev is here: > http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/ > > Please review. > > Thanks and best regards, > Martin > From david.holmes at oracle.com Thu Oct 20 11:37:03 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 20 Oct 2016 21:37:03 +1000 Subject: "os" - make this a real namespace? In-Reply-To: <20161020082703.GA29006@rbackman> References: <20161020082703.GA29006@rbackman> Message-ID: <2e672d38-14e9-f5d3-9a26-0e4839ae98a4@oracle.com> On 20/10/2016 6:27 PM, Rickard Bäckman wrote: > Hi Thomas, > > I tried something like that a couple of years ago and still think it is > a good idea. > > Link to the discussion and patches: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html Yeah but noone else seemed to like your os::pd approach :) Cheers, David > /R > > On 10/19, Thomas Stüfe wrote: >> Hi all, >> >> a small question. >> >> I sometimes stumble over the fact that "os" is a class, not a namespace. >> And that we include the platform dependent additions into the middle of >> this class. >> >> This has a number of repercussions, like not being able to include the >> platform dependent files (os__) directly, not being able to >> forward declare functions from the "os" namespace (e.g. os::malloc) etc. I >> also cannot split implementations from "os" functions to different >> implementation files without problems. >> >> It seems to me all compiler nowadays support namespaces, would it not make >> sense to convert "os" to a real namespace?
>> >> While we are at it, what is the reason for the "" sub classes? e.g. >> os::Bsd, os::Aix etc? It makes integrating patches between platforms >> difficult and, to me, does not seem to serve any clear purpose. >> >> If the purpose is to be a very low wrapper around OS particularities, it >> makes no sense to have them in the "os" namespace and to make them visible >> to the shared sections of the VM. E.g. there should be no reason to access >> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" >> functions outside implementations specific for Posix platforms. >> >> Thanks, and Kind Regards, Thomas From david.holmes at oracle.com Thu Oct 20 13:23:31 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 20 Oct 2016 23:23:31 +1000 Subject: "os" - make this a real namespace? In-Reply-To: <2e672d38-14e9-f5d3-9a26-0e4839ae98a4@oracle.com> References: <20161020082703.GA29006@rbackman> <2e672d38-14e9-f5d3-9a26-0e4839ae98a4@oracle.com> Message-ID: <41dfc585-c135-ecfe-5ced-81de653a166c@oracle.com> On 20/10/2016 9:37 PM, David Holmes wrote: > On 20/10/2016 6:27 PM, Rickard Bäckman wrote: >> Hi Thomas, >> >> I tried something like that a couple of years ago and still think it is >> a good idea. >> >> Link to the discussion and patches: >> >> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html > > Yeah but no one else seemed to like your os::pd approach :) Sorry that was a bit too tongue in cheek. David > Cheers, > David > >> /R >> >> On 10/19, Thomas Stüfe wrote: >>> Hi all, >>> >>> a small question. >>> >>> I sometimes stumble over the fact that "os" is a class, not a namespace. >>> And that we include the platform dependent additions into the middle of >>> this class. >>> >>> This has a number of repercussions, like not being able to include the >>> platform dependent files (os__) directly, not being able to >>> forward declare functions from the "os" namespace (e.g. os::malloc) >>> etc.
I >>> also cannot split implementations from "os" functions to different >>> implementation files without problems. >>> >>> It seems to me all compilers nowadays support namespaces, would it not >>> make >>> sense to convert "os" to a real namespace? >>> >>> While we are at it, what is the reason for the "" sub classes? e.g. >>> os::Bsd, os::Aix etc? It makes integrating patches between platforms >>> difficult and, to me, does not seem to serve any clear purpose. >>> >>> If the purpose is to be a very low wrapper around OS particularities, it >>> makes no sense to have them in the "os" namespace and to make them >>> visible >>> to the shared sections of the VM. E.g. there should be no reason to >>> access >>> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" >>> functions outside implementations specific for Posix platforms. >>> >>> Thanks, and Kind Regards, Thomas From thomas.stuefe at gmail.com Thu Oct 20 13:36:27 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 20 Oct 2016 15:36:27 +0200 Subject: "os" - make this a real namespace? In-Reply-To: <20161020082703.GA29006@rbackman> References: <20161020082703.GA29006@rbackman> Message-ID: Hi Rickard, I definitely like some of the aspects of that patch. But like others I'm not a big fan of renaming the files - I like the current naming scheme _ just fine, I am used to it and it helps me in many places. I work both in IDEs (CDT) and on the command line with vi and grep, and having the platform in the file name makes it easier to work with multiple platforms. I am also quite sure that having different versions of a file with the same name in some locations would bite us at some places. Kind Regards, Thomas On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman < rickard.backman at oracle.com> wrote: > Hi Thomas, > > I tried something like that a couple of years ago and still think it is > a good idea.
> > Link to the discussion and patches: > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html > > /R > > On 10/19, Thomas Stüfe wrote: > > Hi all, > > > > a small question. > > > > I sometimes stumble over the fact that "os" is a class, not a namespace. > > And that we include the platform dependent additions into the middle of > > this class. > > > > This has a number of repercussions, like not being able to include the > > platform dependent files (os__) directly, not being able to > > forward declare functions from the "os" namespace (e.g. os::malloc) etc. > I > > also cannot split implementations from "os" functions to different > > implementation files without problems. > > > > It seems to me all compilers nowadays support namespaces, would it not > make > > sense to convert "os" to a real namespace? > > > > While we are at it, what is the reason for the "" sub classes? e.g. > > os::Bsd, os::Aix etc? It makes integrating patches between platforms > > difficult and, to me, does not seem to serve any clear purpose. > > > > If the purpose is to be a very low wrapper around OS particularities, it > > makes no sense to have them in the "os" namespace and to make them > visible > > to the shared sections of the VM. E.g. there should be no reason to > access > > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" > > functions outside implementations specific for Posix platforms. > > > > Thanks, and Kind Regards, Thomas > From daniel.daugherty at oracle.com Thu Oct 20 14:22:29 2016 From: daniel.daugherty at oracle.com (Daniel D.
Daugherty) Date: Thu, 20 Oct 2016 08:22:29 -0600 Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2) cleanup In-Reply-To: References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com> <66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com> <5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com> <7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com> <3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com> <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com> <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com> <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com> <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com> <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com> <9619324d-5360-b3f0-684a-e7f1069656db@oracle.com> Message-ID: <9f2ac85c-8373-4f7f-35f7-1872bccf2cc0@oracle.com> On 10/19/16 8:56 PM, Vladimir Kozlov wrote: > On 10/19/16 5:17 PM, David Holmes wrote: >> On 20/10/2016 3:43 AM, Vladimir Kozlov wrote: >>> I missed all this review fun :) >>> >>> Thank you, Alan, for cleaning this up. >>> >>> The only concern I have is removal of conditional macros. >>> >>>> I've also taken the opportunity to strip out most of the >>>> '#ifndef(FOO)' >>>> probes for the HW capability bit macros in >>>> vm_version_solaris_sparc.cpp. >>>> They are now redundant as the macros are in the system header >>>> files >>>> from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related >>>> ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and >>>> AV2_SPARC_SPARC5. For those I've left the macro probes in place. >>> >>> Most likely people will try to run JDK 9 on Solaris 10. Or in some kind >>> of VM environment which may not have Solaris 11.1 headers. We had a >>> lot >>> of such cases before; that is why those macros were added. >> >> run or build? running should not be a problem. Building on S10 without a >> devkit has not worked for a while AFAIK. > > Ooh yes, you are right - it was a build problem. Those macros were for > the time when we did not use devkit yet. Just a clarification.
You cannot build on S10 anymore either with or without a devkit. This is why I had to migrate my big Solaris X64 server from Solaris 10u11 -> Solaris 11.2 SRU5.5. Dan > > Everything is good then. > > Thanks, > Vladimir > >> >> David >> >>> "JDK 9 Platform Support" lists only Solaris 11.x and 12.x. Maybe it is >>> fine but original code would cover more running cases. >>> >>> Sorry for rambling. >>> >>> Regards, >>> Vladimir >>> >>> On 10/18/16 7:01 PM, David Holmes wrote: >>>> Pushed. >>>> >>>> David >>>> >>>> On 11/10/2016 11:12 AM, David Holmes wrote: >>>>> Ok. I will sponsor this once hs is open again. >>>>> >>>>> Thanks, >>>>> David >>>>> >>>>> On 6/10/2016 10:10 PM, Alan Burlison wrote: >>>>>> On 04/10/2016 19:37, Alan Burlison wrote: >>>>>> >>>>>>>> It's in globalDefinitions.hpp, on the off chance that's somehow >>>>>>>> not >>>>>>>> already being included. >>>>>>> >>>>>>> Cool, I'll pop that in instead - thanks! >>>>>> >>>>>> Done, webrev updated, jprt hotspot testset is clean. >>>>>> From chris.plummer at oracle.com Thu Oct 20 20:28:09 2016 From: chris.plummer at oracle.com (Chris Plummer) Date: Thu, 20 Oct 2016 13:28:09 -0700 Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name with starting with Java 9 b133 Message-ID: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> Hello, Please review the following: http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/ https://bugs.openjdk.java.net/browse/JDK-8166679 The fix is to partially undo the changes for JDK-8159284. There are two places where the fix for JDK-8159284 added an extra check of the validity of the entry frame, but really only the first one is appropriate since for the second one we are not in an entry frame. More details can be found near the end of the bug comments. Note I did a straight patch of the old version of the code. It could probably use some formatting and comment cleanup.
I decided not to clean it up to make it easy to compare the current code with the original. I'll clean it up if you feel it would be best to. Tested by running KitchenSink more times than I can count, since that's where JDK-8159284 turned up. However, that's not proving much since I could not reproduce JDK-8159284 even without its fix in place (it also couldn't be reproduced at the time JDK-8159284 was being investigated and fixed). For this reason I can't be 100% sure that JDK-8159284 is not being re-introduced with my changes. Also tested by running a very large set of tests through RBT, close to what we do for PIT testing, minus product builds and a few tests that take a long time to run. Lastly, I also tested with the test case in the CR to make sure it now passes. Unfortunately it's not possible to add the test case as a jtreg test since it requires the installation of the Oracle Studio tools. thanks, Chris From david.holmes at oracle.com Fri Oct 21 03:09:17 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 21 Oct 2016 13:09:17 +1000 Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name with starting with Java 9 b133 In-Reply-To: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> Message-ID: <8e77681f-023a-5725-6361-c357edcdd19b@oracle.com> Hi Chris, On 21/10/2016 6:28 AM, Chris Plummer wrote: > Hello, > > Please review the following: > > http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/ > https://bugs.openjdk.java.net/browse/JDK-8166679 > > The fix is to partially undo the changes for JDK-8159284. There are two > places where the fix for JDK-8159284 added an extra check of the > validity of the entry frame, but really only the first one is > appropriate since for the second one we are not in an entry frame. More > details can be found near the end of the bug comments. This all seems reasonable. Addressing the regression is important.
If this exposes a continuing issue with the reverted code then we can look at this further. The lack of reproducibility makes this a difficult area to work in. Thanks, David > Note I did a straight patch of the old version of the code. It could > probably use some formatting and comment cleanup. I decided not to clean > it up to make it easy to compare the current code with the original. > I'll clean it up if you feel it would be best to. > > Tested by running KitchenSink more times than I can count, since that's > where JDK-8159284 turned up. However, that's not proving much since I > could not reproduce JDK-8159284 even without its fix in place (it also > couldn't be reproduced at the time JDK-8159284 was being > investigated and fixed). For this reason I can't be 100% sure that > JDK-8159284 is not being re-introduced with my changes. > > Also tested by running a very large set of tests through RBT, close to > what we do for PIT testing, minus product builds and a few tests that > take a long time to run. > > Lastly, I also tested with the test case in the CR to make sure it now > passes. Unfortunately it's not possible to add the test case as a jtreg > test since it requires the installation of the Oracle Studio tools. > > thanks, > > Chris From rickard.backman at oracle.com Fri Oct 21 06:02:28 2016 From: rickard.backman at oracle.com (Rickard Bäckman) Date: Fri, 21 Oct 2016 08:02:28 +0200 Subject: "os" - make this a real namespace? In-Reply-To: References: <20161020082703.GA29006@rbackman> Message-ID: <20161021060228.GB29006@rbackman> Yes the naming was just one try. There were multiple other ways of doing it. Other possibilities were keeping it as is, have one file named os_thread.hpp per platform that includes the os_thread_x86.hpp and just have the #include "os_thread.hpp" in files that need it... Macros *shudder*. /R On 10/20, Thomas Stüfe wrote: > Hi Rickard, > > I definitely like some of the aspects of that patch.
But like others I'm > not a big fan of renaming the files - I like the current naming scheme > _ just fine, I am used to it and it helps me in many places. I > work both in IDEs (CDT) and on the command line with vi and grep, and > having the platform in the file name makes it easier to work with > multiple platforms. I am also quite sure that having different versions of > a file with the same name in some locations would bite us at some places. > > Kind Regards, Thomas > > > On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman < > rickard.backman at oracle.com> wrote: > > > Hi Thomas, > > > > I tried something like that a couple of years ago and still think it is > > a good idea. > > > > Link to the discussion and patches: > > > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html > > > > /R > > > > On 10/19, Thomas Stüfe wrote: > > > Hi all, > > > > > > a small question. > > > > > > I sometimes stumble over the fact that "os" is a class, not a namespace. > > > And that we include the platform dependent additions into the middle of > > > this class. > > > > > > This has a number of repercussions, like not being able to include the > > > platform dependent files (os__) directly, not being able to > > > forward declare functions from the "os" namespace (e.g. os::malloc) etc. > > I > > > also cannot split implementations from "os" functions to different > > > implementation files without problems. > > > > > > It seems to me all compilers nowadays support namespaces, would it not > > make > > > sense to convert "os" to a real namespace? > > > > > > While we are at it, what is the reason for the "" sub classes? e.g. > > > os::Bsd, os::Aix etc? It makes integrating patches between platforms > > > difficult and, to me, does not seem to serve any clear purpose.
> > > > > > If the purpose is to be a very low wrapper around OS particularities, it > > > makes no sense to have them in the "os" namespace and to make them > > visible > > > to the shared sections of the VM. E.g. there should be no reason to > > access > > > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" > > > functions outside implementations specific for Posix platforms. > > > > > > Thanks, and Kind Regards, Thomas > > From david.holmes at oracle.com Fri Oct 21 06:12:29 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 21 Oct 2016 16:12:29 +1000 Subject: "os" - make this a real namespace? In-Reply-To: <20161021060228.GB29006@rbackman> References: <20161020082703.GA29006@rbackman> <20161021060228.GB29006@rbackman> Message-ID: On 21/10/2016 4:02 PM, Rickard Bäckman wrote: > Yes the naming was just one try. There were multiple other ways of doing > it. Other possibilities were keeping it as is, have one file named > os_thread.hpp per platform that includes the os_thread_x86.hpp and just > have the #include "os_thread.hpp" in files that need it... Macros > *shudder*. Note that we have already abstracted platform specific includes into macros. eg: #include OS_CPU_HEADER(os) #include OS_HEADER(os) in os.hpp. David > /R > > On 10/20, Thomas Stüfe wrote: >> Hi Rickard, >> >> I definitely like some of the aspects of that patch. But like others I'm >> not a big fan of renaming the files - I like the current naming scheme >> _ just fine, I am used to it and it helps me in many places. I >> work both in IDEs (CDT) and on the command line with vi and grep, and >> having the platform in the file name makes it easier to work with >> multiple platforms. I am also quite sure that having different versions of >> a file with the same name in some locations would bite us at some places.
>> >> Kind Regards, Thomas >> >> >> On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman < >> rickard.backman at oracle.com> wrote: >> >>> Hi Thomas, >>> >>> I tried something like that a couple of years ago and still think it is >>> a good idea. >>> >>> Link to the discussion and patches: >>> >>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html >>> >>> /R >>> >>> On 10/19, Thomas Stüfe wrote: >>>> Hi all, >>>> >>>> a small question. >>>> >>>> I sometimes stumble over the fact that "os" is a class, not a namespace. >>>> And that we include the platform dependent additions into the middle of >>>> this class. >>>> >>>> This has a number of repercussions, like not being able to include the >>>> platform dependent files (os__) directly, not being able to >>>> forward declare functions from the "os" namespace (e.g. os::malloc) etc. >>> I >>>> also cannot split implementations from "os" functions to different >>>> implementation files without problems. >>>> >>>> It seems to me all compilers nowadays support namespaces, would it not >>> make >>>> sense to convert "os" to a real namespace? >>>> >>>> While we are at it, what is the reason for the "" sub classes? e.g. >>>> os::Bsd, os::Aix etc? It makes integrating patches between platforms >>>> difficult and, to me, does not seem to serve any clear purpose. >>>> >>>> If the purpose is to be a very low wrapper around OS particularities, it >>>> makes no sense to have them in the "os" namespace and to make them >>> visible >>>> to the shared sections of the VM. E.g. there should be no reason to >>> access >>>> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" >>>> functions outside implementations specific for Posix platforms. >>>> >>>> Thanks, and Kind Regards, Thomas >>> From goetz.lindenmaier at sap.com Fri Oct 21 06:29:13 2016 From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz) Date: Fri, 21 Oct 2016 06:29:13 +0000 Subject: "os" - make this a real namespace?
In-Reply-To: <20161021060228.GB29006@rbackman> References: <20161020082703.GA29006@rbackman> <20161021060228.GB29006@rbackman> Message-ID: Hi, remember that the ugly include cascades are gone, as there is now the macro #include OS_CPU_HEADER(thread) including files like thread_linux_x86.hpp. Therefore I think the need to rename files is no longer that important. Best regards, Goetz. > -----Original Message----- > From: hotspot-runtime-dev [mailto:hotspot-runtime-dev- > bounces at openjdk.java.net] On Behalf Of Rickard Bäckman > Sent: Friday, 21 October 2016 08:02 > To: Thomas Stüfe > Cc: hotspot-runtime-dev at openjdk.java.net > Subject: Re: "os" - make this a real namespace? > > Yes the naming was just one try. There were multiple other ways of doing > it. Other possibilities were keeping it as is, have one file named > os_thread.hpp per platform that includes the os_thread_x86.hpp and just > have the #include "os_thread.hpp" in files that need it... Macros > *shudder*. > > /R > > On 10/20, Thomas Stüfe wrote: > > Hi Rickard, > > > > I definitely like some of the aspects of that patch. But like others I'm > > not a big fan of renaming the files - I like the current naming scheme > > _ just fine, I am used to it and it helps me in many places. I > > work both in IDEs (CDT) and on the command line with vi and grep, and > > having the platform in the file name makes it easier to work with > > multiple platforms. I am also quite sure that having different versions of > > a file with the same name in some locations would bite us at some places. > > > > Kind Regards, Thomas > > > > > > On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman < > > rickard.backman at oracle.com> wrote: > > > > > Hi Thomas, > > > > > > I tried something like that a couple of years ago and still think it is > > > a good idea.
> > > > > > Link to the discussion and patches: > > > > > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013- > March/008884.html > > > > > > /R > > > > > > On 10/19, Thomas Stüfe wrote: > > > > Hi all, > > > > > > > > a small question. > > > > > > > > I sometimes stumble over the fact that "os" is a class, not a namespace. > > > > And that we include the platform dependent additions into the middle > of > > > > this class. > > > > > > > > This has a number of repercussions, like not being able to include the > > > > platform dependent files (os__) directly, not being able to > > > > forward declare functions from the "os" namespace (e.g. os::malloc) > etc. > > > I > > > > also cannot split implementations from "os" functions to different > > > > implementation files without problems. > > > > > > > > It seems to me all compilers nowadays support namespaces, would it > not > > > make > > > > sense to convert "os" to a real namespace? > > > > > > > > While we are at it, what is the reason for the "" sub classes? e.g. > > > > os::Bsd, os::Aix etc? It makes integrating patches between platforms > > > > difficult and, to me, does not seem to serve any clear purpose. > > > > > > > > If the purpose is to be a very low wrapper around OS particularities, it > > > > makes no sense to have them in the "os" namespace and to make > them > > > visible > > > > to the shared sections of the VM. E.g. there should be no reason to > > > access > > > > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix" > > > > functions outside implementations specific for Posix platforms.
> > > > > > > > Thanks, and Kind Regards, Thomas > > > From dmitry.samersoff at oracle.com Fri Oct 21 08:42:21 2016 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Fri, 21 Oct 2016 11:42:21 +0300 Subject: RFR(S): JDK-8165496 assert(_exception_caught == false) failed: _exception_caught is out of phase Message-ID: <2797c9e2-fc5d-8892-d426-d9ae9626e2b3@oracle.com> Everybody, Please review a small modification of the fix for JDK-8134434: http://cr.openjdk.java.net/~dsamersoff/JDK-8165496/webrev.04/ It's possible that we come to rethrow_C when _exception_caught is already cleared. We need not to set exception_detected in this case. -Dmitry -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. From daniel.daugherty at oracle.com Fri Oct 21 14:59:47 2016 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 21 Oct 2016 08:59:47 -0600 Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name with starting with Java 9 b133 In-Reply-To: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> Message-ID: On 10/20/16 2:28 PM, Chris Plummer wrote: > Hello, > > Please review the following: > > http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/ src/cpu/aarch64/vm/frame_aarch64.cpp So we're in an "if (StubRoutines::returns_to_call_stub()" block and the assumption was that a frame that returns to a call stub must be an entry frame. Hence the use of is_entry_frame_valid(). However, your investigation revealed that you can be in an interpreter frame that returns to a call stub here. That sounds both familiar and right :-) L209: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp()); nit: please remove extra blank here: "( jcw" I like the new JavaCallWrapper sanity check. I never thought of that when I worked on AsyncGetCallTrace().
src/cpu/sparc/vm/frame_sparc.cpp old L281: if (sender.is_entry_frame()) { old L282: return sender.is_entry_frame_valid(thread); old L283: } I don't understand this one. Why isn't is_entry_frame_valid() correct here? You are in an "if (sender.is_entry_frame())" block. I can see wanting to add the JavaCallWrapper sanity check as an additional check. If you do that: L286 bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp); nit: please remove extra blank here: "( jcw" src/cpu/x86/vm/frame_x86.cpp Again we're in an "if (StubRoutines::returns_to_call_stub()" block so I see why is_entry_frame_valid() is not the right call. L208: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp()); nit: please remove extra blank here: "( jcw" OK so I understand the AARCH64 and X86 changes. I don't quite understand the SPARC change... but I can be convinced otherwise. If you fix the nits, I don't need to see a new webrev. Dan > https://bugs.openjdk.java.net/browse/JDK-8166679 > > The fix is to partially undo the changes for JDK-8159284. There are > two places where the fix for JDK-8159284 added an extra check of the > validity of the entry frame, but really only the first one is > appropriate since for the second one we are not in an entry frame. > More details can be found near the end of the bug comments. > > Note I did a straight patch of the old version of the code. It could > probably use some formatting and comment cleanup. I decided not to > clean it up to make it easy to compare the current code with the > original. I'll clean it up if you feel it would be best to. > > Tested by running KitchenSink more times than I can count, since > that's where JDK-8159284 turned up. However, that's not proving much > since I could not reproduce JDK-8159284 even without its fix in place > (it also couldn't be reproduced at the time JDK-8159284 was being > investigated and fixed).
For this reason I can't be 100% sure that > JDK-8159284 is not being re-introduced with my changes. > > Also tested by running a very large set of tests through RBT, close to > what we do for PIT testing, minus product builds and a few tests that > take a long time to run. > > Lastly, I also tested with the test case in the CR to make sure it now > passes. Unfortunately it's not possible to add the test case as a > jtreg test since it requires the installation of the Oracle Studio tools. > > thanks, > > Chris From chris.plummer at oracle.com Fri Oct 21 19:13:09 2016 From: chris.plummer at oracle.com (Chris Plummer) Date: Fri, 21 Oct 2016 12:13:09 -0700 Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name with starting with Java 9 b133 In-Reply-To: References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> Message-ID: Hi Dan, Thanks for the review. Comments inline below: On 10/21/16 7:59 AM, Daniel D. Daugherty wrote: > On 10/20/16 2:28 PM, Chris Plummer wrote: >> Hello, >> >> Please review the following: >> >> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/ > > src/cpu/aarch64/vm/frame_aarch64.cpp > So we're in an "if (StubRoutines::returns_to_call_stub()" block > and the assumption was that a frame that returns to a call stub > must be an entry frame. Hence the use of is_entry_frame_valid(). > However, your investigation revealed that you can be in an > interpreter frame that returns to a call stub here. That sounds > both familiar and right :-) > > L209: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > > (address)sender.fp()); > nit: please remove extra blank here: "( jcw" Ok. > > I like the new JavaCallWrapper sanity check. I never thought of > that when I worked on AsyncGetCallTrace(). > > src/cpu/sparc/vm/frame_sparc.cpp > old L281: if (sender.is_entry_frame()) { > old L282: return sender.is_entry_frame_valid(thread); > old L283: } > I don't understand this one.
Why isn't is_entry_frame_valid() > correct here? You are in an "if (sender.is_entry_frame())" block. I stared at this one a bit too, since the code is not quite the same as x86 and aarch64. I'm not 100% sure I got it right, so I opted to just change it to what used to be there, especially since 8159284 never turned up on sparc. I did try to go down the path of making sure that 8166679 (this CR I'm fixing) does occur on Solaris-sparc, but getting Dev Studio installed on a Solaris-sparc machine was proving difficult. Maybe I should take another stab at that. As for the similarities and differences between the sparc code and x86, for x86 before my changes we had: if (StubRoutines::returns_to_call_stub(sender_pc)) { ... frame sender(sender_sp, sender_unextended_sp, saved_fp, sender_pc); return sender.is_entry_frame_valid(thread); } And for sparc: frame sender(_SENDER_SP, younger_sp, adjusted_stack); if (sender.is_entry_frame()) { return sender.is_entry_frame_valid(thread); } So for x86 we are only adding the sender.is_entry_frame_valid() check if the "current frame" returns to a stub, but for sparc we are doing the check if the "sender frame" is an entry frame. I don't know the reason for this difference. Aren't stubs entry frames? If yes, it seems that having the check done in this way would cause this CR on sparc just like it does on x86. > > I can see wanting to add the JavaCallWrapper sanity check as > an additional check. If you do that: > > L286 bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > > sender_fp); > nit: please remove extra blank here: "( jcw" Ok. > > src/cpu/x86/vm/frame_x86.cpp > Again we're in an "if (StubRoutines::returns_to_call_stub()" block > so I see why is_entry_frame_valid() is not the right call. > > L208: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > > (address)sender.fp()); > nit: please remove extra blank here: "( jcw" Ok. > > > OK so I understand the AARCH64 and X86 changes. I don't quite > understand the SPARC change...
but I can be convinced otherwise. Ok. Let me know what you think now after a bit more explanation. I can put some more effort into trying out the test case on sparc if needed. thanks, Chris > > If you fix the nits, I don't need to see a new webrev. > > Dan > > >> https://bugs.openjdk.java.net/browse/JDK-8166679 >> >> The fix is to partially undo the changes for JDK-8159284. There are >> two places where the fix for JDK-8159284 added an extra check of the >> validity of the entry frame, but really only the first one is >> appropriate since for the second one we are not in an entry frame. >> More details can be found near the end of the bug comments. >> >> Note I did a straight patch of the old version of the code. It could >> probably use some formatting and comment cleanup. I decided not to >> clean it up to make it easy to compare the current code with the >> original. I'll clean it up if you feel it would be best to. >> >> Tested by running KitchenSink more times than I can count, since >> that's where JDK-8159284 turned up. However, that's not proving much >> since I could not reproduce JDK-8159284 even without its fix in place >> (it also couldn't be reproduced at the time JDK-8159284 was being >> investigated and fixed). For this reason I can't be 100% sure that >> JDK-8159284 is not being re-introduced with my changes. >> >> Also tested by running a very large set of tests through RBT, close to >> what we do for PIT testing, minus product builds and a few tests that >> take a long time to run. >> >> Lastly, I also tested with the test case in the CR to make sure it >> now passes. Unfortunately it's not possible to add the test case as a >> jtreg test since it requires the installation of the Oracle Studio >> tools.
>>
>> thanks,
>>
>> Chris
>

From coleen.phillimore at oracle.com Fri Oct 21 19:42:57 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Fri, 21 Oct 2016 15:42:57 -0400 Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name with starting with Java 9 b133 In-Reply-To: References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> Message-ID: <9e4afc7a-f6cc-fd4f-4935-9574169276a6@oracle.com>

Chris,

This change looks good. Thank you for the analysis and fixing the regression.

On 10/21/16 3:13 PM, Chris Plummer wrote:
> Hi Dan,
>
> Thanks for the review. Comments inline below:
>
> On 10/21/16 7:59 AM, Daniel D. Daugherty wrote:
>> On 10/20/16 2:28 PM, Chris Plummer wrote:
>>> Hello,
>>>
>>> Please review the following:
>>>
>>> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
>>
>> src/cpu/aarch64/vm/frame_aarch64.cpp
>> So we're in a "if (StubRoutines::returns_to_call_stub()" block
>> and the assumption was that a frame that returns to a call stub
>> must be an entry frame. Hence the use of is_entry_frame_valid().
>> However, your investigation revealed that you can be in an
>> interpreter frame that returns to a call stub here. That sounds
>> both familiar and right :-)
>>
>> L209: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>> nit: please remove extra blank here: "( jcw"
> Ok.
>>
>> I like the new JavaCallWrapper sanity check. I never thought of
>> that when I worked on AsyncGetCallTrace().
>>
>> src/cpu/sparc/vm/frame_sparc.cpp
>> old L281: if (sender.is_entry_frame()) {
>> old L282: return sender.is_entry_frame_valid(thread);
>> old L283: }
>> I don't understand this one. Why isn't is_entry_frame_valid()
>> correct here? You are in a "if (sender.is_entry_frame())" block.
> I stared at this one a bit too, since the code is not quite the same
> as x86 and aarch64. I'm not 100% sure I got it right, so I opted to
> just change it to what used to be there, especially since 8159284
> never turned up on sparc. I did try to go down the path of making sure
> that 8166679 (this CR I'm fixing) does occur on Solaris-sparc, but
> getting Dev Studio installed on a Solaris-sparc machine was proving
> difficult. Maybe I should take another stab at that.
>
> As for the similarities and differences between the sparc code and x86,
> for x86 before my changes we had:
>
>     if (StubRoutines::returns_to_call_stub(sender_pc)) {
>       ...
>       frame sender(sender_sp, sender_unextended_sp, saved_fp, sender_pc);
>       return sender.is_entry_frame_valid(thread);
>     }
>
> And for sparc:
>
>     frame sender(_SENDER_SP, younger_sp, adjusted_stack);
>     if (sender.is_entry_frame()) {
>       return sender.is_entry_frame_valid(thread);
>     }
>
> So for x86 we are only adding the sender.is_entry_frame_valid() check
> if the "current frame" returns to a stub, but for sparc we are doing
> the check if the "sender frame" is an entry frame. I don't know the
> reason for this difference. Aren't stubs entry frames? If yes, it seems
> that having the check done in this way would cause this CR on sparc
> just like it does on x86.

I looked at this too and decided the platforms were equivalent, only coded differently. On sparc we create a sender frame, and on x86 we look at sender_pc before creating a sender frame. And is_entry_frame is:

    inline bool frame::is_entry_frame() const {
      return StubRoutines::returns_to_call_stub(pc());
    }

Thanks,
Coleen

>>
>> I can see wanting to add the JavaCallWrapper sanity check as
>> an additional check. If you do that:
>>
>> L286 bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
>> nit: please remove extra blank here: "( jcw"
> Ok.
>>
>> src/cpu/x86/vm/frame_x86.cpp
>> Again we're in a "if (StubRoutines::returns_to_call_stub()" block
>> so I see why is_entry_frame_valid() is not the right call.
>>
>> L208: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>> nit: please remove extra blank here: "( jcw"
> Ok.
>>
>>
>> OK so I understand the AARCH64 and X86 changes. I don't quite
>> understand the SPARC change... but I can be convinced otherwise.
> Ok. Let me know what you think now after a bit more explanation. I can
> put some more effort into trying out the test case on sparc if needed.
>
> thanks,
>
> Chris
>>
>> If you fix the nits, I don't need to see a new webrev.
>>
>> Dan
>>
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8166679
>>>
>>> The fix is to partially undo the changes for JDK-8159284. There are
>>> two places where the fix for JDK-8159284 added an extra check of the
>>> validity of the entry frame, but really only the first one is
>>> appropriate since for the second one we are not in an entry frame.
>>> More details can be found near the end of the bug comments.
>>>
>>> Note I did a straight patch of the old version of the code. It could
>>> probably use some formatting and comment cleanup. I decided not to
>>> clean it up to make it easy to compare the current code with the
>>> original. I'll clean it up if you feel it would be best to.
>>>
>>> Tested by running KitchenSink more times than I can count, since
>>> that's where JDK-8159284 turned up. However, that's not proving much
>>> since I could not reproduce JDK-8159284 even without its fix in
>>> place (it also couldn't be reproduced at the time JDK-8159284 was
>>> being investigated and fixed). For this reason I can't be 100%
>>> sure that JDK-8159284 is not being re-introduced with my changes.
>>>
>>> Also tested by running a very large set of tests through RBT, close
>>> to what we do for PIT testing, minus product builds and a few tests
>>> that take a long time to run.
>>>
>>> Lastly, I also tested with the test case in the CR to make sure it
>>> now passes. Unfortunately it's not possible to add the test case as
>>> a jtreg test since it requires the installation of the Oracle Studio
>>> tools.
>>>
>>> thanks,
>>>
>>> Chris
>>
>

From daniel.daugherty at oracle.com Fri Oct 21 22:22:27 2016 From: daniel.daugherty at oracle.com (Daniel D. Daugherty) Date: Fri, 21 Oct 2016 16:22:27 -0600 Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name with starting with Java 9 b133 In-Reply-To: References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> Message-ID: <637e7454-ec9a-caa9-483c-cc7818a4ba89@oracle.com>

> Ok. Let me know what you think now after a bit more explanation.

I'm good with it. Thumbs up!

Dan

On 10/21/16 1:13 PM, Chris Plummer wrote:
> Hi Dan,
>
> Thanks for the review. Comments inline below:
>
> On 10/21/16 7:59 AM, Daniel D. Daugherty wrote:
>> On 10/20/16 2:28 PM, Chris Plummer wrote:
>>> Hello,
>>>
>>> Please review the following:
>>>
>>> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
>>
>> src/cpu/aarch64/vm/frame_aarch64.cpp
>> So we're in a "if (StubRoutines::returns_to_call_stub()" block
>> and the assumption was that a frame that returns to a call stub
>> must be an entry frame. Hence the use of is_entry_frame_valid().
>> However, your investigation revealed that you can be in an
>> interpreter frame that returns to a call stub here. That sounds
>> both familiar and right :-)
>>
>> L209: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>> nit: please remove extra blank here: "( jcw"
> Ok.
>>
>> I like the new JavaCallWrapper sanity check. I never thought of
>> that when I worked on AsyncGetCallTrace().
>>
>> src/cpu/sparc/vm/frame_sparc.cpp
>> old L281: if (sender.is_entry_frame()) {
>> old L282: return sender.is_entry_frame_valid(thread);
>> old L283: }
>> I don't understand this one. Why isn't is_entry_frame_valid()
>> correct here? You are in a "if (sender.is_entry_frame())" block.
> I stared at this one a bit too, since the code is not quite the same
> as x86 and aarch64. I'm not 100% sure I got it right, so I opted to
> just change it to what used to be there, especially since 8159284
> never turned up on sparc. I did try to go down the path of making sure
> that 8166679 (this CR I'm fixing) does occur on Solaris-sparc, but
> getting Dev Studio installed on a Solaris-sparc machine was proving
> difficult. Maybe I should take another stab at that.
>
> As for the similarities and differences between the sparc code and x86,
> for x86 before my changes we had:
>
>     if (StubRoutines::returns_to_call_stub(sender_pc)) {
>       ...
>       frame sender(sender_sp, sender_unextended_sp, saved_fp, sender_pc);
>       return sender.is_entry_frame_valid(thread);
>     }
>
> And for sparc:
>
>     frame sender(_SENDER_SP, younger_sp, adjusted_stack);
>     if (sender.is_entry_frame()) {
>       return sender.is_entry_frame_valid(thread);
>     }
>
> So for x86 we are only adding the sender.is_entry_frame_valid() check
> if the "current frame" returns to a stub, but for sparc we are doing
> the check if the "sender frame" is an entry frame. I don't know the
> reason for this difference. Aren't stubs entry frames? If yes, it seems
> that having the check done in this way would cause this CR on sparc
> just like it does on x86.
>>
>> I can see wanting to add the JavaCallWrapper sanity check as
>> an additional check. If you do that:
>>
>> L286 bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
>> nit: please remove extra blank here: "( jcw"
> Ok.
>>
>> src/cpu/x86/vm/frame_x86.cpp
>> Again we're in a "if (StubRoutines::returns_to_call_stub()" block
>> so I see why is_entry_frame_valid() is not the right call.
>>
>> L208: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>> nit: please remove extra blank here: "( jcw"
> Ok.
>>
>>
>> OK so I understand the AARCH64 and X86 changes. I don't quite
>> understand the SPARC change... but I can be convinced otherwise.
> Ok. Let me know what you think now after a bit more explanation. I can
> put some more effort into trying out the test case on sparc if needed.
>
> thanks,
>
> Chris
>>
>> If you fix the nits, I don't need to see a new webrev.
>>
>> Dan
>>
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8166679
>>>
>>> The fix is to partially undo the changes for JDK-8159284. There are
>>> two places where the fix for JDK-8159284 added an extra check of the
>>> validity of the entry frame, but really only the first one is
>>> appropriate since for the second one we are not in an entry frame.
>>> More details can be found near the end of the bug comments.
>>>
>>> Note I did a straight patch of the old version of the code. It could
>>> probably use some formatting and comment cleanup. I decided not to
>>> clean it up to make it easy to compare the current code with the
>>> original. I'll clean it up if you feel it would be best to.
>>>
>>> Tested by running KitchenSink more times than I can count, since
>>> that's where JDK-8159284 turned up. However, that's not proving much
>>> since I could not reproduce JDK-8159284 even without its fix in
>>> place (it also couldn't be reproduced at the time JDK-8159284 was
>>> being investigated and fixed). For this reason I can't be 100%
>>> sure that JDK-8159284 is not being re-introduced with my changes.
>>>
>>> Also tested by running a very large set of tests through RBT, close
>>> to what we do for PIT testing, minus product builds and a few tests
>>> that take a long time to run.
>>>
>>> Lastly, I also tested with the test case in the CR to make sure it
>>> now passes. Unfortunately it's not possible to add the test case as
>>> a jtreg test since it requires the installation of the Oracle Studio
>>> tools.
>>>
>>> thanks,
>>>
>>> Chris
>>
>

From daniel.daugherty at oracle.com Fri Oct 21 22:27:35 2016 From: daniel.daugherty at oracle.com (Daniel D. 
Daugherty) Date: Fri, 21 Oct 2016 16:27:35 -0600 Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name with starting with Java 9 b133 In-Reply-To: <9e4afc7a-f6cc-fd4f-4935-9574169276a6@oracle.com> References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com> <9e4afc7a-f6cc-fd4f-4935-9574169276a6@oracle.com> Message-ID:

On 10/21/16 1:42 PM, Coleen Phillimore wrote:
>
> Chris,
>
> This change looks good. Thank you for the analysis and fixing the
> regression.
>
> On 10/21/16 3:13 PM, Chris Plummer wrote:
>> Hi Dan,
>>
>> Thanks for the review. Comments inline below:
>>
>> On 10/21/16 7:59 AM, Daniel D. Daugherty wrote:
>>> On 10/20/16 2:28 PM, Chris Plummer wrote:
>>>> Hello,
>>>>
>>>> Please review the following:
>>>>
>>>> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
>>>>
>>>
>>> src/cpu/aarch64/vm/frame_aarch64.cpp
>>> So we're in a "if (StubRoutines::returns_to_call_stub()" block
>>> and the assumption was that a frame that returns to a call stub
>>> must be an entry frame. Hence the use of is_entry_frame_valid().
>>> However, your investigation revealed that you can be in an
>>> interpreter frame that returns to a call stub here. That sounds
>>> both familiar and right :-)
>>>
>>> L209: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>> nit: please remove extra blank here: "( jcw"
>> Ok.
>>>
>>> I like the new JavaCallWrapper sanity check. I never thought of
>>> that when I worked on AsyncGetCallTrace().
>>>
>>> src/cpu/sparc/vm/frame_sparc.cpp
>>> old L281: if (sender.is_entry_frame()) {
>>> old L282: return sender.is_entry_frame_valid(thread);
>>> old L283: }
>>> I don't understand this one. Why isn't is_entry_frame_valid()
>>> correct here? You are in a "if (sender.is_entry_frame())"
>>> block.
>> I stared at this one a bit too, since the code is not quite the same
>> as x86 and aarch64. I'm not 100% sure I got it right, so I opted to
>> just change it to what used to be there, especially since 8159284
>> never turned up on sparc. I did try to go down the path of making
>> sure that 8166679 (this CR I'm fixing) does occur on Solaris-sparc,
>> but getting Dev Studio installed on a Solaris-sparc machine was
>> proving difficult. Maybe I should take another stab at that.
>>
>> As for the similarities and differences between the sparc code and
>> x86, for x86 before my changes we had:
>>
>>     if (StubRoutines::returns_to_call_stub(sender_pc)) {
>>       ...
>>       frame sender(sender_sp, sender_unextended_sp, saved_fp, sender_pc);
>>       return sender.is_entry_frame_valid(thread);
>>     }
>>
>> And for sparc:
>>
>>     frame sender(_SENDER_SP, younger_sp, adjusted_stack);
>>     if (sender.is_entry_frame()) {
>>       return sender.is_entry_frame_valid(thread);
>>     }
>>
>> So for x86 we are only adding the sender.is_entry_frame_valid() check
>> if the "current frame" returns to a stub, but for sparc we are doing
>> the check if the "sender frame" is an entry frame. I don't know the
>> reason for this difference. Aren't stubs entry frames? If yes, it
>> seems that having the check done in this way would cause this CR on
>> sparc just like it does on x86.
>
> I looked at this too and decided the platforms were equivalent, only
> coded differently. On sparc we create a sender frame, and on x86 we
> look at sender_pc before creating a sender frame. And is_entry_frame is:
>
>     inline bool frame::is_entry_frame() const {
>       return StubRoutines::returns_to_call_stub(pc());
>     }

That just makes this even more "interesting". So what we're saying here is that for either form of the question "Is this an entry frame?" we cannot call is_entry_frame_valid(), because that function will sometimes return false when the "entry frame" is also an interpreter frame...

Are we possibly fixing this in the wrong place? Dunno. It's Friday afternoon and maybe I'm just too fried to think this one through...
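A minimal sketch of the equivalence Coleen describes, written against toy stand-ins rather than the real HotSpot classes: the `toy` namespace, the fixed `kCallStubReturnAddress`, and the two `*_style_check` helpers are all assumptions made for illustration. It only shows that asking the question on `sender_pc` (x86 style) and asking it on an already constructed sender frame (sparc style) reduce to the same `returns_to_call_stub()` test.

```cpp
#include <cassert>
#include <cstdint>

// Toy stand-ins, NOT the HotSpot classes. Names mirror the thread for
// readability only; everything in namespace toy is hypothetical.
using address = std::uintptr_t;

namespace toy {

// Hypothetical return address inside the call stub.
constexpr address kCallStubReturnAddress = 0x1000;

struct StubRoutines {
  static bool returns_to_call_stub(address pc) {
    return pc == kCallStubReturnAddress;
  }
};

class frame {
 public:
  explicit frame(address pc) : _pc(pc) {}
  address pc() const { return _pc; }
  // Same shape as the inline function quoted above.
  bool is_entry_frame() const {
    return StubRoutines::returns_to_call_stub(pc());
  }
 private:
  address _pc;
};

// x86 style: test sender_pc first, before any sender frame exists.
inline bool x86_style_check(address sender_pc) {
  return StubRoutines::returns_to_call_stub(sender_pc);
}

// sparc style: build the sender frame, then ask the frame.
inline bool sparc_style_check(address sender_pc) {
  frame sender(sender_pc);
  return sender.is_entry_frame();
}

}  // namespace toy
```

In this toy model both helpers agree for any address, so neither form by itself tells a genuine entry frame apart from an interpreter frame that happens to return to the call stub.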
Dan

>
> Thanks,
> Coleen
>
>>>
>>> I can see wanting to add the JavaCallWrapper sanity check as
>>> an additional check. If you do that:
>>>
>>> L286 bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
>>> nit: please remove extra blank here: "( jcw"
>> Ok.
>>>
>>> src/cpu/x86/vm/frame_x86.cpp
>>> Again we're in a "if (StubRoutines::returns_to_call_stub()" block
>>> so I see why is_entry_frame_valid() is not the right call.
>>>
>>> L208: bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>> nit: please remove extra blank here: "( jcw"
>> Ok.
>>>
>>>
>>> OK so I understand the AARCH64 and X86 changes. I don't quite
>>> understand the SPARC change... but I can be convinced otherwise.
>> Ok. Let me know what you think now after a bit more explanation. I
>> can put some more effort into trying out the test case on sparc if
>> needed.
>>
>> thanks,
>>
>> Chris
>>>
>>> If you fix the nits, I don't need to see a new webrev.
>>>
>>> Dan
>>>
>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8166679
>>>>
>>>> The fix is to partially undo the changes for JDK-8159284. There are
>>>> two places where the fix for JDK-8159284 added an extra check of
>>>> the validity of the entry frame, but really only the first one is
>>>> appropriate since for the second one we are not in an entry frame.
>>>> More details can be found near the end of the bug comments.
>>>>
>>>> Note I did a straight patch of the old version of the code. It
>>>> could probably use some formatting and comment cleanup. I decided
>>>> not to clean it up to make it easy to compare the current code with
>>>> the original. I'll clean it up if you feel it would be best to.
>>>>
>>>> Tested by running KitchenSink more times than I can count, since
>>>> that's where JDK-8159284 turned up. However, that's not proving
>>>> much since I could not reproduce JDK-8159284 even without its fix
>>>> in place (it also couldn't be reproduced at the time JDK-8159284
>>>> was being investigated and fixed). For this reason I can't be
>>>> 100% sure that JDK-8159284 is not being re-introduced with my changes.
>>>>
>>>> Also tested by running a very large set of tests through RBT, close
>>>> to what we do for PIT testing, minus product builds and a few tests
>>>> that take a long time to run.
>>>>
>>>> Lastly, I also tested with the test case in the CR to make sure it
>>>> now passes. Unfortunately it's not possible to add the test case as
>>>> a jtreg test since it requires the installation of the Oracle
>>>> Studio tools.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>
>>
>

From thomas.stuefe at gmail.com Mon Oct 24 13:12:13 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 24 Oct 2016 15:12:13 +0200 Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input size=0 Message-ID:

Dear all,

please check this tiny bug fix.

Bug report: https://bugs.openjdk.java.net/browse/JDK-8168542
Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/

In short, this fixes a corner case for os::realloc(), which currently returns NULL if the input size is zero.

But as we have code which interprets a return value of NULL as OOM (see ReallocateHeap()), this is not a good solution. It is also inconsistent with how os::malloc() deals with the same situation and potentially with the way the native C runtime deals with it (currently, in a debug build we will return NULL in case of size=0, whereas in the release build we just call the native ::realloc() and return whatever it returns).
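To illustrate the corner case, here is a hedged sketch, not the HotSpot code: `realloc_zero_as_one` and `grow_or_report_oom` are hypothetical names, and the approach shown (bumping a zero size to one byte before calling the C library) is an assumption about how such a fix could look. The point is only that the caller can then keep reading NULL as out-of-memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper, not os::realloc(): a zero size is treated as a
// request for one byte, so a NULL result can only mean allocation failure.
inline void* realloc_zero_as_one(void* ptr, std::size_t size) {
  if (size == 0) {
    size = 1;  // sidestep the implementation-defined realloc(ptr, 0) case
  }
  return std::realloc(ptr, size);
}

// Hypothetical caller in the style described for ReallocateHeap():
// NULL from the allocator is interpreted as out-of-memory.
inline bool grow_or_report_oom(void** p, std::size_t size) {
  void* q = realloc_zero_as_one(*p, size);
  if (q == nullptr) {
    return false;  // treated as OOM; *p is left untouched
  }
  *p = q;
  return true;
}
```

With this shape a zero-size request still hands back a pointer that can later be released with `std::free`, so the caller never mistakes it for an allocation failure.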
Thank you, Thomas

From thomas.stuefe at gmail.com Mon Oct 24 13:39:43 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Mon, 24 Oct 2016 15:39:43 +0200 Subject: PING: Enhancement Proposal: Reduce metaspace waste by dynamically merging and splitting metaspace chunks. Message-ID:

(crossposting to runtime-dev in the hope of getting more interest)

Hi all,

Please take a look at this proposed JEP.

https://bugs.openjdk.java.net/browse/JDK-8166690

The JEP proposes an improved allocator for metaspace. That allocator reduces metaspace wastage for certain corner cases by a lot.

We at SAP already have an existing implementation of this proposal, but currently only in our internal code base, not in the OpenJDK. It works fine. I can provide a prototype based on OpenJDK 9 to look at and play with, but would like to know whether there is any interest before investing the work.

Thank you! and Kind Regards, Thomas

On Tue, Sep 27, 2016 at 10:45 AM, Thomas Stüfe wrote:
> Dear all,
>
> please take a look at this Enhancement Proposal for the metaspace
> allocator. I hope these are the right groups for this discussion.
>
> https://bugs.openjdk.java.net/browse/JDK-8166690
>
> Background:
>
> We at SAP see at times at customer installations OOMs in Metaspace
> (usually, with compressed class pointers enabled, in Compressed Class
> Space). The VM attempts to allocate metaspace and fails, hitting the
> CompressedClassSpaceSize limit. Note that we usually set the limit lower
> than the default, typically at 256M.
>
> When analyzing, we observed that a large part of the metaspace is indeed
> free but "locked in" into metaspace chunks of the wrong size: often we
> would find a lot of free small chunks, but the allocation request was for
> medium chunks, and failed.
>
> The reason was that at some point in time a lot of class loaders were
> alive, each with only a few small classes loaded. This would lead to the
> metaspace being swamped with lots of small chunks. This is because each
> SpaceManager first allocates small chunks, and only after a certain number
> of allocation requests switches to larger chunks.
>
> These small chunks are free and wait in the freelist, but cannot be reused
> for allocation requests which require larger chunks, even if they are
> physically adjacent in the virtual space.
>
> We (at SAP) added a patch which allows on-the-fly metaspace chunk merging
> - to merge multiple adjacent smaller chunks to form a larger chunk. This, in
> combination with the reverse direction - splitting a large chunk to get
> smaller chunks - partly negates the "chunks-are-locked-in-into-their-size"
> limitation and provides for better reuse of metaspace chunks. It also
> provides better defragmentation.
>
> I discussed this fix off-list with Coleen Phillimore and Jon Masamitsu,
> and instead of just offering this as a fix, both recommended to open a JEP
> for this, because its scope would be beyond that of a simple fix.
>
> So here is my first JEP :) I hope it follows the right form. Please, if
> you have time, take a look and tell us what you think.
>
> Thank you, and Kind Regards,
>
> Thomas Stüfe
>

From rachel.protacio at oracle.com Mon Oct 24 20:17:44 2016 From: rachel.protacio at oracle.com (Rachel Protacio) Date: Mon, 24 Oct 2016 16:17:44 -0400 Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark" Message-ID:

Hi,

Please review this small fix, which removes two nested ResourceMark's that were causing problems with defaultmethods logging.

Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/

Tested with JPRT.

Thanks!
Rachel

From max.ockner at oracle.com Tue Oct 25 01:11:52 2016 From: max.ockner at oracle.com (Max Ockner) Date: Mon, 24 Oct 2016 21:11:52 -0400 Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark" In-Reply-To: References: Message-ID: <580EB158.7020509@oracle.com>

Rachel,

Did you mean to remove both ResourceMarks? (I suppose if it passes JPRT then it might not matter)

Max

On 10/24/2016 4:17 PM, Rachel Protacio wrote:
> Hi,
>
> Please review this small fix, which removes two nested ResourceMark's
> that were causing problems with defaultmethods logging.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>
> Tested with JPRT.
>
> Thanks!
> Rachel

From david.holmes at oracle.com Tue Oct 25 04:22:22 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 25 Oct 2016 14:22:22 +1000 Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input size=0 In-Reply-To: References: Message-ID:

Hi Thomas,

On 24/10/2016 11:12 PM, Thomas Stüfe wrote:
> Dear all,
>
> please check this tiny bug fix.
>
> Bug report:
> https://bugs.openjdk.java.net/browse/JDK-8168542
>
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/
>
> In short, this fixes a corner case for os::realloc() which currently
> returns NULL if input size is zero.
>
> But as we have coding which interprets a return value of NULL as OOM (See
> ReallocateHeap()), this is not a good solution. It is also inconsistent
> with how os::malloc() deals with the same situation and potentially with
> the way the native C-Runtime deals with it (currently, in a debug build we
> will return NULL in case of size=0 whereas in the release build we just
> call the native ::realloc() and return whatever it returns.)

Sorry but I do not like this.
A native realloc with a size of zero and a non-NULL ptr acts like free(ptr). Our realloc does not do that. A native malloc that receives a size of zero "returns either NULL, or a unique pointer value that can later be successfully passed to free()". Our os::malloc returns 1 - and I see nothing that indicates that can successfully be passed to os::free.

So while the current handling of size==0 is a bit inconsistent and unclear, it is even less clear that returning 1 is a reasonable thing to do.

To me passing a size of zero (unless expecting it to act like a free!) is a bug that should be handled in the caller.

I welcome opinions from others on this.

David

PS. I will be traveling soon and unable to respond to emails until Wednesday afternoon at the earliest.

> Thank you,
>
> Thomas
>

From david.holmes at oracle.com Tue Oct 25 05:41:36 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 25 Oct 2016 15:41:36 +1000 Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark" In-Reply-To: References: Message-ID: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>

Hi Rachel,

On 25/10/2016 6:17 AM, Rachel Protacio wrote:
> Hi,
>
> Please review this small fix, which removes two nested ResourceMark's
> that were causing problems with defaultmethods logging.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>
> Tested with JPRT.

It is tricky to determine who has responsibility for positioning the ResourceMarks.
Looking at this call chain it initially appeared to me that we now had a missing RM for the code at line #80:

 813   slot->print_on(logstream);

=>

 590   void print_on(outputStream* str) const {
 591     print_slot(str, name(), signature());
 592   }

=>

  79   static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
  80     str->print("%s%s", name->as_C_string(), signature->as_C_string());
  81   }

but we actually have a RM higher up at:

 787   ResourceMark rm(THREAD);

so that is good, but then we also have a nested ResourceMark further down:

 795   if (log_is_enabled(Debug, defaultmethods)) {
 796     ResourceMark rm;

I must admit I'm unclear whether ResourceMarks should never be nested, or should be nested "carefully" - and if the latter, exactly what that means and how to recognize it.

Thanks,
David
-----

> Thanks!
> Rachel

From david.holmes at oracle.com Tue Oct 25 05:46:21 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 25 Oct 2016 15:46:21 +1000 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com> References: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com> Message-ID: <776ac549-77cb-3ac1-69f4-b356f8631019@oracle.com>

On 18/10/2016 5:16 PM, David Holmes wrote:
> Hi Thomas,
>
> I took an initial look but am still mulling over things.

Sorry Thomas, I haven't had a chance to get back to this. Hard to find time for future features/enhancements at the moment. :)

Others should feel free to chime in on this. :)

David

> Note that as an enhancement this will need to wait for Java 10 repos to
> open - unless you go through the FC extension process.
>
> Thanks,
> David
>
> On 18/10/2016 4:22 PM, Thomas Stüfe wrote:
>> Ping.
>>
>> On Thu, Oct 13, 2016 at 6:55 AM, Thomas Stüfe
>> wrote:
>>
>>> Dear all,
>>>
>>> please take a look at the following fix:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>>>
>>> ---
>>>
>>> In short, this fix provides the ability to cancel hanging error
>>> reporting steps. This uses the same code paths secondary error
>>> handling uses during error reporting. With this patch, steps which
>>> take too long will be canceled after 1/2 ErrorLogTimeout. In the log
>>> file, it will look like this:
>>>
>>> 4 [timeout occurred during error reporting in step ""] after xxxx ms.
>>> 5
>>>
>>> and we now also get a finish message in the hs-err file if we hit the
>>> ErrorLogTimeout and error reporting will stop altogether:
>>>
>>> 6 ------ Timout during error reporting after xxx ms. ------
>>>
>>> (in addition to the "time expired, abort" message the WatcherThread
>>> writes to stderr)
>>>
>>> ---
>>>
>>> This is something which bugged us for a long time, because we rely
>>> heavily on the hs_err files for error analysis at customer sites, and
>>> there are a number of reasons why one step may hang and prevent the
>>> follow-up steps from running.
>>>
>>> It works like this:
>>>
>>> Before, when error reporting started, the WatcherThread was waiting for
>>> ErrorLogTimeout seconds, then would stop the VM.
>>>
>>> Now, the WatcherThread periodically pings error reporting, which
>>> checks if the last step timed out. If it did, it sends a signal to the
>>> reporting thread, and the thread will continue with the next step. This
>>> follows the same path as secondary crash handling.
>>>
>>> Some implementation details:
>>>
>>> On Posix platforms, to interrupt the thread, I use pthread_kill. This
>>> means I must know the pthread id of the reporting thread, which I now
>>> store at the beginning of error reporting. We already store the
>>> reporting thread id in first_error_tid, but that I cannot use, because
>>> it gets set by os::current_thread_id(), which is not always the pthread
>>> id. Should we ever switch to only using pthread id for posix platforms,
>>> this coding can be simplified.
>>>
>>> On Windows, there is unfortunately no easy way to interrupt a
>>> non-cooperative thread. I would need a way to cause a SEH inside the
>>> target thread, which then would get handled by secondary error handling
>>> like on Posix platforms, but that is not easy. It is doable - one can
>>> suspend the thread, modify the thread context in a way that it will
>>> crash upon resume. But that felt a bit heavyweight for this problem. So
>>> on Windows, timeout handling still works (after ErrorLogTimeout the VM
>>> gets shut down), but error reporting steps are not interruptible. If we
>>> feel this is important, this can be added later.
>>>
>>> Kind Regards, Thomas
>>>

From thomas.stuefe at gmail.com Tue Oct 25 05:50:32 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Tue, 25 Oct 2016 07:50:32 +0200 Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input size=0 In-Reply-To: References: Message-ID:

Hi David,

On Tue, Oct 25, 2016 at 6:22 AM, David Holmes wrote:
> Hi Thomas,
>
> On 24/10/2016 11:12 PM, Thomas Stüfe wrote:
>
>> Dear all,
>>
>> please check this tiny bug fix.
>>
>> Bug report:
>> https://bugs.openjdk.java.net/browse/JDK-8168542
>>
>> Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/
>>
>> In short, this fixes a corner case for os::realloc() which currently
>> returns NULL if input size is zero.
>>
>> But as we have coding which interprets a return value of NULL as OOM (See
>> ReallocateHeap()), this is not a good solution. It is also inconsistent
>> with how os::malloc() deals with the same situation and potentially with
>> the way the native C-Runtime deals with it (currently, in a debug build we
>> will return NULL in case of size=0 whereas in the release build we just
>> call the native ::realloc() and return whatever it returns.)
>>
>
> Sorry but I do not like this. A native realloc with a size of zero and a
> non-NULL ptr acts like free(ptr). Our realloc does not do that. A native
> malloc that receives a size of zero "returns either NULL, or a unique
> pointer value that can later be successfully passed to free()". Our
> os::malloc returns 1 - and I see nothing that indicates that can
> successfully be passed to os::free.
>
> So while the current handling of size==0 is a bit inconsistent and
> unclear, it is even less clear that returning 1 is a reasonable thing to do.
>
> To me passing a size of zero (unless expecting it to act like a free!) is
> a bug that should be handled in the caller.
>

You completely lost me here. I do not return 1, and neither does os::malloc().

os::realloc behaviour now is:
- in debug builds: if size==0, do not free but return NULL immediately, which is not the standard behaviour of a normal ::realloc()
- in release builds: if size==0, do whatever the C runtime realloc() does, so it will always free() but either return NULL or a unique pointer.

So, in debug builds the behaviour is unexpected and - assuming os::realloc() mimics ::realloc() - wrong. In release builds it will be correct, but which of the two results is returned is unknown.

My patch changes this behaviour to always - in both release and debug builds - free() and return a unique pointer.
By setting the size to 1: - for the debug build, I will go thru the normal path - allocating 1 byte of memory (plus NMT/guard pages overhead), copying 1 byte of payload, then freeing the original memory and returning the alloced 1 byte - which is unique and can be passed to os::free(). - for the release build I will force the behaviour to be a realloc to size 1 and thus remove the ambiguity introduced by the native realloc. > I welcome opinions from others on this. > > David > > PS. I will be traveling soon and unable to respond to emails until > Wednesday afternoon at the earliest. > > Thank you, >> >> Thomas >> >> Kind Regards, Thomas From david.holmes at oracle.com Tue Oct 25 05:55:36 2016 From: david.holmes at oracle.com (David Holmes) Date: Tue, 25 Oct 2016 15:55:36 +1000 Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input size=0 In-Reply-To: References: Message-ID: <7241ac80-0d65-bd4e-2908-59cbe3e65ab5@oracle.com> On 25/10/2016 3:50 PM, Thomas St?fe wrote: > Hi David, > > On Tue, Oct 25, 2016 at 6:22 AM, David Holmes > wrote: > > Hi Thomas, > > On 24/10/2016 11:12 PM, Thomas St?fe wrote: > > Dear all, > > please check this tiny bug fix. > > Bug report: > https://bugs.openjdk.java.net/browse/JDK-8168542 > > > Webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/ > > > In short, this fixes a corner case for os::realloc() which currently > returns NULL if input size is zero. > > But as we have coding which interprets a return value of NULL as > OOM (See > ReallocateHeap()), this is not a good solution. It is also > inconsistent > with how os::malloc() deals with the same situation and > potentially with > the way the native C-Runtime deals with it (currently, in a > debug build we > will return NULL in case of size=0 whereas in the release build > we just > call the native ::realloc() and return whatever it returns.) > > > Sorry but I do not like this. 
A native realloc with a size of zero > and a non-NULL ptr acts like free(ptr). Our realloc does not do > that. A native malloc that receives a size of zero "returns either > NULL, or a unique pointer value that can later be successfully > passed to free()". Our os::malloc returns 1 - and I see nothing that > indicates that can successfully be passed to os::free. > > So while the current handling of size==0 is a bit inconsistent and > unclear, it is even less clear that returning 1 is a reasonable > thing to do. > > To me passing a size of zero (unless expecting it to act like a > free!) is a bug that should be handled in the caller. > > > You completely lost me here. I do not return 1, neither does os::malloc(). Sorry some kind of visual-neural short-circuit. :) Okay size 0 becomes size 1. Let me just recant my email and let someone else step in. Thanks for the detailed explanation below. David > os::realloc behaviour now is: > - in debug: if size==0, do not free but return NULL immediately. Which > is not a standard behaviour of a normal ::realloc() > - in release builds: if size==0, do whatever the C-Runtime realloc() > does, so it will always free() but either return NULL or a unique pointer. > > So, in debug build the behaviour is unexpected and - assuming > os::realloc() mimicks ::realloc() - wrong. In release builds it will be > correct but unknown. > > My patch changes this behaviour to always - in both release and debug > builds - free() and return a unique pointer. By setting the size to 1: > > - for the debug build, I will go thru the normal path - allocating 1 > byte of memory (plus NMT/guard pages overhead), copying 1 byte of > payload, then freeing the original memory and returning the alloced 1 > byte - which is unique and can be passed to os::free(). > - for the release build I will force the behaviour to be a realloc to > size 1 and thus remove the ambiguity introduced by the native realloc. > > > I welcome opinions from others on this. 
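
[Editorial sketch] The size-0 handling described in the quoted explanation above can be illustrated roughly as follows. This is only a minimal sketch under stated assumptions: realloc_min_one() and realloc_min_one_example() are hypothetical names, the code calls the C library directly, and it does not reproduce HotSpot's os::realloc(), NMT accounting, or guard pages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for the patched behaviour; this is NOT HotSpot's
// os::realloc(). A size of 0 is promoted to 1, so a NULL result can only
// mean that the allocation failed.
void* realloc_min_one(void* ptr, std::size_t size) {
    if (size == 0) {
        size = 1;  // sidestep the implementation-defined size==0 case
    }
    return std::realloc(ptr, size);
}

// Small usage example: shrinking to "zero" still yields a freeable pointer.
void realloc_min_one_example() {
    void* p = realloc_min_one(nullptr, 16);
    assert(p != nullptr);
    void* q = realloc_min_one(p, 0);  // behaves like realloc(p, 1)
    assert(q != nullptr);
    std::free(q);
}
```

With this shape, a caller that treats NULL as out-of-memory no longer has to distinguish a legitimate zero-size request from a failed allocation.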
>
> David
>
> PS. I will be traveling soon and unable to respond to emails until
> Wednesday afternoon at the earliest.
>
> Thank you,
>
> Thomas
>
>
> Kind Regards, Thomas

From Alan.Burlison at oracle.com Tue Oct 25 08:40:52 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Tue, 25 Oct 2016 09:40:52 +0100
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input size=0
In-Reply-To:
References:
Message-ID: <95dcaecb-4db1-efec-387b-ccd4954e8c9f@oracle.com>

On 24/10/2016 14:12, Thomas Stüfe wrote:

> In short, this fixes a corner case for os::realloc() which currently
> returns NULL if input size is zero.

For reference, here's what POSIX.1-2008 says:

malloc:

"If the space cannot be allocated, a null pointer shall be returned. If
the size of the space requested is 0, the behavior is
implementation-defined: either a null pointer shall be returned, or the
behavior shall be as if the size were some non-zero value, except that
the behavior is undefined if the returned pointer is used to access an
object."

realloc:

"If the size of the space requested is zero, the behavior shall be
implementation-defined: either a null pointer is returned, or the
behavior shall be as if the size were some non-zero value, except that
the behavior is undefined if the returned pointer is used to access an
object. If the space cannot be allocated, the object shall remain
unchanged."

C11 says basically the same thing.

--
Alan Burlison
--

From tobias.hartmann at oracle.com Tue Oct 25 12:43:07 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Oct 2016 14:43:07 +0200
Subject: [9] RFR(S): 8164612: NoSuchMethodException when method name contains NULL or Latin-1 supplement character
In-Reply-To: <5808692E.9090905@oracle.com>
References: <580622AE.9080802@oracle.com> <5808692E.9090905@oracle.com>
Message-ID: <580F535B.8040205@oracle.com>

[Ping]

As Coleen requested, I executed the JCK/VM tests (see comment in bug).
Best regards, Tobias On 20.10.2016 08:50, Tobias Hartmann wrote: > Hi, > > since this is affecting runtime code, could someone from the runtime team please have a look as well? > > Thanks, > Tobias > > On 18.10.2016 15:25, Tobias Hartmann wrote: >> Hi, >> >> please review the following patch: >> https://bugs.openjdk.java.net/browse/JDK-8164612 >> http://cr.openjdk.java.net/~thartmann/8164612/webrev.00/ >> >> The test executes Java Script code that defines getter methods containing Latin-1 supplement characters (0x80 - 0xFF). Those methods are registered at runtime through anonymous classes via Unsafe_DefineAnonymousClass. When calling a method, the VM fails with a NoSuchMethodException in MethodHandles::resolve_MemberName(). >> >> The failure happens while looking up the method name symbol in java_lang_String::as_symbol_or_null() [1]: >> 544 jbyte* position = (length == 0) ? NULL : value->byte_at_addr(0); >> 545 const char* base = UNICODE::as_utf8(position, length); >> 546 return SymbolTable::probe(base, length); >> >> If Compact Strings is enabled, we pass the Latin-1 encoded method name to UNICODE::as_utf8() and probe for the UTF-8 String in the SymbolTable. Since the Latin-1 method name contains non-ASCII characters, the length of the resulting UTF-8 String is larger (characters >= 0x80 are encoded as two bytes in UTF-8). However, we pass the shorter Latin-1 length to SymbolTable::probe() resulting in a lookup failure. >> >> I fixed this by passing the String length by reference to UNICODE::as_utf8(). I also refactored the related code in utf8.cpp, added comments and updated the callers. >> >> Tested with regression test and hs-comp PIT RBT (running). 
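
[Editorial sketch] The length mismatch described in the quoted message can be illustrated with a small stand-alone conversion routine. This is a hedged sketch, not the code in the webrev: latin1_to_utf8() and latin1_to_utf8_example() are hypothetical names and do not match the real UNICODE::as_utf8() signature. It assumes JVM-style modified UTF-8, in which NUL is also written as two bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not HotSpot's UNICODE::as_utf8): converts Latin-1
// bytes to UTF-8 and updates `length`, passed by reference, to the number
// of UTF-8 bytes produced.
std::string latin1_to_utf8(const unsigned char* src, std::size_t& length) {
    std::string out;
    for (std::size_t i = 0; i < length; ++i) {
        const unsigned char c = src[i];
        if (c != 0 && c < 0x80) {
            out.push_back(static_cast<char>(c));  // plain ASCII: one byte
        } else {
            // NUL and 0x80..0xFF take two bytes each.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    length = out.size();  // callers must look up with this length
    return out;
}

// Usage example: a four-character Latin-1 name grows to five UTF-8 bytes.
void latin1_to_utf8_example() {
    const unsigned char name[] = { 'g', 'e', 't', 0xE9 };
    std::size_t len = 4;
    const std::string utf8 = latin1_to_utf8(name, len);
    assert(len == 5);
    assert(utf8.size() == len);
}
```

Probing a symbol table with the original Latin-1 length (4 here) would look at a truncated key, which matches the lookup failure the message describes.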
>>
>> Thanks,
>> Tobias,
>>
>> [1] http://hg.openjdk.java.net/jdk9/hs/hotspot/file/652537a80080/src/share/vm/classfile/javaClasses.cpp#l535
>>

From rachel.protacio at oracle.com Tue Oct 25 15:14:01 2016
From: rachel.protacio at oracle.com (Rachel Protacio)
Date: Tue, 25 Oct 2016 11:14:01 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark"
In-Reply-To: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
References: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
Message-ID:

Hi,

Thanks for taking a look. I think in this particular case the issue was
that the nested ResourceMark's were around code that affected an existing
outputStream. So in fact the nesting per se isn't what was wrong, the
issue was adding a ResourceMark in the middle of a resource that still
needed the content after it went out of scope of the RM. So line 796 is
good because its functionality is self-contained, and the ones I deleted
were bad because they interfered with the functionality of the caller
code. (Can someone corroborate this assessment?)

However, as those functions still need RMs in general somewhere up the
line, I can add a comment of the form

    // The caller of print_slot() (or one of its callers)
    // must use a ResourceMark in order to correctly free the result.

for print_slot(), print_method(), and print_on() at line 590. Does that
sound good?

Rachel

On 10/25/2016 1:41 AM, David Holmes wrote:
> Hi Rachel,
>
> On 25/10/2016 6:17 AM, Rachel Protacio wrote:
>> Hi,
>>
>> Please review this small fix, which removes two nested ResourceMark's
>> that were causing problems with defaultmethods logging.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>>
>> Tested with JPRT.
>
> It is tricky to determine who has responsibility for positioning the
> ResourceMarks.
Looking at this call chain it initially appeared to me > that we now had a missing RM for the code at line #80: > > 813 slot->print_on(logstream); > => > 590 void print_on(outputStream* str) const { > 591 print_slot(str, name(), signature()); > 592 } > => > 79 static void print_slot(outputStream* str, Symbol* name, Symbol* > signature) { > 80 str->print("%s%s", name->as_C_string(), signature->as_C_string()); > 81 } > > but we actually have a RM higher up at: > > 787 ResourceMark rm(THREAD); > > so that is good, but then we also have a nested ResourceMark further > down: > > 795 if (log_is_enabled(Debug, defaultmethods)) { > 796 ResourceMark rm; > > I must admit I'm unclear if ResourceMarks should never be nested, or > should be nested "carefully" - and if the latter exactly what that > means and how to recognize it. > > Thanks, > David > ----- > >> Thanks! >> Rachel From rachel.protacio at oracle.com Tue Oct 25 15:19:42 2016 From: rachel.protacio at oracle.com (Rachel Protacio) Date: Tue, 25 Oct 2016 11:19:42 -0400 Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark" In-Reply-To: <580EB158.7020509@oracle.com> References: <580EB158.7020509@oracle.com> Message-ID: Hi, Thanks for looking - yes, the issue is that both functions are used as sub-components of a larger printing function so the RMs should only exist at the top level where the stream is created. Rachel On 10/24/2016 9:11 PM, Max Ockner wrote: > Rachel, > Did you mean to remove both ResourceMarks? > (I suppose if it passes JPRT then it might not matter) > Max > On 10/24/2016 4:17 PM, Rachel Protacio wrote: >> Hi, >> >> Please review this small fix, which removes two nested ResourceMark's >> that were causing problems with defaultmethods logging. >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995 >> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/ >> >> Tested with JPRT. >> >> Thanks! 
>> Rachel
>

From harold.seigel at oracle.com Tue Oct 25 19:58:23 2016
From: harold.seigel at oracle.com (harold seigel)
Date: Tue, 25 Oct 2016 15:58:23 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark"
In-Reply-To:
References: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
Message-ID: <4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com>

Hi Rachel,

I think that the ResourceMarks that you removed were the correct ones.

My understanding, based on the assert below, is that all calls to
stringStream::write() on a particular Stream need to be made under the
same ResourceMark with which the Stream was created. Otherwise, this
assert in stringStream::write() will trigger:

    assert(rm == NULL || Thread::current()->current_resource_mark() == rm,
           "StringStream is re-allocated with a different ResourceMark..." ...)

These two ResourceMarks needed to be removed because their outputStream
was constructed with a caller's ResourceMark. If they specified their own
ResourceMark then their calls to print(), which eventually calls
stringStream::write(), would cause the assert to trigger.

    static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
      ResourceMark rm;
      str->print("%s%s", name->as_C_string(), signature->as_C_string());
    }

    static void print_method(outputStream* str, Method* mo, bool with_class=true) {
      ResourceMark rm;
      if (with_class) {
        str->print("%s.", mo->klass_name()->as_C_string());
      }
      print_slot(str, mo->name(), mo->signature());
    }

I think that having a ResourceMark in code like this is okay because
debug_stream() probably constructs a new Stream object:

    if (log_is_enabled(Debug, defaultmethods)) {
      log_debug(defaultmethods)("Slots that need filling:");
      ResourceMark rm;
      outputStream* logstream = Log(defaultmethods)::debug_stream();
      streamIndentor si(logstream);
      for (int i = 0; i < slots->length(); ++i) {
        logstream->indent();
        slots->at(i)->print_on(logstream);
        logstream->cr();
      }
    }

Harold

On 10/25/2016 11:14 AM, Rachel Protacio wrote:
> Hi,
>
> Thanks for taking a look. I think in this particular case the issue
> was that the nested ResourceMark's were around code that affected an
> existing outputStream. So in fact the nesting per se isn't what was
> wrong, the issue was adding a ResourceMark in the middle of a resource
> that still needed the content after it went out of scope of the RM. So
> line 796 is good because its functionality is self-contained, and the
> ones I deleted were bad because they interfered with the functionality
> of the caller code. (Can someone corroborate this assessment?)
>
> However, as those functions still need RMs in general somewhere up the
> line, I can add a comment of the form
>
>     // The caller of print_slot() (or one of its callers)
>     // must use a ResourceMark in order to correctly free the result.
>
> for print_slot(), print_method(), and print_on() at line 590. Does
> that sound good?
> Rachel
>
> On 10/25/2016 1:41 AM, David Holmes wrote:
>> Hi Rachel,
>>
>> On 25/10/2016 6:17 AM, Rachel Protacio wrote:
>>> Hi,
>>>
>>> Please review this small fix, which removes two nested ResourceMark's
>>> that were causing problems with defaultmethods logging.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
>>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>>>
>>> Tested with JPRT.
>>
>> It is tricky to determine who has responsibility for positioning the
>> ResourceMarks.
Looking at this call chain it initially appeared to me >> that we now had a missing RM for the code at line #80: >> >> 813 slot->print_on(logstream); >> => >> 590 void print_on(outputStream* str) const { >> 591 print_slot(str, name(), signature()); >> 592 } >> => >> 79 static void print_slot(outputStream* str, Symbol* name, Symbol* >> signature) { >> 80 str->print("%s%s", name->as_C_string(), signature->as_C_string()); >> 81 } >> >> but we actually have a RM higher up at: >> >> 787 ResourceMark rm(THREAD); >> >> so that is good, but then we also have a nested ResourceMark further >> down: >> >> 795 if (log_is_enabled(Debug, defaultmethods)) { >> 796 ResourceMark rm; >> >> I must admit I'm unclear if ResourceMarks should never be nested, or >> should be nested "carefully" - and if the latter exactly what that >> means and how to recognize it. >> >> Thanks, >> David >> ----- >> >>> Thanks! >>> Rachel > From chris.plummer at oracle.com Tue Oct 25 20:19:11 2016 From: chris.plummer at oracle.com (Chris Plummer) Date: Tue, 25 Oct 2016 13:19:11 -0700 Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input size=0 In-Reply-To: References: Message-ID: Hi Thomas, I don't exactly like the behavior our current os::malloc() and os::realloc() is attempting, which is to hide the native malloc and realloc inconsistencies with size 0 by always making it size 1. Like David said, it should be considered a caller bug when this happens. But since it already seems to be baked in, and fixing all callers is way outside the scope of this bug, your fix seems to be the best approach. You could actually move your fix inside the #ifndef ASSERT, since it will be redundant for the ASSERT case (it's already handled in os::malloc). However, it's probably cleaner before the #ifndef ASSERT, and makes it clear that no matter what the size is set to 1. BTW, you can't push this to 9 since it's a p4. 
It looks like the Fix Version is already set to 10, so I assume that's where it is going. cheers, Chris On 10/24/16 6:12 AM, Thomas St?fe wrote: > Dear all, > > please check this tiny bug fix. > > Bug report: > https://bugs.openjdk.java.net/browse/JDK-8168542 > > Webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/ > > In short, this fixes a corner case for os::realloc() which currently > returns NULL if input size is zero. > > But as we have coding which interprets a return value of NULL as OOM (See > ReallocateHeap()), this is not a good solution. It is also inconsistent > with how os::malloc() deals with the same situation and potentially with > the way the native C-Runtime deals with it (currently, in a debug build we > will return NULL in case of size=0 whereas in the release build we just > call the native ::realloc() and return whatever it returns.) > > Thank you, > > Thomas From mandy.chung at oracle.com Tue Oct 25 23:10:40 2016 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 25 Oct 2016 16:10:40 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names Message-ID: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> Webrev at: http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/ Specdiff: http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html This is a long-standing RFE for adding support for class loader names. It's #ClassLoaderNames on JSR 376 issue list where the proposal [1] has been implemented in jake for some time. This patch brings this change to jdk9. A short summary: - New constructors are added in ClassLoader, SecureClassLoader and URLClassLoader to specify the class loader name. 
- New ClassLoader::getName and StackTraceElement::getClassLoaderName methods
- StackTraceElement::toString is updated to include the name of the class
  loader and module of that frame in this format:
      //(:)

The details are in StackTraceElement::buildLoaderModuleClassName, which
compresses the output string for cases when the loader has no name or the
module is the unnamed module.

Another thing to mention is that the VM sets the Class object when filling
in a stack trace of a Throwable object. Then the library will build a
String from the Class object for serialization purposes.

Mandy
[1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html

From serguei.spitsyn at oracle.com Wed Oct 26 03:22:28 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 25 Oct 2016 20:22:28 -0700
Subject: RFR(S): JDK-8165496 assert(_exception_caught == false) failed: _exception_caught is out of phase
In-Reply-To: <2797c9e2-fc5d-8892-d426-d9ae9626e2b3@oracle.com>
References: <2797c9e2-fc5d-8892-d426-d9ae9626e2b3@oracle.com>
Message-ID: <35b7d0b6-abd9-d06b-18b7-6024d324c37a@oracle.com>

Hi Dmitry,

Sorry, I do not see how this fixes the problem.
What are you trying to solve by calling set_exception_detected()
conditionally?
The _exception_detected flag at that point has to be set anyway, right?

The root cause of this issue is that the assert is unreasonable and does
not solve anything.
So the assert has to be replaced with clearing the _exception_caught flag.
Please read my comment in the bug report.
I also thought that you agreed with this conclusion. :)

Thanks,
Serguei

On 10/21/16 01:42, Dmitry Samersoff wrote:
> Everybody,
>
> Please review a small modification of the fix for JDK-8134434:
>
> http://cr.openjdk.java.net/~dsamersoff/JDK-8165496/webrev.04/
>
> Its' possible that we come to rethrow_C when _exception_caught is
> already cleared. We need not to set exception_detected in this
> case.
> > -Dmitry > From thomas.stuefe at gmail.com Wed Oct 26 05:28:16 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 26 Oct 2016 07:28:16 +0200 Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input size=0 In-Reply-To: References: Message-ID: Hi Chris, thank you for the review! I'll put this on the growing pile of jdk10 fixes and hope we will have a repo soon to fix this in. Kind Regards, Thomas On Tue, Oct 25, 2016 at 10:19 PM, Chris Plummer wrote: > Hi Thomas, > > I don't exactly like the behavior our current os::malloc() and > os::realloc() is attempting, which is to hide the native malloc and realloc > inconsistencies with size 0 by always making it size 1. Like David said, it > should be considered a caller bug when this happens. But since it already > seems to be baked in, and fixing all callers is way outside the scope of > this bug, your fix seems to be the best approach. > > You could actually move your fix inside the #ifndef ASSERT, since it will > be redundant for the ASSERT case (it's already handled in os::malloc). > However, it's probably cleaner before the #ifndef ASSERT, and makes it > clear that no matter what the size is set to 1. > > BTW, you can't push this to 9 since it's a p4. It looks like the Fix > Version is already set to 10, so I assume that's where it is going. > > cheers, > > Chris > > > On 10/24/16 6:12 AM, Thomas St?fe wrote: > >> Dear all, >> >> please check this tiny bug fix. >> >> Bug report: >> https://bugs.openjdk.java.net/browse/JDK-8168542 >> >> Webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_reallo >> c_size_0/webrev.00/webrev/ >> >> In short, this fixes a corner case for os::realloc() which currently >> returns NULL if input size is zero. >> >> But as we have coding which interprets a return value of NULL as OOM (See >> ReallocateHeap()), this is not a good solution. 
It is also inconsistent >> with how os::malloc() deals with the same situation and potentially with >> the way the native C-Runtime deals with it (currently, in a debug build we >> will return NULL in case of size=0 whereas in the release build we just >> call the native ::realloc() and return whatever it returns.) >> >> Thank you, >> >> Thomas >> > > > > From chris.plummer at oracle.com Wed Oct 26 07:00:20 2016 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 26 Oct 2016 00:00:20 -0700 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: References: Message-ID: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> Hi Tomas, See JDK-8156821. I'm curious as to how your changes will impact it, since David says you can't interrupt a thread blocked trying to acquire mutex. I suspect that means this enhancement won't help in this case, and presumably in general you are not fixing the issue of error reporting getting deadlocked, or maybe I'm misinterpreting what David said in JDK-8156821. Otherwise overall your changes look good, but I have a few comments. Also, since this is an enhancement, it needs to wait for JDK 10. I think your test will fail for product builds. You should add "@requires vm.debug == true". Also, java files use 4 char indentation, not 2 like we use in hotspot C/C++ code. Lastly, it should only have a 2016 copyright. A couple of files need the copyright updated to 2016. Why do set_to_now() and get_timestamp() need to be atomic, and what are the consequences of cx8 not being supported? 1282 st->print_raw_cr(buffer); 1283 st->cr(); The old code had an additional st->cr() before the above lines. I assume you removed it intentionally. Is there a reason why you decided to only allow one step to timeout. What if the cause of a timeout in a step also impacts other steps, or is that not common when we see timeouts? 
It's not clear to me why you changed a couple of os::sleep() calls to os::naked_short_sleep(), and the rationale for the sleep periods. Can you please explain? thanks, Chris On 10/12/16 9:55 PM, Thomas St?fe wrote: > Dear all, > > please take a look at the following fix: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html > > --- > > In short, this fix provides the ability to cancel hanging error reporting > steps. This uses the same code paths secondary error handling uses during > error reporting. With this patch, steps which take too long will be > canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this: > > 4 [timeout occurred during error reporting in step ""] after xxxx > ms. > 5 > > and we now also get a finish message in the hs-err file if we hit the > ErrorLogTimeout and error reporting will stop altogether: > > 6 ------ Timout during error reporting after xxx ms. ------ > > (in addition to the "time expired, abort" message the WatcherThread writes > to stderr) > > --- > > This is something which bugged us for a long time, because we rely heavily > on the hs_err files for error analysis at customer sites, and there are a > number of reasons why one step may hang and prevent the follow-up steps > from running. > > It works like this: > > Before, when error reporting started, the WatcherThread was waiting for > ErrorLogTimeout seconds, then would stop the VM. > > Now, the WatcherThread periodically pings error reporting, which checks if > the last step did timeout. If it does, it sends a signal to the reporting > thread, and the thread will continue with the next step. This follows the > same path as secondary crash handling. > > Some implementation details: > > On Posix platforms, to interrupt the thread, I use pthread_kill. 
This means > I must know the pthread id of the reporting thread, which I now store at > the beginning of error reporting. We already store the reporting thread id > in first_error_tid, but that I cannot use, because it gets set by > os::current_thread_id(), which is not always the pthread id. Should we ever > switch to only using pthread id for posix platforms, this coding can be > simplified. > > On Windows, there is unfortunately no easy way to interrupt a > non-cooperative thread. I would need a way to cause a SEH inside the target > thread, which then would get handled by secondary error handling like on > Posix platforms, but that is not easy. It is doable - one can suspend the > thread, modify the thread context in a way that it will crash upon resume. > But that felt a bit heavyweight for this problem. So on windows, timeout > handling still works (after ErrorLogTimeout the VM gets shut down), but > error reporting steps are not interruptable. If we feel this is important, > this can be added later. > > Kind Regards, Thomas From thomas.stuefe at gmail.com Wed Oct 26 14:45:40 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Wed, 26 Oct 2016 16:45:40 +0200 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> References: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> Message-ID: Hi Chris, Thanks for the review! New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/ Comments inline. On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer wrote: > Hi Tomas, > > See JDK-8156821. I'm curious as to how your changes will impact it, since > David says you can't interrupt a thread blocked trying to acquire mutex. 
I
> suspect that means this enhancement won't help in this case, and presumably
> in general you are not fixing the issue of error reporting getting
> deadlocked, or maybe I'm misinterpreting what David said in JDK-8156821.
>

Not sure what exactly David meant with "You can't "interrupt" a thread that
is blocked trying to acquire a mutex." Maybe he can elaborate :)

My guesses:

1) If he meant "you cannot interrupt a thread blocking in
pthread_mutex_lock()" - not true, you can, and my patch works just fine in
this situation. Just tested again, to be sure. This covers crashes in
sections guarded by a pthread mutex, which then try to reacquire the lock in
the error handler.

2) If he meant "you cannot interrupt malloc if it is executing a system call
in the linux kernel" - that may be. I am not a Linux kernel expert but would
have thought that syscalls may block if interrupts are disabled for certain
lengths by the syscall author. But in that case I would have expected the
process to hang too and not be killable? Again, I am no expert.

>
> Otherwise overall your changes look good, but I have a few comments. Also,
> since this is an enhancement, it needs to wait for JDK 10.
>
> I think your test will fail for product builds. You should add "@requires
> vm.debug == true". Also, java files use 4 char indentation, not 2 like we
> use in hotspot C/C++ code. Lastly, it should only have a 2016 copyright.
>
>
Thank you for the hints. I fixed all of that. Note that I had disabled the
test for product builds in the code (!Platform.isDebugBuild()), but I added
the vm.debug tag as well, as you suggested.

> A couple of files need the copyright updated to 2016.
>
> Why do set_to_now() and get_timestamp() need to be atomic, and what are
> the consequences of cx8 not being supported?
>
>
The error reporting thread sets the timestamp on each STEP start, and the
timestamp is read from another thread, the WatcherThread. Timestamp is 64bit.
I wanted to make sure the 64bit value is written and read atomically,
especially on 32bit platforms. But then, I had to check whether 64bit atomic
stores/loads are even supported by this platform (I actually did not find a
32bit platform without 64bit atomics, but the comment in atomic.hpp is pretty
insistent and I did not want to risk regressions for other platforms). Well,
if no cx8 support is available, I pretty much just give up and read and
write timestamps directly.

As I said, I am not sure if this code path ever gets executed. Maybe I was
overthinking all this and just reading and writing the (C++ volatile) jlongs
would have been enough, but I wanted to prevent sporadic test errors because
of incompletely read 64bit values.

> 1282 st->print_raw_cr(buffer);
> 1283 st->cr();
>
> The old code had an additional st->cr() before the above lines. I assume
> you removed it intentionally.
>
>
I hope I preserved the number of cr() calls. At least that was my intention:

1260 outputStream* const st = log.is_open() ? &log : &out;
1261 st->cr();

... and then on every path, a cr (or print_raw_cr) at the end. Where do you
see the missing cr()?

> Is there a reason why you decided to only allow one step to timeout. What
> if the cause of a timeout in a step also impacts other steps, or is that
> not common when we see timeouts?
>
>
That is mostly guesswork. In our (SAP) code we allow for four steps (so
ErrorLogTimeout/4 as step timeout) and additionally allow for "steps known
to be long" where timeouts are disabled altogether. But we also have more
complicated error reporting steps, so when porting the patch to OpenJDK, I
felt the complexity was unneeded. I think in general you will only have one
misbehaving step, but you are right, more than one step may time out if e.g.
the file system is slow. I'm open for suggestions: the timeout value should
be large enough not to be hit for "normal slow steps" while still leaving
enough room for other steps to finish.
What do you think a reasonable timeout value would be? ErrorLogTimeout/4? > It's not clear to me why you changed a couple of os::sleep() calls to > os::naked_short_sleep(), and the rationale for the sleep periods. Can you > please explain? > > Because os::sleep() does a lot of work under the hood and relies on a bit of VM infrastructure. I think that is not a good idea in error situations where potentially everything may be broken already. You want to step lightly and really only do a naked system sleep. About the sleep periods, os::naked_sleep has an inbuilt maximum value of 1000ms, which I have to stay below to not hit the assert. I did use 999ms as the longest interval I am allowed to sleep nakedly. And after the timeout hit and before the WatcherThread calls os::abort, I again sleep 200ms to give the error reporter thread time to write the "error log aborted due to timeout" into the error log and to flush the error log. Those 200ms are just guesswork. > thanks, > > Chris > > Thanks for the review! Kind Regards, Thomas > > On 10/12/16 9:55 PM, Thomas St?fe wrote: > >> Dear all, >> >> please take a look at the following fix: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 >> webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging- >> Error-Reporting/webrev.00/webrev/index.html >> >> --- >> >> In short, this fix provides the ability to cancel hanging error reporting >> steps. This uses the same code paths secondary error handling uses during >> error reporting. With this patch, steps which take too long will be >> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like >> this: >> >> 4 [timeout occurred during error reporting in step ""] after >> xxxx >> ms. >> 5 >> >> and we now also get a finish message in the hs-err file if we hit the >> ErrorLogTimeout and error reporting will stop altogether: >> >> 6 ------ Timout during error reporting after xxx ms. 
------ >> >> (in addition to the "time expired, abort" message the WatcherThread writes >> to stderr) >> >> --- >> >> This is something which bugged us for a long time, because we rely heavily >> on the hs_err files for error analysis at customer sites, and there are a >> number of reasons why one step may hang and prevent the follow-up steps >> from running. >> >> It works like this: >> >> Before, when error reporting started, the WatcherThread was waiting for >> ErrorLogTimeout seconds, then would stop the VM. >> >> Now, the WatcherThread periodically pings error reporting, which checks if >> the last step did timeout. If it does, it sends a signal to the reporting >> thread, and the thread will continue with the next step. This follows the >> same path as secondary crash handling. >> >> Some implementation details: >> >> On Posix platforms, to interrupt the thread, I use pthread_kill. This >> means >> I must know the pthread id of the reporting thread, which I now store at >> the beginning of error reporting. We already store the reporting thread id >> in first_error_tid, but that I cannot use, because it gets set by >> os::current_thread_id(), which is not always the pthread id. Should we >> ever >> switch to only using pthread id for posix platforms, this coding can be >> simplified. >> >> On Windows, there is unfortunately no easy way to interrupt a >> non-cooperative thread. I would need a way to cause a SEH inside the >> target >> thread, which then would get handled by secondary error handling like on >> Posix platforms, but that is not easy. It is doable - one can suspend the >> thread, modify the thread context in a way that it will crash upon resume. >> But that felt a bit heavyweight for this problem. So on windows, timeout >> handling still works (after ErrorLogTimeout the VM gets shut down), but >> error reporting steps are not interruptable. If we feel this is important, >> this can be added later. 
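The watcher scheme described above (a step is canceled after half of ErrorLogTimeout, and reporting stops altogether once ErrorLogTimeout is reached) can be sketched as a small decision function. This is only an illustration under stated assumptions, not the code from the webrev: ReportingClock, WatchAction and check_timeouts are hypothetical names, times are plain milliseconds, and std::atomic stands in for whatever atomic 64-bit access the VM actually uses for the timestamps.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical names throughout; this is a toy model of the watcher logic.
enum class WatchAction { kNone, kInterruptStep, kAbortReporting };

// Timestamps written by the reporting thread and read by the watcher.
// std::atomic<int64_t> keeps 64-bit reads and writes from tearing.
struct ReportingClock {
  std::atomic<std::int64_t> reporting_start_ms{0};  // set once when reporting begins
  std::atomic<std::int64_t> step_start_ms{0};       // refreshed at the start of each step
};

// Called periodically by the watcher. Decides whether nothing needs to
// happen, the current step should be interrupted, or reporting as a whole
// has exceeded its budget.
WatchAction check_timeouts(const ReportingClock& clock,
                           std::int64_t now_ms,
                           std::int64_t total_timeout_ms) {
  assert(total_timeout_ms > 0);
  // Per-step budget: half of the overall timeout, as in the description.
  const std::int64_t step_timeout_ms = total_timeout_ms / 2;

  if (now_ms - clock.reporting_start_ms.load() >= total_timeout_ms) {
    return WatchAction::kAbortReporting;
  }
  if (now_ms - clock.step_start_ms.load() >= step_timeout_ms) {
    return WatchAction::kInterruptStep;
  }
  return WatchAction::kNone;
}
```

A watcher loop would call check_timeouts() on every tick. On kInterruptStep it would signal the reporting thread (pthread_kill on Posix, per the description above), and the reporting thread would store a fresh step_start_ms when it begins the next step.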
>> >> Kind Regards, Thomas >> > > > > From coleen.phillimore at oracle.com Wed Oct 26 19:32:13 2016 From: coleen.phillimore at oracle.com (Coleen Phillimore) Date: Wed, 26 Oct 2016 15:32:13 -0400 Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 In-Reply-To: References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap> <7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com> Message-ID: <13e4100e-b385-6c71-8222-d36819f2fbdd@oracle.com> On 10/20/16 4:58 AM, Doerr, Martin wrote: > Hi Coleen, > > thank you very much for reviewing my PPC change. > > We had originally spent a lot of effort to get the template interpreter fast. I think startup performance is still important. > A large amount of less optimized changes will make it slower over time. > That's why we have reduced reloading constMethod in the PPC implementation. I think this would be good for other platforms as well. > Maybe we should improve them in 10. I don't know. I thought load_mirror() made for a nice API. Does the extra indirection matter? I filed RFE https://bugs.openjdk.java.net/browse/JDK-8168795 so we can investigate further in 10. This is approved and I think reviewed so you can check it in anytime. I put a due date of Friday on your bug. Feel free to change it if that's not good. Thanks, Coleen > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen Phillimore > Sent: Tuesday, 18 October 2016 23:56 > To: hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 > > > This seems good. I think it's a shame to change load_mirror() to > load_mirror_from_const_method() though because there's load_mirror() > with the same parameters on all the other platforms and it makes > platform development a little easier.
But that's up to you to because > you can generate shorter sequences. > > Coleen > > > On 10/17/16 12:38 PM, Doerr, Martin wrote: >> Hi, >> >> I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9: >> >> 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed). >> >> 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev. >> >> Webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/ >> >> Please review. >> >> Thanks and best regards, >> Martin >> From david.holmes at oracle.com Wed Oct 26 19:33:11 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 27 Oct 2016 05:33:11 +1000 Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark" In-Reply-To: <4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com> References: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com> <4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com> Message-ID: <151bb79a-8c01-c08e-e37d-094b47182cb7@oracle.com> Harold/Rachel, Thanks for clarifying things - I had misconstrued the actual problem. In summary we should not use a ResourceObj in the scope of a nested ResourceMark wrt. the allocation of the ResourceObj. The fix is good and no further changes are needed. 
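The nesting problem summarized above can be reproduced with a toy arena. The sketch below is not HotSpot code and none of its names (ToyArena, ToyMark and the two helper functions) exist in the VM; it only assumes that a mark remembers the arena top and rolls back to it on destruction. It shows why a buffer that is created under an outer mark but re-grown under a nested mark is gone once the nested mark leaves scope.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: an arena whose "marks" roll back later allocations.
class ToyArena {
 public:
  // Allocate a slot and return its index.
  std::size_t allocate(const std::string& value) {
    slots_.push_back(value);
    return slots_.size() - 1;
  }
  std::size_t top() const { return slots_.size(); }
  // Release everything allocated at or after saved_top.
  void rollback_to(std::size_t saved_top) {
    assert(saved_top <= slots_.size());
    slots_.resize(saved_top);
  }
  bool is_live(std::size_t index) const { return index < slots_.size(); }

 private:
  std::vector<std::string> slots_;
};

// RAII mark: allocations made after construction are released on destruction.
class ToyMark {
 public:
  explicit ToyMark(ToyArena& arena) : arena_(arena), saved_top_(arena.top()) {}
  ~ToyMark() { arena_.rollback_to(saved_top_); }
  ToyMark(const ToyMark&) = delete;
  ToyMark& operator=(const ToyMark&) = delete;

 private:
  ToyArena& arena_;
  std::size_t saved_top_;
};

// The stream's buffer is created under the outer mark but re-grown under a
// nested mark, so the new buffer is released when the nested mark ends.
bool grown_buffer_survives_nested_mark() {
  ToyArena arena;
  ToyMark outer(arena);
  std::size_t buffer = arena.allocate("stream buffer");
  {
    ToyMark inner(arena);
    buffer = arena.allocate("re-grown stream buffer");
  }
  return arena.is_live(buffer);
}

// Same growth without the nested mark: the buffer stays valid for the caller.
bool grown_buffer_survives_without_nested_mark() {
  ToyArena arena;
  ToyMark outer(arena);
  std::size_t buffer = arena.allocate("stream buffer");
  buffer = arena.allocate("re-grown stream buffer");
  return arena.is_live(buffer);
}
```

In this model the first helper reports a dead buffer and the second a live one, which mirrors why the marks inside print_slot() and print_method() had to go while a mark around a self-contained stream is harmless.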
Thanks, David On 26/10/2016 5:58 AM, harold seigel wrote: > Hi Rachel, > > I think that the ResourceMarks that you removed were the correct ones. > My understanding is that based on this assert it looks like all calls to > stringStream::write() on a particular Stream need to be done using the > same ResourceMark with which the Stream was created. Otherwise, this > assert in stringStream::write() will trigger: > > assert(rm == NULL || Thread::current()->current_resource_mark() == > rm, > "StringStream is re-allocated with a different > ResourceMark..." ...) > > These two ResourceMarks needed to be removed because their outputStream > was constructed with a caller's ResourceMark. If they specified their > own ResourceMark then their calls to print(), which eventually calls > stringStream::write(), would cause the assert to trigger. > > static void print_slot(outputStream* str, Symbol* name, Symbol* > signature) { > ResourceMark rm; > str->print("%s%s", name->as_C_string(), signature->as_C_string()); > } > > static void print_method(outputStream* str, Method* mo, bool > with_class=true) { > ResourceMark rm; > if (with_class) { > str->print("%s.", mo->klass_name()->as_C_string()); > } > print_slot(str, mo->name(), mo->signature()); > } > > > > I think that having a ResourceMark in code like this is okay because > debug_stream() probably constructs a new Stream object. > > if (log_is_enabled(Debug, defaultmethods)) { > log_debug(defaultmethods)("Slots that need filling:"); > ResourceMark rm; > outputStream* logstream = Log(defaultmethods)::debug_stream(); > streamIndentor si(logstream); > for (int i = 0; i < slots->length(); ++i) { > logstream->indent(); > slots->at(i)->print_on(logstream); > logstream->cr(); > } > } > > Harold > > > On 10/25/2016 11:14 AM, Rachel Protacio wrote: >> Hi, >> >> Thanks for taking a look. I think in this particular case the issue >> was that the nested ResourceMark's were around code that affected an >> existing outputStream. 
So in fact the nesting per se isn't what was >> wrong, the issue was adding a ResourceMark in the middle of a resource >> that still needed the content after it went out of scope of the RM. So >> line 796 is good because its functionality is self-contained, and the >> ones I deleted were bad because they interfered with the functionality >> of the caller code. (Can someone corroborate this assessment?) >> >> However, as those functions still need RMs in general somewhere up the >> line, I can add a comment of the form >> >> // The caller of print_slot() (or one of its callers) >> // must use a ResourceMark in order to correctly free the result. >> >> for print_slot(), print_method(), and print_on() at line 590. Does >> that sound good? >> Rachel >> >> On 10/25/2016 1:41 AM, David Holmes wrote: >>> Hi Rachel, >>> >>> On 25/10/2016 6:17 AM, Rachel Protacio wrote: >>>> Hi, >>>> >>>> Please review this small fix, which removes two nested ResourceMark's >>>> that were causing problems with defaultmethods logging. >>>> >>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995 >>>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/ >>>> >>>> Tested with JPRT. >>> >>> It is tricky to determine who has responsibility for positioning the >>> ResourceMarks. 
Looking at this call chain it initially appeared to me >>> that we now had a missing RM for the code at line #80: >>> >>> 813 slot->print_on(logstream); >>> => >>> 590 void print_on(outputStream* str) const { >>> 591 print_slot(str, name(), signature()); >>> 592 } >>> => >>> 79 static void print_slot(outputStream* str, Symbol* name, Symbol* >>> signature) { >>> 80 str->print("%s%s", name->as_C_string(), signature->as_C_string()); >>> 81 } >>> >>> but we actually have a RM higher up at: >>> >>> 787 ResourceMark rm(THREAD); >>> >>> so that is good, but then we also have a nested ResourceMark further >>> down: >>> >>> 795 if (log_is_enabled(Debug, defaultmethods)) { >>> 796 ResourceMark rm; >>> >>> I must admit I'm unclear if ResourceMarks should never be nested, or >>> should be nested "carefully" - and if the latter exactly what that >>> means and how to recognize it. >>> >>> Thanks, >>> David >>> ----- >>> >>>> Thanks! >>>> Rachel >> > From rachel.protacio at oracle.com Wed Oct 26 19:59:57 2016 From: rachel.protacio at oracle.com (Rachel Protacio) Date: Wed, 26 Oct 2016 15:59:57 -0400 Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method descriptor triggers "StringStream is re-allocated with a different ResourceMark" In-Reply-To: <151bb79a-8c01-c08e-e37d-094b47182cb7@oracle.com> References: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com> <4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com> <151bb79a-8c01-c08e-e37d-094b47182cb7@oracle.com> Message-ID: <0ca26d7b-50f3-0ba1-0edf-24b3dfb79c1d@oracle.com> Great! Thank you, David and Harold for the reviews. I'll commit. Rachel On 10/26/2016 3:33 PM, David Holmes wrote: > Harold/Rachel, > > Thanks for clarifying things - I had misconstrued the actual problem. > > In summary we should not use a ResourceObj in the scope of a nested > ResourceMark wrt. the allocation of the ResourceObj. > > The fix is good and no further changes are needed. 
> > Thanks, > David > > On 26/10/2016 5:58 AM, harold seigel wrote: >> Hi Rachel, >> >> I think that the ResourceMarks that you removed were the correct ones. >> My understanding is that based on this assert it looks like all calls to >> stringStream::write() on a particular Stream need to be done using the >> same ResourceMark with which the Stream was created. Otherwise, this >> assert in stringStream::write() will trigger: >> >> assert(rm == NULL || Thread::current()->current_resource_mark() == >> rm, >> "StringStream is re-allocated with a different >> ResourceMark..." ...) >> >> These two ResourceMarks needed to be removed because their outputStream >> was constructed with a caller's ResourceMark. If they specified their >> own ResourceMark then their calls to print(), which eventually calls >> stringStream::write(), would cause the assert to trigger. >> >> static void print_slot(outputStream* str, Symbol* name, Symbol* >> signature) { >> ResourceMark rm; >> str->print("%s%s", name->as_C_string(), signature->as_C_string()); >> } >> >> static void print_method(outputStream* str, Method* mo, bool >> with_class=true) { >> ResourceMark rm; >> if (with_class) { >> str->print("%s.", mo->klass_name()->as_C_string()); >> } >> print_slot(str, mo->name(), mo->signature()); >> } >> >> >> >> I think that having a ResourceMark in code like this is okay because >> debug_stream() probably constructs a new Stream object. >> >> if (log_is_enabled(Debug, defaultmethods)) { >> log_debug(defaultmethods)("Slots that need filling:"); >> ResourceMark rm; >> outputStream* logstream = Log(defaultmethods)::debug_stream(); >> streamIndentor si(logstream); >> for (int i = 0; i < slots->length(); ++i) { >> logstream->indent(); >> slots->at(i)->print_on(logstream); >> logstream->cr(); >> } >> } >> >> Harold >> >> >> On 10/25/2016 11:14 AM, Rachel Protacio wrote: >>> Hi, >>> >>> Thanks for taking a look. 
I think in this particular case the issue >>> was that the nested ResourceMark's were around code that affected an >>> existing outputStream. So in fact the nesting per se isn't what was >>> wrong, the issue was adding a ResourceMark in the middle of a resource >>> that still needed the content after it went out of scope of the RM. So >>> line 796 is good because its functionality is self-contained, and the >>> ones I deleted were bad because they interfered with the functionality >>> of the caller code. (Can someone corroborate this assessment?) >>> >>> However, as those functions still need RMs in general somewhere up the >>> line, I can add a comment of the form >>> >>> // The caller of print_slot() (or one of its callers) >>> // must use a ResourceMark in order to correctly free the result. >>> >>> for print_slot(), print_method(), and print_on() at line 590. Does >>> that sound good? >>> Rachel >>> >>> On 10/25/2016 1:41 AM, David Holmes wrote: >>>> Hi Rachel, >>>> >>>> On 25/10/2016 6:17 AM, Rachel Protacio wrote: >>>>> Hi, >>>>> >>>>> Please review this small fix, which removes two nested ResourceMark's >>>>> that were causing problems with defaultmethods logging. >>>>> >>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995 >>>>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/ >>>>> >>>>> Tested with JPRT. >>>> >>>> It is tricky to determine who has responsibility for positioning the >>>> ResourceMarks. 
Looking at this call chain it initially appeared to me >>>> that we now had a missing RM for the code at line #80: >>>> >>>> 813 slot->print_on(logstream); >>>> => >>>> 590 void print_on(outputStream* str) const { >>>> 591 print_slot(str, name(), signature()); >>>> 592 } >>>> => >>>> 79 static void print_slot(outputStream* str, Symbol* name, Symbol* >>>> signature) { >>>> 80 str->print("%s%s", name->as_C_string(), >>>> signature->as_C_string()); >>>> 81 } >>>> >>>> but we actually have a RM higher up at: >>>> >>>> 787 ResourceMark rm(THREAD); >>>> >>>> so that is good, but then we also have a nested ResourceMark further >>>> down: >>>> >>>> 795 if (log_is_enabled(Debug, defaultmethods)) { >>>> 796 ResourceMark rm; >>>> >>>> I must admit I'm unclear if ResourceMarks should never be nested, or >>>> should be nested "carefully" - and if the latter exactly what that >>>> means and how to recognize it. >>>> >>>> Thanks, >>>> David >>>> ----- >>>> >>>>> Thanks! >>>>> Rachel >>> >> From paul.sandoz at oracle.com Wed Oct 26 23:08:11 2016 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Wed, 26 Oct 2016 16:08:11 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> Message-ID: Hi, Looks ok, just some doc suggestions below. Paul. ClassLoader 366 * @param name 367 * Class loader name; can be {@code null} StackTraceElement 100 * @param classLoaderName the class loader name if the class loader of 101 * the class containing the execution point represented by 102 * the stack trace is named; can be {@code null} URLClassLoader 214 * @param name class loader name; can be {@code null} 245 * @param name class loader name; can be {@code null} SecureClassLoader 118 * @param name class loader name; can be {@code null}. "; otherwise {@code null} if the class loader is not named.? ? 
StackTraceElement 206 * @return the name of the class loader of the class containing the execution 207 * point represented by this stack trace element; {@code null} 208 * if the class loader name is not available. "{@code null} if the class loader is not named." 271 * built-in class loader, or it does not have a name, then "... or is not named" > On 25 Oct 2016, at 16:10, Mandy Chung wrote: > > Webrev at: > http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/ > > Specdiff: > http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html > > This is a long-standing RFE for adding support for class > loader names. It's #ClassLoaderNames on JSR 376 issue > list where the proposal [1] has been implemented in jake > for some time. This patch brings this change to jdk9. > > A short summary: > - New constructors are added in ClassLoader, SecureClassLoader > and URLClassLoader to specify the class loader name. > > - New ClassLoader::getName and StackTraceElement::getClassLoaderName > method > > - StackTraceElement::toString is updated to include the name > of the class loader and module of that frame in this format: > //(:) > > The detail is in StackTraceElement::buildLoaderModuleClassName > that compress the output string for cases when the loader > has no name or the module is unnamed module. Another thing > to mention is that VM sets the Class object when filling in > a stack trace of a Throwable object. Then the library will > build a String from the Class object for serialization purpose.
> > Mandy > [1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html From mandy.chung at oracle.com Wed Oct 26 23:22:12 2016 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 26 Oct 2016 16:22:12 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> Message-ID: <882F2B82-3565-4E67-BDFC-6C1F80F2FF2F@oracle.com> > On Oct 26, 2016, at 4:08 PM, Paul Sandoz wrote: > : > > "; otherwise {@code null} if the class loader is not named." > > : > "{@code null} if the class loader is not named." > > : > "... or is not named" Yup that reads better. I will update them. Thanks. Mandy From david.holmes at oracle.com Thu Oct 27 00:40:03 2016 From: david.holmes at oracle.com (David Holmes) Date: Thu, 27 Oct 2016 10:40:03 +1000 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: References: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> Message-ID: <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com> On 27/10/2016 12:45 AM, Thomas Stüfe wrote: > Hi Chris, > > Thanks for the review! > > New > webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/ Have not looked at this yet. > Comments inline. > > On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer > wrote: > > Hi Tomas, > > See JDK-8156821. I'm curious as to how your changes will impact it, > since David says you can't interrupt a thread blocked trying to > acquire mutex. I suspect that means this enhancement won't help in > this case, and presumably in general you are not fixing the issue of > error reporting getting deadlocked, or maybe I'm misinterpreting > what David said in JDK-8156821. That should be 8156823 > > > Not sure what exactly David meant with "You can't "interrupt" a thread > that is blocked trying to acquire a mutex."
Maybe he can elaborate :) > > My guesses: > > 1) If he meant "you cannot interrupt a thread blocking in > pthread_mutex_lock()" - not true, you can and my patch works just fine > in this situation. Just tested again, to be sure. This covers crashes in > sections guarded by pthread_mutex, which then try to reacquire the lock > in the error handler. There is no specified, portable way to get a thread blocked acquiring a mutex to stop waiting for the mutex. That is what I meant. pthread_mutex_lock is not a cancellation point, nor will it return EINTR in response to a signal. However, if a signal is received by the thread while waiting then POSIX semantics indicate that the signal handler will run and then return the thread to the waiting state. In our case the crash handler does not return so we are into undefined territory there - but our crash handler is already not a well-defined signal handler as it is not restricted to async-signal-safe functions, so we already run a risk when executing it. I had not considered this aspect in relation to 8156823, so the proposed approach here would also attempt to address that issue. > 2) If he meant "you cannot interrupt malloc if it is executing a system > call in the Linux kernel" - that may be. I am not a Linux kernel expert > but would have thought that syscalls may block if interrupts are > disabled for certain lengths by the syscall author. But in that case I > would have expected the process to hang too and to be not killable? > Again, I am no expert. Note "interrupt" here is a logical concept not related to hardware-level interrupts. I don't know at what point going into malloc you will no longer get signal handlers run - malloc doesn't use pthread-level mutexes, but direct futexes, so the same signal responsiveness may not be present. Thanks, David ----- > > > Otherwise overall your changes look good, but I have a few comments. > Also, since this is an enhancement, it needs to wait for JDK 10.
> > I think your test will fail for product builds. You should add > "@requires vm.debug == true". Also, java files use 4 char > indentation, not 2 like we use in hotspot C/C++ code. Lastly, it > should only have a 2016 copyright. > > > Thank you for the hints. Did fix all that. Note that I had disabled the > test for product builds in the code (!Platform.isDebugBuild()) but I > added the vm.debug tag as well as you suggested. > > > A couple of files need the copyright updated to 2016. > > Why do set_to_now() and get_timestamp() need to be atomic, and what > are the consequences of cx8 not being supported? > > > The error reporting thread sets the timestamp on each STEP start, and > the timestamp is read from another thread, the WatcherThread. Timestamp > is 64bit. I wanted to make sure the 64bit value is written and read > atomically, especially on 32bit platforms. > > But then, I had to check whether 64bit atomic stores/loads are even > supported by this platform (I actually did not find a 32bit platform > whithout 64bit atomics, but the comment in atomic.hpp is pretty > insistent and I did not want to risk regressions for other platforms). > > Well, if no cx8 support was available, I pretty much just give up and > read and write timestamps directly. As I said, I am not sure if this > code path gets ever executed. > > Maybe I was overthinking all this and just reading and writing the (C++ > volatile) jlongs would have been enough, but I wanted to prevent > sporadic test errors because of incompletely read 64bit values. > > > 1282 st->print_raw_cr(buffer); > 1283 st->cr(); > > The old code had an additional st->cr() before the above lines. I > assume you removed it intentionally. > > > I hope I preserved the numbers of cr(). At least that was my intention: > > 1260 outputStream* const st = log.is_open() ? &log : &out; > 1261 st->cr(); > > ... > > and then on every path, a cr (or print_raw_cr) at the end. Where do you > see the missing cr()? 
> > > > Is there a reason why you decided to only allow one step to timeout. > What if the cause of a timeout in a step also impacts other steps, > or is that not common when we see timeouts? > > > That is mostly guesswork. In our (SAP) code we allow for four steps (so > ErrorLogTimeout/4 as step timeout) and additionally allow for "steps > known to be long" where timeouts are disabled altogether. But we also > have more complicated error reporting steps, so when porting the patch > to OpenJDK, I felt the complexity was unneeded. > > I think in general you will only have one misbehaving step, but you are > right, more than one step may timeout if e.g. the file system is slow. > I'm open for suggestions: the timeout value should be large enough not > to be hit for "normal slow steps" while still leave room enough for > other steps to finish. What do you think a reasonable timeout value > would be? ErrorLogTimeout/4? > > > > It's not clear to me why you changed a couple of os::sleep() calls > to os::naked_short_sleep(), and the rationale for the sleep periods. > Can you please explain? > > > Because os::sleep() does a lot of work under the hood and relies on a > bit of VM infrastructure. I think that is not a good idea in error > situations where potentially everything may be broken already. You want > to step lightly and really only do a naked system sleep. About the sleep > periods, os::naked_sleep has an inbuilt maximum value of 1000ms, which I > have to stay below to not hit the assert. I did use 999ms as the longest > interval I am allowed to sleep nakedly. And after the timeout hit and > before the WatcherThread calls os::abort, I again sleep 200ms to give > the error reporter thread time to write the "error log aborted due to > timeout" into the error log and to flush the error log. Those 200ms are > just guesswork. > > > thanks, > > Chris > > > Thanks for the review! 
> > Kind Regards, Thomas > > > > > On 10/12/16 9:55 PM, Thomas St?fe wrote: > > Dear all, > > please take a look at the following fix: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 > > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html > > > --- > > In short, this fix provides the ability to cancel hanging error > reporting > steps. This uses the same code paths secondary error handling > uses during > error reporting. With this patch, steps which take too long will be > canceled after 1/2 ErrorLogTimeout. In the log file, it will > look like this: > > 4 [timeout occurred during error reporting in step ""] > after xxxx > ms. > 5 > > and we now also get a finish message in the hs-err file if we > hit the > ErrorLogTimeout and error reporting will stop altogether: > > 6 ------ Timout during error reporting after xxx ms. ------ > > (in addition to the "time expired, abort" message the > WatcherThread writes > to stderr) > > --- > > This is something which bugged us for a long time, because we > rely heavily > on the hs_err files for error analysis at customer sites, and > there are a > number of reasons why one step may hang and prevent the > follow-up steps > from running. > > It works like this: > > Before, when error reporting started, the WatcherThread was > waiting for > ErrorLogTimeout seconds, then would stop the VM. > > Now, the WatcherThread periodically pings error reporting, which > checks if > the last step did timeout. If it does, it sends a signal to the > reporting > thread, and the thread will continue with the next step. This > follows the > same path as secondary crash handling. > > Some implementation details: > > On Posix platforms, to interrupt the thread, I use pthread_kill. > This means > I must know the pthread id of the reporting thread, which I now > store at > the beginning of error reporting. 
We already store the reporting > thread id > in first_error_tid, but that I cannot use, because it gets set by > os::current_thread_id(), which is not always the pthread id. > Should we ever > switch to only using pthread id for posix platforms, this coding > can be > simplified. > > On Windows, there is unfortunately no easy way to interrupt a > non-cooperative thread. I would need a way to cause a SEH inside > the target > thread, which then would get handled by secondary error handling > like on > Posix platforms, but that is not easy. It is doable - one can > suspend the > thread, modify the thread context in a way that it will crash > upon resume. > But that felt a bit heavyweight for this problem. So on windows, > timeout > handling still works (after ErrorLogTimeout the VM gets shut > down), but > error reporting steps are not interruptable. If we feel this is > important, > this can be added later. > > Kind Regards, Thomas > > > > > From chris.plummer at oracle.com Thu Oct 27 03:22:35 2016 From: chris.plummer at oracle.com (Chris Plummer) Date: Wed, 26 Oct 2016 20:22:35 -0700 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com> References: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com> Message-ID: <4e390c00-1a96-3b8e-5f67-efce95a29021@oracle.com> On 10/26/16 5:40 PM, David Holmes wrote: > On 27/10/2016 12:45 AM, Thomas St?fe wrote: >> Hi Chris, >> >> Thanks for the review! >> >> New >> webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/ > > Have not looked at this yet. > >> Comments inline. >> >> On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer > > wrote: >> >> Hi Tomas, >> >> See JDK-8156821. I'm curious as to how your changes will impact it, >> since David says you can't interrupt a thread blocked trying to >> acquire mutex. 
I suspect that means this enhancement won't help in >> this case, and presumably in general you are not fixing the issue of >> error reporting getting deadlocked, or maybe I'm misinterpreting >> what David said in JDK-8156821. > > That should be 8156823 > >> >> >> Not sure what exactly David meant with "You can't "interrupt" a thread >> that is blocked trying to acquire a mutex." Maybe he can elaborate :) >> >> My guesses: >> >> 1) If he meant "you cannot interrupt a thread blocking in >> pthread_mutex_lock()" - not true, you can and my patch works just fine >> in this situation. Just tested again, to be sure. This covers crashes in >> sections guarded by pthread_mutex, which then try to reaquire the lock >> in the error handler. > > There is no specified, portable way to get a thread blocked acquiring > a mutex to stop waiting for the mutex. That is what I meant. > pthread_mutex_lock is not a cancellation point, nor will it return > EINTR in response to a signal. > > However, if a signal is received by the thread while waiting then > POSIX semantics indicate that the signal handler will run and then > return the thread to the waiting state. In our case the crash handler > does not return so we are into undefined territory there - but our > crash handler is already not a well-defined signal handler as it is > not restricted to async-signal-safe functions, so we already run a > risk when executing it. > > I had not considered this aspect in relation to 8156823, so the > proposed approach here would also attempt to address that issue. > >> 2) If he meant "you cannot interrupt malloc if it is executing a system >> call in the linux kernel" - that may be. I am not a linux kernel expert >> but would have thought that syscalls may block if interrupts are >> disabled for certain lengths by the syscall author. But in that case i >> would have expected the process to hang too and to be not killable? >> Again, I am no expert. 
> > Note "interrupt" here is a logical concept not related to hardware > level interrupts. I don't know at what point going into malloc you > will no longer get signal handlers run - malloc doesn't use pthread > level mutexes, but direct futexes, so the same signal responsiveness > may not be present. It probably would not be all that hard to reproduce the malloc crash from 8156823 and then see how Thomas' changes affect how VMError handles it. Just find an appropriate place in the VM to malloc a chunk of memory, overwrite the bytes before and after it, and then call free. Chris > > > Thanks, > David > ----- > >> >> >> Otherwise overall your changes look good, but I have a few comments. >> Also, since this is an enhancement, it needs to wait for JDK 10. >> >> I think your test will fail for product builds. You should add >> "@requires vm.debug == true". Also, java files use 4 char >> indentation, not 2 like we use in hotspot C/C++ code. Lastly, it >> should only have a 2016 copyright. >> >> >> Thank you for the hints. Did fix all that. Note that I had disabled the >> test for product builds in the code (!Platform.isDebugBuild()) but I >> added the vm.debug tag as well as you suggested. >> >> >> A couple of files need the copyright updated to 2016. >> >> Why do set_to_now() and get_timestamp() need to be atomic, and what >> are the consequences of cx8 not being supported? >> >> >> The error reporting thread sets the timestamp on each STEP start, and >> the timestamp is read from another thread, the WatcherThread. Timestamp >> is 64bit. I wanted to make sure the 64bit value is written and read >> atomically, especially on 32bit platforms. >> >> But then, I had to check whether 64bit atomic stores/loads are even >> supported by this platform (I actually did not find a 32bit platform >> without 64bit atomics, but the comment in atomic.hpp is pretty >> insistent and I did not want to risk regressions for other platforms).
>> >> Well, if no cx8 support was available, I pretty much just give up and >> read and write timestamps directly. As I said, I am not sure if this >> code path gets ever executed. >> >> Maybe I was overthinking all this and just reading and writing the (C++ >> volatile) jlongs would have been enough, but I wanted to prevent >> sporadic test errors because of incompletely read 64bit values. >> >> >> 1282 st->print_raw_cr(buffer); >> 1283 st->cr(); >> >> The old code had an additional st->cr() before the above lines. I >> assume you removed it intentionally. >> >> >> I hope I preserved the numbers of cr(). At least that was my intention: >> >> 1260 outputStream* const st = log.is_open() ? &log : &out; >> 1261 st->cr(); >> >> ... >> >> and then on every path, a cr (or print_raw_cr) at the end. Where do you >> see the missing cr()? >> >> >> >> Is there a reason why you decided to only allow one step to timeout. >> What if the cause of a timeout in a step also impacts other steps, >> or is that not common when we see timeouts? >> >> >> That is mostly guesswork. In our (SAP) code we allow for four steps (so >> ErrorLogTimeout/4 as step timeout) and additionally allow for "steps >> known to be long" where timeouts are disabled altogether. But we also >> have more complicated error reporting steps, so when porting the patch >> to OpenJDK, I felt the complexity was unneeded. >> >> I think in general you will only have one misbehaving step, but you are >> right, more than one step may timeout if e.g. the file system is slow. >> I'm open for suggestions: the timeout value should be large enough not >> to be hit for "normal slow steps" while still leave room enough for >> other steps to finish. What do you think a reasonable timeout value >> would be? ErrorLogTimeout/4? >> >> >> >> It's not clear to me why you changed a couple of os::sleep() calls >> to os::naked_short_sleep(), and the rationale for the sleep periods. >> Can you please explain? 
>> >> >> Because os::sleep() does a lot of work under the hood and relies on a >> bit of VM infrastructure. I think that is not a good idea in error >> situations where potentially everything may be broken already. You want >> to step lightly and really only do a naked system sleep. About the sleep >> periods, os::naked_sleep has an inbuilt maximum value of 1000ms, which I >> have to stay below to not hit the assert. I did use 999ms as the longest >> interval I am allowed to sleep nakedly. And after the timeout hit and >> before the WatcherThread calls os::abort, I again sleep 200ms to give >> the error reporter thread time to write the "error log aborted due to >> timeout" into the error log and to flush the error log. Those 200ms are >> just guesswork. >> >> >> thanks, >> >> Chris >> >> >> Thanks for the review! >> >> Kind Regards, Thomas >> >> >> >> >> On 10/12/16 9:55 PM, Thomas St?fe wrote: >> >> Dear all, >> >> please take a look at the following fix: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 >> >> webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html >> >> >> --- >> >> In short, this fix provides the ability to cancel hanging error >> reporting >> steps. This uses the same code paths secondary error handling >> uses during >> error reporting. With this patch, steps which take too long >> will be >> canceled after 1/2 ErrorLogTimeout. In the log file, it will >> look like this: >> >> 4 [timeout occurred during error reporting in step ""] >> after xxxx >> ms. >> 5 >> >> and we now also get a finish message in the hs-err file if we >> hit the >> ErrorLogTimeout and error reporting will stop altogether: >> >> 6 ------ Timout during error reporting after xxx ms. 
------ >> >> (in addition to the "time expired, abort" message the >> WatcherThread writes >> to stderr) >> >> --- >> >> This is something which bugged us for a long time, because we >> rely heavily >> on the hs_err files for error analysis at customer sites, and >> there are a >> number of reasons why one step may hang and prevent the >> follow-up steps >> from running. >> >> It works like this: >> >> Before, when error reporting started, the WatcherThread was >> waiting for >> ErrorLogTimeout seconds, then would stop the VM. >> >> Now, the WatcherThread periodically pings error reporting, which >> checks if >> the last step did timeout. If it does, it sends a signal to the >> reporting >> thread, and the thread will continue with the next step. This >> follows the >> same path as secondary crash handling. >> >> Some implementation details: >> >> On Posix platforms, to interrupt the thread, I use pthread_kill. >> This means >> I must know the pthread id of the reporting thread, which I now >> store at >> the beginning of error reporting. We already store the reporting >> thread id >> in first_error_tid, but that I cannot use, because it gets >> set by >> os::current_thread_id(), which is not always the pthread id. >> Should we ever >> switch to only using pthread id for posix platforms, this coding >> can be >> simplified. >> >> On Windows, there is unfortunately no easy way to interrupt a >> non-cooperative thread. I would need a way to cause a SEH inside >> the target >> thread, which then would get handled by secondary error handling >> like on >> Posix platforms, but that is not easy. It is doable - one can >> suspend the >> thread, modify the thread context in a way that it will crash >> upon resume. >> But that felt a bit heavyweight for this problem. So on windows, >> timeout >> handling still works (after ErrorLogTimeout the VM gets shut >> down), but >> error reporting steps are not interruptable. 
If we feel this is >> important, >> this can be added later. >> >> Kind Regards, Thomas >> >> >> >> >> From thomas.stuefe at gmail.com Thu Oct 27 06:09:23 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 27 Oct 2016 08:09:23 +0200 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: <113f387c-cea8-7cfc-9d6a-29d0151c8a83@oracle.com> References: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> <113f387c-cea8-7cfc-9d6a-29d0151c8a83@oracle.com> Message-ID: Hi Chris, On Wed, Oct 26, 2016 at 9:27 PM, Chris Plummer wrote: > Hi Thomas, > > On 10/26/16 7:45 AM, Thomas Stüfe wrote: > > Hi Chris, > > Thanks for the review! > > New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/ > > Comments inline. > > On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer > wrote: > >> Hi Tomas, >> >> See JDK-8156821. I'm curious as to how your changes will impact it, since >> David says you can't interrupt a thread blocked trying to acquire mutex. I >> suspect that means this enhancement won't help in this case, and presumably >> in general you are not fixing the issue of error reporting getting >> deadlocked, or maybe I'm misinterpreting what David said in JDK-8156821. >> > > Not sure what exactly David meant with "You can't "interrupt" a thread > that is blocked trying to acquire a mutex." Maybe he can elaborate :) > > My guesses: > > 1) If he meant "you cannot interrupt a thread blocking in > pthread_mutex_lock()" - not true, you can and my patch works just fine in > this situation. Just tested again, to be sure. This covers crashes in > sections guarded by pthread_mutex, which then try to reacquire the lock in > the error handler. > > 2) If he meant "you cannot interrupt malloc if it is executing a system > call in the linux kernel" - that may be.
I am not a linux kernel expert > but would have thought that syscalls may block if interrupts are disabled > for certain lengths by the syscall author. But in that case i would have > expected the process to hang too and to be not killable? Again, I am no > expert. > > Ok. I'll let David explain once he's available. > > > >> >> Otherwise overall your changes look good, but I have a few comments. >> Also, since this is an enhancement, it needs to wait for JDK 10. >> >> I think your test will fail for product builds. You should add "@requires >> vm.debug == true". Also, java files use 4 char indentation, not 2 like we >> use in hotspot C/C++ code. Lastly, it should only have a 2016 copyright. >> >> > Thank you for the hints. Did fix all that. Note that I had disabled the > test for product builds in the code (!Platform.isDebugBuild()) but I > added the vm.debug tag as well as you suggested. > > Ah, sorry I missed that, but IMHO the Platform checks should only be used > to alter test behavior, not completely disable the entire test. @requires > is best for disabling a test for certain platforms and builds. You should > probably remove the Platform checks and also add 'os.family != "windows"' > to the @requires line. > Good point, thank you! Will adjust the test accordingly. > > > >> A couple of files need the copyright updated to 2016. >> >> Why do set_to_now() and get_timestamp() need to be atomic, and what are >> the consequences of cx8 not being supported? >> >> > The error reporting thread sets the timestamp on each STEP start, and the > timestamp is read from another thread, the WatcherThread. Timestamp is > 64bit. I wanted to make sure the 64bit value is written and read > atomically, especially on 32bit platforms. 
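The single-writer, single-reader timestamp described above can be sketched in portable C++ as follows. This is only an illustration under stated assumptions, not the code in the webrev: StepTimestamp, set(), get() and step_timed_out() are hypothetical names, and std::atomic stands in for the HotSpot Atomic helpers. The point is that a 64-bit atomic load can never observe a half-written value, which is the word-tearing concern on 32-bit platforms.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the per-step timestamp; not the HotSpot API.
class StepTimestamp {
    // std::atomic guarantees untorn 64-bit loads and stores, even on 32-bit
    // targets (the implementation may fall back to a lock there).
    std::atomic<std::int64_t> _millis{0};

public:
    // Single writer: the error reporting thread, at the start of each step.
    void set_to_now() {
        using namespace std::chrono;
        auto now = duration_cast<milliseconds>(
            steady_clock::now().time_since_epoch());
        _millis.store(now.count(), std::memory_order_relaxed);
    }

    // Explicit setter, used here only to make the sketch easy to exercise.
    void set(std::int64_t millis) {
        _millis.store(millis, std::memory_order_relaxed);
    }

    // Single reader: the watcher thread.
    std::int64_t get() const {
        return _millis.load(std::memory_order_relaxed);
    }
};

// Returns true if a step that started at `start` has been running for more
// than `timeout_ms` at time `now`. A start value of 0 means "no step running".
bool step_timed_out(std::int64_t start, std::int64_t now,
                    std::int64_t timeout_ms) {
    return start != 0 && (now - start) > timeout_ms;
}
```

Relaxed ordering is enough in this sketch because the reader only needs an untorn value, not ordering against other memory.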
> > But then, I had to check whether 64bit atomic stores/loads are even > supported by this platform (I actually did not find a 32bit platform > without 64bit atomics, but the comment in atomic.hpp is pretty insistent > and I did not want to risk regressions for other platforms). > > Well, if no cx8 support was available, I pretty much just give up and read > and write timestamps directly. As I said, I am not sure if this code path > ever gets executed. > > Maybe I was overthinking all this and just reading and writing the (C++ > volatile) jlongs would have been enough, but I wanted to prevent sporadic > test errors because of incompletely read 64bit values. > > Closed ports may not have cx8 support, although I don't believe any are > being released with JDK9. Since you just have one writer and one reader, I > think the only concern is word tearing on the read. For this reason you > likely need the cx8 support. David would know, so hopefully he can comment > on this. > > Assuming you need cx8 support, theoretically there are platforms where > your code could fail due to not having cx8 support. You could argue that > the risk of word tearing is minimal, both in likelihood of it happening > (race condition on a platform we aren't currently officially supporting), > and the possible negative behavior if it does (premature timeout or > possibly no timeout, but only with debug builds after a crash). > > The other choice here is to just disable the whole timeout mechanism if > cx8 is not supported. In fact simply making set_to_now() and > get_timestamp() no-ops when cx8 is not supported would accomplish that, > although I'd suggest also adding some more explicit disabling of the code > wherever the timestamps are referenced. > Another thing I thought of would be to change the timestamp to 32bit - I only need second resolution - and somehow handle the year 2038 overflow.
But the easiest and most pragmatic way is either to ignore the problem completely for non-cx8 platforms or to disable the timeout handling there. > > BTW, the statics you added should probably all be made fields of VMError > rather than in the global scope. > > > >> 1282 st->print_raw_cr(buffer); >> 1283 st->cr(); >> >> The old code had an additional st->cr() before the above lines. I assume >> you removed it intentionally. >> >> > I hope I preserved the number of cr() calls. At least that was my intention: > > 1260 outputStream* const st = log.is_open() ? &log : &out; > 1261 st->cr(); > > ... > > and then on every path, a cr (or print_raw_cr) at the end. Where do you > see the missing cr()? > > Ok. It's just moved up about 20 lines of code now so I missed it. > > > > >> Is there a reason why you decided to only allow one step to time out? What >> if the cause of a timeout in a step also impacts other steps, or is that >> not common when we see timeouts? >> >> > That is mostly guesswork. In our (SAP) code we allow for four steps (so > ErrorLogTimeout/4 as step timeout) and additionally allow for "steps known > to be long" where timeouts are disabled altogether. But we also have more > complicated error reporting steps, so when porting the patch to OpenJDK, I > felt the complexity was unneeded. > > I think in general you will only have one misbehaving step, but you are > right, more than one step may time out if e.g. the file system is slow. I'm > open for suggestions: the timeout value should be large enough not to be > hit for "normal slow steps" while still leaving enough room for other steps > to finish. What do you think a reasonable timeout value would be? > ErrorLogTimeout/4? > > I don't think we've run into the "slow steps" case causing timeout, just > the deadlocks, so I don't really have any data to give you. If you are > primarily concerned about deadlocks, then you want ErrorLogTimeout div a > fairly large number. If you are mostly addressing slow steps, then you div > with a smallish number.
I think I'd prefer /4 over /2, maybe even bigger. > I'll think about it. Maybe 1/4 is a good compromise. > > > > >> It's not clear to me why you changed a couple of os::sleep() calls to >> os::naked_short_sleep(), and the rationale for the sleep periods. Can you >> please explain? >> >> > Because os::sleep() does a lot of work under the hood and relies on a bit > of VM infrastructure. I think that is not a good idea in error situations > where potentially everything may be broken already. You want to step > lightly and really only do a naked system sleep. About the sleep periods, > os::naked_sleep has an inbuilt maximum value of 1000ms, which I have to > stay below to not hit the assert. I did use 999ms as the longest interval I > am allowed to sleep nakedly. And after the timeout hit and before the > WatcherThread calls os::abort, I again sleep 200ms to give the error > reporter thread time to write the "error log aborted due to timeout" into > the error log and to flush the error log. Those 200ms are just guesswork. > > Ok. I'd like to see someone else comment on the os::naked_sleep() use > since it's not something I'm familiar enough with. > > thanks, > > Chris > Thanks, Chris. Thomas > > > >> thanks, >> >> Chris >> >> > Thanks for the review! > > Kind Regards, Thomas > > > >> >> On 10/12/16 9:55 PM, Thomas St?fe wrote: >> >>> Dear all, >>> >>> please take a look at the following fix: >>> >>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 >>> webrev: >>> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-E >>> rror-Reporting/webrev.00/webrev/index.html >>> >>> --- >>> >>> In short, this fix provides the ability to cancel hanging error reporting >>> steps. This uses the same code paths secondary error handling uses during >>> error reporting. With this patch, steps which take too long will be >>> canceled after 1/2 ErrorLogTimeout. 
In the log file, it will look like >>> this: >>> >>> 4 [timeout occurred during error reporting in step ""] after >>> xxxx >>> ms. >>> 5 >>> >>> and we now also get a finish message in the hs-err file if we hit the >>> ErrorLogTimeout and error reporting will stop altogether: >>> >>> 6 ------ Timout during error reporting after xxx ms. ------ >>> >>> (in addition to the "time expired, abort" message the WatcherThread >>> writes >>> to stderr) >>> >>> --- >>> >>> This is something which bugged us for a long time, because we rely >>> heavily >>> on the hs_err files for error analysis at customer sites, and there are a >>> number of reasons why one step may hang and prevent the follow-up steps >>> from running. >>> >>> It works like this: >>> >>> Before, when error reporting started, the WatcherThread was waiting for >>> ErrorLogTimeout seconds, then would stop the VM. >>> >>> Now, the WatcherThread periodically pings error reporting, which checks >>> if >>> the last step did timeout. If it does, it sends a signal to the reporting >>> thread, and the thread will continue with the next step. This follows the >>> same path as secondary crash handling. >>> >>> Some implementation details: >>> >>> On Posix platforms, to interrupt the thread, I use pthread_kill. This >>> means >>> I must know the pthread id of the reporting thread, which I now store at >>> the beginning of error reporting. We already store the reporting thread >>> id >>> in first_error_tid, but that I cannot use, because it gets set by >>> os::current_thread_id(), which is not always the pthread id. Should we >>> ever >>> switch to only using pthread id for posix platforms, this coding can be >>> simplified. >>> >>> On Windows, there is unfortunately no easy way to interrupt a >>> non-cooperative thread. I would need a way to cause a SEH inside the >>> target >>> thread, which then would get handled by secondary error handling like on >>> Posix platforms, but that is not easy. 
It is doable - one can suspend the >>> thread, modify the thread context in a way that it will crash upon >>> resume. >>> But that felt a bit heavyweight for this problem. So on windows, timeout >>> handling still works (after ErrorLogTimeout the VM gets shut down), but >>> error reporting steps are not interruptible. If we feel this is >>> important, >>> this can be added later. >>> >>> Kind Regards, Thomas >>> >> >> >> >> > > From thomas.stuefe at gmail.com Thu Oct 27 07:16:39 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 27 Oct 2016 09:16:39 +0200 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com> References: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com> Message-ID: Hi David, On Thu, Oct 27, 2016 at 2:40 AM, David Holmes wrote: > On 27/10/2016 12:45 AM, Thomas Stüfe wrote: > >> Hi Chris, >> >> Thanks for the review! >> >> New >> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/ >> > > Have not looked at this yet. > > Comments inline. >> >> On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer > > wrote: >> >> Hi Tomas, >> >> See JDK-8156821. I'm curious as to how your changes will impact it, >> since David says you can't interrupt a thread blocked trying to >> acquire mutex. I suspect that means this enhancement won't help in >> this case, and presumably in general you are not fixing the issue of >> error reporting getting deadlocked, or maybe I'm misinterpreting >> what David said in JDK-8156821. >> > > That should be 8156823 > > >> >> Not sure what exactly David meant with "You can't "interrupt" a thread >> that is blocked trying to acquire a mutex."
Maybe he can elaborate :) >> >> My guesses: >> >> 1) If he meant "you cannot interrupt a thread blocking in >> pthread_mutex_lock()" - not true, you can and my patch works just fine >> in this situation. Just tested again, to be sure. This covers crashes in >> sections guarded by pthread_mutex, which then try to reacquire the lock >> in the error handler. >> > > There is no specified, portable way to get a thread blocked acquiring a > mutex to stop waiting for the mutex. That is what I meant. > pthread_mutex_lock is not a cancellation point, nor will it return EINTR in > response to a signal. > > However, if a signal is received by the thread while waiting then POSIX > semantics indicate that the signal handler will run and then return the > thread to the waiting state. In our case the crash handler does not return > so we are into undefined territory there - but our crash handler is already > not a well-defined signal handler as it is not restricted to > async-signal-safe functions, so we already run a risk when executing it. > > That was what I meant. Syscalls do not have to be interruptible by design; they only have to let user signal handlers run for asynchronous signals. And I think my patch does not make matters worse or more unsafe. There is no new concept here - I use the pre-existing secondary signal handling, and only in situations in which the error handling would otherwise very probably hang forever, not producing any error log at all. > I had not considered this aspect in relation to 8156823, so the proposed > approach here would also attempt to address that issue. > > 2) If he meant "you cannot interrupt malloc if it is executing a system >> call in the linux kernel" - that may be. I am not a linux kernel expert >> but would have thought that syscalls may block if interrupts are >> disabled for certain lengths by the syscall author. But in that case I >> would have expected the process to hang too and to be not killable? >> Again, I am no expert.
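The POSIX behaviour discussed above (a signal sent with pthread_kill to a thread blocked in pthread_mutex_lock() runs the handler, after which the thread goes back to waiting) can be sketched as below. This is an illustration only, not code from the webrev: SIGUSR1 is an arbitrary choice, on_sigusr1 and handler_runs_while_blocked are made-up names, and unlike the HotSpot crash handler this handler returns normally, so the sketch stays within defined behaviour.

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static volatile sig_atomic_t g_handler_ran = 0;
static std::atomic<bool> g_about_to_block{false};
static std::atomic<bool> g_acquired{false};

// Async-signal-safe handler: it only sets a flag and returns.
extern "C" void on_sigusr1(int) { g_handler_ran = 1; }

static void* blocked_thread(void*) {
    g_about_to_block.store(true);
    pthread_mutex_lock(&g_lock);    // blocks: the caller holds the lock
    g_acquired.store(true);
    pthread_mutex_unlock(&g_lock);
    return nullptr;
}

// Returns true if the handler ran in the target thread while that thread
// still had not acquired the mutex, and the thread got the mutex afterwards.
bool handler_runs_while_blocked() {
    struct sigaction sa = {};
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, nullptr);

    pthread_mutex_lock(&g_lock);
    pthread_t t;
    pthread_create(&t, nullptr, blocked_thread, nullptr);
    while (!g_about_to_block.load()) usleep(1000);
    usleep(50 * 1000);              // heuristic: let the thread really block

    pthread_kill(t, SIGUSR1);       // direct the signal at that one thread
    for (int i = 0; i < 1000 && !g_handler_ran; i++) usleep(1000);
    bool ran_while_blocked = g_handler_ran && !g_acquired.load();

    pthread_mutex_unlock(&g_lock);  // now let the waiter proceed
    pthread_join(t, nullptr);
    return ran_while_blocked && g_acquired.load();
}
```

The mutex is never released by the signal itself; the waiter only proceeds once the holder unlocks, which matches the point that pthread_mutex_lock does not return EINTR.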
>> > > Note "interrupt" here is a logical concept not related to hardware level > interrupts. I don't know at what point going into malloc you will no longer > get signal handlers run - malloc doesn't use pthread level mutexes, but > direct futexes, so the same signal responsiveness may not be present. > I will have to take a closer look at what glibc does. I always thought that any locks it takes in user space are interruptible by signals, and that libc calls only become uninterruptible when they call kernel syscalls - and those kernel syscalls cannot be interrupted (that was what I meant with interrupts disabled). I may be wrong. I'll have to rethink this. But whatever the outcome, there may be situations where a thread cannot be interrupted by pthread_kill, but I think those cases are rare. More often we just wait in very ordinary situations, be it a pthread mutex deadlock or a slow file system. For example, one of the more common scenarios is when you want to print a stack trace and try to load symbol information to resolve a pc to a name. Thanks, Thomas > > Thanks, > David > ----- > > > >> >> Otherwise overall your changes look good, but I have a few comments. >> Also, since this is an enhancement, it needs to wait for JDK 10. >> >> I think your test will fail for product builds. You should add >> "@requires vm.debug == true". Also, java files use 4 char >> indentation, not 2 like we use in hotspot C/C++ code. Lastly, it >> should only have a 2016 copyright. >> >> >> Thank you for the hints. Did fix all that. Note that I had disabled the >> test for product builds in the code (!Platform.isDebugBuild()) but I >> added the vm.debug tag as well as you suggested. >> >> >> A couple of files need the copyright updated to 2016. >> >> Why do set_to_now() and get_timestamp() need to be atomic, and what >> are the consequences of cx8 not being supported?
>> >> >> The error reporting thread sets the timestamp on each STEP start, and >> the timestamp is read from another thread, the WatcherThread. Timestamp >> is 64bit. I wanted to make sure the 64bit value is written and read >> atomically, especially on 32bit platforms. >> >> But then, I had to check whether 64bit atomic stores/loads are even >> supported by this platform (I actually did not find a 32bit platform >> whithout 64bit atomics, but the comment in atomic.hpp is pretty >> insistent and I did not want to risk regressions for other platforms). >> >> Well, if no cx8 support was available, I pretty much just give up and >> read and write timestamps directly. As I said, I am not sure if this >> code path gets ever executed. >> >> Maybe I was overthinking all this and just reading and writing the (C++ >> volatile) jlongs would have been enough, but I wanted to prevent >> sporadic test errors because of incompletely read 64bit values. >> >> >> 1282 st->print_raw_cr(buffer); >> 1283 st->cr(); >> >> The old code had an additional st->cr() before the above lines. I >> assume you removed it intentionally. >> >> >> I hope I preserved the numbers of cr(). At least that was my intention: >> >> 1260 outputStream* const st = log.is_open() ? &log : &out; >> 1261 st->cr(); >> >> ... >> >> and then on every path, a cr (or print_raw_cr) at the end. Where do you >> see the missing cr()? >> >> >> >> Is there a reason why you decided to only allow one step to timeout. >> What if the cause of a timeout in a step also impacts other steps, >> or is that not common when we see timeouts? >> >> >> That is mostly guesswork. In our (SAP) code we allow for four steps (so >> ErrorLogTimeout/4 as step timeout) and additionally allow for "steps >> known to be long" where timeouts are disabled altogether. But we also >> have more complicated error reporting steps, so when porting the patch >> to OpenJDK, I felt the complexity was unneeded. 
>> >> I think in general you will only have one misbehaving step, but you are >> right, more than one step may timeout if e.g. the file system is slow. >> I'm open for suggestions: the timeout value should be large enough not >> to be hit for "normal slow steps" while still leave room enough for >> other steps to finish. What do you think a reasonable timeout value >> would be? ErrorLogTimeout/4? >> >> >> >> It's not clear to me why you changed a couple of os::sleep() calls >> to os::naked_short_sleep(), and the rationale for the sleep periods. >> Can you please explain? >> >> >> Because os::sleep() does a lot of work under the hood and relies on a >> bit of VM infrastructure. I think that is not a good idea in error >> situations where potentially everything may be broken already. You want >> to step lightly and really only do a naked system sleep. About the sleep >> periods, os::naked_sleep has an inbuilt maximum value of 1000ms, which I >> have to stay below to not hit the assert. I did use 999ms as the longest >> interval I am allowed to sleep nakedly. And after the timeout hit and >> before the WatcherThread calls os::abort, I again sleep 200ms to give >> the error reporter thread time to write the "error log aborted due to >> timeout" into the error log and to flush the error log. Those 200ms are >> just guesswork. >> >> >> thanks, >> >> Chris >> >> >> Thanks for the review! >> >> Kind Regards, Thomas >> >> >> >> >> On 10/12/16 9:55 PM, Thomas St?fe wrote: >> >> Dear all, >> >> please take a look at the following fix: >> >> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 >> >> webrev: >> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging- >> Error-Reporting/webrev.00/webrev/index.html >> > Error-Reporting/webrev.00/webrev/index.html> >> >> --- >> >> In short, this fix provides the ability to cancel hanging error >> reporting >> steps. This uses the same code paths secondary error handling >> uses during >> error reporting. 
With this patch, steps which take too long will >> be >> canceled after 1/2 ErrorLogTimeout. In the log file, it will >> look like this: >> >> 4 [timeout occurred during error reporting in step ""] >> after xxxx >> ms. >> 5 >> >> and we now also get a finish message in the hs-err file if we >> hit the >> ErrorLogTimeout and error reporting will stop altogether: >> >> 6 ------ Timout during error reporting after xxx ms. ------ >> >> (in addition to the "time expired, abort" message the >> WatcherThread writes >> to stderr) >> >> --- >> >> This is something which bugged us for a long time, because we >> rely heavily >> on the hs_err files for error analysis at customer sites, and >> there are a >> number of reasons why one step may hang and prevent the >> follow-up steps >> from running. >> >> It works like this: >> >> Before, when error reporting started, the WatcherThread was >> waiting for >> ErrorLogTimeout seconds, then would stop the VM. >> >> Now, the WatcherThread periodically pings error reporting, which >> checks if >> the last step did timeout. If it does, it sends a signal to the >> reporting >> thread, and the thread will continue with the next step. This >> follows the >> same path as secondary crash handling. >> >> Some implementation details: >> >> On Posix platforms, to interrupt the thread, I use pthread_kill. >> This means >> I must know the pthread id of the reporting thread, which I now >> store at >> the beginning of error reporting. We already store the reporting >> thread id >> in first_error_tid, but that I cannot use, because it gets set by >> os::current_thread_id(), which is not always the pthread id. >> Should we ever >> switch to only using pthread id for posix platforms, this coding >> can be >> simplified. >> >> On Windows, there is unfortunately no easy way to interrupt a >> non-cooperative thread. 
I would need a way to cause a SEH inside >> the target >> thread, which then would get handled by secondary error handling >> like on >> Posix platforms, but that is not easy. It is doable - one can >> suspend the >> thread, modify the thread context in a way that it will crash >> upon resume. >> But that felt a bit heavyweight for this problem. So on windows, >> timeout >> handling still works (after ErrorLogTimeout the VM gets shut >> down), but >> error reporting steps are not interruptable. If we feel this is >> important, >> this can be added later. >> >> Kind Regards, Thomas >> >> >> >> >> >> From staffan.larsen at oracle.com Thu Oct 27 08:36:48 2016 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Thu, 27 Oct 2016 10:36:48 +0200 Subject: RFR(S): 8168305 GC.class_stats should not require -XX:+UnlockDiagnosticVMOptions Message-ID: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> All, Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided. While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead. 
I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java bug: https://bugs.openjdk.java.net/browse/JDK-8168305 webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/ Thanks, /Staffan From robbin.ehn at oracle.com Thu Oct 27 09:23:34 2016 From: robbin.ehn at oracle.com (Robbin Ehn) Date: Thu, 27 Oct 2016 11:23:34 +0200 Subject: RFR(S): 8168305 GC.class_stats should not require -XX:+UnlockDiagnosticVMOptions In-Reply-To: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> Message-ID: <7e63d940-5ab1-5858-9d0e-c352d16c989a@oracle.com> Looks good! /Robbin On 10/27/2016 10:36 AM, Staffan Larsen wrote: > All, > > Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided. > > While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead. 
> > I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java > > bug: https://bugs.openjdk.java.net/browse/JDK-8168305 > webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/ > > Thanks, > /Staffan > From marcus.larsson at oracle.com Thu Oct 27 09:41:48 2016 From: marcus.larsson at oracle.com (Marcus Larsson) Date: Thu, 27 Oct 2016 11:41:48 +0200 Subject: RFR(S): 8168305 GC.class_stats should not require -XX:+UnlockDiagnosticVMOptions In-Reply-To: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> Message-ID: <7a8a2e7d-1751-1743-fbfd-a39d818ff2f4@oracle.com> Hi Staffan, On 2016-10-27 10:36, Staffan Larsen wrote: > All, > > Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided. > > While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead. 
> > I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java > > bug: https://bugs.openjdk.java.net/browse/JDK-8168305 > webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/ The command description still mentions the flag, see src/share/vm/services/diagnosticCommand.hpp:389 Apart from that this looks good to me! Thanks, Marcus > > Thanks, > /Staffan From martin.doerr at sap.com Thu Oct 27 10:10:13 2016 From: martin.doerr at sap.com (Doerr, Martin) Date: Thu, 27 Oct 2016 10:10:13 +0000 Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 In-Reply-To: <13e4100e-b385-6c71-8222-d36819f2fbdd@oracle.com> References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap> <7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com> <13e4100e-b385-6c71-8222-d36819f2fbdd@oracle.com> Message-ID: <39282c5a307648cb931e0c2781cc5810@DEWDFE13DE10.global.corp.sap> Hi Coleen, thanks for your email and for opening the bug. Reloading of ConstMethod is not restricted to load_mirror(). E.g. SPARC's generate_fixed_frame loads it 4 times. Therefore, I have added a comment to the bug. I guess the load_mirror change alone is not so relevant, but I appreciate any cleanup there as well. Thanks and best regards, Martin -----Original Message----- From: Coleen Phillimore [mailto:coleen.phillimore at oracle.com] Sent: Mittwoch, 26. Oktober 2016 21:32 To: Doerr, Martin ; hotspot-runtime-dev at openjdk.java.net Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867 On 10/20/16 4:58 AM, Doerr, Martin wrote: > Hi Coleen, > > thank you very much for reviewing my PPC change. > > We had originally spent a lot of effort to get the template interpreter fast. I think startup performance is still important. 
> A large amount of less optimized changes will make it slower over time. > That's why we have reduced reloading constMethod in the PPC implementation. I think this would be good for other platforms as well. > Maybe we should improve them in 10. I don't know. I though load_mirror() made for a nice API. Does the extra indirect matter? I filed RFE https://bugs.openjdk.java.net/browse/JDK-8168795 so we can investigate further in 10. This is approved and I think reviewed so you can check it in anytime. I put a due date of Friday on your bug. Feel free to change it if that's not good. Thanks, Coleen > > Best regards, > Martin > > > -----Original Message----- > From: hotspot-runtime-dev > [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of > Coleen Phillimore > Sent: Dienstag, 18. Oktober 2016 23:56 > To: hotspot-runtime-dev at openjdk.java.net > Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter > after 8154580 and 8154867 > > > This seems good. I think it's a shame to change load_mirror() to > load_mirror_from_const_method() though because there's load_mirror() > with the same parameters on all the other platforms and it makes > platform development a little easier. But that's up to you to because > you can generate shorter sequences. > > Coleen > > > On 10/17/16 12:38 PM, Doerr, Martin wrote: >> Hi, >> >> I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9: >> >> 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed). >> >> 8154867 is the PPC64 port of "better byte behavior". 
The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev. >> >> Webrev is here: >> http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webre >> v.00/ >> >> Please review. >> >> Thanks and best regards, >> Martin >> From staffan.larsen at oracle.com Thu Oct 27 11:25:53 2016 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Thu, 27 Oct 2016 13:25:53 +0200 Subject: RFR(S): 8168305 GC.class_stats should not require -XX:+UnlockDiagnosticVMOptions In-Reply-To: <7a8a2e7d-1751-1743-fbfd-a39d818ff2f4@oracle.com> References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> <7a8a2e7d-1751-1743-fbfd-a39d818ff2f4@oracle.com> Message-ID: > On 27 Oct 2016, at 11:41, Marcus Larsson wrote: > > Hi Staffan, > > > On 2016-10-27 10:36, Staffan Larsen wrote: >> All, >> >> Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided. >> >> While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead. 
>> >> I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8168305 >> webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/ > > The command description still mentions the flag, see > src/share/vm/services/diagnosticCommand.hpp:389 Well spotted! new webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.02/ > Apart from that this looks good to me! Thanks. > > Thanks, > Marcus > >> >> Thanks, >> /Staffan From thomas.stuefe at gmail.com Thu Oct 27 12:08:03 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 27 Oct 2016 14:08:03 +0200 Subject: metaspace.cpp: why are counters in ChunkManager updated atomically? Message-ID: Hi all, I am currently working on a prototype for https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question about the ChunkManager class in metaspace.cpp. ChunkManager has the _free_chunks_total, _free_chunks_count counters. It seems the coding goes to some lengths to avoid updating them often, so instead of updating them when a chunk is freed it attempts to delay and accumulate updates. This makes changing the MetaChunk allocation quite complicated, because there are large windows during which the counters are invalid and do not reflect reality. I see that the counters are updated atomically, so I assume the reason for delaying the updates is that atomics are expensive. But I could not find a good reason why the counters are updated atomically. To me, all modifications seem to happen under lock protection (SpaceManager::expand_lock()). What am I overlooking?
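The pattern being asked about can be sketched as follows. This is only an illustration under stated assumptions, not the metaspace.cpp code: the class FreeChunkCounters and its methods are hypothetical names, and std::mutex stands in for the lock returned by SpaceManager::expand_lock(). If every modification really happens with that lock held, plain integer updates are enough, and the counters can be updated eagerly at the point where a chunk is added or removed.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for the free-chunk bookkeeping; not HotSpot code.
class FreeChunkCounters {
 public:
  // Record a chunk of word_size words being returned to the free list.
  void add_chunk(std::size_t word_size) {
    std::lock_guard<std::mutex> guard(_lock);
    _total_words += word_size;
    _count += 1;
  }

  // Record a chunk of word_size words being taken from the free list.
  void remove_chunk(std::size_t word_size) {
    std::lock_guard<std::mutex> guard(_lock);
    assert(_count > 0 && _total_words >= word_size);
    _total_words -= word_size;
    _count -= 1;
  }

  std::size_t total_words() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _total_words;
  }

  std::size_t count() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _count;
  }

 private:
  mutable std::mutex _lock;      // plays the role of the expand lock (assumption)
  std::size_t _total_words = 0;  // plain integers: the lock already serializes access
  std::size_t _count = 0;
};
```

Because both counters change inside the same critical section, a reader that takes the lock never sees them out of step with each other, which is the property the delayed updates give up.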
Thanks a lot, Kind Regards, Thomas From mikael.gerdin at oracle.com Thu Oct 27 12:41:26 2016 From: mikael.gerdin at oracle.com (Mikael Gerdin) Date: Thu, 27 Oct 2016 14:41:26 +0200 Subject: metaspace.cpp: why are counters in ChunkManager updated atomically? In-Reply-To: References: Message-ID: Hi Thomas, On 2016-10-27 14:08, Thomas Stüfe wrote: > Hi all, > > I am currently working on a prototype for > https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question about > the ChunkManager class in metaspace.cpp. > > ChunkManager has the _free_chunks_total, _free_chunks_count counters. It > seems the coding goes to some lengths to avoid updating them often, so instead > of updating them when a chunk is freed it attempts to delay and accumulate > updates. This makes changing the MetaChunk allocation quite complicated, > because there are large windows during which the counters are invalid and > do not reflect reality. > > I see that the counters are updated atomically, so I assume the reason for > delaying the updates is that atomics are expensive. But I could not find a > good reason why the counters are updated atomically. To me, all > modifications seem to happen under lock protection > (SpaceManager::expand_lock()). What am I overlooking? I don't think you are overlooking anything. The fact that these are updated with atomics is something that I've noticed as well at some point but I don't think I ever got around to fixing that. I'm not sure I understand where in the code the delayed and accumulated updates take place but if you think that's the case then it's probably true.
I suspect that at this point you are one of the handful of people who are familiar with the chunk allocation code :) Regards /Mikael > > Thanks a lot, > > Kind Regards, Thomas > From thomas.stuefe at gmail.com Thu Oct 27 12:56:48 2016 From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=) Date: Thu, 27 Oct 2016 14:56:48 +0200 Subject: metaspace.cpp: why are counters in ChunkManager updated atomically? In-Reply-To: References: Message-ID: Hi Mikael, On Thu, Oct 27, 2016 at 2:41 PM, Mikael Gerdin wrote: > Hi Thomas, > > On 2016-10-27 14:08, Thomas St?fe wrote: > >> Hi all, >> >> I am currently working on a prototype for >> https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question >> about >> the ChunkManager class in metaspace.cpp. >> >> ChunkManager has the _free_chunks_total, _free_chunks_count counters. It >> seems the coding goes some lengths to avoid updating them often, so >> instead >> of updating them when a chunk is freed it attemps to delay and accumulate >> updates. This makes changing the MetaChunk allocation quite complicated, >> because there are large windows during which the counters are invalid and >> do not reflect reality. >> >> I see that the counters are updated atomically, so I assume the reason for >> delaying the updates is that atomics are expensive. But I could not find a >> good reason why the counters are updated atomically. To me, all >> modifications seem happen under lock protection >> (SpaceManager::expand_lock()). What am I overlooking? >> > > I don't think you are overlooking anything. The fact that these are > updated with atomics is something that I've noticed as well at some point > but I don't think I ever got around to fixing that. > > > I'm not sure I understand where in the code the delayed and accumulated > updates take place but if you think that's the case then it's probably > true. 
I suspect that at this point you are one of the handful of people who > are familiar with the chunk allocation code :) > > :) That is good news, because I then can straighten the updates out, this makes making the changes much simpler. Will probably do this in a separate fix. Thank you! Thomas > Regards > /Mikael > > > >> Thanks a lot, >> >> Kind Regards, Thomas >> >> From erik.helin at oracle.com Thu Oct 27 13:26:31 2016 From: erik.helin at oracle.com (Erik Helin) Date: Thu, 27 Oct 2016 15:26:31 +0200 Subject: metaspace.cpp: why are counters in ChunkManager updated atomically? In-Reply-To: References: Message-ID: <2121d75b-4762-0961-74a0-edef212b651c@oracle.com> On 10/27/2016 02:56 PM, Thomas St?fe wrote: > Hi Mikael, > > On Thu, Oct 27, 2016 at 2:41 PM, Mikael Gerdin > wrote: > >> Hi Thomas, >> >> On 2016-10-27 14:08, Thomas St?fe wrote: >> >>> Hi all, >>> >>> I am currently working on a prototype for >>> https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question >>> about >>> the ChunkManager class in metaspace.cpp. >>> >>> ChunkManager has the _free_chunks_total, _free_chunks_count counters. It >>> seems the coding goes some lengths to avoid updating them often, so >>> instead >>> of updating them when a chunk is freed it attemps to delay and accumulate >>> updates. This makes changing the MetaChunk allocation quite complicated, >>> because there are large windows during which the counters are invalid and >>> do not reflect reality. >>> >>> I see that the counters are updated atomically, so I assume the reason for >>> delaying the updates is that atomics are expensive. But I could not find a >>> good reason why the counters are updated atomically. To me, all >>> modifications seem happen under lock protection >>> (SpaceManager::expand_lock()). What am I overlooking? >>> >> >> I don't think you are overlooking anything. 
The fact that these are >> updated with atomics is something that I've noticed as well at some point >> but I don't think I ever got around to fixing that. >> >> >> I'm not sure I understand where in the code the delayed and accumulated >> updates take place but if you think that's the case then it's probably >> true. I suspect that at this point you are one of the handful of people who >> are familiar with the chunk allocation code :) >> >> > :) That is good news, because I then can straighten the updates out, this > makes making the changes much simpler. Will probably do this in a separate > fix. Thank you! I came to the same conclusion as Mikael the last time I checked, but due to lack of time I never got around to fixing it. Please send this out as a separate patch; it will make reviewing much easier. Thanks, Erik > Thomas > > >> Regards >> /Mikael >> >> >> >>> Thanks a lot, >>> >>> Kind Regards, Thomas >>> >>> From thomas.stuefe at gmail.com Thu Oct 27 13:39:31 2016 From: thomas.stuefe at gmail.com (Thomas Stüfe) Date: Thu, 27 Oct 2016 15:39:31 +0200 Subject: metaspace.cpp: why are counters in ChunkManager updated atomically? In-Reply-To: <2121d75b-4762-0961-74a0-edef212b651c@oracle.com> References: <2121d75b-4762-0961-74a0-edef212b651c@oracle.com> Message-ID: On Thu, Oct 27, 2016 at 3:26 PM, Erik Helin wrote: > On 10/27/2016 02:56 PM, Thomas Stüfe wrote: > >> Hi Mikael, >> >> On Thu, Oct 27, 2016 at 2:41 PM, Mikael Gerdin >> wrote: >> >> Hi Thomas, >>> >>> On 2016-10-27 14:08, Thomas Stüfe wrote: >>> >>> Hi all, >>>> >>>> I am currently working on a prototype for >>>> https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question >>>> about >>>> the ChunkManager class in metaspace.cpp. >>>> >>>> ChunkManager has the _free_chunks_total, _free_chunks_count counters.
It >>>> seems the coding goes some lengths to avoid updating them often, so >>>> instead >>>> of updating them when a chunk is freed it attemps to delay and >>>> accumulate >>>> updates. This makes changing the MetaChunk allocation quite complicated, >>>> because there are large windows during which the counters are invalid >>>> and >>>> do not reflect reality. >>>> >>>> I see that the counters are updated atomically, so I assume the reason >>>> for >>>> delaying the updates is that atomics are expensive. But I could not >>>> find a >>>> good reason why the counters are updated atomically. To me, all >>>> modifications seem happen under lock protection >>>> (SpaceManager::expand_lock()). What am I overlooking? >>>> >>>> >>> I don't think you are overlooking anything. The fact that these are >>> updated with atomics is something that I've noticed as well at some point >>> but I don't think I ever got around to fixing that. >>> >>> >>> I'm not sure I understand where in the code the delayed and accumulated >>> updates take place but if you think that's the case then it's probably >>> true. I suspect that at this point you are one of the handful of people >>> who >>> are familiar with the chunk allocation code :) >>> >>> >>> :) That is good news, because I then can straighten the updates out, this >> makes making the changes much simpler. Will probably do this in a separate >> fix. Thank you! >> > > I came to the same conclusion as Mikael the last time I checked, but due > to lack of time I didn't got around to fix it. Please send this out as a > separate patch, it will make reviewing much easier. > > Thanks, > Erik > > Thanks, Erik, will do that. 
> > Thomas >> >> >> Regards >>> /Mikael >>> >>> >>> >>> Thanks a lot, >>>> >>>> Kind Regards, Thomas >>>> >>>> >>>> From david.holmes at oracle.com Thu Oct 27 17:40:49 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 28 Oct 2016 03:40:49 +1000 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: References: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> <113f387c-cea8-7cfc-9d6a-29d0151c8a83@oracle.com> Message-ID: Just picking up a couple of specific discussion points ... On 27/10/2016 4:09 PM, Thomas Stüfe wrote: > On Wed, Oct 26, 2016 at 9:27 PM, Chris Plummer > wrote: >> But then, I had to check whether 64bit atomic stores/loads are >> even supported by this platform (I actually did not find a 32bit >> platform without 64bit atomics, but the comment in atomic.hpp is >> pretty insistent and I did not want to risk regressions for other >> platforms). >> >> Well, if no cx8 support was available, I pretty much just give up >> and read and write timestamps directly. As I said, I am not sure >> if this code path gets ever executed. >> >> Maybe I was overthinking all this and just reading and writing the >> (C++ volatile) jlongs would have been enough, but I wanted to >> prevent sporadic test errors because of incompletely read 64bit >> values. > Closed ports may not have cx8 support, although I don't believe any > are being released with JDK9. Since you just have one writer and one > reader, I think the only concern is word tearing on the read. For > this reason you likely need the cx8 support. David would know, so > hopefully he can comment on this. > > Assuming you need cx8 support, theoretically there are platforms > where your code could fail due to not having cx8 support.
You could > argue that the risk of word tearing is minimal, both in likelihood > of it happening (race condition on a platform we aren't currently > officially supporting), and the possible negative behavior if it > does (premature timeout or possibly no timeout, but only with debug > builds after a crash). > > The other choice here is to just disable the whole timeout mechanism > if cx8 is not supported. In fact simply making set_to_now() and > get_timestamp() no-ops when cx8 is not supported would accomplish > that, although I'd suggest also adding some more explicit disabling > of the code wherever the timestamps are referenced. > > > Another thing I thought of would be to change the timestamp to 32bit - I > only need second resolution - and handle somehow the year 2038 overflow. > But the most easy and pragmatic way is to either ignore the problem > completely for non-cx8 or to do anything. PPC32 did not support CX8, which is why this is present in the codebase. (It is also present at the Java level.) All platforms the JVM runs on must support 64-bit atomic loads and stores by some means - to implement Java volatile long semantics. Even platforms that don't support CX8 have some means to do this, e.g. by using the FPU. But this isn't necessarily implemented in the Atomic class (it wasn't for PPC32 because there were no calls to those methods in the VM). The warnings in the atomic.hpp file were to avoid casually defining different Atomic ops for jlongs, when they could not be implemented efficiently on systems without CX8 support. So the onus was put back on the user of the API to check this and define an alternative - rather than, for example, forcing use of a global lock on such platforms. Given you are only using Atomic::load/store I think you can dispense with the supports_cx8 check, because, as I said, every platform must have some means to support such atomic loads/stores. And we currently don't have any ports that don't support CX8.
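The single-writer, single-reader timestamp discussed here can be sketched roughly as below. This is an illustration under stated assumptions rather than the code in the webrev: StepTimestamp and its methods are hypothetical names, and std::atomic stands in for HotSpot's own Atomic::load/store on a jlong. The only property needed is that a 64-bit value is never read half-written; no ordering with other memory is required, so relaxed loads and stores suffice in this sketch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the step timestamp; HotSpot itself uses its
// own Atomic class on a jlong, not std::atomic.
class StepTimestamp {
 public:
  // Called by the single writer (the error reporting thread).
  void set(std::int64_t millis) {
    _value.store(millis, std::memory_order_relaxed);
  }

  // Called by the single reader (the watcher thread).
  std::int64_t get() const {
    return _value.load(std::memory_order_relaxed);
  }

  // True if a step has started and has been running longer than limit_millis.
  bool timed_out(std::int64_t now_millis, std::int64_t limit_millis) const {
    const std::int64_t start = get();
    return start != 0 && (now_millis - start) > limit_millis;
  }

 private:
  std::atomic<std::int64_t> _value{0};  // 0 means "no step running"
};
```

On a platform without native 64-bit atomics, an implementation of this kind may fall back to a lock internally, which is the cost the atomic.hpp warnings were meant to make visible.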
>> It's not clear to me why you changed a couple of os::sleep() >> calls to os::naked_short_sleep(), and the rationale for the >> sleep periods. Can you please explain? >> >> >> Because os::sleep() does a lot of work under the hood and relies >> on a bit of VM infrastructure. I think that is not a good idea in >> error situations where potentially everything may be broken >> already. You want to step lightly and really only do a naked >> system sleep. About the sleep periods, os::naked_sleep has an >> inbuilt maximum value of 1000ms, which I have to stay below to not >> hit the assert. I did use 999ms as the longest interval I am >> allowed to sleep nakedly. And after the timeout hit and before the >> WatcherThread calls os::abort, I again sleep 200ms to give the >> error reporter thread time to write the "error log aborted due to >> timeout" into the error log and to flush the error log. Those >> 200ms are just guesswork. > Ok. I'd like to see someone else comment on the os::naked_sleep() > use since it's not something I'm familiar enough with. In this case, because we are in the WatcherThread, os::sleep will not do anything interesting that relies on other VM infrastructure (modification of osThread wait-state only, calls to javaTimeNanos). For the same reason changing to naked_sleep is also fine. We can/should relax the assert in naked_sleep so that the sleep time is only limited for JavaThreads. Thanks, David From david.holmes at oracle.com Thu Oct 27 17:45:17 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 28 Oct 2016 03:45:17 +1000 Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error logs. In-Reply-To: References: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com> <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com> Message-ID: Hi Thomas, I totally agree that your proposal makes things neither better nor worse when it comes to what we do from the signal handler, and it may help with those deadlock situations.
I'm not concerned about digging too deep into malloc to see whether it may or may not help in that particular case - it either will or it won't. Overall I think this is looking quite good. I hope we get the JDK10 repo very soon ... once the JDK10 project officially takes off (and probably once the repo consolidation project has settled on a final repo layout). Thanks, David On 27/10/2016 5:16 PM, Thomas Stüfe wrote: > Hi David, > > On Thu, Oct 27, 2016 at 2:40 AM, David Holmes > wrote: > > On 27/10/2016 12:45 AM, Thomas Stüfe wrote: > > Hi Chris, > > Thanks for the review! > > New > webrev: > http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/ > > > > Have not looked at this yet. > > Comments inline. > > On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer > > >> wrote: > > Hi Thomas, > > See JDK-8156821. I'm curious as to how your changes will > impact it, > since David says you can't interrupt a thread blocked trying to > acquire mutex. I suspect that means this enhancement won't > help in > this case, and presumably in general you are not fixing the > issue of > error reporting getting deadlocked, or maybe I'm misinterpreting > what David said in JDK-8156821. > > > That should be 8156823 > > > > Not sure what exactly David meant with "You can't "interrupt" a > thread > that is blocked trying to acquire a mutex." Maybe he can > elaborate :) > > My guesses: > > 1) If he meant "you cannot interrupt a thread blocking in > pthread_mutex_lock()" - not true, you can and my patch works > just fine > in this situation. Just tested again, to be sure. This covers > crashes in > sections guarded by pthread_mutex, which then try to reacquire > the lock > in the error handler. > > > There is no specified, portable way to get a thread blocked > acquiring a mutex to stop waiting for the mutex. That is what I > meant. pthread_mutex_lock is not a cancellation point, nor will it > return EINTR in response to a signal.
> > However, if a signal is received by the thread while waiting then > POSIX semantics indicate that the signal handler will run and then > return the thread to the waiting state. In our case the crash > handler does not return so we are into undefined territory there - > but our crash handler is already not a well-defined signal handler > as it is not restricted to async-signal-safe functions, so we > already run a risk when executing it. > > > That was what I meant. Syscalls have not to be interruptible per design, > they just have to call user signal handlers for asynchronous signals. > > And I think my patch does not make matters worse or more unsafe. There > is no new concept here - I use the pre-existing secondary signal > handling and that only in situations in which otherwise the error > handling would very probably hang forever, not producing any error log > at all. > > > I had not considered this aspect in relation to 8156823, so the > proposed approach here would also attempt to address that issue. > > 2) If he meant "you cannot interrupt malloc if it is executing a > system > call in the linux kernel" - that may be. I am not a linux kernel > expert > but would have thought that syscalls may block if interrupts are > disabled for certain lengths by the syscall author. But in that > case i > would have expected the process to hang too and to be not killable? > Again, I am no expert. > > > Note "interrupt" here is a logical concept not related to hardware > level interrupts. I don't know at what point going into malloc you > will no longer get signal handlers run - malloc doesn't use pthread > level mutexes, but direct futuxes, so the same signal responsiveness > may not be present. > > > I will have to take a closer look at what the glibc does. 
I always > thought that any locks it takes in user space are interruptible by > signals, and that libc calls only become uninterruptible when it calls > kernel syscalls - and those kernel syscalls cannot be interrupted (that > was what I meant with interrupts disabled). It may be wrong. Ill have to > rethink this. > > But whatever the outcome, there may be situations where a thread cannot > be interrupted by pthread_kill, but I think those cases are rare. More > often we just wait in very ordinary situations, be it a pthread mutex > deadlock or a slow file system. e.g. one of the more common scenarios is > when you want to print a stack trace and try to load symbol information > to resolve an pc to a name. > > Thanks, Thomas > > > > Thanks, > David > ----- > > > > > Otherwise overall your changes look good, but I have a few > comments. > Also, since this is an enhancement, it needs to wait for JDK 10. > > I think your test will fail for product builds. You should add > "@requires vm.debug == true". Also, java files use 4 char > indentation, not 2 like we use in hotspot C/C++ code. Lastly, it > should only have a 2016 copyright. > > > Thank you for the hints. Did fix all that. Note that I had > disabled the > test for product builds in the code (!Platform.isDebugBuild()) but I > added the vm.debug tag as well as you suggested. > > > A couple of files need the copyright updated to 2016. > > Why do set_to_now() and get_timestamp() need to be atomic, > and what > are the consequences of cx8 not being supported? > > > The error reporting thread sets the timestamp on each STEP > start, and > the timestamp is read from another thread, the WatcherThread. > Timestamp > is 64bit. I wanted to make sure the 64bit value is written and read > atomically, especially on 32bit platforms. 
> > But then, I had to check whether 64bit atomic stores/loads are even > supported by this platform (I actually did not find a 32bit platform > whithout 64bit atomics, but the comment in atomic.hpp is pretty > insistent and I did not want to risk regressions for other > platforms). > > Well, if no cx8 support was available, I pretty much just give > up and > read and write timestamps directly. As I said, I am not sure if this > code path gets ever executed. > > Maybe I was overthinking all this and just reading and writing > the (C++ > volatile) jlongs would have been enough, but I wanted to prevent > sporadic test errors because of incompletely read 64bit values. > > > 1282 st->print_raw_cr(buffer); > 1283 st->cr(); > > The old code had an additional st->cr() before the above > lines. I > assume you removed it intentionally. > > > I hope I preserved the numbers of cr(). At least that was my > intention: > > 1260 outputStream* const st = log.is_open() ? &log : &out; > 1261 st->cr(); > > ... > > and then on every path, a cr (or print_raw_cr) at the end. Where > do you > see the missing cr()? > > > > Is there a reason why you decided to only allow one step to > timeout. > What if the cause of a timeout in a step also impacts other > steps, > or is that not common when we see timeouts? > > > That is mostly guesswork. In our (SAP) code we allow for four > steps (so > ErrorLogTimeout/4 as step timeout) and additionally allow for "steps > known to be long" where timeouts are disabled altogether. But we > also > have more complicated error reporting steps, so when porting the > patch > to OpenJDK, I felt the complexity was unneeded. > > I think in general you will only have one misbehaving step, but > you are > right, more than one step may timeout if e.g. the file system is > slow. > I'm open for suggestions: the timeout value should be large > enough not > to be hit for "normal slow steps" while still leave room enough for > other steps to finish. 
What do you think a reasonable timeout value > would be? ErrorLogTimeout/4? > > > > It's not clear to me why you changed a couple of os::sleep() > calls > to os::naked_short_sleep(), and the rationale for the sleep > periods. > Can you please explain? > > > Because os::sleep() does a lot of work under the hood and relies > on a > bit of VM infrastructure. I think that is not a good idea in error > situations where potentially everything may be broken already. > You want > to step lightly and really only do a naked system sleep. About > the sleep > periods, os::naked_short_sleep has an inbuilt maximum value of 1000ms, > which I > have to stay below to not hit the assert. I did use 999ms as the > longest > interval I am allowed to sleep nakedly. And after the timeout > hit and > before the WatcherThread calls os::abort, I again sleep 200ms to > give > the error reporter thread time to write the "error log aborted > due to > timeout" into the error log and to flush the error log. Those > 200ms are > just guesswork. > > > thanks, > > Chris > > > Thanks for the review! > > Kind Regards, Thomas > > > > > On 10/12/16 9:55 PM, Thomas Stüfe wrote: > > Dear all, > > please take a look at the following fix: > > Bug: https://bugs.openjdk.java.net/browse/JDK-8166944 > > > > webrev: > > http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html > > > > > > --- > > In short, this fix provides the ability to cancel > hanging error > reporting > steps. This uses the same code paths secondary error > handling > uses during > error reporting. With this patch, steps which take too > long will be > canceled after 1/2 ErrorLogTimeout. In the log file, it will > look like this: > > 4 [timeout occurred during error reporting in step > ""] > after xxxx > ms.
> 5 > > and we now also get a finish message in the hs-err file > if we > hit the > ErrorLogTimeout and error reporting will stop altogether: > > 6 ------ Timout during error reporting after xxx ms. ------ > > (in addition to the "time expired, abort" message the > WatcherThread writes > to stderr) > > --- > > This is something which bugged us for a long time, > because we > rely heavily > on the hs_err files for error analysis at customer > sites, and > there are a > number of reasons why one step may hang and prevent the > follow-up steps > from running. > > It works like this: > > Before, when error reporting started, the WatcherThread was > waiting for > ErrorLogTimeout seconds, then would stop the VM. > > Now, the WatcherThread periodically pings error > reporting, which > checks if > the last step did timeout. If it does, it sends a signal > to the > reporting > thread, and the thread will continue with the next step. > This > follows the > same path as secondary crash handling. > > Some implementation details: > > On Posix platforms, to interrupt the thread, I use > pthread_kill. > This means > I must know the pthread id of the reporting thread, > which I now > store at > the beginning of error reporting. We already store the > reporting > thread id > in first_error_tid, but that I cannot use, because it > gets set by > os::current_thread_id(), which is not always the pthread id. > Should we ever > switch to only using pthread id for posix platforms, > this coding > can be > simplified. > > On Windows, there is unfortunately no easy way to > interrupt a > non-cooperative thread. I would need a way to cause a > SEH inside > the target > thread, which then would get handled by secondary error > handling > like on > Posix platforms, but that is not easy. It is doable - > one can > suspend the > thread, modify the thread context in a way that it will > crash > upon resume. > But that felt a bit heavyweight for this problem. 
So on > Windows, > timeout > handling still works (after ErrorLogTimeout the VM gets shut > down), but > error reporting steps are not interruptible. If we feel > this is > important, > this can be added later. > > Kind Regards, Thomas > > > > > > From david.holmes at oracle.com Thu Oct 27 18:15:16 2016 From: david.holmes at oracle.com (David Holmes) Date: Fri, 28 Oct 2016 04:15:16 +1000 Subject: RFR(S): 8168305 GC.class_stats should not require -XX:+UnlockDiagnosticVMOptions In-Reply-To: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> Message-ID: <2f645023-1c9a-38f4-a51d-bcaad8223d6f@oracle.com> On 27/10/2016 6:36 PM, Staffan Larsen wrote: > All, > > Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided. Right - it doesn't make sense to have to use UnlockDiagnosticVMOptions to run any diagnostic Dcmd. Otherwise it should be a requirement for all. > While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead. The only reason a Dcmd should require a specific VM option is if the Dcmd will not be able to function unless the VM was started with that option - IMHO :) Is that the case for LogTouchedMethods?
Thanks, David > I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java > > bug: https://bugs.openjdk.java.net/browse/JDK-8168305 > webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/ > > Thanks, > /Staffan > From staffan.larsen at oracle.com Thu Oct 27 19:19:27 2016 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Thu, 27 Oct 2016 21:19:27 +0200 Subject: RFR(S): 8168305 GC.class_stats should not require -XX:+UnlockDiagnosticVMOptions In-Reply-To: <2f645023-1c9a-38f4-a51d-bcaad8223d6f@oracle.com> References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com> <2f645023-1c9a-38f4-a51d-bcaad8223d6f@oracle.com> Message-ID: <1437F2CC-C254-47A0-A235-04A1B9E031CF@oracle.com> > On 27 Oct 2016, at 20:15, David Holmes wrote: > > On 27/10/2016 6:36 PM, Staffan Larsen wrote: >> All, >> >> Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided. > > Right - it doesn't make sense to have to use UnlockDiagnosticVMOptions to run any diagnostic Dcmd. Otherwise it should be a requirement for all. > >> While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead. 
> > The only reason a Dcmd should require a specific VM option is if the Dcmd will not be able to function unless the VM was started with that option - IMHO :) Is that the case for LogTouchedMethods? I believe LogTouchedMethods stores a long list of all methods ever run. We don't want to have that enabled by default. It would maybe be a good future enhancement to be able to turn this on and off? > > Thanks, > David > >> I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java >> >> bug: https://bugs.openjdk.java.net/browse/JDK-8168305 >> webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/ >> >> Thanks, >> /Staffan >> From mandy.chung at oracle.com Fri Oct 28 02:54:58 2016 From: mandy.chung at oracle.com (Mandy Chung) Date: Thu, 27 Oct 2016 19:54:58 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com> References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com> Message-ID: <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com> > On Oct 27, 2016, at 3:28 PM, Brent Christian wrote: > > Hi, Mandy > > It looks pretty good to me. Just a couple small things: > > * StackTraceElement.java > > 379 ClassLoader loader = cls.getClassLoader0(); > > It looks as if 'loader' isn't used? Good catch. Leftover code. Removed. > * Throwable.java > > 832 // VM to fill in StackTraceElement > 833 getStackTraceElements(stackTrace); > 834 // ensure the proper StackTraceElement initialization > 835 for (StackTraceElement ste : stackTrace) { > 836 ste.buildLoaderModuleClassName(); > 837 } > > For my own curiosity, why is this buildLoaderModuleClassName() call needed?
When the VM fills in the stack trace, it sets the Class object in each StackTraceElement. The buildLoaderModuleClassName() call here is needed to (1) build the output string, whose format is described in the javadoc and which is stored in the serial form, and (2) avoid holding a strong reference to the Class object. StackTraceElement is serializable, and it can't build the correct string once deserialized. Mandy From serguei.spitsyn at oracle.com Fri Oct 28 07:06:39 2016 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Fri, 28 Oct 2016 00:06:39 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> Message-ID: Hi Mandy, I have a few comments. http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java.udiff.html private static class BootClassLoader extends BuiltinClassLoader { BootClassLoader(URLClassPath bcp) { - super(null, bcp); + super(null, null, bcp); } . . . PlatformClassLoader(BootClassLoader parent) { - super(parent, null); + super("platform", parent, null); } . . . AppClassLoader(PlatformClassLoader parent, URLClassPath ucp) { - super(parent, ucp); + super("app", parent, ucp); this.ucp = ucp; } Can we give the bootstrap classloader the name "boot" or "bootstrap"? Or would this impact too many places and so be very risky to do? http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/java/lang/StackTraceElement.java.frames.html 379 ClassLoader loader = cls.getClassLoader0(); The loader is unused.
402 private static String toLoaderModuleClassName(Class cls) { 403 ClassLoader loader = cls.getClassLoader0(); 404 Module m = cls.getModule(); 405 406 // First element - class loader name 407 String s = ""; 408 if (loader != null && !(loader instanceof BuiltinClassLoader) && 409 loader.getName() != null) { 410 s = loader.getName() + "/"; 411 } 412 413 // Second element - module name and version 414 if (m != null && m.isNamed()) { 415 s = s.isEmpty() ? m.getName() : s + m.getName(); 416 // drop version if it's JDK module tied with java.base, 417 // i.e. non-upgradeable 418 if (!HashedModules.contains(m)) { 419 Optional ov = m.getDescriptor().version(); 420 if (ov.isPresent()) { 421 String version = "@" + ov.get().toString(); 422 s = s.isEmpty() ? version : s + version; 423 } 424 } 425 } 426 427 // fully-qualified class name 428 return s.isEmpty() ? cls.getName() : s + "/" + cls.getName(); 429 } Also, the lines 415 and 422 can be simplified: 415 s += m.getName(); 422 s += version; Also, if the loader has a name but (m == null || !m.isNamed()) then it looks like the sign "/" will be added twice (see L410 and L428). It can be fixed and simplified with: Add line before 425: s += "/"; 428 return s + cls.getName(); Also, it is not clear why the loader name is not included for an instance of theBuiltinClassLoader? Would it make sense to add a comment explaining it? Thanks, Serguei On 10/25/16 16:10, Mandy Chung wrote: > Webrev at: > http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/ > > Specdiff: > http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html > > This is a long-standing RFE for adding support for class > loader names. It's #ClassLoaderNames on JSR 376 issue > list where the proposal [1] has been implemented in jake > for some time. This patch brings this change to jdk9. > > A short summary: > - New constructors are added in ClassLoader, SecureClassLoader > and URLClassLoader to specify the class loader name. 
> > - New ClassLoader::getName and StackTraceElement::getClassLoaderName > method > > - StackTraceElement::toString is updated to include the name > of the class loader and module of that frame in this format: > //(:) > > The detail is in StackTraceElement::buildLoaderModuleClassName > that compress the output string for cases when the loader > has no name or the module is unnamed module. Another thing > to mention is that VM sets the Class object when filling in > a stack trace of a Throwable object. Then the library will > build a String from the Class object for serialization purpose. > > Mandy > [1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html From thomas.schatzl at oracle.com Fri Oct 28 11:31:53 2016 From: thomas.schatzl at oracle.com (Thomas Schatzl) Date: Fri, 28 Oct 2016 13:31:53 +0200 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: <1477654313.3851.11.camel@oracle.com> Hi Hiroshi, first, apologies for taking so long for an answer. Sorry. On Mon, 2016-10-17 at 10:44 +0900, Hiroshi H Horii wrote: > Hi David, > > Thank you for your comments. > > > Do you have any metrics on this latest version? > > [...] > > > > I think the GC experts need to have a discussion to resolve things > > to their mutual satisfaction. > > Thank you for lots of your comments and suggestions. And lots of my > mistakes made the discussion long. very sorry. I would like to know > comments of GC experts. We in the gc team have discussed this change quite a bit internally. Overall, we think this change seems far too risky from both a functional and performance perspective to go into 9 at this time.
The current proposal lacks some clear analysis on why removing the barriers is safe; most analysis in this thread has been "it is fine" and "the code is faster and does not crash" on one particular platform for one particular application, and that seems too little. We at least expect the change to be not only analyzed "good" in a review, but also tested thoroughly on all platforms affected (which are all of them in the latest change). We can of course help with testing on platforms we support. We also think the testing needs to include both functional and performance testing, and the performance testing ought to be using some well-chosen benchmarks. (It was pointed out very early in the discussion of this change that specjbb2013 is deprecated, yet that is the only benchmark that's been reported out.) The most recent change also penalizes current platforms that do not implement the release-CAS with an additional acquire. That might be not an issue for TSO platforms, but others will be affected. While we think other platforms could quickly adapt to this, this would force that the developer that implements this for other platforms (arm/aarch64) to be stuck with re-analyzing these issues. We do not think this is fair. We think this is a change (or set of changes) that needs to be pushed for all platforms at the same time. There is also one (minor) question about the change: why isn't the CAS result value being used for the failing paths of the CAS, rather than reloaded in copy_to_survivor_space? Thanks,
Thomas From mandy.chung at oracle.com Fri Oct 28 20:44:19 2016 From: mandy.chung at oracle.com (Mandy Chung) Date: Fri, 28 Oct 2016 13:44:19 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com> <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com> Message-ID: <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com> > On Oct 28, 2016, at 11:11 AM, Brent Christian wrote: > > Should something be done for STEs returned from StackFrameInfo.toStackTraceElement() ? Good catch - I missed it. I added package-private static methods in StackTraceElement class for both Throwable and StackFrameInfo to get StackTraceElement(s). http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.02/ Mandy From david.holmes at oracle.com Fri Oct 28 21:09:45 2016 From: david.holmes at oracle.com (David Holmes) Date: Sat, 29 Oct 2016 07:09:45 +1000 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> Message-ID: Hi Mandy, I know it's rather late in the game to notice this but I only just noticed this due to Serguei's comment ... On 28/10/2016 5:06 PM, serguei.spitsyn at oracle.com wrote: > Hi Mandy, > > > I have a few comments. > > http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java.udiff.html > > > private static class BootClassLoader extends BuiltinClassLoader { > BootClassLoader(URLClassPath bcp) { > - super(null, bcp); > + super(null, null, bcp); > } > . . . > > PlatformClassLoader(BootClassLoader parent) { > - super(parent, null); > + super("platform", parent, null); > } > > . . . 
> > AppClassLoader(PlatformClassLoader parent, URLClassPath ucp) { > - super(parent, ucp); > + super("app", parent, ucp); > this.ucp = ucp; > } > > > Can we give the bootstrap classloader the name "boot" or "bootstrap"? > Or this will impact too many places, and so, very risky to do? Given the BootClassLoader instance is not in fact the boot loader at all I think it would have been clearer and avoid potential confusion to call this something more representative of its purpose - perhaps BootResourceloader or BootLoaderHelper or ... Thanks, David > > > http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/java/lang/StackTraceElement.java.frames.html > > > 379 ClassLoader loader = cls.getClassLoader0(); The loader is unused. > 402 private static String toLoaderModuleClassName(Class cls) { > 403 ClassLoader loader = cls.getClassLoader0(); > 404 Module m = cls.getModule(); > 405 > 406 // First element - class loader name > 407 String s = ""; > 408 if (loader != null && !(loader instanceof BuiltinClassLoader) && > 409 loader.getName() != null) { > 410 s = loader.getName() + "/"; > 411 } > 412 > 413 // Second element - module name and version > 414 if (m != null && m.isNamed()) { > 415 s = s.isEmpty() ? m.getName() : s + m.getName(); > 416 // drop version if it's JDK module tied with java.base, > 417 // i.e. non-upgradeable > 418 if (!HashedModules.contains(m)) { > 419 Optional ov = m.getDescriptor().version(); > 420 if (ov.isPresent()) { > 421 String version = "@" + ov.get().toString(); > 422 s = s.isEmpty() ? version : s + version; > 423 } > 424 } > 425 } > 426 > 427 // fully-qualified class name > 428 return s.isEmpty() ? cls.getName() : s + "/" + cls.getName(); > 429 } > Also, the lines 415 and 422 can be simplified: 415 s += m.getName(); 422 > s += version; Also, if the loader has a name but (m == null || > !m.isNamed()) then it looks like the sign "/" will be added twice (see > L410 and L428). 
It can be fixed and simplified with: Add line before > 425: s += "/"; 428 return s + cls.getName(); > > Also, it is not clear why the loader name is not included for an > instance of theBuiltinClassLoader? > Would it make sense to add a comment explaining it? > > Thanks, Serguei > > On 10/25/16 16:10, Mandy Chung wrote: >> Webrev at: >> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/ >> >> Specdiff: >> >> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html >> >> >> This is a long-standing RFE for adding support for class >> loader names. It's #ClassLoaderNames on JSR 376 issue >> list where the proposal [1] has been implemented in jake >> for some time. This patch brings this change to jdk9. >> >> A short summary: >> - New constructors are added in ClassLoader, SecureClassLoader >> and URLClassLoader to specify the class loader name. >> >> - New ClassLoader::getName and StackTraceElement::getClassLoaderName >> method >> >> - StackTraceElement::toString is updated to include the name >> of the class loader and module of that frame in this format: >> //(:) >> >> The detail is in StackTraceElement::buildLoaderModuleClassName >> that compress the output string for cases when the loader >> has no name or the module is unnamed module. Another thing >> to mention is that VM sets the Class object when filling in >> a stack trace of a Throwable object. Then the library will >> build a String from the Class object for serialization purpose. 
>> >> Mandy >> [1] >> http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html >> > From mandy.chung at oracle.com Fri Oct 28 21:36:03 2016 From: mandy.chung at oracle.com (Mandy Chung) Date: Fri, 28 Oct 2016 14:36:03 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> Message-ID: > On Oct 28, 2016, at 12:06 AM, serguei.spitsyn at oracle.com wrote: > > Can we give the bootstrap classloader the name "boot" or "bootstrap"? BootClassLoader is not the bootstrap class loader but rather an implementation detail. The bootstrap ClassLoader instance is null and so you can't invoke ClassLoader::getName. > > Also, the lines 415 and 422 can be simplified: 415 s += m.getName(); 422 s += version; OK. At one point, that was how it was coded. > Also, if the loader has a name but (m == null || !m.isNamed()) then it looks like the sign "/" will be added twice (see L410 and L428). It can be fixed and simplified with: Add line before 425: s += "/"; 428 return s + cls.getName(); "//" is correct. > > Also, it is not clear why the loader name is not included for an instance of the BuiltinClassLoader? To make the output compact where possible; for example, the class loader names "app" and "platform" can be implied for classes from the JDK. > Would it make sense to add a comment explaining it? Maybe there is not much to add. Mandy From serguei.spitsyn at oracle.com Sat Oct 29 10:03:40 2016 From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com) Date: Sat, 29 Oct 2016 03:03:40 -0700 Subject: Request Review: JDK-6479237 (cl) Add support for classloader names In-Reply-To: References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> Message-ID: <58a37318-6f30-5996-c619-ff7b4c23eec2@oracle.com> Thank you for clarifications, Mandy!
Serguei On 10/28/16 14:36, Mandy Chung wrote: >> On Oct 28, 2016, at 12:06 AM, serguei.spitsyn at oracle.com wrote: >> >> Can we give the bootstrap classloader the name "boot" or "bootstrap"? > BootClassLoader is not the bootstrap class loader but rather an implementation detail. The bootstrap ClassLoader instance is null and so you can't invoke ClassLoader::getName. > >> >> Also, the lines 415 and 422 can be simplified: 415 s += m.getName(); 422 s += version; > OK. At one point, that was how it was coded. > >> Also, if the loader has a name but (m == null || !m.isNamed()) then it looks like the sign "/" will be added twice (see L410 and L428). It can be fixed and simplified with: Add line before 425: s += "/"; 428 return s + cls.getName(); > "//" is correct. > >> Also, it is not clear why the loader name is not included for an instance of the BuiltinClassLoader? > To make the output compact where possible; for example, the class loader names "app" and "platform" can be implied for classes from the JDK. > >> Would it make sense to add a comment explaining it? > Maybe there is not much to add. > > Mandy From HORII at jp.ibm.com Sat Oct 29 10:37:18 2016 From: HORII at jp.ibm.com (Hiroshi H Horii) Date: Sat, 29 Oct 2016 19:37:18 +0900 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <1477654313.3851.11.camel@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: Hi Thomas, > we in the gc team have discussed this change quite a bit internally. > Overall, we think this change seems far too risky from both a > functional and performance perspective to go into 9 at this time. Thank you for your comments and for reaching a decision. I completely agree with the decision and would like to keep contributing to this change for future releases.
> We also think the testing needs to include both functional and > performance testing, and the performance testing ought to be using some > well-chosen benchmarks. (It was pointed out very early in the > discussion of this change that specjbb2013 is deprecated, yet that is > the only benchmark that's been reported out.) I see. I will try other workloads and evaluate the effects of this change. > The most recent change also penalizes current platforms that do not > implement the release-CAS with an additional acquire. That might be not > an issue for TSO platforms, but others will be affected. > > While we think other platforms could quickly adapt to this, this would > force that the developer that implements this for other platforms > (arm/aarch64) to be stuck with re-analyzing these issues. We > do not think this is fair. We think this is a change (or set of > changes) that needs to be pushed for all platforms at the same time. Sure. I would like to ask developers for the other platforms to consider this change. > There is also one (minor) question about the change: why isn't the CAS > result value being used for the failing paths of the CAS, rather than > reloaded in copy_to_survivor_space? I believe the original code also doesn't use the CAS result, because the current cas_forward_to returns only a bool rather than the CAS result value: bool oopDesc::cas_forward_to(oop p, markOop compare, cmpxchg_memory_order order) I guess reloading the forwardee is not expensive because CAS failures are rare, so maintainability was given priority. "Doerr, Martin" wrote on 10/21/2016 21:57:42: > The webrev also contains a logging change in > psPromotionManager.inline.hpp which I'm not sure if it's still wanted. For the future discussion, I would like to share a webrev that doesn't change any log formats. http://cr.openjdk.java.net/~horii/8154736/webrev.06/ Regards, Hiroshi ----------------------- Hiroshi Horii, Ph.D.
IBM Research - Tokyo From aph at redhat.com Sun Oct 30 18:36:38 2016 From: aph at redhat.com (Andrew Haley) Date: Sun, 30 Oct 2016 18:36:38 +0000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: On 29/10/16 11:37, Hiroshi H Horii wrote: >> The most recent change also penalizes current platforms that do not >> > implement the release-CAS with an additional acquire. That might be not >> > an issue for TSO platforms, but others will be affected. >> > >> > While we think other platforms could quickly adapt to this, this would >> > force that the developer that implements this for other platforms >> > (arm/aarch64) to be stuck with re-analyzing these issues. We >> > do not think this is fair. We think this is a change (or set of >> > changes) that needs to be pushed for all platforms at the same time. > > Sure. I would like to ask developers for the other platforms to consider > this change. OK, I will. Can you please point me to the change and what it means? And, while we're on the subject, is memory_order_conservative actually defined anywhere? Thanks, Andrew. 
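For readers following the quoted paragraph, here is a minimal sketch of a "release-CAS with an additional acquire", written against standard C++ atomics rather than HotSpot's Atomic:: and OrderAccess APIs. The names mark_word and forward_release_then_acquire are hypothetical, and where exactly the acquire sits in the webrev is an assumption here. The sketch also shows that a failed compare-exchange already hands back the value it observed, which is the point behind the question about reloading in copy_to_survivor_space.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an object header word (not HotSpot's markOop).
using mark_word = std::uintptr_t;

// Try to install `forwardee` into `header` if it still holds `expected_mark`.
// Returns the forwarding value that ends up in the header: our own value if
// we won the race, otherwise the value installed by the winning thread.
inline mark_word forward_release_then_acquire(std::atomic<mark_word>& header,
                                              mark_word expected_mark,
                                              mark_word forwardee) {
  mark_word observed = expected_mark;
  // Release on success publishes the copy we made; relaxed on failure.
  const bool won = header.compare_exchange_strong(observed, forwardee,
                                                  std::memory_order_release,
                                                  std::memory_order_relaxed);
  if (won) {
    return forwardee;
  }
  // On failure `observed` already holds the winner's value, so no reload is
  // needed. The acquire fence orders our later reads of the winner's object
  // after the load performed by the failed CAS.
  std::atomic_thread_fence(std::memory_order_acquire);
  return observed;
}
```

On a platform whose CAS is already a full fence, the trailing acquire fence is the extra cost that the quoted paragraph objects to; on the winning path the sketch adds nothing beyond the release.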
From david.holmes at oracle.com Sun Oct 30 21:26:26 2016 From: david.holmes at oracle.com (David Holmes) Date: Mon, 31 Oct 2016 07:26:26 +1000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> Message-ID: <1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com> On 31/10/2016 4:36 AM, Andrew Haley wrote: > On 29/10/16 11:37, Hiroshi H Horii wrote: >>> The most recent change also penalizes current platforms that do not >>>> implement the release-CAS with an additional acquire. That might be not >>>> an issue for TSO platforms, but others will be affected. >>>> >>>> While we think other platforms could quickly adapt to this, this would >>>> force that the developer that implements this for other platforms >>>> (arm/aarch64) to be stuck with re-analyzing these issues. We >>>> do not think this is fair. We think this is a change (or set of >>>> changes) that needs to be pushed for all platforms at the same time. >> >> Sure. I would like to ask developers for the other platforms to consider >> this change. > > OK, I will. Can you please point me to the change and what it means? > > And, while we're on the subject, is memory_order_conservative actually > defined anywhere? No. It was chosen to represent the current status quo that the Atomic:: ops should all be (by default) full bi-directional fences. It is a placeholder until this memory order stuff is fleshed out in hotspot. We didn't adopt C++ memory_order_seq_cst as it isn't obvious that it actually matches our current semantics. At least it isn't obvious to me. Cheers, David > Thanks, > > Andrew.
> From aph at redhat.com Mon Oct 31 09:32:44 2016 From: aph at redhat.com (Andrew Haley) Date: Mon, 31 Oct 2016 09:32:44 +0000 Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64 In-Reply-To: <1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com> References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> <1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com> Message-ID: On 30/10/16 21:26, David Holmes wrote: > On 31/10/2016 4:36 AM, Andrew Haley wrote: >> >> And, while we're on the subject, is memory_order_conservative actually >> defined anywhere? > > No. It was chosen to represent the current status quo that the Atomic:: > ops should all be (by default) full bi-directional fences. Does that mean that a CAS is actually stronger than a load acquire followed by a store release? And that a CAS is a release fence even when it fails and no store happens? And that a conservative load is a *store* barrier? > It is a place holder until this memory order stuff is fleshed out in > hotspot. We didn't adopt C++ memory_order_seq_cst as is isn't > obvious that actually matches our current semantics. At least it > isn't obvious to me. It's not obvious to me either, because I don't know what our current semantics are. But I believe that if we need anything stronger than sequential consistency we should look at fixing the callers of the Atomic:: ops. But I guess the real problem is that we don't know which callers actually need the super-strong guarantees, or even that any exist. Andrew. 
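To make the "full bi-directional fences" reading concrete, here is a minimal sketch in standard C++ atomics, assuming that "conservative" means a sequentially consistent fence on both sides of an otherwise relaxed CAS. The names cas_order and cmpxchg_sketch are hypothetical and this is not HotSpot's Atomic::cmpxchg; whether the fence pair matches HotSpot's actual guarantees is exactly the open question in this thread.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical ordering enum; not HotSpot's cmpxchg_memory_order.
enum class cas_order { relaxed, conservative };

// Compare-and-exchange that returns the value found in `dest` before the
// operation, whether or not the exchange happened.
template <typename T>
T cmpxchg_sketch(std::atomic<T>& dest, T compare_value, T exchange_value,
                 cas_order order = cas_order::conservative) {
  // Anything other than "relaxed" gets the strongest fences, so an ordering
  // value added later would default to the safe behaviour.
  if (order != cas_order::relaxed) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  T observed = compare_value;
  dest.compare_exchange_strong(observed, exchange_value,
                               std::memory_order_relaxed,
                               std::memory_order_relaxed);
  // The trailing fence runs on the failure path too, so under this reading
  // a failed CAS still acts as a fence even though no store happened.
  if (order != cas_order::relaxed) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
  return observed;
}
```

Under this reading a conservative CAS is stronger than a load-acquire followed by a store-release, because the two fences order surrounding accesses in both directions regardless of the outcome. Callers that only need weaker ordering would have to be identified one by one before the default could be relaxed.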
From lois.foltan at oracle.com Mon Oct 31 11:10:52 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Mon, 31 Oct 2016 07:10:52 -0400
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com> <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com> <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
Message-ID: <581726BC.9080007@oracle.com>

On 10/28/2016 4:44 PM, Mandy Chung wrote:
>> On Oct 28, 2016, at 11:11 AM, Brent Christian wrote:
>>
>> Should something be done for STEs returned from StackFrameInfo.toStackTraceElement() ?
> Good catch - I missed it. I added package-private static methods in StackTraceElement class for both Throwable and StackFrameInfo to get StackTraceElement(s).
>
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.02/
>
> Mandy

Looks good.

Lois

From martin.doerr at sap.com Mon Oct 31 14:38:05 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 31 Oct 2016 14:38:05 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: 
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
Message-ID: <5ee98a2421d84934a11ef9f3a24b11de@DEWDFE13DE10.global.corp.sap>

Hi Hiroshi,

when looking over the change for the first time, I had missed that the
cmpxchg_post_membar is not safe for future enhancements:
Please use the condition "else if (order != memory_order_relaxed)" for
the sync as in cmpxchg_pre_membar.

The code should still work reliably if somebody adds new enum values.
I think this is a key property to justify the safety of this change.
Adding enum values should not break any platform.
This can be established by using maximum conservative barriers for
unknown values.

If I remember correctly, some reviewers had complained about the acquire
barriers. I think it will be better to present the change without them
as this minimizes the impact to Oracle platforms.
At least the comment "call acquire for reading fields of new_obj in
callers" does not apply to any supported platform (Alpha is not
supported) and should be changed.
I think a precise specification of the required ordering semantics is
important. This is the second part which is needed to justify the safety
of this change.

Best regards,
Martin

From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
Sent: Saturday, 29 October 2016 12:37
To: Thomas Schatzl
Cc: David Holmes ; hotspot-compiler-dev ; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; Kim Barrett ; Doerr, Martin ; ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hi Thomas,

> we in the gc team have discussed this change quite a bit internally.
> Overall, we think this change seems far too risky from both a
> functional and performance perspective to go into 9 at this time.

Thank you for your comments and giving a decision.
I completely agree with the decision and would like to keep contributing
to this change for future releases.

> We also think the testing needs to include both functional and
> performance testing, and the performance testing ought to be using some
> well-chosen benchmarks. (It was pointed out very early in the
> discussion of this change that specjbb2013 is deprecated, yet that is
> the only benchmark that's been reported out.)

I see. I will try other workloads and evaluate effects of this change.

> The most recent change also penalizes current platforms that do not
> implement the release-CAS with an additional acquire. That might be not
> an issue for TSO platforms, but others will be affected.
>
> While we think other platforms could quickly adapt to this, this would
> force that the developer that implements this for other platforms
> (arm/aarch64) to be stuck with re-analyzing these issues. We
> do not think this is fair. We think this is a change (or set of
> changes) that needs to be pushed for all platforms at the same time.

Sure. I would like to ask developers for the other platforms to consider
this change.

> There also one (minor) question about the change: why isn't the CAS
> result value being used for the failing paths of the CAS, rather than
> reloaded in copy_to_survivor_space?

I believe the original code also doesn't use the CAS result because the
current cas_forward_to doesn't return the CAS result value.

    bool oopDesc::cas_forward_to(oop p, markOop compare, cmpxchg_memory_order order)

I guess reloading a forwardee is not expensive because CAS failures are
rare, so maintainability was emphasized.

"Doerr, Martin" wrote on 10/21/2016 21:57:42:
> The webrev also contains a logging change in
> psPromotionManager.inline.hpp which I'm not sure if it's still wanted.

For future discussion, here is a webrev that does not contain any
changes to log formats.
http://cr.openjdk.java.net/~horii/8154736/webrev.06/

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From mandy.chung at oracle.com Mon Oct 31 15:09:17 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Mon, 31 Oct 2016 08:09:17 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: 
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
Message-ID: <3431C1FF-9F28-4993-8CB8-8B38AB0B73BF@oracle.com>

> On Oct 28, 2016, at 2:09 PM, David Holmes wrote:
>
> :
>
> Given the BootClassLoader instance is not in fact the boot loader at all I think it would have been clearer and avoid potential confusion to call this something more representative of its purpose - perhaps BootResourceloader or BootLoaderHelper or ...

BootClassLoader is private class and BootLoader is the internal API to
find resources and packages. IMO their names are fine and the comment in
BootLoader is clear.

Mandy

From nipa at codefx.org Mon Oct 31 15:39:51 2016
From: nipa at codefx.org (Nicolai Parlog)
Date: Mon, 31 Oct 2016 16:39:51 +0100
Subject: How to use @ReservedStackAccess?
Message-ID: <8850ae23-fda8-2481-261e-42b53131eb72@codefx.org>

Hi!

I've been experimenting with @ReservedStackAccess but couldn't get it to
work. Any help would be highly appreciated.

## SETUP

I'm artificially creating a stack overflow by recursing indefinitely. I
then want to benefit from @ReservedStackAccess by executing some code
outside of an exception handler. Here's my code:

    public static void main(String[] args) {
        try {
            recurseThenGreet();
        } catch (StackOverflowError err) {
            // to not have the console spammed with output
            System.out.println("Error");
        }
    }

    @ReservedStackAccess
    private static void recurseThenGreet() {
        recurse();
        System.out.println("Hi!");
    }

    private static void recurse() {
        recurse();
    }

I'm using build 9-ea+141-jigsaw-nightly-h5650-20161026.
I compile with --add-exports
java.base/jdk.internal.vm.annotation=ALL-UNNAMED to make the annotation
available and launch with -XX:-RestrictReservedStack to activate the
reserved stack for user land code.

## OBSERVED

This is the output I get:

    Java HotSpot(TM) 64-Bit Server VM warning: Potentially dangerous stack overflow in ReservedStackAccess annotated method org.codefx.demo.java9.internal.stack.ReservingStackFrames_Simple.recurseThenGreet()V[1]
    Error

## EXPECTED

I expected "Hi!" to show up somewhere there. My best guess is that I put
the annotation in the wrong place but experimenting didn't help.

Any help would be greatly appreciated!

Thanks!
Nicolai

--

PGP Key:
    http://keys.gnupg.net/pks/lookup?op=vindex&search=0xCA3BAD2E9CCCD509

Web:
    http://codefx.org
        a blog about software development
    https://www.sitepoint.com/java
        high-quality Java/JVM content
    http://do-foss.de
        Free and Open Source Software for the City of Dortmund

Twitter:
    https://twitter.com/nipafx

From david.holmes at oracle.com Mon Oct 31 21:30:19 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 1 Nov 2016 07:30:19 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: 
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com> <1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com>
Message-ID: 

On 31/10/2016 7:32 PM, Andrew Haley wrote:
> On 30/10/16 21:26, David Holmes wrote:
>> On 31/10/2016 4:36 AM, Andrew Haley wrote:
>>>
>>> And, while we're on the subject, is memory_order_conservative actually
>>> defined anywhere?
>>
>> No. It was chosen to represent the current status quo that the Atomic::
>> ops should all be (by default) full bi-directional fences.
>
> Does that mean that a CAS is actually stronger than a load acquire
> followed by a store release?
> And that a CAS is a release fence even
> when it fails and no store happens?

Yes. Yes.

// All of the atomic operations that imply a read-modify-write action
// guarantee a two-way memory barrier across that operation. Historically
// these semantics reflect the strength of atomic operations that are
// provided on SPARC/X86. We assume that strength is necessary unless
// we can prove that a weaker form is sufficiently safe.

But there is some contention as to whether the actual implementations
obey this completely.

>
> And that a conservative load is a *store* barrier?

Not sure what you mean. Atomic::load is not a r-m-w action so not
expected to be a two-way memory barrier.

>> It is a placeholder until this memory order stuff is fleshed out in
>> hotspot. We didn't adopt C++ memory_order_seq_cst as it isn't
>> obvious that it actually matches our current semantics. At least it
>> isn't obvious to me.
>
> It's not obvious to me either, because I don't know what our current
> semantics are. But I believe that if we need anything stronger than
> sequential consistency we should look at fixing the callers of the
> Atomic:: ops. But I guess the real problem is that we don't know
> which callers actually need the super-strong guarantees, or even that
> any exist.

Indeed. I don't know how to reliably analyse all uses to determine what
"strength" is needed, or what features of that code enable, or reject,
use of a particular strength. Ref the current discussions.

David

> Andrew.
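The case under discussion in this thread, installing a forwarding pointer with a CAS, is one use where a weaker strength has been argued to be enough. A minimal sketch of that pattern follows, assuming a release ordering on the successful CAS and an acquire re-read on the losing path. `Obj` and `forward_or_lose` are hypothetical names for illustration and do not reproduce copy_to_survivor_space; the acquire load is used here as a portable stand-in for the dependency ("consume") ordering mentioned by other participants.

```cpp
#include <atomic>

// Hypothetical miniature of an object with a forwarding slot.
struct Obj {
  int payload = 0;
  std::atomic<Obj*> forwardee{nullptr};
};

// Copy `from` into `copy`, then try to publish `copy` as the forwardee.
// Returns the object that every thread should use from now on.
inline Obj* forward_or_lose(Obj& from, Obj& copy) {
  copy.payload = from.payload;               // plain stores: the copy step
  Obj* expected = nullptr;
  if (from.forwardee.compare_exchange_strong(
          expected, &copy,
          std::memory_order_release,         // copy completes before publish
          std::memory_order_relaxed)) {      // failure: plain load of the slot
    return &copy;
  }
  // Lost the race. Re-read with acquire before touching the winner's
  // fields, so the winner's copy is visible to this thread.
  return from.forwardee.load(std::memory_order_acquire);
}
```

The release on success is what makes the copied fields visible to any thread that later observes the forwardee; the losing thread never reads fields through the relaxed failure result, only through the acquire re-read.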
From martin.doerr at sap.com Fri Oct 21 12:57:52 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 21 Oct 2016 12:57:52 -0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
In-Reply-To: 
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com> <0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap> <1475236951.6301.72.camel@oracle.com> <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com> <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
Message-ID: 

Hi all,

thank you very much for reviewing. I fully agree with the latest replies.

I think Hiroshi's latest webrev
(http://cr.openjdk.java.net/~horii/8154736/webrev.05/) is pretty close
to it.
There are only still acquire barriers which could be replaced by a
comment like "We rely on memory_order_consume here.".
I'd prefer this, too, even though acquire barriers in failure cases
would probably not really hurt.

Cmpxchg Release,Relaxed + Load Consume seems to be the pattern which
matches the needs exactly.

The webrev also contains a logging change in
psPromotionManager.inline.hpp which I'm not sure if it's still wanted.

Not sure if aarch64 should be addressed in a separate change.

Besides that, it looks good to me.

Best regards,
Martin

-----Original Message-----
From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley
Sent: Tuesday, 11 October 2016 11:26
To: Kim Barrett; David Holmes
Cc: hotspot-compiler-dev; Hiroshi H Horii; Tim Ellison; ppc-aix-port-dev at openjdk.java.net; Michihiro Horie; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

On 06/10/16 23:16, Kim Barrett wrote:
> The key issue here is that we copy obj into new_obj, and then make
> new_obj accessible to other threads via the CAS. Those other
> threads might attempt to access data in new_obj.
> This suggests the
> CAS ought to have at least a release fence to ensure the copy is
> complete before the CAS is performed. No amount of fencing on the
> read side (such as in the work stealing) can remove that need.

I agree.

> And that might be all that is needed. On the post-CAS side, we load
> the forwardee and then load values from it. I think we can use
> implicit consume with dependent loads (except on Alpha) plus the
> suggested release fence to get the desired effect.

That's probably true, except that there's not really any such thing as
"implicit consume" in C++. While all of the hardware we use respects
address dependencies, it's not something that the compiler knows
about, and it's explicitly undefined behaviour in the C++ memory
model. If we're depending on memory_order_consume, perhaps we ought
to think about adding it to Atomic, even though it's just a volatile
load in older compilers.

Andrew.

From brent.christian at oracle.com Thu Oct 27 22:28:42 2016
From: brent.christian at oracle.com (Brent Christian)
Date: Thu, 27 Oct 2016 22:28:42 -0000
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
Message-ID: <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>

Hi, Mandy

It looks pretty good to me. Just a couple small things:

* StackTraceElement.java

  379 ClassLoader loader = cls.getClassLoader0();

It looks as if 'loader' isn't used...?

* Throwable.java

  832 // VM to fill in StackTraceElement
  833 getStackTraceElements(stackTrace);
  834 // ensure the proper StackTraceElement initialization
  835 for (StackTraceElement ste : stackTrace) {
  836     ste.buildLoaderModuleClassName();
  837 }

For my own curiosity, why is this buildLoaderModuleClassName() call needed?
Thanks,
-Brent

On 10/25/16 4:10 PM, Mandy Chung wrote:
> Webrev at:
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/
>
> Specdiff:
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html
>
> This is a long-standing RFE for adding support for class
> loader names. It's #ClassLoaderNames on JSR 376 issue
> list where the proposal [1] has been implemented in jake
> for some time. This patch brings this change to jdk9.
>
> A short summary:
> - New constructors are added in ClassLoader, SecureClassLoader
>   and URLClassLoader to specify the class loader name.
>
> - New ClassLoader::getName and StackTraceElement::getClassLoaderName
>   method
>
> - StackTraceElement::toString is updated to include the name
>   of the class loader and module of that frame in this format:
>   //(:)
>
>   The detail is in StackTraceElement::buildLoaderModuleClassName
>   that compress the output string for cases when the loader
>   has no name or the module is unnamed module. Another thing
>   to mention is that VM sets the Class object when filling in
>   a stack trace of a Throwable object. Then the library will
>   build a String from the Class object for serialization purpose.
>
> Mandy
> [1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html

From brent.christian at oracle.com Fri Oct 28 18:11:28 2016
From: brent.christian at oracle.com (Brent Christian)
Date: Fri, 28 Oct 2016 18:11:28 -0000
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com> <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>
Message-ID: 

On 10/27/16 7:54 PM, Mandy Chung wrote:
>> On Oct 27, 2016, at 3:28 PM, Brent Christian wrote:
>>
>> * Throwable.java
>>
>> 832 // VM to fill in StackTraceElement
>> 833 getStackTraceElements(stackTrace);
>> 834 // ensure the proper StackTraceElement initialization
>> 835 for (StackTraceElement ste : stackTrace) {
>> 836     ste.buildLoaderModuleClassName();
>> 837 }
>>
>> For my own curiosity, why is this buildLoaderModuleClassName() call needed?
>
> When the VM fills in the stack trace, it sets Class object in
> StackTraceElement and the buildLoaderModuleClassName() call here to
> (1) build the output string whose format as described in the javadoc,
> and stored in a serial form (2) not to hold a strong reference to
> Class object. StackTraceElement is serializable and it can't build
> the correct string, when deserialized.

Should something be done for STEs returned from
StackFrameInfo.toStackTraceElement() ? These are also filled in by the
VM. The strong Class reference is probably not such a concern, as the
StackFrameInfo itself also holds one, but would we run into trouble upon
trying to deserialize such an STE?
Thanks,
-Brent

From brent.christian at oracle.com Sat Oct 29 00:14:52 2016
From: brent.christian at oracle.com (Brent Christian)
Date: Sat, 29 Oct 2016 00:14:52 -0000
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com> <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com> <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com> <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
Message-ID: <52930077-0002-cb83-f58d-d4ea6040076a@oracle.com>

On 10/28/16 1:44 PM, Mandy Chung wrote:
>
>> On Oct 28, 2016, at 11:11 AM, Brent Christian wrote:
>>
>> Should something be done for STEs returned from StackFrameInfo.toStackTraceElement() ?
>
> Good catch - I missed it. I added package-private static methods in StackTraceElement class for both Throwable and StackFrameInfo to get StackTraceElement(s).
>
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.02/
>

Looks good.

-Brent