From kim.barrett at oracle.com  Sat Oct  1 00:08:21 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Fri, 30 Sep 2016 20:08:21 -0400
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
Message-ID: <CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>

> On Sep 30, 2016, at 9:55 AM, Alan Burlison <Alan.Burlison at oracle.com> wrote:
> 
> On 30/09/2016 01:03, Kim Barrett wrote:
> 
>> The old code used literal integer sizes and indices.  The new code
>> assumes avn > AV_HW1_IDX.  I don't see anything that guarantees that
>> to be true (other than, perhaps, the source code for getisax).  If
>> the array was allocated with a size of the larger of AV_HW1_IDX+1 and
>> AV_HW2_IDX+1 then we'd be guaranteed safe to access
>> av[AV_HW{1,2}_IDX].
> 
> They are never going to change value, but if you'd prefer they weren't used let me know and I'll revert to the integer constants.

I like the use of symbolic constants.  It's the unnecessary assumptions about their values that I dislike.

I'd prefer something like

  const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
  uint_t* av = alloca(av_size);
  getisax(av, av_size);

So av is known to be big enough to access the desired elements.
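
A minimal, self-contained sketch of that sizing idea, written so it runs on any platform. The names here are assumptions for illustration only: query_caps is a hypothetical stand-in for getisax(2), and kHw1Idx / kHw2Idx are local placeholders rather than the constants from <sys/auxv.h>. Note that alloca() takes a byte count, so an element count would need to be multiplied by sizeof(uint_t); the sketch sidesteps that by using a fixed-size array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Placeholder indices for illustration only; the real ones come from <sys/auxv.h>.
constexpr std::size_t kHw1Idx = 0;
constexpr std::size_t kHw2Idx = 1;

// Large enough to index both words, whatever their relative order.
constexpr std::size_t kAvSize = std::max(kHw1Idx, kHw2Idx) + 1;

// Hypothetical stand-in for getisax(2): fills at most n words and
// returns the number of words written.
inline unsigned query_caps(std::uint32_t* out, unsigned n) {
  static const std::uint32_t words[] = {0x1u, 0x2u};
  unsigned written = std::min<unsigned>(n, 2);
  std::copy(words, words + written, out);
  return written;
}

// Reads one capability word, or 0 if fewer than idx + 1 words were reported.
inline std::uint32_t read_cap_word(std::size_t idx) {
  std::uint32_t av[kAvSize] = {};
  unsigned avn = query_caps(av, kAvSize);
  assert(idx < kAvSize);  // holds by construction for kHw1Idx and kHw2Idx
  return idx < avn ? av[idx] : 0;
}
```

Because the array size is derived from the indices themselves, the bounds check on the array cannot fail for either index; only the returned word count still needs a run-time check.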


From HORII at jp.ibm.com  Sun Oct  2 14:46:39 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Sun, 2 Oct 2016 23:46:39 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <1475236951.6301.72.camel@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
	<201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1e40040e-b494-6e1e-0 <1475236951.6301.72.camel@oracle.com>
Message-ID: <OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>

Hi Thomas and David,

Thank you for your comments.

> I think Hiroshi thinks that since the work stealing itself does a CAS
> with barrier after obtaining "new_obj" in the other thread, it should
> be safe (for other threads consuming an object on the task queue).

Thank you. What Thomas kindly explained is what I wanted to say about why 
a relaxed CAS can be used for copy_to_survivor.

> I also do not think it is safe as is - for example, at least
> PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> returned new_obj (in another log message :)) regardless of failure.
> 
> That method also reads the forwardee if forwarded, and then again uses
> object information in that same log message. A quick look did not show
> other issues, but don't count this as a review.

Thank you for your comments.

As Carsten suggested, I think the size may not be necessary for logging 
when the CAS fails (the size will be logged by the other thread that 
successfully performs the CAS). By no longer printing the size of new_obj, 
relaxing the CAS for forwarding pointers becomes safe, I believe.

In my understanding, PSPromotionManager::copy_and_push_safe_barrier() 
updates a card table for new_obj. However, this new_obj will not be used 
for card tables as a GC root in the same GC, because all of the entries in 
the card tables were registered as tasks before any calls to 
copy_and_push_safe_barrier.

I created a new webrev that reduces what is logged when the CAS fails. 
Could you review it and give comments on it?
http://cr.openjdk.java.net/~horii/8154736/webrev.00/

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


Thomas Schatzl <thomas.schatzl at oracle.com> wrote on 09/30/2016 21:02:31:

> From: Thomas Schatzl <thomas.schatzl at oracle.com>
> To: David Holmes <david.holmes at oracle.com>, Hiroshi H Horii/Japan/IBM at IBMJP
> Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
> Tim Ellison <Tim_Ellison at uk.ibm.com>, Michihiro Horie/Japan/IBM at IBMJP,
> "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
> "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-dev at openjdk.java.net>
> Date: 09/30/2016 21:04
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64
> 
> Hi,
> 
> On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
> > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
> > > 
> > > Dear David, and Dan,
> > > 
> > > Thank you for your comments.
> > > 
> > > > 
> > > > In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266
> > > > the log line reads data from the forwardee even when the CAS
> > > > fails. I believe those reads will be unsafe without barriers
> > > > after the copy of the content of the object.
> > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288
> > > > same problem as in line 266
> > > Can we use o->size() or new_obj_size instead of new_obj->size()?
> 
> They are not equivalent. Parallel GC and other collectors creatively
> reuse the "length" field of objArrays to indicate progress in
> scanning them during GC.
> 
> new_obj_size is the result of a call to o->size() (and the compiler may
> redo computations at any point), so has the same issue.
> 
> > > > If you feel that the use of new_obj->size() is potentially unsafe
> > > > then the fact we return new_obj means that any use of new_obj by
> > > > the caller may also potentially be unsafe.
> > > In my understanding, while copying objects to a survivor space, if
> > > a thread creates a new_obj and sets a pointer with CAS, the other
> > > threads can touch the new_obj after the thread calls
> > > push_contents(new_obj) (Line: 239). In push_contents,
> > > OrderAccess::release_store is called before pushing the object as a
> > > task into a work-stealing deque (taskqueue.inline.hpp). If another
> > > thread reads the task, the whole copy of new_obj is safe to read.
> > I'm not familiar with the larger picture of the GC protocols here,
> > but just looking at this code fragment in isolation if the CAS fails
> > we read o->forwardee() to set new_obj. That in itself is fine because
> > we're reading the field that we were testing with the CAS. But we
> > could then dereference new_obj before the thread that won the CAS calls
> > push_contents; and even if it is after push_contents we have not done
> > an acquire to pair with the release-store in push_contents.
> 
> I think Hiroshi thinks that since the work stealing itself does a CAS
> with barrier after obtaining "new_obj" in the other thread, it should
> be safe (for other threads consuming an object on the task queue).
> 
> > So I'm really not seeing how we can use a barrier-less CAS here.
> 
> I also do not think it is safe as is - for example, at least
> PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> returned new_obj (in another log message :)) regardless of failure.
> 
> That method also reads the forwardee if forwarded, and then again uses
> object information in that same log message. A quick look did not show
> other issues, but don't count this as a review.
> 
> Thanks,
>   Thomas
> 


From varming at gmail.com  Mon Oct  3 03:55:25 2016
From: varming at gmail.com (Carsten Varming)
Date: Sun, 2 Oct 2016 23:55:25 -0400
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<1574d9e7-c9cd-b1e8-e9a1-d63630713724@oracle.com>
	<201605061011.u46ABZDR015108@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
Message-ID: <CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>

Dear Hiroshi,

It looks like psPromotionManager.cpp:509 contains a logging statement that
could read data from an oop forwarded by another thread.

I don't see how your new logging in
PSPromotionManager::copy_and_push_safe_barrier can be safe. In the two new
statements you read data from new_obj, but in both cases it is possible that
another thread has not yet written the data in new_obj (new_obj->klass()
reads new_obj->_metadata).
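
The hazard can be sketched with standard C++ atomics. This is not HotSpot code: Obj, klass_id and forward() are hypothetical names, and std::atomic stands in for HotSpot's own Atomic/OrderAccess primitives. The sketch shows the ordering that makes reading through the winner's pointer safe; if both memory orders were relaxed, the final dereference by the losing thread could observe a copy that has not been written yet.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical, simplified object: one header word plus a forwarding pointer.
struct Obj {
  int klass_id = 0;                      // stands in for header data such as the klass
  std::atomic<Obj*> forwardee{nullptr};  // installed by whichever thread wins the CAS
};

// Copies `from` into `to` and tries to publish `to` as the forwardee.
// Returns the object that ended up installed (ours or another thread's).
inline Obj* forward(Obj* from, Obj* to) {
  to->klass_id = from->klass_id;  // plain store: the "copy" step
  Obj* expected = nullptr;
  // acq_rel on success orders the copy before the pointer becomes visible;
  // acquire on failure lets the loser read through the winner's pointer.
  if (from->forwardee.compare_exchange_strong(expected, to,
                                              std::memory_order_acq_rel,
                                              std::memory_order_acquire)) {
    return to;
  }
  // `expected` now holds the winner's pointer. Dereferencing it is only
  // safe because of the acquire ordering on the failure path above.
  return expected;
}
```

With a relaxed CAS the loser may still compare or pass along the returned pointer, but must not read fields through it until some other synchronization has happened, which is why logging new_obj->klass() or new_obj->size() on the failure path is the problem case.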

Carsten


From igor.ignatyev at oracle.com  Mon Oct  3 09:49:34 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 3 Oct 2016 12:49:34 +0300
Subject: RFR(S): 8166804: Convert TestMetachunk_test to GTest
In-Reply-To: <738eb5a4-b049-4510-7ca1-818a3cfcb014@oracle.com>
References: <738eb5a4-b049-4510-7ca1-818a3cfcb014@oracle.com>
Message-ID: <456EDE1E-76CE-41BA-8DFB-7970CD130D2D@oracle.com>

Kirill,

looks good to me, Reviewed.

Thanks,
-- Igor
> On Sep 28, 2016, at 5:35 PM, Kirill Zhaldybin <kirill.zhaldybin at oracle.com> wrote:
> 
> Dear all,
> 
> Could you please review this fix for 8166804?
> 
> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166804/webrev.00/
> CR: https://bugs.openjdk.java.net/browse/JDK-8166804
> 
> Thank you.
> 
> Regards, Kirill


From igor.ignatyev at oracle.com  Mon Oct  3 09:51:18 2016
From: igor.ignatyev at oracle.com (Igor Ignatyev)
Date: Mon, 3 Oct 2016 12:51:18 +0300
Subject: RFR(S): 8166563: Convert GuardedMemory_test to Gtest
In-Reply-To: <683982d0-7d2e-eb03-0106-a4f1e729b472@oracle.com>
References: <683982d0-7d2e-eb03-0106-a4f1e729b472@oracle.com>
Message-ID: <398FD3C9-1EF1-44E7-AE2D-2FE667BC4720@oracle.com>

Kirill,

looks good to me, Reviewed.

Thanks,
-- Igor

> On Sep 28, 2016, at 5:26 PM, Kirill Zhaldybin <kirill.zhaldybin at oracle.com> wrote:
> 
> Dear all,
> 
> Could you please review this fix for 8166563?
> 
> The test was split into a few test cases but the logic of the original test was preserved.
> 
> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166563/webrev.00/
> CR: https://bugs.openjdk.java.net/browse/JDK-8166563
> 
> Thank you.
> 
> Regards, Kirill


From kirill.zhaldybin at oracle.com  Mon Oct  3 09:53:15 2016
From: kirill.zhaldybin at oracle.com (Kirill Zhaldybin)
Date: Mon, 3 Oct 2016 12:53:15 +0300
Subject: RFR(S): 8166804: Convert TestMetachunk_test to GTest
In-Reply-To: <456EDE1E-76CE-41BA-8DFB-7970CD130D2D@oracle.com>
References: <738eb5a4-b049-4510-7ca1-818a3cfcb014@oracle.com>
	<456EDE1E-76CE-41BA-8DFB-7970CD130D2D@oracle.com>
Message-ID: <57F22A8B.1060307@oracle.com>

Igor,

Thank you!

Regards, Kirill

On 03.10.2016 12:49, Igor Ignatyev wrote:
> Kirill,
>
> looks good to me, Reviewed.
>
> Thanks,
> -- Igor
>> On Sep 28, 2016, at 5:35 PM, Kirill Zhaldybin <kirill.zhaldybin at oracle.com> wrote:
>>
>> Dear all,
>>
>> Could you please review this fix for 8166804?
>>
>> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166804/webrev.00/
>> CR: https://bugs.openjdk.java.net/browse/JDK-8166804
>>
>> Thank you.
>>
>> Regards, Kirill
>


From kirill.zhaldybin at oracle.com  Mon Oct  3 09:53:40 2016
From: kirill.zhaldybin at oracle.com (Kirill Zhaldybin)
Date: Mon, 3 Oct 2016 12:53:40 +0300
Subject: RFR(S): 8166563: Convert GuardedMemory_test to Gtest
In-Reply-To: <398FD3C9-1EF1-44E7-AE2D-2FE667BC4720@oracle.com>
References: <683982d0-7d2e-eb03-0106-a4f1e729b472@oracle.com>
	<398FD3C9-1EF1-44E7-AE2D-2FE667BC4720@oracle.com>
Message-ID: <57F22AA4.1040503@oracle.com>

Igor,

Thank you!

Regards, Kirill

On 03.10.2016 12:51, Igor Ignatyev wrote:
> Kirill,
>
> looks good to me, Reviewed.
>
> Thanks,
> -- Igor
>
>> On Sep 28, 2016, at 5:26 PM, Kirill Zhaldybin <kirill.zhaldybin at oracle.com> wrote:
>>
>> Dear all,
>>
>> Could you please review this fix for 8166563?
>>
>> The test was split into a few test cases but the logic of the original test was preserved.
>>
>> WebRev: http://cr.openjdk.java.net/~kzhaldyb/webrevs/JDK-8166563/webrev.00/
>> CR: https://bugs.openjdk.java.net/browse/JDK-8166563
>>
>> Thank you.
>>
>> Regards, Kirill
>


From HORII at jp.ibm.com  Mon Oct  3 14:15:10 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Mon, 3 Oct 2016 23:15:10 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
Message-ID: <OFB3D9B9A2.7E1FF07A-ON49258041.004D238F-49258041.004E4B29@notes.na.collabserv.com>

Dear Carsten,

Thank you for your correction. I am very sorry for my careless mistakes.
I created a new webrev:
http://cr.openjdk.java.net/~horii/8154736/webrev.01/
I believe all of the unsafe uses of new_obj that have been pointed out in 
this thread are fixed with this webrev.

Dear all,

Could I ask you to review this webrev and share your thoughts and comments again?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo




From Alan.Burlison at oracle.com  Mon Oct  3 15:04:51 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Mon, 3 Oct 2016 16:04:51 +0100
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
Message-ID: <a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>

On 01/10/2016 01:08, Kim Barrett wrote:

>> They are never going to change value, but if you'd prefer they weren't used let me know and I'll revert to the integer constants.
>
> I like the use of symbolic constants.  It's the unnecessary assumptions about their values that I dislike.
>
> I'd prefer something like
>
>   const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
>   uint_t* av = alloca(av_size);
>   getisax(av, av_size);
>
> So av is known to be big enough to access the desired elements.

The values of AV_HW1_IDX and AV_HW2_IDX can't be changed without 
breaking binary compatibility, and we simply aren't ever going to do 
that. We can therefore assume the values of AV_HW1_IDX and AV_HW2_IDX are 
known, namely 0 and 1 respectively, so there's no point in comparing them. 
Plus if/when we get the 3rd capabilities word it will need changing anyway.

I've therefore changed the code to remove the alloca() call and be just:

// Extract valid instruction set extensions.
uint_t avs[AV_HW2_IDX + 1];
uint_t avn = getisax(avs, sizeof(avs));

webrev updated accordingly.
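
One detail worth checking in a call like the one above is whether the second argument is a byte size or an element count; the getisax(2) man page describes it as the number of array elements. A small sketch of the difference, assuming 32-bit words; element_count and capability_array_sizes are hypothetical helpers for illustration, not part of any Solaris or HotSpot API:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of elements in a built-in array.
template <typename T, std::size_t N>
constexpr std::size_t element_count(const T (&)[N]) { return N; }

// Byte size and element count of a two-word capability array.
struct Sizes {
  std::size_t bytes;
  std::size_t elements;
};

inline Sizes capability_array_sizes() {
  std::uint32_t avs[2] = {};
  return Sizes{sizeof(avs), element_count(avs)};
}
```

For a two-word array the two values differ (8 bytes versus 2 elements), so an interface that expects an element count should be given the latter.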

-- 
Alan Burlison
--

From kim.barrett at oracle.com  Mon Oct  3 18:24:14 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Mon, 3 Oct 2016 14:24:14 -0400
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
Message-ID: <5FE62FCC-5AA3-47AD-AEBA-C2CFF6D33613@oracle.com>

> On Oct 3, 2016, at 11:04 AM, Alan Burlison <Alan.Burlison at oracle.com> wrote:
> 
> On 01/10/2016 01:08, Kim Barrett wrote:
> 
>>> They are never going to change value, but if you'd prefer they weren't used let me know and I'll revert to the integer constants.
>> 
>> I like the use of symbolic constants.  It's the unnecessary assumptions about their values that I dislike.
>>
>> I'd prefer something like
>> 
>>  const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
>>  uint_t* av = alloca(av_size);
>>  getisax(av, av_size);
>> 
>> So av is known to be big enough to access the desired elements.
> 
> The values of AV_HW1_IDX and AV_HW2_IDX can't be changed without breaking binary compatibility, and we simply aren't ever going to do that. We can therefore assume the values of AV_HW1_IDX and AV_HW2_IDX are known, namely 0 and 1 respectively, so there's no point in comparing them. Plus if/when we get the 3rd capabilities word it will need changing anyway.
> 
> I've therefore changed the code to remove the alloca() call and be just:
> 
> // Extract valid instruction set extensions.
> uint_t avs[AV_HW2_IDX + 1];
> uint_t avn = getisax(avs, sizeof(avs));
> 
> webrev updated accordingly.
> 
> -- 
> Alan Burlison
> --

Looks good.


From david.holmes at oracle.com  Tue Oct  4 01:45:03 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 4 Oct 2016 11:45:03 +1000
Subject: (M) RFR: 8081800: AbstractMethodError when evaluating a private
	method in an interface via debugger
In-Reply-To: <a768a41c-1c2f-0d67-7011-d141dc155d2d@oracle.com>
References: <ccad8e15-5b00-4f62-d054-58351186a2ff@oracle.com>
	<a768a41c-1c2f-0d67-7011-d141dc155d2d@oracle.com>
Message-ID: <a619ff4f-3b75-66fd-223a-02a0442fd54d@oracle.com>

Hi Coleen,

Thanks for the review.

On 1/10/2016 6:55 AM, Coleen Phillimore wrote:
> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/src/share/vm/oops/klassVtable.cpp.udiff.html
>
>
> + assert(!mh()->is_private(), "private interface method in the default
> method list");
>
>
> Nit, don't need mh() parentheses.  methodHandle has an operator ->

Fixed.

> + // private methods in classes always have a new entry in the vtable.
> + // Specification interpretation since classic has private methods not
> overriding.
>
> What does this mean exactly?   Does it mean that we add private methods
> to the vtable but we don't have to because they do not override other
> private methods?   Why is this compatible with classic?   I know this is
> something pre-existing but could you clarify the comment since you
> touched it?

I only "touched it" by deleting the irrelevant:

  // JDK8 adds private  methods in interfaces which require invokespecial

because we've already bailed out if dealing with a non-abstract 
interface method. I do not know what the remaining existing comments 
refer to exactly so I can not clarify them. I have no knowledge of 
"classic" at all.

> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/src/share/vm/oops/method.cpp.udiff.html
>
> + // private methods in classes get vtable entries for backward class
> compatibility.
>
> This is a bit more clear, and it's not important why.  Is this something
> that can be cleaned up in future release?  If so, it would be good to
> have the explanation in an RFE.

I only added the "in classes" because this doesn't apply to interface 
methods. The whole treatment of private methods is subject to 
re-examination in the future as private methods should be treated as 
effectively final - I thought Karen had filed an RFE for that but I 
can't find it. :( I suspect the backward compatibility rationale is 
extremely old and probably no longer an issue - private methods should 
not need vtable entries.

> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/test/runtime/RedefineTests/RedefineInterfaceMethods.java.html
>
> Thank you for adding this test.

Thanks for pointing me to these tests as it made it trivially easy to 
write the new one!

> Your other test looks good.
>
> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot-rename/
>
> Renaming also looks good.

Thanks!

I don't think there is anything preventing this from pushing now so will 
go ahead and do that.

David
-----

> Thanks,
> Coleen
>
> On 9/28/16 7:50 AM, David Holmes wrote:
>> Warning: long discussion, but in the end relatively simple code
>> change. :)
>>
>> Thanks to Karen for explaining vtables and itables and pointing out
>> various tests to be executed; Coleen for the discussions around
>> interface initialization and terminology, and pointing me to simple
>> redefinition tests; and Stas Lukyanov for indicating the right JCK
>> tests to run.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8081800
>>
>> Background:
>>
>> In JDK 8 default and static interface methods were added to the Java
>> language. Private interface methods were also considered, and support
>> in the VM was added, but were dropped due to schedule pressure. In
>> Java 9 private interface methods have now been enabled at the
>> language-level and because the VM already supported invokespecial for
>> private interface methods, the direct language use, and core
>> reflection use, of these methods works fine. However, what was
>> overlooked (and which the test case in this bug report highlighted)
>> was that the other interfaces to the VM (JNI, JDWP, JDI, JVM TI) had
>> not been updated to account for private interface methods, and such
>> usage did not work.
>>
>> The updates to the specifications, plus some small JDI/JDWP related
>> code changes are being handled under:
>>
>> JDK-8165827 Support private interface methods in JDI, JDWP and JDB
>> https://bugs.openjdk.java.net/browse/JDK-8165827
>>
>> This bug, although originally discovered via JDI/JDB, is being used to
>> fix the underlying mechanics in the VM used by the JNI layer - after
>> which the test in the bug report will run fine.
>>
>> Problem:
>>
>> Because private interface methods are only invocable via invokespecial
>> (the JVMS goes to great lengths to explicitly prohibit all other
>> invocation forms on them) they are in essence always statically bound
>> and don't require lookup in either itables (for invokeinterface) or
>> vtables (for general lookup). However, JNI etc, uses itables/vtables
>> to perform their invocations, and what we got was behaviour where the
>> private interface methods did have an itable entry, which made them
>> appear to be regular abstract interface methods, and so they ended up
>> with initial vtable entries that were set to throw AbstractMethodError
>> on invocation (normally those vtable entries would be replaced by the
>> concrete methods in the implementing class) - and that is what was
>> observed via JDB. It turns out that, depending on whether a class
>> method with the same signature existed in a class implementing the
>> interface, you could also get IllegalAccessError (a path that
>> actually crashes the debug VM due to an assertion failure in jni.cpp!).
>>
>> Solution:
>>
>> Private interface methods do not need, and should not have, an itable
>> entry - they are never invoked via invokeinterface. (Thanks Karen)
>>
>> Private interface methods can always be statically bound -
>> Method::can_be_statically_bound() should return true, and their vtable
>> entry should be Method::nonvirtual_vtable_index.
>>
>> Private interface methods are not default methods and
>> Method::is_default_method() should return false. There is a
>> terminology confusion here that I address further down.
>>
>> See the bug report for a detailed analysis of all the places where
>> changing these Method properties may have had an effect.
>>
>> Main webrev:
>>
>> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot/
>>
>> The main changes are in:
>> - klassVtable.cpp
>> - method.cpp
>>
>> there are minor changes to comments and assertions in other files (the
>> jni.cpp change was due to the crash I encountered that I referred to
>> earlier). The change in linkResolver.cpp fixes an error in the tracing
>> code as the bytecode need not be "invokeinterface" and clarifies it is
>> an interface method (and adds a missing colon in the message) - there
>> is a corresponding tweak to the logging/ItablesTest.java test.
>>
>> I added new tests for JNI invocations of private, interface methods,
>> and also to test JVM TI retransformation of private and default
>> interface methods.
>>
>> ---
>>
>> Terminology problem:
>>
>> While working on this issue, and helping Coleen with:
>>
>> 8163969: Cyclic interface initialization causes JVM crash
>>
>> it became apparent that there was a terminology error in the VM code
>> with respect to default methods. A "default method" is very
>> specifically a public interface method, marked with the default
>> keyword, which has a method body defined. A static interface method
>> also has a body, but is not a default method. A private interface
>> method also has a body, but is not a default method. The JVMS refers
>> to non-static, non-abstract interface methods - which covers default
>> methods and private interface methods. But the code in the VM,
>> primarily in instanceKlass.cpp and classFileParser.cpp used the term
>> "default methods" to mean "non-abstract and non-static" - which is
>> wrong and potentially very confusing. So a second part of this change
>> is to rename "has_default_methods" (and related variables) to
>> "has_nonstatic_concrete_methods". This is somewhat of a mouthful,
>> though less so than has_nonstatic_nonabstract_methods. Suggestions to
>> abbreviate this to has_nsna_methods, or has_nans_methods, were
>> rejected during pre-review.
>>
>> The renaming webrev is here:
>>
>> http://cr.openjdk.java.net/~dholmes/8081800/webrev.hotspot-rename/
>>
>> and is best viewed via the patch file, where the renaming is more
>> obvious. In classFileParser.cpp I also simplified the check for static
>> interface methods in pre-java8 classfiles.
>>
>> ---
>>
>> Testing:
>>  - JPRT
>>  - nsk.jdb/jdi/jdwp/jvmti
>>  - jtreg: com/sun/jdi (including InterfaceMethodsTest)
>>           runtime/SelectionResolution/
>>  - internal: vm.defmeth
>>  - JCK: subset of lang and vm tests that cover default/static/private
>> interface methods
>>  - new tests
>>
>> Together these tests cover interface method invocation at the language
>> level, via core reflection, via MethodHandles, via JNI, via
>> JDI/JDWP/JDB, and via JVM TI.
>>
>> Thanks,
>> David
>
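
The terminology distinction described above can be sketched in a few lines of C++. This is only a toy model under stated assumptions: the InterfaceMethod struct and the three predicates are hypothetical names made up for illustration, they are not HotSpot's Method class, and the real checks in klassVtable.cpp and method.cpp are more involved.

```cpp
#include <cassert>

// Toy model of the flags on an interface method (hypothetical, NOT HotSpot code).
struct InterfaceMethod {
  bool is_public;
  bool is_private;
  bool is_static;
  bool is_abstract;
};

// A "default method" in the strict sense: a public, non-static interface
// method that has a body.
inline bool is_default_method(const InterfaceMethod& m) {
  return m.is_public && !m.is_static && !m.is_abstract;
}

// The broader category the JVMS refers to: non-static, non-abstract
// interface methods. It covers default methods and private instance methods.
inline bool is_nonstatic_concrete(const InterfaceMethod& m) {
  return !m.is_static && !m.is_abstract;
}

// Assumption of this sketch: only methods that can be the target of
// invokeinterface need an itable entry, so private and static methods do not.
inline bool needs_itable_entry(const InterfaceMethod& m) {
  return !m.is_private && !m.is_static;
}
```

With this model a private instance method is "non-static concrete" but neither a default method nor an itable candidate, which is the case the old has_default_methods name blurred.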

From david.holmes at oracle.com  Tue Oct  4 07:32:35 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 4 Oct 2016 17:32:35 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OFB3D9B9A2.7E1FF07A-ON49258041.004D238F-49258041.004E4B29@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<848a70ad-00b3-b742-fa4e-87dc0124e0e3@oracle.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<OFB3D9B9A2.7E1FF07A-ON49258041.004D238F-49258041.004E4B29@notes.na.collabserv.com>
Message-ID: <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>

On 4/10/2016 12:15 AM, Hiroshi H Horii wrote:
> Dear Carsten,
>
> Thank you for your correction, and I am very sorry about my careless mistakes.
> I created a new webrev: http://cr.openjdk.java.net/~horii/8154736/webrev.01/
> I believe that all of the unsafe uses of new_obj that have been pointed
> out in this thread are fixed with this webrev.

I still am uneasy about this. If it is not safe to access the fields of 
new_obj in the tracing statements but we return new_obj to the caller, 
then it may not be safe for the caller to access the fields of new_obj!

That aside:

src/share/vm/gc/parallel/psPromotionManager.inline.hpp

  293   if (o->is_forwarded()) {
  294     new_obj = o->forwardee();
  295     // fields in new_obj may not be synchronized.
  296     if (log_develop_is_enabled(Trace, gc, scavenge) && o->is_forwarded()) {

Why the second check of o->is_forwarded() ?

  297       log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> " PTR_FORMAT "}",
  298                         "forwarding",

Why are you passing "forwarding" as an argument for the first %s instead 
of just expressing it directly? I see this is a copy'n'paste from the 
existing code - and I'm guessing at one point there was a conditional 
around that. I think it should be fixed.

Thanks,
David

> Dear all,
>
> Can I ask for a review of this webrev, and for your thoughts and comments again?
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> Carsten Varming <varming at gmail.com> wrote on 10/03/2016 12:55:25:
>
>> From: Carsten Varming <varming at gmail.com>
>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>> Cc: Thomas Schatzl <thomas.schatzl at oracle.com>, David Holmes
>> <david.holmes at oracle.com>, hotspot-compiler-dev <hotspot-compiler-
>> dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
>> gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
>> <hotspot-runtime-dev at openjdk.java.net>, Michihiro Horie/Japan/
>> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
>> dev at openjdk.java.net>, Tim Ellison <Tim_Ellison at uk.ibm.com>
>> Date: 10/03/2016 12:56
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> Dear Hiroshi,
>>
>> It looks like  psPromotionManager.cpp:509 contains a logging
>> statement that could read data from an oop forwarded by another thread.
>>
>> I don't see how your new logging
>> in PSPromotionManager::copy_and_push_safe_barrier can be safe. In
>> the two new statements you read data from new_obj, but in both cases
>> it is possible that another thread still hasn't written the data in
>> new_obj (new_obj->klass() reads new_obj->_metadata).
>>
>> Carsten
>>
>> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii <HORII at jp.ibm.com> wrote:
>> Hi, Thomas, and David,
>>
>> Thank you for your comments.
>>
>> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> > with barrier after obtaining "new_obj" in the other thread, it should
>> > be safe (for other threads consuming an object on the task queue).
>>
>> Thank you. What Thomas kindly explained is what I wanted to
>> say about why a relaxed CAS can be used for copy_to_survivor.
>>
>> > I also do not think it is safe as is - for example, at least
>> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> > returned new_obj (in another log message :)) regardless of failure.
>> >
>> > That method also reads the forwardee if forwarded, and then again uses
>> > object information in that same log message. A quick look did not show
>> > other issues, but don't count this as a review.
>>
>> Thank you for your comments.
>>
>> As Carsten suggested, I guess the size may not be necessary for logging
>> when the CAS fails (the size will be logged by the other thread that
>> successfully performs the CAS). By no longer printing the size of
>> new_obj, relaxing the CAS for forwarding pointers becomes safe, I believe.
>>
>> In my understanding, PSPromotionManager::copy_and_push_safe_barrier()
>> updates a card table for new_obj. However, this new_obj will not
>> be used for card tables in the same GC as a root of GC because all
>> of the entries in card tables were registered as tasks before any calls
>> of copy_and_push_safe_barrier.
>>
>> I created a new webrev that reduces the print formats when the CAS
>> fails. Could you review this and give comments on it?
>> http://cr.openjdk.java.net/~horii/8154736/webrev.00/
>>
>> Regards,
>> Hiroshi
>> -----------------------
>> Hiroshi Horii, Ph.D.
>> IBM Research - Tokyo
>>
>>
>> Thomas Schatzl <thomas.schatzl at oracle.com> wrote on 09/30/2016 21:02:31:
>>
>> > From: Thomas Schatzl <thomas.schatzl at oracle.com>
>> > To: David Holmes <david.holmes at oracle.com>, Hiroshi H
> Horii/Japan/IBM at IBMJP
>> > Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
>> > Tim Ellison <Tim_Ellison at uk.ibm.com>, Michihiro Horie/Japan/
>> > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
>> > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
>> > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
>> > <hotspot-runtime-dev at openjdk.java.net>
>> > Date: 09/30/2016 21:04
>> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> > copy_to_survivor for ppc64
>> >
>> > Hi,
>> >
>> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
>> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
>> > > >
>> > > > Dear David, and Dan,
>> > > >
>> > > > Thank you for your comments.
>> > > >
>> > > > >
>> > > > > In
>> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:
>> > > > > 266 the log line reads data from the forwardee even when the CAS
>> > > > > fails. I believe those reads will be unsafe without barriers
>> > > > > after
>> > > > > the copy of the content of the object.
>> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28
>> > > > > 8
>> > > > > same problem as in line 266
>> > > > Can we use o->size() or new_obj_size instead of new_obj->size()?
>> >
>> > They are not equivalent. Parallel GC and other collectors creatively
>> > reuse the "length" field of objArrays to indicate progress in
>> > scanning them during GC.
>> >
>> > new_obj_size is the result of a call to o->size() (and the compiler may
>> > redo computations at any point), so has the same issue.
>> >
>> > > > > If you feel that the use of new_obj->size() is potentially unsafe
>> > > > > then
>> > > > > the fact we return new_obj means that any use of new_obj by the
>> > > > > caller
>> > > > > may also potentially be unsafe.
>> > > > In my understanding, while copying objects to a survivor space, if
>> > > > a thread creates a new_obj and sets a pointer with CAS, the other
>> > > > threads can touch the new_obj after the thread calls
>> > > > push_contents(new_obj) (Line: 239). In push_contents,
>> > > > OrderAccess::release_store is called before pushing the object as a
>> > > > task into a deque of workstealing (taskqueue.inline.hpp). If the
>> > > > other thread reads the task, all of copy for new_obj is safe.
>> > > I'm not familiar with the larger picture of the GC protocols here,
>> > > but just looking at this code fragment in isolation if the CAS fails
>> > > we read o->forwardee() to set new_obj. That in itself is fine because
>> > > we're reading the field that we were testing with the CAS. But we
>> > > could then dereference new_obj before the thread that won the CAS calls
>> > > push_contents; and even if it is after push_contents we have not done
>> > > an acquire to pair with the release-store in push_contents.
>> >
>> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> > with barrier after obtaining "new_obj" in the other thread, it should
>> > be safe (for other threads consuming an object on the task queue).
>> >
>> > > So I'm really not seeing how we can use a barrier-less CAS here.
>> >
>> > I also do not think it is safe as is - for example, at least
>> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> > returned new_obj (in another log message :)) regardless of failure.
>> >
>> > That method also reads the forwardee if forwarded, and then again uses
>> > object information in that same log message. A quick look did not show
>> > other issues, but don't count this as a review.
>> >
>> > Thanks,
>> >   Thomas
>> >
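
The ordering concern raised in this thread can be illustrated with standard C++ atomics. This is a minimal sketch under stated assumptions: Obj, payload, forwardee and install_forwardee are hypothetical names for illustration only, and HotSpot uses its own Atomic/OrderAccess primitives and mark-word encoding rather than std::atomic.

```cpp
#include <atomic>
#include <cassert>

// Toy object (hypothetical): 'payload' stands in for ordinary object fields,
// 'forwardee' stands in for the forwarding pointer installed during copying.
struct Obj {
  int payload = 0;
  std::atomic<Obj*> forwardee{nullptr};
};

// Copy 'from' into 'to', then try to publish 'to' as the forwardee.
// Returns whichever copy ended up installed (ours or another thread's).
inline Obj* install_forwardee(Obj& from, Obj& to) {
  to.payload = from.payload;  // plain writes to the copy's fields

  Obj* expected = nullptr;
  // acq_rel on success: our field writes above happen-before any thread
  // that later observes the pointer with acquire.
  // acquire on failure: if we lose, the winner's field writes are visible
  // to us before we dereference the pointer we read back.
  if (from.forwardee.compare_exchange_strong(expected, &to,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
    return &to;
  }
  // With memory_order_relaxed in both positions the pointer value returned
  // here would still be correct, but nothing would guarantee that the fields
  // behind it are visible yet, which is the hazard for the log statements
  // and for any caller that dereferences the result.
  return expected;
}
```

Called twice on the same source object, the first call wins and the second returns the first caller's copy; in a real scavenge the two calls would come from different GC threads, which is where the choice of memory order matters.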

From david.holmes at oracle.com  Tue Oct  4 08:15:20 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 4 Oct 2016 18:15:20 +1000
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
Message-ID: <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>

On 4/10/2016 1:04 AM, Alan Burlison wrote:
> On 01/10/2016 01:08, Kim Barrett wrote:
>
>>> They are never going to change value, but if you'd prefer they
>>> weren't used let me know and I'll revert to the integer constants.
>>
>> I like the use of symbolic constants.  It's the unnecessary
>> assumptions about their values that I dislike.
>>
>> I'd prefer something like
>>
>>   const uint_t av_size = MAX2(AV_HW1_IDX, AV_HW2_IDX) + 1;
>>   uint_t* av = alloca(av_size);
>>   getisax(av, av_size);
>>
>> So av is known to be big enough to access the desired elements.
>
> The values of AV_HW1_IDX and AV_HW2_IDX can't be changed without
> breaking binary compatibility, and we simply aren't ever going to do
> that. We can therefore assume that the values of AV_HW1_IDX and
> AV_HW2_IDX are known, namely 0 and 1 respectively, so there's no point
> in comparing them. Plus, if/when we get a 3rd capabilities word the
> code will need changing anyway.
>
> I've therefore changed the code to remove the alloca() call and be just:
>
> // Extract valid instruction set extensions.
> uint_t avs[AV_HW2_IDX + 1];
> uint_t avn = getisax(avs, sizeof(avs));
>
> webrev updated accordingly.

But it shouldn't be passing sizeof(avs), it should be passing 
(AV_HW2_IDX + 1)

David
-----
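
The difference between sizeof(avs) and the element count can be shown with a small stand-alone sketch. It assumes, as the getisax(2) documentation is understood here, that the second argument is a number of array elements rather than a byte count. fake_getisax, HW1_IDX, HW2_IDX, element_count and query_caps are hypothetical names used only for illustration; the stub does not reproduce the real system call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Index values as stated in the thread (0 and 1); the names are hypothetical
// stand-ins for AV_HW1_IDX and AV_HW2_IDX.
constexpr unsigned HW1_IDX = 0;
constexpr unsigned HW2_IDX = 1;

// Hypothetical stub standing in for getisax(2): writes at most n ELEMENTS
// and returns how many it wrote.
inline unsigned fake_getisax(std::uint32_t* array, unsigned n) {
  const std::uint32_t words[] = {0x1u, 0x2u};
  const unsigned count = n < 2u ? n : 2u;
  for (unsigned i = 0; i < count; ++i) {
    array[i] = words[i];
  }
  return count;
}

// Number of elements in a built-in array, deduced at compile time.
template <typename T, std::size_t N>
constexpr unsigned element_count(T (&)[N]) {
  return static_cast<unsigned>(N);
}

// Passes the element count (2), not sizeof(avs) (8 bytes for two uint32_t).
inline unsigned query_caps(std::uint32_t (&avs)[HW2_IDX + 1]) {
  return fake_getisax(avs, element_count(avs));
}
```

Passing sizeof(avs) would claim room for eight elements in a two-element array; deducing the count from the array type avoids repeating AV_HW2_IDX + 1 at the call site.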


From HORII at jp.ibm.com  Tue Oct  4 10:22:49 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Tue, 4 Oct 2016 19:22:49 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<OFB3D9B9A2.7E1FF07A-ON49258041.004D238F-4925
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
Message-ID: <OF50AA5835.E2A6C4E2-ON49258042.0035917E-49258042.00390564@notes.na.collabserv.com>

Dear David,

Thank you for your comments. You are correct. In the previous webrev, a
caller (in copy_and_push_safe_barrier) may use new_obj's fields unsafely.
I am very sorry.

I changed the log format in copy_and_push_safe_barrier so that it does not
use the fields of new_obj. Could you review this again?
http://cr.openjdk.java.net/~horii/8154736/webrev.02/

The callers of PSPromotionManager::copy_to_survivor_space are:
  PSPromotionManager::copy_and_push_safe_barrier
  PSScavengeFromKlassClosure::do_oop

I confirmed that no fields of new_obj are used in these two methods in this
webrev.

In addition, I no longer pass the constant literal "forwarding" as an
argument in copy_and_push_safe_barrier, and I added guards before logging in
PSPromotionManager::copy_to_survivor_space as follows.

  if (log_develop_is_enabled(Trace, gc, scavenge)) {
    log_develop_trace(gc, scavenge)(...);
  }
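
The two logging points above (folding the constant "forwarding" into the format string, and guarding the formatting work) can be sketched with plain snprintf. This is only an illustration under stated assumptions: trace_enabled, format_with_argument, format_with_literal and maybe_trace are hypothetical names, and the real code uses HotSpot's log_develop_is_enabled / log_develop_trace macros and PTR_FORMAT.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical stand-in for log_develop_is_enabled(Trace, gc, scavenge).
static bool trace_enabled = true;

// Old style: the constant word travels through a %s argument.
inline std::string format_with_argument(const char* name,
                                        std::uintptr_t from,
                                        std::uintptr_t to) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "{%s %s 0x%" PRIxPTR " -> 0x%" PRIxPTR "}",
                "forwarding", name, from, to);
  return buf;
}

// Suggested style: the constant word is part of the format string itself.
inline std::string format_with_literal(const char* name,
                                       std::uintptr_t from,
                                       std::uintptr_t to) {
  char buf[128];
  std::snprintf(buf, sizeof(buf), "{forwarding %s 0x%" PRIxPTR " -> 0x%" PRIxPTR "}",
                name, from, to);
  return buf;
}

// Guarded logging: the message is only built when tracing is on.
inline std::string maybe_trace(const char* name,
                               std::uintptr_t from,
                               std::uintptr_t to) {
  if (!trace_enabled) {
    return std::string();
  }
  return format_with_literal(name, from, to);
}
```

Both formatters produce the same text, so the change is purely a simplification; the guard matters when computing the arguments is itself expensive or unsafe.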

If copy_to_survivor_space should not return new_obj when its fields are
unsafe to access, I would like to change the return type of
copy_to_survivor_space to "void" (or allow it to return NULL).
 
Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


David Holmes <david.holmes at oracle.com> wrote on 10/04/2016 16:32:35:

> From: David Holmes <david.holmes at oracle.com>
> To: Hiroshi H Horii/Japan/IBM at IBMJP, Carsten Varming <varming at gmail.com>
> Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>, 
> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
> "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-
> dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix-
> port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>, 
> Thomas Schatzl <thomas.schatzl at oracle.com>, Tim Ellison 
> <Tim_Ellison at uk.ibm.com>
> Date: 10/04/2016 16:33
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> 
> On 4/10/2016 12:15 AM, Hiroshi H Horii wrote:
> > Dear Carsten,
> >
> > Thank you for your correction, and I am very sorry about my careless mistakes.
> > I created a new webrev: http://cr.openjdk.java.net/~horii/8154736/webrev.01/
> > I believe that all of the unsafe uses of new_obj that have been pointed
> > out in this thread are fixed with this webrev.
> 
> I still am uneasy about this. If it is not safe to access the fields of 
> new_obj in the tracing statements but we return new_obj to the caller, 
> then it may not be safe for the caller to access the fields of new_obj!
> 
> That aside:
> 
> src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> 
>   293   if (o->is_forwarded()) {
>   294     new_obj = o->forwardee();
>   295     // fields in new_obj may not be synchronized.
>   296     if (log_develop_is_enabled(Trace, gc, scavenge) && o->is_forwarded()) {
> 
> Why the second check of o->is_forwarded() ?
> 
>   297       log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> " PTR_FORMAT "}",
>   298                         "forwarding",
> 
> Why are you passing "forwarding" as an argument for the first %s instead 
> of just expressing it directly? I see this is a copy'n'paste from the 
> existing code - and I'm guessing at one point there was a conditional 
> around that. I think it should be fixed.
> 
> Thanks,
> David
> 
> > Dear all,
> >
> > Can I ask a review of this webrev and give thoughts and comments 
again?
> >
> > Regards,
> > Hiroshi
> > -----------------------
> > Hiroshi Horii, Ph.D.
> > IBM Research - Tokyo
> >
> >
> > Carsten Varming <varming at gmail.com> wrote on 10/03/2016 12:55:25:
> >
> >> From: Carsten Varming <varming at gmail.com>
> >> To: Hiroshi H Horii/Japan/IBM at IBMJP
> >> Cc: Thomas Schatzl <thomas.schatzl at oracle.com>, David Holmes
> >> <david.holmes at oracle.com>, hotspot-compiler-dev <hotspot-compiler-
> >> dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
> >> gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
> >> <hotspot-runtime-dev at openjdk.java.net>, Michihiro Horie/Japan/
> >> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
> >> dev at openjdk.java.net>, Tim Ellison <Tim_Ellison at uk.ibm.com>
> >> Date: 10/03/2016 12:56
> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> copy_to_survivor for ppc64
> >>
> >> Dear Hiroshi,
> >>
> >> It looks like  psPromotionManager.cpp:509 contains a logging
> >> statement that could read data from an oop forwarded by another 
thread.
> >>
> >> I don't see how your new logging
> >> in PSPromotionManager::copy_and_push_safe_barrier can be safe. In
> >> the two new statements you read data from new_obj, but in both cases
> >> it is possible that another thread still haven't written the data in
> >> new_obj (new_obj->klass() reads new_obj->_metadata).
> >>
> >> Carsten
> >>
> >> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii <HORII at jp.ibm.com> 
wrote:
> >> Hi, Thomas, and David,
> >>
> >> Thank you for your comments.
> >>
> >> > I think Hiroshi thinks that since the work stealing itself does a 
CAS
> >> > with barrier after obtaining "new_obj" in the other thread, it 
should
> >> > be safe (for other threads consuming an object on the task queue).
> >>
> >> Thank you. What Thomas thankfully explain is that I wanted to
> >> mention why relaxed CAS is available for copy_to_survivor.
> >>
> >> > I also do not think it is safe as is - for example, at least
> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from 
the
> >> > returned new_obj (in another log message :)) regardless of failure.
> >> >
> >> > That method also reads the forwardee if forwarded, and then again 
uses
> >> > object information in that same log message. A quick look did not 
show
> >> > other issues, but don't count this as a review.
> >>
> >> Thank you for your comments.
> >>
> >> As Carsten suggested, I guess, size may not be necessary for logging
> >> when CAS is failed (the size will be logged by the other thread that
> >> successfully operates the CAS). By reducing printing a size of
> >> new_obj, relaxing CAS for forwarding pointers becomes safe, I 
believe.
> >>
> >> In my understanding, PSPromotionManager::copy_and_push_safe_barrier
> >> () updates a card table for new_obj. However, this new_obj will not
> >> be used fro card tables in the same GC as a root of GC because all
> >> of entries in card tables were registered as tasks before any calls
> >> of copy_and_push_safe_barrier.
> >>
> >> I created a new webrev that reduces print formats when CAS is
> >> failed. Could you review this and give comments on it?
> >> http://cr.openjdk.java.net/~horii/8154736/webrev.00/
> >>
> >> Regards,
> >> Hiroshi
> >> -----------------------
> >> Hiroshi Horii, Ph.D.
> >> IBM Research - Tokyo
> >>
> >>
> >> Thomas Schatzl <thomas.schatzl at oracle.com> wrote on 09/30/2016 
21:02:31:
> >>
> >> > From: Thomas Schatzl <thomas.schatzl at oracle.com>
> >> > To: David Holmes <david.holmes at oracle.com>, Hiroshi H
> > Horii/Japan/IBM at IBMJP
> >> > Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
> >> > Tim Ellison <Tim_Ellison at uk.ibm.com>, Michihiro Horie/Japan/
> >> > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
> >> > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
> >> > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
> >> > <hotspot-runtime-dev at openjdk.java.net>
> >> > Date: 09/30/2016 21:04
> >> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> > copy_to_survivor for ppc64
> >> >
> >> > Hi,
> >> >
> >> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
> >> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
> >> > > >
> >> > > > Dear David, and Dan,
> >> > > >
> >> > > > Thank you for your comments.
> >> > > >
> >> > > > >
> >> > > > > In
> >> > > > > 
hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:
> >> > > > > 266 the log line reads data from the forwardee even when the 
CAS
> >> > > > > fails. I believe those reads will be unsafe without barriers
> >> > > > > after
> >> > > > > the copy of the content of the object.
> >> > > > > 
hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28
> >> > > > > 8
> >> > > > > same problem as in line 266
> >> > > > Can we use o->size() or new_obj_size instead of 
new_obj->size()?
> >> >
> >> > They are not equivalent. Parallel GC and other collectors 
creatively
> >> > reuse the "length" field of objArrays to indicate progress in the
> >> > scanning them during GC.
> >> >
> >> > new_obj_size is the result of a call to o->size() (and the compiler 
may
> >> > redo computations at any point), so has the same issue.
> >> >
> >> > > > > If you feel that the use of new_obj->size() is potentially unsafe
> >> > > > > then the fact we return new_obj means that any use of new_obj by
> >> > > > > the caller may also potentially be unsafe.
> >> > > > In my understanding, while copying objects to a survivor space, if
> >> > > > a thread creates a new_obj and sets a pointer with CAS, the other
> >> > > > threads can touch the new_obj after the thread calls
> >> > > > push_contents(new_obj) (Line: 239). In push_contents,
> >> > > > OrderAccess::release_store is called before pushing the object as a
> >> > > > task into a deque of workstealing (taskqueue.inline.hpp). If the
> >> > > > other thread reads the task, all of copy for new_obj is safe.
> >> > > I'm not familiar with the larger picture of the GC protocols here,
> >> > > but just looking at this code fragment in isolation if the CAS fails
> >> > > we read o->forwardee() to set new_obj. That in itself is fine because
> >> > > we're reading the field that we were testing with the CAS. But we
> >> > > could then dereference new_obj before the thread that won the CAS calls
> >> > > push_contents; and even if it is after push_contents we have not done
> >> > > an acquire to pair with the release-store in push_contents.
> >> >
> >> > I think Hiroshi thinks that since the work stealing itself does a CAS
> >> > with barrier after obtaining "new_obj" in the other thread, it should
> >> > be safe (for other threads consuming an object on the task queue).
> >> >
> >> > > So I'm really not seeing how we can use a barrier-less CAS here.
> >> >
> >> > I also do not think it is safe as is - for example, at least
> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> >> > returned new_obj (in another log message :)) regardless of failure.
> >> >
> >> > That method also reads the forwardee if forwarded, and then again uses
> >> > object information in that same log message. A quick look did not show
> >> > other issues, but don't count this as a review.
> >> >
> >> > Thanks,
> >> >   Thomas
> >> >
> 


From david.holmes at oracle.com  Tue Oct  4 12:16:33 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 4 Oct 2016 22:16:33 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OF50AA5835.E2A6C4E2-ON49258042.0035917E-49258042.00390564@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<347b1733-fbbc-b65b-5417-7be52a0b5d68@oracle.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<OFB3D9B9A2.7E1FF07A-ON49258041.004D238F-4925
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<OF50AA5835.E2A6C4E2-ON49258042.0035917E-49258042.00390564@notes.na.collabserv.com>
Message-ID: <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>

On 4/10/2016 8:22 PM, Hiroshi H Horii wrote:
> Dear David,
>
> Thank you for your comments. You are correct. In the previous webrev, a
> caller (in copy_and_push_safe_barrier) may use new_obj's fields
> unsafely. Very sorry.
>
> I changed the log format in copy_and_push_safe_barrier not to use fields
> of new_obj. Could you review this again?
> http://cr.openjdk.java.net/~horii/8154736/webrev.02/

src/share/vm/gc/parallel/psPromotionManager.inline.hpp

274       new_obj = NULL;
285     new_obj = NULL;

Sorry but you are losing me here. You've gone from simply removing 
barriers on the cmpxchg to changing the functionality of the methods 
that use the cmpxchg - instead of returning the forwardee() you are now 
returning NULL! ??

David
-----

> The callers of PSPromotionManager::copy_to_survivor_space are here.
>   PSPromotionManager::copy_and_push_safe_barrier
>   PSScavengeFromKlassClosure::do_oop
>
> I confirmed any fields of new_obj is not used in the two methods in this
> webrev.
>
> In addition, I reduced passing a constant literal "forwarding" in
> copy_and_push_safe_barrier and added some guards before logging in
> PSPromotionManager::copy_to_survivor_space as follows.
>
>   if (log_develop_is_enabled(Trace, gc, scavenge)) {
>    log_develop_trace(gc, scavenge)(...);
>  }
>
> If copy_to_survivor_space should not return new_obj if its fields are
> unsafe, I would like to change the return type of copy_to_survivor_space
> to "void" (or allow copy_to_survivor_space to return NULL).
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes <david.holmes at oracle.com> wrote on 10/04/2016 16:32:35:
>
>> From: David Holmes <david.holmes at oracle.com>
>> To: Hiroshi H Horii/Japan/IBM at IBMJP, Carsten Varming <varming at gmail.com>
>> Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
>> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
>> "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-
>> dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix-
>> port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
>> Thomas Schatzl <thomas.schatzl at oracle.com>, Tim Ellison
>> <Tim_Ellison at uk.ibm.com>
>> Date: 10/04/2016 16:33
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> On 4/10/2016 12:15 AM, Hiroshi H Horii wrote:
>> > Dear Carsten,
>> >
>> > Thank you for your correction. And very sorry about my easy mistakes...
>> > I created webrev again.
> http://cr.openjdk.java.net/~horii/8154736/webrev.01/
>> > I believe, all of the unsafe usages of new_obj, which has been pointed
>> > in this thread, is fixed with this webrev.
>>
>> I still am uneasy about this. If it is not safe to access the fields of
>> new_obj in the tracing statements but we return new_obj to the caller,
>> then it may not be safe for the caller to access the fields of new_obj!
>>
>> That aside:
>>
>> src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>>
>>   293   if (o->is_forwarded()) {
>>   294     new_obj = o->forwardee();
>>   295     // fields in new_obj may not be synchronized.
>>   296     if (log_develop_is_enabled(Trace, gc, scavenge) &&
>> o->is_forwarded()) {
>>
>> Why the second check of o->is_forwarded() ?
>>
>> 297       log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> "
>> PTR_FORMAT "}",
>>   298                         "forwarding",
>>
>> Why are you passing "forwarding" as an argument for the first %s instead
>> of just expressing it directly? I see this is a copy'n'paste from the
>> existing code - and I'm guessing at one point there was a conditional
>> around that. I think it should be fixed.
>>
>> Thanks,
>> David
>>
>> > Dear all,
>> >
>> > Can I ask a review of this webrev and give thoughts and comments again?
>> >
>> > Regards,
>> > Hiroshi
>> > -----------------------
>> > Hiroshi Horii, Ph.D.
>> > IBM Research - Tokyo
>> >
>> >
>> > Carsten Varming <varming at gmail.com> wrote on 10/03/2016 12:55:25:
>> >
>> >> From: Carsten Varming <varming at gmail.com>
>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP
>> >> Cc: Thomas Schatzl <thomas.schatzl at oracle.com>, David Holmes
>> >> <david.holmes at oracle.com>, hotspot-compiler-dev <hotspot-compiler-
>> >> dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
>> >> gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
>> >> <hotspot-runtime-dev at openjdk.java.net>, Michihiro Horie/Japan/
>> >> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
>> >> dev at openjdk.java.net>, Tim Ellison <Tim_Ellison at uk.ibm.com>
>> >> Date: 10/03/2016 12:56
>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> >> copy_to_survivor for ppc64
>> >>
>> >> Dear Hiroshi,
>> >>
>> >> It looks like  psPromotionManager.cpp:509 contains a logging
>> >> statement that could read data from an oop forwarded by another thread.
>> >>
>> >> I don't see how your new logging
>> >> in PSPromotionManager::copy_and_push_safe_barrier can be safe. In
>> >> the two new statements you read data from new_obj, but in both cases
>> >> it is possible that another thread still haven't written the data in
>> >> new_obj (new_obj->klass() reads new_obj->_metadata).
>> >>
>> >> Carsten
>> >>
>> >> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii <HORII at jp.ibm.com>
> wrote:
>> >> Hi, Thomas, and David,
>> >>
>> >> Thank you for your comments.
>> >>
>> >> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> >> > with barrier after obtaining "new_obj" in the other thread, it should
>> >> > be safe (for other threads consuming an object on the task queue).
>> >>
>> >> Thank you. What Thomas thankfully explain is that I wanted to
>> >> mention why relaxed CAS is available for copy_to_survivor.
>> >>
>> >> > I also do not think it is safe as is - for example, at least
>> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> >> > returned new_obj (in another log message :)) regardless of failure.
>> >> >
>> >> > That method also reads the forwardee if forwarded, and then again
> uses
>> >> > object information in that same log message. A quick look did not
> show
>> >> > other issues, but don't count this as a review.
>> >>
>> >> Thank you for your comments.
>> >>
>> >> As Carsten suggested, I guess, size may not be necessary for logging
>> >> when CAS is failed (the size will be logged by the other thread that
>> >> successfully operates the CAS). By reducing printing a size of
>> >> new_obj, relaxing CAS for forwarding pointers becomes safe, I believe.
>> >>
>> >> In my understanding, PSPromotionManager::copy_and_push_safe_barrier
>> >> () updates a card table for new_obj. However, this new_obj will not
>> >> be used fro card tables in the same GC as a root of GC because all
>> >> of entries in card tables were registered as tasks before any calls
>> >> of copy_and_push_safe_barrier.
>> >>
>> >> I created a new webrev that reduces print formats when CAS is
>> >> failed. Could you review this and give comments on it?
>> >> http://cr.openjdk.java.net/~horii/8154736/webrev.00/
>> >>
>> >> Regards,
>> >> Hiroshi
>> >> -----------------------
>> >> Hiroshi Horii, Ph.D.
>> >> IBM Research - Tokyo
>> >>
>> >>
>> >> Thomas Schatzl <thomas.schatzl at oracle.com> wrote on 09/30/2016
> 21:02:31:
>> >>
>> >> > From: Thomas Schatzl <thomas.schatzl at oracle.com>
>> >> > To: David Holmes <david.holmes at oracle.com>, Hiroshi H
>> > Horii/Japan/IBM at IBMJP
>> >> > Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
>> >> > Tim Ellison <Tim_Ellison at uk.ibm.com>, Michihiro Horie/Japan/
>> >> > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
>> >> > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
>> >> > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
>> >> > <hotspot-runtime-dev at openjdk.java.net>
>> >> > Date: 09/30/2016 21:04
>> >> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> >> > copy_to_survivor for ppc64
>> >> >
>> >> > Hi,
>> >> >
>> >> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
>> >> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
>> >> > > >
>> >> > > > Dear David, and Dan,
>> >> > > >
>> >> > > > Thank you for your comments.
>> >> > > >
>> >> > > > >
>> >> > > > > In
>> >> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:
>> >> > > > > 266 the log line reads data from the forwardee even when
> the CAS
>> >> > > > > fails. I believe those reads will be unsafe without barriers
>> >> > > > > after
>> >> > > > > the copy of the content of the object.
>> >> > > > >
> hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:28
>> >> > > > > 8
>> >> > > > > same problem as in line 266
>> >> > > > Can we use o->size() or new_obj_size instead of new_obj->size()?
>> >> >
>> >> > They are not equivalent. Parallel GC and other collectors creatively
>> >> > reuse the "length" field of objArrays to indicate progress in the
>> >> > scanning them during GC.
>> >> >
>> >> > new_obj_size is the result of a call to o->size() (and the
> compiler may
>> >> > redo computations at any point), so has the same issue.
>> >> >
>> >> > > > > If you feel that the use of new_obj->size() is potentially
> unsafe
>> >> > > > > then
>> >> > > > > the fact we return new_obj means that any use of new_obj by the
>> >> > > > > caller
>> >> > > > > may also potentially be unsafe.
>> >> > > > In my understanding, while copying objects to a survivor
> space, if
>> >> > > > a thread creates a new_obj and sets a pointer with CAS, the other
>> >> > > > threads can touch the new_obj after the thread calls
>> >> > > > push_contents(new_obj) (Line: 239). In push_contents,
>> >> > > > OrderAccess::release_store is called before pushing the
> object as a
>> >> > > > task into a deque of workstealing (taskqueue.inline.hpp). If the
>> >> > > > other thread reads the task, all of copy for new_obj is safe.
>> >> > > I'm not familiar with the larger picture of the GC protocols here,
>> >> > > but just looking at this code fragment in isolation if the CAS
> fails
>> >> > > we read o->forwardee() to set new_obj. That in itself is fine
> because
>> >> > > we're reading the field that we were testing with the CAS. But we
>> >> > > could then deference new_obj before the thread that won the CAS
> calls
>> >> > > push_contents; and even if it is after push_contents we have
> not done
>> >> > > an acquire to pair with the release-store in push_contents.
>> >> >
>> >> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> >> > with barrier after obtaining "new_obj" in the other thread, it should
>> >> > be safe (for other threads consuming an object on the task queue).
>> >> >
>> >> > > So I'm really not seeing how we can use a barrier-less CAS here.
>> >> >
>> >> > I also do not think it is safe as is - for example, at least
>> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> >> > returned new_obj (in another log message :)) regardless of failure.
>> >> >
>> >> > That method also reads the forwardee if forwarded, and then again
> uses
>> >> > object information in that same log message. A quick look did not
> show
>> >> > other issues, but don't count this as a review.
>> >> >
>> >> > Thanks,
>> >> >   Thomas
>> >> >
>>
>

From martin.doerr at sap.com  Tue Oct  4 13:15:46 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 4 Oct 2016 13:15:46 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<d3420519-3953-112f-b5be-f4c939aa41fb@oracle.com> 
Message-ID: <df8111c4d08e4ce49adb4deb5ae292d0@DEWDFE13DE14.global.corp.sap>

Hi Coleen,

thank you very much.

Thomas is currently out. The idea to use offsetof wouldn't work because it can only be computed after the layout of the class is computed. (_name[MONITOR_NAME_LEN] is a field of the Monitor class in the original implementation. The MONITOR_NAME_LEN is already needed to compute the class layout.)

Just for information:
The sizes appear to be:
debug build on linux PPC64le:
sizeof(MonitorBase):96, sizeof(Monitor):160, CACHE_LINE_PADDING:32 (no padding used because _name would get less than 64 characters)
product build on linux PPC64le:
sizeof(MonitorBase):56, sizeof(Monitor):128, CACHE_LINE_PADDING:72 (the length of _name gets extended from 64 to 72)

Hence, the change is also relevant for platforms with DEFAULT_CACHE_LINE_SIZE=128 (like PPC64).
A large amount of padding only gets inserted on s390 where we have DEFAULT_CACHE_LINE_SIZE=256.
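
The arithmetic behind the figures above can be sketched as follows. This is only an illustration, not the code in the webrev: the helper names and the 64-character minimum for _name are assumptions read off the numbers in this message.

```cpp
#include <cassert>

// Assumed minimum name length, taken from the "64 characters" mentioned above.
constexpr int kMinNameLen = 64;

// Bytes needed to fill one cache line after the base-class fields.
constexpr int cache_line_padding(int cache_line_size, int base_size) {
  return cache_line_size - base_size;
}

// The name buffer doubles as padding, but never shrinks below the minimum.
constexpr int name_len(int cache_line_size, int base_size) {
  return cache_line_padding(cache_line_size, base_size) > kMinNameLen
             ? cache_line_padding(cache_line_size, base_size)
             : kMinNameLen;
}

// Resulting object size: base fields followed by the name buffer.
constexpr int padded_size(int cache_line_size, int base_size) {
  return base_size + name_len(cache_line_size, base_size);
}

// Figures reported above for linux PPC64le (cache line size 128).
static_assert(cache_line_padding(128, 96) == 32, "debug build padding");
static_assert(padded_size(128, 96) == 160, "debug build sizeof(Monitor)");
static_assert(cache_line_padding(128, 56) == 72, "product build padding");
static_assert(padded_size(128, 56) == 128, "product build sizeof(Monitor)");
```

With a 256-byte line, the same formula gives a much larger name buffer, which matches the s390 remark.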

Maybe someone else wants to review the change. (Thomas is not an official reviewer.)

Thanks and best regards,
Martin


-----Original Message-----
From: Coleen Phillimore [mailto:coleen.phillimore at oracle.com] 
Sent: Montag, 3. Oktober 2016 23:33
To: Doerr, Martin <martin.doerr at sap.com>
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE


Hi Martin,

This change was approved for JDK9 so I can sponsor it for you anytime.  
I don't know if Thomas Stuefe was a reviewer or not.  I think I prefer 
the way you did it to his suggestion, because I like subclasses.

I think you still need another reviewer though, then commit the change 
and send me the export file (so it has your comments, etc in it).

thanks,
Coleen


On 9/30/16 11:48 AM, Doerr, Martin wrote:
> Hi,
>
> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>
> My proposed change is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>
> Please review. I will also need a sponsor.
>
> Thanks and best regards,
> Martin
>


From Alan.Burlison at oracle.com  Tue Oct  4 14:14:08 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Tue, 4 Oct 2016 15:14:08 +0100
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
Message-ID: <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>

On 04/10/2016 09:15, David Holmes wrote:

> But it shouldn't be passing sizeof(avs), it should be passing
> (AV_HW2_IDX + 1)

You are right, it expects the number of elements rather than the more 
usual convention of passing buffer length in bytes. Sigh.

I've replaced it with:

uint_t avn = getisax(avs, sizeof(avs) / sizeof(avs[0]));

as that will auto-adapt if the declaration of avs is ever changed.
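
The difference between a byte length and an element count can be sketched as below. This is a stand-alone illustration: fill_words is a hypothetical stand-in for a getisax-style call (it is not the Solaris function), and array_length is an illustrative helper, not a HotSpot macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a getisax-style call: fills at most n ELEMENTS
// (not bytes) of array and returns how many it wrote.
static unsigned fill_words(uint32_t* array, unsigned n) {
  const unsigned available = 2;  // pretend the system has two words to report
  const unsigned count = n < available ? n : available;
  for (unsigned i = 0; i < count; i++) {
    array[i] = 0x100u + i;
  }
  return count;
}

// Element count of a true array, deduced at compile time. Unlike
// sizeof(a) / sizeof(a[0]), this refuses to compile for a pointer.
template <typename T, size_t N>
constexpr size_t array_length(T (&)[N]) { return N; }

static unsigned query_words() {
  uint32_t avs[2];
  // Passing sizeof(avs) would claim 8 elements for a 2-element array.
  static_assert(sizeof(avs) == 8, "sizeof gives bytes, not elements");
  unsigned avn = fill_words(avs, static_cast<unsigned>(array_length(avs)));
  assert(avn <= array_length(avs));
  return avn;
}
```

Either form of the element count adapts automatically if the declaration of the array changes.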

-- 
Alan Burlison
--

From kim.barrett at oracle.com  Tue Oct  4 16:18:09 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Tue, 4 Oct 2016 12:18:09 -0400
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
Message-ID: <E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>

> On Oct 4, 2016, at 10:14 AM, Alan Burlison <Alan.Burlison at oracle.com> wrote:
> 
> On 04/10/2016 09:15, David Holmes wrote:
> 
>> But it shouldn't be passing sizeof(avs), it should be passing
>> (AV_HW2_IDX + 1)
> 
> You are right, it expects the number of elements rather than the more usual convention of passing buffer length in bytes. Sigh.

Yikes!  Sorry I missed that.

> I've replaced it with:
> 
> uint_t avn = getisax(avs, sizeof(avs) / sizeof(avs[0]));
> 
> as that will auto-adapt if the declaration of avs is ever changed.

We have a macro for that - ARRAY_SIZE(avs)

It's in globalDefinitions.hpp, on the off chance that's somehow not already being included.


From Alan.Burlison at oracle.com  Tue Oct  4 18:37:11 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Tue, 4 Oct 2016 19:37:11 +0100
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
Message-ID: <ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>

On 04/10/16 17:18, Kim Barrett wrote:

>> You are right, it expects the number of elements rather than the more usual convention of passing buffer length in bytes. Sigh.
>
> Yikes!  Sorry I missed that.

Dunno what you are apologizing for, it was my bug ;-)

>> I've replaced it with:
>>
>> uint_t avn = getisax(avs, sizeof(avs) / sizeof(avs[0]));
>>
>> as that will auto-adapt if the declaration of avs is ever changed.
>
> We have a macro for that - ARRAY_SIZE(avs)
>
> It's in globalDefinitions.hpp, on the off chance that's somehow not already being included.

Cool, I'll pop that in instead - thanks!

-- 
Alan Burlison
--

From HORII at jp.ibm.com  Wed Oct  5 00:36:37 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Wed, 5 Oct 2016 09:36:37 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<OFB3D9B9A2.7E1FF07A-ON49258041.004D238F-4925
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
Message-ID: <OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>

Dear David,

Thank you for your comments.

I had thought it might be better for copy_to_survivor_space not to 
return the forwardee when the CAS fails, in order to prevent callers from 
reading fields of the forwardee. But as you pointed out, that goes beyond 
the scope of this fix. 

I removed the two NULL assignments from the previous webrev. 
http://cr.openjdk.java.net/~horii/8154736/webrev.03/
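
The intended behaviour (always return the forwardee, but do not read its fields on the losing path) can be sketched roughly as below. This is a simplified illustration using std::atomic rather than HotSpot's Atomic::cmpxchg; Obj and forward_relaxed are hypothetical names, not code from the webrev.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object header plus one payload field.
struct Obj {
  std::atomic<Obj*> forwardee{nullptr};
  int payload = 0;
};

// Installs copy as the forwardee of o unless another thread got there first,
// and returns the winning forwardee either way. The CAS is relaxed, so a
// caller that lost the race gets a valid pointer but no guarantee that the
// winner's writes to the copy are visible yet: it may store the pointer, but
// it must not read fields through it without separate synchronization.
static Obj* forward_relaxed(Obj* o, Obj* copy, bool* won) {
  Obj* expected = nullptr;
  *won = o->forwardee.compare_exchange_strong(
      expected, copy, std::memory_order_relaxed, std::memory_order_relaxed);
  // On failure, expected has been updated to the forwardee already installed.
  return *won ? copy : expected;
}
```

Whether callers really never read through the returned pointer on the losing path is exactly what the review above is checking.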

Thank you for reviewing multiple times...

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


David Holmes <david.holmes at oracle.com> wrote on 10/04/2016 21:16:33:

> From: David Holmes <david.holmes at oracle.com>
> To: Hiroshi H Horii/Japan/IBM at IBMJP
> Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>, 
> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
> "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-
> dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix-
> port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>, 
> Thomas Schatzl <thomas.schatzl at oracle.com>, Tim Ellison 
> <Tim_Ellison at uk.ibm.com>, Carsten Varming <varming at gmail.com>
> Date: 10/04/2016 21:17
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and 
> copy_to_survivor for ppc64
> 
> On 4/10/2016 8:22 PM, Hiroshi H Horii wrote:
> > Dear David,
> >
> > Thank you for your comments. You are correct. In the previous webrev, a
> > caller (in copy_and_push_safe_barrier) may use new_obj's fields
> > unsafely. Very sorry.
> >
> > I changed the log format in copy_and_push_safe_barrier not to use fields
> > of new_obj. Could you review this again?
> > http://cr.openjdk.java.net/~horii/8154736/webrev.02/
> 
> src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> 
> 274       new_obj = NULL;
> 285     new_obj = NULL;
> 
> Sorry but you are losing me here. You've gone from simply removing 
> barriers on the cmpxchg to changing the functionality of the methods 
> that use the cmpxchg - instead of return the forwardee() you are now 
> returning NULL! ??
> 
> David
> -----
> 
> > The callers of PSPromotionManager::copy_to_survivor_space are here.
> >   PSPromotionManager::copy_and_push_safe_barrier
> >   PSScavengeFromKlassClosure::do_oop
> >
> > I confirmed any fields of new_obj is not used in the two methods in this
> > webrev.
> >
> > In addition, I reduced passing a constant literal "forwarding" in
> > copy_and_push_safe_barrier and added some guards before logging in
> > PSPromotionManager::copy_to_survivor_space as follows.
> >
> >   if (log_develop_is_enabled(Trace, gc, scavenge)) {
> >    log_develop_trace(gc, scavenge)(...);
> >  }
> >
> > If copy_to_survivor_space should not return new_obj if its fields are
> > unsafe, I would like to change the return type of copy_to_survivor_space
> > to "void" (or allow copy_to_survivor_space to return NULL).
> >
> > Regards,
> > Hiroshi
> > -----------------------
> > Hiroshi Horii, Ph.D.
> > IBM Research - Tokyo
> >
> >
> > David Holmes <david.holmes at oracle.com> wrote on 10/04/2016 16:32:35:
> >
> >> From: David Holmes <david.holmes at oracle.com>
> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, Carsten Varming 
<varming at gmail.com>
> >> Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
> >> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
> >> "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-
> >> dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP, "ppc-aix-
> >> port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
> >> Thomas Schatzl <thomas.schatzl at oracle.com>, Tim Ellison
> >> <Tim_Ellison at uk.ibm.com>
> >> Date: 10/04/2016 16:33
> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> copy_to_survivor for ppc64
> >>
> >> On 4/10/2016 12:15 AM, Hiroshi H Horii wrote:
> >> > Dear Carsten,
> >> >
> >> > Thank you for your correction. And very sorry about my easy 
mistakes...
> >> > I created webrev again.
> > http://cr.openjdk.java.net/~horii/8154736/webrev.01/
> >> > I believe, all of the unsafe usages of new_obj, which has been 
pointed
> >> > in this thread, is fixed with this webrev.
> >>
> >> I still am uneasy about this. If it is not safe to access the fields 
of
> >> new_obj in the tracing statements but we return new_obj to the 
caller,
> >> then it may not be safe for the caller to access the fields of 
new_obj!
> >>
> >> That aside:
> >>
> >> src/share/vm/gc/parallel/psPromotionManager.inline.hpp
> >>
> >>   293   if (o->is_forwarded()) {
> >>   294     new_obj = o->forwardee();
> >>   295     // fields in new_obj may not be synchronized.
> >>   296     if (log_develop_is_enabled(Trace, gc, scavenge) &&
> >> o->is_forwarded()) {
> >>
> >> Why the second check of o->is_forwarded() ?
> >>
> >> 297       log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> "
> >> PTR_FORMAT "}",
> >>   298                         "forwarding",
> >>
> >> Why are you passing "forwarding" as an argument for the first %s 
instead
> >> of just expressing it directly? I see this is a copy'n'paste from the
> >> existing code - and I'm guessing at one point there was a conditional
> >> around that. I think it should be fixed.
> >>
> >> Thanks,
> >> David
> >>
> >> > Dear all,
> >> >
> >> > Can I ask a review of this webrev and give thoughts and comments 
again?
> >> >
> >> > Regards,
> >> > Hiroshi
> >> > -----------------------
> >> > Hiroshi Horii, Ph.D.
> >> > IBM Research - Tokyo
> >> >
> >> >
> >> > Carsten Varming <varming at gmail.com> wrote on 10/03/2016 12:55:25:
> >> >
> >> >> From: Carsten Varming <varming at gmail.com>
> >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP
> >> >> Cc: Thomas Schatzl <thomas.schatzl at oracle.com>, David Holmes
> >> >> <david.holmes at oracle.com>, hotspot-compiler-dev <hotspot-compiler-
> >> >> dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
> >> >> gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
> >> >> <hotspot-runtime-dev at openjdk.java.net>, Michihiro Horie/Japan/
> >> >> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
> >> >> dev at openjdk.java.net>, Tim Ellison <Tim_Ellison at uk.ibm.com>
> >> >> Date: 10/03/2016 12:56
> >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> >> copy_to_survivor for ppc64
> >> >>
> >> >> Dear Hiroshi,
> >> >>
> >> >> It looks like  psPromotionManager.cpp:509 contains a logging
> >> >> statement that could read data from an oop forwarded by another 
thread.
> >> >>
> >> >> I don't see how your new logging
> >> >> in PSPromotionManager::copy_and_push_safe_barrier can be safe. In
> >> >> the two new statements you read data from new_obj, but in both 
cases
> >> >> it is possible that another thread still haven't written the data 
in
> >> >> new_obj (new_obj->klass() reads new_obj->_metadata).
> >> >>
> >> >> Carsten
> >> >>
> >> >> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii 
<HORII at jp.ibm.com>
> > wrote:
> >> >> Hi, Thomas, and David,
> >> >>
> >> >> Thank you for your comments.
> >> >>
> >> >> > I think Hiroshi thinks that since the work stealing itself does 
a CAS
> >> >> > with barrier after obtaining "new_obj" in the other thread, it 
should
> >> >> > be safe (for other threads consuming an object on the task 
queue).
> >> >>
> >> >> Thank you. What Thomas thankfully explain is that I wanted to
> >> >> mention why relaxed CAS is available for copy_to_survivor.
> >> >>
> >> >> > I also do not think it is safe as is - for example, at least
> >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from 
the
> >> >> > returned new_obj (in another log message :)) regardless of 
failure.
> >> >> >
> >> >> > That method also reads the forwardee if forwarded, and then 
again
> > uses
> >> >> > object information in that same log message. A quick look did 
not
> > show
> >> >> > other issues, but don't count this as a review.
> >> >>
> >> >> Thank you for your comments.
> >> >>
> >> >> As Carsten suggested, I guess, size may not be necessary for 
logging
> >> >> when CAS is failed (the size will be logged by the other thread 
that
> >> >> successfully operates the CAS). By reducing printing a size of
> >> >> new_obj, relaxing CAS for forwarding pointers becomes safe, I 
believe.
> >> >>
> >> >> In my understanding, 
PSPromotionManager::copy_and_push_safe_barrier
> >> >> () updates a card table for new_obj. However, this new_obj will 
not
> >> >> be used fro card tables in the same GC as a root of GC because all
> >> >> of entries in card tables were registered as tasks before any 
calls
> >> >> of copy_and_push_safe_barrier.
> >> >>
> >> >> I created a new webrev that reduces print formats when CAS is
> >> >> failed. Could you review this and give comments on it?
> >> >> http://cr.openjdk.java.net/~horii/8154736/webrev.00/
> >> >>
> >> >> Regards,
> >> >> Hiroshi
> >> >> -----------------------
> >> >> Hiroshi Horii, Ph.D.
> >> >> IBM Research - Tokyo
> >> >>
> >> >>
> >> >> Thomas Schatzl <thomas.schatzl at oracle.com> wrote on 09/30/2016 21:02:31:
> >> >>
> >> >> > From: Thomas Schatzl <thomas.schatzl at oracle.com>
> >> >> > To: David Holmes <david.holmes at oracle.com>, Hiroshi H Horii/Japan/IBM at IBMJP
> >> >> > Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
> >> >> > Tim Ellison <Tim_Ellison at uk.ibm.com>, Michihiro Horie/Japan/IBM at IBMJP,
> >> >> > "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
> >> >> > "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
> >> >> > "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-dev at openjdk.java.net>
> >> >> > Date: 09/30/2016 21:04
> >> >> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> >> >> > copy_to_survivor for ppc64
> >> >> >
> >> >> > Hi,
> >> >> >
> >> >> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
> >> >> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
> >> >> > > >
> >> >> > > > Dear David, and Dan,
> >> >> > > >
> >> >> > > > Thank you for your comments.
> >> >> > > >
> >> >> > > > >
> >> >> > > > > In hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266
> >> >> > > > > the log line reads data from the forwardee even when the CAS
> >> >> > > > > fails. I believe those reads will be unsafe without barriers
> >> >> > > > > after the copy of the content of the object.
> >> >> > > > >
> >> >> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288
> >> >> > > > > same problem as in line 266
> >> >> > > > Can we use o->size() or new_obj_size instead of new_obj->size()?
> >> >> >
> >> >> > They are not equivalent. Parallel GC and other collectors creatively
> >> >> > reuse the "length" field of objArrays to indicate progress in
> >> >> > scanning them during GC.
> >> >> >
> >> >> > new_obj_size is the result of a call to o->size() (and the compiler may
> >> >> > redo computations at any point), so has the same issue.
> >> >> >
> >> >> > > > > If you feel that the use of new_obj->size() is potentially unsafe
> >> >> > > > > then the fact we return new_obj means that any use of new_obj by the
> >> >> > > > > caller may also potentially be unsafe.
> >> >> > > > In my understanding, while copying objects to a survivor space, if
> >> >> > > > a thread creates a new_obj and sets a pointer with CAS, the other
> >> >> > > > threads can touch the new_obj after the thread calls
> >> >> > > > push_contents(new_obj) (Line: 239). In push_contents,
> >> >> > > > OrderAccess::release_store is called before pushing the object as a
> >> >> > > > task into a deque of workstealing (taskqueue.inline.hpp). If the
> >> >> > > > other thread reads the task, all of copy for new_obj is safe.
> >> >> > > I'm not familiar with the larger picture of the GC protocols here,
> >> >> > > but just looking at this code fragment in isolation if the CAS fails
> >> >> > > we read o->forwardee() to set new_obj. That in itself is fine because
> >> >> > > we're reading the field that we were testing with the CAS. But we
> >> >> > > could then dereference new_obj before the thread that won the CAS calls
> >> >> > > push_contents; and even if it is after push_contents we have not done
> >> >> > > an acquire to pair with the release-store in push_contents.
> >> >> >
> >> >> > I think Hiroshi thinks that since the work stealing itself does a CAS
> >> >> > with barrier after obtaining "new_obj" in the other thread, it should
> >> >> > be safe (for other threads consuming an object on the task queue).
> >> >> >
> >> >> > > So I'm really not seeing how we can use a barrier-less CAS here.
> >> >> >
> >> >> > I also do not think it is safe as is - for example, at least
> >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
> >> >> > returned new_obj (in another log message :)) regardless of failure.
> >> >> >
> >> >> > That method also reads the forwardee if forwarded, and then again uses
> >> >> > object information in that same log message. A quick look did not show
> >> >> > other issues, but don't count this as a review.
> >> >> >
> >> >> > Thanks,
> >> >> >   Thomas
> >> >> >
> >>
> >
> 


From robbin.ehn at oracle.com  Wed Oct  5 08:09:39 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 5 Oct 2016 10:09:39 +0200
Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388
Message-ID: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>

Hi all, please review!

We want to try the below work-around for this closed bug.
The closed bug concerns the same failure as in:

https://bugs.openjdk.java.net/browse/JDK-8079441
Intermittent failures on Windows with "Unexpected exit from test [exit code: 1080890248]" (0x406d1388)

EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can be problematic.
In this case we do not see any direct issue but still want to change it to EXCEPTION_EXECUTE_HANDLER.

Thanks!

/Robbin

diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp
--- a/src/os/windows/vm/os_windows.cpp	Mon Oct 03 21:48:21 2016 -0400
+++ b/src/os/windows/vm/os_windows.cpp	Wed Oct 05 06:24:02 2016 +0100
@@ -786,3 +790,3 @@
      RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), (const ULONG_PTR*)&info );
-  } __except(EXCEPTION_CONTINUE_EXECUTION) {}
+  } __except(EXCEPTION_EXECUTE_HANDLER) {}
  }
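
[Illustrative sketch, not part of the patch.] For readers unfamiliar with the idiom being changed, the code below shows the general shape of the debugger thread-naming convention that raises 0x406d1388, modeled on the publicly documented MSDN sample rather than on os_windows.cpp. The names kMsVcException, ThreadNameInfo and raise_thread_name_exception are made up for this sketch. With EXCEPTION_EXECUTE_HANDLER the (empty) __except body runs and execution continues after the block, whereas EXCEPTION_CONTINUE_EXECUTION asks the system to resume at the point where the exception was raised. The static_assert only confirms that the decimal exit code quoted above is the same value as the hex exception code. The structured-exception part compiles only with a Microsoft-compatible compiler; elsewhere the function is a no-op.

```cpp
#include <cstdint>

#ifdef _MSC_VER
#include <windows.h>
#endif

// Exception code used by the Visual Studio thread-naming convention.
constexpr std::uint32_t kMsVcException = 0x406D1388u;

// The exit code reported in JDK-8079441 is this same value in decimal.
static_assert(kMsVcException == 1080890248u,
              "exit code 1080890248 is 0x406d1388");

#ifdef _MSC_VER
#pragma pack(push, 8)
struct ThreadNameInfo {   // hypothetical name; layout follows the MSDN sample
  DWORD  dwType;          // must be 0x1000
  LPCSTR szName;          // thread name shown by the debugger
  DWORD  dwThreadID;      // thread to name
  DWORD  dwFlags;         // reserved, must be zero
};
#pragma pack(pop)
#endif

// Returns true if the exception was raised and swallowed by the handler,
// false on toolchains without structured exception handling.
bool raise_thread_name_exception(const char* name) {
#ifdef _MSC_VER
  ThreadNameInfo info = {0x1000, name, GetCurrentThreadId(), 0};
  __try {
    RaiseException(kMsVcException, 0,
                   sizeof(info) / sizeof(ULONG_PTR),
                   (const ULONG_PTR*)&info);
  } __except (EXCEPTION_EXECUTE_HANDLER) {
    // Empty on purpose: run this handler, then fall through past the block.
  }
  return true;
#else
  (void)name;
  return false;
#endif
}
```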

From david.holmes at oracle.com  Wed Oct  5 08:16:51 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 5 Oct 2016 18:16:51 +1000
Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388
In-Reply-To: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
References: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
Message-ID: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com>

Hi Robbin,

This seems fine to me as it is the MSDN way of using this mechanism.

https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx

Thanks,
David

On 5/10/2016 6:09 PM, Robbin Ehn wrote:
> Hi all, please review!
>
> We want to try the below work-around for this closed bug.
> The closed bug concerns same failure as in:
>
> https://bugs.openjdk.java.net/browse/JDK-8079441
> Intermittent failures on Windows with "Unexpected exit from test [exit
> code: 1080890248]" (0x406d1388)
>
> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can
> be problematic.
> In this case we do not see any direct issue but still want to change it
> to EXCEPTION_EXECUTE_HANDLER.
>
> Thanks!
>
> /Robbin
>
> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp
> --- a/src/os/windows/vm/os_windows.cpp    Mon Oct 03 21:48:21 2016 -0400
> +++ b/src/os/windows/vm/os_windows.cpp    Wed Oct 05 06:24:02 2016 +0100
> @@ -786,3 +790,3 @@
>      RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD),
> (const ULONG_PTR*)&info );
> -  } __except(EXCEPTION_CONTINUE_EXECUTION) {}
> +  } __except(EXCEPTION_EXECUTE_HANDLER) {}
>  }

From robbin.ehn at oracle.com  Wed Oct  5 11:04:21 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 5 Oct 2016 13:04:21 +0200
Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388
In-Reply-To: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com>
References: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
	<2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com>
Message-ID: <c43f6dbd-fb14-a346-5480-864fc7b21f9d@oracle.com>

Thanks David!

/Robbin

On 10/05/2016 10:16 AM, David Holmes wrote:
> Hi Robbin,
>
> This seems fine to me as it is the MSDN way of using this mechanism.
>
> https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx
>
> Thanks,
> David
>
> On 5/10/2016 6:09 PM, Robbin Ehn wrote:
>> Hi all, please review!
>>
>> We want to try the below work-around for this closed bug.
>> The closed bug concerns same failure as in:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8079441
>> Intermittent failures on Windows with "Unexpected exit from test [exit
>> code: 1080890248]" (0x406d1388)
>>
>> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can
>> be problematic.
>> In this case we do not see any direct issue but still want to change it
>> to EXCEPTION_EXECUTE_HANDLER.
>>
>> Thanks!
>>
>> /Robbin
>>
>> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp
>> --- a/src/os/windows/vm/os_windows.cpp    Mon Oct 03 21:48:21 2016 -0400
>> +++ b/src/os/windows/vm/os_windows.cpp    Wed Oct 05 06:24:02 2016 +0100
>> @@ -786,3 +790,3 @@
>>      RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD),
>> (const ULONG_PTR*)&info );
>> -  } __except(EXCEPTION_CONTINUE_EXECUTION) {}
>> +  } __except(EXCEPTION_EXECUTE_HANDLER) {}
>>  }

From staffan.larsen at oracle.com  Wed Oct  5 11:07:57 2016
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 5 Oct 2016 13:07:57 +0200
Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388
In-Reply-To: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
References: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
Message-ID: <D72C3DB1-6B4B-4F26-BC66-3547C92101EF@oracle.com>

Looks good!

Thanks,
/Staffan

> On 5 Oct 2016, at 10:09, Robbin Ehn <robbin.ehn at oracle.com> wrote:
> 
> Hi all, please review!
> 
> We want to try the below work-around for this closed bug.
> The closed bug concerns same failure as in:
> 
> https://bugs.openjdk.java.net/browse/JDK-8079441
> Intermittent failures on Windows with "Unexpected exit from test [exit code: 1080890248]" (0x406d1388)
> 
> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can be problematic.
> In this case we do not see any direct issue but still want to change it to EXCEPTION_EXECUTE_HANDLER.
> 
> Thanks!
> 
> /Robbin
> 
> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp
> --- a/src/os/windows/vm/os_windows.cpp	Mon Oct 03 21:48:21 2016 -0400
> +++ b/src/os/windows/vm/os_windows.cpp	Wed Oct 05 06:24:02 2016 +0100
> @@ -786,3 +790,3 @@
>     RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), (const ULONG_PTR*)&info );
> -  } __except(EXCEPTION_CONTINUE_EXECUTION) {}
> +  } __except(EXCEPTION_EXECUTE_HANDLER) {}
> }


From robbin.ehn at oracle.com  Wed Oct  5 11:19:40 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 5 Oct 2016 13:19:40 +0200
Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388
In-Reply-To: <D72C3DB1-6B4B-4F26-BC66-3547C92101EF@oracle.com>
References: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
	<D72C3DB1-6B4B-4F26-BC66-3547C92101EF@oracle.com>
Message-ID: <574bf750-8275-656e-7a15-fdebc93e7433@oracle.com>

Thanks Staffan!

/Robbin

On 10/05/2016 01:07 PM, Staffan Larsen wrote:
> Looks good!
>
> Thanks,
> /Staffan
>
>> On 5 Oct 2016, at 10:09, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>
>> Hi all, please review!
>>
>> We want to try the below work-around for this closed bug.
>> The closed bug concerns same failure as in:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8079441
>> Intermittent failures on Windows with "Unexpected exit from test [exit code: 1080890248]" (0x406d1388)
>>
>> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can be problematic.
>> In this case we do not see any direct issue but still want to change it to EXCEPTION_EXECUTE_HANDLER.
>>
>> Thanks!
>>
>> /Robbin
>>
>> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp
>> --- a/src/os/windows/vm/os_windows.cpp	Mon Oct 03 21:48:21 2016 -0400
>> +++ b/src/os/windows/vm/os_windows.cpp	Wed Oct 05 06:24:02 2016 +0100
>> @@ -786,3 +790,3 @@
>>     RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD), (const ULONG_PTR*)&info );
>> -  } __except(EXCEPTION_CONTINUE_EXECUTION) {}
>> +  } __except(EXCEPTION_EXECUTE_HANDLER) {}
>> }
>

From marcus.larsson at oracle.com  Wed Oct  5 13:26:04 2016
From: marcus.larsson at oracle.com (Marcus Larsson)
Date: Wed, 5 Oct 2016 15:26:04 +0200
Subject: RFR: 8166117: Add UTC timestamp decorator for UL
Message-ID: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com>

Hi,

Please review the following patch to add a UTC timestamp decorator for UL.

os::iso8601_time() has been modified to allow timestamps based on UTC. 
os::gmtime_pd() has been added to replace os::localtime_pd() when UTC is 
requested. Patch also includes a unit test for the new decoration.
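
[Illustrative sketch, not taken from the webrev.] The difference between the two timestamp flavours comes down to which broken-down time conversion is used. The helper below, iso8601_sketch, is a hypothetical stand-in and not os::iso8601_time(); it assumes a POSIX system (gmtime_r, localtime_r) and an output shape of YYYY-MM-DDThh:mm:ss.mmm+zzzz.

```cpp
#include <time.h>

#include <cstdio>
#include <string>

// Hypothetical helper: formats a time given as seconds since the epoch plus
// milliseconds, either in UTC or in the local time zone.
std::string iso8601_sketch(time_t seconds, int millis, bool utc) {
  struct tm parts = {};
  if (utc) {
    gmtime_r(&seconds, &parts);     // broken-down time in UTC
  } else {
    localtime_r(&seconds, &parts);  // broken-down time in the local zone
  }

  char date[32];
  strftime(date, sizeof(date), "%Y-%m-%dT%H:%M:%S", &parts);

  char zone[16];
  if (utc) {
    std::snprintf(zone, sizeof(zone), "+0000");  // UTC offset is always zero
  } else {
    strftime(zone, sizeof(zone), "%z", &parts);  // e.g. +0200
  }

  char out[64];
  std::snprintf(out, sizeof(out), "%s.%03d%s", date, millis, zone);
  return std::string(out);
}
```

Because the UTC variant does not depend on the machine's time zone, its output for a fixed input is the same everywhere, which is convenient for a unit test.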

Webrev:
http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/

Issue:
https://bugs.openjdk.java.net/browse/JDK-8166117

Testing:
New unit test through JPRT.

Thanks,
Marcus

From gerard.ziemski at oracle.com  Wed Oct  5 14:37:26 2016
From: gerard.ziemski at oracle.com (Gerard Ziemski)
Date: Wed, 5 Oct 2016 09:37:26 -0500
Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with
	ExitCode 0
In-Reply-To: <D1C948FE-EEFA-4A3C-82EF-CCD0F6880AE3@oracle.com>
References: <D1C948FE-EEFA-4A3C-82EF-CCD0F6880AE3@oracle.com>
Message-ID: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>

Ping. Can I have this simple fix reviewed please?


> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
> 
> hi all,
> 
> Please review this straightforward fix for a regression caused by JDK-8138760
> 
> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader".
> 
> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix.
> 
> The new debug info is refactored into its own method "printPerformanceInfoDetails"
> 
> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable".
> 
> bug:	https://bugs.openjdk.java.net/browse/JDK-8166145
> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1
> 
> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all
> 
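
[Illustrative sketch, not from the webrev.] The regression described above is a plain substring false positive, and gating the extra output behind an opt-in switch avoids it. The function names and the sample output lines below are invented for the illustration; only the symbol table entry text is taken from the description above.

```cpp
#include <string>
#include <vector>

// What a keyword-scanning test effectively does: report a hit if any
// output line contains the keyword anywhere.
bool mentions_keyword(const std::vector<std::string>& lines,
                      const std::string& keyword) {
  for (const std::string& line : lines) {
    if (line.find(keyword) != std::string::npos) {
      return true;
    }
  }
  return false;
}

// Hypothetical stand-in for VM output. The table dump is only produced when
// the caller opts in, mirroring the idea of putting it behind a log tag.
std::vector<std::string> sample_vm_output(bool hashtables_logging) {
  std::vector<std::string> out;
  out.push_back("test passed");
  if (hashtables_logging) {
    out.push_back("java.lang.VirtualMachineError, loader NULL class_loader");
  }
  return out;
}
```

With the dump enabled, a scan for "Error" reports a hit even though nothing failed; with it disabled, the same scan stays quiet.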


From george.triantafillou at oracle.com  Wed Oct  5 14:58:33 2016
From: george.triantafillou at oracle.com (George Triantafillou)
Date: Wed, 5 Oct 2016 10:58:33 -0400
Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388
In-Reply-To: <2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com>
References: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
	<2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com>
Message-ID: <ffd227d3-57df-d5e0-f523-60d495fd4511@oracle.com>

+1

-George

On 10/5/2016 4:16 AM, David Holmes wrote:
> Hi Robbin,
>
> This seems fine to me as it is the MSDN way of using this mechanism.
>
> https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx
>
> Thanks,
> David
>
> On 5/10/2016 6:09 PM, Robbin Ehn wrote:
>> Hi all, please review!
>>
>> We want to try the below work-around for this closed bug.
>> The closed bug concerns same failure as in:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8079441
>> Intermittent failures on Windows with "Unexpected exit from test [exit
>> code: 1080890248]" (0x406d1388)
>>
>> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can
>> be problematic.
>> In this case we do not see any direct issue but still want to change it
>> to EXCEPTION_EXECUTE_HANDLER.
>>
>> Thanks!
>>
>> /Robbin
>>
>> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp
>> --- a/src/os/windows/vm/os_windows.cpp    Mon Oct 03 21:48:21 2016 -0400
>> +++ b/src/os/windows/vm/os_windows.cpp    Wed Oct 05 06:24:02 2016 +0100
>> @@ -786,3 +790,3 @@
>>      RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD),
>> (const ULONG_PTR*)&info );
>> -  } __except(EXCEPTION_CONTINUE_EXECUTION) {}
>> +  } __except(EXCEPTION_EXECUTE_HANDLER) {}
>>  }


From coleen.phillimore at oracle.com  Wed Oct  5 19:12:47 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 5 Oct 2016 15:12:47 -0400
Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with
	ExitCode 0
In-Reply-To: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>
References: <D1C948FE-EEFA-4A3C-82EF-CCD0F6880AE3@oracle.com>
	<4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>
Message-ID: <2e46d96b-cfd5-7beb-b92f-ea8aab0c143c@oracle.com>

http://cr.openjdk.java.net/~gziemski/8166145_rev1/src/share/vm/classfile/dictionary.cpp.udiff.html

I was going to suggest that you change

  732 void Dictionary::print(bool details) {


to pass outputStream so it can be converted to logging, but that's a 
bigger change than we should do right now.  Can you file an RFE for 10 
to convert the hashtable printing to UL?

This change looks good.

Coleen


On 10/5/16 10:37 AM, Gerard Ziemski wrote:
> Ping. Can I have this simple fix reviewed please?
>
>
>> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
>>
>> hi all,
>>
>> Please review this straightforward fix for a regression caused by JDK-8138760
>>
>> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader".
>>
>> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix.
>>
>> The new debug info is refactored into its own method "printPerformanceInfoDetails"
>>
>> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable".
>>
>> bug:	https://bugs.openjdk.java.net/browse/JDK-8166145
>> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1
>>
>> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all
>>


From robbin.ehn at oracle.com  Wed Oct  5 20:28:51 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 5 Oct 2016 22:28:51 +0200
Subject: RFR: 8165526: Kitchensink sudden death - error code 0x406d1388
In-Reply-To: <ffd227d3-57df-d5e0-f523-60d495fd4511@oracle.com>
References: <c6f87003-d5c2-dca0-e600-fd67a72dd8a6@oracle.com>
	<2ac24434-62f8-94ca-6829-2c2524cc9e0f@oracle.com>
	<ffd227d3-57df-d5e0-f523-60d495fd4511@oracle.com>
Message-ID: <93e77f72-a35d-e30b-92f3-f6d60672aef2@oracle.com>

Thanks George!

/Robbin


On 10/05/2016 04:58 PM, George Triantafillou wrote:
> +1
>
> -George
>
> On 10/5/2016 4:16 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> This seems fine to me as it is the MSDN way of using this mechanism.
>>
>> https://msdn.microsoft.com/en-us/library/xcb2z8hs.aspx
>>
>> Thanks,
>> David
>>
>> On 5/10/2016 6:09 PM, Robbin Ehn wrote:
>>> Hi all, please review!
>>>
>>> We want to try the below work-around for this closed bug.
>>> The closed bug concerns same failure as in:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8079441
>>> Intermittent failures on Windows with "Unexpected exit from test [exit
>>> code: 1080890248]" (0x406d1388)
>>>
>>> EXCEPTION_CONTINUE_EXECUTION restarts at the same instruction which can
>>> be problematic.
>>> In this case we do not see any direct issue but still want to change it
>>> to EXCEPTION_EXECUTE_HANDLER.
>>>
>>> Thanks!
>>>
>>> /Robbin
>>>
>>> diff -r 4962f9f46728 src/os/windows/vm/os_windows.cpp
>>> --- a/src/os/windows/vm/os_windows.cpp    Mon Oct 03 21:48:21 2016 
>>> -0400
>>> +++ b/src/os/windows/vm/os_windows.cpp    Wed Oct 05 06:24:02 2016 
>>> +0100
>>> @@ -786,3 +790,3 @@
>>>      RaiseException (MS_VC_EXCEPTION, 0, sizeof(info)/sizeof(DWORD),
>>> (const ULONG_PTR*)&info );
>>> -  } __except(EXCEPTION_CONTINUE_EXECUTION) {}
>>> +  } __except(EXCEPTION_EXECUTE_HANDLER) {}
>>>  }
>


From robbin.ehn at oracle.com  Wed Oct  5 20:34:41 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Wed, 5 Oct 2016 22:34:41 +0200
Subject: RFR: 8166117: Add UTC timestamp decorator for UL
In-Reply-To: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com>
References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com>
Message-ID: <50847eea-27db-136a-8192-4c524a1e894a@oracle.com>

Hi Marcus, looks good!

/Robbin


On 10/05/2016 03:26 PM, Marcus Larsson wrote:
> Hi,
>
> Please review the following patch to add a UTC timestamp decorator for 
> UL.
>
> os::iso8601_time() has been modified to allow timestamps based on UTC. 
> os::gmtime_pd() has been added to replace os::localtime_pd() when UTC 
> is requested. Patch also includes a unit test for the new decoration.
>
> Webrev:
> http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/
>
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8166117
>
> Testing:
> New unit test through JPRT.
>
> Thanks,
> Marcus


From gerard.ziemski at oracle.com  Wed Oct  5 20:45:52 2016
From: gerard.ziemski at oracle.com (Gerard Ziemski)
Date: Wed, 5 Oct 2016 15:45:52 -0500
Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with
	ExitCode 0
In-Reply-To: <2e46d96b-cfd5-7beb-b92f-ea8aab0c143c@oracle.com>
References: <D1C948FE-EEFA-4A3C-82EF-CCD0F6880AE3@oracle.com>
	<4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>
	<2e46d96b-cfd5-7beb-b92f-ea8aab0c143c@oracle.com>
Message-ID: <EA51F889-9BCB-467A-8663-8A2690DFD51E@oracle.com>

Thank you for the review!

> On Oct 5, 2016, at 2:12 PM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
> 
> http://cr.openjdk.java.net/~gziemski/8166145_rev1/src/share/vm/classfile/dictionary.cpp.udiff.html
> 
> I was going to suggest that you change
> 
> 732 void Dictionary::print(bool details) {
> 
> 
> to pass outputStream so it can be converted to logging, but that's a bigger change than we should do right now.  Can you file an RFE for 10 to convert the hashtable printing to UL?

Done, please see JDK-8167232


cheers

> 
> This change looks good.
> 
> Coleen
> 
> 
> On 10/5/16 10:37 AM, Gerard Ziemski wrote:
>> Ping. Can I have this simple fix reviewed please?
>> 
>> 
>>> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
>>> 
>>> hi all,
>>> 
>>> Please review this straightforward fix for a regression caused by JDK-8138760
>>> 
>>> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader".
>>> 
>>> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix.
>>> 
>>> The new debug info is refactored into its own method "printPerformanceInfoDetails"
>>> 
>>> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable".
>>> 
>>> bug:	https://bugs.openjdk.java.net/browse/JDK-8166145
>>> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1
>>> 
>>> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all
>>> 
> 


From rachel.protacio at oracle.com  Wed Oct  5 21:45:48 2016
From: rachel.protacio at oracle.com (Rachel Protacio)
Date: Wed, 5 Oct 2016 17:45:48 -0400
Subject: RFR: 8166117: Add UTC timestamp decorator for UL
In-Reply-To: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com>
References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com>
Message-ID: <362c4c6f-a537-4256-7416-dc0c945d4fff@oracle.com>

Looks good to me too!

Rachel


On 10/5/2016 9:26 AM, Marcus Larsson wrote:
> Hi,
>
> Please review the following patch to add a UTC timestamp decorator for 
> UL.
>
> os::iso8601_time() has been modified to allow timestamps based on UTC. 
> os::gmtime_pd() has been added to replace os::localtime_pd() when UTC 
> is requested. Patch also includes a unit test for the new decoration.
>
> Webrev:
> http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/
>
> Issue:
> https://bugs.openjdk.java.net/browse/JDK-8166117
>
> Testing:
> New unit test through JPRT.
>
> Thanks,
> Marcus


From david.holmes at oracle.com  Wed Oct  5 23:52:58 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 6 Oct 2016 09:52:58 +1000
Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with
	ExitCode 0
In-Reply-To: <4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>
References: <D1C948FE-EEFA-4A3C-82EF-CCD0F6880AE3@oracle.com>
	<4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>
Message-ID: <b193ebe5-44d4-a1dd-6185-d4cac06e09cc@oracle.com>

Sorry for the delay - takes a while to catch up after a long weekend :)

On 6/10/2016 12:37 AM, Gerard Ziemski wrote:
> Ping. Can I have this simple fix reviewed please?

The changes seem fine to me too.

Thanks,
David

>
>> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
>>
>> hi all,
>>
>> Please review this straightforward fix for a regression caused by JDK-8138760
>>
>> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader".
>>
>> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" into VM at startup. I filed JDK-8166848 to track followup issue, like finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix.
>>
>> The new debug info is refactored into its own method "printPerformanceInfoDetails"
>>
>> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable".
>>
>> bug:	https://bugs.openjdk.java.net/browse/JDK-8166145
>> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1
>>
>> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all
>>
>

From david.holmes at oracle.com  Thu Oct  6 01:36:15 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 6 Oct 2016 11:36:15 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<OFB3D9B9A2.7E1FF07A-ON49258041.004D238F-4925
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
Message-ID: <f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>

On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
> Dear David,
>
> Thank you for your comments.
>
> I had thought it might be better for copy_to_survivor_space not to
> return the forwardee if the CAS failed, in order to prevent reading
> fields in the forwardee. But as you pointed out, this extends the fix
> beyond this topic.
>
> I removed the two NULL assignments from the previous webrev.
> http://cr.openjdk.java.net/~horii/8154736/webrev.03/

Which simply takes us back to where we were. It may not be safe for the 
caller of those methods to access the fields of the returned "forwardee".

Sorry but I'm not seeing anything here that justifies removing the 
barriers from the CAS in this code. GC lurkers, feel free to jump in here 
- this is your code after all! ;-)

David
-----
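
[Illustrative sketch, not HotSpot code.] The concern in this thread can be restated with standard C++ atomics as an analogy. HotSpot uses its own Atomic::cmpxchg and OrderAccess primitives, so the types and names below (Obj, payload, forwardee, forward_ordered, forward_relaxed) are assumptions made only for this sketch. In the ordered variant the winner's writes to the copy are published by the CAS, and a loser's failed CAS acquires them before it dereferences the pointer it observed. In the relaxed variant the loser still obtains the right pointer value, but nothing orders the winner's writes to the copy before the loser's reads of it.

```cpp
#include <atomic>

struct Obj {
  int payload = 0;                        // stands in for the copied object body
  std::atomic<Obj*> forwardee{nullptr};   // stands in for the forwarding pointer
};

// Ordered variant: success publishes the copy, failure acquires the winner's
// copy, so the returned pointer may be dereferenced by either thread.
Obj* forward_ordered(Obj& o, Obj* copy) {
  Obj* expected = nullptr;
  if (o.forwardee.compare_exchange_strong(expected, copy,
                                          std::memory_order_acq_rel,
                                          std::memory_order_acquire)) {
    return copy;      // we installed our copy
  }
  return expected;    // another thread won; its writes to *expected are visible
}

// Relaxed variant: only the pointer itself is updated atomically.
Obj* forward_relaxed(Obj& o, Obj* copy) {
  Obj* expected = nullptr;
  if (o.forwardee.compare_exchange_strong(expected, copy,
                                          std::memory_order_relaxed,
                                          std::memory_order_relaxed)) {
    return copy;
  }
  // The pointer value is correct, but reading expected->payload here would
  // not be ordered after the winner's initialization of that object.
  return expected;
}
```

In both variants the caller gets back the winning copy; the difference is only in whether the loser is entitled to read through that pointer without further synchronization, which is the point being debated above.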

> Thank you for reviewing multiple times...
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> David Holmes <david.holmes at oracle.com> wrote on 10/04/2016 21:16:33:
>
>> From: David Holmes <david.holmes at oracle.com>
>> To: Hiroshi H Horii/Japan/IBM at IBMJP
>> Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
>> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
>> "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-dev at openjdk.java.net>,
>> Michihiro Horie/Japan/IBM at IBMJP,
>> "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
>> Thomas Schatzl <thomas.schatzl at oracle.com>,
>> Tim Ellison <Tim_Ellison at uk.ibm.com>, Carsten Varming <varming at gmail.com>
>> Date: 10/04/2016 21:17
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> On 4/10/2016 8:22 PM, Hiroshi H Horii wrote:
>> > Dear David,
>> >
>> > Thank you for your comments. You are correct. In the previous webrev, a
>> > caller (in copy_and_push_safe_barrier) may use new_obj's fields
>> > unsafely. Very sorry.
>> >
>> > I changed the log format in copy_and_push_safe_barrier not to use fields
>> > of new_obj. Could you review this again?
>> > http://cr.openjdk.java.net/~horii/8154736/webrev.02/
>>
>> src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>>
>> 274       new_obj = NULL;
>> 285     new_obj = NULL;
>>
>> Sorry but you are losing me here. You've gone from simply removing
>> barriers on the cmpxchg to changing the functionality of the methods
>> that use the cmpxchg - instead of return the forwardee() you are now
>> returning NULL! ??
>>
>> David
>> -----
>>
>> > The callers of PSPromotionManager::copy_to_survivor_space are here.
>> >   PSPromotionManager::copy_and_push_safe_barrier
>> >   PSScavengeFromKlassClosure::do_oop
>> >
>> > I confirmed that no fields of new_obj are used in the two methods in
>> > this webrev.
>> >
>> > In addition, I removed the passing of the constant literal "forwarding"
>> > in copy_and_push_safe_barrier and added some guards before logging in
>> > PSPromotionManager::copy_to_survivor_space as follows.
>> >
>> >   if (log_develop_is_enabled(Trace, gc, scavenge)) {
>> >     log_develop_trace(gc, scavenge)(...);
>> >   }
>> >
>> > If copy_to_survivor_space should not return new_obj if its fields are
>> > unsafe, I would like to change the return type of copy_to_survivor_space
>> > to "void" (or allow copy_to_survivor_space to return NULL).
>> >
>> > Regards,
>> > Hiroshi
>> > -----------------------
>> > Hiroshi Horii, Ph.D.
>> > IBM Research - Tokyo
>> >
>> >
>> > David Holmes <david.holmes at oracle.com> wrote on 10/04/2016 16:32:35:
>> >
>> >> From: David Holmes <david.holmes at oracle.com>
>> >> To: Hiroshi H Horii/Japan/IBM at IBMJP, Carsten Varming <varming at gmail.com>
>> >> Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
>> >> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
>> >> "hotspot-runtime-dev at openjdk.java.net" <hotspot-runtime-dev at openjdk.java.net>,
>> >> Michihiro Horie/Japan/IBM at IBMJP,
>> >> "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-dev at openjdk.java.net>,
>> >> Thomas Schatzl <thomas.schatzl at oracle.com>,
>> >> Tim Ellison <Tim_Ellison at uk.ibm.com>
>> >> Date: 10/04/2016 16:33
>> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> >> copy_to_survivor for ppc64
>> >>
>> >> On 4/10/2016 12:15 AM, Hiroshi H Horii wrote:
>> >> > Dear Carsten,
>> >> >
>> >> > Thank you for your correction, and I am very sorry about my careless
>> >> > mistakes...
>> >> > I created the webrev again.
>> >> > http://cr.openjdk.java.net/~horii/8154736/webrev.01/
>> >> > I believe all of the unsafe usages of new_obj that have been pointed
>> >> > out in this thread are fixed with this webrev.
>> >>
>> >> I still am uneasy about this. If it is not safe to access the fields of
>> >> new_obj in the tracing statements but we return new_obj to the caller,
>> >> then it may not be safe for the caller to access the fields of new_obj!
>> >>
>> >> That aside:
>> >>
>> >> src/share/vm/gc/parallel/psPromotionManager.inline.hpp
>> >>
>> >>   293   if (o->is_forwarded()) {
>> >>   294     new_obj = o->forwardee();
>> >>   295     // fields in new_obj may not be synchronized.
>> >>   296     if (log_develop_is_enabled(Trace, gc, scavenge) &&
>> >> o->is_forwarded()) {
>> >>
>> >> Why the second check of o->is_forwarded() ?
>> >>
>> >> 297       log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> "
>> >> PTR_FORMAT "}",
>> >>   298                         "forwarding",
>> >>
>> >> Why are you passing "forwarding" as an argument for the first %s
>> >> instead of just expressing it directly? I see this is a copy'n'paste
>> >> from the existing code - and I'm guessing at one point there was a
>> >> conditional around that. I think it should be fixed.
>> >>
>> >> Thanks,
>> >> David
>> >>
>> >> > Dear all,
>> >> >
>> >> > Could you review this webrev and give me your thoughts and comments
>> >> > again?
>> >> >
>> >> > Regards,
>> >> > Hiroshi
>> >> > -----------------------
>> >> > Hiroshi Horii, Ph.D.
>> >> > IBM Research - Tokyo
>> >> >
>> >> >
>> >> > Carsten Varming <varming at gmail.com> wrote on 10/03/2016 12:55:25:
>> >> >
>> >> >> From: Carsten Varming <varming at gmail.com>
>> >> >> To: Hiroshi H Horii/Japan/IBM at IBMJP
>> >> >> Cc: Thomas Schatzl <thomas.schatzl at oracle.com>, David Holmes
>> >> >> <david.holmes at oracle.com>, hotspot-compiler-dev <hotspot-compiler-
>> >> >> dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
>> >> >> gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
>> >> >> <hotspot-runtime-dev at openjdk.java.net>, Michihiro Horie/Japan/
>> >> >> IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
>> >> >> dev at openjdk.java.net>, Tim Ellison <Tim_Ellison at uk.ibm.com>
>> >> >> Date: 10/03/2016 12:56
>> >> >> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> >> >> copy_to_survivor for ppc64
>> >> >>
>> >> >> Dear Hiroshi,
>> >> >>
>> >> >> It looks like psPromotionManager.cpp:509 contains a logging
>> >> >> statement that could read data from an oop forwarded by another
>> >> >> thread.
>> >> >>
>> >> >> I don't see how your new logging
>> >> >> in PSPromotionManager::copy_and_push_safe_barrier can be safe. In
>> >> >> the two new statements you read data from new_obj, but in both cases
>> >> >> it is possible that another thread still hasn't written the data in
>> >> >> new_obj (new_obj->klass() reads new_obj->_metadata).
>> >> >>
>> >> >> Carsten
>> >> >>
>> >> >> On Sun, Oct 2, 2016 at 10:46 AM, Hiroshi H Horii <HORII at jp.ibm.com>
>> >> >> wrote:
>> >> >> Hi, Thomas, and David,
>> >> >>
>> >> >> Thank you for your comments.
>> >> >>
>> >> >> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> >> >> > with barrier after obtaining "new_obj" in the other thread, it should
>> >> >> > be safe (for other threads consuming an object on the task queue).
>> >> >>
>> >> >> Thank you. What Thomas kindly explained is what I wanted to say
>> >> >> about why a relaxed CAS can be used for copy_to_survivor.
>> >> >>
>> >> >> > I also do not think it is safe as is - for example, at least
>> >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> >> >> > returned new_obj (in another log message :)) regardless of failure.
>> >> >> >
>> >> >> > That method also reads the forwardee if forwarded, and then again uses
>> >> >> > object information in that same log message. A quick look did not show
>> >> >> > other issues, but don't count this as a review.
>> >> >>
>> >> >> Thank you for your comments.
>> >> >>
>> >> >> As Carsten suggested, I guess the size may not be necessary for logging
>> >> >> when the CAS fails (the size will be logged by the other thread that
>> >> >> successfully performs the CAS). By no longer printing the size of
>> >> >> new_obj, relaxing the CAS for forwarding pointers becomes safe, I
>> >> >> believe.
>> >> >>
>> >> >> In my understanding, PSPromotionManager::copy_and_push_safe_barrier()
>> >> >> updates a card table for new_obj. However, this new_obj will not be
>> >> >> used for card tables in the same GC as a GC root, because all of the
>> >> >> entries in the card tables were registered as tasks before any calls
>> >> >> to copy_and_push_safe_barrier.
>> >> >>
>> >> >> I created a new webrev that reduces what is printed when the CAS
>> >> >> fails. Could you review this and give comments on it?
>> >> >> http://cr.openjdk.java.net/~horii/8154736/webrev.00/
>> >> >>
>> >> >> Regards,
>> >> >> Hiroshi
>> >> >> -----------------------
>> >> >> Hiroshi Horii, Ph.D.
>> >> >> IBM Research - Tokyo
>> >> >>
>> >> >>
>> >> >> Thomas Schatzl <thomas.schatzl at oracle.com> wrote on 09/30/2016
>> >> >> 21:02:31:
>> >> >>
>> >> >> > From: Thomas Schatzl <thomas.schatzl at oracle.com>
>> >> >> > To: David Holmes <david.holmes at oracle.com>, Hiroshi H
>> >> >> > Horii/Japan/IBM at IBMJP
>> >> >> > Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>,
>> >> >> > Tim Ellison <Tim_Ellison at uk.ibm.com>, Michihiro Horie/Japan/
>> >> >> > IBM at IBMJP, "ppc-aix-port-dev at openjdk.java.net" <ppc-aix-port-
>> >> >> > dev at openjdk.java.net>, "hotspot-gc-dev at openjdk.java.net" <hotspot-
>> >> >> > gc-dev at openjdk.java.net>, "hotspot-runtime-dev at openjdk.java.net"
>> >> >> > <hotspot-runtime-dev at openjdk.java.net>
>> >> >> > Date: 09/30/2016 21:04
>> >> >> > Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> >> >> > copy_to_survivor for ppc64
>> >> >> >
>> >> >> > Hi,
>> >> >> >
>> >> >> > On Fri, 2016-09-30 at 21:12 +1000, David Holmes wrote:
>> >> >> > > On 30/09/2016 8:17 PM, Hiroshi H Horii wrote:
>> >> >> > > >
>> >> >> > > > Dear David, and Dan,
>> >> >> > > >
>> >> >> > > > Thank you for your comments.
>> >> >> > > >
>> >> >> > > > >
>> >> >> > > > > In
>> >> >> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:266
>> >> >> > > > > the log line reads data from the forwardee even when the CAS
>> >> >> > > > > fails. I believe those reads will be unsafe without barriers
>> >> >> > > > > after the copy of the content of the object.
>> >> >> > > > >
>> >> >> > > > > hotspot/src/share/vm/gc/parallel/psPromotionManager.inline.hpp:288
>> >> >> > > > > same problem as in line 266
>> >> >> > > > Can we use o->size() or new_obj_size instead of new_obj->size()?
>> >> >> >
>> >> >> > They are not equivalent. Parallel GC and other collectors creatively
>> >> >> > reuse the "length" field of objArrays to indicate progress in
>> >> >> > scanning them during GC.
>> >> >> >
>> >> >> > new_obj_size is the result of a call to o->size() (and the compiler may
>> >> >> > redo computations at any point), so has the same issue.
>> >> >> >
>> >> >> > > > > If you feel that the use of new_obj->size() is potentially unsafe
>> >> >> > > > > then the fact we return new_obj means that any use of new_obj by
>> >> >> > > > > the caller may also potentially be unsafe.
>> >> >> > > > In my understanding, while copying objects to a survivor space, if
>> >> >> > > > a thread creates a new_obj and sets a pointer with CAS, the other
>> >> >> > > > threads can touch the new_obj after the thread calls
>> >> >> > > > push_contents(new_obj) (Line: 239). In push_contents,
>> >> >> > > > OrderAccess::release_store is called before pushing the object as a
>> >> >> > > > task into a deque of workstealing (taskqueue.inline.hpp). If the
>> >> >> > > > other thread reads the task, all of the copied contents of new_obj
>> >> >> > > > are safe to access.
>> >> >> > > I'm not familiar with the larger picture of the GC protocols here,
>> >> >> > > but just looking at this code fragment in isolation if the CAS fails
>> >> >> > > we read o->forwardee() to set new_obj. That in itself is fine because
>> >> >> > > we're reading the field that we were testing with the CAS. But we
>> >> >> > > could then dereference new_obj before the thread that won the CAS
>> >> >> > > calls push_contents; and even if it is after push_contents we have
>> >> >> > > not done an acquire to pair with the release-store in push_contents.
>> >> >> >
>> >> >> > I think Hiroshi thinks that since the work stealing itself does a CAS
>> >> >> > with barrier after obtaining "new_obj" in the other thread, it should
>> >> >> > be safe (for other threads consuming an object on the task queue).
>> >> >> >
>> >> >> > > So I'm really not seeing how we can use a barrier-less CAS here.
>> >> >> >
>> >> >> > I also do not think it is safe as is - for example, at least
>> >> >> > PSPromotionManager::copy_and_push_safe_barrier() reads data from the
>> >> >> > returned new_obj (in another log message :)) regardless of failure.
>> >> >> >
>> >> >> > That method also reads the forwardee if forwarded, and then again uses
>> >> >> > object information in that same log message. A quick look did not show
>> >> >> > other issues, but don't count this as a review.
>> >> >> >
>> >> >> > Thanks,
>> >> >> >   Thomas
>> >> >> >
>> >>
>> >
>>
>
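
The acquire/release pairing that David asks about in the quoted thread above can be sketched with std::atomic. This is only an illustration under stated assumptions: HotSpot uses its own Atomic::cmpxchg and OrderAccess rather than std::atomic, and the Obj type and forward_to function below are hypothetical names, not code from the webrev. The point of the sketch is that a thread which loses the CAS may read fields of the winner's copy only if its failed CAS acts as an acquire that pairs with the winner's release.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object header with a forwarding pointer; not HotSpot's oopDesc.
struct Obj {
  std::atomic<Obj*> forwardee;
  int size;
  explicit Obj(int s) : forwardee(nullptr), size(s) {}
};

// Try to install `copy` as the forwardee of `o` and return the winning copy.
// The caller is assumed to have fully initialized `copy` before this call.
inline Obj* forward_to(Obj* o, Obj* copy) {
  Obj* expected = nullptr;
  if (o->forwardee.compare_exchange_strong(expected, copy,
                                           std::memory_order_acq_rel,    // success: publish copy
                                           std::memory_order_acquire)) { // failure: see winner's writes
    return copy;
  }
  // Lost the race: `expected` now holds the winner's copy. Because the failed
  // CAS is an acquire load of a value stored with release semantics, reading
  // expected->size here is ordered after the winner's initialization.
  return expected;
}
```

If both orderings were std::memory_order_relaxed, the returned pointer would still be correct, but nothing would order the loser's reads of the copy's fields after the winner's writes, which is the situation the log statements in the thread run into.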

From david.holmes at oracle.com  Thu Oct  6 02:30:46 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 6 Oct 2016 12:30:46 +1000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
Message-ID: <e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>

On 1/10/2016 1:48 AM, Doerr, Martin wrote:
> Hi,
>
> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>
> My proposed change is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/

Not sure I understand the existing padding code. What false sharing are 
we trying to avoid?

And if the existing code assumes a cache line size of 64 and declares 
_name to be 64 chars, then why can't the new code declare name to be 
DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code 
is wrong (not just hard-wired).

On which platforms, other than S390, will this cause an actual change in
Monitor size?

Thanks,
David

> Please review. I will also need a sponsor.
>
> Thanks and best regards,
> Martin
>

From marcus.larsson at oracle.com  Thu Oct  6 06:58:27 2016
From: marcus.larsson at oracle.com (Marcus Larsson)
Date: Thu, 6 Oct 2016 08:58:27 +0200
Subject: RFR: 8166117: Add UTC timestamp decorator for UL
In-Reply-To: <50847eea-27db-136a-8192-4c524a1e894a@oracle.com>
References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com>
	<50847eea-27db-136a-8192-4c524a1e894a@oracle.com>
Message-ID: <dd51df4c-1da6-b172-6ed0-95c0e8315987@oracle.com>

Thanks Robbin!


On 10/05/2016 10:34 PM, Robbin Ehn wrote:
> Hi Marcus, looks good!
>
> /Robbin
>
>
> On 10/05/2016 03:26 PM, Marcus Larsson wrote:
>> Hi,
>>
>> Please review the following patch to add a UTC timestamp decorator 
>> for UL.
>>
>> os::iso8601_time() has been modified to allow timestamps based on 
>> UTC. os::gmtime_pd() has been added to replace os::localtime_pd() 
>> when UTC is requested. Patch also includes a unit test for the new 
>> decoration.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8166117
>>
>> Testing:
>> New unit test through JPRT.
>>
>> Thanks,
>> Marcus
>


From marcus.larsson at oracle.com  Thu Oct  6 06:59:18 2016
From: marcus.larsson at oracle.com (Marcus Larsson)
Date: Thu, 6 Oct 2016 08:59:18 +0200
Subject: RFR: 8166117: Add UTC timestamp decorator for UL
In-Reply-To: <362c4c6f-a537-4256-7416-dc0c945d4fff@oracle.com>
References: <9a8ea23a-efd5-f903-41a2-195034671b64@oracle.com>
	<362c4c6f-a537-4256-7416-dc0c945d4fff@oracle.com>
Message-ID: <2b070fbd-9529-a0fb-d10f-113160eb9e62@oracle.com>

Thanks Rachel!


On 10/05/2016 11:45 PM, Rachel Protacio wrote:
> Looks good to me too!
>
> Rachel
>
>
> On 10/5/2016 9:26 AM, Marcus Larsson wrote:
>> Hi,
>>
>> Please review the following patch to add a UTC timestamp decorator 
>> for UL.
>>
>> os::iso8601_time() has been modified to allow timestamps based on 
>> UTC. os::gmtime_pd() has been added to replace os::localtime_pd() 
>> when UTC is requested. Patch also includes a unit test for the new 
>> decoration.
>>
>> Webrev:
>> http://cr.openjdk.java.net/~mlarsson/8166117/webrev.00/
>>
>> Issue:
>> https://bugs.openjdk.java.net/browse/JDK-8166117
>>
>> Testing:
>> New unit test through JPRT.
>>
>> Thanks,
>> Marcus
>


From martin.doerr at sap.com  Thu Oct  6 09:09:21 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 6 Oct 2016 09:09:21 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>
Message-ID: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap>

Hi David,

thanks for taking a look at my proposal.

Maybe "unnecessary cache line sharing of contended memory" would be a more comprehensible description.

The purpose of the padding is to avoid the following situation:
Two Monitor instances are located one after the other in memory and some of their fields end up on the same cache line.
Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one.
The cache line then has to be transferred between all of the processors involved.

If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor no longer interfere with those competing for the second one.
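
A minimal sketch of that situation, assuming a 128-byte cache line. PlainMonitor and PaddedMonitor are hypothetical stand-ins, not HotSpot's Monitor layout. It only checks whether the contended first field of two adjacent instances lands on the same cache line:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size, for this illustration only.
const std::size_t kLineSize = 128;

// Hypothetical monitor without tail padding.
struct PlainMonitor {
  std::intptr_t lock_word;   // contended field, placed at the beginning
  void*         owner;
  char          name[16];
};

// Same fields, with the name array enlarged so that sizeof == kLineSize.
struct PaddedMonitor {
  std::intptr_t lock_word;
  void*         owner;
  char          name[kLineSize - sizeof(std::intptr_t) - sizeof(void*)];
};

// Index of the cache line that contains the given address.
inline std::uintptr_t line_of(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) / kLineSize;
}

// True if the lock words of two adjacent, line-aligned instances share a line.
template <typename M>
bool adjacent_lock_words_share_line() {
  alignas(kLineSize) static M monitors[2];
  return line_of(&monitors[0].lock_word) == line_of(&monitors[1].lock_word);
}
```

With these assumed layouts the unpadded pair shares a line and the padded pair does not, so contention on one instance would no longer move the line that holds the other instance's lock word.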

The existing padding implementation is not optimal. It's a little too small on some platforms. On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size. I have kept the minimum of 64 because _name is not only used for padding and I guess people don't want it too short.

x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical configurations. That is the same as on PPC64, so the change improves the padding on these platforms as well.
(On x86_64 we get the same result as on PPC64: the length of _name gets extended from 64 to 72 in the product build.) The padding increase only gets huge on S390.
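
To make the arithmetic concrete, here is a rough sketch of one way such a name length could be derived. It is not the code from the webrev. The helper names are hypothetical, and the 56 bytes used for the other fields is an assumed figure picked only so that the result matches the 64 to 72 change mentioned above:

```cpp
#include <cassert>
#include <cstddef>

// Round value up to the next multiple of alignment (alignment > 0).
constexpr std::size_t round_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: length of the name array such that
// other_fields + name_len is a multiple of cache_line, while the name
// never gets shorter than min_len.
constexpr std::size_t padded_name_len(std::size_t other_fields,
                                      std::size_t cache_line,
                                      std::size_t min_len) {
  return round_up(other_fields + min_len, cache_line) - other_fields;
}
```

Under the assumed 56 bytes of other fields, a 128-byte line gives a name length of 72 and a 256-byte line gives 200, which is in line with the remark that the increase only gets large on S390.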

Best regards,
Martin


-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Donnerstag, 6. Oktober 2016 04:31
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

On 1/10/2016 1:48 AM, Doerr, Martin wrote:
> Hi,
>
> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>
> My proposed change is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/

Not sure I understand the existing padding code. What false sharing are 
we trying to avoid?

And if the existing code assumes a cache line size of 64 and declares 
_name to be 64 chars, then why can't the new code declare name to be 
DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code 
is wrong (not just hard-wired).

Which platforms will this cause an actual change in Monitor size other 
than S390?

Thanks,
David

> Please review. I will also need a sponsor.
>
> Thanks and best regards,
> Martin
>

From david.holmes at oracle.com  Thu Oct  6 10:20:31 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 6 Oct 2016 20:20:31 +1000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>
	<6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap>
Message-ID: <68676855-6c87-44f5-2030-be3045298f39@oracle.com>

On 6/10/2016 7:09 PM, Doerr, Martin wrote:
> Hi David,
>
> thanks for taking a look at my proposal.
>
> Maybe "unnecessary cache line sharing of contended memory" would be more comprehensive.
>
> The purpose of the padding is to avoid the following situation:
> 2 Monitor instances are located behind each other and some fields end up on the same cache line.
> Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one.
> The cache line needs to get transferred between all involved processors.
>
> If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor don't interfere with those ones competing for the second one any more.

But that only helps for the case where the two monitors are exactly the 
wrong distance apart. Two other monitors that previously did not share 
cache lines may now do so if you make the monitors bigger.

This seems completely ad-hoc. ??

David
-----


> The existing padding implementation is not optimal. It's a little too small on some platforms. On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size. I have kept the minimum of 64 because _name is not only used for padding and I guess people don't want it too short.
>
> x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical configurations. That's like PPC64 so the change also improves the padding on these platforms as well.
> (On x86_64 we get the same result as on PPC64: The length of _name gets extended from 64 to 72 in product build). The padding increase only gets huge on S390.
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Donnerstag, 6. Oktober 2016 04:31
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 1/10/2016 1:48 AM, Doerr, Martin wrote:
>> Hi,
>>
>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>
>> My proposed change is here:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>
> Not sure I understand the existing padding code. What false sharing are
> we trying to avoid?
>
> And if the existing code assumes a cache line size of 64 and declares
> _name to be 64 chars, then why can't the new code declare name to be
> DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code
> is wrong (not just hard-wired).
>
> Which platforms will this cause an actual change in Monitor size other
> than S390?
>
> Thanks,
> David
>
>> Please review. I will also need a sponsor.
>>
>> Thanks and best regards,
>> Martin
>>

From martin.doerr at sap.com  Thu Oct  6 11:05:40 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 6 Oct 2016 11:05:40 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <68676855-6c87-44f5-2030-be3045298f39@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>
	<6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap>
	<68676855-6c87-44f5-2030-be3045298f39@oracle.com>
Message-ID: <bcfeb760a4784ad391a083b45af0d80d@DEWDFE13DE14.global.corp.sap>

Hi David,

There are many Monitor instances located one after the other, so I think the idea of padding (which was not mine) was not bad in general.
The ideal situation would be to have them cache line aligned, with sizeof(Monitor) equal to the cache line size (or a multiple of it). This would completely prevent cache line sharing.

Even without having the cache line alignment, the padding does help:
Please note that the padding is inserted at the end. The critical fields are at the beginning.
In particular, the _LockWord fields of 2 Monitors will never be on the same cache line when sizeof(Monitor) equals the cache line size (or a multiple of it).

Padding = DEFAULT_CACHE_LINE_SIZE could prevent more sharing in case of bad alignment, but I didn't want to waste more space. I'd prefer the alignment solution.
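
The claim about _LockWord can be checked with plain address arithmetic. The following is a sketch under the assumption that the lock word sits at offset 0 of each Monitor and that Monitors are laid out back to back. The function name is hypothetical:

```cpp
#include <cassert>
#include <cstddef>

// True if, for every possible misalignment of the first monitor within a
// cache line, the first byte of the lock word (assumed at offset 0) of two
// consecutive monitors falls on different cache lines.
inline bool lock_words_never_share(std::size_t monitor_size,
                                   std::size_t line_size) {
  for (std::size_t base = 0; base < line_size; ++base) {
    std::size_t first_line  = base / line_size;
    std::size_t second_line = (base + monitor_size) / line_size;
    if (first_line == second_line) {
      return false;  // found an alignment where both lock words share a line
    }
  }
  return true;
}
```

A size of 128 with 128-byte lines passes for every starting offset, while an assumed size of 120 fails. The check only covers the lock words: without alignment, the tail of one Monitor and the head of the next can still share a line, which is the remaining case that alignment would address.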

Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Donnerstag, 6. Oktober 2016 12:21
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

On 6/10/2016 7:09 PM, Doerr, Martin wrote:
> Hi David,
>
> thanks for taking a look at my proposal.
>
> Maybe "unnecessary cache line sharing of contended memory" would be more comprehensive.
>
> The purpose of the padding is to avoid the following situation:
> 2 Monitor instances are located behind each other and some fields end up on the same cache line.
> Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one.
> The cache line needs to get transferred between all involved processors.
>
> If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor don't interfere with those ones competing for the second one any more.

But that only helps for the case where the two monitors are exactly the 
wrong distance apart. Two other monitors that previously did not share 
cache lines may now do so if you make the monitors bigger.

This seems completely ad-hoc. ??

David
-----


> The existing padding implementation is not optimal. It's a little too small on some platforms. On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size. I have kept the minimum of 64 because _name is not only used for padding and I guess people don't want it too short.
>
> x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical configurations. That's like PPC64 so the change also improves the padding on these platforms as well.
> (On x86_64 we get the same result as on PPC64: The length of _name gets extended from 64 to 72 in product build). The padding increase only gets huge on S390.
>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Donnerstag, 6. Oktober 2016 04:31
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 1/10/2016 1:48 AM, Doerr, Martin wrote:
>> Hi,
>>
>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>
>> My proposed change is here:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>
> Not sure I understand the existing padding code. What false sharing are
> we trying to avoid?
>
> And if the existing code assumes a cache line size of 64 and declares
> _name to be 64 chars, then why can't the new code declare name to be
> DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code
> is wrong (not just hard-wired).
>
> Which platforms will this cause an actual change in Monitor size other
> than S390?
>
> Thanks,
> David
>
>> Please review. I will also need a sponsor.
>>
>> Thanks and best regards,
>> Martin
>>

From Alan.Burlison at oracle.com  Thu Oct  6 12:10:16 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Thu, 6 Oct 2016 13:10:16 +0100
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
Message-ID: <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>

On 04/10/2016 19:37, Alan Burlison wrote:

>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>> already being included.
>
> Cool, I'll pop that in instead - thanks!

Done, webrev updated, jprt hotspot testset is clean.

-- 
Alan Burlison
--

From david.holmes at oracle.com  Thu Oct  6 13:33:47 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 6 Oct 2016 23:33:47 +1000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <bcfeb760a4784ad391a083b45af0d80d@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>
	<6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap>
	<68676855-6c87-44f5-2030-be3045298f39@oracle.com>
	<bcfeb760a4784ad391a083b45af0d80d@DEWDFE13DE14.global.corp.sap>
Message-ID: <a0a001d6-100d-4e55-4001-a9ceda7a1826@oracle.com>

Hi Martin,

Thanks for bearing with me here, these optimizations are not really my 
forte.

On 6/10/2016 9:05 PM, Doerr, Martin wrote:
> Hi David,
>
> there are many Monitor instances behind each other so I think the idea of padding (which was not mine) was not bad in general.
> The ideal situation would be to have them cache line aligned and sizeof(Monitor) equals the cache line size (or a multiple). This would completely prevent cache line sharing.

So this is for all the mutexes/monitors created in mutex_init() which 
are assumed to be laid out in a linear fashion. Ok.

Has anyone actually measured this, or is it all theoretical? That is,
are any adjacent, or otherwise cache-line-aligned, monitors actually
contended at the same time? Padding to avoid false-sharing always seems
a very local optimization to me - more obvious with hot fields in the
same object than with distinct fields in distinct objects.

> Even without having the cache line alignment, the padding does help:
> Please note that the padding is inserted at the end. The critical fields are at the beginning.
> Especially _LockWord of 2 Monitors will never be on the same cache line when sizeof(Monitor) equals the cache line size (or a multiple).

Seems to me the existing code, as it doesn't take into account the size 
of the rest of the Monitor, isn't really addressing this correctly at 
all - even on platforms with a 64-byte cache line.

> Padding = DEFAULT_CACHE_LINE_SIZE could prevent more sharing in case of bad alignment, but I didn't want to waste more space. I'd rather prefer the alignment solution.

The other option is an operator new that only allocates on the desired 
alignment - as we do in some other places. That also avoids wasted space 
with Monitors embedded in other objects - not that I think we have that 
many of them.
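
As a rough illustration of that option, here is a sketch of a class-specific operator new that hands out cache-line-aligned storage. It assumes a 128-byte line and plain malloc. AlignedMonitor is a hypothetical class, and HotSpot's own allocation helpers are not used here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

const std::size_t kCacheLine = 128;  // assumed value for this sketch

// Hypothetical monitor-like class; not HotSpot's Monitor.
class AlignedMonitor {
 public:
  std::intptr_t lock_word;
  char name[64];

  AlignedMonitor() : lock_word(0) { name[0] = '\0'; }

  // Over-allocate, round the start address up to a cache line boundary, and
  // keep the raw malloc pointer just below the block for operator delete.
  // The size is rounded up to whole lines so no other heap block shares them.
  static void* operator new(std::size_t size) {
    std::size_t padded = (size + kCacheLine - 1) / kCacheLine * kCacheLine;
    void* raw = std::malloc(padded + kCacheLine + sizeof(void*));
    if (raw == nullptr) throw std::bad_alloc();
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + kCacheLine - 1) / kCacheLine * kCacheLine;
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
  }

  static void operator delete(void* p) {
    if (p != nullptr) std::free(reinterpret_cast<void**>(p)[-1]);
  }
};
```

Because the extra space lives in the allocation rather than in the class, an AlignedMonitor embedded as a member of another object keeps its small sizeof. The cost is paid only by instances created with new.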

Thanks,
David

> Best regards,
> Martin
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Donnerstag, 6. Oktober 2016 12:21
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 6/10/2016 7:09 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> thanks for taking a look at my proposal.
>>
>> Maybe "unnecessary cache line sharing of contended memory" would be more comprehensive.
>>
>> The purpose of the padding is to avoid the following situation:
>> 2 Monitor instances are located behind each other and some fields end up on the same cache line.
>> Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one.
>> The cache line needs to get transferred between all involved processors.
>>
>> If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor don't interfere with those ones competing for the second one any more.
>
> But that only helps for the case where the two monitors are exactly the
> wrong distance apart. Two other monitors that previously did not share
> cache lines may now do so if you make the monitors bigger.
>
> This seems completely ad-hoc. ??
>
> David
> -----
>
>
>
>> The existing padding implementation is not optimal. It's a little too small on some platforms. On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size. I have kept the minimum of 64 because _name is not only used for padding and I guess people don't want it too short.
>>
>> x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical configurations. That's like PPC64 so the change also improves the padding on these platforms as well.
>> (On x86_64 we get the same result as on PPC64: The length of _name gets extended from 64 to 72 in product build). The padding increase only gets huge on S390.
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Donnerstag, 6. Oktober 2016 04:31
>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>>
>> On 1/10/2016 1:48 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>>
>>> My proposed change is here:
>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>
>> Not sure I understand the existing padding code. What false sharing are
>> we trying to avoid?
>>
>> And if the existing code assumes a cache line size of 64 and declares
>> _name to be 64 chars, then why can't the new code declare name to be
>> DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code
>> is wrong (not just hard-wired).
>>
>> Which platforms will this cause an actual change in Monitor size other
>> than S390?
>>
>> Thanks,
>> David
>>
>>> Please review. I will also need a sponsor.
>>>
>>> Thanks and best regards,
>>> Martin
>>>

From claes.redestad at oracle.com  Thu Oct  6 13:55:21 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 6 Oct 2016 15:55:21 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <a0a001d6-100d-4e55-4001-a9ceda7a1826@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>
	<6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap>
	<68676855-6c87-44f5-2030-be3045298f39@oracle.com>
	<bcfeb760a4784ad391a083b45af0d80d@DEWDFE13DE14.global.corp.sap>
	<a0a001d6-100d-4e55-4001-a9ceda7a1826@oracle.com>
Message-ID: <57F657C9.20204@oracle.com>

Hi,

(cc:ing hotspot-gc-dev)

On 2016-10-06 15:33, David Holmes wrote:
> Hi Martin,
>
> Thanks for bearing with me here, these optimizations are not really my
> forte.
>
> On 6/10/2016 9:05 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> there are many Monitor instances behind each other so I think the idea
>> of padding (which was not mine) was not bad in general.
>> The ideal situation would be to have them cache line aligned and
>> sizeof(Monitor) equals the cache line size (or a multiple). This would
>> completely prevent cache line sharing.
>
> So this is for all the mutexes/monitors created in mutex_init() which
> are assumed to be laid out in a linear fashion. Ok.
>
> Has anyone actually done any metrics on this or is it all theoretical?
> ie are any adjacent, or otherwise cache-line-aligned, monitors actually
> contended at the same time? Padding to avoid false-sharing always seems
> a very local optimization to me - more obvious with hot fields in the
> same object than with distinct fields in distinct objects.
>
>> Even without having the cache line alignment, the padding does help:
>> Please note that the padding is inserted at the end. The critical
>> fields are at the beginning.
>> Especially _LockWord of 2 Monitors will never be on the same cache
>> line when sizeof(Monitor) equals the cache line size (or a multiple).
>
> Seems to me the existing code, as it doesn't take into account the size
> of the rest of the Monitor, isn't really addressing this correctly at
> all - even on platforms with a 64-byte cache line.
>
>> Padding = DEFAULT_CACHE_LINE_SIZE could prevent more sharing in case
>> of bad alignment, but I didn't want to waste more space. I'd rather
>> prefer the alignment solution.
>
> The other option is an operator new that only allocates on the desired
> alignment - as we do in some other places. That also avoids wasted space
> with Monitors embedded in other objects - not that I think we have that
> many of them.

IIRC GC code has a number of places where Monitors are created and
embedded in other, larger objects, and there have been reports of
footprint overhead issues with the current anti-sharing solution.

Additionally, if memory serves me, it appears this char[64] name field
is only ever set to an actual name for the Monitors that are allocated
globally in mutex_list, so...

... wouldn't a possibly better solution be to remove padding altogether
from the base Monitor and wrap the mutex_list Monitors in some class 
that adds the name/padding?
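
To make the idea concrete, here is a minimal sketch of such a wrapper. All
names in it (LeanMonitor, NamedGlobalMonitor, kCacheLineSketch) are
hypothetical and not taken from HotSpot, and it uses C++11 alignas, which
the HotSpot sources of this era may not be able to rely on. The base class
carries no name and no padding; only the wrapper used for the globally
allocated locks pays for both.

```cpp
#include <cstddef>
#include <cstdio>

// Assumed cache line size for this sketch; HotSpot would use
// DEFAULT_CACHE_LINE_SIZE instead.
constexpr std::size_t kCacheLineSketch = 64;

// Hypothetical stand-in for a Monitor without an embedded name or padding.
struct LeanMonitor {
  volatile long lock_word = 0;  // the contended field
  void*         owner     = nullptr;
};

// Hypothetical wrapper for the globally allocated monitors only.
// alignas makes the compiler round sizeof up to a multiple of the cache
// line, so adjacent array elements never share a line.
class alignas(kCacheLineSketch) NamedGlobalMonitor : public LeanMonitor {
 public:
  explicit NamedGlobalMonitor(const char* name) {
    // snprintf always NUL-terminates and truncates over-long names.
    std::snprintf(_name, sizeof(_name), "%s", name);
  }
  const char* name() const { return _name; }

 private:
  char _name[32];
};

static_assert(sizeof(NamedGlobalMonitor) % kCacheLineSketch == 0,
              "wrapper size is a whole number of cache lines");
```

A LeanMonitor embedded in a larger object stays small, while an array of
NamedGlobalMonitor keeps each element on its own cache line(s).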

Thanks!

/Claes

>
> Thanks,
> David
>
>> Best regards,
>> Martin
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Donnerstag, 6. Oktober 2016 12:21
>> To: Doerr, Martin <martin.doerr at sap.com>;
>> hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>> DEFAULT_CACHE_LINE_SIZE
>>
>> On 6/10/2016 7:09 PM, Doerr, Martin wrote:
>>> Hi David,
>>>
>>> thanks for taking a look at my proposal.
>>>
>>> Maybe "unnecessary cache line sharing of contended memory" would be
>>> more comprehensive.
>>>
>>> The purpose of the padding is to avoid the following situation:
>>> 2 Monitor instances are located behind each other and some fields end
>>> up on the same cache line.
>>> Some threads running on some processors compete for the first Monitor
>>> while some other threads running on some processors compete for the
>>> second one.
>>> The cache line needs to get transferred between all involved processors.
>>>
>>> If we add enough padding, the fields which are accessed by many
>>> processors end up on different cache lines. This splits the problem
>>> into 2 independent problems. The threads competing for the first
>>> Monitor don't interfere with those ones competing for the second one
>>> any more.
>>
>> But that only helps for the case where the two monitors are exactly the
>> wrong distance apart. Two other monitors that previously did not share
>> cache lines may now do so if you make the monitors bigger.
>>
>> This seems completely ad-hoc. ??
>>
>> David
>> -----
>>
>>
>>
>>> The existing padding implementation is not optimal. It's a little too
>>> small on some platforms. On other platforms, it is not wrong to pad
>>> more than necessary, but ideally, one would pad to make the Monitor
>>> size equal to the cache line size. I have kept the minimum of 64
>>> because _name is not only used for padding and I guess people don't
>>> want it too short.
>>>
>>> x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical
>>> configurations. That's like PPC64 so the change also improves the
>>> padding on these platforms as well.
>>> (On x86_64 we get the same result as on PPC64: The length of _name
>>> gets extended from 64 to 72 in product build). The padding increase
>>> only gets huge on S390.
>>>
>>> Best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: David Holmes [mailto:david.holmes at oracle.com]
>>> Sent: Donnerstag, 6. Oktober 2016 04:31
>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>> hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> On 1/10/2016 1:48 AM, Doerr, Martin wrote:
>>>> Hi,
>>>>
>>>> the current implementation of Monitor padding (mutex.cpp) assumes
>>>> that cache lines are 64 Bytes. There's a platform dependent define
>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>>>> padding is to avoid false sharing.
>>>>
>>>> My proposed change is here:
>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>
>>> Not sure I understand the existing padding code. What false sharing are
>>> we trying to avoid?
>>>
>>> And if the existing code assumes a cache line size of 64 and declares
>>> _name to be 64 chars, then why can't the new code declare name to be
>>> DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code
>>> is wrong (not just hard-wired).
>>>
>>> Which platforms will this cause an actual change in Monitor size other
>>> than S390?
>>>
>>> Thanks,
>>> David
>>>
>>>> Please review. I will also need a sponsor.
>>>>
>>>> Thanks and best regards,
>>>> Martin
>>>>

From gerard.ziemski at oracle.com  Thu Oct  6 14:10:15 2016
From: gerard.ziemski at oracle.com (Gerard Ziemski)
Date: Thu, 6 Oct 2016 09:10:15 -0500
Subject: RFR: 8166145: runtime/threads/ThreadInterruptTest3 fails with
	ExitCode 0
In-Reply-To: <b193ebe5-44d4-a1dd-6185-d4cac06e09cc@oracle.com>
References: <D1C948FE-EEFA-4A3C-82EF-CCD0F6880AE3@oracle.com>
	<4A1BA059-F6C5-4C2A-8492-330689D46C1D@oracle.com>
	<b193ebe5-44d4-a1dd-6185-d4cac06e09cc@oracle.com>
Message-ID: <8519EAB4-FD98-49F4-A37A-B00129087542@oracle.com>

Thank you for the review!


> On Oct 5, 2016, at 6:52 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> Sorry for the delay - takes a while to catch up after a long weekend :)
> 
> On 6/10/2016 12:37 AM, Gerard Ziemski wrote:
>> Ping. Can I have this simple fix reviewed please?
> 
> The changes seem fine to me too.
> 
> Thanks,
> David
> 
>> 
>>> On Sep 29, 2016, at 11:08 AM, Gerard Ziemski <gerard.ziemski at oracle.com> wrote:
>>> 
>>> hi all,
>>> 
>>> Please review this straightforward fix for a regression caused by JDK-8138760
>>> 
>>> For JDK-8138760 we added more debug info to help us understand the "Performance bug: SystemDictionary" issue. That, however, caused a regression in tests that could not account for the new info printed out, such as tests using a golden file to compare their output, and those that searched output for keywords like "Error", which now matched on output that printed entries of the Symbol Table like "java.lang.VirtualMachineError, loader NULL class_loader".
>>>
>>> In this fix we wrap the extra debug info in a new "hashtables" UL tag, which means that in order to get the new debug info a test must now pass "-Xlog:hashtables=info" to the VM at startup. I filed JDK-8166848 to track follow-up work, such as finding an optimization that would solve this performance issue and finding an appropriate test dedicated to tracking the issue and verifying the fix.
>>>
>>> The new debug info is refactored into its own method "printPerformanceInfoDetails".
>>>
>>> We also make a small change to the "verify_lookup_length" method, which now takes the name of the table, instead of hardcoding it to "SymbolTable".
>>> 
>>> bug:	https://bugs.openjdk.java.net/browse/JDK-8166145
>>> webrev: http://cr.openjdk.java.net/~gziemski/8166145_rev1
>>> 
>>> Passes local tonga ThreadInterruptTest3 test and RBT hotspot_all
>>> 
>> 


From martin.doerr at sap.com  Thu Oct  6 16:15:29 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 6 Oct 2016 16:15:29 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <a0a001d6-100d-4e55-4001-a9ceda7a1826@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<e2cab654-6c17-b55a-dcf2-4001e76b7364@oracle.com>
	<6cfa7f5dc52b4647809b2b1a66551ed8@DEWDFE13DE14.global.corp.sap>
	<68676855-6c87-44f5-2030-be3045298f39@oracle.com>
	<bcfeb760a4784ad391a083b45af0d80d@DEWDFE13DE14.global.corp.sap>
	<a0a001d6-100d-4e55-4001-a9ceda7a1826@oracle.com>
Message-ID: <97c50a5b06cf44e5a8bb8478e3710c3f@DEWDFE13DE14.global.corp.sap>

Hi David,

we made the change a long time ago when we were looking for concurrency issues. I don't remember if it was a fix for anything specific which we observed.

I don't know if the authors of the original code had seen issues or made performance measurements.

I think the current implementation is not too bad for 64 byte cache lines because the _LockWord fields are always on different cache lines (with 64 byte _name[]).

The intention of my proposal was to improve the situation for 128 and especially 256 byte cache lines, which I still think my webrev achieves. I'm not sure whether more sophisticated solutions would be overengineered.
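
The arithmetic behind this can be sketched as follows. This is only an
illustration under assumptions: MonitorBaseSketch and MonitorSketch are
hypothetical stand-ins with far fewer fields than the real Monitor, and the
name length is computed the same way as in the webrev excerpt (pad up to the
cache line, but never below 64). The check walks every possible start offset
of the first monitor and confirms that the lock words of two back-to-back
monitors fall on different cache lines, which holds whenever sizeof of the
monitor is at least one cache line.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the hot fields at the start of a Monitor.
struct MonitorBaseSketch {
  volatile intptr_t lock_word;  // contended field, placed first
  void*             owner;
  void*             wait_set;
};

// Name length chosen like in the webrev: fill up to one cache line,
// but keep at least 64 characters for the name.
template <std::size_t CacheLine>
struct MonitorSketch : MonitorBaseSketch {
  static const std::size_t padding =
      CacheLine > sizeof(MonitorBaseSketch)
          ? CacheLine - sizeof(MonitorBaseSketch) : 0;
  static const std::size_t name_len = padding > 64 ? padding : 64;
  char name[name_len];
};

// Returns true if, for every start offset of the first monitor within a
// cache line, the lock word of the next monitor in an array lies on a
// different cache line.
template <std::size_t CacheLine>
bool lock_words_never_share_line() {
  const std::size_t stride = sizeof(MonitorSketch<CacheLine>);
  const std::size_t step   = alignof(MonitorSketch<CacheLine>);
  for (std::size_t start = 0; start < CacheLine; start += step) {
    const std::size_t first_line  = start / CacheLine;
    const std::size_t second_line = (start + stride) / CacheLine;
    if (first_line == second_line) return false;
  }
  return true;
}
```

Other fields of neighbouring monitors can still share a line when the array
is not cache line aligned; the sketch only covers the field at offset zero.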

Thanks for your time and best regards,
Martin


-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Donnerstag, 6. Oktober 2016 15:34
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Hi Martin,

Thanks for bearing with me here, these optimizations are not really my 
forte.

On 6/10/2016 9:05 PM, Doerr, Martin wrote:
> Hi David,
>
> there are many Monitor instances behind each other so I think the idea of padding (which was not mine) was not bad in general.
> The ideal situation would be to have them cache line aligned and sizeof(Monitor) equals the cache line size (or a multiple). This would completely prevent cache line sharing.

So this is for all the mutexes/monitors created in mutex_init() which 
are assumed to be laid out in a linear fashion. Ok.

Has anyone actually done any metrics on this or is it all theoretical? 
ie are any adjacent, or otherwise cache-line-aligned, monitors actually 
contended at the same time? Padding to avoid false-sharing always seems 
a very local optimization to me - more obvious with hot fields in the 
same object than with distinct fields in distinct objects.

> Even without having the cache line alignment, the padding does help:
> Please note that the padding is inserted at the end. The critical fields are at the beginning.
> Especially _LockWord of 2 Monitors will never be on the same cache line when sizeof(Monitor) equals the cache line size (or a multiple).

Seems to me the existing code, as it doesn't take into account the size 
of the rest of the Monitor, isn't really addressing this correctly at 
all - even on platforms with a 64-byte cache line.

> Padding = DEFAULT_CACHE_LINE_SIZE could prevent more sharing in case of bad alignment, but I didn't want to waste more space. I'd rather prefer the alignment solution.

The other option is an operator new that only allocates on the desired 
alignment - as we do in some other places. That also avoids wasted space 
with Monitors embedded in other objects - not that I think we have that 
many of them.

Thanks,
David

> Best regards,
> Martin
>
> -----Original Message-----
> From: David Holmes [mailto:david.holmes at oracle.com]
> Sent: Donnerstag, 6. Oktober 2016 12:21
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 6/10/2016 7:09 PM, Doerr, Martin wrote:
>> Hi David,
>>
>> thanks for taking a look at my proposal.
>>
>> Maybe "unnecessary cache line sharing of contended memory" would be more comprehensive.
>>
>> The purpose of the padding is to avoid the following situation:
>> 2 Monitor instances are located behind each other and some fields end up on the same cache line.
>> Some threads running on some processors compete for the first Monitor while some other threads running on some processors compete for the second one.
>> The cache line needs to get transferred between all involved processors.
>>
>> If we add enough padding, the fields which are accessed by many processors end up on different cache lines. This splits the problem into 2 independent problems. The threads competing for the first Monitor don't interfere with those ones competing for the second one any more.
>
> But that only helps for the case where the two monitors are exactly the
> wrong distance apart. Two other monitors that previously did not share
> cache lines may now do so if you make the monitors bigger.
>
> This seems completely ad-hoc. ??
>
> David
> -----
>
>
>
>> The existing padding implementation is not optimal. It's a little too small on some platforms. On other platforms, it is not wrong to pad more than necessary, but ideally, one would pad to make the Monitor size equal to the cache line size. I have kept the minimum of 64 because _name is not only used for padding and I guess people don't want it too short.
>>
>> x86_64 and SPARC have DEFAULT_CACHE_LINE_SIZE=128 in typical configurations. That's like PPC64 so the change also improves the padding on these platforms as well.
>> (On x86_64 we get the same result as on PPC64: The length of _name gets extended from 64 to 72 in product build). The padding increase only gets huge on S390.
>>
>> Best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: David Holmes [mailto:david.holmes at oracle.com]
>> Sent: Donnerstag, 6. Oktober 2016 04:31
>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>>
>> On 1/10/2016 1:48 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>>
>>> My proposed change is here:
>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>
>> Not sure I understand the existing padding code. What false sharing are
>> we trying to avoid?
>>
>> And if the existing code assumes a cache line size of 64 and declares
>> _name to be 64 chars, then why can't the new code declare name to be
>> DEFAULT_CACHE_LINE_SIZE chars? This suggests the existing padding code
>> is wrong (not just hard-wired).
>>
>> Which platforms will this cause an actual change in Monitor size other
>> than S390?
>>
>> Thanks,
>> David
>>
>>> Please review. I will also need a sponsor.
>>>
>>> Thanks and best regards,
>>> Martin
>>>

From daniel.daugherty at oracle.com  Thu Oct  6 21:13:01 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 6 Oct 2016 15:13:01 -0600
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
Message-ID: <c0660659-1374-0040-7142-8af890f49d9a@oracle.com>

On 9/30/16 9:48 AM, Doerr, Martin wrote:
> Hi,
>
> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>
> My proposed change is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/

src/share/vm/runtime/mutex.hpp
     Please update the copyright year before pushing.

     L172:   // The default length of monitor name is chosen to avoid false sharing.
     L173:   enum {
     L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
     L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
     L176:   };
     L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex

         I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
         can vary between platforms; I like that it is a minimum of 64 bytes
         and is still a constant.

         I'm also not happy that the resulting sizeof(Monitor) may not be a
         multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
         that unhappiness with the fact that sizeof(Monitor) hasn't been a
         multiple of the cache line size since at least 2008 and no one
         complained (that I know of).

         So if I was making this change, I would make MONITOR_NAME_LEN 64
         bytes (like it was) and add a pad field that would bring up
         sizeof(Monitor) to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of
         course, Claes would be unhappy with me and anyone embedding a
         Monitor into another data structure would be unhappy with me, but
         I'm used to that :-)

         So what you have is fine, especially for JDK9.
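
For illustration only, the pad-field arithmetic described above could look
like the sketch below. The helper names are hypothetical, and the 56 bytes
assumed for the non-name fields are merely what the "64 to 72" figure quoted
in this thread implies for a 128-byte line (128 - 72); the real value depends
on platform and build type.

```cpp
#include <cstddef>

// Bytes needed to round `size` up to the next multiple of `cache_line`.
// Yields 0 when `size` is already a multiple.
constexpr std::size_t pad_bytes(std::size_t size, std::size_t cache_line) {
  return (cache_line - size % cache_line) % cache_line;
}

constexpr std::size_t kNameLen     = 64;  // constant name length, as before
constexpr std::size_t kOtherFields = 56;  // assumption, see lead-in

// Total size of a monitor with a fixed-length name plus a trailing pad.
constexpr std::size_t padded_monitor_size(std::size_t cache_line) {
  return kOtherFields + kNameLen +
         pad_bytes(kOtherFields + kNameLen, cache_line);
}

static_assert(padded_monitor_size(64) == 128, "two 64-byte lines");
static_assert(padded_monitor_size(128) == 128, "one 128-byte line");
static_assert(padded_monitor_size(256) == 256, "one 256-byte line");
```

With these assumed numbers the pad is 8 bytes for 64- and 128-byte lines and
136 bytes for a 256-byte line, while the name length stays the same on every
platform.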

     L180:  public:
     L181: #ifndef PRODUCT
     L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
     L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
     L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
     L185: #endif
     L186:
     L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
     L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
     L189:   void check_block_state       (Thread* thread)

         These were all "protected" before. Now they are "public".
         Any particular reason?

Thumbs up on the mechanics of this change. I'm interested in the
answer to the "protected" versus "public" question, but don't
consider that query to be a blocker.


The rest of this isn't code review, but some of this caught
my attention.

src/share/vm/runtime/mutex.hpp

     old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
     old L85: static const int MONITOR_NAME_LEN = 64;

I had to look up the history of this comment:

$ hg log -r 55 src/share/vm/runtime/mutex.hpp
changeset:   55:2a8eb116ebbe
user:        xlu
date:        Tue Feb 05 23:21:57 2008 -0800
summary:     6610420: Debug VM crashes during monitor lock rank checking

$ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
--- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
+++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
@@ -82,6 +82,9 @@ class ParkEvent ;
  // *in that order*.  If their implementations change such that these
  // assumptions are violated, a whole lot of code will break.

+// The default length of monitor name is choosen to be 64 to avoid false sharing.
+static const int MONITOR_NAME_LEN = 64;
+
  class Monitor : public CHeapObj {

   public:
@@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
    volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
    ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
    volatile bool     _snuck;              // Used for sneaky locking (evil).
-  const char * _name;                    // Name of mutex
    int NotifyCount ;                      // diagnostic assist
-  double pad [8] ;                       // avoid false sharing
+  char _name[MONITOR_NAME_LEN];          // Name of mutex

    // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
  #ifndef PRODUCT
@@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
     int  ILocked () ;

   protected:
-   static void ClearMonitor (Monitor * m) ;
+   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
     Monitor() ;

So the original code had an 8-double pad for avoiding false sharing.
Sounds very much like the old ObjectMonitor padding. I'm sure at the
time that Dice determined that 8-double value, the result was to pad
the size of Monitor to an even multiple of a particular cache line
size.

Xiaobin changed the 'name' field to be an array so that the name
chars could serve double duty as the cache line pad... pun intended.
Unfortunately that pad doesn't make sure that the resulting Monitor
size is a multiple of the cache line size.

Dan


>
> Please review. I will also need a sponsor.
>
> Thanks and best regards,
> Martin
>


From claes.redestad at oracle.com  Thu Oct  6 21:51:36 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Thu, 6 Oct 2016 23:51:36 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
Message-ID: <57F6C768.7060605@oracle.com>

Hi Dan,

yes, I'm slightly unhappy with this change... :-)

... and would rather see a reuse of PaddedEnd<[Monitor|Mutex]> from
share/vm/memory/padded.hpp in the places where padding makes sense,
such as the globally allocated lists, rather than perpetuating the wart
of dual-purposing the name field for padding.

This is sort of like what you're already suggesting, except that
PaddedEnd uses template magic to actually add nothing if we're already
cache aligned, as well as allowing us to not add any footprint overhead
to existing uses where Monitors and Mutexes are already embedded (and
there are a number of existing uses in key places in both GC and
compiler code, see, e.g., CompileTask).
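
As a rough illustration of the "adds nothing if already aligned" part, here
is a simplified sketch of the technique. It is not the code from padded.hpp;
PadSize, PaddedEndSketch, SmallLock and LineSized are hypothetical names made
up for this example. The pad size is computed at compile time, and a partial
specialization for a pad of zero leaves the wrapped type untouched.

```cpp
#include <cstddef>

// Bytes needed to round sizeof(T) up to a multiple of Alignment.
template <typename T, std::size_t Alignment>
struct PadSize {
  static const std::size_t value =
      (Alignment - sizeof(T) % Alignment) % Alignment;
};

// General case: append Pad bytes after the members of T.
template <typename T, std::size_t Alignment,
          std::size_t Pad = PadSize<T, Alignment>::value>
class PaddedEndSketch : public T {
  char _pad[Pad];
};

// Pad == 0: T already fills whole cache lines, so add no member at all.
// This also avoids declaring a zero-length array.
template <typename T, std::size_t Alignment>
class PaddedEndSketch<T, Alignment, 0> : public T {};

// Hypothetical payload types for the example.
struct SmallLock { volatile long word; void* owner; };
struct LineSized { char bytes[64]; };

static_assert(sizeof(PaddedEndSketch<SmallLock, 64>) == 64,
              "small type is padded up to one line");
static_assert(sizeof(PaddedEndSketch<LineSized, 64>) == sizeof(LineSized),
              "line-sized type gains no footprint");
```

Code that embeds a plain SmallLock in another object keeps the small
footprint; only arrays or lists declared with the padded wrapper pay for
the extra bytes.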

Thanks!

/Claes

On 2016-10-06 23:13, Daniel D. Daugherty wrote:
> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>> Hi,
>>
>> the current implementation of Monitor padding (mutex.cpp) assumes that
>> cache lines are 64 Bytes. There's a platform dependent define
>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>> padding is to avoid false sharing.
>>
>> My proposed change is here:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>
> src/share/vm/runtime/mutex.hpp
>      Please update the copyright year before pushing.
>
>      L172:   // The default length of monitor name is chosen to avoid false sharing.
>      L173:   enum {
>      L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>      L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>      L176:   };
>      L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>
>          I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>          can vary between platforms; I like that it is a minimum of 64 bytes
>          and is still a constant.
>
>          I'm also not happy that the resulting sizeof(Monitor) may not be a
>          multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>          that unhappiness with the fact that sizeof(Monitor) hasn't been a
>          multiple of the cache line size since at least 2008 and no one
>          complained (that I know of).
>
>          So if I was making this change, I would make MONITOR_NAME_LEN 64
>          bytes (like it was) and add a pad field that would bring up
>          sizeof(Monitor) to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of
>          course, Claes would be unhappy with me and anyone embedding a
>          Monitor into another data structure would be unhappy with me, but
>          I'm used to that :-)
>
>          So what you have is fine, especially for JDK9.
>
>      L180:  public:
>      L181: #ifndef PRODUCT
>      L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>      L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>      L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>      L185: #endif
>      L186:
>      L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>      L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>      L189:   void check_block_state       (Thread* thread)
>
>          These were all "protected" before. Now they are "public".
>          Any particular reason?
>
> Thumbs up on the mechanics of this change. I'm interested in the
> answer to the "protected" versus "public" question, but don't
> consider that query to be a blocker.
>
>
> The rest of this isn't code review, but some of this caught
> my attention.
>
> src/share/vm/runtime/mutex.hpp
>
>      old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>      old L85: static const int MONITOR_NAME_LEN = 64;
>
> I had to look up the history of this comment:
>
> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
> changeset:   55:2a8eb116ebbe
> user:        xlu
> date:        Tue Feb 05 23:21:57 2008 -0800
> summary:     6610420: Debug VM crashes during monitor lock rank checking
>
> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
> @@ -82,6 +82,9 @@ class ParkEvent ;
>   // *in that order*.  If their implementations change such that these
>   // assumptions are violated, a whole lot of code will break.
>
> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
> +static const int MONITOR_NAME_LEN = 64;
> +
>   class Monitor : public CHeapObj {
>
>    public:
> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>     volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>     ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>     volatile bool     _snuck;              // Used for sneaky locking (evil).
> -  const char * _name;                    // Name of mutex
>     int NotifyCount ;                      // diagnostic assist
> -  double pad [8] ;                       // avoid false sharing
> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>
>     // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>   #ifndef PRODUCT
> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>      int  ILocked () ;
>
>    protected:
> -   static void ClearMonitor (Monitor * m) ;
> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>      Monitor() ;
>
> So the original code had an 8-double pad for avoiding false sharing.
> Sounds very much like the old ObjectMonitor padding. I'm sure at the
> time that Dice determined that 8-double value, the result was to pad
> the size of Monitor to an even multiple of a particular cache line
> size.
>
> Xiaobin changed the 'name' field to be an array so that the name
> chars could serve double duty as the cache line pad... pun intended.
> Unfortunately that pad doesn't make sure that the resulting Monitor
> size is a multiple of the cache line size.
>
> Dan
>
>
>>
>> Please review. I will also need a sponsor.
>>
>> Thanks and best regards,
>> Martin
>>
>

From kim.barrett at oracle.com  Thu Oct  6 22:16:28 2016
From: kim.barrett at oracle.com (Kim Barrett)
Date: Thu, 6 Oct 2016 18:16:28 -0400
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
	<f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>
Message-ID: <D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>

> On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
>> Dear David,
>> 
>> Thank you for your comments.
>> 
>> I had thought it might be better for copy_to_survivor_space not to
>> return the forwardee if the CAS failed, in order to prevent callers from
>> reading fields in the forwardee. But as you pointed out, this extends the
>> fix beyond this topic.
>>
>> I removed two NULL assignments from the previous webrev.
>> http://cr.openjdk.java.net/~horii/8154736/webrev.03/
> 
> Which simply takes us back to where we were. It may not be safe for the caller of those methods to access the fields of the returned "forwardee".
> 
> Sorry but I'm not seeing anything here that justifies removing the barriers from the cas in this code. GC lurkers feel free to jump in here - this is your code after all! ;-)
> 
> David
> -----

Using a CAS with memory_order_relaxed in copy_to_survivor_space seems
to me to be extremely fragile and hard to reason about.  The places
where that copied object might escape to and be examined seem to be
myriad.  And not only do we need to worry about them today, but also
for future maintenance.  Even if it can be modified and shown to be
correct today, it would be very easy to introduce a bug later, as
should be obvious from the various issues pointed out so far during
this review.

The key issue here is that we copy obj into new_obj, and then make
new_obj accessible to other threads via the CAS.  Those other threads
might attempt to access data in new_obj.  This suggests the CAS ought
to have at least a release fence to ensure the copy is complete before
the CAS is performed.  No amount of fencing on the read side (such as
in the work stealing) can remove that need.

And that might be all that is needed.  On the post-CAS side, we load
the forwardee and then load values from it.  I think we can use
implicit consume with dependent loads (except on Alpha) plus the
suggested release fence to get the desired effect.  (If not, use an
acquire form of forwardee()?)

I'm not certain that just a release fence is sufficient (I'm less
familiar with ParallelGC than I'd like for looking at something like
this), but I'm pretty sure I wouldn't want to go any weaker than that.
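
A minimal sketch of the publication pattern described above, written
with C++11 std::atomic rather than HotSpot's own Atomic::cmpxchg. The
names Obj, payload, forwardee and forward_with_release are hypothetical
stand-ins, not the ParallelGC code. The copy is finished first, a
release CAS then publishes new_obj, and a thread that loses the race
re-reads the forwardee with acquire (the stronger of the two read-side
options mentioned above) before it touches the winner's copy.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for an object with a forwarding pointer.
struct Obj {
  int payload = 0;
  std::atomic<Obj*> forwardee{nullptr};
};

// Copy obj into new_obj, then try to install new_obj as the forwardee.
// Returns whichever copy ended up installed.
inline Obj* forward_with_release(Obj& obj, Obj& new_obj) {
  new_obj.payload = obj.payload;  // the "copy" step

  Obj* expected = nullptr;
  // Release on success: the copy above cannot be reordered after the
  // store that makes new_obj visible to other threads.
  if (obj.forwardee.compare_exchange_strong(expected, &new_obj,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
    return &new_obj;  // this thread won the race
  }

  // Another thread won. Pair with its release before reading its copy.
  return obj.forwardee.load(std::memory_order_acquire);
}
```

Whether a dependent (consume-like) load would be enough on the losing
side is exactly the open question in the message; the acquire load here
is the conservative choice for the sketch.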


From daniel.daugherty at oracle.com  Thu Oct  6 23:02:53 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 6 Oct 2016 17:02:53 -0600
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <57F6C768.7060605@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<57F6C768.7060605@oracle.com>
Message-ID: <0f3ed330-11cb-1a9d-70e3-45515309032e@oracle.com>

I was going to bring up PaddedEnd, but decided not to since the example
closest to my fingertips is what we did with ObjectMonitor and PaddedEnd...
I didn't think you liked that one either... maybe I'm just confused... :-)

Dan


On 10/6/16 3:51 PM, Claes Redestad wrote:
> Hi Dan,
>
> yes, I'm slightly unhappy with this change... :-)
>
> ... and would rather see a reuse of PaddedEnd<[Monitor|Mutex]> from
> share/vm/memory/padded.hpp in the places where padding makes sense,
> such as the globally allocated lists, rather than perpetuating the wart
> of dual-purposing the name field for padding.
>
> This is sort of like what you're already suggesting, except that
> PaddedEnd uses template magic to actually add nothing if we're already
> cache aligned, as well as allowing us to not add any footprint overhead
> to existing uses where Monitors and Mutexes are already embedded (and
> there are a number of existing uses in key places in both GC and
> compiler code, see, e.g., CompileTask).
>
> Thanks!
>
> /Claes
>
> On 2016-10-06 23:13, Daniel D. Daugherty wrote:
>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> the current implementation of Monitor padding (mutex.cpp) assumes that
>>> cache lines are 64 Bytes. There's a platform dependent define
>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>>> padding is to avoid false sharing.
>>>
>>> My proposed change is here:
>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>
>> src/share/vm/runtime/mutex.hpp
>>      Please update the copyright year before pushing.
>>
>>      L172:   // The default length of monitor name is chosen to avoid false sharing.
>>      L173:   enum {
>>      L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>>      L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>>      L176:   };
>>      L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>
>>          I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>          can vary between platforms; I like that it is a minimum of 64 bytes
>>          and is still a constant.
>>
>>          I'm also not happy that the resulting sizeof(Monitor) may not be a
>>          multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>>          that unhappiness with the fact that sizeof(Monitor) hasn't been a
>>          multiple of the cache line size since at least 2008 and no one
>>          complained (that I know of).
>>
>>          So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
>>          (like it was) and add a pad field that would bring up sizeof(Monitor)
>>          to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
>>          unhappy with me and anyone embedding a Monitor into another data
>>          structure would be unhappy with me, but I'm used to that :-)
>>
>>          So what you have is fine, especially for JDK9.
>>
>>      L180:  public:
>>      L181: #ifndef PRODUCT
>>      L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>>      L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>>      L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>>      L185: #endif
>>      L186:
>>      L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>>      L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>>      L189:   void check_block_state       (Thread* thread)
>>
>>          These were all "protected" before. Now they are "public".
>>          Any particular reason?
>>
>> Thumbs up on the mechanics of this change. I'm interested in the
>> answer to the "protected" versus "public" question, but don't
>> consider that query to be a blocker.
>>
>>
>> The rest of this isn't code review, but some of this caught
>> my attention.
>>
>> src/share/vm/runtime/mutex.hpp
>>
>>      old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>>      old L85: static const int MONITOR_NAME_LEN = 64;
>>
>> I had to look up the history of this comment:
>>
>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>> changeset:   55:2a8eb116ebbe
>> user:        xlu
>> date:        Tue Feb 05 23:21:57 2008 -0800
>> summary:     6610420: Debug VM crashes during monitor lock rank checking
>>
>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>   // *in that order*.  If their implementations change such that these
>>   // assumptions are violated, a whole lot of code will break.
>>
>> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
>> +static const int MONITOR_NAME_LEN = 64;
>> +
>>   class Monitor : public CHeapObj {
>>
>>    public:
>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>     volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>     ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>     volatile bool     _snuck;              // Used for sneaky locking (evil).
>> -  const char * _name;                    // Name of mutex
>>     int NotifyCount ;                      // diagnostic assist
>> -  double pad [8] ;                       // avoid false sharing
>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>
>>     // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>>   #ifndef PRODUCT
>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>      int  ILocked () ;
>>
>>    protected:
>> -   static void ClearMonitor (Monitor * m) ;
>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>      Monitor() ;
>>
>> So the original code had an 8-double pad for avoiding false sharing.
>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>> time that Dice determined that 8-double value, the result was to pad
>> the size of Monitor to an even multiple of a particular cache line
>> size.
>>
>> Xiobin changed the 'name' field to be an array so that the name
>> chars could serve double duty as the cache line pad... pun intended.
>> Unfortunately that pad doesn't make sure that the resulting Monitor
>> size is a multiple of the cache line size.
>>
>> Dan
>>
>>
>>>
>>> Please review. I will also need a sponsor.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>
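
A minimal sketch of the PaddedEnd idea referred to above: pad a type up
to the next cache-line multiple, and add nothing when it is already a
multiple. This is not the code in share/vm/memory/padded.hpp; the names
PaddedEndSketch, PaddedEndImplSketch, pad_bytes, kCacheLineSketch and
the two lock structs are hypothetical, and the 64-byte line size is an
assumption standing in for DEFAULT_CACHE_LINE_SIZE.

```cpp
#include <cassert>
#include <cstddef>

// Assumed cache line size for the sketch only.
constexpr std::size_t kCacheLineSketch = 64;

// Bytes needed to round size up to a multiple of alignment (0 if aligned).
constexpr std::size_t pad_bytes(std::size_t size, std::size_t alignment) {
  return (alignment - size % alignment) % alignment;
}

// General case: append Pad bytes after T.
template <class T, std::size_t Pad>
struct PaddedEndImplSketch : public T {
  char _pad[Pad];
};

// Already aligned: the specialization adds no members at all.
template <class T>
struct PaddedEndImplSketch<T, 0> : public T {};

template <class T, std::size_t Alignment = kCacheLineSketch>
struct PaddedEndSketch
    : public PaddedEndImplSketch<T, pad_bytes(sizeof(T), Alignment)> {};

// Hypothetical payload types of different sizes.
struct SmallLockSketch { char state[40]; };
struct LineSizedLockSketch { char state[64]; };

static_assert(sizeof(PaddedEndSketch<SmallLockSketch>) == 64,
              "40 bytes are padded up to one line");
static_assert(sizeof(PaddedEndSketch<LineSizedLockSketch>) == 64,
              "an already aligned type gains no footprint");
```

With this shape the padding decision moves to the use site: a globally
allocated list can hold PaddedEndSketch<T> elements, while a T embedded
in another structure keeps its original size.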


From HORII at jp.ibm.com  Fri Oct  7 02:50:51 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Fri, 7 Oct 2016 11:50:51 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
Message-ID: <OFA46B5429.431E2E63-ON49258045.000A886B-49258045.000FA3FB@notes.na.collabserv.com>

Dear Kim, David, and all,

Thank you for your comments. 

I created a new webrev. I added memory_order_release as a new enumerator of
cmpxchg_memory_order (atomic.hpp) and used it to update forwardees.

http://cr.openjdk.java.net/~horii/8154736/webrev.04/

Originally, two sync instructions were issued on ppc, one before and one
after the cmpxchg. With this change, one of them is removed. Although one
sync still remains, performance should improve.

Could you give your comments on this new webrev?

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


Kim Barrett <kim.barrett at oracle.com> wrote on 10/07/2016 07:16:28:

> From: Kim Barrett <kim.barrett at oracle.com>
> To: David Holmes <david.holmes at oracle.com>
> Cc: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-compiler-dev
> <hotspot-compiler-dev at openjdk.java.net>, Tim Ellison
> <Tim_Ellison at uk.ibm.com>, "ppc-aix-port-dev at openjdk.java.net"
> <ppc-aix-port-dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP,
> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
> "hotspot-runtime-dev at openjdk.java.net"
> <hotspot-runtime-dev at openjdk.java.net>
> Date: 10/07/2016 07:17
> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
> copy_to_survivor for ppc64
>
> > On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com> wrote:
> >
> > On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
> >> Dear David,
> >>
> >> Thank you for your comments.
> >>
> >> I had thought it might be better for copy_to_survivor_space not to
> >> return the forwardee if the CAS failed, in order to prevent reading
> >> fields in the forwardee. But as you pointed out, this would extend
> >> the scope of the fix for this topic.
> >>
> >> I removed two NULL assignments from the previous webrev.
> >> http://cr.openjdk.java.net/~horii/8154736/webrev.03/
> >
> > Which simply takes us back to where we were. It may not be safe for
> > the caller of those methods to access the fields of the returned
> > "forwardee".
> >
> > Sorry but I'm not seeing anything here that justifies removing the
> > barriers from the cas in this code. GC lurkers feel free to jump in
> > here - this is your code after all! ;-)
> >
> > David
> > -----
>
> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems
> to me to be extremely fragile and hard to reason about.  The places
> where that copied object might escape to and be examined seem to be
> myriad.  And not only do we need to worry about them today, but also
> for future maintenance.  Even if it can be modified and shown to be
> correct today, it would be very easy to introduce a bug later, as
> should be obvious from the various issues pointed out so far during
> this review.
>
> The key issue here is that we copy obj into new_obj, and then make
> new_obj accessible to other threads via the CAS.  Those other threads
> might attempt to access data in new_obj.  This suggests the CAS ought
> to have at least a release fence to ensure the copy is complete before
> the CAS is performed.  No amount of fencing on the read side (such as
> in the work stealing) can remove that need.
>
> And that might be all that is needed.  On the post-CAS side, we load
> the forwardee and then load values from it.  I think we can use
> implicit consume with dependent loads (except on Alpha) plus the
> suggested release fence to get the desired effect.  (If not, use an
> acquire form of forwardee()?)
>
> I'm not certain that just a release fence is sufficient (I'm less
> familiar with ParallelGC than I'd like for looking at something like
> this), but I'm pretty sure I wouldn't want to go any weaker than that.
> 
> 


From david.holmes at oracle.com  Fri Oct  7 03:23:03 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 7 Oct 2016 13:23:03 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OFA46B5429.431E2E63-ON49258045.000A886B-49258045.000FA3FB@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
	<OFA46B5429.431E2E63-ON49258045.000A886B-49258045.000FA3FB@notes.na.collabserv.com>
Message-ID: <1ca79f91-4096-f404-349e-0906ce976748@oracle.com>

On 7/10/2016 12:50 PM, Hiroshi H Horii wrote:
> Dear Kim, David, and all,
>
> Thank you for your comments.
>
> I created a new webrev. I added memory_order_release as a new enumerator of
> cmpxchg_memory_order (atomic.hpp) and used it to update forwardees.
>
> http://cr.openjdk.java.net/~horii/8154736/webrev.04/

I think you intended to modify cmpxchg_pre_membar not 
cmpxchg_post_membar! Release semantics require the "post" fence. Though 
technically release semantics would put the barrier before the store, 
not after. But with no pre-fence you could in theory have a store before 
the cas move inside the cas implementation (on ppc/arm) and get 
reordered with the store performed by the cas.

src/share/vm/gc/parallel/psPromotionManager.cpp still uses 
memory_order_relaxed.

That aside this seems too reactive to me. Kim may be right that release 
semantics are sufficient for this code, but that is a claim that needs 
some consideration and validation before we just run with it and make 
the change. The approach to changes like this needs a lot more 
discipline and methodology in my opinion.

David
-----
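
A minimal sketch of the fence placement under discussion, using C++11
fences around a relaxed CAS. It follows the "barrier before the store"
reading of release semantics mentioned above. Everything here is
hypothetical: the enum and the functions pre_membar_sketch,
post_membar_sketch and cmpxchg_sketch only mirror the shape of
cmpxchg_pre_membar and cmpxchg_post_membar, and are not the ppc code in
the webrev.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical ordering levels for the sketch.
enum cmpxchg_order_sketch { order_relaxed, order_release, order_conservative };

// Fence issued before the CAS.
inline void pre_membar_sketch(cmpxchg_order_sketch order) {
  if (order == order_conservative) {
    std::atomic_thread_fence(std::memory_order_seq_cst);  // full barrier
  } else if (order == order_release) {
    // Earlier stores may not move below the store done by the CAS.
    std::atomic_thread_fence(std::memory_order_release);
  }
}

// Fence issued after the CAS; only the conservative level needs one here.
inline void post_membar_sketch(cmpxchg_order_sketch order) {
  if (order == order_conservative) {
    std::atomic_thread_fence(std::memory_order_seq_cst);
  }
}

// Returns the value observed in dest, which equals compare_value on success.
inline int cmpxchg_sketch(int exchange_value, std::atomic<int>& dest,
                          int compare_value, cmpxchg_order_sketch order) {
  pre_membar_sketch(order);
  int observed = compare_value;
  dest.compare_exchange_strong(observed, exchange_value,
                               std::memory_order_relaxed);
  post_membar_sketch(order);
  return observed;
}
```

In this arrangement the release level keeps the leading fence and drops
the trailing one, so one of the two barriers goes away without letting
an earlier store drift past the store performed by the CAS.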

> Originally, two sync instructions were issued on ppc, one before and one
> after the cmpxchg. With this change, one of them is removed. Although one
> sync still remains, performance should improve.
>
> Could you give your comments on this new webrev?
>
> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>
>
> Kim Barrett <kim.barrett at oracle.com> wrote on 10/07/2016 07:16:28:
>
>> From: Kim Barrett <kim.barrett at oracle.com>
>> To: David Holmes <david.holmes at oracle.com>
>> Cc: Hiroshi H Horii/Japan/IBM at IBMJP, hotspot-compiler-dev
>> <hotspot-compiler-dev at openjdk.java.net>, Tim Ellison
>> <Tim_Ellison at uk.ibm.com>, "ppc-aix-port-dev at openjdk.java.net"
>> <ppc-aix-port-dev at openjdk.java.net>, Michihiro Horie/Japan/IBM at IBMJP,
>> "hotspot-gc-dev at openjdk.java.net" <hotspot-gc-dev at openjdk.java.net>,
>> "hotspot-runtime-dev at openjdk.java.net"
>> <hotspot-runtime-dev at openjdk.java.net>
>> Date: 10/07/2016 07:17
>> Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and
>> copy_to_survivor for ppc64
>>
>> > On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com> wrote:
>> >
>> > On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
>> >> Dear David,
>> >>
>> >> Thank you for your comments.
>> >>
>> >> I had thought it might be better for copy_to_survivor_space not to
>> >> return the forwardee if the CAS failed, in order to prevent reading
>> >> fields in the forwardee. But as you pointed out, this would extend
>> >> the scope of the fix for this topic.
>> >>
>> >> I removed two NULL assignments from the previous webrev.
>> >> http://cr.openjdk.java.net/~horii/8154736/webrev.03/
>> >
>> > Which simply takes us back to where we were. It may not be safe
>> > for the caller of those methods to access the fields of the returned
>> > "forwardee".
>> >
>> > Sorry but I'm not seeing anything here that justifies removing the
>> > barriers from the cas in this code. GC lurkers feel free to jump in
>> > here - this is your code after all! ;-)
>> >
>> > David
>> > -----
>>
>> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems
>> to me to be extremely fragile and hard to reason about.  The places
>> where that copied object might escape to and be examined seem to be
>> myriad.  And not only do we need to worry about them today, but also
>> for future maintenance.  Even if it can be modified and shown to be
>> correct today, it would be very easy to introduce a bug later, as
>> should be obvious from the various issues pointed out so far during
>> this review.
>>
>> The key issue here is that we copy obj into new_obj, and then make
>> new_obj accessible to other threads via the CAS.  Those other threads
>> might attempt to access data in new_obj.  This suggests the CAS ought
>> to have at least a release fence to ensure the copy is complete before
>> the CAS is performed.  No amount of fencing on the read side (such as
>> in the work stealing) can remove that need.
>>
>> And that might be all that is needed.  On the post-CAS side, we load
>> the forwardee and then load values from it.  I think we can use
>> implicit consume with dependent loads (except on Alpha) plus the
>> suggested release fence to get the desired effect.  (If not, use an
>> acquire form of forwardee()?)
>>
>> I'm not certain that just a release fence is sufficient (I'm less
>> familiar with ParallelGC than I'd like for looking at something like
>> this), but I'm pretty sure I wouldn't want to go any weaker than that.
>>
>>
>

From jiangli.zhou at Oracle.COM  Fri Oct  7 04:39:00 2016
From: jiangli.zhou at Oracle.COM (Jiangli Zhou)
Date: Thu, 6 Oct 2016 21:39:00 -0700
Subject: RFR: 8167333: Invalid source path info might be used when creating
	ClassFileStream after CFLH transforms a shared classes in some cases
Message-ID: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>

Hi,

Please review the following fix for JDK-8167333 <https://bugs.openjdk.java.net/browse/JDK-8167333>:

  webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/

When a shared class is transformed by a JVMTI agent during initial loading (via CFLH), the VM creates a new ClassFileStream using the transformed class data. The source path info from the class's associated SharedClassPathEntry is passed as the "source" argument to ClassFileStream. However, some shared classes may not have an associated SharedClassPathEntry, in which case the class_path_index is -1. The VM needs to detect such cases and avoid passing invalid source path info.

Tested with all existing class data sharing tests.

Thanks,
Jiangli 

From david.holmes at oracle.com  Fri Oct  7 05:33:13 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 7 Oct 2016 15:33:13 +1000
Subject: RFR: 8167333: Invalid source path info might be used when
	creating ClassFileStream after CFLH transforms a shared classes in some
	cases
In-Reply-To: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
Message-ID: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>

Hi Jiangli,

On 7/10/2016 2:39 PM, Jiangli Zhou wrote:
> Hi,
>
> Please review the following fix for JDK-8167333 <https://bugs.openjdk.java.net/browse/JDK-8167333>:
>
>   webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>
> When a shared class is transformed by a JVMTI agent during initial loading (via CFLH), the VM creates a new ClassFileStream using the transformed class data. The source path info from the class's associated SharedClassPathEntry is passed as the "source" argument to ClassFileStream. However, some shared classes may not have an associated SharedClassPathEntry, in which case the class_path_index is -1. The VM needs to detect such cases and avoid passing invalid source path info.

It isn't obvious to me that all callers of CFS::source()/clone_source() 
will handle getting a NULL. Of course I can't tell which of those 
callers may be involved in this particular use-case.

Thanks,
David

> Tested with all existing class data sharing tests.
>
> Thanks,
> Jiangli
>
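
A minimal sketch of the two sides of this exchange: the producer maps
an index of -1 to "no source", and a consumer checks for that before
dereferencing. The types and functions (PathEntrySketch,
source_for_index, describe_source) are hypothetical illustrations, not
the FileMapInfo or ClassFileStream API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a shared class path table entry.
struct PathEntrySketch {
  std::string name;
};

// Returns the entry's path, or nullptr when the class has no associated
// entry (index -1) or the index is otherwise out of range.
inline const char* source_for_index(const std::vector<PathEntrySketch>& table,
                                    int class_path_index) {
  if (class_path_index < 0 ||
      static_cast<std::size_t>(class_path_index) >= table.size()) {
    return nullptr;
  }
  return table[static_cast<std::size_t>(class_path_index)].name.c_str();
}

// A caller that tolerates a missing source instead of dereferencing it.
inline std::string describe_source(const char* source) {
  return source != nullptr ? std::string(source)
                           : std::string("<unknown source>");
}
```

The second function is the part the review comment asks about: every
consumer of the source value needs an equivalent check once a null
source becomes possible.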

From dmitry.samersoff at oracle.com  Fri Oct  7 08:36:57 2016
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Fri, 7 Oct 2016 11:36:57 +0300
Subject: RFR: 8167333: Invalid source path info might be used when
	creating ClassFileStream after CFLH transforms a shared classes in some
	cases
In-Reply-To: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
Message-ID: <09ec6b8e-f071-e12a-bbc8-8c45bab3b9a8@oracle.com>

Jiangli,

I see a couple of places in hotspot where the result of
FileMapInfo::shared_classpath() is de-referenced without an additional
null check.

Could you insert check/assert/comments as appropriate to these places?

-Dmitry

On 2016-10-07 07:39, Jiangli Zhou wrote:
> Hi,
> 
> Please review the following fix for JDK-8167333
> <https://bugs.openjdk.java.net/browse/JDK-8167333>:
> 
> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>
> When a shared class is transformed by a JVMTI agent during initial
> loading (via CFLH), the VM creates a new ClassFileStream using the
> transformed class data. The source path info from the class's
> associated SharedClassPathEntry is passed as the "source" argument to
> ClassFileStream. However, some shared classes may not have an
> associated SharedClassPathEntry, in which case the class_path_index
> is -1. The VM needs to detect such cases and avoid passing invalid
> source path info.
> 
> Tested with all existing class data sharing tests.
> 
> Thanks, Jiangli
> 


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From martin.doerr at sap.com  Fri Oct  7 09:34:10 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 7 Oct 2016 09:34:10 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
Message-ID: <f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>

Hi Dan,

thank you very much for reviewing and for investigating the history.

It was not intended to make the functions you mentioned public. I've fixed that.
I also updated the copyright information.

New webrev is here:
http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/

@Coleen: Please use this one. I have also added reviewer attribution.

Thanks and best regards,
Martin


-----Original Message-----
From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com] 
Sent: Donnerstag, 6. Oktober 2016 23:13
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

On 9/30/16 9:48 AM, Doerr, Martin wrote:
> Hi,
>
> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>
> My proposed change is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/

src/share/vm/runtime/mutex.hpp
     Please update the copyright year before pushing.

     L172:   // The default length of monitor name is chosen to avoid false sharing.
     L173:   enum {
     L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
     L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
     L176:   };
     L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex

         I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
         can vary between platforms; I like that it is a minimum of 64 bytes
         and is still a constant.

         I'm also not happy that the resulting sizeof(Monitor) may not be a
         multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
         that unhappiness with the fact that sizeof(Monitor) hasn't been a
         multiple of the cache line size since at least 2008 and no one
         complained (that I know of).

         So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
         (like it was) and add a pad field that would bring up sizeof(Monitor)
         to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
         unhappy with me and anyone embedding a Monitor into another data
         structure would be unhappy with me, but I'm used to that :-)

         So what you have is fine, especially for JDK9.

     L180:  public:
     L181: #ifndef PRODUCT
     L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
     L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
     L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
     L185: #endif
     L186:
     L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
     L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
     L189:   void check_block_state       (Thread* thread)

         These were all "protected" before. Now they are "public".
         Any particular reason?

Thumbs up on the mechanics of this change. I'm interested in the
answer to the "protected" versus "public" question, but don't
consider that query to be a blocker.


The rest of this isn't code review, but some of this caught
my attention.

src/share/vm/runtime/mutex.hpp

     old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
     old L85: static const int MONITOR_NAME_LEN = 64;

I had to look up the history of this comment:

$ hg log -r 55 src/share/vm/runtime/mutex.hpp
changeset:   55:2a8eb116ebbe
user:        xlu
date:        Tue Feb 05 23:21:57 2008 -0800
summary:     6610420: Debug VM crashes during monitor lock rank checking

$ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
--- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
+++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
@@ -82,6 +82,9 @@ class ParkEvent ;
  // *in that order*.  If their implementations change such that these
  // assumptions are violated, a whole lot of code will break.

+// The default length of monitor name is choosen to be 64 to avoid false sharing.
+static const int MONITOR_NAME_LEN = 64;
+
  class Monitor : public CHeapObj {

   public:
@@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
    volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
    ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
    volatile bool     _snuck;              // Used for sneaky locking (evil).
-  const char * _name;                    // Name of mutex
    int NotifyCount ;                      // diagnostic assist
-  double pad [8] ;                       // avoid false sharing
+  char _name[MONITOR_NAME_LEN];          // Name of mutex

    // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
  #ifndef PRODUCT
@@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
     int  ILocked () ;

   protected:
-   static void ClearMonitor (Monitor * m) ;
+   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
     Monitor() ;

So the original code had an 8-double pad for avoiding false sharing.
Sounds very much like the old ObjectMonitor padding. I'm sure at the
time that Dice determined that 8-double value, the result was to pad
the size of Monitor to an even multiple of a particular cache line
size.

Xiobin changed the 'name' field to be an array so that the name
chars could serve double duty as the cache line pad... pun intended.
Unfortunately that pad doesn't make sure that the resulting Monitor
size is a multiple of the cache line size.

Dan


>
> Please review. I will also need a sponsor.
>
> Thanks and best regards,
> Martin
>
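
A small worked example of the size arithmetic behind the review comment
above, namely that the name-as-padding formula does not always make the
total a cache-line multiple. The 72-byte base size is an assumed figure
for illustration, not the real sizeof(MonitorBase), and the function
names are hypothetical.

```cpp
#include <cassert>

// Mirrors MONITOR_NAME_LEN = padding > 64 ? padding : 64, where
// padding = cache_line - base_size. Signed arithmetic is used so that a
// base larger than the cache line gives a negative padding, not a wrap.
constexpr long name_len_sketch(long cache_line, long base_size) {
  return (cache_line - base_size) > 64 ? (cache_line - base_size) : 64;
}

// Total size of base fields plus the name array, ignoring any extra
// alignment padding a compiler might add.
constexpr long monitor_size_sketch(long cache_line, long base_size) {
  return base_size + name_len_sketch(cache_line, base_size);
}

// 256-byte lines: padding is 184, so the total is exactly one line.
static_assert(monitor_size_sketch(256, 72) == 256, "line multiple");

// 128-byte lines: padding is 56, the 64-byte minimum wins, and the
// total of 136 bytes spills 8 bytes into a second line.
static_assert(monitor_size_sketch(128, 72) == 136, "not a line multiple");
```

Under these assumptions the total is a line multiple only when the
computed padding exceeds the 64-byte minimum; a separate trailing pad
field, as suggested in the review, would cover the remaining cases.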


From claes.redestad at oracle.com  Fri Oct  7 09:59:30 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 7 Oct 2016 11:59:30 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
Message-ID: <57F77202.8070201@oracle.com>

Hi,

after due consideration I strongly consider this change unacceptable
since it adds footprint overhead to performance-critical compiler and
GC code, with little to no data to show that this won't cause regressions.

Changes to Monitor/Mutex need to be done with more surgical precision
than this.

If I do have a veto on the matter, here it is.

Thanks!

/Claes

On 2016-10-07 11:34, Doerr, Martin wrote:
> Hi Dan,
>
> thank you very much for reviewing and for investigating the history.
>
> It was not intended to make the functions you mentioned public. I've fixed that.
> I also updated the copyright information.
>
> New webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>
> @Coleen: Please use this one. I have also added reviewer attribution.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> Sent: Donnerstag, 6. Oktober 2016 23:13
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>> Hi,
>>
>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>
>> My proposed change is here:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>
> src/share/vm/runtime/mutex.hpp
>       Please update the copyright year before pushing.
>
>       L172:   // The default length of monitor name is chosen to avoid
> false sharing.
>       L173:   enum {
>       L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE -
> sizeof(MonitorBase),
>       L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ?
> CACHE_LINE_PADDING : 64
>       L176:   };
>       L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>
>           I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>           can vary between platforms; I like that it is a minimum of 64 bytes
>           and is still a constant.
>
>           I'm also not happy that the resulting sizeof(Monitor) may not
> be a multiple
>           of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
> that unhappiness
>           with the fact that sizeof(Monitor) hasn't been a multiple of
> the cache line
>           size since at least 2008 and no one complained (that I know of).
>
>           So if I was making this change, I would make MONITOR_NAME_LEN
> 64 bytes
>           (like it was) and add a pad field that would bring up
> sizeof(Monitor)
>           to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes
> would be
>           unhappy with me and anyone embedding a Monitor into another data
>           structure would be unhappy with me, but I'm used to that :-)
>
>           So what you have is fine, especially for JDK9.
>
>       L180:  public:
>       L181: #ifndef PRODUCT
>       L182:   debug_only(static bool contains(Monitor * locks, Monitor *
> lock);)
>       L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor *
> locks);)
>       L184:   debug_only(Monitor *
> get_least_ranked_lock_besides_this(Monitor * locks);)
>       L185: #endif
>       L186:
>       L187:   void set_owner_implementation(Thread*
> owner)                        PRODUCT_RETURN;
>       L188:   void check_prelock_state     (Thread*
> thread)                       PRODUCT_RETURN;
>       L189:   void check_block_state       (Thread* thread)
>
>           These were all "protected" before. Now they are "public".
>           Any particular reason?
>
> Thumbs up on the mechanics of this change. I'm interested in the
> answer to the "protected" versus "public" question, but don't
> consider that query to be a blocker.
>
>
> The rest of this isn't code review, but some of this caught
> my attention.
>
> src/share/vm/runtime/mutex.hpp
>
>       old L84: // The default length of monitor name is chosen to be 64
> to avoid false sharing.
>       old L85: static const int MONITOR_NAME_LEN = 64;
>
> I had to look up the history of this comment:
>
> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
> changeset:   55:2a8eb116ebbe
> user:        xlu
> date:        Tue Feb 05 23:21:57 2008 -0800
> summary:     6610420: Debug VM crashes during monitor lock rank checking
>
> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
> @@ -82,6 +82,9 @@ class ParkEvent ;
>    // *in that order*.  If their implementations change such that these
>    // assumptions are violated, a whole lot of code will break.
>
> +// The default length of monitor name is choosen to be 64 to avoid
> false sharing.
> +static const int MONITOR_NAME_LEN = 64;
> +
>    class Monitor : public CHeapObj {
>
>     public:
> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>      volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>      ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>      volatile bool     _snuck;              // Used for sneaky locking
> (evil).
> -  const char * _name;                    // Name of mutex
>      int NotifyCount ;                      // diagnostic assist
> -  double pad [8] ;                       // avoid false sharing
> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>
>      // Debugging fields for naming, deadlock detection, etc. (some only
> used in debug mode)
>    #ifndef PRODUCT
> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>       int  ILocked () ;
>
>     protected:
> -   static void ClearMonitor (Monitor * m) ;
> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>       Monitor() ;
>
> So the original code had an 8-double pad for avoiding false sharing.
> Sounds very much like the old ObjectMonitor padding. I'm sure at the
> time that Dice determined that 8-double value, the result was to pad
> the size of Monitor to an even multiple of a particular cache line
> size.
>
> Xiobin changed the 'name' field to be an array so that the name
> chars could serve double duty as the cache line pad... pun intended.
> Unfortunately that pad doesn't make sure that the resulting Monitor
> size is a multiple of the cache line size.
>
> Dan
>
>
>>
>> Please review. I will also need a sponsor.
>>
>> Thanks and best regards,
>> Martin
>>
>

From martin.doerr at sap.com  Fri Oct  7 10:18:56 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 7 Oct 2016 10:18:56 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <57F77202.8070201@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
Message-ID: <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>

Hi Claes,

what the change basically does is that the _name[] field gets enlarged by 8 bytes on platforms with 128 byte DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed by the C++ compiler.
What exactly is your concern about the footprint overhead?
Are you not concerned about the risk of false sharing?
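
The compile-time computation quoted from the webrev can be sketched as below. This is an illustration only, not the webrev code: the helper names are hypothetical, the underflow guard is an addition, and the 56-byte base size is an assumption chosen so that a 128-byte cache line yields the 8-byte growth mentioned above.

```cpp
#include <cstddef>

// Hypothetical helper: bytes left on the cache line after the base fields.
// The guard against a base larger than the line is an addition of this sketch.
constexpr std::size_t cache_line_padding(std::size_t line, std::size_t base) {
  return line > base ? line - base : 0;
}

// Mirrors the quoted enum: pad the name to the end of the cache line,
// but never let it drop below the historical 64 bytes.
constexpr std::size_t monitor_name_len(std::size_t line, std::size_t base) {
  return cache_line_padding(line, base) > 64 ? cache_line_padding(line, base)
                                             : 64;
}

// Assumed 56-byte base: a 128-byte line gives 72 bytes, 8 more than before.
static_assert(monitor_name_len(128, 56) == 72, "name grows by 8 bytes");
// With 64-byte lines the name keeps its previous length.
static_assert(monitor_name_len(64, 56) == 64, "name stays at 64 bytes");
```

Because everything is a constant expression, the result is fixed per platform at build time and costs nothing at run time.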

Best regards,
Martin

-----Original Message-----
From: Claes Redestad [mailto:claes.redestad at oracle.com] 
Sent: Freitag, 7. Oktober 2016 12:00
To: Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; Coleen Phillimore (coleen.phillimore at oracle.com) <coleen.phillimore at oracle.com>
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Hi,

after due consideration I strongly consider this change unacceptable
since it adds footprint overhead to performance-critical compiler and
GC code with little to no data to show that this won't cause regressions.

Changes to Monitor/Mutex need to be done with more surgical precision
than this.

If I do have a veto on the matter, here it is.

Thanks!

/Claes

On 2016-10-07 11:34, Doerr, Martin wrote:
> Hi Dan,
>
> thank you very much for reviewing and for investigating the history.
>
> It was not intended to make the functions you mentioned public. I've fixed that.
> I also updated the copyright information.
>
> New webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>
> @Coleen: Please use this one. I have also added reviewer attribution.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
> Sent: Donnerstag, 6. Oktober 2016 23:13
> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>> Hi,
>>
>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>
>> My proposed change is here:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>
> src/share/vm/runtime/mutex.hpp
>       Please update the copyright year before pushing.
>
>       L172:   // The default length of monitor name is chosen to avoid
> false sharing.
>       L173:   enum {
>       L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE -
> sizeof(MonitorBase),
>       L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ?
> CACHE_LINE_PADDING : 64
>       L176:   };
>       L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>
>           I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>           can vary between platforms; I like that it is a minimum of 64 bytes
>           and is still a constant.
>
>           I'm also not happy that the resulting sizeof(Monitor) may not
> be a multiple
>           of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
> that unhappiness
>           with the fact that sizeof(Monitor) hasn't been a multiple of
> the cache line
>           size since at least 2008 and no one complained (that I know of).
>
>           So if I was making this change, I would make MONITOR_NAME_LEN
> 64 bytes
>           (like it was) and add a pad field that would bring up
> sizeof(Monitor)
>           to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes
> would be
>           unhappy with me and anyone embedding a Monitor into another data
>           structure would be unhappy with me, but I'm used to that :-)
>
>           So what you have is fine, especially for JDK9.
>
>       L180:  public:
>       L181: #ifndef PRODUCT
>       L182:   debug_only(static bool contains(Monitor * locks, Monitor *
> lock);)
>       L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor *
> locks);)
>       L184:   debug_only(Monitor *
> get_least_ranked_lock_besides_this(Monitor * locks);)
>       L185: #endif
>       L186:
>       L187:   void set_owner_implementation(Thread*
> owner)                        PRODUCT_RETURN;
>       L188:   void check_prelock_state     (Thread*
> thread)                       PRODUCT_RETURN;
>       L189:   void check_block_state       (Thread* thread)
>
>           These were all "protected" before. Now they are "public".
>           Any particular reason?
>
> Thumbs up on the mechanics of this change. I'm interested in the
> answer to the "protected" versus "public" question, but don't
> consider that query to be a blocker.
>
>
> The rest of this isn't code review, but some of this caught
> my attention.
>
> src/share/vm/runtime/mutex.hpp
>
>       old L84: // The default length of monitor name is chosen to be 64
> to avoid false sharing.
>       old L85: static const int MONITOR_NAME_LEN = 64;
>
> I had to look up the history of this comment:
>
> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
> changeset:   55:2a8eb116ebbe
> user:        xlu
> date:        Tue Feb 05 23:21:57 2008 -0800
> summary:     6610420: Debug VM crashes during monitor lock rank checking
>
> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
> @@ -82,6 +82,9 @@ class ParkEvent ;
>    // *in that order*.  If their implementations change such that these
>    // assumptions are violated, a whole lot of code will break.
>
> +// The default length of monitor name is choosen to be 64 to avoid
> false sharing.
> +static const int MONITOR_NAME_LEN = 64;
> +
>    class Monitor : public CHeapObj {
>
>     public:
> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>      volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>      ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>      volatile bool     _snuck;              // Used for sneaky locking
> (evil).
> -  const char * _name;                    // Name of mutex
>      int NotifyCount ;                      // diagnostic assist
> -  double pad [8] ;                       // avoid false sharing
> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>
>      // Debugging fields for naming, deadlock detection, etc. (some only
> used in debug mode)
>    #ifndef PRODUCT
> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>       int  ILocked () ;
>
>     protected:
> -   static void ClearMonitor (Monitor * m) ;
> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>       Monitor() ;
>
> So the original code had an 8-double pad for avoiding false sharing.
> Sounds very much like the old ObjectMonitor padding. I'm sure at the
> time that Dice determined that 8-double value, the result was to pad
> the size of Monitor to an even multiple of a particular cache line
> size.
>
> Xiobin changed the 'name' field to be an array so that the name
> chars could serve double duty as the cache line pad... pun intended.
> Unfortunately that pad doesn't make sure that the resulting Monitor
> size is a multiple of the cache line size.
>
> Dan
>
>
>>
>> Please review. I will also need a sponsor.
>>
>> Thanks and best regards,
>> Martin
>>
>

From claes.redestad at oracle.com  Fri Oct  7 10:34:51 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Fri, 7 Oct 2016 12:34:51 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
Message-ID: <57F77A4B.6060604@oracle.com>

Hi,

I'm concerned that this might be an easy-but-wrong fix to a complex
problem. There are already use cases where the _name field is
counterproductive, and this change adds complexity that makes it even
less likely such uses will be optimized in the future.

There are Padded* types put in place to deal with these concerns
explicitly rather than implicitly *where it matters*, which allows us
the choice of applying padding or not on a per use-case basis (which
means we can also remove the _name field for those use cases that don't
care about either, which might be most outside of the global lists).

I am very concerned about false sharing, but I have no data to support
that this change has any measurable benefit in practice: I even did an
experiment years ago now where I turned _name into a pointer to not pad
at all and saw nothing exceeding noise levels on any benchmark.
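
To illustrate what explicit, per-use-case padding could look like, here is a minimal sketch. It is not HotSpot's actual Padded* implementation; PaddedTo, SmallLock and kCacheLineSize are hypothetical names, and a 64-byte cache line is assumed.

```cpp
#include <cstddef>

// Assumption: 64-byte cache lines. HotSpot would use DEFAULT_CACHE_LINE_SIZE.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical unpadded lock: the name is a pointer, not an embedded array,
// so the type carries no implicit padding of its own.
struct SmallLock {
  volatile long owner;
  const char*   name;
};

// Opt-in wrapper. alignas raises the alignment, and sizeof is always a
// multiple of the alignment, so neighbouring PaddedTo<T> objects in an
// array or struct never share a cache line.
template <typename T, std::size_t Alignment = kCacheLineSize>
struct alignas(Alignment) PaddedTo : T {};

static_assert(sizeof(SmallLock) < kCacheLineSize,
              "the bare lock stays small where padding does not matter");
static_assert(sizeof(PaddedTo<SmallLock>) % kCacheLineSize == 0,
              "the padded lock occupies whole cache lines");
static_assert(alignof(PaddedTo<SmallLock>) == kCacheLineSize,
              "the padded lock starts on a cache line boundary");
```

A hot, contended global lock would be declared as PaddedTo<SmallLock>, while a lock embedded in another data structure could stay a plain SmallLock.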

Thanks!

/Claes

On 2016-10-07 12:18, Doerr, Martin wrote:
> Hi Claes,
>
> what the change basically does is that the _name[] field gets enlarged by 8 bytes on platforms with 128 byte DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed by the C++ compiler.
> What exactly is your concern about the footprint overhead?
> Are you not concerned about the risk of false sharing?
>
> Best regards,
> Martin
>
> -----Original Message-----
> From: Claes Redestad [mailto:claes.redestad at oracle.com]
> Sent: Freitag, 7. Oktober 2016 12:00
> To: Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; Coleen Phillimore (coleen.phillimore at oracle.com) <coleen.phillimore at oracle.com>
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> Hi,
>
> after due consideration I strongly consider this change unacceptable
> since it adds footprint overhead to performance-critical compiler and
> GC code with little to no data to show that this won't cause regressions.
>
> Changes to Monitor/Mutex need to be done with more surgical precision
> than this.
>
> If I do have a veto on the matter, here it is.
>
> Thanks!
>
> /Claes
>
> On 2016-10-07 11:34, Doerr, Martin wrote:
>> Hi Dan,
>>
>> thank you very much for reviewing and for investigating the history.
>>
>> It was not intended to make the functions you mentioned public. I've fixed that.
>> I also updated the copyright information.
>>
>> New webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>
>> @Coleen: Please use this one. I have also added reviewer attribution.
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>> Sent: Donnerstag, 6. Oktober 2016 23:13
>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>>
>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>>
>>> My proposed change is here:
>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>
>> src/share/vm/runtime/mutex.hpp
>>        Please update the copyright year before pushing.
>>
>>        L172:   // The default length of monitor name is chosen to avoid
>> false sharing.
>>        L173:   enum {
>>        L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE -
>> sizeof(MonitorBase),
>>        L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ?
>> CACHE_LINE_PADDING : 64
>>        L176:   };
>>        L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>
>>            I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>            can vary between platforms; I like that it is a minimum of 64 bytes
>>            and is still a constant.
>>
>>            I'm also not happy that the resulting sizeof(Monitor) may not
>> be a multiple
>>            of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>> that unhappiness
>>            with the fact that sizeof(Monitor) hasn't been a multiple of
>> the cache line
>>            size since at least 2008 and no one complained (that I know of).
>>
>>            So if I was making this change, I would make MONITOR_NAME_LEN
>> 64 bytes
>>            (like it was) and add a pad field that would bring up
>> sizeof(Monitor)
>>            to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes
>> would be
>>            unhappy with me and anyone embedding a Monitor into another data
>>            structure would be unhappy with me, but I'm used to that :-)
>>
>>            So what you have is fine, especially for JDK9.
>>
>>        L180:  public:
>>        L181: #ifndef PRODUCT
>>        L182:   debug_only(static bool contains(Monitor * locks, Monitor *
>> lock);)
>>        L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor *
>> locks);)
>>        L184:   debug_only(Monitor *
>> get_least_ranked_lock_besides_this(Monitor * locks);)
>>        L185: #endif
>>        L186:
>>        L187:   void set_owner_implementation(Thread*
>> owner)                        PRODUCT_RETURN;
>>        L188:   void check_prelock_state     (Thread*
>> thread)                       PRODUCT_RETURN;
>>        L189:   void check_block_state       (Thread* thread)
>>
>>            These were all "protected" before. Now they are "public".
>>            Any particular reason?
>>
>> Thumbs up on the mechanics of this change. I'm interested in the
>> answer to the "protected" versus "public" question, but don't
>> consider that query to be a blocker.
>>
>>
>> The rest of this isn't code review, but some of this caught
>> my attention.
>>
>> src/share/vm/runtime/mutex.hpp
>>
>>        old L84: // The default length of monitor name is chosen to be 64
>> to avoid false sharing.
>>        old L85: static const int MONITOR_NAME_LEN = 64;
>>
>> I had to look up the history of this comment:
>>
>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>> changeset:   55:2a8eb116ebbe
>> user:        xlu
>> date:        Tue Feb 05 23:21:57 2008 -0800
>> summary:     6610420: Debug VM crashes during monitor lock rank checking
>>
>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>     // *in that order*.  If their implementations change such that these
>>     // assumptions are violated, a whole lot of code will break.
>>
>> +// The default length of monitor name is choosen to be 64 to avoid
>> false sharing.
>> +static const int MONITOR_NAME_LEN = 64;
>> +
>>     class Monitor : public CHeapObj {
>>
>>      public:
>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>       volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>       ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>       volatile bool     _snuck;              // Used for sneaky locking
>> (evil).
>> -  const char * _name;                    // Name of mutex
>>       int NotifyCount ;                      // diagnostic assist
>> -  double pad [8] ;                       // avoid false sharing
>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>
>>       // Debugging fields for naming, deadlock detection, etc. (some only
>> used in debug mode)
>>     #ifndef PRODUCT
>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>        int  ILocked () ;
>>
>>      protected:
>> -   static void ClearMonitor (Monitor * m) ;
>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>        Monitor() ;
>>
>> So the original code had an 8-double pad for avoiding false sharing.
>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>> time that Dice determined that 8-double value, the result was to pad
>> the size of Monitor to an even multiple of a particular cache line
>> size.
>>
>> Xiobin changed the 'name' field to be an array so that the name
>> chars could serve double duty as the cache line pad... pun intended.
>> Unfortunately that pad doesn't make sure that the resulting Monitor
>> size is a multiple of the cache line size.
>>
>> Dan
>>
>>
>>>
>>> Please review. I will also need a sponsor.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>

From thomas.schatzl at oracle.com  Fri Oct  7 10:37:52 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 07 Oct 2016 12:37:52 +0200
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor
	for ppc64
In-Reply-To: <1ca79f91-4096-f404-349e-0906ce976748@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
	<OFA46B5429.431E2E63-ON49258045.000A886B-49258045.000FA3FB@notes.na.collabserv.com>
	<1ca79f91-4096-f404-349e-0906ce976748@oracle.com>
Message-ID: <1475836672.2622.81.camel@oracle.com>

Hi,

On Fri, 2016-10-07 at 13:23 +1000, David Holmes wrote:
> On 7/10/2016 12:50 PM, Hiroshi H Horii wrote:
> > 
> > Dear Kim, David, and all,
> > 
> > Thank you for your comments.
> > 
> > I created a new webrev. I added memory_order_release as a new enum
> > of
> > cmpxchg_memory_order (atomic.hpp) and use it to update forwardees.
> > 
> > http://cr.openjdk.java.net/~horii/8154736/webrev.04/
> I think you intended to modify cmpxchg_pre_membar, not
> cmpxchg_post_membar! Release semantics require the "post" fence.
> Though technically release semantics would put the barrier before the
> store, not after. But with no pre-fence you could in theory have a
> store before the cas move inside the cas implementation (on ppc/arm)
> and get reordered with the store performed by the cas.
> 
> src/share/vm/gc/parallel/psPromotionManager.cpp still uses
> memory_order_relaxed.
> 
> That aside this seems too reactive to me. Kim may be right that
> release semantics are sufficient for this code, but that is a claim
> that needs some consideration and validation before we just run with
> it and make the change. The approach to changes like this needs a lot
> more discipline and methodology in my opinion.

There are some other small issues with the suggested change:

- the idiom used to print trace log messages

 244       if (log_develop_is_enabled(Trace, gc, scavenge)) {
 245         log_develop_trace(gc, scavenge)("{%s %s " PTR_FORMAT " -> " PTR_FORMAT " (%d)}",

does not require the first line.

log_develop_trace() only generates code when compiled in debug mode anyway, so the check before it is superfluous.

I saw that in several places.

- could you explain what the advantage of

 298   if (!o->is_forwarded()) {
 299     copy_to_survivor_space<promote_immediately>(o);
 300   }
 301   oop new_obj = o->forwardee();

compared to

 281   oop new_obj = o->is_forwarded()
 282         ? o->forwardee()
 283         : copy_to_survivor_space<promote_immediately>(o);

in PSPromotionManager::copy_and_push_safe_barrier() is?

This seems to introduce a superfluous forced reload (forwardee()
accesses a volatile variable), as copy_to_survivor_space already
reloads and returns the forwardee even with all these changes.

I may be overlooking something crucial (and it's Friday), but I do not see a difference in behavior (and problems) compared to old code, just the additional load.

- the new assert at

 302   assert(forwardee != NULL, "forwardee should not be NULL");

seems superfluous. At this point, after the CAS has been executed, we
assume that there must be a forwardee. Either copy_to_survivor_space
returns it, or there has already been a forwardee.
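
The extra reload mentioned in the second point can be made visible with a toy model. This is only a sketch: ToyOop, resolve_old and resolve_new are hypothetical names, the load counter exists purely for illustration, and the real oop and mark word handling is much more involved.

```cpp
#include <atomic>

// Toy stand-in for an oop. All names here are hypothetical.
struct ToyOop {
  std::atomic<ToyOop*> fwd{nullptr};
  int forwardee_loads = 0;  // instrumentation only, counts forwardee() calls

  bool is_forwarded() const {
    return fwd.load(std::memory_order_relaxed) != nullptr;
  }
  ToyOop* forwardee() {
    ++forwardee_loads;
    return fwd.load(std::memory_order_relaxed);
  }
};

// Installs `copy` as the forwardee if none exists yet and returns the
// winning forwardee either way, like the real function is described to do.
inline ToyOop* copy_to_survivor_space(ToyOop& o, ToyOop& copy) {
  ToyOop* expected = nullptr;
  return o.fwd.compare_exchange_strong(expected, &copy) ? &copy : expected;
}

// Old form: reuse the value copy_to_survivor_space already returns.
inline ToyOop* resolve_old(ToyOop& o, ToyOop& copy) {
  return o.is_forwarded() ? o.forwardee() : copy_to_survivor_space(o, copy);
}

// Proposed form: copy if needed, then always reload the forwardee.
inline ToyOop* resolve_new(ToyOop& o, ToyOop& copy) {
  if (!o.is_forwarded()) {
    copy_to_survivor_space(o, copy);
  }
  return o.forwardee();
}
```

For an object that is not yet forwarded, both forms return the same pointer, but only resolve_new performs the additional forwardee() load.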

Thanks,
  Thomas


From thomas.schatzl at oracle.com  Fri Oct  7 10:38:55 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 07 Oct 2016 12:38:55 +0200
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor
	for ppc64
In-Reply-To: <D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
	<f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
Message-ID: <1475836735.2622.82.camel@oracle.com>

Hi,

On Thu, 2016-10-06 at 18:16 -0400, Kim Barrett wrote:
> > 
> > On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com>
> > wrote:
> > 
> > On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
> > > 
> > > Dear David,
> > > 
> > > Thank you for your comments.
> > > 
> > > I just used to think that it may be better that
> > > copy_to_survivor_space
> > > doesn't return forwardee if CAS was failed in order to prevent
> > > from
> > > reading fields in forwardee. But as you pointed, this extends fix
> > > for
> > > this topic.
> > > 
> > > > I removed two NULL assignments from the previous webrev.
> > > http://cr.openjdk.java.net/~horii/8154736/webrev.03/
> > Which simply takes us back to where we were. It may not be safe for
> > the caller of those methods to access the fields of the returned
> > "forwardee".
> > 
> > Sorry but I'm not seeing anything here that justifies removing the
> > barriers from the cas in this code. GC lurkers feel free to jump in
> > here - this is your code afterall! ;-)
> > 
> > David
> > -----
> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems
> to me to be extremely fragile and hard to reason about.  The places
> where that copied object might escape to and be examined seem to be
> myriad.  And not only do we need to worry about them today, but also
> for future maintenance.  Even if it can be modified and shown to be
> correct today, it would be very easy to introduce a bug later, as
> should be obvious from the various issues pointed out so far during
> this review.
> 
> The key issue here is that we copy obj into new_obj, and then make
> new_obj accessible to other threads via the CAS.  Those other threads
> might attempt to access data in new_obj.  This suggests the CAS ought
> to have at least a release fence to ensure the copy is complete
> before the CAS is performed.  No amount of fencing on the read side
> (such as in the work stealing) can remove that need.

Depending on what "other threads" means.

The thread that pops the reference should be okay (as it does a fence),
because the thread pushing the entry on the mark stack also releases
all stores.

Threads not participating in this protocol are problematic, and this
does worry me a bit as well.
I have not seen any so far, but there is always a risk of overlooking
some place.
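
For reference, the publish-via-CAS pattern under discussion can be sketched with standard C++ atomics. This is not the HotSpot Atomic::cmpxchg API; Obj, forward and read_forwardee are hypothetical names, and acquire ordering is used on the reader side where the thread discusses dependent loads.

```cpp
#include <atomic>

// Toy object: `payload` stands in for the copied fields.
struct Obj {
  int payload = 0;
  std::atomic<Obj*> forwardee{nullptr};
};

// Copies `from` into `to_space` and tries to install the copy as forwardee.
// Returns whichever copy won the race.
inline Obj* forward(Obj& from, Obj& to_space) {
  to_space.payload = from.payload;  // plain stores: the "copy"
  Obj* expected = nullptr;
  // Release on success orders the copy before the pointer becomes visible.
  if (from.forwardee.compare_exchange_strong(expected, &to_space,
                                             std::memory_order_release,
                                             std::memory_order_relaxed)) {
    return &to_space;
  }
  // Lost the race: the acquire fence pairs with the winner's release CAS,
  // so the winner's copy may be read safely through `expected`.
  std::atomic_thread_fence(std::memory_order_acquire);
  return expected;
}

// Reader outside the push/pop protocol: the acquire load pairs with the
// release CAS, so the fields of the returned object are fully copied.
inline Obj* read_forwardee(const Obj& from) {
  return from.forwardee.load(std::memory_order_acquire);
}
```

With a fully relaxed CAS, neither the losing thread nor an unrelated reader has any guarantee that the copied fields are visible once it sees the forwardee pointer.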

> And that might be all that is needed.  On the post-CAS side, we load
> the forwardee and then load values from it.  I think we can use
> implicit consume with dependent loads (except on Alpha) plus the
> suggested release fence to get the desired effect.  (If not, use an
> acquire form of forwardee()?)
> 
> I'm not certain that just a release fence is sufficient (I'm less
> familiar with ParallelGC than I'd like for looking at something like
> this), but I'm pretty sure I wouldn't want to go any weaker than
> that.

This change "only" impacts ppc64 at this time.

Thanks,
  Thomas

From david.holmes at oracle.com  Fri Oct  7 12:08:25 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 7 Oct 2016 22:08:25 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <1475836735.2622.82.camel@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
	<f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
	<1475836735.2622.82.camel@oracle.com>
Message-ID: <c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com>

Thomas,

 > This change "only" impacts ppc64 at this time.

This is a dangerous stance. The changes are to shared code. It just
happens that only the PPC atomic implementations support anything other
than conservative barriers today. If someone adds additional forms on
other platforms their GC code suddenly has new behaviour!

Changes in shared code must be algorithmically correct on all platforms. 
Not just "it will work fine today".

Given all the work being done to add missing barriers, removing them 
must come with a detailed analysis establishing the safety of doing so. 
And I am not seeing that here.
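
To make the ordering concern concrete, here is a minimal standalone sketch.
It uses C++11 std::atomic rather than HotSpot's own Atomic/OrderAccess API,
and the names Obj, forwardee, publish_copy and read_forwardee are
hypothetical, chosen only for illustration. It shows the shape being asked
for: the copy is completed first, the CAS that publishes the pointer has
release semantics, and a thread that loses the race or later reads the
pointer does so with acquire semantics. It is not the ParallelGC code.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a copied object; not a HotSpot type.
struct Obj {
  int payload;
};

// Hypothetical forwarding slot; nullptr means "not yet forwarded".
static std::atomic<Obj*> forwardee{nullptr};

// Fill in new_obj (the "copy" step) and try to install it as the forwardee.
// On success the CAS is acq_rel, so the store to payload is ordered before
// the pointer becomes visible to other threads. On failure the CAS is an
// acquire load, so the loser may safely read the winner's object.
Obj* publish_copy(Obj* new_obj, int value) {
  new_obj->payload = value;
  Obj* expected = nullptr;
  if (forwardee.compare_exchange_strong(expected, new_obj,
                                        std::memory_order_acq_rel,
                                        std::memory_order_acquire)) {
    return new_obj;   // this thread won the race
  }
  return expected;    // another thread's copy was installed first
}

// Reader side: the acquire load pairs with the releasing CAS above.
Obj* read_forwardee() {
  return forwardee.load(std::memory_order_acquire);
}
```

With memory_order_relaxed on the CAS instead, nothing in this sketch would
order the payload store before the pointer store, which is the fragility
discussed in the quoted review comments below.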

David

On 7/10/2016 8:38 PM, Thomas Schatzl wrote:
> Hi,
>
> On Thu, 2016-10-06 at 18:16 -0400, Kim Barrett wrote:
>>>
>>> On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com>
>>> wrote:
>>>
>>> On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
>>>>
>>>> Dear David,
>>>>
>>>> Thank you for your comments.
>>>>
>>>> I used to think that it might be better for copy_to_survivor_space
>>>> not to return the forwardee if the CAS failed, in order to prevent
>>>> reading fields in the forwardee. But as you pointed out, this extends
>>>> the fix beyond this topic.
>>>>
>>>> I removed two NULL assignments from the previous webrev.
>>>> http://cr.openjdk.java.net/~horii/8154736/webrev.03/
>>> Which simply takes us back to where we were. It may not be safe for
>>> the caller of those methods to access the fields of the returned
>>> "forwardee".
>>>
>>> Sorry but I'm not seeing anything here that justifies removing the
>>> barriers from the cas in this code. GC lurkers feel free to jump in
>>> here - this is your code after all! ;-)
>>>
>>> David
>>> -----
>> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems
>> to me to be extremely fragile and hard to reason about.  The places
>> where that copied object might escape to and be examined seem to be
>> myriad.  And not only do we need to worry about them today, but also
>> for future maintenance.  Even if it can be modified and shown to be
>> correct today, it would be very easy to introduce a bug later, as
>> should be obvious from the various issues pointed out so far during
>> this review.
>>
>> The key issue here is that we copy obj into new_obj, and then make
>> new_obj accessible to other threads via the CAS.  Those other threads
>> might attempt to access data in new_obj.  This suggests the CAS ought
>> to have at least a release fence to ensure the copy is complete
>> before the CAS is performed.  No amount of fencing on the read side
>> (such as in the work stealing) can remove that need.
>
> Depending on what "other threads" means.
>
> The thread that pops the reference should be okay (as it does a fence),
> because the thread pushing the entry on the mark stack also releases
> all stores.
>
> Threads not participating in this protocol are problematic, and this is
> indeed worrying me as well a bit.
> I have not seen any so far, but there is always a risk of overlooking
> some place.
>
>> And that might be all that is needed.  On the post-CAS side, we load
>> the forwardee and then load values from it.  I think we can use
>> implicit consume with dependent loads (except on Alpha) plus the
>> suggested release fence to get the desired effect.  (If not, use an
>> acquire form of forwardee()?)
>>
>> I'm not certain that just a release fence is sufficient (I'm less
>> familiar with ParallelGC than I'd like for looking at something like
>> this), but I'm pretty sure I wouldn't want to go any weaker than
>> that.
>
> This change "only" impacts ppc64 at this time.
>
> Thanks,
>   Thomas
>

From harold.seigel at oracle.com  Fri Oct  7 15:20:17 2016
From: harold.seigel at oracle.com (harold seigel)
Date: Fri, 7 Oct 2016 11:20:17 -0400
Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16 out
	of order with lock Module_lock/6 -- possible deadlock
Message-ID: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>

Hi,

Please review this fix for JDK-8166364.

This fix moves the setting of the module fields in the class mirrors of 
the fixup_module_list outside of the Module_lock.  The determination of 
whether a mirror should be added to the fixup_module_list is still done 
under Module_lock as is the defining of module java.base.  This prevents 
any synchronization issues with a mirror being erroneously added to the 
fixup_module_list after module java.base is defined.  The other piece is 
that the VM, in Modules::define_javabase_module(), guarantees under 
Module_lock that only one thread will ever successfully define module 
java.base.
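
The locking pattern described above can be sketched roughly as follows. This
is a simplified standalone illustration using std::mutex, not the code in the
webrev; module_lock, base_defined, fixup_list, register_mirror and
define_base are hypothetical names, and plain int pointers stand in for class
mirrors. The decision whether to queue a mirror is taken under the lock,
while the actual patching happens after the lock has been released.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical state; these are not the HotSpot data structures.
static std::mutex module_lock;
static bool base_defined = false;
static std::vector<int*> fixup_list;   // mirrors waiting for a module value

// Called when a mirror is created. The decision is made under the lock,
// so a mirror cannot be queued after the base module has been defined.
void register_mirror(int* mirror, int base_module) {
  bool patch_now;
  {
    std::lock_guard<std::mutex> guard(module_lock);
    patch_now = base_defined;
    if (!patch_now) {
      fixup_list.push_back(mirror);
    }
  }
  if (patch_now) {
    *mirror = base_module;   // patched outside the lock
  }
}

// Returns true only for the single caller that defines the base module.
bool define_base(int base_module) {
  std::vector<int*> to_patch;
  {
    std::lock_guard<std::mutex> guard(module_lock);
    if (base_defined) {
      return false;          // another thread already defined it
    }
    base_defined = true;
    to_patch.swap(fixup_list);
  }
  // Only the defining thread gets here, so the list is private to it
  // and the fields can be set without holding the lock.
  for (int* mirror : to_patch) {
    *mirror = base_module;
  }
  return true;
}
```

Keeping the field updates out of the locked region is what avoids taking a
second lock while the first is held, which is the lock-ordering problem
named in the bug title.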

Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/

JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364

The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, 
java/io, java/lang, and java/util tests using both fastdebug and 
slowdebug builds.  The nsk colocated and the non-colocated quick tests 
were also run against a slowdebug build.

Thanks, Harold


From jiangli.zhou at Oracle.COM  Fri Oct  7 18:23:46 2016
From: jiangli.zhou at Oracle.COM (Jiangli Zhou)
Date: Fri, 7 Oct 2016 11:23:46 -0700
Subject: RFR: 8167333: Invalid source path info might be used when
	creating ClassFileStream after CFLH transforms a shared classes
	in some cases
In-Reply-To: <8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
	<8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>
Message-ID: <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com>

Hi David,

Thanks for taking a look.

> On Oct 6, 2016, at 10:33 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> Hi Jiangli,
> 
> On 7/10/2016 2:39 PM, Jiangli Zhou wrote:
>> Hi,
>> 
>> Please review the following fix for JDK-8167333 <https://bugs.openjdk.java.net/browse/JDK-8167333>:
>> 
>>  webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/ <http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/>
>> 
>> When a shared class is transformed by a JVMTI agent during initial loading (via CFLH), the VM creates a new ClassFileStream using the transformed class data. The source path info from the class's associated SharedClassPathEntry is passed as the "source" argument to ClassFileStream. However, some shared classes may not have an associated SharedClassPathEntry and the class_path_index is -1. The VM needs to detect such a case and not pass invalid source path info.
> 
> It isn't obvious to me that all callers of CFS::source()/clone_source() will handle getting a NULL. Of course I can't tell which of those callers may be involved in this particular use-case.

I took a look at all the code that calls CFS::source()/clone_source(). They all handle the NULL case with an explicit NULL check. For our specific case, the particular caller involved is InstanceKlass::print_loading_log. Before the fix, it crashed when trying to print the invalid cfs->source after the (cfs->source() != NULL) check.
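
As a rough illustration of the guard being discussed (a standalone sketch,
not the code in the webrev), the lookup below returns a null source when the
class has no associated entry, and the caller tolerates that. PathEntry,
source_for and loading_log are hypothetical names; the only detail taken
from the description above is that a class_path_index of -1 means "no
entry".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a shared class path entry.
struct PathEntry {
  std::string name;
};

// Return the source path for a class, or nullptr when the class has no
// associated entry (index -1) or the index is out of range.
const char* source_for(const std::vector<PathEntry>& table,
                       int class_path_index) {
  if (class_path_index < 0 ||
      static_cast<std::size_t>(class_path_index) >= table.size()) {
    return nullptr;
  }
  return table[class_path_index].name.c_str();
}

// Caller side: build a log line, tolerating a missing source.
std::string loading_log(const char* class_name, const char* source) {
  std::string line = class_name;
  if (source != nullptr) {
    line += " source: ";
    line += source;
  }
  return line;
}
```

The point of the fix is the first half: once the lookup never hands out a
pointer derived from an invalid index, the existing NULL checks in the
callers are sufficient.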

Thanks,
Jiangli

> 
> Thanks,
> David
> 
>> Tested with all existing class data sharing tests.
>> 
>> Thanks,
>> Jiangli
>> 


From jiangli.zhou at oracle.com  Fri Oct  7 22:35:19 2016
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Fri, 7 Oct 2016 15:35:19 -0700
Subject: RFR: 8167333: Invalid source path info might be used when
	creating ClassFileStream after CFLH transforms a shared classes
	in some cases
In-Reply-To: <09ec6b8e-f071-e12a-bbc8-8c45bab3b9a8@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
	<09ec6b8e-f071-e12a-bbc8-8c45bab3b9a8@oracle.com>
Message-ID: <6FCF7FDC-BFDD-43F1-A311-3F7B72CA8D2C@oracle.com>

Hi Dmitry,

Thanks for the review.

> On Oct 7, 2016, at 1:36 AM, Dmitry Samersoff <dmitry.samersoff at oracle.com> wrote:
> 
> Jiangli,
> 
> I see a couple of places in hotspot where the result of
> FileMapInfo::shared_classpath() is dereferenced without an additional
> null check.
> 
> Could you insert check/assert/comments as appropriate to these places?

That's a very good point. I double-checked all other places that call FileMapInfo::shared_classpath(). They all have a valid non-NULL shared class path entry when the entry field is accessed. Just to be cautious, I added some asserts to make sure the shared class path entry is not NULL. Here is the updated webrev:

http://cr.openjdk.java.net/~jiangli/8167333/webrev.01/ <http://cr.openjdk.java.net/~jiangli/8167333/webrev.01/>

I've rerun all related tests.

Thanks,
Jiangli

> 
> -Dmitry
> 
> On 2016-10-07 07:39, Jiangli Zhou wrote:
>> Hi,
>> 
>> Please review the following fix for JDK-8167333
>> <https://bugs.openjdk.java.net/browse/JDK-8167333>:
>> 
>> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>> <http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/>
>> 
>> When a shared class is transformed by a JVMTI agent during initial
>> loading (via CFLH), the VM creates a new ClassFileStream using the
>> transformed class data. The source path info from the class's
>> associated SharedClassPathEntry is passed as the "source" argument to
>> ClassFileStream. However, some shared classes may not have an
>> associated SharedClassPathEntry and the class_path_index is -1. The
>> VM needs to detect such a case and not pass invalid source path
>> info.
>> 
>> Tested with all existing class data sharing tests.
>> 
>> Thanks, Jiangli
>> 
> 
> 
> -- 
> Dmitry Samersoff
> Oracle Java development team, Saint Petersburg, Russia
> * I would love to change the world, but they won't give me the sources.


From dmitry.samersoff at oracle.com  Sat Oct  8 16:15:34 2016
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Sat, 8 Oct 2016 19:15:34 +0300
Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16
	out of order with lock Module_lock/6 -- possible deadlock
In-Reply-To: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
References: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
Message-ID: <2bc43aca-ea7b-25f7-6dc0-0b026c6811f5@oracle.com>

Harold,

I tried your fix in my kitchensync setup and can confirm
that the VM doesn't crash anymore.

The fix looks good to me.

-Dmitry


On 2016-10-07 18:20, harold seigel wrote:
> Hi,
> 
> Please review this fix for JDK-8166364.
> 
> This fix moves the setting of the module fields in the class mirrors of
> the fixup_module_list outside of the Module_lock.  The determination of
> whether a mirror should be added to the fixup_module_list is still done
> under Module_lock as is the defining of module java.base.  This prevents
> any synchronization issues with a mirror being erroneously added to the
> fixup_module_list after module java.base is defined.  The other piece is
> that the VM, in Modules::define_javabase_module(), guarantees under
> Module_lock that only one thread will ever successfully define module
> java.base.
> 
> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/
> 
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364
> 
> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot,
> java/io, java/lang, and java/util tests using both fastdebug and
> slowdebug builds.  The nsk colocated and the non-colocated quick tests
> were also run against a slowdebug build.
> 
> Thanks, Harold
> 


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From david.holmes at oracle.com  Sun Oct  9 20:59:08 2016
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 10 Oct 2016 06:59:08 +1000
Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16
	out of order with lock Module_lock/6 -- possible deadlock
In-Reply-To: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
References: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
Message-ID: <988ef919-dbf0-092e-be72-622dbc1c663f@oracle.com>

Hi Harold,

Change looks good. A couple of suggestions re comments below.

On 8/10/2016 1:20 AM, harold seigel wrote:
> Hi,
>
> Please review this fix for JDK-8166364.
>
> This fix moves the setting of the module fields in the class mirrors of
> the fixup_module_list outside of the Module_lock.  The determination of
> whether a mirror should be added to the fixup_module_list is still done
> under Module_lock as is the defining of module java.base.  This prevents
> any synchronization issues with a mirror being erroneously added to the
> fixup_module_list after module java.base is defined.  The other piece is
> that the VM, in Modules::define_javabase_module(), guarantees under
> Module_lock that only one thread will ever successfully define module
> java.base.
>
> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/

src/share/vm/classfile/modules.cpp

Can you add a comment here:

  246
+      // Only the thread that actually defined the base module will get here,
+      // so no locking is needed.
+
  247   // Patch any previously loaded class's module field with java.base's java.lang.reflect.Module.
  248   ModuleEntryTable::patch_javabase_entries(module_handle);

---

src/share/vm/classfile/javaClasses.cpp

This comment no longer reads quite right now that it is not in the else clause:

  801     // java.base was defined at some point between calling create_mirror()
  802     // and obtaining the Module_lock, patch this particular class with java.base.

suggest:

// If java.base was already defined then patch this particular class with java.base.


Thanks,
David


> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364
>
> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot,
> java/io, java/lang, and java/util tests using both fastdebug and
> slowdebug builds.  The nsk colocated and the non-colocated quick tests
> were also run against a slowdebug build.
>
> Thanks, Harold
>

From david.holmes at oracle.com  Sun Oct  9 21:10:37 2016
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 10 Oct 2016 07:10:37 +1000
Subject: RFR: 8167333: Invalid source path info might be used when
	creating ClassFileStream after CFLH transforms a shared classes in some
	cases
In-Reply-To: <386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
	<8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>
	<386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com>
Message-ID: <aba4a718-6caf-8b45-4fcc-293007ed5c7a@oracle.com>

On 8/10/2016 4:23 AM, Jiangli Zhou wrote:
> Hi David,
>
> Thanks for taking a look.
>
>> On Oct 6, 2016, at 10:33 PM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>> Hi Jiangli,
>>
>> On 7/10/2016 2:39 PM, Jiangli Zhou wrote:
>>> Hi,
>>>
>>> Please review the following fix for JDK-8167333
>>> <https://bugs.openjdk.java.net/browse/JDK-8167333>:
>>>
>>>  webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>>> <http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/>
>>>
>>> When a shared class is transformed by a JVMTI agent during initial
>>> loading (via CFLH), the VM creates a new ClassFileStream using the
>>> transformed class data. The source path info from the class's
>>> associated SharedClassPathEntry is passed as the "source" argument to
>>> ClassFileStream. However, some shared classes may not have an
>>> associated SharedClassPathEntry and the class_path_index is -1. The
>>> VM needs to detect such a case and not pass invalid source path info.
>>
>> It isn't obvious to me that all callers of
>> CFS::source()/clone_source() will handle getting a NULL. Of course I
>> can't tell which of those callers may be involved in this particular
>> use-case.
>
> I took a look at all the code that calls CFS::source()/clone_source().
> They all handle the NULL case with an explicit NULL check. For our specific
> case, the particular caller involved
> is InstanceKlass::print_loading_log. Before the fix, it crashed when
> trying to print the invalid cfs->source after the (cfs->source() != NULL) check.

Thanks for verifying. I've looked at the latest webrev with the 
additional asserts - all looks good.

David

> Thanks,
> Jiangli
>
>>
>> Thanks,
>> David
>>
>>> Tested with all existing class data sharing tests.
>>>
>>> Thanks,
>>> Jiangli
>>>
>

From ioi.lam at oracle.com  Mon Oct 10 06:27:56 2016
From: ioi.lam at oracle.com (Ioi Lam)
Date: Sun, 09 Oct 2016 23:27:56 -0700
Subject: RFR: 8167333: Invalid source path info might be used when creating
	ClassFileStream after CFLH transforms a shared classes in some cases
In-Reply-To: <aba4a718-6caf-8b45-4fcc-293007ed5c7a@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>	<8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>	<386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com>
	<aba4a718-6caf-8b45-4fcc-293007ed5c7a@oracle.com>
Message-ID: <57FB34EC.2070606@oracle.com>


On 10/9/16 2:10 PM, David Holmes wrote:
> On 8/10/2016 4:23 AM, Jiangli Zhou wrote:
>> Hi David,
>>
>> Thanks for taking a look.
>>
>>> On Oct 6, 2016, at 10:33 PM, David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>> wrote:
>>>
>>> Hi Jiangli,
>>>
>>> On 7/10/2016 2:39 PM, Jiangli Zhou wrote:
>>>> Hi,
>>>>
>>>> Please review the following fix for JDK-8167333
>>>> <https://bugs.openjdk.java.net/browse/JDK-8167333>:
>>>>
>>>>  webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>>>> <http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/>
>>>>
>>>> When a shared class is transformed by a JVMTI agent during initial
>>>> loading (via CFLH), the VM creates a new ClassFileStream using the
>>>> transformed class data. The source path info from the class's
>>>> associated SharedClassPathEntry is passed as the "source" argument to
>>>> ClassFileStream. However, some shared classes may not have an
>>>> associated SharedClassPathEntry and the class_path_index is -1. The
>>>> VM needs to detect such a case and not pass invalid source path
>>>> info.
>>>
>>> It isn't obvious to me that all callers of
>>> CFS::source()/clone_source() will handle getting a NULL. Of course I
>>> can't tell which of those callers may be involved in this particular
>>> use-case.
>>
>> I took a look at all the code that calls CFS::source()/clone_source().
>> They all handle the NULL case with an explicit NULL check. For our specific
>> case, the particular caller involved
>> is InstanceKlass::print_loading_log. Before the fix, it crashed when
>> trying to print the invalid cfs->source after the (cfs->source() != NULL)
>> check.
>
> Thanks for verifying. I've looked at the latest webrev with the 
> additional asserts - all looks good.
>

Looks good to me, too. Thanks

- Ioi
> David
>
>> Thanks,
>> Jiangli
>>
>>>
>>> Thanks,
>>> David
>>>
>>>> Tested with all existing class data sharing tests.
>>>>
>>>> Thanks,
>>>> Jiangli
>>>>
>>


From robbin.ehn at oracle.com  Mon Oct 10 07:06:04 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 10 Oct 2016 09:06:04 +0200
Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16
	out of order with lock Module_lock/6 -- possible deadlock
In-Reply-To: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
References: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
Message-ID: <cc9aeb7a-0a71-79d4-5e91-f1ebab1556de@oracle.com>

Thanks for fixing, looks good and works fine!

/Robbin

On 10/07/2016 05:20 PM, harold seigel wrote:
> Hi,
>
> Please review this fix for JDK-8166364.
>
> This fix moves the setting of the module fields in the class mirrors of the fixup_module_list outside of the Module_lock.  The determination of whether a mirror should be
> added to the fixup_module_list is still done under Module_lock as is the defining of module java.base.  This prevents any synchronization issues with a mirror being
> erroneously added to the fixup_module_list after module java.base is defined.  The other piece is that the VM, in Modules::define_javabase_module(), guarantees under
> Module_lock that only one thread will ever successfully define module java.base.
>
> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/
>
> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364
>
> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, java/io, java/lang, and java/util tests using both fastdebug and slowdebug builds.  The nsk
> colocated and the non-colocated quick tests were also run against a slowdebug build.
>
> Thanks, Harold
>

From shafi.s.ahmad at oracle.com  Mon Oct 10 07:24:37 2016
From: shafi.s.ahmad at oracle.com (Shafi Ahmad)
Date: Mon, 10 Oct 2016 00:24:37 -0700 (PDT)
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM
	caused by inability to create threads'
Message-ID: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>

Hi All,

Please review the simple change for the fix of bug 'JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'.

Summary: 
In the current implementation there are a few scenarios where we do not obey the JVM option -XX:+CrashOnOutOfMemoryError.
While analyzing this issue I found that there are two JVM states in which OOM can happen:
 1.  OOM during VM initialization - as per our internal discussion, in this case it is not worth dumping a core file, so this is left as it is.
 2.  OOM once the VM is initialized - for this scenario the code is already in place in most locations, but the corresponding changes are missing in a few places, so this change covers them.

Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/
Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004

Testing: jprt and jtreg (on Linux x86_64)

Regards,
Shafi

From robbin.ehn at oracle.com  Mon Oct 10 09:07:46 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Mon, 10 Oct 2016 11:07:46 +0200
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
Message-ID: <6e5229df-7012-4fec-eaa9-cd80ec7bbb4b@oracle.com>

Hi Shafi,

Looks good and works fine (tested with repro from bug), thanks for fixing!

/Robbin

(not a Reviewer)

On 10/10/2016 09:24 AM, Shafi Ahmad wrote:
> Hi All,
>
> Please review the simple change for the fix of bug 'JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'.
>
> Summary:
> In the current implementation there are a few scenarios where we do not obey the JVM option -XX:+CrashOnOutOfMemoryError.
> While analyzing this issue I found that there are two JVM states in which OOM can happen:
>  1.  OOM during VM initialization - as per our internal discussion, in this case it is not worth dumping a core file, so this is left as it is.
>  2.  OOM once the VM is initialized - for this scenario the code is already in place in most locations, but the corresponding changes are missing in a few places, so this change covers them.
>
> Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/
> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004
>
> Testing: jprt and jtreg (on Linux x86_64)
>
> Regards,
> Shafi
>

From christian.tornqvist at oracle.com  Mon Oct 10 11:43:29 2016
From: christian.tornqvist at oracle.com (Christian Tornqvist)
Date: Mon, 10 Oct 2016 07:43:29 -0400
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM	caused by inability to create threads'
In-Reply-To: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
Message-ID: <20e601d222eb$85dbf710$9193e530$@oracle.com>

Hi Shafi,

Note that this bug is targeted for JDK 10, you need to wait with pushing
this until the repository for that release is open.

Thanks,
Christian

-----Original Message-----
From: hotspot-runtime-dev
[mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Shafi
Ahmad
Sent: Monday, October 10, 2016 3:25 AM
To: hotspot-runtime-dev at openjdk.java.net
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
OOM caused by inability to create threads'

Hi All,

Please review the simple change for the fix of bug 'JDK-8155004:
CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create
threads'.

Summary: 
In the current implementation there are a few scenarios where we do not
obey the JVM option -XX:+CrashOnOutOfMemoryError.
While analyzing this issue I found that there are two JVM states in which
OOM can happen:
 1.  OOM during VM initialization - as per our internal discussion, in this
case it is not worth dumping a core file, so this is left as it is.
 2.  OOM once the VM is initialized - for this scenario the code is already
in place in most locations, but the corresponding changes are missing in a
few places, so this change covers them.

Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/
Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004

Testing: jprt and jtreg (on Linux x86_64)

Regards,
Shafi


From harold.seigel at oracle.com  Mon Oct 10 12:29:51 2016
From: harold.seigel at oracle.com (harold seigel)
Date: Mon, 10 Oct 2016 08:29:51 -0400
Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16
	out of order with lock Module_lock/6 -- possible deadlock
In-Reply-To: <2bc43aca-ea7b-25f7-6dc0-0b026c6811f5@oracle.com>
References: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
	<2bc43aca-ea7b-25f7-6dc0-0b026c6811f5@oracle.com>
Message-ID: <c2266d1c-7cb7-daa9-19c2-500e232a1d0e@oracle.com>

Hi Dmitry,

Thanks for the review.

Harold


On 10/8/2016 12:15 PM, Dmitry Samersoff wrote:
> Harold,
>
> I tried your fix in my kitchensync setup and can confirm
> that the VM doesn't crash anymore.
>
> The fix looks good to me.
>
> -Dmitry
>
>
> On 2016-10-07 18:20, harold seigel wrote:
>> Hi,
>>
>> Please review this fix for JDK-8166364.
>>
>> This fix moves the setting of the module fields in the class mirrors of
>> the fixup_module_list outside of the Module_lock.  The determination of
>> whether a mirror should be added to the fixup_module_list is still done
>> under Module_lock as is the defining of module java.base.  This prevents
>> any synchronization issues with a mirror being erroneously added to the
>> fixup_module_list after module java.base is defined.  The other piece is
>> that the VM, in Modules::define_javabase_module(), guarantees under
>> Module_lock that only one thread will ever successfully define module
>> java.base.
>>
>> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364
>>
>> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot,
>> java/io, java/lang, and java/util tests using both fastdebug and
>> slowdebug builds.  The nsk colocated and the non-colocated quick tests
>> were also run against a slowdebug build.
>>
>> Thanks, Harold
>>
>


From harold.seigel at oracle.com  Mon Oct 10 12:33:13 2016
From: harold.seigel at oracle.com (harold seigel)
Date: Mon, 10 Oct 2016 08:33:13 -0400
Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16
	out of order with lock Module_lock/6 -- possible deadlock
In-Reply-To: <988ef919-dbf0-092e-be72-622dbc1c663f@oracle.com>
References: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
	<988ef919-dbf0-092e-be72-622dbc1c663f@oracle.com>
Message-ID: <b3e9b2c9-a66b-e3a4-b45d-5c23690fa66b@oracle.com>

Hi David,

Thanks for the review.  I'll fix the comments before pushing the fix.

Harold


On 10/9/2016 4:59 PM, David Holmes wrote:
> Hi Harold,
>
> Change looks good. A couple of suggestions re comments below.
>
> On 8/10/2016 1:20 AM, harold seigel wrote:
>> Hi,
>>
>> Please review this fix for JDK-8166364.
>>
>> This fix moves the setting of the module fields in the class mirrors of
>> the fixup_module_list outside of the Module_lock.  The determination of
>> whether a mirror should be added to the fixup_module_list is still done
>> under Module_lock as is the defining of module java.base.  This prevents
>> any synchronization issues with a mirror being erroneously added to the
>> fixup_module_list after module java.base is defined.  The other piece is
>> that the VM, in Modules::define_javabase_module(), guarantees under
>> Module_lock that only one thread will ever successfully define module
>> java.base.
>>
>> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/
>
> src/share/vm/classfile/modules.cpp
>
> Can you add a comment here:
>
>  246
> +      // Only the thread that actually defined the base module will get here,
> +      // so no locking is needed.
> +
>  247   // Patch any previously loaded class's module field with java.base's java.lang.reflect.Module.
>  248   ModuleEntryTable::patch_javabase_entries(module_handle);
>
> ---
>
> src/share/vm/classfile/javaClasses.cpp
>
> This comment no longer reads quite right now that it is not in the else clause:
>
>  801     // java.base was defined at some point between calling create_mirror()
>  802     // and obtaining the Module_lock, patch this particular class with java.base.
>
> suggest:
>
> // If java.base was already defined then patch this particular class with java.base.
>
>
> Thanks,
> David
>
>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364
>>
>> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot,
>> java/io, java/lang, and java/util tests using both fastdebug and
>> slowdebug builds.  The nsk colocated and the non-colocated quick tests
>> were also run against a slowdebug build.
>>
>> Thanks, Harold
>>


From harold.seigel at oracle.com  Mon Oct 10 12:33:38 2016
From: harold.seigel at oracle.com (harold seigel)
Date: Mon, 10 Oct 2016 08:33:38 -0400
Subject: RFR(S) 8166364: fatal error: acquiring lock DirtyCardQ_CBL_mon/16
	out of order with lock Module_lock/6 -- possible deadlock
In-Reply-To: <cc9aeb7a-0a71-79d4-5e91-f1ebab1556de@oracle.com>
References: <aa78b0c3-3f77-4265-5859-4f7916188a83@oracle.com>
	<cc9aeb7a-0a71-79d4-5e91-f1ebab1556de@oracle.com>
Message-ID: <307d74fc-6abe-8376-adb9-ed21601655a1@oracle.com>

Hi Robbin,

Thanks for the review and testing it.

Harold


On 10/10/2016 3:06 AM, Robbin Ehn wrote:
> Thanks for fixing, looks good and works fine!
>
> /Robbin
>
> On 10/07/2016 05:20 PM, harold seigel wrote:
>> Hi,
>>
>> Please review this fix for JDK-8166364.
>>
>> This fix moves the setting of the module fields in the class mirrors 
>> of the fixup_module_list outside of the Module_lock. The 
>> determination of whether a mirror should be
>> added to the fixup_module_list is still done under Module_lock as is 
>> the defining of module java.base.  This prevents any synchronization 
>> issues with a mirror being
>> erroneously added to the fixup_module_list after module java.base is 
>> defined.  The other piece is that the VM, in 
>> Modules::define_javabase_module(), guarantees under
>> Module_lock that only one thread will ever successfully define module 
>> java.base.
>>
>> Open webrev: http://cr.openjdk.java.net/~hseigel/bug_8166364/
>>
>> JBS Bug: https://bugs.openjdk.java.net/browse/JDK-8166364
>>
>> The fix was tested with the JCK Lang and vm tests, and JTreg hotspot, 
>> java/io, java/lang, and java/util tests using both fastdebug and 
>> slowdebug builds.  The nsk
>> colocated and the non-colocated quick tests were also run against a 
>> slowdebug build.
>>
>> Thanks, Harold
>>


From mikael.gerdin at oracle.com  Mon Oct 10 13:59:41 2016
From: mikael.gerdin at oracle.com (Mikael Gerdin)
Date: Mon, 10 Oct 2016 15:59:41 +0200
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
Message-ID: <eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>

Hi,

On 2016-10-10 09:24, Shafi Ahmad wrote:
> Hi All,
>
> Please review this simple change to fix bug 'JDK-8155004: CrashOnOutOfMemoryError doesn't work for OOM caused by inability to create threads'.
>
> Summary:
> In the current implementation there are a few scenarios where we do not obey the JVM option -XX:+CrashOnOutOfMemoryError.
> While analyzing this issue I found that there are two JVM states in which an OOM can happen:
>  1.  OOM during VM initialization - as per our internal discussion, dumping a core file is not worthwhile in this case, so it is left as it is.
>  2.  OOM once the VM is initialized - for this scenario the code is already in place in most locations, but the corresponding changes are missing in a few places, so this change covers them.
>
> Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/


There is a lot of confusion in the VM code with the term "out of memory 
error".
In some places it refers to code throwing a java.lang.OutOfMemoryError 
and expecting running java code to be able to potentially catch that 
Error and continue running.

In other places, such as callers of report_vm_out_of_memory, the 
situation is much more dire and the calling thread may not even be a 
JavaThread and as such cannot "throw" an exception.
report_vm_out_of_memory is only invoked through the macro 
vm_exit_out_of_memory, which of course implies that the condition is 
fatal and we are about to terminate the JVM process altogether.

I think that it's incorrect to call code related to 
java.lang.OutOfMemoryError in report_vm_out_of_memory since the 
condition may not even be correlated with Java level application behavior.

/Mikael

> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004
>
> Testing: jprt and jtreg (on Linux x86_64)
>
> Regards,
> Shafi
>

From jiangli.zhou at oracle.com  Mon Oct 10 17:04:53 2016
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Mon, 10 Oct 2016 10:04:53 -0700
Subject: RFR: 8167333: Invalid source path info might be used when
	creating ClassFileStream after CFLH transforms a shared classes
	in some cases
In-Reply-To: <aba4a718-6caf-8b45-4fcc-293007ed5c7a@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
	<8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>
	<386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com>
	<aba4a718-6caf-8b45-4fcc-293007ed5c7a@oracle.com>
Message-ID: <0C4A4742-70F4-419E-8C96-B8311C0D98CE@oracle.com>

Thanks, David!

Jiangli

> On Oct 9, 2016, at 2:10 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 8/10/2016 4:23 AM, Jiangli Zhou wrote:
>> Hi David,
>> 
>> Thanks for taking a look.
>> 
>>> On Oct 6, 2016, at 10:33 PM, David Holmes <david.holmes at oracle.com
>>> <mailto:david.holmes at oracle.com>> wrote:
>>> 
>>> Hi Jiangli,
>>> 
>>> On 7/10/2016 2:39 PM, Jiangli Zhou wrote:
>>>> Hi,
>>>> 
>>>> Please review the following fix for JDK-8167333
>>>> <https://bugs.openjdk.java.net/browse/JDK-8167333>:
>>>> 
>>>> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>>>> <http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/>
>>>> 
>>>> When a shared class is transformed by a JVMTI agent during initial
>>>> loading (via CFLH), the VM creates a new ClassFileStream using the
>>>> transformed class data. The source path info from the class's
>>>> associated SharedClassPathEntry is passed as the 'source' argument to
>>>> ClassFileStream. However, some shared classes may not have an
>>>> associated SharedClassPathEntry and the class_path_index is -1. The
>>>> VM needs to detect such a case and not pass invalid source path info.
>>> 
>>> It isn't obvious to me that all callers of
>>> CFS::source()/clone_source() will handle getting a NULL. Of course I
>>> can't tell which of those callers may be involved in this particular
>>> use-case.
>> 
>> I took a look at all the code that calls CFS::source()/clone_source().
>> They all handle the NULL case with an explicit NULL check. For our specific
>> case, the particular caller involved
>> is InstanceKlass::print_loading_log. Before the fix, it crashed when
>> trying to print the invalid cfs->source after the (cfs->source() != NULL) check.
> 
> Thanks for verifying. I've looked at the latest webrev with the additional asserts - all looks good.
> 
> David
> 
>> Thanks,
>> Jiangli
>> 
>>> 
>>> Thanks,
>>> David
>>> 
>>>> Tested with all existing class data sharing tests.
>>>> 
>>>> Thanks,
>>>> Jiangli
>>>> 
>> 

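[Editorial sketch] The guard discussed in this thread (hand out no source path when class_path_index is -1, and have callers check for NULL) might look roughly like the following. It is a sketch under stated assumptions: SharedPathTable, source_for and describe_source are hypothetical names, not the functions touched by the webrev.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the table of shared class path entries.
struct SharedPathTable {
  std::vector<std::string> entries;
};

// Returns the source path for a shared class, or nullptr when the class has
// no associated entry (index -1) or the index is out of range.
const char* source_for(const SharedPathTable& table, int class_path_index) {
  if (class_path_index < 0 ||
      static_cast<std::size_t>(class_path_index) >= table.entries.size()) {
    return nullptr;
  }
  return table.entries[static_cast<std::size_t>(class_path_index)].c_str();
}

// Caller side: check for nullptr before using the source, as the callers of
// CFS::source() are said to do in the thread.
std::string describe_source(const char* source) {
  return source != nullptr ? std::string("source: ") + source
                           : std::string("source: <unknown>");
}
```

Returning nullptr rather than a pointer derived from index -1 keeps the "no source" case explicit, so a later NULL check is meaningful.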

From jiangli.zhou at oracle.com  Mon Oct 10 17:10:27 2016
From: jiangli.zhou at oracle.com (Jiangli Zhou)
Date: Mon, 10 Oct 2016 10:10:27 -0700
Subject: RFR: 8167333: Invalid source path info might be used when
	creating ClassFileStream after CFLH transforms a shared classes
	in some cases
In-Reply-To: <57FB34EC.2070606@oracle.com>
References: <F0639AF9-0685-44E8-BB71-A79998BB17B3@oracle.com>
	<8705f5d4-3437-2aac-57cd-fc232d0ddeef@oracle.com>
	<386D2372-D26F-4075-94DD-F9D9F7F10013@oracle.com>
	<aba4a718-6caf-8b45-4fcc-293007ed5c7a@oracle.com>
	<57FB34EC.2070606@oracle.com>
Message-ID: <5E913B80-34CA-4744-906E-DC4C4E576E2B@oracle.com>

Hi Ioi,

Thanks for the review!

Jiangli

> On Oct 9, 2016, at 11:27 PM, Ioi Lam <ioi.lam at oracle.com> wrote:
> 
> 
> 
> On 10/9/16 2:10 PM, David Holmes wrote:
>> On 8/10/2016 4:23 AM, Jiangli Zhou wrote:
>>> Hi David,
>>> 
>>> Thanks for taking a look.
>>> 
>>>> On Oct 6, 2016, at 10:33 PM, David Holmes <david.holmes at oracle.com
>>>> <mailto:david.holmes at oracle.com>> wrote:
>>>> 
>>>> Hi Jiangli,
>>>> 
>>>> On 7/10/2016 2:39 PM, Jiangli Zhou wrote:
>>>>> Hi,
>>>>> 
>>>>> Please review the following fix for JDK-8167333
>>>>> <https://bugs.openjdk.java.net/browse/JDK-8167333>:
>>>>> 
>>>>> webrev: http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/
>>>>> <http://cr.openjdk.java.net/~jiangli/8167333/webrev.00/>
>>>>> 
>>>>> When a shared class is transformed by a JVMTI agent during initial
>>>>> loading (via CFLH), the VM creates a new ClassFileStream using the
>>>>> transformed class data. The source path info from the class's
>>>>> associated SharedClassPathEntry is passed as the 'source' argument to
>>>>> ClassFileStream. However, some shared classes may not have an
>>>>> associated SharedClassPathEntry and the class_path_index is -1. The
>>>>> VM needs to detect such a case and not pass invalid source path info.
>>>> 
>>>> It isn't obvious to me that all callers of
>>>> CFS::source()/clone_source() will handle getting a NULL. Of course I
>>>> can't tell which of those callers may be involved in this particular
>>>> use-case.
>>> 
>>> I took a look at all the code that calls CFS::source()/clone_source().
>>> They all handle the NULL case with an explicit NULL check. For our specific
>>> case, the particular caller involved
>>> is InstanceKlass::print_loading_log. Before the fix, it crashed when
>>> trying to print the invalid cfs->source after the (cfs->source() != NULL) check.
>> 
>> Thanks for verifying. I've looked at the latest webrev with the additional asserts - all looks good.
>> 
> 
> Looks good to me, too. Thanks
> 
> - Ioi
>> David
>> 
>>> Thanks,
>>> Jiangli
>>> 
>>>> 
>>>> Thanks,
>>>> David
>>>> 
>>>>> Tested with all existing class data sharing tests.
>>>>> 
>>>>> Thanks,
>>>>> Jiangli
>>>>> 
>>> 
> 


From martin.doerr at sap.com  Mon Oct 10 18:00:19 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 10 Oct 2016 18:00:19 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <57F77A4B.6060604@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
Message-ID: <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>

Hi Claes,

thank you very much for your explanations.

I agree with you that it would be better to pad where the Monitors are used. It would still fulfill the purpose of this RFE without disturbing other usages.

So I could introduce:
class PaddedMonitor : public Monitor {
  enum {
    CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - (int)sizeof(Monitor),
    PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
  };
  char _padding[PADDING_LEN];
};
and similarly PaddedMutex and replace all of the ones which get allocated in a linear fashion (mutexLocker.cpp mutex_init()).

Would you agree with this change?

Thanks and best regards,
Martin

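[Editorial sketch] A self-contained variant of the PaddedMonitor idea above, for illustration only. FakeMonitor, kCacheLineSize, padding_for and Padded are assumed names; the real Monitor has different fields and HotSpot defines DEFAULT_CACHE_LINE_SIZE per platform. One detail the sketch handles explicitly: when the computed padding is 0, a member such as char _padding[0] is not valid standard C++, so the zero case gets its own specialization.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value for illustration; HotSpot uses DEFAULT_CACHE_LINE_SIZE.
constexpr std::size_t kCacheLineSize = 128;

// Hypothetical stand-in for Monitor.
struct FakeMonitor {
  void* owner;
  long  lock_word;
  char  name[64];
};

// Bytes needed to bring sizeof(T) up to one cache line, or 0 if T is already
// at least that large. Comparing before subtracting avoids unsigned
// wraparound, which the casts to int guard against in the proposal above.
template <typename T>
constexpr std::size_t padding_for() {
  return sizeof(T) < kCacheLineSize ? kCacheLineSize - sizeof(T) : 0;
}

// Primary template: appends the padding bytes after the base object.
template <typename T, std::size_t Pad = padding_for<T>()>
struct Padded : T {
  char _padding[Pad];
};

// Specialization for Pad == 0: no array member at all.
template <typename T>
struct Padded<T, 0> : T {};

static_assert(sizeof(Padded<FakeMonitor>) >= kCacheLineSize,
              "padded type should span at least one cache line");
```

Keeping the padding in a separate wrapper type means unpadded Monitors stay small where they are embedded in other structures, which is the trade-off discussed in the message.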

-----Original Message-----
From: Claes Redestad [mailto:claes.redestad at oracle.com] 
Sent: Freitag, 7. Oktober 2016 12:35
To: Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; Coleen Phillimore (coleen.phillimore at oracle.com) <coleen.phillimore at oracle.com>
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Hi,

I'm concerned that this might be an easy-but-wrong fix to a complex
problem, and acknowledging that there are already use cases where the
_name field is counterproductive. This change adds complexity that
makes it even less likely such uses will be optimized for in the
future.

There are Padded* types put in place to deal with these concerns
explicitly rather than implicitly *where it matters*, which allows us
the choice of applying padding or not on a per use-case basis (which
means we can also remove the _name field for those use cases that don't
care about either, which might be most outside of the global lists).

I am very concerned about false sharing, but I have no data to support
that this change has any measurable benefit in practice: I even did an
experiment years ago now where I turned _name into a pointer to not pad
at all and saw nothing exceeding noise levels on any benchmark.

Thanks!

/Claes

On 2016-10-07 12:18, Doerr, Martin wrote:
> Hi Claes,
>
> what the change basically does is that the _name[] field gets enlarged by 8 bytes on platforms with 128 byte DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed by the C++ compiler.
> What exactly is your concern about the footprint overhead?
> Are you not concerned about the risk of false sharing?
>
> Best regards,
> Martin
>
> -----Original Message-----
> From: Claes Redestad [mailto:claes.redestad at oracle.com]
> Sent: Freitag, 7. Oktober 2016 12:00
> To: Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; Coleen Phillimore (coleen.phillimore at oracle.com) <coleen.phillimore at oracle.com>
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> Hi,
>
> after due consideration I strongly consider this change unacceptable
> since it adds footprint overhead to performance critical compiler and
> GC code with little to no data to support this won't cause regressions.
>
> Changes to Monitor/Mutex need to be done with more surgical precision
> than this.
>
> If I do have a veto on the matter, here it is.
>
> Thanks!
>
> /Claes
>
> On 2016-10-07 11:34, Doerr, Martin wrote:
>> Hi Dan,
>>
>> thank you very much for reviewing and for investigating the history.
>>
>> It was not intended to make the functions you mentioned public. I've fixed that.
>> I also updated the copyright information.
>>
>> New webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>
>> @Coleen: Please use this one. I have also added reviewer attribution.
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>> Sent: Donnerstag, 6. Oktober 2016 23:13
>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>>
>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>> Hi,
>>>
>>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>>
>>> My proposed change is here:
>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>
>> src/share/vm/runtime/mutex.hpp
>>        Please update the copyright year before pushing.
>>
>>        L172:   // The default length of monitor name is chosen to avoid
>> false sharing.
>>        L173:   enum {
>>        L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE -
>> sizeof(MonitorBase),
>>        L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ?
>> CACHE_LINE_PADDING : 64
>>        L176:   };
>>        L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>
>>            I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>            can vary between platforms; I like that it is a minimum of 64 bytes
>>            and is still a constant.
>>
>>            I'm also not happy that the resulting sizeof(Monitor) may not
>> be a multiple
>>            of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>> that unhappiness
>>            with the fact that sizeof(Monitor) hasn't been a multiple of
>> the cache line
>>            size since at least 2008 and no one complained (that I know of).
>>
>>            So if I was making this change, I would make MONITOR_NAME_LEN
>> 64 bytes
>>            (like it was) and add a pad field that would bring up
>> sizeof(Monitor)
>>            to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes
>> would be
>>            unhappy with me and anyone embedding a Monitor into another data
>>            structure would be unhappy with me, but I'm used to that :-)
>>
>>            So what you have is fine, especially for JDK9.
>>
>>        L180:  public:
>>        L181: #ifndef PRODUCT
>>        L182:   debug_only(static bool contains(Monitor * locks, Monitor *
>> lock);)
>>        L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor *
>> locks);)
>>        L184:   debug_only(Monitor *
>> get_least_ranked_lock_besides_this(Monitor * locks);)
>>        L185: #endif
>>        L186:
>>        L187:   void set_owner_implementation(Thread*
>> owner)                        PRODUCT_RETURN;
>>        L188:   void check_prelock_state     (Thread*
>> thread)                       PRODUCT_RETURN;
>>        L189:   void check_block_state       (Thread* thread)
>>
>>            These were all "protected" before. Now they are "public".
>>            Any particular reason?
>>
>> Thumbs up on the mechanics of this change. I'm interested in the
>> answer to the "protected" versus "public" question, but don't
>> consider that query to be a blocker.
>>
>>
>> The rest of this isn't code review, but some of this caught
>> my attention.
>>
>> src/share/vm/runtime/mutex.hpp
>>
>>        old L84: // The default length of monitor name is chosen to be 64
>> to avoid false sharing.
>>        old L85: static const int MONITOR_NAME_LEN = 64;
>>
>> I had to look up the history of this comment:
>>
>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>> changeset:   55:2a8eb116ebbe
>> user:        xlu
>> date:        Tue Feb 05 23:21:57 2008 -0800
>> summary:     6610420: Debug VM crashes during monitor lock rank checking
>>
>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>     // *in that order*.  If their implementations change such that these
>>     // assumptions are violated, a whole lot of code will break.
>>
>> +// The default length of monitor name is choosen to be 64 to avoid
>> false sharing.
>> +static const int MONITOR_NAME_LEN = 64;
>> +
>>     class Monitor : public CHeapObj {
>>
>>      public:
>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>       volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>       ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>       volatile bool     _snuck;              // Used for sneaky locking
>> (evil).
>> -  const char * _name;                    // Name of mutex
>>       int NotifyCount ;                      // diagnostic assist
>> -  double pad [8] ;                       // avoid false sharing
>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>
>>       // Debugging fields for naming, deadlock detection, etc. (some only
>> used in debug mode)
>>     #ifndef PRODUCT
>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>        int  ILocked () ;
>>
>>      protected:
>> -   static void ClearMonitor (Monitor * m) ;
>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>        Monitor() ;
>>
>> So the original code had an 8-double pad for avoiding false sharing.
>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>> time that Dice determined that 8-double value, the result was to pad
>> the size of Monitor to an even multiple of a particular cache line
>> size.
>>
>> Xiobin changed the 'name' field to be an array so that the name
>> chars could serve double duty as the cache line pad... pun intended.
>> Unfortunately that pad doesn't make sure that the resulting Monitor
>> size is a multiple of the cache line size.
>>
>> Dan
>>
>>
>>>
>>> Please review. I will also need a sponsor.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>

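[Editorial sketch] Dan's alternative in the quoted review (keep the name at 64 bytes and pad so that the object size becomes a multiple of the cache line size) could be sketched as follows. This is only an illustration: MonitorFields, LineSizedMonitor, round_up and the constants are assumed names and values, and the union is just one way to reach the rounded size without ever declaring a zero-length pad array.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kCacheLineSize  = 64;  // assumed; platform dependent in HotSpot
constexpr std::size_t kMonitorNameLen = 64;  // fixed name length, as before the change

// Rounds size up to the next multiple of align (align must be non-zero).
constexpr std::size_t round_up(std::size_t size, std::size_t align) {
  return (size + align - 1) / align * align;
}

// Hypothetical stand-in for the fields of Monitor.
struct MonitorFields {
  void* owner;
  long  lock_word;
  int   notify_count;
  char  name[kMonitorNameLen];
};

// The raw array forces the union's size up to a whole number of cache lines.
union LineSizedMonitor {
  MonitorFields fields;
  char raw[round_up(sizeof(MonitorFields), kCacheLineSize)];
};

static_assert(sizeof(LineSizedMonitor) % kCacheLineSize == 0,
              "size should be a multiple of the cache line size");
```

A size that is a multiple of the line size only prevents sharing between neighbours when the first object of an array also starts on a line boundary; the sketch does not address alignment of the allocation itself.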
From Derek.White at cavium.com  Fri Oct  7 17:48:26 2016
From: Derek.White at cavium.com (White, Derek)
Date: Fri, 7 Oct 2016 17:48:26 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
	<f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
	<1475836735.2622.82.camel@oracle.com>
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com>
Message-ID: <CY1PR07MB239370C82A11D7110BDBAFD584C60@CY1PR07MB2393.namprd07.prod.outlook.com>

FYI, on the aarch64 side, this change would turn a CAS with acquire/release semantics (CASAL) into a naked CAS on v8.1, or otherwise remove a post-write barrier after a series of load/store-exclusives.
 -  Derek

-----Original Message-----
From: hotspot-compiler-dev [mailto:hotspot-compiler-dev-bounces at openjdk.java.net] On Behalf Of David Holmes
Sent: Friday, October 07, 2016 8:08 AM
To: Thomas Schatzl <thomas.schatzl at oracle.com>; Kim Barrett <kim.barrett at oracle.com>
Cc: hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>; Hiroshi H Horii <HORII at jp.ibm.com>; Tim Ellison <Tim_Ellison at uk.ibm.com>; ppc-aix-port-dev at openjdk.java.net; Michihiro Horie <HORIE at jp.ibm.com>; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Thomas,

 > This change "only" impacts ppc64 at this time.

This is a dangerous stance. The changes are to shared code. It just happens that only the PPC atomic implementations support anything other than conservative barriers today. If someone adds additional forms on other platforms their GC code suddenly has new behaviour!

Changes in shared code must be algorithmically correct on all platforms. 
Not just "it will work fine today".

Given all the work being done to add missing barriers, removing them must come with a detailed analysis establishing the safety of doing so. 
And I am not seeing that here.

David

On 7/10/2016 8:38 PM, Thomas Schatzl wrote:
> Hi,
>
> On Thu, 2016-10-06 at 18:16 -0400, Kim Barrett wrote:
>>>
>>> On Oct 5, 2016, at 9:36 PM, David Holmes <david.holmes at oracle.com>
>>> wrote:
>>>
>>> On 5/10/2016 10:36 AM, Hiroshi H Horii wrote:
>>>>
>>>> Dear David,
>>>>
>>>> Thank you for your comments.
>>>>
>>>> I had thought it might be better for 
>>>> copy_to_survivor_space not to return forwardee if the CAS failed, 
>>>> in order to prevent callers from reading fields in forwardee. But as you 
>>>> pointed out, this extends the fix beyond this topic.
>>>>
>>>> I removed two NULL assignments from the previous webrev.
>>>> http://cr.openjdk.java.net/~horii/8154736/webrev.03/
>>> Which simply takes us back to where we were. It may not be safe for 
>>> the caller of those methods to access the fields of the returned 
>>> "forwardee".
>>>
>>> Sorry but I'm not seeing anything here that justifies removing the 
>>> barriers from the cas in this code. GC lurkers feel free to jump in 
>>> here - this is your code after all! ;-)
>>>
>>> David
>>> -----
>> Using a CAS with memory_order_relaxed in copy_to_survivor_space seems 
>> to me to be extremely fragile and hard to reason about.  The places 
>> where that copied object might escape to and be examined seem to be 
>> myriad.  And not only do we need to worry about them today, but also 
>> for future maintenance.  Even if it can be modified and shown to be 
>> correct today, it would be very easy to introduce a bug later, as 
>> should be obvious from the various issues pointed out so far during 
>> this review.
>>
>> The key issue here is that we copy obj into new_obj, and then make 
>> new_obj accessible to other threads via the CAS.  Those other threads 
>> might attempt to access data in new_obj.  This suggests the CAS ought 
>> to have at least a release fence to ensure the copy is complete 
>> before the CAS is performed.  No amount of fencing on the read side 
>> (such as in the work stealing) can remove that need.
>
> Depending on what "other threads" means.
>
> The thread that pops the reference should be okay (as it does a 
> fence), because the thread pushing the entry on the mark stack also 
> releases all stores.
>
> Threads not participating in this protocol are problematic, and this 
> is indeed worrying me as well a bit.
> I have not seen any so far, but there is always a risk of overlooking 
> some place.
>
>> And that might be all that is needed.  On the post-CAS side, we load 
>> the forwardee and then load values from it.  I think we can use 
>> implicit consume with dependent loads (except on Alpha) plus the 
>> suggested release fence to get the desired effect.  (If not, use an 
>> acquire form of forwardee()?)
>>
>> I'm not certain that just a release fence is sufficient (I'm less 
>> familiar with ParallelGC than I'd like for looking at something like 
>> this), but I'm pretty sure I wouldn't want to go any weaker than 
>> that.
>
> This change "only" impacts ppc64 at this time.
>
> Thanks,
>   Thomas
>

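[Editorial sketch] The publication concern raised in the quoted discussion (the copy into new_obj must be complete before the CAS makes new_obj visible, and a thread that loses the race must see that copy) can be expressed with C++11 atomics roughly as below. This is a sketch, not ParallelGC code: Obj, payload and copy_and_forward are hypothetical, and std::atomic stands in for HotSpot's Atomic::cmpxchg with explicit ordering.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object layout; "forwardee" models the forwarding pointer.
struct Obj {
  int payload = 0;
  std::atomic<Obj*> forwardee{nullptr};
};

// Copies obj into new_obj and tries to install new_obj as the forwardee.
// Returns the winning copy, whether it was installed by this thread or by
// another one.
Obj* copy_and_forward(Obj* obj, Obj* new_obj) {
  new_obj->payload = obj->payload;  // the copy that must be published
  Obj* expected = nullptr;
  // Success ordering acq_rel: its release half orders the copy above before
  // the pointer becomes visible. Failure ordering acquire: if another thread
  // won, its copy is visible before this thread reads through the pointer.
  if (obj->forwardee.compare_exchange_strong(expected, new_obj,
                                             std::memory_order_acq_rel,
                                             std::memory_order_acquire)) {
    return new_obj;
  }
  return expected;  // now holds the forwardee installed by the winner
}
```

With memory_order_relaxed in both positions, nothing would order the copy before the pointer store, which is the fragility described in the thread.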
From HORII at jp.ibm.com  Mon Oct 10 14:30:47 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Mon, 10 Oct 2016 23:30:47 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com>
Message-ID: <OFC81622C2.ABFF22F0-ON49258048.00271D0E-49258048.004FB8B8@notes.na.collabserv.com>

Hi Thomas, David, and all,

> I think you intended to modify cmpxchg_pre_membar not 
> cmpxchg_post_membar!

The previous patch changes only the behavior of cmpxchg_pre_membar. But 
the patch was hard to review (this was not obvious), and Martin 
suggested that I use lwsync rather than sync. 
I created a new webrev. It also addresses all the points that David and 
Thomas raised.

http://cr.openjdk.java.net/~horii/8154736/webrev.05/

With this change, callers of copy_to_survivor_space can safely touch 
fields of returned obj because OrderAccess::acquire() is called in 
copy_to_survivor_space when CAS fails.

> Changes in shared code must be algorithmically correct on all platforms. 

> Not just "it will work fine today".
> 
> Given all the work being done to add missing barriers, removing them 
> must come with a detailed analysis establishing the safety of doing so. 
> And I am not seeing that here.

The latest code in the repository is missing some calls to 
OrderAccess::acquire() before touching fields of new_obj or o->forwardee() 
in PSPromotionManager::copy_and_push_safe_barrier and 
copy_to_survivor_space, respectively. I believe this webrev corrects that 
as well.

Some methods call forwardee(). However, they don't touch fields of the 
forwardee while copying surviving objects to a survivor space.
  PSMarkSweepDecorator::compact()
  PSPromotionManager::process_array_chunk()
  PSPromotionManager::claim_or_forward_internal_depth()

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

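[Editorial sketch] The scheme described in this message (a lighter barrier before the CAS, plus an explicit acquire when the CAS fails) can be modelled with standalone fences in standard C++. This is an assumption-laden illustration, not the webrev: Node and forward_with_fences are hypothetical names, the release fence only plays the role that cmpxchg_pre_membar (lwsync on ppc64, per the message) is described as having, and the acquire fence plays the role of OrderAccess::acquire().

```cpp
#include <atomic>
#include <cassert>

struct Node {
  int field = 0;
  std::atomic<Node*> forwardee{nullptr};
};

// Tries to install new_obj as obj's forwardee using a relaxed CAS that is
// bracketed by explicit fences. Returns the winning copy.
Node* forward_with_fences(Node* obj, Node* new_obj) {
  new_obj->field = obj->field;  // copy before publication

  // Release fence before the CAS: the copy cannot become visible after the
  // pointer that publishes it.
  std::atomic_thread_fence(std::memory_order_release);

  Node* expected = nullptr;
  if (obj->forwardee.compare_exchange_strong(expected, new_obj,
                                             std::memory_order_relaxed,
                                             std::memory_order_relaxed)) {
    return new_obj;  // this thread's copy won
  }

  // CAS failed: another thread installed its copy. The acquire fence makes
  // that thread's writes to the copy visible before any field is read.
  std::atomic_thread_fence(std::memory_order_acquire);
  return expected;
}
```

In the C++ memory model the release fence in the winning thread synchronizes with the acquire fence in the losing thread through the relaxed CAS, so the loser can safely read fields of the returned object.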

From coleen.phillimore at oracle.com  Tue Oct 11 00:03:20 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Mon, 10 Oct 2016 20:03:20 -0400
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
Message-ID: <fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>


Hi,

Was the linear allocation in mutex.cpp the cause of the false sharing 
that you observed?  I think I like this change better than the original, 
because I've wondered myself why the name string was so long.  So with 
this, we could make Monitors smaller if they're embedded in metadata or 
other structures.

Thanks,
Coleen

On 10/10/16 2:00 PM, Doerr, Martin wrote:
> Hi Claes,
>
> thank you very much for your explanations.
>
> I agree with you that it would be better to pad where the Monitors are used. It would still fulfill the purpose of this RFE without disturbing other usages.
>
> So I could introduce:
> class PaddedMonitor : public Monitor {
>    enum {
>      CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - (int)sizeof(Monitor),
>      PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>    };
>    char _padding[PADDING_LEN];
> };
> and similarly PaddedMutex and replace all of the ones which get allocated in a linear fashion (mutexLocker.cpp mutex_init()).
>
> Would you agree with this change?
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Claes Redestad [mailto:claes.redestad at oracle.com]
> Sent: Freitag, 7. Oktober 2016 12:35
> To: Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; Coleen Phillimore (coleen.phillimore at oracle.com) <coleen.phillimore at oracle.com>
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> Hi,
>
> I'm concerned that this might be an easy-but-wrong fix to a complex
> problem, and acknowledging that there are already use cases where the
> _name field is counterproductive. This change adds complexity that
> makes it even less likely such uses will be optimized for in the
> future.
>
> There are Padded* types put in place to deal with these concerns
> explicitly rather than implicitly *where it matters*, which allows us
> the choice of applying padding or not on a per use-case basis (which
> means we can also remove the _name field for those use cases that don't
> care about either, which might be most outside of the global lists).
>
> I am very concerned about false sharing, but I have no data to support
> that this change has any measurable benefit in practice: I even did an
> experiment years ago now where I turned _name into a pointer to not pad
> at all and saw nothing exceeding noise levels on any benchmark.
>
> Thanks!
>
> /Claes
>
> On 2016-10-07 12:18, Doerr, Martin wrote:
>> Hi Claes,
>>
>> what the change basically does is that the _name[] field gets enlarged by 8 bytes on platforms with 128 byte DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed by the C++ compiler.
>> What exactly is your concern about the footprint overhead?
>> Are you not concerned about the risk of false sharing?
>>
>> Best regards,
>> Martin
>>
>> -----Original Message-----
>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>> Sent: Freitag, 7. Oktober 2016 12:00
>> To: Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; Coleen Phillimore (coleen.phillimore at oracle.com) <coleen.phillimore at oracle.com>
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>>
>> Hi,
>>
>> after due consideration I strongly consider this change unacceptable
>> since it adds footprint overhead to performance critical compiler and
>> GC code with little to no data to support this won't cause regressions.
>>
>> Changes to Monitor/Mutex need to be done with more surgical precision
>> than this.
>>
>> If I do have a veto on the matter, here it is.
>>
>> Thanks!
>>
>> /Claes
>>
>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>> Hi Dan,
>>>
>>> thank you very much for reviewing and for investigating the history.
>>>
>>> It was not intended to make the functions you mentioned public. I've fixed that.
>>> I also updated the copyright information.
>>>
>>> New webrev is here:
>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>
>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>> Sent: Donnerstag, 6. Oktober 2016 23:13
>>> To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>>>
>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>> Hi,
>>>>
>>>> the current implementation of Monitor padding (mutex.cpp) assumes that cache lines are 64 Bytes. There's a platform dependent define "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of padding is to avoid false sharing.
>>>>
>>>> My proposed change is here:
>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>> src/share/vm/runtime/mutex.hpp
>>>         Please update the copyright year before pushing.
>>>
>>>         L172:   // The default length of monitor name is chosen to avoid
>>> false sharing.
>>>         L173:   enum {
>>>         L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE -
>>> sizeof(MonitorBase),
>>>         L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ?
>>> CACHE_LINE_PADDING : 64
>>>         L176:   };
>>>         L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>
>>>             I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>>             can vary between platforms; I like that it is a minimum of 64 bytes
>>>             and is still a constant.
>>>
>>>             I'm also not happy that the resulting sizeof(Monitor) may not
>>> be a multiple
>>>             of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>>> that unhappiness
>>>             with the fact that sizeof(Monitor) hasn't been a multiple of
>>> the cache line
>>>             size since at least 2008 and no one complained (that I know of).
>>>
>>>             So if I was making this change, I would make MONITOR_NAME_LEN
>>> 64 bytes
>>>             (like it was) and add a pad field that would bring up
>>> sizeof(Monitor)
>>>             to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes
>>> would be
>>>             unhappy with me and anyone embedding a Monitor into another data
>>>             structure would be unhappy with me, but I'm used to that :-)
>>>
>>>             So what you have is fine, especially for JDK9.
>>>
>>>         L180:  public:
>>>         L181: #ifndef PRODUCT
>>>         L182:   debug_only(static bool contains(Monitor * locks, Monitor *
>>> lock);)
>>>         L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor *
>>> locks);)
>>>         L184:   debug_only(Monitor *
>>> get_least_ranked_lock_besides_this(Monitor * locks);)
>>>         L185: #endif
>>>         L186:
>>>         L187:   void set_owner_implementation(Thread*
>>> owner)                        PRODUCT_RETURN;
>>>         L188:   void check_prelock_state     (Thread*
>>> thread)                       PRODUCT_RETURN;
>>>         L189:   void check_block_state       (Thread* thread)
>>>
>>>             These were all "protected" before. Now they are "public".
>>>             Any particular reason?
>>>
>>> Thumbs up on the mechanics of this change. I'm interested in the
>>> answer to the "protected" versus "public" question, but don't
>>> consider that query to be a blocker.
>>>
>>>
>>> The rest of this isn't code review, but some of this caught
>>> my attention.
>>>
>>> src/share/vm/runtime/mutex.hpp
>>>
>>>         old L84: // The default length of monitor name is chosen to be 64
>>> to avoid false sharing.
>>>         old L85: static const int MONITOR_NAME_LEN = 64;
>>>
>>> I had to look up the history of this comment:
>>>
>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>> changeset:   55:2a8eb116ebbe
>>> user:        xlu
>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>> summary:     6610420: Debug VM crashes during monitor lock rank checking
>>>
>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>      // *in that order*.  If their implementations change such that these
>>>      // assumptions are violated, a whole lot of code will break.
>>>
>>> +// The default length of monitor name is choosen to be 64 to avoid
>>> false sharing.
>>> +static const int MONITOR_NAME_LEN = 64;
>>> +
>>>      class Monitor : public CHeapObj {
>>>
>>>       public:
>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>        volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>        ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>        volatile bool     _snuck;              // Used for sneaky locking
>>> (evil).
>>> -  const char * _name;                    // Name of mutex
>>>        int NotifyCount ;                      // diagnostic assist
>>> -  double pad [8] ;                       // avoid false sharing
>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>
>>>        // Debugging fields for naming, deadlock detection, etc. (some only
>>> used in debug mode)
>>>      #ifndef PRODUCT
>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>         int  ILocked () ;
>>>
>>>       protected:
>>> -   static void ClearMonitor (Monitor * m) ;
>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>         Monitor() ;
>>>
>>> So the original code had an 8-double pad for avoiding false sharing.
>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>> time that Dice determined that 8-double value, the result was to pad
>>> the size of Monitor to an even multiple of a particular cache line
>>> size.
>>>
>>> Xiobin changed the 'name' field to be an array so that the name
>>> chars could serve double duty as the cache line pad... pun intended.
>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>> size is a multiple of the cache line size.
>>>
>>> Dan
>>>
>>>
>>>> Please review. It will also need a sponsor.
>>>>
>>>> Thanks and best regards,
>>>> Martin
>>>>


From david.holmes at oracle.com  Tue Oct 11 00:35:05 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 11 Oct 2016 10:35:05 +1000
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
Message-ID: <3cec9797-64dc-d767-255c-8ce3fb66b7bb@oracle.com>

Hi Shafi,

On 10/10/2016 11:59 PM, Mikael Gerdin wrote:
> Hi,
>
> On 2016-10-10 09:24, Shafi Ahmad wrote:
>> Hi All,
>>
>> Please review this simple fix for bug 'JDK-8155004:
>> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to
>> create threads'.
>>
>> Summary:
>> In the current implementation there are a few scenarios where we do not
>> obey the JVM option -XX:+CrashOnOutOfMemoryError.
>> While analyzing this issue I found that there are two JVM states in which
>> OOM can happen:
>>  1.  OOM during VM initialization - per our internal discussion, dumping
>> a core file is not worthwhile in this case, so it is left as is.
>>  2.  OOM once the VM is initialized - most places already have the
>> required code, but a few are missing it; this change adds it there.
>>
>> Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/
>
>
> There is a lot of confusion in the VM code with the term "out of memory
> error".
> In some places it refers to code throwing a java.lang.OutOfMemoryError
> and expecting running java code to be able to potentially catch that
> Error and continue running.
>
> In other places, such as callers of report_vm_out_of_memory, the
> situation is much more dire and the calling thread may not even be a
> JavaThread and as such cannot "throw" an exception.
> report_vm_out_of_memory is only invoked through the macro
> vm_exit_out_of_memory, which of course implies that the condition is
> fatal and we are about to terminate the JVM process altogether.
>
> I think that it's incorrect to call code related to
> java.lang.OutOfMemoryError in report_vm_out_of_memory since the
> condition may not even be correlated with Java level application behavior.

I totally agree with Mikael. A call to report_java_out_of_memory should 
only be made on a code path that will throw an OOME.

There is a lot of contention over how things like HeapDumpOnOutOfMemoryError
and CrashOnOutOfMemoryError should behave given the various reasons why we
can run out of memory. I see little point in doing a heap dump, for
example, if we did not exhaust the heap. I think there are a lot of
issues with this mechanism, and the placement of some of the calls to
report_java_out_of_memory is questionable (e.g. should it come before or
after posting JVMTI resource exhaustion events? Should it come before or
after VM initialization checks? [I think after, but that isn't always so!]).

In the context of this fix the change to jvm.cpp, in JVM_StartThread, is 
acceptable. And it addresses the request made by the bug submitter.

Thanks,
David

> /Mikael
>
>> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004
>>
>> Testing: jprt and jtreg (on Linux x86_64)
>>
>> Regards,
>> Shafi
>>

From david.holmes at oracle.com  Tue Oct 11 01:12:16 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 11 Oct 2016 11:12:16 +1000
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
	<8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
Message-ID: <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>

Ok. I will sponsor this once hs is open again.

Thanks,
David

On 6/10/2016 10:10 PM, Alan Burlison wrote:
> On 04/10/2016 19:37, Alan Burlison wrote:
>
>>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>>> already being included.
>>
>> Cool, I'll pop that in instead - thanks!
>
> Done, webrev updated, jprt hotspot testset is clean.
>

From david.holmes at oracle.com  Tue Oct 11 01:55:12 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 11 Oct 2016 11:55:12 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI and
	JDB
Message-ID: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>

Turns out the only place changes were needed were in JDI.

Bug: https://bugs.openjdk.java.net/browse/JDK-8165827

webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/

The spec change in ObjectReference is very simple and there is a CCC 
request in progress to ratify that change.

The implementation change in ObjectReferenceImpl mirrors the updated 
spec and uses the same format as already present in the class version of
the check method.

The test is a little more complex. This is obviously an extension to
what is already tested in InterfaceMethodsTest. However, IMT has a number
of problems with the way it is currently written [1]; specifically, it
doesn't properly separate method lookup from method invocation. So I've
added the capability to separate lookup and invocation for use with the
private interface methods. I have not tried to address shortcomings of
the existing tests, though I did fix the return value checking logic!
I also added some clarifying comments and did some renaming in a couple
of places.

Still on the test: I can't add the negative tests I would like to add
because they actually pass due to a different long-standing bug in JDI
[2]. So the actual private interface method testing is very simple: can
I get the Method from the InterfaceType for the interface declaring the
method? Can I then invoke that method on an instance of a class that
implements the interface?

Thanks,
David

[1] https://bugs.openjdk.java.net/browse/JDK-8166453
[2] https://bugs.openjdk.java.net/browse/JDK-8167416

From calvin.cheung at oracle.com  Tue Oct 11 03:59:40 2016
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Mon, 10 Oct 2016 20:59:40 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
Message-ID: <57FC63AC.3020809@oracle.com>


Please review this small fix for not including classes in the classlist 
file which are unusable during run time.

bug: https://bugs.openjdk.java.net/browse/JDK-8166931

webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/

Testing:
     JPRT with -testset hotspot
     jtreg tests under hotspot/runtime on all supported platforms (in 
progress)

thanks,
Calvin

From aph at redhat.com  Tue Oct 11 09:25:52 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 11 Oct 2016 10:25:52 +0100
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
	<f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
Message-ID: <a76e06b0-4004-cd32-73e8-cdb5850a96b9@redhat.com>

On 06/10/16 23:16, Kim Barrett wrote:

> The key issue here is that we copy obj into new_obj, and then make
> new_obj accessible to other threads via the CAS.  Those other
> threads might attempt to access data in new_obj.  This suggests the
> CAS ought to have at least a release fence to ensure the copy is
> complete before the CAS is performed.  No amount of fencing on the
> read side (such as in the work stealing) can remove that need.

I agree.

> And that might be all that is needed.  On the post-CAS side, we load
> the forwardee and then load values from it.  I think we can use
> implicit consume with dependent loads (except on Alpha) plus the
> suggested release fence to get the desired effect.

That's probably true, except that there's not really any such thing as
"implicit consume" in C++.  While all of the hardware we use respects
address dependencies, it's not something that the compiler knows
about, and it's explicitly undefined behaviour in the C++ memory
model.  If we're depending on memory_order_consume, perhaps we ought
to think about adding it to Atomic, even though it's just a volatile
load in older compilers.
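
A minimal sketch of the publication pattern discussed above, written
against C++11 std::atomic rather than HotSpot's Atomic class. The Obj
type and the forward() and read_forwarded() helpers are hypothetical
names used only for illustration. The CAS uses release ordering so the
copy is complete before new_obj becomes visible; the read side uses
acquire as a conservative stand-in for the consume ordering mentioned
above.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical object with a payload and a forwarding pointer.
struct Obj {
  int payload;
  std::atomic<Obj*> forwardee;
  Obj() : payload(0), forwardee(nullptr) {}
};

// Copy obj into new_obj, then try to publish new_obj via CAS.
// Returns whichever copy ended up installed as the forwardee.
inline Obj* forward(Obj* obj, Obj* new_obj) {
  new_obj->payload = obj->payload;  // the copy that must be visible first
  Obj* expected = nullptr;
  // Release on success: the copy above happens-before any acquire load
  // that observes new_obj. Relaxed on failure: nothing was published.
  if (obj->forwardee.compare_exchange_strong(expected, new_obj,
                                             std::memory_order_release,
                                             std::memory_order_relaxed)) {
    return new_obj;
  }
  // Lost the race: acquire-load the winner before reading its fields.
  return obj->forwardee.load(std::memory_order_acquire);
}

// Reader side: load the forwardee, then load values from it.
inline int read_forwarded(Obj* obj) {
  Obj* f = obj->forwardee.load(std::memory_order_acquire);
  return f != nullptr ? f->payload : obj->payload;
}

// Single-threaded walk-through of the winner and loser paths.
inline void demo_forwarding() {
  Obj from;
  from.payload = 7;
  Obj first, second;
  assert(forward(&from, &first) == &first);   // first CAS wins
  assert(forward(&from, &second) == &first);  // second CAS loses
  assert(read_forwarded(&from) == 7);
}
```

The sketch only shows where the orderings sit; it says nothing about
whether a weaker read-side ordering is safe on a given platform.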

Andrew.

From Alan.Burlison at oracle.com  Tue Oct 11 09:31:54 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Tue, 11 Oct 2016 10:31:54 +0100
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
	<8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
	<2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
Message-ID: <870f60a7-b0c0-eaae-7a59-7bea05c323af@oracle.com>

On 11/10/2016 02:12, David Holmes wrote:

> Ok. I will sponsor this once hs is open again.

Thanks, is there a schedule somewhere for that which I can go look at?

-- 
Alan Burlison
--

From claes.redestad at oracle.com  Tue Oct 11 10:05:15 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 11 Oct 2016 12:05:15 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
	<fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
Message-ID: <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>

Hi,

On 2016-10-11 02:03, Coleen Phillimore wrote:
>
> Hi,
>
> Was the linear allocation in mutex.cpp the cause of the false sharing 
> that you observed?  I think I like this change better than the 
> original, because I've wondered myself why the name string was so 
> long.  So with this, we could make Monitors smaller if they're 
> embedded in metadata or other structures.

Music to my ears!

I even think most embedded uses would see improvements if _name was 
removed entirely (or "simply" turned into a const char * so that it's 
not copied and embedded into the Monitor/Mutex)

>
> Thanks,
> Coleen
>
> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>> Hi Claes,
>>
>> thank you very much for your explanations.
>>
>> I agree with you that it would be better to pad where the Monitors 
>> are used. It would still fulfill the purpose of this RFE without 
>> disturbing other usages.
>>
>> So I could introduce:
>> class PaddedMonitor : public Monitor {
>>    enum {
>>      CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - 
>> (int)sizeof(Monitor),
>>      PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>    };
>>    char _padding[PADDING_LEN];
>> };
>> and similarly PaddedMutex and replace all of the ones which get 
>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).

Sure!

Some compilers may take issue with cases where PADDING_LEN == 0 (since 
char _padding[0] is technically illegal C++, but works on gcc etc) so 
maybe that special case will have to be (somewhat excessively):

PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 
DEFAULT_CACHE_LINE_SIZE
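
A minimal sketch of that fallback, assuming a 64-byte cache line.
kCacheLineSize, SmallLock, LargeLock and PaddedLock are hypothetical
stand-ins for DEFAULT_CACHE_LINE_SIZE, Monitor and PaddedMonitor, not
the HotSpot definitions:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for DEFAULT_CACHE_LINE_SIZE.
const int kCacheLineSize = 64;

// Hypothetical lock types of different sizes (not HotSpot's Monitor).
struct SmallLock { char state[40]; };  // smaller than one cache line
struct LargeLock { char state[64]; };  // exactly one cache line

template <typename T>
struct PaddedLock : public T {
  enum {
    CACHE_LINE_PADDING = kCacheLineSize - (int)sizeof(T),
    // Never 0: fall back to a full cache line when T already fills
    // (or exceeds) one, so the array below always has a legal size.
    PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING
                                         : kCacheLineSize
  };
  char _padding[PADDING_LEN];
};

// Checks both branches of the PADDING_LEN computation.
inline void check_padding() {
  assert(sizeof(PaddedLock<SmallLock>) == 64);   // 40 + 24
  assert(sizeof(PaddedLock<LargeLock>) == 128);  // 64 + 64 (the excessive case)
}
```

The second case is the "somewhat excessive" one: a type that already
fills a cache line gets a whole extra line of padding.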

We took a look at if it'd be feasible to express class PaddedMonitor : 
public PaddedEnd<Monitor>, but it appears that'd require variadic 
template arguments (C++11) to get right (since we'd need PaddedEnd to
transitively publish constructors of Monitor).
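
For illustration only, this is roughly what the C++11 variant could
look like. PaddedEndSketch and NamedLock are hypothetical names, not
the existing PaddedEnd or Monitor, and the 64-byte line size is an
assumption:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for DEFAULT_CACHE_LINE_SIZE.
const std::size_t kCacheLineSize = 64;

// Hypothetical base class that only has a non-default constructor.
class NamedLock {
 public:
  NamedLock(int rank, const char* name) : _rank(rank), _name(name) {}
  int rank() const { return _rank; }
  const char* name() const { return _name; }
 private:
  int _rank;
  const char* _name;
};

template <typename T, std::size_t alignment = kCacheLineSize>
class PaddedEndSketch : public T {
 public:
  // Variadic forwarding constructor (C++11): passes any argument list
  // through to the matching constructor of T.
  template <typename... Args>
  explicit PaddedEndSketch(Args&&... args)
      : T(std::forward<Args>(args)...) {}
 private:
  // Pads to the next multiple of alignment; adds a full line if
  // sizeof(T) is already a multiple, so the array is never zero-sized.
  char _pad[alignment - (sizeof(T) % alignment)];
};

// Constructs a padded lock through the forwarded constructor.
inline void demo_padded_end() {
  PaddedEndSketch<NamedLock> lock(3, "Example_lock");
  assert(lock.rank() == 3);
  assert(sizeof(lock) % kCacheLineSize == 0);
}
```

Without variadic templates, each constructor signature of the base
class would have to be repeated by hand in the wrapper.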

Thanks!

/Claes

>>
>> Would you agree with this change?
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>> Sent: Friday, 7 October 2016 12:35
>> To: Doerr, Martin <martin.doerr at sap.com>; 
>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; 
>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; 
>> Coleen Phillimore (coleen.phillimore at oracle.com) 
>> <coleen.phillimore at oracle.com>
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>> DEFAULT_CACHE_LINE_SIZE
>>
>> Hi,
>>
>> I'm concerned that this might be an easy-but-wrong fix to a complex
>> problem, and I'd note that there are already use cases where the
>> _name field is counterproductive. This change adds complexity that
>> makes it even less likely such uses will be optimized for in the
>> future.
>>
>> There are Padded* types put in place to deal with these concerns
>> explicitly rather than implicitly *where it matters*, which allows us
>> the choice of applying padding or not on a per use-case basis (which
>> means we can also remove the _name field for those use cases that don't
>> care about either, which might be most outside of the global lists).
>>
>> I am very concerned about false sharing, but I have no data to support
>> that this change has any measurable benefit in practice: I even did an
>> experiment years ago now where I turned _name into a pointer to not pad
>> at all and saw nothing exceeding noise levels on any benchmark.
>>
>> Thanks!
>>
>> /Claes
>>
>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>> Hi Claes,
>>>
>>> what the change basically does is that the _name[] field gets 
>>> enlarged by 8 bytes on platforms with 128 byte 
>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed 
>>> by the C++ compiler.
>>> What exactly is your concern about the footprint overhead?
>>> Are you not concerned about the risk of false sharing?
>>>
>>> Best regards,
>>> Martin
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Friday, 7 October 2016 12:00
>>> To: Doerr, Martin <martin.doerr at sap.com>; 
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; 
>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; 
>>> Coleen Phillimore (coleen.phillimore at oracle.com) 
>>> <coleen.phillimore at oracle.com>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> after due consideration I strongly consider this change unacceptable
>>> since it adds footprint overhead to performance-critical compiler and
>>> GC code with little to no data to show that this won't cause regressions.
>>>
>>> Changes to Monitor/Mutex need to be done with more surgical precision
>>> than this.
>>>
>>> If I do have a veto on the matter, here it is.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>> Hi Dan,
>>>>
>>>> thank you very much for reviewing and for investigating the history.
>>>>
>>>> It was not intended to make the functions you mentioned public. 
>>>> I've fixed that.
>>>> I also updated the copyright information.
>>>>
>>>> New webrev is here:
>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>
>>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>>
>>>> Thanks and best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>> Sent: Thursday, 6 October 2016 23:13
>>>> To: Doerr, Martin <martin.doerr at sap.com>; 
>>>> hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>> Hi,
>>>>>
>>>>> the current implementation of Monitor padding (mutex.cpp) assumes 
>>>>> that cache lines are 64 Bytes. There's a platform dependent define 
>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of 
>>>>> padding is to avoid false sharing.
>>>>>
>>>>> My proposed change is here:
>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>> src/share/vm/runtime/mutex.hpp
>>>>         Please update the copyright year before pushing.
>>>>
>>>>         L172:   // The default length of monitor name is chosen to 
>>>> avoid
>>>> false sharing.
>>>>         L173:   enum {
>>>>         L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE -
>>>> sizeof(MonitorBase),
>>>>         L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ?
>>>> CACHE_LINE_PADDING : 64
>>>>         L176:   };
>>>>         L177:   char _name[MONITOR_NAME_LEN];          // Name of 
>>>> mutex
>>>>
>>>>             I have to say that I'm not fond of the fact that 
>>>> MONITOR_NAME_LEN
>>>>             can vary between platforms; I like that it is a minimum 
>>>> of 64 bytes
>>>>             and is still a constant.
>>>>
>>>>             I'm also not happy that the resulting sizeof(Monitor) 
>>>> may not
>>>> be a multiple
>>>>             of the DEFAULT_CACHE_LINE_SIZE. However, I have to 
>>>> mitigate
>>>> that unhappiness
>>>>             with the fact that sizeof(Monitor) hasn't been a 
>>>> multiple of
>>>> the cache line
>>>>             size since at least 2008 and no one complained (that I 
>>>> know of).
>>>>
>>>>             So if I was making this change, I would make 
>>>> MONITOR_NAME_LEN
>>>> 64 bytes
>>>>             (like it was) and add a pad field that would bring up
>>>> sizeof(Monitor)
>>>>             to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, 
>>>> Claes
>>>> would be
>>>>             unhappy with me and anyone embedding a Monitor into 
>>>> another data
>>>>             structure would be unhappy with me, but I'm used to 
>>>> that :-)
>>>>
>>>>             So what you have is fine, especially for JDK9.
>>>>
>>>>         L180:  public:
>>>>         L181: #ifndef PRODUCT
>>>>         L182:   debug_only(static bool contains(Monitor * locks, 
>>>> Monitor *
>>>> lock);)
>>>>         L183:   debug_only(static Monitor * 
>>>> get_least_ranked_lock(Monitor *
>>>> locks);)
>>>>         L184:   debug_only(Monitor *
>>>> get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>         L185: #endif
>>>>         L186:
>>>>         L187:   void set_owner_implementation(Thread*
>>>> owner)                        PRODUCT_RETURN;
>>>>         L188:   void check_prelock_state     (Thread*
>>>> thread)                       PRODUCT_RETURN;
>>>>         L189:   void check_block_state       (Thread* thread)
>>>>
>>>>             These were all "protected" before. Now they are "public".
>>>>             Any particular reason?
>>>>
>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>> answer to the "protected" versus "public" question, but don't
>>>> consider that query to be a blocker.
>>>>
>>>>
>>>> The rest of this isn't code review, but some of this caught
>>>> my attention.
>>>>
>>>> src/share/vm/runtime/mutex.hpp
>>>>
>>>>         old L84: // The default length of monitor name is chosen to 
>>>> be 64
>>>> to avoid false sharing.
>>>>         old L85: static const int MONITOR_NAME_LEN = 64;
>>>>
>>>> I had to look up the history of this comment:
>>>>
>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>> changeset:   55:2a8eb116ebbe
>>>> user:        xlu
>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>> summary:     6610420: Debug VM crashes during monitor lock rank 
>>>> checking
>>>>
>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>      // *in that order*.  If their implementations change such that 
>>>> these
>>>>      // assumptions are violated, a whole lot of code will break.
>>>>
>>>> +// The default length of monitor name is choosen to be 64 to avoid
>>>> false sharing.
>>>> +static const int MONITOR_NAME_LEN = 64;
>>>> +
>>>>      class Monitor : public CHeapObj {
>>>>
>>>>       public:
>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>        volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>        ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>        volatile bool     _snuck;              // Used for sneaky 
>>>> locking
>>>> (evil).
>>>> -  const char * _name;                    // Name of mutex
>>>>        int NotifyCount ;                      // diagnostic assist
>>>> -  double pad [8] ;                       // avoid false sharing
>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>
>>>>        // Debugging fields for naming, deadlock detection, etc. 
>>>> (some only
>>>> used in debug mode)
>>>>      #ifndef PRODUCT
>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>         int  ILocked () ;
>>>>
>>>>       protected:
>>>> -   static void ClearMonitor (Monitor * m) ;
>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>         Monitor() ;
>>>>
>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>> time that Dice determined that 8-double value, the result was to pad
>>>> the size of Monitor to an even multiple of a particular cache line
>>>> size.
>>>>
>>>> Xiobin changed the 'name' field to be an array so that the name
>>>> chars could serve double duty as the cache line pad... pun intended.
>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>> size is a multiple of the cache line size.
>>>>
>>>> Dan
>>>>
>>>>
>>>>> Please review. It will also need a sponsor.
>>>>>
>>>>> Thanks and best regards,
>>>>> Martin
>>>>>
>


From lois.foltan at oracle.com  Tue Oct 11 11:38:06 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 11 Oct 2016 07:38:06 -0400
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <57FC63AC.3020809@oracle.com>
References: <57FC63AC.3020809@oracle.com>
Message-ID: <57FCCF1E.1080703@oracle.com>


On 10/10/2016 11:59 PM, Calvin Cheung wrote:
>
> Please review this small fix for not including classes in the 
> classlist file which are unusable during run time.
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>
> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/

Hi Calvin,

src/share/vm/classfile/classFileParser.cpp
- line #5781, I find the if statement logic to be somewhat confusing.  
This check seems to be only for classes defined to the boot and platform 
class loader.  I am assuming it does not apply to the application class 
loader because there is no way to differentiate a class defined to the 
application class loader from being on the --patch-module list and the 
-classpath?  Is that why the if statement logic does not include the 
application class loader?  Maybe it is enough to improve the comment to 
something like:

   // For the boot and platform class loaders, check if the class is not 
found in the java runtime image
   // or the boot loader's appended entries.  This indicates that the 
class must be located on the --patch-module list and
   // is not useable during run time, so should be skipped.

Then please indent the start of line #5782 by one space to show that the 
check for the platform class loader is part of that first || expression.

test/runtime/modules/PatchModule/PatchModuleClassList.java
- good test!

Thanks,
Lois

>
> Testing:
>     JPRT with -testset hotspot
>     jtreg tests under hotspot/runtime on all supported platforms (in 
> progress)
>
> thanks,
> Calvin


From lois.foltan at oracle.com  Tue Oct 11 11:48:10 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 11 Oct 2016 07:48:10 -0400
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP,
	JDI and JDB
In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
Message-ID: <57FCD17A.6000501@oracle.com>

Hi David,
This looks good and I like the improvements you made to the test.
Lois

On 10/10/2016 9:55 PM, David Holmes wrote:
> Turns out the only place changes were needed were in JDI.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>
> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>
> The spec change in ObjectReference is very simple and there is a CCC 
> request in progress to ratify that change.
>
> The implementation change in ObjectReferenceImpl mirrors the updated 
> spec and uses the same format as already present in the class version 
> of the check method.
>
> The test is a little more complex. This is obviously an extension to 
> what is already tested in InterfaceMethodsTest. However IMT has a 
> number of problems with the way it is currently written [1] - 
> specifically it doesn't properly separate method lookup from method 
> invocation. So I've added the capability to separate lookup and 
> invocation for use with the private interface methods - I have not 
> tried to address shortcomings of the existing tests. Though I did fix 
> the return value checking logic! And added some clarifying comments and 
> did some renaming in a couple of places.
>
> Still on the test I can't add the negative tests I would like to add 
> because they actually pass due to a different long standing bug in JDI 
> - [2]. So the actual private interface method testing is very simple: 
> can I get the Method from the InterfaceType for the interface 
> declaring the method? Can I then invoke that method on an instance of 
> a class that implements the interface?
>
> Thanks,
> David
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
> [2] https://bugs.openjdk.java.net/browse/JDK-8167416


From david.holmes at oracle.com  Tue Oct 11 13:33:35 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 11 Oct 2016 23:33:35 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <57FCD17A.6000501@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<57FCD17A.6000501@oracle.com>
Message-ID: <d197f2e3-72bb-a787-e789-b96d176421e5@oracle.com>

Thanks for looking at this Lois!

David

On 11/10/2016 9:48 PM, Lois Foltan wrote:
> Hi David,
> This looks good and I like the improvements you made to the test.
> Lois
>
> On 10/10/2016 9:55 PM, David Holmes wrote:
>> Turns out the only place changes were needed were in JDI.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>
>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>
>> The spec change in ObjectReference is very simple and there is a CCC
>> request in progress to ratify that change.
>>
>> The implementation change in ObjectReferenceImpl mirrors the updated
>> spec and uses the same format as already present in the class version
>> of the check method.
>>
>> The test is a little more complex. This is obviously an extension to
>> what is already tested in InterfaceMethodsTest. However IMT has a
>> number of problems with the way it is currently written [1] -
>> specifically it doesn't properly separate method lookup from method
>> invocation. So I've added the capability to separate lookup and
>> invocation for use with the private interface methods - I have not
>> tried to address shortcomings of the existing tests. Though I did fix
>> the return value checking logic! And added some clarifying comments and
>> did some renaming in a couple of places.
>>
>> Still on the test I can't add the negative tests I would like to add
>> because they actually pass due to a different long standing bug in JDI
>> - [2]. So the actual private interface method testing is very simple:
>> can I get the Method from the InterfaceType for the interface
>> declaring the method? Can I then invoke that method on an instance of
>> a class that implements the interface?
>>
>> Thanks,
>> David
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>

From jiangli.zhou at Oracle.COM  Tue Oct 11 15:53:18 2016
From: jiangli.zhou at Oracle.COM (Jiangli Zhou)
Date: Tue, 11 Oct 2016 08:53:18 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <57FC63AC.3020809@oracle.com>
References: <57FC63AC.3020809@oracle.com>
Message-ID: <D321CC36-D26D-4E36-BB86-A6FF216869DB@oracle.com>

Looks good.

Thanks,
Jiangli

> On Oct 10, 2016, at 8:59 PM, Calvin Cheung <calvin.cheung at oracle.com> wrote:
> 
> 
> Please review this small fix for not including classes in the classlist file which are unusable during run time.
> 
> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
> 
> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
> 
> Testing:
>    JPRT with -testset hotspot
>    jtreg tests under hotspot/runtime on all supported platforms (in progress)
> 
> thanks,
> Calvin


From martin.doerr at sap.com  Tue Oct 11 16:26:29 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Tue, 11 Oct 2016 16:26:29 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
	<fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
	<4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>
Message-ID: <a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>

Hi all,

I came to the same conclusion regarding inheritance from PaddedEnd.
Unfortunately, you're also right, Claes, that we had better not use 0 as the minimal padding length, because some compilers may have trouble with zero-length arrays. I hope 1 is OK as the minimal padding length: the new operator does not return cache-line-aligned memory at the moment, so I don't see any benefit in more padding. (A padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.)

I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid.
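
As a rough illustration of the padding scheme described above (a sketch only, not the code in the webrev): the Monitor layout and the CACHE_LINE_SIZE constant below are simplified stand-ins assumed for the example, and PaddingFor, BigMonitor and lock_word_distance are hypothetical names. It shows that the padding never drops below 1 byte, and that consecutive padded objects keep their _LockWord fields at least one cache line apart even when the allocation itself is not cache line aligned.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for HotSpot's platform-dependent DEFAULT_CACHE_LINE_SIZE.
static const int CACHE_LINE_SIZE = 64;

// Hypothetical, heavily simplified Monitor; the real class has more fields.
struct Monitor {
  volatile long _LockWord;
  char _name[16];
};

// Padding needed to bring T up to one cache line, but never less than 1,
// so that no zero-length array is ever declared.
template <typename T>
struct PaddingFor {
  enum {
    WANTED = CACHE_LINE_SIZE - (int)sizeof(T),
    LEN = WANTED > 0 ? WANTED : 1
  };
};

struct PaddedMonitor : public Monitor {
  char _padding[PaddingFor<Monitor>::LEN];
};

// A type that is already larger than a cache line only gets 1 byte.
struct BigMonitor {
  char _data[200];
};

// Distance in bytes between the lock words of two adjacent padded monitors.
inline std::ptrdiff_t lock_word_distance() {
  static PaddedMonitor locks[2];
  return reinterpret_cast<const volatile char*>(&locks[1]._LockWord) -
         reinterpret_cast<const volatile char*>(&locks[0]._LockWord);
}
```

Because the distance between two adjacent lock words equals sizeof(PaddedMonitor), which is at least CACHE_LINE_SIZE here, the two words cannot fall on the same cache line whatever the start address of the array is.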

Here's a new webrev:
http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/

It also enables changing the _name[] field to a pointer or a smaller array. I guess this is better done in a separate change (jdk10?).

Please take a look.

Thanks and best regards,
Martin


-----Original Message-----
From: Claes Redestad [mailto:claes.redestad at oracle.com] 
Sent: Dienstag, 11. Oktober 2016 12:05
To: Coleen Phillimore <coleen.phillimore at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Hi,

On 2016-10-11 02:03, Coleen Phillimore wrote:
>
> Hi,
>
> Was the linear allocation in mutex.cpp the cause of the false sharing 
> that you observed?  I think I like this change better than the 
> original, because I've wondered myself why the name string was so 
> long.  So with this, we could make Monitors smaller if they're 
> embedded in metadata or other structures.

Music to my ears!

I even think most embedded uses would see improvements if _name was 
removed entirely (or "simply" turned into a const char * so that it's 
not copied and embedded into the Monitor/Mutex)
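
To make the footprint argument concrete, here is a minimal sketch comparing the two layouts. Both structs are hypothetical simplifications assumed for the example (the real Monitor has more fields), so only the direction of the difference carries over, not the exact byte counts.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout with the name copied into every instance.
struct MonitorWithEmbeddedName {
  volatile long _LockWord;
  char _name[64];
};

// Hypothetical layout where the name is only referenced, for example
// a string literal that outlives the monitor.
struct MonitorWithNamePointer {
  volatile long _LockWord;
  const char* _name;
};

// Bytes saved per instance by storing a pointer instead of the array.
inline std::size_t bytes_saved_per_monitor() {
  return sizeof(MonitorWithEmbeddedName) - sizeof(MonitorWithNamePointer);
}
```

Any structure that embeds such a monitor shrinks by the same amount, which is the saving for embedded uses mentioned above.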

>
> Thanks,
> Coleen
>
> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>> Hi Claes,
>>
>> thank you very much for your explanations.
>>
>> I agree with you that it would be better to pad where the Monitors 
>> are used. It would still fulfill the purpose of this RFE without 
>> disturbing other usages.
>>
>> So I could introduce:
>> class PaddedMonitor : public Monitor {
>>    enum {
>>      CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE - 
>> (int)sizeof(Monitor),
>>      PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>    };
>>    char _padding[PADDING_LEN];
>> };
>> and similarly PaddedMutex and replace all of the ones which get 
>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).

Sure!

Some compilers may take issue with cases where PADDING_LEN == 0 (since 
char _padding[0] is technically illegal C++, but works on gcc etc) so 
maybe that special case will have to be (somewhat excessively):

PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 
DEFAULT_CACHE_LINE_SIZE

We looked at whether it would be feasible to express class PaddedMonitor : 
public PaddedEnd<Monitor>, but it appears that would require variadic 
template arguments (C++11) to get right (since we'd need PaddedEnd to
transitively publish the constructors of Monitor).
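
For reference, a minimal sketch of what such a forwarding wrapper could look like, assuming a C++11 compiler (which, as noted, is exactly what was not available here). PaddedEndSketch is a hypothetical name and not HotSpot's PaddedEnd; the Monitor class and the CACHE_LINE_SIZE constant are simplified assumptions as well.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Assumed stand-in for DEFAULT_CACHE_LINE_SIZE.
static const std::size_t CACHE_LINE_SIZE = 64;

// Hypothetical simplified Monitor that has no default constructor.
class Monitor {
 public:
  Monitor(int rank, const char* name) : _rank(rank), _name(name) {}
  int rank() const { return _rank; }
  const char* name() const { return _name; }
 private:
  int _rank;
  const char* _name;
};

// Padded wrapper that forwards any constructor arguments to T (C++11).
template <typename T>
class PaddedEndSketch : public T {
 public:
  template <typename... Args>
  explicit PaddedEndSketch(Args&&... args)
      : T(std::forward<Args>(args)...) {}
 private:
  // At least 1 byte, so no zero-length array is declared.
  static const std::size_t PAD =
      sizeof(T) < CACHE_LINE_SIZE ? CACHE_LINE_SIZE - sizeof(T) : 1;
  char _padding[PAD];
};

typedef PaddedEndSketch<Monitor> PaddedMonitor;
```

With this, PaddedMonitor m(3, "tty_lock") constructs the underlying Monitor through the forwarded arguments. C++11 inheriting constructors (using T::T;) would be another way to publish the base constructors.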

Thanks!

/Claes

>>
>> Would you agree with this change?
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>> Sent: Freitag, 7. Oktober 2016 12:35
>> To: Doerr, Martin <martin.doerr at sap.com>; 
>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; 
>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; 
>> Coleen Phillimore (coleen.phillimore at oracle.com) 
>> <coleen.phillimore at oracle.com>
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>> DEFAULT_CACHE_LINE_SIZE
>>
>> Hi,
>>
>> I'm concerned that this might be an easy-but-wrong fix to a complex
>> problem, and acknowledging that there are already use cases where the
>> _name field is counterproductive. This change adds complexity that
>> makes it even less likely such uses will be optimized for in the
>> future.
>>
>> There are Padded* types put in place to deal with these concerns
>> explicitly rather than implicitly *where it matters*, which allows us
>> the choice of applying padding or not on a per use-case basis (which
>> means we can also remove the _name field for those use cases that don't
>> care about either, which might be most outside of the global lists).
>>
>> I am very concerned about false sharing, but I have no data to support
>> that this change has any measurable benefit in practice: I even did an
>> experiment years ago now where I turned _name into a pointer to not pad
>> at all and saw nothing exceeding noise levels on any benchmark.
>>
>> Thanks!
>>
>> /Claes
>>
>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>> Hi Claes,
>>>
>>> what the change basically does is that the _name[] field gets 
>>> enlarged by 8 bytes on platforms with 128 byte 
>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed 
>>> by the C++ compiler.
>>> What exactly is your concern about the footprint overhead?
>>> Are you not concerned about the risk of false sharing?
>>>
>>> Best regards,
>>> Martin
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Freitag, 7. Oktober 2016 12:00
>>> To: Doerr, Martin <martin.doerr at sap.com>; 
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; 
>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>; 
>>> Coleen Phillimore (coleen.phillimore at oracle.com) 
>>> <coleen.phillimore at oracle.com>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> after due consideration I strongly consider this change unacceptable
>>> since it adds footprint overhead to performance critical compiler and
>>> GC code with little to no data to support this won't cause regressions.
>>>
>>> Changes to Monitor/Mutex needs to be done with more surgical precision
>>> than this.
>>>
>>> If I do have a veto on the matter, here it is.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>> Hi Dan,
>>>>
>>>> thank you very much for reviewing and for investigating the history.
>>>>
>>>> It was not intended to make the functions you mentioned public. 
>>>> I've fixed that.
>>>> I also updated the copyright information.
>>>>
>>>> New webrev is here:
>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>
>>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>>
>>>> Thanks and best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>> Sent: Donnerstag, 6. Oktober 2016 23:13
>>>> To: Doerr, Martin <martin.doerr at sap.com>; 
>>>> hotspot-runtime-dev at openjdk.java.net
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>> Hi,
>>>>>
>>>>> the current implementation of Monitor padding (mutex.cpp) assumes 
>>>>> that cache lines are 64 Bytes. There's a platform dependent define 
>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of 
>>>>> padding is to avoid false sharing.
>>>>>
>>>>> My proposed change is here:
>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>> src/share/vm/runtime/mutex.hpp
>>>>         Please update the copyright year before pushing.
>>>>
>>>>         L172:   // The default length of monitor name is chosen to avoid false sharing.
>>>>         L173:   enum {
>>>>         L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>>>>         L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>>>>         L176:   };
>>>>         L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>
>>>>             I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>>>             can vary between platforms; I like that it is a minimum of 64 bytes
>>>>             and is still a constant.
>>>>
>>>>             I'm also not happy that the resulting sizeof(Monitor) may not be a
>>>>             multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>>>>             that unhappiness with the fact that sizeof(Monitor) hasn't been a
>>>>             multiple of the cache line size since at least 2008 and no one
>>>>             complained (that I know of).
>>>>
>>>>             So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
>>>>             (like it was) and add a pad field that would bring up sizeof(Monitor)
>>>>             to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
>>>>             unhappy with me and anyone embedding a Monitor into another data
>>>>             structure would be unhappy with me, but I'm used to that :-)
>>>>
>>>>             So what you have is fine, especially for JDK9.
>>>>
>>>>         L180:  public:
>>>>         L181: #ifndef PRODUCT
>>>>         L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>>>>         L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>>>>         L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>         L185: #endif
>>>>         L186:
>>>>         L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>>>>         L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>>>>         L189:   void check_block_state       (Thread* thread)
>>>>
>>>>             These were all "protected" before. Now they are "public".
>>>>             Any particular reason?
>>>>
>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>> answer to the "protected" versus "public" question, but don't
>>>> consider that query to be a blocker.
>>>>
>>>>
>>>> The rest of this isn't code review, but some of this caught
>>>> my attention.
>>>>
>>>> src/share/vm/runtime/mutex.hpp
>>>>
>>>>         old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>>>>         old L85: static const int MONITOR_NAME_LEN = 64;
>>>>
>>>> I had to look up the history of this comment:
>>>>
>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>> changeset:   55:2a8eb116ebbe
>>>> user:        xlu
>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>> summary:     6610420: Debug VM crashes during monitor lock rank checking
>>>>
>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>      // *in that order*.  If their implementations change such that these
>>>>      // assumptions are violated, a whole lot of code will break.
>>>>
>>>> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
>>>> +static const int MONITOR_NAME_LEN = 64;
>>>> +
>>>>      class Monitor : public CHeapObj {
>>>>
>>>>       public:
>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>        volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>        ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>        volatile bool     _snuck;              // Used for sneaky locking (evil).
>>>> -  const char * _name;                    // Name of mutex
>>>>        int NotifyCount ;                      // diagnostic assist
>>>> -  double pad [8] ;                       // avoid false sharing
>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>
>>>>        // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>>>>      #ifndef PRODUCT
>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>         int  ILocked () ;
>>>>
>>>>       protected:
>>>> -   static void ClearMonitor (Monitor * m) ;
>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>         Monitor() ;
>>>>
>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>> time that Dice determined that 8-double value, the result was to pad
>>>> the size of Monitor to an even multiple of a particular cache line
>>>> size.
>>>>
>>>> Xiobin changed the 'name' field to be an array so that the name
>>>> chars could serve double duty as the cache line pad... pun intended.
>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>> size is a multiple of the cache line size.
>>>>
>>>> Dan
>>>>
>>>>
>>>>> Please review. It will also need a sponsor.
>>>>>
>>>>> Thanks and best regards,
>>>>> Martin
>>>>>
>


From calvin.cheung at oracle.com  Tue Oct 11 17:19:35 2016
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Tue, 11 Oct 2016 10:19:35 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <57FCCF1E.1080703@oracle.com>
References: <57FC63AC.3020809@oracle.com> <57FCCF1E.1080703@oracle.com>
Message-ID: <57FD1F27.5090404@oracle.com>

Hi Lois,

Thanks for your review.

On 10/11/16, 4:38 AM, Lois Foltan wrote:
>
> On 10/10/2016 11:59 PM, Calvin Cheung wrote:
>>
>> Please review this small fix for not including classes in the 
>> classlist file which are unusable during run time.
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>>
>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
>
> Hi Calvin,
>
> src/share/vm/classfile/classFileParser.cpp
> - line #5781, I find the if statement logic to be somewhat confusing.  
> This check seems to be only for classes defined to the boot and 
> platform class loader.  I am assuming it does not apply to the 
> application class loader because there is no way to differentiate a 
> class defined to the application class loader from being on the 
> --patch-module list and the -classpath?  Is that why the if statement 
> logic does not include the application class loader?
Yes. We do want to include the classes defined to the app class loader.
>   Maybe it is enough to improve the comment to something like:
>
>   // For the boot and platform class loaders, check if the class is not
>   // found in the java runtime image or the boot loader's appended entries.
>   // This indicates that the class must be located on the --patch-module
>   // list and is not useable during run time, so should be skipped.
I've modified it a little. How about the following?
// For the boot and platform class loaders, check if the class is not
// found in the java runtime image.
// Additional check for the boot class loader is if the class is not
// found in the boot loader's appended entries. This indicates that the
// class is not useable during run time, such as the ones found in the
// --patch-module entries, so it should not be included in the classlist
// file.

>
> Then please indent the start of line #5782 by one space to show that 
> the check for the platform class loader is part of that first || 
> expression.
I'll fix it.
>
> test/runtime/modules/PatchModule/PatchModuleClassList.java
> - good test!
Let me know if you want to see another webrev.

thanks,
Calvin
>
> Thanks,
> Lois
>
>>
>> Testing:
>>     JPRT with -testset hotspot
>>     jtreg tests under hotspot/runtime on all supported platforms (in 
>> progress)
>>
>> thanks,
>> Calvin
>

From calvin.cheung at oracle.com  Tue Oct 11 17:24:14 2016
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Tue, 11 Oct 2016 10:24:14 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <D321CC36-D26D-4E36-BB86-A6FF216869DB@oracle.com>
References: <57FC63AC.3020809@oracle.com>
	<D321CC36-D26D-4E36-BB86-A6FF216869DB@oracle.com>
Message-ID: <57FD203E.1090101@oracle.com>

Thanks, Jiangli.

Calvin

On 10/11/16, 8:53 AM, Jiangli Zhou wrote:
> Looks good.
>
> Thanks,
> Jiangli
>
>> On Oct 10, 2016, at 8:59 PM, Calvin Cheung<calvin.cheung at oracle.com>  wrote:
>>
>>
>> Please review this small fix for not including classes in the classlist file which are unusable during run time.
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>>
>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
>>
>> Testing:
>>     JPRT with -testset hotspot
>>     jtreg tests under hotspot/runtime on all supported platforms (in progress)
>>
>> thanks,
>> Calvin

From daniel.daugherty at oracle.com  Tue Oct 11 17:30:05 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 11 Oct 2016 11:30:05 -0600
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
Message-ID: <cc1202f6-24d6-b996-f5ef-5ef35cef3aaf@oracle.com>

On 10/10/16 7:55 PM, David Holmes wrote:
> Turns out the only place changes were needed were in JDI.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>
> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/

src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java
     No comments. (Thanks for also fixing the typo.)

src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java
     L352:         if (isNonVirtual(options)) {
     L353:             if (method.isAbstract()) {
     L354:                 throw new IllegalArgumentException("Abstract method");
     L355:             }
         Any particular reason for breaking the logic into two
         distinct if-statements?

         Perhaps:

                   if (isNonVirtual(options) && method.isAbstract()) {
                       throw new IllegalArgumentException("Abstract method");
                   }

         Also, perhaps "unexpected Abstract method" is more clear?

test/com/sun/jdi/InterfaceMethodsTest.java
     L526:             if (t.getClass() != expectedException) {
     L527:                 System.err.println("--- FAILED");
     L528:                 failure("FAILED: " + t);
     L529:                 return null;
     L530:             }
         You should also report the expectedException value here to
         aid in failure analysis.

Thumbs up! I don't need to see another webrev if you decide to
make the above small tweaks.

Dan


>
> The spec change in ObjectReference is very simple and there is a CCC 
> request in progress to ratify that change.
>
> The implementation change in ObjectReferenceImpl mirrors the updated 
> spec and uses the same format as already present in the class version 
> of the check method.
>
> The test is a little more complex. This is obviously an extension to 
> what is already tested in InterfaceMethodsTest. However IMT has a 
> number of problems with the way it is currently written [1] - 
> specifically it doesn't properly separate method lookup from method 
> invocation. So I've added the capability to separate lookup and 
> invocation for use with the private interface methods - I have not 
> tried to address shortcomings of the existing tests. Though I did fix 
> the return value checking logic! And added some clarifying comments and 
> did some renaming in a couple of places.
>
> Still on the test I can't add the negative tests I would like to add 
> because they actually pass due to a different long standing bug in JDI 
> - [2]. So the actual private interface method testing is very simple: 
> can I get the Method from the InterfaceType for the interface 
> declaring the method? Can I then invoke that method on an instance of 
> a class that implements the interface?
>
> Thanks,
> David
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
> [2] https://bugs.openjdk.java.net/browse/JDK-8167416


From lois.foltan at oracle.com  Tue Oct 11 17:33:49 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 11 Oct 2016 13:33:49 -0400
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <57FD1F27.5090404@oracle.com>
References: <57FC63AC.3020809@oracle.com> <57FCCF1E.1080703@oracle.com>
	<57FD1F27.5090404@oracle.com>
Message-ID: <57FD227D.4070307@oracle.com>


On 10/11/2016 1:19 PM, Calvin Cheung wrote:
> Hi Lois,
>
> Thanks for your review.
>
> On 10/11/16, 4:38 AM, Lois Foltan wrote:
>>
>> On 10/10/2016 11:59 PM, Calvin Cheung wrote:
>>>
>>> Please review this small fix for not including classes in the 
>>> classlist file which are unusable during run time.
>>>
>>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>>>
>>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
>>
>> Hi Calvin,
>>
>> src/share/vm/classfile/classFileParser.cpp
>> - line #5781, I find the if statement logic to be somewhat 
>> confusing.  This check seems to be only for classes defined to the 
>> boot and platform class loader.  I am assuming it does not apply to 
>> the application class loader because there is no way to differentiate 
>> a class defined to the application class loader from being on the 
>> --patch-module list and the -classpath?  Is that why the if statement 
>> logic does not include the application class loader?
> Yes. We do want to include the classes defined to the app class loader.
Even if those classes defined to the app class loader are located in 
--patch-module entries?

>>   Maybe it is enough to improve the comment to something like:
>>
>>   // For the boot and platform class loaders, check if the class is not
>>   // found in the java runtime image or the boot loader's appended entries.
>>   // This indicates that the class must be located on the --patch-module
>>   // list and is not useable during run time, so should be skipped.
> I've modified it a little. How about the following?
> // For the boot and platform class loaders, check if the class is not
> // found in the java runtime image.
> // Additional check for the boot class loader is if the class is not
> // found in the boot loader's appended entries. This indicates that the
> // class is not useable during run time, such as the ones found in the
> // --patch-module entries, so it should not be included in the classlist
> // file.
Looks good, thanks for rewording!

>
>>
>> Then please indent the start of line #5782 by one space to show that 
>> the check for the platform class loader is part of that first || 
>> expression.
> I'll fix it.
>>
>> test/runtime/modules/PatchModule/PatchModuleClassList.java
>> - good test!
> Let me know if you want to see another webrev.

No, I'm all set.
Thanks,
Lois

>
> thanks,
> Calvin
>>
>> Thanks,
>> Lois
>>
>>>
>>> Testing:
>>>     JPRT with -testset hotspot
>>>     jtreg tests under hotspot/runtime on all supported platforms (in 
>>> progress)
>>>
>>> thanks,
>>> Calvin
>>


From coleen.phillimore at oracle.com  Tue Oct 11 17:35:11 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Tue, 11 Oct 2016 13:35:11 -0400
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
	<fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
	<4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>
	<a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>
Message-ID: <1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com>


I am fine with this change.  Maybe a one line comment here, something like:

// Using Padded subclasses to prevent false sharing of these global monitors and mutexes.
void mutex_init() {
  def(tty_lock                     , PaddedMutex  , event,       true,  Monitor::_safepoint_check_never);      // allow to lock in VM


On 10/11/16 12:26 PM, Doerr, Martin wrote:
> Hi all,
>
> I came to the same conclusion regarding inheritance from PaddedEnd.
> Unfortunately, you're also right, Claes, that we had better not use 0 as the minimal padding length, because some compilers may have trouble with zero-length arrays. I hope 1 is OK as the minimal padding length: the new operator does not return cache-line-aligned memory at the moment, so I don't see any benefit in more padding. (A padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.)
>
> I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid.
>
> Here's a new webrev:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/
>
> It also enables changing the _name[] field to a pointer or a smaller array. I guess this is better done in a separate change (jdk10?).
>
> Please take a look.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Claes Redestad [mailto:claes.redestad at oracle.com]
> Sent: Dienstag, 11. Oktober 2016 12:05
> To: Coleen Phillimore <coleen.phillimore at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> Hi,
>
> On 2016-10-11 02:03, Coleen Phillimore wrote:
>> Hi,
>>
>> Was the linear allocation in mutex.cpp the cause of the false sharing
>> that you observed?  I think I like this change better than the
>> original, because I've wondered myself why the name string was so
>> long.  So with this, we could make Monitor's smaller if they're
>> embedded in metadata or other structures.
> Music to my ears!
>
> I even think most embedded uses would see improvements if _name was
> removed entirely (or "simply" turned into a const char * so that it's
> not copied and embedded into the Monitor/Mutex)
>
>> Thanks,
>> Coleen
>>
>> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>>> Hi Claes,
>>>
>>> thank you very much for your explanations.
>>>
>>> I agree with you that it would be better to pad where the Monitors
>>> are used. It would still fulfill the purpose of this RFE without
>>> disturbing other usages.
>>>
>>> So I could introduce:
>>> class PaddedMonitor : public Monitor {
>>>     enum {
>>>       CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE -
>>> (int)sizeof(Monitor),
>>>       PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>>     };
>>>     char _padding[PADDING_LEN];
>>> };
>>> and similarly PaddedMutex and replace all of the ones which get
>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).
> Sure!
>
> Some compilers may take issue with cases where PADDING_LEN == 0 (since
> char _padding[0] is technically illegal C++, but works on gcc etc) so
> maybe that special case will have to be (somewhat excessively):
>
> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING :
> DEFAULT_CACHE_LINE_SIZE
>
> We looked at whether it'd be feasible to express class PaddedMonitor :
> public PaddedEnd<Monitor>, but it appears that'd require variadic
> template arguments (C++11) to get right (since we'd need PaddedEnd to
> transitively publish constructors of Monitor).
>
> Thanks!
>
> /Claes
>
>>> Would you agree with this change?
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Friday, 7 October 2016 12:35
>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>;
>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>> <coleen.phillimore at oracle.com>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> I'm concerned that this might be an easy-but-wrong fix to a complex
>>> problem, and I'd point out that there are already use cases where the
>>> _name field is counterproductive. This change adds complexity that
>>> makes it even less likely such uses will be optimized for in the
>>> future.
>>>
>>> There are Padded* types put in place to deal with these concerns
>>> explicitly rather than implicitly *where it matters*, which allows us
>>> the choice of applying padding or not on a per use-case basis (which
>>> means we can also remove the _name field for those use cases that don't
>>> care about either, which might be most outside of the global lists).
>>>
>>> I am very concerned about false sharing, but I have no data to support
>>> that this change has any measurable benefit in practice: I even did an
>>> experiment years ago now where I turned _name into a pointer to not pad
>>> at all and saw nothing exceeding noise levels on any benchmark.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>>> Hi Claes,
>>>>
>>>> what the change basically does is that the _name[] field gets
>>>> enlarged by 8 bytes on platforms with 128 byte
>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed
>>>> by the C++ compiler.
>>>> What exactly is your concern about the footprint overhead?
>>>> Are you not concerned about the risk of false sharing?
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>> -----Original Message-----
>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>> Sent: Friday, 7 October 2016 12:00
>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>;
>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>> <coleen.phillimore at oracle.com>
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> Hi,
>>>>
>>>> after due consideration I strongly consider this change unacceptable
>>>> since it adds footprint overhead to performance-critical compiler and
>>>> GC code with little to no data to show that it won't cause regressions.
>>>>
>>>> Changes to Monitor/Mutex need to be done with more surgical precision
>>>> than this.
>>>>
>>>> If I do have a veto on the matter, here it is.
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>>
>>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>>> Hi Dan,
>>>>>
>>>>> thank you very much for reviewing and for investigating the history.
>>>>>
>>>>> It was not intended to make the functions you mentioned public.
>>>>> I've fixed that.
>>>>> I also updated the copyright information.
>>>>>
>>>>> New webrev is here:
>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>>
>>>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>>>
>>>>> Thanks and best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>> Sent: Thursday, 6 October 2016 23:13
>>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>
>>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> the current implementation of Monitor padding (mutex.cpp) assumes
>>>>>> that cache lines are 64 Bytes. There's a platform dependent define
>>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>>>>>> padding is to avoid false sharing.
>>>>>>
>>>>>> My proposed change is here:
>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>>> src/share/vm/runtime/mutex.hpp
>>>>>          Please update the copyright year before pushing.
>>>>>
>>>>>          L172:   // The default length of monitor name is chosen to avoid false sharing.
>>>>>          L173:   enum {
>>>>>          L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>>>>>          L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>>>>>          L176:   };
>>>>>          L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>
>>>>>              I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>>>>              can vary between platforms; I like that it is a minimum of 64 bytes
>>>>>              and is still a constant.
>>>>>
>>>>>              I'm also not happy that the resulting sizeof(Monitor) may not be a
>>>>>              multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>>>>>              that unhappiness with the fact that sizeof(Monitor) hasn't been a
>>>>>              multiple of the cache line size since at least 2008 and no one
>>>>>              complained (that I know of).
>>>>>
>>>>>              So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
>>>>>              (like it was) and add a pad field that would bring up sizeof(Monitor)
>>>>>              to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
>>>>>              unhappy with me and anyone embedding a Monitor into another data
>>>>>              structure would be unhappy with me, but I'm used to that :-)
>>>>>
>>>>>              So what you have is fine, especially for JDK9.
>>>>>
>>>>>          L180:  public:
>>>>>          L181: #ifndef PRODUCT
>>>>>          L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>>>>>          L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>>>>>          L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>>          L185: #endif
>>>>>          L186:
>>>>>          L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>>>>>          L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>>>>>          L189:   void check_block_state       (Thread* thread)
>>>>>
>>>>>              These were all "protected" before. Now they are "public".
>>>>>              Any particular reason?
>>>>>
>>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>>> answer to the "protected" versus "public" question, but don't
>>>>> consider that query to be a blocker.
>>>>>
>>>>>
>>>>> The rest of this isn't code review, but some of this caught
>>>>> my attention.
>>>>>
>>>>> src/share/vm/runtime/mutex.hpp
>>>>>
>>>>>          old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>>>>>          old L85: static const int MONITOR_NAME_LEN = 64;
>>>>>
>>>>> I had to look up the history of this comment:
>>>>>
>>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>>> changeset:   55:2a8eb116ebbe
>>>>> user:        xlu
>>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>>> summary:     6610420: Debug VM crashes during monitor lock rank checking
>>>>>
>>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>>       // *in that order*.  If their implementations change such that these
>>>>>       // assumptions are violated, a whole lot of code will break.
>>>>>
>>>>> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
>>>>> +static const int MONITOR_NAME_LEN = 64;
>>>>> +
>>>>>       class Monitor : public CHeapObj {
>>>>>
>>>>>        public:
>>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>>         volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>>         ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>>         volatile bool     _snuck;              // Used for sneaky locking (evil).
>>>>> -  const char * _name;                    // Name of mutex
>>>>>         int NotifyCount ;                      // diagnostic assist
>>>>> -  double pad [8] ;                       // avoid false sharing
>>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>
>>>>>         // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>>>>>       #ifndef PRODUCT
>>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>>          int  ILocked () ;
>>>>>
>>>>>        protected:
>>>>> -   static void ClearMonitor (Monitor * m) ;
>>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>>          Monitor() ;
>>>>>
>>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>>> time that Dice determined that 8-double value, the result was to pad
>>>>> the size of Monitor to an even multiple of a particular cache line
>>>>> size.
>>>>>
>>>>> Xiaobin changed the 'name' field to be an array so that the name
>>>>> chars could serve double duty as the cache line pad... pun intended.
>>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>>> size is a multiple of the cache line size.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>> Please review. It will also need a sponsor.
>>>>>>
>>>>>> Thanks and best regards,
>>>>>> Martin
>>>>>>
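
A minimal sketch of the padded-subclass approach discussed in this thread, assuming a 64-byte cache line and a heavily reduced stand-in for Monitor. kCacheLineSize, the Monitor fields shown, separated_by_cache_line() and check_linear_layout() are illustrative names and are not taken from webrev.03. The padding length falls back to 1 rather than 0 so that the array is always legal C++.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stands in for the platform-dependent DEFAULT_CACHE_LINE_SIZE.
static const int kCacheLineSize = 64;

// Hypothetical, heavily reduced Monitor; the real class has many more fields.
class Monitor {
 public:
  volatile long _LockWord;
  char _name[16];
  Monitor() : _LockWord(0) { _name[0] = '\0'; }
};

class PaddedMonitor : public Monitor {
 public:  // public only so the sketch can be inspected from outside
  enum {
    CACHE_LINE_PADDING = (int)kCacheLineSize - (int)sizeof(Monitor),
    // Fall back to 1, not 0: zero-length arrays are not standard C++.
    PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 1
  };
  char _padding[PADDING_LEN];
};

// True if the two objects start at least one cache line apart, which means
// their _LockWord fields cannot fall on the same line.
inline bool separated_by_cache_line(const PaddedMonitor& a,
                                    const PaddedMonitor& b) {
  const char* pa = reinterpret_cast<const char*>(&a);
  const char* pb = reinterpret_cast<const char*>(&b);
  std::ptrdiff_t distance = pb > pa ? pb - pa : pa - pb;
  return distance >= kCacheLineSize;
}

// Lays two padded monitors out linearly, as an array does, and checks them.
inline void check_linear_layout() {
  PaddedMonitor locks[2];
  assert(separated_by_cache_line(locks[0], locks[1]));
}
```

Because every PaddedMonitor in this sketch is at least one cache line long, the _LockWord fields of two objects laid out back to back cannot share a line, even when the objects themselves are not cache-line aligned.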


From calvin.cheung at oracle.com  Tue Oct 11 17:43:51 2016
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Tue, 11 Oct 2016 10:43:51 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <57FD227D.4070307@oracle.com>
References: <57FC63AC.3020809@oracle.com> <57FCCF1E.1080703@oracle.com>
	<57FD1F27.5090404@oracle.com> <57FD227D.4070307@oracle.com>
Message-ID: <57FD24D7.1090903@oracle.com>


On 10/11/16, 10:33 AM, Lois Foltan wrote:
>
> On 10/11/2016 1:19 PM, Calvin Cheung wrote:
>> Hi Lois,
>>
>> Thanks for your review.
>>
>> On 10/11/16, 4:38 AM, Lois Foltan wrote:
>>>
>>> On 10/10/2016 11:59 PM, Calvin Cheung wrote:
>>>>
>>>> Please review this small fix for not including classes in the 
>>>> classlist file which are unusable during run time.
>>>>
>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>>>>
>>>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
>>>
>>> Hi Calvin,
>>>
>>> src/share/vm/classfile/classFileParser.cpp
>>> - line #5781, I find the if statement logic to be somewhat 
>>> confusing.  This check seems to be only for classes defined to the 
>>> boot and platform class loader.  I am assuming it does not apply to 
>>> the application class loader because there is no way to 
>>> differentiate a class defined to the application class loader from 
>>> being on the --patch-module list and the -classpath?  Is that why 
>>> the if statement logic does not include the application class loader?
>> Yes. We do want to include the classes defined to the app class loader.
> Even if those classes defined to the app class loader are located in 
> --patch-module entries?
Yes. With the fix for JDK-8164011, we currently don't archive any 
classes found in the --patch-module entries.

thanks,
Calvin
>
>>>   Maybe it is enough to improve the comment to something like:
>>>
>>>   // For the boot and platform class loaders, check if the class is not found in the java runtime image
>>>   // or the boot loader's appended entries.  This indicates that the class must be located on the --patch-module list and
>>>   // is not useable during run time, so should be skipped.
>> I've modified it a little. How about the following?
>> // For the boot and platform class loaders, check if the class is not found in the java runtime image.
>> // Additional check for the boot class loader is if the class is not found in the boot loader's appended
>> // entries. This indicates that the class is not useable during run time, such as the ones found in the
>> // --patch-module entries, so it should not be included in the classlist file.
> Looks good, thanks for rewording!
>
>>
>>>
>>> Then please indent the start of line #5782 by one space to show that 
>>> the check for the platform class loader is part of that first || 
>>> expression.
>> I'll fix it.
>>>
>>> test/runtime/modules/PatchModule/PatchModuleClassList.java
>>> - good test!
>> Let me know if you want to see another webrev.
>
> No, I'm all set.
> Thanks,
> Lois
>
>>
>> thanks,
>> Calvin
>>>
>>> Thanks,
>>> Lois
>>>
>>>>
>>>> Testing:
>>>>     JPRT with -testset hotspot
>>>>     jtreg tests under hotspot/runtime on all supported platforms 
>>>> (in progress)
>>>>
>>>> thanks,
>>>> Calvin
>>>
>

From claes.redestad at oracle.com  Tue Oct 11 17:44:05 2016
From: claes.redestad at oracle.com (Claes Redestad)
Date: Tue, 11 Oct 2016 19:44:05 +0200
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
	<fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
	<4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>
	<a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>
	<1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com>
Message-ID: <57FD24E5.2080506@oracle.com>

I am also happy with this, thanks!

/Claes

On 2016-10-11 19:35, Coleen Phillimore wrote:
>
> I am fine with this change.  Maybe a one line comment here, something like:
>
> // Using Padded subclasses to prevent false sharing of these global monitors and mutexes.
> void mutex_init() {
>   def(tty_lock                     , PaddedMutex  , event,       true,  Monitor::_safepoint_check_never);      // allow to lock in VM
>
>
>
> On 10/11/16 12:26 PM, Doerr, Martin wrote:
>> Hi all,
>>
>> I came to the same conclusion regarding inheritance from PaddedEnd.
>> Unfortunately, you're also right, Claes, that we had better not use 0 as the minimal padding length, because some compilers may have trouble with 0-length arrays. I hope 1 is OK as the minimal padding length: the new operator does not currently return cache-line-aligned memory, so I don't see any benefit in more padding. (A padding length of 1 byte has the advantage that it may not enlarge the object if the previous field leaves some space due to its type.)
>>
>> I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid.
>>
>> Here's a new webrev:
>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/
>>
>> It also enables changing the _name[] field to a pointer or a smaller array. I guess this would be better done in a separate change (jdk10?).
>>
>> Please take a look.
>>
>> Thanks and best regards,
>> Martin
>>
>>
>> -----Original Message-----
>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>> Sent: Tuesday, 11 October 2016 12:05
>> To: Coleen Phillimore<coleen.phillimore at oracle.com>; Doerr, Martin<martin.doerr at sap.com>;daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com)<david.holmes at oracle.com>
>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>>
>> Hi,
>>
>> On 2016-10-11 02:03, Coleen Phillimore wrote:
>>> Hi,
>>>
>>> Was the linear allocation in mutex.cpp the cause of the false sharing
>>> that you observed?  I think I like this change better than the
>>> original, because I've wondered myself why the name string was so
>>> long.  So with this, we could make Monitor's smaller if they're
>>> embedded in metadata or other structures.
>> Music to my ears!
>>
>> I even think most embedded uses would see improvements if _name was
>> removed entirely (or "simply" turned into a const char * so that it's
>> not copied and embedded into the Monitor/Mutex)
>>
>>> Thanks,
>>> Coleen
>>>
>>> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>>>> Hi Claes,
>>>>
>>>> thank you very much for your explanations.
>>>>
>>>> I agree with you that it would be better to pad where the Monitors
>>>> are used. It would still fulfill the purpose of this RFE without
>>>> disturbing other usages.
>>>>
>>>> So I could introduce:
>>>> class PaddedMonitor : public Monitor {
>>>>     enum {
>>>>       CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE -
>>>> (int)sizeof(Monitor),
>>>>       PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>>>     };
>>>>     char _padding[PADDING_LEN];
>>>> };
>>>> and similarly PaddedMutex and replace all of the ones which get
>>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).
>> Sure!
>>
>> Some compilers may take issue with cases where PADDING_LEN == 0 (since
>> char _padding[0] is technically illegal C++, but works on gcc etc) so
>> maybe that special case will have to be (somewhat excessively):
>>
>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING :
>> DEFAULT_CACHE_LINE_SIZE
>>
>> We looked at whether it'd be feasible to express class PaddedMonitor :
>> public PaddedEnd<Monitor>, but it appears that'd require variadic
>> template arguments (C++11) to get right (since we'd need PaddedEnd to
>> transitively publish constructors of Monitor).
>>
>> Thanks!
>>
>> /Claes
>>
>>>> Would you agree with this change?
>>>>
>>>> Thanks and best regards,
>>>> Martin
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>> Sent: Friday, 7 October 2016 12:35
>>>> To: Doerr, Martin<martin.doerr at sap.com>;
>>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net;
>>>> David Holmes (david.holmes at oracle.com)<david.holmes at oracle.com>;
>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>> <coleen.phillimore at oracle.com>
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> Hi,
>>>>
>>>> I'm concerned that this might be an easy-but-wrong fix to a complex
>>>> problem, and I'd point out that there are already use cases where the
>>>> _name field is counterproductive. This change adds complexity that
>>>> makes it even less likely such uses will be optimized for in the
>>>> future.
>>>>
>>>> There are Padded* types put in place to deal with these concerns
>>>> explicitly rather than implicitly *where it matters*, which allows us
>>>> the choice of applying padding or not on a per use-case basis (which
>>>> means we can also remove the _name field for those use cases that don't
>>>> care about either, which might be most outside of the global lists).
>>>>
>>>> I am very concerned about false sharing, but I have no data to support
>>>> that this change has any measurable benefit in practice: I even did an
>>>> experiment years ago now where I turned _name into a pointer to not pad
>>>> at all and saw nothing exceeding noise levels on any benchmark.
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>>
>>>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>>>> Hi Claes,
>>>>>
>>>>> what the change basically does is that the _name[] field gets
>>>>> enlarged by 8 bytes on platforms with 128 byte
>>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed
>>>>> by the C++ compiler.
>>>>> What exactly is your concern about the footprint overhead?
>>>>> Are you not concerned about the risk of false sharing?
>>>>>
>>>>> Best regards,
>>>>> Martin
>>>>>
>>>>> -----Original Message-----
>>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>>> Sent: Friday, 7 October 2016 12:00
>>>>> To: Doerr, Martin<martin.doerr at sap.com>;
>>>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net;
>>>>> David Holmes (david.holmes at oracle.com)<david.holmes at oracle.com>;
>>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>>> <coleen.phillimore at oracle.com>
>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>
>>>>> Hi,
>>>>>
>>>>> after due consideration I strongly consider this change unacceptable
>>>>> since it adds footprint overhead to performance-critical compiler and
>>>>> GC code with little to no data to show that it won't cause regressions.
>>>>>
>>>>> Changes to Monitor/Mutex need to be done with more surgical precision
>>>>> than this.
>>>>>
>>>>> If I do have a veto on the matter, here it is.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> /Claes
>>>>>
>>>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>>>> Hi Dan,
>>>>>>
>>>>>> thank you very much for reviewing and for investigating the history.
>>>>>>
>>>>>> It was not intended to make the functions you mentioned public.
>>>>>> I've fixed that.
>>>>>> I also updated the copyright information.
>>>>>>
>>>>>> New webrev is here:
>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>>>
>>>>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>>>>
>>>>>> Thanks and best regards,
>>>>>> Martin
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>>> Sent: Thursday, 6 October 2016 23:13
>>>>>> To: Doerr, Martin<martin.doerr at sap.com>;
>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>>
>>>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> the current implementation of Monitor padding (mutex.cpp) assumes
>>>>>>> that cache lines are 64 Bytes. There's a platform dependent define
>>>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>>>>>>> padding is to avoid false sharing.
>>>>>>>
>>>>>>> My proposed change is here:
>>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>>>> src/share/vm/runtime/mutex.hpp
>>>>>>          Please update the copyright year before pushing.
>>>>>>
>>>>>>          L172:   // The default length of monitor name is chosen to avoid false sharing.
>>>>>>          L173:   enum {
>>>>>>          L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>>>>>>          L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>>>>>>          L176:   };
>>>>>>          L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>>
>>>>>>              I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>>>>>              can vary between platforms; I like that it is a minimum of 64 bytes
>>>>>>              and is still a constant.
>>>>>>
>>>>>>              I'm also not happy that the resulting sizeof(Monitor) may not be a
>>>>>>              multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>>>>>>              that unhappiness with the fact that sizeof(Monitor) hasn't been a
>>>>>>              multiple of the cache line size since at least 2008 and no one
>>>>>>              complained (that I know of).
>>>>>>
>>>>>>              So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
>>>>>>              (like it was) and add a pad field that would bring up sizeof(Monitor)
>>>>>>              to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
>>>>>>              unhappy with me and anyone embedding a Monitor into another data
>>>>>>              structure would be unhappy with me, but I'm used to that :-)
>>>>>>
>>>>>>              So what you have is fine, especially for JDK9.
>>>>>>
>>>>>>          L180:  public:
>>>>>>          L181: #ifndef PRODUCT
>>>>>>          L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>>>>>>          L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>>>>>>          L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>>>          L185: #endif
>>>>>>          L186:
>>>>>>          L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>>>>>>          L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>>>>>>          L189:   void check_block_state       (Thread* thread)
>>>>>>
>>>>>>              These were all "protected" before. Now they are "public".
>>>>>>              Any particular reason?
>>>>>>
>>>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>>>> answer to the "protected" versus "public" question, but don't
>>>>>> consider that query to be a blocker.
>>>>>>
>>>>>>
>>>>>> The rest of this isn't code review, but some of this caught
>>>>>> my attention.
>>>>>>
>>>>>> src/share/vm/runtime/mutex.hpp
>>>>>>
>>>>>>          old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>>>>>>          old L85: static const int MONITOR_NAME_LEN = 64;
>>>>>>
>>>>>> I had to look up the history of this comment:
>>>>>>
>>>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>>>> changeset:   55:2a8eb116ebbe
>>>>>> user:        xlu
>>>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>>>> summary:     6610420: Debug VM crashes during monitor lock rank checking
>>>>>>
>>>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>>>       // *in that order*.  If their implementations change such that these
>>>>>>       // assumptions are violated, a whole lot of code will break.
>>>>>>
>>>>>> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
>>>>>> +static const int MONITOR_NAME_LEN = 64;
>>>>>> +
>>>>>>       class Monitor : public CHeapObj {
>>>>>>
>>>>>>        public:
>>>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>>>         volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>>>         ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>>>         volatile bool     _snuck;              // Used for sneaky locking (evil).
>>>>>> -  const char * _name;                    // Name of mutex
>>>>>>         int NotifyCount ;                      // diagnostic assist
>>>>>> -  double pad [8] ;                       // avoid false sharing
>>>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>>
>>>>>>         // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>>>>>>       #ifndef PRODUCT
>>>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>>>          int  ILocked () ;
>>>>>>
>>>>>>        protected:
>>>>>> -   static void ClearMonitor (Monitor * m) ;
>>>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>>>          Monitor() ;
>>>>>>
>>>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>>>> time that Dice determined that 8-double value, the result was to pad
>>>>>> the size of Monitor to an even multiple of a particular cache line
>>>>>> size.
>>>>>>
>>>>>> Xiaobin changed the 'name' field to be an array so that the name
>>>>>> chars could serve double duty as the cache line pad... pun intended.
>>>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>>>> size is a multiple of the cache line size.
>>>>>>
>>>>>> Dan
>>>>>>
>>>>>>
>>>>>>> Please review. It will also need a sponsor.
>>>>>>>
>>>>>>> Thanks and best regards,
>>>>>>> Martin
>>>>>>>
>
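
The alternative raised in Dan's review quoted above (keep MONITOR_NAME_LEN at 64 and add a separate pad field so that the size becomes a multiple of the cache line size) could be sketched roughly as follows. This is only an illustration under stated assumptions: kCacheLineSize, MonitorFields, PadToLine, PaddedMonitorFields and check_padded_size() are hypothetical names, the field list is a small subset, and a 128-byte line is assumed.

```cpp
#include <cassert>
#include <cstddef>

// Assumption: stands in for DEFAULT_CACHE_LINE_SIZE on a 128-byte platform.
static const std::size_t kCacheLineSize = 128;
static const int MONITOR_NAME_LEN = 64;  // fixed, as before the change

// Hypothetical subset of Monitor's fields, for size arithmetic only.
struct MonitorFields {
  volatile long _LockWord;
  void*         _owner;
  char          _name[MONITOR_NAME_LEN];
};

// Number of pad bytes that rounds Size up to a multiple of Line.
// When Size is already a multiple this yields a full line rather than 0,
// because a zero-length array is not standard C++.
template <std::size_t Size, std::size_t Line>
struct PadToLine {
  static const std::size_t remainder = Size % Line;
  static const std::size_t value = remainder == 0 ? Line : Line - remainder;
};

// The fields plus an explicit pad that completes the last cache line.
struct PaddedMonitorFields : MonitorFields {
  char _pad[PadToLine<sizeof(MonitorFields), kCacheLineSize>::value];
};

// Runtime check of the size property; usable from a debug build or a test.
inline void check_padded_size() {
  assert(sizeof(PaddedMonitorFields) % kCacheLineSize == 0);
}
```

The cost of this variant is the one noted in the review: every embedded use pays for the full padded size, which is why the thread settled on padding only the globally allocated locks.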

From daniel.daugherty at oracle.com  Tue Oct 11 17:51:10 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 11 Oct 2016 11:51:10 -0600
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <57FD24E5.2080506@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
	<fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
	<4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>
	<a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>
	<1e79c5d8-5bce-2d0f-c01f-67f4b1e79bed@oracle.com>
	<57FD24E5.2080506@oracle.com>
Message-ID: <be2f580f-3ef6-00dd-9167-7fa135866934@oracle.com>

Thumbs up on this version!

Dan


On 10/11/16 11:44 AM, Claes Redestad wrote:
> I am also happy with this, thanks!
>
> /Claes
>
> On 2016-10-11 19:35, Coleen Phillimore wrote:
>>
>> I am fine with this change.  Maybe a one line comment here, something 
>> like:
>>
>> // Using Padded subclasses to prevent false sharing of these global monitors and mutexes.
>> 172 void mutex_init() {
>> 173   def(tty_lock                     , PaddedMutex  , event,       true,  Monitor::_safepoint_check_never);      // allow to lock in VM
>>
>>
>>
>> On 10/11/16 12:26 PM, Doerr, Martin wrote:
>>> Hi all,
>>>
>>> I came to the same conclusion regarding inheritance from PaddedEnd.
>>> Unfortunately, you're also right, Claes, that we had better not 
>>> use 0 as the minimal padding length because some compilers may have 
>>> trouble with 0-length arrays. I hope 1 is ok as the minimal padding 
>>> length because the new operator does not allocate cache-line-aligned 
>>> memory at the moment. So I don't see any benefit in more padding. (A 
>>> padding length of 1 byte has the advantage that it may not enlarge the 
>>> object size if the previous field leaves some space due to its type.)
>>>
>>> I believe 2 _LockWord fields on one cache line was basically the 
>>> problem we wanted to avoid.
>>>
>>> Here's a new webrev:
>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/
>>>
>>> It also enables changing the _name[] field to a pointer or a smaller 
>>> array. I guess this should better be done in a separate change 
>>> (jdk10?).
>>>
>>> Please take a look.
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Dienstag, 11. Oktober 2016 12:05
>>> To: Coleen Phillimore<coleen.phillimore at oracle.com>; Doerr, 
>>> Martin<martin.doerr at sap.com>;daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net; 
>>> David Holmes (david.holmes at oracle.com)<david.holmes at oracle.com>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to 
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> On 2016-10-11 02:03, Coleen Phillimore wrote:
>>>> Hi,
>>>>
>>>> Was the linear allocation in mutex.cpp the cause of the false sharing
>>>> that you observed?  I think I like this change better than the
>>>> original, because I've wondered myself why the name string was so
>>>> long.  So with this, we could make Monitors smaller if they're
>>>> embedded in metadata or other structures.
>>> Music to my ears!
>>>
>>> I even think most embedded uses would see improvements if _name was
>>> removed entirely (or "simply" turned into a const char * so that it's
>>> not copied and embedded into the Monitor/Mutex)
>>>
>>>> Thanks,
>>>> Coleen
>>>>
>>>> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>>>>> Hi Claes,
>>>>>
>>>>> thank you very much for your explanations.
>>>>>
>>>>> I agree with you that it would be better to pad where the Monitors
>>>>> are used. It would still fulfill the purpose of this RFE without
>>>>> disturbing other usages.
>>>>>
>>>>> So I could introduce:
>>>>> class PaddedMonitor : public Monitor {
>>>>>     enum {
>>>>>       CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE -
>>>>> (int)sizeof(Monitor),
>>>>>       PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>>>>     };
>>>>>     char _padding[PADDING_LEN];
>>>>> };
>>>>> and similarly PaddedMutex and replace all of the ones which get
>>>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).
>>> Sure!
>>>
>>> Some compilers may take issue with cases where PADDING_LEN == 0 (since
>>> char _padding[0] is technically illegal C++, but works on gcc etc) so
>>> maybe that special case will have to be (somewhat excessively):
>>>
>>> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING :
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> We looked at whether it'd be feasible to express class PaddedMonitor :
>>> public PaddedEnd<Monitor>, but it appears that'd require variadic
>>> template arguments (C++11) to get right (since we'd need PaddedEnd to
>>> transitively publish constructors of Monitor).
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>>>> Would you agree with this change?
>>>>>
>>>>> Thanks and best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>>> Sent: Freitag, 7. Oktober 2016 12:35
>>>>> To: Doerr, Martin<martin.doerr at sap.com>;
>>>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net;
>>>>> David Holmes (david.holmes at oracle.com)<david.holmes at oracle.com>;
>>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>>> <coleen.phillimore at oracle.com>
>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'm concerned that this might be an easy-but-wrong fix to a complex
>>>>> problem, and acknowledging that there are already use cases where the
>>>>> _name field is counterproductive. This change adds complexity that
>>>>> makes it even less likely such uses will be optimized for in the
>>>>> future.
>>>>>
>>>>> There are Padded* types put in place to deal with these concerns
>>>>> explicitly rather than implicitly *where it matters*, which allows us
>>>>> the choice of applying padding or not on a per use-case basis (which
>>>>> means we can also remove the _name field for those use cases that don't
>>>>> care about either, which might be most outside of the global lists).
>>>>>
>>>>> I am very concerned about false sharing, but I have no data to support
>>>>> that this change has any measurable benefit in practice: I even did an
>>>>> experiment years ago now where I turned _name into a pointer to not pad
>>>>> at all and saw nothing exceeding noise levels on any benchmark.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> /Claes
>>>>>
>>>>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>>>>> Hi Claes,
>>>>>>
>>>>>> what the change basically does is that the _name[] field gets
>>>>>> enlarged by 8 bytes on platforms with 128 byte
>>>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed
>>>>>> by the C++ compiler.
>>>>>> What exactly is your concern about the footprint overhead?
>>>>>> Are you not concerned about the risk of false sharing?
>>>>>>
>>>>>> Best regards,
>>>>>> Martin
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>>>> Sent: Freitag, 7. Oktober 2016 12:00
>>>>>> To: Doerr, Martin<martin.doerr at sap.com>;
>>>>>> daniel.daugherty at oracle.com;hotspot-runtime-dev at openjdk.java.net;
>>>>>> David Holmes (david.holmes at oracle.com)<david.holmes at oracle.com>;
>>>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>>>> <coleen.phillimore at oracle.com>
>>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> after due consideration I strongly consider this change unacceptable
>>>>>> since it adds footprint overhead to performance-critical compiler and
>>>>>> GC code with little to no data to show that this won't cause regressions.
>>>>>>
>>>>>> Changes to Monitor/Mutex need to be done with more surgical precision
>>>>>> than this.
>>>>>>
>>>>>> If I do have a veto on the matter, here it is.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> /Claes
>>>>>>
>>>>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>>>>> Hi Dan,
>>>>>>>
>>>>>>> thank you very much for reviewing and for investigating the 
>>>>>>> history.
>>>>>>>
>>>>>>> It was not intended to make the functions you mentioned public.
>>>>>>> I've fixed that.
>>>>>>> I also updated the copyright information.
>>>>>>>
>>>>>>> New webrev is here:
>>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>>>>
>>>>>>> @Coleen: Please use this one. I have also added reviewer 
>>>>>>> attribution.
>>>>>>>
>>>>>>> Thanks and best regards,
>>>>>>> Martin
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>>>> Sent: Donnerstag, 6. Oktober 2016 23:13
>>>>>>> To: Doerr, Martin<martin.doerr at sap.com>;
>>>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>>>
>>>>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> the current implementation of Monitor padding (mutex.cpp) assumes
>>>>>>>> that cache lines are 64 Bytes. There's a platform dependent define
>>>>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>>>>>>>> padding is to avoid false sharing.
>>>>>>>>
>>>>>>>> My proposed change is here:
>>>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/ 
>>>>>>>>
>>>>>>> src/share/vm/runtime/mutex.hpp
>>>>>>>          Please update the copyright year before pushing.
>>>>>>>
>>>>>>>          L172:   // The default length of monitor name is chosen to avoid false sharing.
>>>>>>>          L173:   enum {
>>>>>>>          L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>>>>>>>          L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>>>>>>>          L176:   };
>>>>>>>          L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>>>
>>>>>>>              I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>>>>>>              can vary between platforms; I like that it is a minimum of 64 bytes
>>>>>>>              and is still a constant.
>>>>>>>
>>>>>>>              I'm also not happy that the resulting sizeof(Monitor) may not be a multiple
>>>>>>>              of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate that unhappiness
>>>>>>>              with the fact that sizeof(Monitor) hasn't been a multiple of the cache line
>>>>>>>              size since at least 2008 and no one complained (that I know of).
>>>>>>>
>>>>>>>              So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
>>>>>>>              (like it was) and add a pad field that would bring up sizeof(Monitor)
>>>>>>>              to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
>>>>>>>              unhappy with me and anyone embedding a Monitor into another data
>>>>>>>              structure would be unhappy with me, but I'm used to that :-)
>>>>>>>
>>>>>>>              So what you have is fine, especially for JDK9.
>>>>>>>
>>>>>>>          L180:  public:
>>>>>>>          L181: #ifndef PRODUCT
>>>>>>>          L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>>>>>>>          L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>>>>>>>          L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>>>>          L185: #endif
>>>>>>>          L186:
>>>>>>>          L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>>>>>>>          L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>>>>>>>          L189:   void check_block_state       (Thread* thread)
>>>>>>>
>>>>>>>              These were all "protected" before. Now they are "public".
>>>>>>>              Any particular reason?
>>>>>>>
>>>>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>>>>> answer to the "protected" versus "public" question, but don't
>>>>>>> consider that query to be a blocker.
>>>>>>>
>>>>>>>
>>>>>>> The rest of this isn't code review, but some of this caught
>>>>>>> my attention.
>>>>>>>
>>>>>>> src/share/vm/runtime/mutex.hpp
>>>>>>>
>>>>>>>          old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>>>>>>>          old L85: static const int MONITOR_NAME_LEN = 64;
>>>>>>>
>>>>>>> I had to look up the history of this comment:
>>>>>>>
>>>>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>>>>> changeset:   55:2a8eb116ebbe
>>>>>>> user:        xlu
>>>>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>>>>> summary:     6610420: Debug VM crashes during monitor lock rank
>>>>>>> checking
>>>>>>>
>>>>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>>>>       // *in that order*.  If their implementations change such that these
>>>>>>>       // assumptions are violated, a whole lot of code will break.
>>>>>>>
>>>>>>> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
>>>>>>> +static const int MONITOR_NAME_LEN = 64;
>>>>>>> +
>>>>>>>       class Monitor : public CHeapObj {
>>>>>>>
>>>>>>>        public:
>>>>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>>>>         volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>>>>         ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>>>>         volatile bool     _snuck;              // Used for sneaky locking (evil).
>>>>>>> -  const char * _name;                    // Name of mutex
>>>>>>>         int NotifyCount ;                      // diagnostic assist
>>>>>>> -  double pad [8] ;                       // avoid false sharing
>>>>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>>>
>>>>>>>         // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>>>>>>>       #ifndef PRODUCT
>>>>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>>>>          int  ILocked () ;
>>>>>>>
>>>>>>>        protected:
>>>>>>> -   static void ClearMonitor (Monitor * m) ;
>>>>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>>>>          Monitor() ;
>>>>>>>
>>>>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>>>>> time that Dice determined that 8-double value, the result was to pad
>>>>>>> the size of Monitor to an even multiple of a particular cache line
>>>>>>> size.
>>>>>>>
>>>>>>> Xiobin changed the 'name' field to be an array so that the name
>>>>>>> chars could serve double duty as the cache line pad... pun intended.
>>>>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>>>>> size is a multiple of the cache line size.
>>>>>>>
>>>>>>> Dan
>>>>>>>
>>>>>>>
>>>>>>>> Please review. It will also need a sponsor.
>>>>>>>>
>>>>>>>> Thanks and best regards,
>>>>>>>> Martin
>>>>>>>>
>>


From ioi.lam at oracle.com  Tue Oct 11 20:14:42 2016
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 11 Oct 2016 13:14:42 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <57FC63AC.3020809@oracle.com>
References: <57FC63AC.3020809@oracle.com>
Message-ID: <57FD4832.9030705@oracle.com>

Looks good. Thanks

- Ioi

On 10/10/16 8:59 PM, Calvin Cheung wrote:
>
> Please review this small fix for not including classes in the 
> classlist file which are unusable during run time.
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>
> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
>
> Testing:
>     JPRT with -testset hotspot
>     jtreg tests under hotspot/runtime on all supported platforms (in 
> progress)
>
> thanks,
> Calvin


From calvin.cheung at oracle.com  Tue Oct 11 20:46:07 2016
From: calvin.cheung at oracle.com (Calvin Cheung)
Date: Tue, 11 Oct 2016 13:46:07 -0700
Subject: RFR(S): 8166931: Do not include classes which are unusable during
	run time in the classlist file
In-Reply-To: <57FD4832.9030705@oracle.com>
References: <57FC63AC.3020809@oracle.com> <57FD4832.9030705@oracle.com>
Message-ID: <57FD4F8F.1090606@oracle.com>

Ioi,
Thanks for your review!
Calvin

On 10/11/16, 1:14 PM, Ioi Lam wrote:
> Looks good. Thanks
>
> - Ioi
>
> On 10/10/16 8:59 PM, Calvin Cheung wrote:
>>
>> Please review this small fix for not including classes in the 
>> classlist file which are unusable during run time.
>>
>> bug: https://bugs.openjdk.java.net/browse/JDK-8166931
>>
>> webrev: http://cr.openjdk.java.net/~ccheung/8166931/webrev.00/
>>
>> Testing:
>>     JPRT with -testset hotspot
>>     jtreg tests under hotspot/runtime on all supported platforms (in 
>> progress)
>>
>> thanks,
>> Calvin
>

From david.holmes at oracle.com  Tue Oct 11 21:11:24 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 12 Oct 2016 07:11:24 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <cc1202f6-24d6-b996-f5ef-5ef35cef3aaf@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<cc1202f6-24d6-b996-f5ef-5ef35cef3aaf@oracle.com>
Message-ID: <c42e58fa-c952-c3c5-817e-f049200d3dff@oracle.com>

Hi Dan,

Thanks for looking at this.

On 12/10/2016 3:30 AM, Daniel D. Daugherty wrote:
> On 10/10/16 7:55 PM, David Holmes wrote:
>> Turns out the only place changes were needed were in JDI.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>
>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>
> src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java
>     No comments. (Thanks for also fixing the typo.)
>
> src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java
>     L352:         if (isNonVirtual(options)) {
>     L353:             if (method.isAbstract()) {
>     L354:                 throw new IllegalArgumentException("Abstract method");
>     L355:             }
>         Any particular reason for breaking the logic into two
>         distinct if-statements?
>
>         Perhaps:
>
>                   if (isNonVirtual(options) && method.isAbstract()) {
>                       throw new IllegalArgumentException("Abstract method");
>                   }
>
>         Also, perhaps "unexpected Abstract method" is more clear?

:) I tried to forestall this comment by saying "use the same format as 
already present in the class version of the check method. " The code I 
have now is a copy of the existing check:

  312         /*
  313          * For nonvirtual invokes, method must have a body
  314          */
  315         if (isNonVirtual(options)) {
  316             if (method.isAbstract()) {
  317                 throw new IllegalArgumentException("Abstract method");
  318             }
  319         }

While I personally prefer the conjunctive form I went for consistency 
whilst minimizing changes.

> test/com/sun/jdi/InterfaceMethodsTest.java
>     L526:             if (t.getClass() != expectedException) {
>     L527:                 System.err.println("--- FAILED");
>     L528:                 failure("FAILED: " + t);
>     L529:                 return null;
>     L530:             }
>         You should also report the expectedException value here to
>         aid in failure analysis.

Good point - will update.

> Thumbs up! I don't need to see another webrev if you decide to
> make the above small tweaks.

Great - thanks again.

David

> Dan
>
>
>>
>> The spec change in ObjectReference is very simple and there is a CCC
>> request in progress to ratify that change.
>>
>> The implementation change in ObjectReferenceImpl mirrors the updated
>> spec and use the same format as already present in the class version
>> of the check method.
>>
>> The test is a little more complex. This is obviously an extension to
>> what is already tested in InterfaceMethodsTest. However IMT has a
>> number of problems with the way it is currently written [1] -
>> specifically it doesn't properly separate method lookup from method
>> invocation. So I've added the capability to separate lookup and
>> invocation for use with the private interface methods - I have not
>> tried to address shortcomings of the existing tests. Though I did fix
>> the return value checking logic! And did some clarifying comments and
>> renaming in a couple of places.
>>
>> Still on the test I can't add the negative tests I would like to add
>> because they actually pass due to a different long standing bug in JDI
>> - [2]. So the actual private interface method testing is very simple:
>> can I get the Method from the InterfaceType for the interface
>> declaring the method? Can I then invoke that method on an instance of
>> a class that implements the interface.
>>
>> Thanks,
>> David
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>

From mandy.chung at oracle.com  Tue Oct 11 21:14:52 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Tue, 11 Oct 2016 14:14:52 -0700
Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java needs
	update for JDK-8162401
Message-ID: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com>

Harold,

Can you review this test update:

diff --git a/test/runtime/modules/IgnoreModulePropertiesTest.java b/test/runtime/modules/IgnoreModulePropertiesTest.java
--- a/test/runtime/modules/IgnoreModulePropertiesTest.java
+++ b/test/runtime/modules/IgnoreModulePropertiesTest.java
@@ -69,8 +69,9 @@
     public static void main(String[] args) throws Exception {
         testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException");
         testOption("--limit-modules", "java.sqlx", "jdk.module.limitmods", "java.lang.module.ResolutionException");
-        testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "java.lang.RuntimeException");
-        testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0", "java.lang.RuntimeException");
+        testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "WARNING: Unknown module: xyzz");
+        testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0",
+                   "WARNING: package xyzz not in java.base");
         testOption("--patch-module", "=d", "jdk.module.patch.0", "IllegalArgumentException");
     }
 }


--add-modules is now a repeating option.  Should this line:
  testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException");

be changed to use "jdk.module.addmods.0", as with the addreads and addexports properties?

Mandy

From daniel.daugherty at oracle.com  Tue Oct 11 21:27:52 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 11 Oct 2016 15:27:52 -0600
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <c42e58fa-c952-c3c5-817e-f049200d3dff@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<cc1202f6-24d6-b996-f5ef-5ef35cef3aaf@oracle.com>
	<c42e58fa-c952-c3c5-817e-f049200d3dff@oracle.com>
Message-ID: <765b6111-54c6-01b6-7288-6fd4a1578afb@oracle.com>

On 10/11/16 3:11 PM, David Holmes wrote:
> Hi Dan,
>
> Thanks for looking at this.
>
> On 12/10/2016 3:30 AM, Daniel D. Daugherty wrote:
>> On 10/10/16 7:55 PM, David Holmes wrote:
>>> Turns out the only place changes were needed were in JDI.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>
>> src/jdk.jdi/share/classes/com/sun/jdi/ObjectReference.java
>>     No comments. (Thanks for also fixing the typo.)
>>
>> src/jdk.jdi/share/classes/com/sun/tools/jdi/ObjectReferenceImpl.java
>>     L352:         if (isNonVirtual(options)) {
>>     L353:             if (method.isAbstract()) {
>>     L354:                 throw new IllegalArgumentException("Abstract method");
>>     L355:             }
>>         Any particular reason for breaking the logic into two
>>         distinct if-statements?
>>
>>         Perhaps:
>>
>>                   if (isNonVirtual(options) && method.isAbstract()) {
>>                       throw new IllegalArgumentException("Abstract method");
>>                   }
>>
>>         Also, perhaps "unexpected Abstract method" is more clear?
>
> :) I tried to forestall this comment by saying "use the same format as 
> already present in the class version of the check method. " The code I 
> have now is a copy of what is prior:
>
>  312         /*
>  313          * For nonvirtual invokes, method must have a body
>  314          */
>  315         if (isNonVirtual(options)) {
>  316             if (method.isAbstract()) {
>  317                 throw new IllegalArgumentException("Abstract method");
>  318             }
>  319         }
>
> While I personally prefer the conjunctive form I went for consistency 
> whilst minimizing changes.

I'm okay with your choice.

Dan

>
>> test/com/sun/jdi/InterfaceMethodsTest.java
>>     L526:             if (t.getClass() != expectedException) {
>>     L527:                 System.err.println("--- FAILED");
>>     L528:                 failure("FAILED: " + t);
>>     L529:                 return null;
>>     L530:             }
>>         You should also report the expectedException value here to
>>         aid in failure analysis.
>
> Good point - will update.
>
>> Thumbs up! I don't need to see another webrev if you decide to
>> make the above small tweaks.
>
> Great - thanks again.
>
> David
>
>> Dan
>>
>>
>>>
>>> The spec change in ObjectReference is very simple and there is a CCC
>>> request in progress to ratify that change.
>>>
>>> The implementation change in ObjectReferenceImpl mirrors the updated
>>> spec and use the same format as already present in the class version
>>> of the check method.
>>>
>>> The test is a little more complex. This is obviously an extension to
>>> what is already tested in InterfaceMethodsTest. However IMT has a
>>> number of problems with the way it is currently written [1] -
>>> specifically it doesn't properly separate method lookup from method
>>> invocation. So I've added the capability to separate lookup and
>>> invocation for use with the private interface methods - I have not
>>> tried to address shortcomings of the existing tests. Though I did fix
>>> the return value checking logic! And did some clarifying comments and
>>> renaming in a couple of places.
>>>
>>> Still on the test I can't add the negative tests I would like to add
>>> because they actually pass due to a different long standing bug in JDI
>>> - [2]. So the actual private interface method testing is very simple:
>>> can I get the Method from the InterfaceType for the interface
>>> declaring the method? Can I then invoke that method on an instance of
>>> a class that implements the interface.
>>>
>>> Thanks,
>>> David
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>>


From david.holmes at oracle.com  Wed Oct 12 02:21:56 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 12 Oct 2016 12:21:56 +1000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
	<fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
	<4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>
	<a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>
Message-ID: <10c46800-15d1-fedc-f64f-b8a85e9ef635@oracle.com>

Looks good to me too! My only comment: do we want to change this comment:

   84 // The default length of monitor name is chosen to be 64 to avoid false sharing.
   85 static const int MONITOR_NAME_LEN = 64;

and do we even want to change the value here?

Thanks,
David
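
For readers following the thread, the padded-subclass idea with a minimum
padding length of 1 (as described in the quoted messages below) can be
sketched roughly as follows. This is not the code from the webrev:
MonitorSketch, PaddedMonitorSketch, padding_len, kCacheLineSize and
kMonitorNameLen are hypothetical stand-ins for Monitor, PaddedMonitor, the
enum computation, DEFAULT_CACHE_LINE_SIZE and MONITOR_NAME_LEN, and the
64-byte line size is only an assumption.

```cpp
#include <cstddef>

// Hypothetical stand-ins; these are NOT the HotSpot definitions.
constexpr std::size_t kCacheLineSize  = 64;  // plays the role of DEFAULT_CACHE_LINE_SIZE
constexpr std::size_t kMonitorNameLen = 64;  // plays the role of MONITOR_NAME_LEN

// Simplified monitor: one lock word followed by an embedded name.
struct MonitorSketch {
  volatile long lock_word;           // stand-in for _LockWord
  char          name[kMonitorNameLen];
};

// Bytes needed to bring an object up to one full cache line, but never
// fewer than 1, because a zero-length array member is not valid standard C++.
constexpr std::size_t padding_len(std::size_t object_size, std::size_t line_size) {
  return object_size < line_size ? line_size - object_size : 1;
}

// Padding lives in the subclass, so only users that ask for the padded
// type pay for it; the plain MonitorSketch stays as small as it can be.
struct PaddedMonitorSketch : MonitorSketch {
  char padding[padding_len(sizeof(MonitorSketch), kCacheLineSize)];
};

// If every element is at least one cache line long, the lock words of two
// neighbours in a linear allocation cannot land on the same cache line.
static_assert(sizeof(PaddedMonitorSketch) >= kCacheLineSize,
              "adjacent lock words must be at least one cache line apart");
```

In this sketch the distance between neighbouring lock words no longer
depends on the name length, which is why the wording of the
MONITOR_NAME_LEN comment (and possibly its value) becomes a separate
question.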

On 12/10/2016 2:26 AM, Doerr, Martin wrote:
> Hi all,
>
> I came to the same conclusion regarding inheritance from PaddedEnd.
> Unfortunately, you're also right, Claes, that we had better not use 0 as the minimal padding length because some compilers may have trouble with 0-length arrays. I hope 1 is ok as the minimal padding length because the new operator does not allocate cache-line-aligned memory at the moment. So I don't see any benefit in more padding. (A padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.)
>
> I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid.
>
> Here's a new webrev:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/
>
> It also enables changing the _name[] field to a pointer or a smaller array. I guess this should better be done in a separate change (jdk10?).
>
> Please take a look.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Claes Redestad [mailto:claes.redestad at oracle.com]
> Sent: Dienstag, 11. Oktober 2016 12:05
> To: Coleen Phillimore <coleen.phillimore at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> Hi,
>
> On 2016-10-11 02:03, Coleen Phillimore wrote:
>>
>> Hi,
>>
>> Was the linear allocation in mutex.cpp the cause of the false sharing
>> that you observed?  I think I like this change better than the
>> original, because I've wondered myself why the name string was so
>> long.  So with this, we could make Monitors smaller if they're
>> embedded in metadata or other structures.
>
> Music to my ears!
>
> I even think most embedded uses would see improvements if _name was
> removed entirely (or "simply" turned into a const char * so that it's
> not copied and embedded into the Monitor/Mutex)
>
>>
>> Thanks,
>> Coleen
>>
>> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>>> Hi Claes,
>>>
>>> thank you very much for your explanations.
>>>
>>> I agree with you that it would be better to pad where the Monitors
>>> are used. It would still fulfill the purpose of this RFE without
>>> disturbing other usages.
>>>
>>> So I could introduce:
>>> class PaddedMonitor : public Monitor {
>>>    enum {
>>>      CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE -
>>> (int)sizeof(Monitor),
>>>      PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>>    };
>>>    char _padding[PADDING_LEN];
>>> };
>>> and similarly PaddedMutex and replace all of the ones which get
>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).
>
> Sure!
>
> Some compilers may take issue with cases where PADDING_LEN == 0 (since
> char _padding[0] is technically illegal C++, but works on gcc etc) so
> maybe that special case will have to be (somewhat excessively):
>
> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING :
> DEFAULT_CACHE_LINE_SIZE
>
> We looked at whether it'd be feasible to express class PaddedMonitor :
> public PaddedEnd<Monitor>, but it appears that'd require variadic
> template arguments (C++11) to get right (since we'd need PaddedEnd to
> transitively publish constructors of Monitor).
>
> Thanks!
>
> /Claes
>
>>>
>>> Would you agree with this change?
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Freitag, 7. Oktober 2016 12:35
>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>;
>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>> <coleen.phillimore at oracle.com>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> I'm concerned that this might be an easy-but-wrong fix to a complex
>>> problem, and acknowledging that there are already use cases where the
>>> _name field is counterproductive. This change adds complexity that
>>> makes it even less likely such uses will be optimized for in the
>>> future.
>>>
>>> There are Padded* types put in place to deal with these concerns
>>> explicitly rather than implicitly *where it matters*, which allows us
>>> the choice of applying padding or not on a per use-case basis (which
>>> means we can also remove the _name field for those use cases that don't
>>> care about either, which might be most outside of the global lists).
>>>
>>> I am very concerned about false sharing, but I have no data to support
>>> that this change has any measurable benefit in practice: I even did an
>>> experiment years ago now where I turned _name into a pointer to not pad
>>> at all and saw nothing exceeding noise levels on any benchmark.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>>> Hi Claes,
>>>>
>>>> what the change basically does is that the _name[] field gets
>>>> enlarged by 8 bytes on platforms with 128 byte
>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed
>>>> by the C++ compiler.
>>>> What exactly is your concern about the footprint overhead?
>>>> Are you not concerned about the risk of false sharing?
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>> -----Original Message-----
>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>> Sent: Freitag, 7. Oktober 2016 12:00
>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>;
>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>> <coleen.phillimore at oracle.com>
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> Hi,
>>>>
>>>> after due consideration I strongly consider this change unacceptable
>>>> since it adds footprint overhead to performance critical compiler and
>>>> GC code with little to no data to support this won't cause regressions.
>>>>
>>>> Changes to Monitor/Mutex need to be done with more surgical precision
>>>> than this.
>>>>
>>>> If I do have a veto on the matter, here it is.
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>>
>>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>>> Hi Dan,
>>>>>
>>>>> thank you very much for reviewing and for investigating the history.
>>>>>
>>>>> It was not intended to make the functions you mentioned public.
>>>>> I've fixed that.
>>>>> I also updated the copyright information.
>>>>>
>>>>> New webrev is here:
>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>>
>>>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>>>
>>>>> Thanks and best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>> Sent: Donnerstag, 6. Oktober 2016 23:13
>>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>
>>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> the current implementation of Monitor padding (mutex.cpp) assumes
>>>>>> that cache lines are 64 Bytes. There's a platform dependent define
>>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>>>>>> padding is to avoid false sharing.
>>>>>>
>>>>>> My proposed change is here:
>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>>> src/share/vm/runtime/mutex.hpp
>>>>>         Please update the copyright year before pushing.
>>>>>
>>>>>         L172:   // The default length of monitor name is chosen to avoid false sharing.
>>>>>         L173:   enum {
>>>>>         L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>>>>>         L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>>>>>         L176:   };
>>>>>         L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>
>>>>>             I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>>>>             can vary between platforms; I like that it is a minimum of 64 bytes
>>>>>             and is still a constant.
>>>>>
>>>>>             I'm also not happy that the resulting sizeof(Monitor) may not be a multiple
>>>>>             of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate that unhappiness
>>>>>             with the fact that sizeof(Monitor) hasn't been a multiple of the cache line
>>>>>             size since at least 2008 and no one complained (that I know of).
>>>>>
>>>>>             So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
>>>>>             (like it was) and add a pad field that would bring up sizeof(Monitor)
>>>>>             to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
>>>>>             unhappy with me and anyone embedding a Monitor into another data
>>>>>             structure would be unhappy with me, but I'm used to that :-)
>>>>>
>>>>>             So what you have is fine, especially for JDK9.
>>>>>
>>>>>         L180:  public:
>>>>>         L181: #ifndef PRODUCT
>>>>>         L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>>>>>         L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>>>>>         L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>>         L185: #endif
>>>>>         L186:
>>>>>         L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>>>>>         L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>>>>>         L189:   void check_block_state       (Thread* thread)
>>>>>
>>>>>             These were all "protected" before. Now they are "public".
>>>>>             Any particular reason?
>>>>>
>>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>>> answer to the "protected" versus "public" question, but don't
>>>>> consider that query to be a blocker.
>>>>>
>>>>>
>>>>> The rest of this isn't code review, but some of this caught
>>>>> my attention.
>>>>>
>>>>> src/share/vm/runtime/mutex.hpp
>>>>>
>>>>>         old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>>>>>         old L85: static const int MONITOR_NAME_LEN = 64;
>>>>>
>>>>> I had to look up the history of this comment:
>>>>>
>>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>>> changeset:   55:2a8eb116ebbe
>>>>> user:        xlu
>>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>>> summary:     6610420: Debug VM crashes during monitor lock rank
>>>>> checking
>>>>>
>>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>>      // *in that order*.  If their implementations change such that these
>>>>>      // assumptions are violated, a whole lot of code will break.
>>>>>
>>>>> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
>>>>> +static const int MONITOR_NAME_LEN = 64;
>>>>> +
>>>>>      class Monitor : public CHeapObj {
>>>>>
>>>>>       public:
>>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>>        volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>>        ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>>        volatile bool     _snuck;              // Used for sneaky locking (evil).
>>>>> -  const char * _name;                    // Name of mutex
>>>>>        int NotifyCount ;                      // diagnostic assist
>>>>> -  double pad [8] ;                       // avoid false sharing
>>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>
>>>>>        // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>>>>>      #ifndef PRODUCT
>>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>>         int  ILocked () ;
>>>>>
>>>>>       protected:
>>>>> -   static void ClearMonitor (Monitor * m) ;
>>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>>         Monitor() ;
>>>>>
>>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>>> time that Dice determined that 8-double value, the result was to pad
>>>>> the size of Monitor to an even multiple of a particular cache line
>>>>> size.
>>>>>
>>>>> Xiaobin changed the 'name' field to be an array so that the name
>>>>> chars could serve double duty as the cache line pad... pun intended.
>>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>>> size is a multiple of the cache line size.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>> Please review. It will also need a sponsor.
>>>>>>
>>>>>> Thanks and best regards,
>>>>>> Martin
>>>>>>
>>
>
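
For reference, here is a minimal, self-contained sketch of the PaddedMonitor idea quoted above, using the non-zero fallback Claes suggests for the case where the object already fills a cache line. Everything in it is an assumption made for illustration: Monitor is a simplified stand-in for the real class, kCacheLineSize is an assumed value standing in for DEFAULT_CACHE_LINE_SIZE, and padding_len is a hypothetical helper. It also uses C++11 constexpr for brevity, whereas the version in the thread uses an enum.

```cpp
#include <cassert>
#include <cstddef>

// Assumed value standing in for DEFAULT_CACHE_LINE_SIZE (illustration only).
constexpr int kCacheLineSize = 128;

// Padding rule discussed in the thread: pad the object up to one cache line.
// If the object already fills or exceeds a line, fall back to a full extra
// line so that the padding array never has length zero.
constexpr int padding_len(int object_size, int line_size) {
  return (line_size - object_size) > 0 ? (line_size - object_size) : line_size;
}

// Simplified stand-in for Monitor; the real class has many more fields.
struct Monitor {
  void* _owner;
  char  _name[64];
};

// Hypothetical padded variant, intended only for monitors that are allocated
// next to each other (for example the global list set up in mutex_init()).
struct PaddedMonitor : public Monitor {
  char _padding[padding_len((int)sizeof(Monitor), kCacheLineSize)];
};

static_assert(sizeof(PaddedMonitor) >= (std::size_t)kCacheLineSize,
              "PaddedMonitor should span at least one cache line");
```

Note that this only guarantees a minimum size of one cache line. It does not make sizeof(PaddedMonitor) a multiple of the line size once Monitor itself grows beyond one line, which is the property Dan asks about further down.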

From serguei.spitsyn at oracle.com  Wed Oct 12 02:37:52 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 11 Oct 2016 19:37:52 -0700
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
Message-ID: <c3bda9ce-13a4-f696-ce6d-1eaf7ee831f3@oracle.com>

Hi David,

It looks good, thank you for test improvements.

One minor comment.

http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html


511 private Method testLookup(ReferenceType targetClass, String 
methodName, String methodSig,
512 boolean declaredOnly, Class<?> expectedException) {
513
514 System.err.println("Looking up " + targetClass.name() + "." + 
methodName + methodSig);
515 try {
516 Method m = declaredOnly ?
517 lookupDeclaredMethod(targetClass, methodName, methodSig) :
518 lookupMethod(targetClass, methodName, methodSig);
519
520 if (expectedException == null) {
521 System.err.println("--- PASSED");
522 return m;
523 }
524 }
525 catch (Throwable t) {
526 if (t.getClass() != expectedException) {
527 System.err.println("--- FAILED");
528 failure("FAILED: got exception " + t + " but expected exception "
529 + expectedException.getSimpleName());
530 return null;
531 }
532 else {
533 System.err.println("--- PASSED");
534 return null;
535 }
536 }
537 System.err.println("--- FAILED");
538 failure("FAILED: lookup succeeded but expected exception "
539 + expectedException.getSimpleName());
540 return null;
541 }

   It'd be better to keep the fragments 520-523 and 537-540 together as
they are logically bound.
   Perhaps it is better to move 520-523 to just before L537.
   There are more cases to use the testLookup() in this test but it is 
probably for future improvements.


Thanks,
Serguei


On 10/10/16 18:55, David Holmes wrote:
> Turns out the only place changes were needed were in JDI.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>
> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>
> The spec change in ObjectReference is very simple and there is a CCC 
> request in progress to ratify that change.
>
> The implementation change in ObjectReferenceImpl mirrors the updated
> spec and uses the same format as already present in the class version
> of the check method.
>
> The test is a little more complex. This is obviously an extension to
> what is already tested in InterfaceMethodsTest. However IMT has a
> number of problems with the way it is currently written [1] -
> specifically it doesn't properly separate method lookup from method
> invocation. So I've added the capability to separate lookup and
> invocation for use with the private interface methods - I have not
> tried to address shortcomings of the existing tests. Though I did fix
> the return value checking logic! And did some clarifying comments and
> renaming in a couple of places.
>
> Still on the test I can't add the negative tests I would like to add
> because they actually pass due to a different long standing bug in JDI
> - [2]. So the actual private interface method testing is very simple:
> can I get the Method from the InterfaceType for the interface
> declaring the method? Can I then invoke that method on an instance of
> a class that implements the interface?
>
> Thanks,
> David
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
> [2] https://bugs.openjdk.java.net/browse/JDK-8167416


From david.holmes at oracle.com  Wed Oct 12 02:50:23 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 12 Oct 2016 12:50:23 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <c3bda9ce-13a4-f696-ce6d-1eaf7ee831f3@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<c3bda9ce-13a4-f696-ce6d-1eaf7ee831f3@oracle.com>
Message-ID: <25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com>

Hi Serguei,

Thanks for looking at this.

On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote:
> Hi David,
>
> It looks good, thank you for test improvements.
>
> One minor comment.
>
> http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html
>
> 511 private Method testLookup(ReferenceType targetClass, String
> methodName, String methodSig,
> 512 boolean declaredOnly, Class<?> expectedException) {
> 513
> 514 System.err.println("Looking up " + targetClass.name() + "." +
> methodName + methodSig);
> 515 try {
> 516 Method m = declaredOnly ?
> 517 lookupDeclaredMethod(targetClass, methodName, methodSig) :
> 518 lookupMethod(targetClass, methodName, methodSig);
> 519
> 520 if (expectedException == null) {
> 521 System.err.println("--- PASSED");
> 522 return m;
> 523 }
> 524 }
> 525 catch (Throwable t) {
> 526 if (t.getClass() != expectedException) {
> 527 System.err.println("--- FAILED");
> 528 failure("FAILED: got exception " + t + " but expected exception "
> 529 + expectedException.getSimpleName());
> 530 return null;
> 531 }
> 532 else {
> 533 System.err.println("--- PASSED");
> 534 return null;
> 535 }
> 536 }
> 537 System.err.println("--- FAILED");
> 538 failure("FAILED: lookup succeeded but expected exception "
> 539 + expectedException.getSimpleName());
> 540 return null;
> 541 }
>
>   It'd be better to keep the fragments 520-523 and 537-540 together as
> they are logically bound.
>   Perhaps it is better to move 520-523 to just before L537.

You're right - but I prefer to move the code from L537 into an else for 
the if at L520. Webrev updated in place.

>   There are more cases to use the testLookup() in this test but it is
> probably for future improvements.

Yes - see the bugs I linked as [1] and [2]. There are even more bugs 
related to static interface method handling that impact this test. Bit 
of a can-of-worms.

Thanks,
David
-----

>
> Thanks,
> Serguei
>
>
>
> On 10/10/16 18:55, David Holmes wrote:
>> Turns out the only place changes were needed were in JDI.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>
>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>
>> The spec change in ObjectReference is very simple and there is a CCC
>> request in progress to ratify that change.
>>
>> The implementation change in ObjectReferenceImpl mirrors the updated
>> spec and uses the same format as already present in the class version
>> of the check method.
>>
>> The test is a little more complex. This is obviously an extension to
>> what is already tested in InterfaceMethodsTest. However IMT has a
>> number of problems with the way it is currently written [1] -
>> specifically it doesn't properly separate method lookup from method
>> invocation. So I've added the capability to separate lookup and
>> invocation for use with the private interface methods - I have not
>> tried to address shortcomings of the existing tests. Though I did fix
>> the return value checking logic! And did some clarifying comments and
>> renaming in a couple of places.
>>
>> Still on the test I can't add the negative tests I would like to add
>> because they actually pass due to a different long standing bug in JDI
>> - [2]. So the actual private interface method testing is very simple:
>> can I get the Method from the InterfaceType for the interface
>> declaring the method? Can I then invoke that method on an instance of
>> a class that implements the interface?
>>
>> Thanks,
>> David
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>
>

From serguei.spitsyn at oracle.com  Wed Oct 12 03:02:22 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 11 Oct 2016 20:02:22 -0700
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<c3bda9ce-13a4-f696-ce6d-1eaf7ee831f3@oracle.com>
	<25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com>
Message-ID: <cdb51aed-4111-af52-bbeb-d124ac6b7b63@oracle.com>

On 10/11/16 19:50, David Holmes wrote:
> Hi Serguei,
>
> Thanks for looking at this.
>
> On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote:
>> Hi David,
>>
>> It looks good, thank you for test improvements.
>>
>> One minor comment.
>>
>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html 
>>
>>
>> 511 private Method testLookup(ReferenceType targetClass, String
>> methodName, String methodSig,
>> 512 boolean declaredOnly, Class<?> expectedException) {
>> 513
>> 514 System.err.println("Looking up " + targetClass.name() + "." +
>> methodName + methodSig);
>> 515 try {
>> 516 Method m = declaredOnly ?
>> 517 lookupDeclaredMethod(targetClass, methodName, methodSig) :
>> 518 lookupMethod(targetClass, methodName, methodSig);
>> 519
>> 520 if (expectedException == null) {
>> 521 System.err.println("--- PASSED");
>> 522 return m;
>> 523 }
>> 524 }
>> 525 catch (Throwable t) {
>> 526 if (t.getClass() != expectedException) {
>> 527 System.err.println("--- FAILED");
>> 528 failure("FAILED: got exception " + t + " but expected exception "
>> 529 + expectedException.getSimpleName());
>> 530 return null;
>> 531 }
>> 532 else {
>> 533 System.err.println("--- PASSED");
>> 534 return null;
>> 535 }
>> 536 }
>> 537 System.err.println("--- FAILED");
>> 538 failure("FAILED: lookup succeeded but expected exception "
>> 539 + expectedException.getSimpleName());
>> 540 return null;
>> 541 }
>>
>>   It'd be better to keep the fragments 520-523 and 537-540 together as
>> they are logically bound.
>>   Perhaps it is better to move 520-523 to just before L537.
>
> You're right - but I prefer to move the code from L537 into an else 
> for the if at L520. Webrev updated in place.

It's up to you.


>
>>   There are more cases to use the testLookup() in this test but it is
>> probably for future improvements.
>
> Yes - see the bugs I linked as [1] and [2].

Right. Perhaps it is a part of JDK-8166453.

>
> There are even more bugs related to static interface method handling 
> that impact this test. Bit of a can-of-worms.


   BTW, would it make sense to consider one more test case ?

   private void testImplementationClass(ReferenceType targetClass, ObjectReference thisObject) {
       . . .

       testInvokeNeg(targetClass,thisObject, "privateMethodB", "()I", vm().mirrorOf(RESULT_B),
               "private interface methods are not inheritable");

Thanks,
Serguei

> Thanks,
> David
> -----
>
>> Thanks,
>> Serguei
>>
>> On 10/10/16 18:55, David Holmes wrote:
>>> Turns out the only place changes were needed were in JDI.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>>
>>> The spec change in ObjectReference is very simple and there is a CCC
>>> request in progress to ratify that change.
>>>
>>> The implementation change in ObjectReferenceImpl mirrors the updated
>>> spec and uses the same format as already present in the class version
>>> of the check method.
>>>
>>> The test is a little more complex. This is obviously an extension to
>>> what is already tested in InterfaceMethodsTest. However IMT has a
>>> number of problems with the way it is currently written [1] -
>>> specifically it doesn't properly separate method lookup from method
>>> invocation. So I've added the capability to separate lookup and
>>> invocation for use with the private interface methods - I have not
>>> tried to address shortcomings of the existing tests. Though I did
>>> fix the return value checking logic! And did some clarifying
>>> comments and renaming in a couple of places.
>>>
>>> Still on the test I can't add the negative tests I would like to add
>>> because they actually pass due to a different long standing bug in
>>> JDI - [2]. So the actual private interface method testing is very
>>> simple: can I get the Method from the InterfaceType for the interface
>>> declaring the method? Can I then invoke that method on an instance of
>>> a class that implements the interface?
>>>
>>> Thanks,
>>> David
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416


From david.holmes at oracle.com  Wed Oct 12 03:28:37 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 12 Oct 2016 13:28:37 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <cdb51aed-4111-af52-bbeb-d124ac6b7b63@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<c3bda9ce-13a4-f696-ce6d-1eaf7ee831f3@oracle.com>
	<25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com>
	<cdb51aed-4111-af52-bbeb-d124ac6b7b63@oracle.com>
Message-ID: <b76d28d2-d78b-0309-2078-cf833a233bc8@oracle.com>

On 12/10/2016 1:02 PM, serguei.spitsyn at oracle.com wrote:
> On 10/11/16 19:50, David Holmes wrote:
>> Hi Serguei,
>>
>> Thanks for looking at this.
>>
>> On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote:
>>> Hi David,
>>>
>>> It looks good, thank you for test improvements.
>>>
>>> One minor comment.
>>>
>>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html
>>>
>>>
>>> 511 private Method testLookup(ReferenceType targetClass, String
>>> methodName, String methodSig,
>>> 512 boolean declaredOnly, Class<?> expectedException) {
>>> 513
>>> 514 System.err.println("Looking up " + targetClass.name() + "." +
>>> methodName + methodSig);
>>> 515 try {
>>> 516 Method m = declaredOnly ?
>>> 517 lookupDeclaredMethod(targetClass, methodName, methodSig) :
>>> 518 lookupMethod(targetClass, methodName, methodSig);
>>> 519
>>> 520 if (expectedException == null) {
>>> 521 System.err.println("--- PASSED");
>>> 522 return m;
>>> 523 }
>>> 524 }
>>> 525 catch (Throwable t) {
>>> 526 if (t.getClass() != expectedException) {
>>> 527 System.err.println("--- FAILED");
>>> 528 failure("FAILED: got exception " + t + " but expected exception "
>>> 529 + expectedException.getSimpleName());
>>> 530 return null;
>>> 531 }
>>> 532 else {
>>> 533 System.err.println("--- PASSED");
>>> 534 return null;
>>> 535 }
>>> 536 }
>>> 537 System.err.println("--- FAILED");
>>> 538 failure("FAILED: lookup succeeded but expected exception "
>>> 539 + expectedException.getSimpleName());
>>> 540 return null;
>>> 541 }
>>>
>>>   It'd be better to keep the fragments 520-523 and 537-540 together as
>>> they are logically bound.
>>>   Perhaps it is better to move 520-523 to just before L537.
>>
>> You're right - but I prefer to move the code from L537 into an else
>> for the if at L520. Webrev updated in place.
>
> It's up to you.
>
>
>>
>>>   There are more cases to use the testLookup() in this test but it is
>>> probably for future improvements.
>>
>> Yes - see the bugs I linked as [1] and [2].
>
> Right. Perhaps it is a part of JDK-8166453.
>
>>
>> There are even more bugs related to static interface method handling
>> that impact this test. Bit of a can-of-worms.
>
>
>
>
>   BTW, would it make sense to consider one more test case ?
>
>   private void testImplementationClass(ReferenceType targetClass,
> ObjectReference thisObject) {
>       . . .
>
>       testInvokeNeg(targetClass,thisObject, "privateMethodB", "()I",
> vm().mirrorOf(RESULT_B),
>               "private interface methods are not inheritable");

Such a test will presently fail. It will do a lookup of 
privateInterfaceMethodB from targetClass, which will succeed because of 
the way the local getMethods is implemented. The invocation will then be 
successful. The real test for the above would be a lookup of the private 
interface method in the implementation class, but that will succeed when 
it should not because of bug [2].

Within the current constraints of the test and the JDI implementation 
only the simple positive tests for private interface methods are possible.

Thanks,
David

> Thanks,
> Serguei
>
>> Thanks,
>> David
>> -----
>>
>>> Thanks,
>>> Serguei
>>>
>>> On 10/10/16 18:55, David Holmes wrote:
>>>> Turns out the only place changes were needed were in JDI.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>>>
>>>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>>>
>>>> The spec change in ObjectReference is very simple and there is a CCC
>>>> request in progress to ratify that change.
>>>>
>>>> The implementation change in ObjectReferenceImpl mirrors the updated
>>>> spec and uses the same format as already present in the class version
>>>> of the check method.
>>>>
>>>> The test is a little more complex. This is obviously an extension to
>>>> what is already tested in InterfaceMethodsTest. However IMT has a
>>>> number of problems with the way it is currently written [1] -
>>>> specifically it doesn't properly separate method lookup from method
>>>> invocation. So I've added the capability to separate lookup and
>>>> invocation for use with the private interface methods - I have not
>>>> tried to address shortcomings of the existing tests. Though I did
>>>> fix the return value checking logic! And did some clarifying
>>>> comments and renaming in a couple of places.
>>>>
>>>> Still on the test I can't add the negative tests I would like to add
>>>> because they actually pass due to a different long standing bug in
>>>> JDI - [2]. So the actual private interface method testing is very
>>>> simple: can I get the Method from the InterfaceType for the interface
>>>> declaring the method? Can I then invoke that method on an instance of
>>>> a class that implements the interface?
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>

From ioi.lam at oracle.com  Wed Oct 12 04:47:48 2016
From: ioi.lam at oracle.com (Ioi Lam)
Date: Tue, 11 Oct 2016 21:47:48 -0700
Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if class
	is in_error_state
Message-ID: <57FDC074.1070900@oracle.com>

https://bugs.openjdk.java.net/browse/JDK-8166203
http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/

Summary:

Kudos to Coleen for noticing the bug.

When dumping the CDS archive, we would throw NoClassDefFoundError inside 
InstanceKlass::link_class_impl() if the current class is in_error_state. 
This was only intended to be a convenient way to deal with verification 
errors at CDS dump time. However, if this code runs during
normal VM execution, it would violate the JLS.

The fix is to throw the NoClassDefFoundError only when 
DumpSharedSpaces==true, to avoid affecting normal VM execution.
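
The shape of the guard can be sketched as follows. This is only an illustration of the condition described above, not the actual change: link_class_sketch and LinkResult are hypothetical names, the two booleans stand in for the DumpSharedSpaces flag and the class's in_error_state check, and what the real link_class_impl() does on the non-dump path is not modelled here.

```cpp
#include <cassert>

// Hypothetical outcome type for this sketch only.
enum class LinkResult {
  ThrowNoClassDefFoundError,  // early exit, used only while dumping the archive
  ContinueLinking             // fall through to the normal linking path
};

// Models only the guard: the early NoClassDefFoundError is taken when
// *both* conditions hold, so normal VM execution is never affected by it.
LinkResult link_class_sketch(bool dump_shared_spaces, bool in_error_state) {
  if (dump_shared_spaces && in_error_state) {
    return LinkResult::ThrowNoClassDefFoundError;
  }
  return LinkResult::ContinueLinking;
}
```

The point of the extra condition is that a class in the error state no longer takes the early exit unless the VM is dumping.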

Thanks
- Ioi


From david.holmes at oracle.com  Wed Oct 12 05:08:38 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 12 Oct 2016 15:08:38 +1000
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
Message-ID: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>

Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/

In IUnlock we have the following succession code to wakeup the "onDeck" 
thread:

  ParkEvent * List = _EntryList;
   if (List != NULL) {
     // Transfer the head of the EntryList to the OnDeck position.
     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
     // For a given lock there is at most OnDeck thread at any one instant.
    WakeOne:
     assert(List == _EntryList, "invariant");
     ParkEvent * const w = List;
     assert(RelaxAssert || w != Thread::current()->_MutexEvent, 
"invariant");
     _EntryList = w->ListNext;
     // as a diagnostic measure consider setting w->_ListNext = BAD
     assert(UNS(_OnDeck) == _LBIT, "invariant");
     _OnDeck = w;  // pass OnDeck to w.

It is critical that the update to _EntryList happens before we set 
_OnDeck, because as soon as _OnDeck is set the selected thread (which need 
not yet have parked) can acquire the mutex, complete its critical 
section and proceed to unlock the mutex, and so execute IUnlock in 
parallel with the original thread. If the write to _EntryList has not 
yet happened that second thread finds itself still at the head of 
_EntryList and so the assert fires. If the write to _EntryList happens 
after the load "List = _EntryList", then the first assert can also fire.

Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w) 
with a matching load_acquire(&_OnDeck) in the ILock code:

   while (_OnDeck != ESelf) {
     ParkCommon(ESelf, 0);
   }

and corresponding "raw" lock code. Also fixed a couple of typos.
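
To illustrate the ordering requirement outside of HotSpot, here is a minimal sketch using std::atomic. It rests on several assumptions: Event stands in for ParkEvent, entry_list and on_deck stand in for _EntryList and _OnDeck, and std::memory_order_release / std::memory_order_acquire play the role of OrderAccess::release_store / load_acquire. The real code parks instead of yielding and also keeps _LBIT in _OnDeck, both of which are left out.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Stand-in for ParkEvent: just a singly linked list node.
struct Event {
  Event* next;
};

Event*              entry_list = nullptr;  // plain field, like _EntryList
std::atomic<Event*> on_deck{nullptr};      // like _OnDeck

// Unlocking side: dequeue the head of the entry list, then publish it.
// The release store guarantees the entry_list update is visible to any
// thread that observes the new on_deck value with an acquire load.
void pass_on_deck() {
  Event* w = entry_list;
  entry_list = w->next;                         // must become visible first
  on_deck.store(w, std::memory_order_release);  // pass OnDeck to w
}

// Waiting side: spin until this thread's event is on deck. The acquire
// load pairs with the release store above, so the caller is guaranteed to
// see itself already removed from entry_list.
Event* wait_until_on_deck(Event* self) {
  while (on_deck.load(std::memory_order_acquire) != self) {
    std::this_thread::yield();  // the real code parks here
  }
  return entry_list;            // head of the list after the dequeue
}
```

With a plain store to on_deck there would be no such guarantee: the waiter could see itself on deck while still observing the old entry_list, which is the situation that trips the assert.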

Thanks,
David

From serguei.spitsyn at oracle.com  Wed Oct 12 05:57:32 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 11 Oct 2016 22:57:32 -0700
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <b76d28d2-d78b-0309-2078-cf833a233bc8@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<c3bda9ce-13a4-f696-ce6d-1eaf7ee831f3@oracle.com>
	<25686c9b-cd2a-9db2-8165-3054ac4d71f5@oracle.com>
	<cdb51aed-4111-af52-bbeb-d124ac6b7b63@oracle.com>
	<b76d28d2-d78b-0309-2078-cf833a233bc8@oracle.com>
Message-ID: <a2e71750-6348-6e2e-7574-44e887033e43@oracle.com>

On 10/11/16 20:28, David Holmes wrote:
> On 12/10/2016 1:02 PM, serguei.spitsyn at oracle.com wrote:
>> On 10/11/16 19:50, David Holmes wrote:
>>> Hi Serguei,
>>>
>>> Thanks for looking at this.
>>>
>>> On 12/10/2016 12:37 PM, serguei.spitsyn at oracle.com wrote:
>>>> Hi David,
>>>>
>>>> It looks good, thank you for test improvements.
>>>>
>>>> One minor comment.
>>>>
>>>> http://cr.openjdk.java.net/~dholmes/8165827/webrev/test/com/sun/jdi/InterfaceMethodsTest.java.frames.html 
>>>>
>>>>
>>>>
>>>> 511 private Method testLookup(ReferenceType targetClass, String
>>>> methodName, String methodSig,
>>>> 512 boolean declaredOnly, Class<?> expectedException) {
>>>> 513
>>>> 514 System.err.println("Looking up " + targetClass.name() + "." +
>>>> methodName + methodSig);
>>>> 515 try {
>>>> 516 Method m = declaredOnly ?
>>>> 517 lookupDeclaredMethod(targetClass, methodName, methodSig) :
>>>> 518 lookupMethod(targetClass, methodName, methodSig);
>>>> 519
>>>> 520 if (expectedException == null) {
>>>> 521 System.err.println("--- PASSED");
>>>> 522 return m;
>>>> 523 }
>>>> 524 }
>>>> 525 catch (Throwable t) {
>>>> 526 if (t.getClass() != expectedException) {
>>>> 527 System.err.println("--- FAILED");
>>>> 528 failure("FAILED: got exception " + t + " but expected exception "
>>>> 529 + expectedException.getSimpleName());
>>>> 530 return null;
>>>> 531 }
>>>> 532 else {
>>>> 533 System.err.println("--- PASSED");
>>>> 534 return null;
>>>> 535 }
>>>> 536 }
>>>> 537 System.err.println("--- FAILED");
>>>> 538 failure("FAILED: lookup succeeded but expected exception "
>>>> 539 + expectedException.getSimpleName());
>>>> 540 return null;
>>>> 541 }
>>>>
>>>>   It'd be better to keep the fragments 520-523 and 537-540 together as
>>>> they are logically bound.
>>>>   Perhaps it is better to move 520-523 to just before L537.
>>>
>>> You're right - but I prefer to move the code from L537 into an else
>>> for the if at L520. Webrev updated in place.
>>
>> It's up to you.
>>
>>
>>>
>>>>   There are more cases to use the testLookup() in this test but it is
>>>> probably for future improvements.
>>>
>>> Yes - see the bugs I linked as [1] and [2].
>>
>> Right. Perhaps it is a part of JDK-8166453.
>>
>>>
>>> There are even more bugs related to static interface method handling
>>> that impact this test. Bit of a can-of-worms.
>>
>>
>>
>>
>>   BTW, would it make sense to consider one more test case ?
>>
>>   private void testImplementationClass(ReferenceType targetClass,
>> ObjectReference thisObject) {
>>       . . .
>>
>>       testInvokeNeg(targetClass,thisObject, "privateMethodB", "()I",
>> vm().mirrorOf(RESULT_B),
>>               "private interface methods are not inheritable");
>
> Such a test will presently fail. It will do a lookup of 
> privateInterfaceMethodB from targetClass, which will succeed because 
> of the way the local getMethods is implemented. The invocation will 
> then be successful. The real test for the above would be a lookup of 
> the private interface method in the implementation class, but that 
> will succeed when it should not because of bug [2].
>
> Within the current constraints of the test and the JDI implementation 
> only the simple positive tests for private interface methods are possible.

Got it, thanks.

Thanks,
Serguei


>
> Thanks,
> David
>
>> Thanks,
>> Serguei
>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Thanks,
>>>> Serguei
>>>>
>>>> On 10/10/16 18:55, David Holmes wrote:
>>>>> Turns out the only place changes were needed were in JDI.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>>>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>>>>
>>>>> The spec change in ObjectReference is very simple and there is a
>>>>> CCC request in progress to ratify that change.
>>>>>
>>>>> The implementation change in ObjectReferenceImpl mirrors the
>>>>> updated spec and uses the same format as already present in the
>>>>> class version of the check method.
>>>>>
>>>>> The test is a little more complex. This is obviously an extension
>>>>> to what is already tested in InterfaceMethodsTest. However IMT has
>>>>> a number of problems with the way it is currently written [1] -
>>>>> specifically it doesn't properly separate method lookup from method
>>>>> invocation. So I've added the capability to separate lookup and
>>>>> invocation for use with the private interface methods - I have not
>>>>> tried to address shortcomings of the existing tests. Though I did
>>>>> fix the return value checking logic! And did some clarifying
>>>>> comments and renaming in a couple of places.
>>>>>
>>>>> Still on the test, I can't add the negative tests I would like to
>>>>> add because they actually pass due to a different long-standing bug
>>>>> in JDI - [2]. So the actual private interface method testing is
>>>>> very simple: can I get the Method from the InterfaceType for the
>>>>> interface declaring the method? Can I then invoke that method on an
>>>>> instance of a class that implements the interface?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>>


From david.holmes at oracle.com  Wed Oct 12 05:58:59 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 12 Oct 2016 15:58:59 +1000
Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if
	class is in_error_state
In-Reply-To: <57FDC074.1070900@oracle.com>
References: <57FDC074.1070900@oracle.com>
Message-ID: <ce8a47c5-4b39-1da2-27a0-d7a77010822b@oracle.com>

Hi Ioi,

On 12/10/2016 2:47 PM, Ioi Lam wrote:
> https://bugs.openjdk.java.net/browse/JDK-8166203
> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/
>
> Summary:
>
> Kudos to Coleen for noticing the bug.
>
> When dumping the CDS archive, we would throw NoClassDefFoundError inside
> InstanceKlass::link_class_impl() if the current class is in_error_state.
> This was only intended to be a convenient way to deal with verification
> errors during CDS dumping time. However, if the code is executed in
> normal VM execution time, it would violate the JLS.
>
> The fix is to throw the NoClassDefFoundError only when
> DumpSharedSpaces==true, to avoid affecting normal VM execution.

Fix looks fine.

Test change is somewhat confusing. What bug does this still refer to?

  160         try {
  161             boolean bb = Iunlinked.v;
  162         } catch(NoClassDefFoundError e) {
  163             System.out.println("NoClassDefFoundError thrown because of bug");
  164         }

Either the try block or the catch block should complete exceptionally 
to indicate a failure.

Thanks,
David


> Thanks
> - Ioi
>

From shafi.s.ahmad at oracle.com  Wed Oct 12 07:12:17 2016
From: shafi.s.ahmad at oracle.com (Shafi Ahmad)
Date: Wed, 12 Oct 2016 00:12:17 -0700 (PDT)
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
Message-ID: <5eb7474b-a72e-41c0-b389-bfad82270f18@default>

Hi Mikael,

Thanks for reviewing it. 

Once the VM is initialized, there are two OOME scenarios:
 1) OOME due to unavailability of Java memory [mainly due to the Java application].
 2) OOME due to unavailability of native memory.

Please correct me if I am wrong, but for 1) java.lang.OutOfMemoryError is correct.

Consider the following scenarios:
1) A Java application uses JNI code, the JNI code allocates and frees native memory, and we hit OOME.
2) A Java application uses JNI code, the JNI code has a memory leak, and because of this an OOME situation occurs.
3) We use the JVM options -Xms and -Xmx in such a way that very little native memory is available and the VM hits OOME.

I am not sure whether the above scenarios are feasible in the JVM, but if any of them is possible, should we consider it an OOME due to the Java application or not?
I consider cases 1) and 2) to be OOME due to the Java application, and so added code for java.lang.OutOfMemoryError inside report_vm_out_of_memory.

My assumption that an OOME after the VM is completely initialized is due to the Java application [directly or indirectly] may not always hold true.
-XX:+CrashOnOutOfMemoryError is mostly for debugging purposes, so I added the related code change inside report_vm_out_of_memory.
Yes, I must not use 'java.lang.OutOfMemoryError' for such a case.

Please let me know whether I should remove the code change inside report_vm_out_of_memory or keep it and add an appropriate reason for the OutOfMemoryError.

Regards,
Shafi

> -----Original Message-----
> From: Mikael Gerdin
> Sent: Monday, October 10, 2016 7:30 PM
> To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't
> work for OOM caused by inability to create threads'
> 
> Hi,
> 
> On 2016-10-10 09:24, Shafi Ahmad wrote:
> > Hi All,
> >
> > Please review the simple change for the fix of bug 'JDK-8155004:
> > CrashOnOutOfMemoryError doesn't work for OOM caused by inability to
> > create threads'.
> >
> > Summary:
> > In the current implementation there are a few scenarios where we do not
> > obey the JVM option -XX:+CrashOnOutOfMemoryError.
> > While analyzing this issue I found there are two JVM states where OOM
> > can happen:
> >  1.  OOM during VM initialization - as per our internal discussion, for this case
> > it is not worth dumping a core file, so this is left as it is.
> >  2.  OOM once VM is initialized - for this scenario the code is already in place in
> > most places, but the corresponding changes are missing in a few places, so this
> > change covers them.
> >
> > Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/
> 
> 
> There is a lot of confusion in the VM code with the term "out of memory
> error".
> In some places it refers to code throwing a java.lang.OutOfMemoryError and
> expecting running java code to be able to potentially catch that Error and
> continue running.
> 
> In other places, such as callers of report_vm_out_of_memory, the situation
> is much more dire and the calling thread may not even be a JavaThread and
> as such cannot "throw" an exception.
> report_vm_out_of_memory is only invoked through the macro
> vm_exit_out_of_memory, which of course implies that the condition is fatal
> and we are about to terminate the JVM process altogether.
> 
> I think that it's incorrect to call code related to java.lang.OutOfMemoryError
> in report_vm_out_of_memory since the condition may not even be
> correlated with Java level application behavior.
> 
> /Mikael
> 
> > Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004
> >
> > Testing: jprt and jtreg (on Linux x86_64)
> >
> > Regards,
> > Shafi
> >

From serguei.spitsyn at oracle.com  Wed Oct 12 08:08:04 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 12 Oct 2016 01:08:04 -0700
Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if
	class is in_error_state
In-Reply-To: <ce8a47c5-4b39-1da2-27a0-d7a77010822b@oracle.com>
References: <57FDC074.1070900@oracle.com>
	<ce8a47c5-4b39-1da2-27a0-d7a77010822b@oracle.com>
Message-ID: <f686a328-f5c5-afc4-2b86-aa5b3b5a95f8@oracle.com>

Hi Ioi,

The fix looks good to me.
But I agree with David below that the catch statement is somewhat confusing.
The test needs to fail in such a case with a message like "Unexpected 
NoClassDefFoundError <...>".

Thanks,
Serguei


On 10/11/16 22:58, David Holmes wrote:
> Hi Ioi,
>
> On 12/10/2016 2:47 PM, Ioi Lam wrote:
>> https://bugs.openjdk.java.net/browse/JDK-8166203
>> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/
>>
>> Summary:
>>
>> Kudos to Coleen for noticing the bug.
>>
>> When dumping the CDS archive, we would throw NoClassDefFoundError inside
>> InstanceKlass::link_class_impl() if the current class is in_error_state.
>> This was only intended to be a convenient way to deal with verification
>> errors during CDS dumping time. However, if the code is executed in
>> normal VM execution time, it would violate the JLS.
>>
>> The fix is to throw the NoClassDefFoundError only when
>> DumpSharedSpaces==true, to avoid affecting normal VM execution.
>
> Fix looks fine.
>
> Test change is somewhat confusing. What bug does this still refer to?
>
>  160         try {
>  161             boolean bb = Iunlinked.v;
>  162         } catch(NoClassDefFoundError e) {
>  163             System.out.println("NoClassDefFoundError thrown 
> because of bug");
>  164         }
>
> Either the try block should complete exceptionally or the catch block, 
> to indicate a failure.
>
> Thanks,
> David
>
>
>> Thanks
>> - Ioi
>>


From martin.doerr at sap.com  Wed Oct 12 08:53:01 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Wed, 12 Oct 2016 08:53:01 +0000
Subject: RFR(M): 8166970: Adapt mutex padding according to
	DEFAULT_CACHE_LINE_SIZE
In-Reply-To: <10c46800-15d1-fedc-f64f-b8a85e9ef635@oracle.com>
References: <dc0808274b4843f3afa56ec70caa3df4@DEWDFE13DE14.global.corp.sap>
	<c0660659-1374-0040-7142-8af890f49d9a@oracle.com>
	<f4d9466c9636413e973a88c3378c21f5@DEWDFE13DE14.global.corp.sap>
	<57F77202.8070201@oracle.com>
	<6d19eb4ce02f41c48b88da41cdc3fc90@DEWDFE13DE14.global.corp.sap>
	<57F77A4B.6060604@oracle.com>
	<0a5aab53d3dc47689913f528d1c749e5@DEWDFE13DE14.global.corp.sap>
	<fa4592d3-5ea7-6d0b-8d94-4627762491ca@oracle.com>
	<4d4c458b-9654-d1b4-46b2-829dab182d8e@oracle.com>
	<a5dce6d7d7e34a29804cbdfc593b0770@DEWDFE13DE14.global.corp.sap>
	<10c46800-15d1-fedc-f64f-b8a85e9ef635@oracle.com>
Message-ID: <af655a72773546288cb65f84f4a7c3dd@DEWDFE13DE14.global.corp.sap>

Thanks everybody for reviewing.

The webrev with additional comments is here:
http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.04/

I have added a TODO to check if the _name[] array should better get replaced by a const char*.
Would you like me to open a new bug for jdk 10 so we have a reminder?

Thank you very much for sponsoring, Coleen.
 
Best regards,
Martin

-----Original Message-----
From: David Holmes [mailto:david.holmes at oracle.com] 
Sent: Mittwoch, 12. Oktober 2016 04:22
To: Doerr, Martin <martin.doerr at sap.com>; Claes Redestad <claes.redestad at oracle.com>; Coleen Phillimore <coleen.phillimore at oracle.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE

Looks good to me too! Only comment is do we want to change this comment:

   84 // The default length of monitor name is chosen to be 64 to avoid false sharing.
   85 static const int MONITOR_NAME_LEN = 64;

and do we even want to change the value here?

Thanks,
David

On 12/10/2016 2:26 AM, Doerr, Martin wrote:
> Hi all,
>
> I came to the same conclusion regarding inheritance from PaddingEnd.
> Unfortunately, you're also right, Claes, that we had better not use 0 as the minimal padding length, because some compilers may have trouble with zero-length arrays. I hope 1 is OK as the minimal padding length: the new operator does not allocate cache-line-aligned memory at the moment, so I don't see any benefit in more padding. (A padding length of 1 byte has the advantage that it may not enlarge the object size if the previous field leaves some space due to its type.)
>
> I believe 2 _LockWord fields on one cache line was basically the problem we wanted to avoid.
>
> Here's a new webrev:
> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.03/
>
> It also enables changing the _name[] field to a pointer or a smaller array. I guess this should better be done in a separate change (jdk10?).
>
> Please take a look.
>
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Claes Redestad [mailto:claes.redestad at oracle.com]
> Sent: Dienstag, 11. Oktober 2016 12:05
> To: Coleen Phillimore <coleen.phillimore at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net; David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>
> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to DEFAULT_CACHE_LINE_SIZE
>
> Hi,
>
> On 2016-10-11 02:03, Coleen Phillimore wrote:
>>
>> Hi,
>>
>> Was the linear allocation in mutex.cpp the cause of the false sharing
>> that you observed?  I think I like this change better than the
>> original, because I've wondered myself why the name string was so
>> long.  So with this, we could make Monitor's smaller if they're
>> embedded in metadata or other structures.
>
> Music to my ears!
>
> I even think most embedded uses would see improvements if _name was
> removed entirely (or "simply" turned into a const char * so that it's
> not copied and embedded into the Monitor/Mutex)
>
>>
>> Thanks,
>> Coleen
>>
>> On 10/10/16 2:00 PM, Doerr, Martin wrote:
>>> Hi Claes,
>>>
>>> thank you very much for your explanations.
>>>
>>> I agree with you that it would be better to pad where the Monitors
>>> are used. It would still fulfill the purpose of this RFE without
>>> disturbing other usages.
>>>
>>> So I could introduce:
>>> class PaddedMonitor : public Monitor {
>>>    enum {
>>>      CACHE_LINE_PADDING = (int)DEFAULT_CACHE_LINE_SIZE -
>>> (int)sizeof(Monitor),
>>>      PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING : 0
>>>    };
>>>    char _padding[PADDING_LEN];
>>> };
>>> and similarly PaddedMutex and replace all of the ones which get
>>> allocated in a linear fashion (mutexLocker.cpp mutex_init()).
>
> Sure!
>
> Some compilers may take issue with cases where PADDING_LEN == 0 (since
> char _padding[0] is technically illegal C++, but works on gcc etc) so
> maybe that special case will have to be (somewhat excessively):
>
> PADDING_LEN = CACHE_LINE_PADDING > 0 ? CACHE_LINE_PADDING :
> DEFAULT_CACHE_LINE_SIZE
>
> We took a look at if it'd be feasible to express class PaddedMonitor :
> public PaddedEnd<Monitor>, but it appears that'd require variadic
> template arguments (C++11) to get right (since we'd need PaddedEnd to
> transitively publish constructors of Monitor).
>
> Thanks!
>
> /Claes
>
>>>
>>> Would you agree with this change?
>>>
>>> Thanks and best regards,
>>> Martin
>>>
>>>
>>> -----Original Message-----
>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>> Sent: Freitag, 7. Oktober 2016 12:35
>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>;
>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>> <coleen.phillimore at oracle.com>
>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>> DEFAULT_CACHE_LINE_SIZE
>>>
>>> Hi,
>>>
>>> I'm concerned that this might be an easy-but-wrong fix to a complex
>>> problem, and acknowledging that there are already use cases where the
>>> _name field is counterproductive. This change adds complexity that
>>> makes it even less likely such uses will be optimized for in the
>>> future.
>>>
>>> There are Padded* types put in place to deal with these concerns
>>> explicitly rather than implicitly *where it matters*, which allows us
>>> the choice of applying padding or not on a per use-case basis (which
>>> means we can also remove the _name field for those use cases that don't
>>> care about either, which might be most outside of the global lists).
>>>
>>> I am very concerned about false sharing, but I have no data to support
>>> that this change has any measurable benefit in practice: I even did an
>>> experiment years ago now where I turned _name into a pointer to not pad
>>> at all and saw nothing exceeding noise levels on any benchmark.
>>>
>>> Thanks!
>>>
>>> /Claes
>>>
>>> On 2016-10-07 12:18, Doerr, Martin wrote:
>>>> Hi Claes,
>>>>
>>>> what the change basically does is that the _name[] field gets
>>>> enlarged by 8 bytes on platforms with 128 byte
>>>> DEFAULT_CACHE_LINE_SIZE. The logic behind it is completely computed
>>>> by the C++ compiler.
>>>> What exactly is your concern about the footprint overhead?
>>>> Are you not concerned about the risk of false sharing?
>>>>
>>>> Best regards,
>>>> Martin
>>>>
>>>> -----Original Message-----
>>>> From: Claes Redestad [mailto:claes.redestad at oracle.com]
>>>> Sent: Freitag, 7. Oktober 2016 12:00
>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>> daniel.daugherty at oracle.com; hotspot-runtime-dev at openjdk.java.net;
>>>> David Holmes (david.holmes at oracle.com) <david.holmes at oracle.com>;
>>>> Coleen Phillimore (coleen.phillimore at oracle.com)
>>>> <coleen.phillimore at oracle.com>
>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>> DEFAULT_CACHE_LINE_SIZE
>>>>
>>>> Hi,
>>>>
>>>> after due consideration I strongly consider this change unacceptable
>>>> since it adds footprint overhead to performance-critical compiler and
>>>> GC code with little to no data to support that this won't cause
>>>> regressions.
>>>>
>>>> Changes to Monitor/Mutex need to be done with more surgical precision
>>>> than this.
>>>>
>>>> If I do have a veto on the matter, here it is.
>>>>
>>>> Thanks!
>>>>
>>>> /Claes
>>>>
>>>> On 2016-10-07 11:34, Doerr, Martin wrote:
>>>>> Hi Dan,
>>>>>
>>>>> thank you very much for reviewing and for investigating the history.
>>>>>
>>>>> It was not intended to make the functions you mentioned public.
>>>>> I've fixed that.
>>>>> I also updated the copyright information.
>>>>>
>>>>> New webrev is here:
>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.02/
>>>>>
>>>>> @Coleen: Please use this one. I have also added reviewer attribution.
>>>>>
>>>>> Thanks and best regards,
>>>>> Martin
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel D. Daugherty [mailto:daniel.daugherty at oracle.com]
>>>>> Sent: Donnerstag, 6. Oktober 2016 23:13
>>>>> To: Doerr, Martin <martin.doerr at sap.com>;
>>>>> hotspot-runtime-dev at openjdk.java.net
>>>>> Subject: Re: RFR(M): 8166970: Adapt mutex padding according to
>>>>> DEFAULT_CACHE_LINE_SIZE
>>>>>
>>>>> On 9/30/16 9:48 AM, Doerr, Martin wrote:
>>>>>> Hi,
>>>>>>
>>>>>> the current implementation of Monitor padding (mutex.cpp) assumes
>>>>>> that cache lines are 64 Bytes. There's a platform dependent define
>>>>>> "DEFAULT_CACHE_LINE_SIZE" available which can be used. Purpose of
>>>>>> padding is to avoid false sharing.
>>>>>>
>>>>>> My proposed change is here:
>>>>>> http://cr.openjdk.java.net/~mdoerr/8166970_mutex_padding/webrev.00/
>>>>> src/share/vm/runtime/mutex.hpp
>>>>>         Please update the copyright year before pushing.
>>>>>
>>>>>         L172:   // The default length of monitor name is chosen to avoid false sharing.
>>>>>         L173:   enum {
>>>>>         L174:     CACHE_LINE_PADDING = DEFAULT_CACHE_LINE_SIZE - sizeof(MonitorBase),
>>>>>         L175:     MONITOR_NAME_LEN = CACHE_LINE_PADDING > 64 ? CACHE_LINE_PADDING : 64
>>>>>         L176:   };
>>>>>         L177:   char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>
>>>>>             I have to say that I'm not fond of the fact that MONITOR_NAME_LEN
>>>>>             can vary between platforms; I like that it is a minimum of 64 bytes
>>>>>             and is still a constant.
>>>>>
>>>>>             I'm also not happy that the resulting sizeof(Monitor) may not be a
>>>>>             multiple of the DEFAULT_CACHE_LINE_SIZE. However, I have to mitigate
>>>>>             that unhappiness with the fact that sizeof(Monitor) hasn't been a
>>>>>             multiple of the cache line size since at least 2008 and no one
>>>>>             complained (that I know of).
>>>>>
>>>>>             So if I was making this change, I would make MONITOR_NAME_LEN 64 bytes
>>>>>             (like it was) and add a pad field that would bring up sizeof(Monitor)
>>>>>             to be a multiple of DEFAULT_CACHE_LINE_SIZE. Of course, Claes would be
>>>>>             unhappy with me and anyone embedding a Monitor into another data
>>>>>             structure would be unhappy with me, but I'm used to that :-)
>>>>>
>>>>>             So what you have is fine, especially for JDK9.
>>>>>
>>>>>         L180:  public:
>>>>>         L181: #ifndef PRODUCT
>>>>>         L182:   debug_only(static bool contains(Monitor * locks, Monitor * lock);)
>>>>>         L183:   debug_only(static Monitor * get_least_ranked_lock(Monitor * locks);)
>>>>>         L184:   debug_only(Monitor * get_least_ranked_lock_besides_this(Monitor * locks);)
>>>>>         L185: #endif
>>>>>         L186:
>>>>>         L187:   void set_owner_implementation(Thread* owner)                        PRODUCT_RETURN;
>>>>>         L188:   void check_prelock_state     (Thread* thread)                       PRODUCT_RETURN;
>>>>>         L189:   void check_block_state       (Thread* thread)
>>>>>
>>>>>             These were all "protected" before. Now they are "public".
>>>>>             Any particular reason?
>>>>>
>>>>> Thumbs up on the mechanics of this change. I'm interested in the
>>>>> answer to the "protected" versus "public" question, but don't
>>>>> consider that query to be a blocker.
>>>>>
>>>>>
>>>>> The rest of this isn't code review, but some of this caught
>>>>> my attention.
>>>>>
>>>>> src/share/vm/runtime/mutex.hpp
>>>>>
>>>>>         old L84: // The default length of monitor name is chosen to be 64 to avoid false sharing.
>>>>>         old L85: static const int MONITOR_NAME_LEN = 64;
>>>>>
>>>>> I had to look up the history of this comment:
>>>>>
>>>>> $ hg log -r 55 src/share/vm/runtime/mutex.hpp
>>>>> changeset:   55:2a8eb116ebbe
>>>>> user:        xlu
>>>>> date:        Tue Feb 05 23:21:57 2008 -0800
>>>>> summary:     6610420: Debug VM crashes during monitor lock rank
>>>>> checking
>>>>>
>>>>> $ hg diff -r5{4,5} src/share/vm/runtime/mutex.hpp
>>>>> diff -r d4a0f561287a -r 2a8eb116ebbe src/share/vm/runtime/mutex.hpp
>>>>> --- a/src/share/vm/runtime/mutex.hpp    Thu Jan 31 14:56:50 2008 -0500
>>>>> +++ b/src/share/vm/runtime/mutex.hpp    Tue Feb 05 23:21:57 2008 -0800
>>>>> @@ -82,6 +82,9 @@ class ParkEvent ;
>>>>>      // *in that order*.  If their implementations change such that these
>>>>>      // assumptions are violated, a whole lot of code will break.
>>>>>
>>>>> +// The default length of monitor name is choosen to be 64 to avoid false sharing.
>>>>> +static const int MONITOR_NAME_LEN = 64;
>>>>> +
>>>>>      class Monitor : public CHeapObj {
>>>>>
>>>>>       public:
>>>>> @@ -126,9 +129,8 @@ class Monitor : public CHeapObj {
>>>>>        volatile intptr_t _WaitLock [1] ;      // Protects _WaitSet
>>>>>        ParkEvent * volatile  _WaitSet ;       // LL of ParkEvents
>>>>>        volatile bool     _snuck;              // Used for sneaky locking (evil).
>>>>> -  const char * _name;                    // Name of mutex
>>>>>        int NotifyCount ;                      // diagnostic assist
>>>>> -  double pad [8] ;                       // avoid false sharing
>>>>> +  char _name[MONITOR_NAME_LEN];          // Name of mutex
>>>>>
>>>>>        // Debugging fields for naming, deadlock detection, etc. (some only used in debug mode)
>>>>>      #ifndef PRODUCT
>>>>> @@ -170,7 +172,7 @@ class Monitor : public CHeapObj {
>>>>>         int  ILocked () ;
>>>>>
>>>>>       protected:
>>>>> -   static void ClearMonitor (Monitor * m) ;
>>>>> +   static void ClearMonitor (Monitor * m, const char* name = NULL) ;
>>>>>         Monitor() ;
>>>>>
>>>>> So the original code had an 8-double pad for avoiding false sharing.
>>>>> Sounds very much like the old ObjectMonitor padding. I'm sure at the
>>>>> time that Dice determined that 8-double value, the result was to pad
>>>>> the size of Monitor to an even multiple of a particular cache line
>>>>> size.
>>>>>
>>>>> Xiobin changed the 'name' field to be an array so that the name
>>>>> chars could serve double duty as the cache line pad... pun intended.
>>>>> Unfortunately that pad doesn't make sure that the resulting Monitor
>>>>> size is a multiple of the cache line size.
>>>>>
>>>>> Dan
>>>>>
>>>>>
>>>>>> Please review. I will also need a sponsor.
>>>>>>
>>>>>> Thanks and best regards,
>>>>>> Martin
>>>>>>
>>
>

From coleen.phillimore at oracle.com  Wed Oct 12 12:17:52 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 12 Oct 2016 08:17:52 -0400
Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if
	class is in_error_state
In-Reply-To: <f686a328-f5c5-afc4-2b86-aa5b3b5a95f8@oracle.com>
References: <57FDC074.1070900@oracle.com>
	<ce8a47c5-4b39-1da2-27a0-d7a77010822b@oracle.com>
	<f686a328-f5c5-afc4-2b86-aa5b3b5a95f8@oracle.com>
Message-ID: <3979349a-edb5-16be-af19-33c2aa808209@oracle.com>

http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/test/runtime/lambda-features/InterfaceInitializationStates.java.udiff.html

Same comment as the others.

Just take out the try block for Iunlinked.

Thank you for fixing this!
Coleen


On 10/12/16 4:08 AM, serguei.spitsyn at oracle.com wrote:
> Hi Ioi,
>
> The fix looks good to me.
> But I agree with David below that the catch statement is somewhat 
> confusing.
> The test needs to fail in such a case with a message like "Unexpected 
> NoClassDefFoundError <...>".
>
> Thanks,
> Serguei
>
>
>
> On 10/11/16 22:58, David Holmes wrote:
>> Hi Ioi,
>>
>> On 12/10/2016 2:47 PM, Ioi Lam wrote:
>>> https://bugs.openjdk.java.net/browse/JDK-8166203
>>> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/
>>>
>>> Summary:
>>>
>>> Kudos to Coleen for noticing the bug.
>>>
>>> When dumping the CDS archive, we would throw NoClassDefFoundError 
>>> inside
>>> InstanceKlass::link_class_impl() if the current class is 
>>> in_error_state.
>>> This was only intended to be a convenient way to deal with verification
>>> errors during CDS dumping time. However, if the code is executed in
>>> normal VM execution time, it would violate the JLS.
>>>
>>> The fix is to throw the NoClassDefFoundError only when
>>> DumpSharedSpaces==true, to avoid affecting normal VM execution.
>>
>> Fix looks fine.
>>
>> Test change is somewhat confusing. What bug does this still refer to?
>>
>>  160         try {
>>  161             boolean bb = Iunlinked.v;
>>  162         } catch(NoClassDefFoundError e) {
>>  163             System.out.println("NoClassDefFoundError thrown 
>> because of bug");
>>  164         }
>>
>> Either the try block should complete exceptionally or the catch 
>> block, to indicate a failure.
>>
>> Thanks,
>> David
>>
>>
>>> Thanks
>>> - Ioi
>>>
>


From varming at gmail.com  Wed Oct 12 13:21:44 2016
From: varming at gmail.com (Carsten Varming)
Date: Wed, 12 Oct 2016 09:21:44 -0400
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
Message-ID: <CAP_pwnUUiJxaFtcc8DEnYmfsiBeiX3yd16tmBYBgopogFme71Q@mail.gmail.com>

Dear David,

In line 590 "Pass onDeck to w". I don't understand this part of the
comment. Should it say something like "Pass ownership of _OnDeck and
_EntryList to w"?

In line 532 there is a read of _OnDeck and in line 553 there is a read of
_EntryList. Why is it safe not to do a load_acquire of _OnDeck in line 532?

Carsten


On Wed, Oct 12, 2016 at 1:08 AM, David Holmes <david.holmes at oracle.com>
wrote:

> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>
> In IUnlock we have the following succession code to wakeup the "onDeck"
> thread:
>
>  ParkEvent * List = _EntryList;
>   if (List != NULL) {
>     // Transfer the head of the EntryList to the OnDeck position.
>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>     // For a given lock there is at most OnDeck thread at any one instant.
>    WakeOne:
>     assert(List == _EntryList, "invariant");
>     ParkEvent * const w = List;
>     assert(RelaxAssert || w != Thread::current()->_MutexEvent,
> "invariant");
>     _EntryList = w->ListNext;
>     // as a diagnostic measure consider setting w->_ListNext = BAD
>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>     _OnDeck = w;  // pass OnDeck to w.
>
> It is critical that the update to _EntryList happens before we set
> _OnDeck, because as soon as _OnDeck is set the selected thread (which need not
> yet have parked) can acquire the mutex, complete its critical section and
> proceed to unlock the mutex, and so execute IUnlock in parallel with the
> original thread. If the write to _EntryList has not yet happened that
> second thread finds itself still at the head of _EntryList and so the
> assert fires. If the write to _EntryList happens after the load "List =
> _EntryList", then the first assert can also fire.
>
> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
> with a matching load_acquire(&_OnDeck) in the ILock code:
>
>   while (_OnDeck != ESelf) {
>     ParkCommon(ESelf, 0);
>   }
>
> and corresponding "raw" lock code. Also fixed a couple of typos.
>
> Thanks,
> David
>

From daniel.daugherty at oracle.com  Wed Oct 12 15:03:50 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Wed, 12 Oct 2016 09:03:50 -0600
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
Message-ID: <cb45f93b-fcef-63ee-9206-e944efe2c9f6@oracle.com>

On 10/11/16 11:08 PM, David Holmes wrote:
> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/

Very nice catch! We should check the ObjectMonitor succession code for
similar issues (my task).


src/share/vm/runtime/mutex.cpp
     L466:   if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, UNS(ESelf)) == 0) {
         Thanks for fixing this bug also!

     L477:   while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) {
         So you've changed this load of _OnDeck to use load-acquire
         which matches the new store-release on L595:

         OrderAccess::release_store_ptr(&_OnDeck, w);

         What about the other loads of _OnDeck or stores to _OnDeck?
         There should at least be a new comment explaining why we don't
         need an OrderAccess operation for those. Update: I see you
         changed one other load of _OnDeck on L1061. Now I'm really
         wanting comments for the other _OnDeck loads and stores. :-)

         Update: I see Carsten V. asked about this in a slightly different
         way.

     L590:     // Pass onDeck to w, ensuring that _EntryList has been set first.
         Typo: 'onDeck' -> 'OnDeck'

         I suspect you don't want to fix all this CamelCase usage to meet
         HotSpot style. I did that for most of the ObjectMonitor code and
         it was painful. We could clean it up early in JDK10.

         Update: I see Carsten has a comment about this comment also. I
         don't think I quite agree that we're "passing" _EntryList to w,
         but I can be convinced otherwise...

Again, very nice catch! I'd like to see another webrev with the other
_OnDeck loads and stores either updated for OrderAccess ops or some
comment explaining why it's not needed.

Dan


>
> In IUnlock we have the following succession code to wakeup the 
> "onDeck" thread:
>
>  ParkEvent * List = _EntryList;
>   if (List != NULL) {
>     // Transfer the head of the EntryList to the OnDeck position.
>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>     // For a given lock there is at most OnDeck thread at any one 
> instant.
>    WakeOne:
>     assert(List == _EntryList, "invariant");
>     ParkEvent * const w = List;
>     assert(RelaxAssert || w != Thread::current()->_MutexEvent, 
> "invariant");
>     _EntryList = w->ListNext;
>     // as a diagnostic measure consider setting w->_ListNext = BAD
>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>     _OnDeck = w;  // pass OnDeck to w.
>
> It is critical that the update to _EntryList happens before we set 
> _OnDeck, as as soon as _OnDeck is set the selected thread (which need 
> not yet have parked) can acquire the mutex, complete its critical 
> section and proceed to unlock the mutex, and so execute IUnlock in 
> parallel with the original thread. If the write to _EntryList has not 
> yet happened that second thread finds itself still at the head of 
> _EntryList and so the assert fires. If the write to _EntryList happens 
> after the load "List = _EntryList", then the first assert can also fire.
>
> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w) 
> with a matching load_acquire(&_OnDeck) in the ILock code:
>
>   while (_OnDeck != ESelf) {
>     ParkCommon(ESelf, 0);
>   }
>
> and corresponding "raw" lock code. Also fixed a couple of typos.
>
> Thanks,
> David


From ioi.lam at oracle.com  Wed Oct 12 16:09:23 2016
From: ioi.lam at oracle.com (Ioi Lam)
Date: Wed, 12 Oct 2016 09:09:23 -0700
Subject: RFR (xs) 8166203 NoClassDefFoundError should not be thrown if
	class is in_error_state
In-Reply-To: <3979349a-edb5-16be-af19-33c2aa808209@oracle.com>
References: <57FDC074.1070900@oracle.com>	<ce8a47c5-4b39-1da2-27a0-d7a77010822b@oracle.com>	<f686a328-f5c5-afc4-2b86-aa5b3b5a95f8@oracle.com>
	<3979349a-edb5-16be-af19-33c2aa808209@oracle.com>
Message-ID: <57FE6033.40805@oracle.com>

David, Serguei & Coleen,

Thanks for the comments. I will fix the test by removing the try .. 
catch block.

- Ioi

On 10/12/16 5:17 AM, Coleen Phillimore wrote:
> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/test/runtime/lambda-features/InterfaceInitializationStates.java.udiff.html 
>
>
> Same comment as the others.
>
> Just take out the try block for Iunlinked.
>
> Thank you for fixing this!
> Coleen
>
>
> On 10/12/16 4:08 AM, serguei.spitsyn at oracle.com wrote:
>> Hi Ioi,
>>
>> The fix looks good to me.
>> But I agree with David below that the catch statement is somewhat 
>> confusing.
>> The test needs to fail in such a case with a message like "Unexpected 
>> NoClassDefFoundError <...>".
>>
>> Thanks,
>> Serguei
>>
>>
>>
>> On 10/11/16 22:58, David Holmes wrote:
>>> Hi Ioi,
>>>
>>> On 12/10/2016 2:47 PM, Ioi Lam wrote:
>>>> https://bugs.openjdk.java.net/browse/JDK-8166203
>>>> http://cr.openjdk.java.net/~iklam/jdk9/8166203_init_state_error_bug/
>>>>
>>>> Summary:
>>>>
>>>> Kudos to Coleen for noticing the bug.
>>>>
>>>> When dumping the CDS archive, we would throw NoClassDefFoundError 
>>>> inside
>>>> InstanceKlass::link_class_impl() if the current class is 
>>>> in_error_state.
>>>> This was only intended to be a convenient way to deal with 
>>>> verification
>>>> errors during CDS dumping time. However, if the code is executed in
>>>> normal VM execution time, it would violate the JLS.
>>>>
>>>> The fix is to throw the NoClassDefFoundError only when
>>>> DumpSharedSpaces==true, to avoid affecting normal VM execution.
>>>
>>> Fix looks fine.
>>>
>>> Test change is somewhat confusing. What bug does this still refer to?
>>>
>>>  160         try {
>>>  161             boolean bb = Iunlinked.v;
>>>  162         } catch(NoClassDefFoundError e) {
>>>  163             System.out.println("NoClassDefFoundError thrown 
>>> because of bug");
>>>  164         }
>>>
>>> Either the try block should complete exceptionally or the catch 
>>> block, to indicate a failure.
>>>
>>> Thanks,
>>> David
>>>
>>>
>>>> Thanks
>>>> - Ioi
>>>>
>>
>


From coleen.phillimore at oracle.com  Wed Oct 12 21:10:45 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 12 Oct 2016 17:10:45 -0400
Subject: [8u-dev] Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
Message-ID: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>

Summary: Backport change to correct interface initialization.

There were too many changes to instanceKlass.cpp for a clean backport.  
Also in JDK8, this corrects interface initialization to not initialize 
the whole interface hierarchy if an interface, not class, initiates 
initialization.  This is to correctly follow JLS 12.4.2 step 7.   I 
filed a compatibility request (in review) to document the difference in 
behavior, which I believe will not be noticed.

Tested with JPRT, including runtime jtreg lambda-features tests, and JCK 
tests.

open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8

Thanks,
Coleen

From coleen.phillimore at oracle.com  Wed Oct 12 22:51:04 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 12 Oct 2016 18:51:04 -0400
Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
Message-ID: <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com>


Note, this is also an RFR since the backport wasn't clean.
thanks,
Coleen


On 10/12/16 5:10 PM, Coleen Phillimore wrote:
> Summary: Backport change to correct interface initialization.
>
> There were too many changes to instanceKlass.cpp for a clean 
> backport.  Also in JDK8, this corrects interface initialization to not 
> initialize the whole interface hierarchy if an interface, not class, 
> initializes initialization.  This is to correctly follow JLS 12.4.2 
> step 7.   I filed a compatibility request (in review) to document the 
> difference in behavior, which I believe will not be noticed.
>
> Tested with JPRT, including runtime jtreg lambda-features tests, and 
> JCK tests.
>
> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>
> Thanks,
> Coleen


From george.triantafillou at oracle.com  Wed Oct 12 23:25:01 2016
From: george.triantafillou at oracle.com (George Triantafillou)
Date: Wed, 12 Oct 2016 19:25:01 -0400
Subject: [8u-dev] Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
Message-ID: <53c9f169-df27-a193-ba34-d7319b61dd31@oracle.com>

Hi Coleen,

Small typo in src/share/vm/oops/instanceKlass.cpp:

889   // Next, if C is a class rather than an interface, initialize it's super class and super

change to

889   // Next, if C is a class rather than an interface, initialize its super class and super

Otherwise, looks good.

-George

On 10/12/2016 5:10 PM, Coleen Phillimore wrote:
> Summary: Backport change to correct interface initialization.
>
> There were too many changes to instanceKlass.cpp for a clean 
> backport.  Also in JDK8, this corrects interface initialization to not 
> initialize the whole interface hierarchy if an interface, not class, 
> initializes initialization.  This is to correctly follow JLS 12.4.2 
> step 7.   I filed a compatibility request (in review) to document the 
> difference in behavior, which I believe will not be noticed.
>
> Tested with JPRT, including runtime jtreg lambda-features tests, and 
> JCK tests.
>
> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>
> Thanks,
> Coleen


From serguei.spitsyn at oracle.com  Wed Oct 12 23:35:45 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Wed, 12 Oct 2016 16:35:45 -0700
Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
	<233650b2-2888-8f18-257f-2849f3eaae62@oracle.com>
Message-ID: <7f3f7665-2771-e471-e7d8-20c9672c996b@oracle.com>

Coleen,

The backport looks good to me.

Minor questions about the test.


http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev/test/runtime/lambda-features/TestInterfaceInit.java.frames.html

2 * Copyright (c) 2014, 2015, Oracle and/or its affiliates. All rights reserved.

   2015 => 2016


28 * @bug 8098557

   Why is the new bug number not 8163969?


Thanks,
Serguei


On 10/12/16 15:51, Coleen Phillimore wrote:
>
> Note, this is also an RFR since the backport wasn't clean.
> thanks,
> Coleen
>
>
> On 10/12/16 5:10 PM, Coleen Phillimore wrote:
>> Summary: Backport change to correct interface initialization.
>>
>> There were too many changes to instanceKlass.cpp for a clean 
>> backport.  Also in JDK8, this corrects interface initialization to 
>> not initialize the whole interface hierarchy if an interface, not 
>> class, initializes initialization.  This is to correctly follow JLS 
>> 12.4.2 step 7.   I filed a compatibility request (in review) to 
>> document the difference in behavior, which I believe will not be 
>> noticed.
>>
>> Tested with JPRT, including runtime jtreg lambda-features tests, and 
>> JCK tests.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>>
>> Thanks,
>> Coleen
>


From david.holmes at oracle.com  Thu Oct 13 00:54:31 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Oct 2016 10:54:31 +1000
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <CAP_pwnUUiJxaFtcc8DEnYmfsiBeiX3yd16tmBYBgopogFme71Q@mail.gmail.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
	<CAP_pwnUUiJxaFtcc8DEnYmfsiBeiX3yd16tmBYBgopogFme71Q@mail.gmail.com>
Message-ID: <504e9148-54b2-6203-465c-fbda63a19375@oracle.com>

Hi Carsten,

Thanks for looking at this.

On 12/10/2016 11:21 PM, Carsten Varming wrote:
> Dear David,
>
> In line 590 "Pass onDeck to w". I don't understand this part of the
> comment. Should it say something like "Pass ownership of _OnDeck and
> _EntryList to w".

First note that that part of the comment already existed in the old code:

  589     _OnDeck = w;  // pass OnDeck to w.

This is making 'w' the OnDeck thread - nothing to do with _EntryList. 
Prior to executing this line the current thread holds the "OnDeck lock" 
- which is just a logical lock obtained by CASing _OnDeck from 0 to 1. 
The current thread selects 'w' (as the head of _EntryList) to be the new 
OnDeck thread, and passes that role to it through the assignment - and 
of course by doing so drops the "OnDeck lock". I changed it to read:

     // Pass OnDeck role to w, ensuring that _EntryList has been set first.

> In line 532 there is a read of _OnDeck and in line 553 there is a read
> of _EntryList. Why is it safe not to do a load_acquire of _OnDeck in
> line 532?

<sigh> I knew I should have just stuck in a storestore() then we could 
all be blissfully ignorant :) Okay here's that code fragment with 
comments elided

  514 void Monitor::IUnlock(bool RelaxAssert) {
  529   OrderAccess::release_store(&_LockWord.Bytes[_LSBINDEX], 0); // drop outer lock
  530
  531   OrderAccess::storeload();
  532   ParkEvent * const w = _OnDeck;
  533   assert(RelaxAssert || w != Thread::current()->_MutexEvent, "invariant");
  534   if (w != NULL) {
  548     if ((UNS(w) & _LBIT) == 0) w->unpark();
  549     return;
  550   }

A release-store to X is used to ensure that shared data written prior to 
the store to X actually occurs prior to that store. A load-acquire of X 
ensures that if the load sees the value written by the store-release 
then it also sees the updates to the shared data. In the current case 
the release-store to _OnDeck writes the non-NULL value 'w', and if the 
code above sees a non-NULL value (i.e. it sees 'w') then at most it 
unparks 'w' and returns. It never accesses the shared data, i.e. 
_EntryList. So no load-acquire is needed as we are not trying to 
"synchronize" with the changes to the shared state made by the thread 
that did the release-store. I've added a comment as per my response to 
Dan's email.
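
To make the release/acquire pairing concrete, here is a minimal sketch. It
uses C++11 std::atomic rather than HotSpot's OrderAccess, and every name in
it (Waiter, entry_list, on_deck, hand_off, wait_until_on_deck) is a
hypothetical stand-in, not the mutex.cpp code:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for ParkEvent.
struct Waiter { Waiter* next = nullptr; };

static Waiter* entry_list = nullptr;            // plain shared data (like _EntryList)
static std::atomic<Waiter*> on_deck{nullptr};   // hand-off word (like _OnDeck)

// Unlocking side: unlink the head of the list, then publish it as OnDeck.
void hand_off() {
  Waiter* w = entry_list;
  assert(w != nullptr);
  entry_list = w->next;                          // plain write to shared data
  // Release store: the list update above is visible to any thread that
  // observes w through an acquire load of on_deck.
  on_deck.store(w, std::memory_order_release);
}

// Locking side: spin until we are OnDeck, then read the shared data.
Waiter* wait_until_on_deck(Waiter* self) {
  while (on_deck.load(std::memory_order_acquire) != self) {
    std::this_thread::yield();                   // stand-in for ParkCommon(ESelf, 0)
  }
  return entry_list;                             // sees the unlink done in hand_off()
}
```

A reader that only tests on_deck for non-NULL and returns, as line 532 does,
never touches entry_list, so a relaxed load would be enough on that path.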

Now let's look at all the other loads of _OnDeck:

- there are a bunch of asserts such as:

   444   assert(_OnDeck != Self->_MutexEvent, "invariant");
   588     assert(UNS(_OnDeck) == _LBIT, "invariant");
   836     assert(_OnDeck != ESelf, "invariant");
   1197   assert((UNS(_owner)|UNS(_LockWord.FullWord)|UNS(_EntryList)|UNS(_WaitSet)|UNS(_OnDeck)) == 0, "");

   and a load that is only used for an assert:

   1161   uintptr_t ondeck = UNS(_OnDeck);

These are all logical checks on the current state and are not attempting 
to synchronize with any other changes to shared state, so no need for 
load-acquire. I hope there is no controversy on that point.

- there are some CAS operations which already embody full bi-directional 
fences which subsume the load-acquire (whether directly needed to 
synchronize with the store-release or not)

   466   if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL, UNS(ESelf)) == 0) {
   573   if (CASPTR (&_OnDeck, NULL, _LBIT) != UNS(NULL)) {


  - then we have:

    843       if (_OnDeck == ESelf && TrySpin(Self)) break;

I missed this one - it needs the load-acquire for the same reason as the 
code in ILock - which should have been obvious given:

841     // The following fragment is extracted from Monitor::ILock()

:)

Webrev updated in place - also see my response to Dan.

Thanks,
David

> Carsten
>
>
> On Wed, Oct 12, 2016 at 1:08 AM, David Holmes <david.holmes at oracle.com>
> wrote:
>
>     Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
>     webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>
>     In IUnlock we have the following succession code to wakeup the
>     "onDeck" thread:
>
>      ParkEvent * List = _EntryList;
>       if (List != NULL) {
>         // Transfer the head of the EntryList to the OnDeck position.
>         // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>         // For a given lock there is at most OnDeck thread at any one
>     instant.
>        WakeOne:
>         assert(List == _EntryList, "invariant");
>         ParkEvent * const w = List;
>         assert(RelaxAssert || w != Thread::current()->_MutexEvent,
>     "invariant");
>         _EntryList = w->ListNext;
>         // as a diagnostic measure consider setting w->_ListNext = BAD
>         assert(UNS(_OnDeck) == _LBIT, "invariant");
>         _OnDeck = w;  // pass OnDeck to w.
>
>     It is critical that the update to _EntryList happens before we set
>     _OnDeck, as as soon as _OnDeck is set the selected thread (which
>     need not yet have parked) can acquire the mutex, complete its
>     critical section and proceed to unlock the mutex, and so execute
>     IUnlock in parallel with the original thread. If the write to
>     _EntryList has not yet happened that second thread finds itself
>     still at the head of _EntryList and so the assert fires. If the
>     write to _EntryList happens after the load "List = _EntryList", then
>     the first assert can also fire.
>
>     Preferred fix today is to use OrderAccess::release_store(&_OnDeck,
>     w) with a matching load_acquire(&_OnDeck) in the ILock code:
>
>       while (_OnDeck != ESelf) {
>         ParkCommon(ESelf, 0);
>       }
>
>     and corresponding "raw" lock code. Also fixed a couple of typos.
>
>     Thanks,
>     David
>
>

From david.holmes at oracle.com  Thu Oct 13 01:18:05 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Oct 2016 11:18:05 +1000
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <cb45f93b-fcef-63ee-9206-e944efe2c9f6@oracle.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
	<cb45f93b-fcef-63ee-9206-e944efe2c9f6@oracle.com>
Message-ID: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com>

Hi Dan,

Thanks for looking at this.

On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote:
> On 10/11/16 11:08 PM, David Holmes wrote:
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>
> Very nice catch! We should check the ObjectMonitor succession code for
> similar issues (my task).

Yes. As I said in email I did a quick check through but the succession 
logic is sufficiently different that nothing was obviously wrong in a 
similar way.

>
> src/share/vm/runtime/mutex.cpp
>     L466:   if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL,
> UNS(ESelf)) == 0) {
>         Thanks for fixing this bug also!
>
>     L477:   while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) {
>         So you've changed this load of _OnDeck to use load-acquire
>         which matches the new store-release on L595:
>
>         OrderAccess::release_store_ptr(&_OnDeck, w);

Right.

>         What about the other loads of _OnDeck or stores to _OnDeck?
>         There should at least be a new comment explaining why we don't
>         need an OrderAccess operation for those. Update: I see you
>         changed one other load of _OnDeck on L1061. Now I'm really
>         wanting comments for the other _OnDeck loads and stores. :-)
>
>         Update: I see Carsten V. asked about this in a slightly different
>         way.

See my reply to Carsten re the loads. I did miss one as we have three 
"locking" paths that need to synchronize with the IUnlock code.

As for documenting ... for line 532 I can add something simple like:

  532   ParkEvent * const w = _OnDeck; // raw load as we will just return if non-NULL

For the other stores to _OnDeck ... CAS should be obvious. The setting 
to NULL should also be quite clear as only the _OnDeck thread sets to 
NULL to relinquish being _OnDeck once it has acquired the mutex, which 
happens via CAS which has full barriers. None of the plain stores are in 
the context of:

  some_var = y; // write some shared-state
  _OnDeck = NULL; // signal some_var has been updated

>     L590:     // Pass onDeck to w, ensuring that _EntryList has been set
> first.
>         Typo: 'onDeck' -> 'OnDeck'
>
>         I suspect you don't want to fix all this CamelCase usage to meet
>         HotSpot style. I did that for most of the ObjectMonitor code and
>         it was painful. We could clean it up early in JDK10.

I fixed the typo and also changed ONDECK to OnDeck so that we generally 
refer to OnDeck in commentary unless specifically referring to the 
_OnDeck field.

>         Update: I see Carsten has a comment about this comment also. I
>         don't think I quite agree that we're "passing" _EntryList to w,
>         but I can be convinced otherwise...

Right, nothing to do with _EntryList just making w the OnDeck thread.

> Again, very nice catch! I'd like to see another webrev with the other
> _OnDeck loads and stores either updated for OrderAccess ops or some
> comment explaining why it's not needed.

webrev updated in place with one comment and one new use of 
load-acquire. Plus some cosmetic changes.

Thanks again,
David

> Dan
>
>
>>
>> In IUnlock we have the following succession code to wakeup the
>> "onDeck" thread:
>>
>>  ParkEvent * List = _EntryList;
>>   if (List != NULL) {
>>     // Transfer the head of the EntryList to the OnDeck position.
>>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>>     // For a given lock there is at most OnDeck thread at any one
>> instant.
>>    WakeOne:
>>     assert(List == _EntryList, "invariant");
>>     ParkEvent * const w = List;
>>     assert(RelaxAssert || w != Thread::current()->_MutexEvent,
>> "invariant");
>>     _EntryList = w->ListNext;
>>     // as a diagnostic measure consider setting w->_ListNext = BAD
>>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>>     _OnDeck = w;  // pass OnDeck to w.
>>
>> It is critical that the update to _EntryList happens before we set
>> _OnDeck, as as soon as _OnDeck is set the selected thread (which need
>> not yet have parked) can acquire the mutex, complete its critical
>> section and proceed to unlock the mutex, and so execute IUnlock in
>> parallel with the original thread. If the write to _EntryList has not
>> yet happened that second thread finds itself still at the head of
>> _EntryList and so the assert fires. If the write to _EntryList happens
>> after the load "List = _EntryList", then the first assert can also fire.
>>
>> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
>> with a matching load_acquire(&_OnDeck) in the ILock code:
>>
>>   while (_OnDeck != ESelf) {
>>     ParkCommon(ESelf, 0);
>>   }
>>
>> and corresponding "raw" lock code. Also fixed a couple of typos.
>>
>> Thanks,
>> David
>

From david.holmes at oracle.com  Thu Oct 13 01:47:39 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Oct 2016 11:47:39 +1000
Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <233650b2-2888-8f18-257f-2849f3eaae62@oracle.com>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
	<233650b2-2888-8f18-257f-2849f3eaae62@oracle.com>
Message-ID: <67f302ee-98b8-e1f9-aa1f-14e905a13b12@oracle.com>

Hi Coleen,

On 13/10/2016 8:51 AM, Coleen Phillimore wrote:
>
> Note, this is also an RFR since the backport wasn't clean.
> thanks,
> Coleen

Backport of fix itself is good.

I'm assuming you simply copied across the existing test from the JDK9 
repo. The reference in the test to 8098557 is confusing because 8098557 
was never backported and 8163969 effectively replaces it. So as Serguei 
alluded to I'd replace the @bug 8098557 with 8163969.

Thanks,
David

>
> On 10/12/16 5:10 PM, Coleen Phillimore wrote:
>> Summary: Backport change to correct interface initialization.
>>
>> There were too many changes to instanceKlass.cpp for a clean
>> backport.  Also in JDK8, this corrects interface initialization to not
>> initialize the whole interface hierarchy if an interface, not class,
>> initializes initialization.  This is to correctly follow JLS 12.4.2
>> step 7.   I filed a compatibility request (in review) to document the
>> difference in behavior, which I believe will not be noticed.
>>
>> Tested with JPRT, including runtime jtreg lambda-features tests, and
>> JCK tests.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>>
>> Thanks,
>> Coleen
>

From thomas.stuefe at gmail.com  Thu Oct 13 04:55:55 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 13 Oct 2016 06:55:55 +0200
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn error
	logs.
Message-ID: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>

Dear all,

please take a look at the following fix:

Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html

---

In short, this fix provides the ability to cancel hanging error reporting
steps. This uses the same code paths secondary error handling uses during
error reporting. With this patch, steps which take too long will be
canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this:

4 [timeout occurred during error reporting in step "<stepname>"] after xxxx ms.
5

and we now also get a finish message in the hs-err file if we hit the
ErrorLogTimeout and error reporting will stop altogether:

6 ------ Timout during error reporting after xxx ms. ------

(in addition to the "time expired, abort" message the WatcherThread writes
to stderr)

---

This is something which bugged us for a long time, because we rely heavily
on the hs_err files for error analysis at customer sites, and there are a
number of reasons why one step may hang and prevent the follow-up steps
from running.

It works like this:

Before, when error reporting started, the WatcherThread was waiting for
ErrorLogTimeout seconds, then would stop the VM.

Now, the WatcherThread periodically pings error reporting, which checks
whether the current step has timed out. If it has, it sends a signal to the
reporting thread, and the thread will continue with the next step. This
follows the same path as secondary crash handling.
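
As a rough sketch of the periodic check only (not the code in the webrev),
the decision the watcher makes on each ping could look like the following.
All names here (ReportState, WatchdogAction, check_timeouts) are made up for
illustration, and times are in milliseconds for simplicity:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping shared by the reporting thread and the watcher.
struct ReportState {
  std::atomic<int64_t> report_start_ms{0};  // set once when error reporting begins
  std::atomic<int64_t> step_start_ms{0};    // reset by the reporting thread at each step
};

enum class WatchdogAction { None, InterruptStep, AbortVM };

// Called periodically by the watcher thread; decides what to do at now_ms.
WatchdogAction check_timeouts(const ReportState& s, int64_t now_ms,
                              int64_t error_log_timeout_ms) {
  assert(error_log_timeout_ms > 0);
  if (now_ms - s.report_start_ms.load() >= error_log_timeout_ms) {
    return WatchdogAction::AbortVM;         // overall budget used up: stop reporting
  }
  if (now_ms - s.step_start_ms.load() >= error_log_timeout_ms / 2) {
    return WatchdogAction::InterruptStep;   // one step used half the budget: cancel it
  }
  return WatchdogAction::None;
}
```

The InterruptStep case is where the signal to the reporting thread would be
sent; the sketch deliberately leaves that platform-specific part out.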

Some implementation details:

On Posix platforms, to interrupt the thread, I use pthread_kill. This means
I must know the pthread id of the reporting thread, which I now store at
the beginning of error reporting. We already store the reporting thread id
in first_error_tid, but that I cannot use, because it gets set by
os::current_thread_id(), which is not always the pthread id. Should we ever
switch to only using pthread id for posix platforms, this code can be
simplified.

On Windows, there is unfortunately no easy way to interrupt a
non-cooperative thread. I would need a way to cause a SEH inside the target
thread, which then would get handled by secondary error handling like on
Posix platforms, but that is not easy. It is doable - one can suspend the
thread, modify the thread context in a way that it will crash upon resume.
But that felt a bit heavyweight for this problem. So on Windows, timeout
handling still works (after ErrorLogTimeout the VM gets shut down), but
error reporting steps are not interruptible. If we feel this is important,
this can be added later.

Kind Regards, Thomas

From thomas.stuefe at gmail.com  Thu Oct 13 05:49:33 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 13 Oct 2016 07:49:33 +0200
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
Message-ID: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>

Hi all,

may I please have a review for this tiny change? It just adds some asserts to
NMT.

Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_
MEMFLAGS/webrev.00/webrev/

We had an ugly memory overwrite caused by this - ultimately our fault,
because we fed an invalid memory flag to NMT - but it was difficult to
find. An assert would have saved some time.

Thank you!

Thomas

From david.holmes at oracle.com  Thu Oct 13 10:08:24 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Oct 2016 20:08:24 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
Message-ID: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>

Hi Thomas,

On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
> Hi all,
>
> may I have plase a review for this tiny change? It just adds some assert to
> NMT.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_
> MEMFLAGS/webrev.00/webrev/
>
> We had an ugly memory overwrite caused by this - ultimately our fault,
> because we fed an invalid memory flag to NMT - but it was difficult to
> find. An assert would have saved some time.

I'm a little perplexed with asserting that something of MEMFLAGS type 
must be an actual MEMFLAGS value - it implies the caller is coercing 
plain int to MEMFLAGS, and I don't have much sympathy if they mess that 
up. Can't help wondering if there is some clever C++ trick to flag bad 
conversions at compile-time?

The function that takes the index should validate the index, so that is 
fine.
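
For illustration only, here is a small sketch of what the language already
checks and what is left to a runtime assert. MemTag and the helper names are
hypothetical and much smaller than the real MEMFLAGS, and the static_assert
assumes a C++11 compiler:

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical, much-reduced stand-in for an NMT memory-type enum.
enum MemTag { tagHeap = 0, tagThread = 1, tagGC = 2, tag_count = 3 };

// An implicit int -> enum conversion is already rejected at compile time.
// Only an explicit cast gets through, and the cast itself checks nothing.
static_assert(!std::is_convertible<int, MemTag>::value,
              "plain int must not convert implicitly to MemTag");

inline bool is_valid_tag(int raw) { return raw >= 0 && raw < tag_count; }

// Checked conversion for callers that really do start from a raw int.
inline MemTag checked_tag(int raw) {
  assert(is_valid_tag(raw) && "invalid memory tag");
  return static_cast<MemTag>(raw);
}

// The function that maps a tag to a table index validates the index.
inline int tag_to_index(MemTag t) {
  int index = static_cast<int>(t);
  assert(is_valid_tag(index) && "tag out of range");
  return index;
}
```

So the compile-time side only helps against accidental conversions; a caller
that casts a bad int explicitly is caught only by the runtime assert.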

Which one were you actually passing the bad value to? :)

This isn't a strong objection just musing if we can do better. And as 
the hs repos are still closed, and likely to remain so till early next 
week, we have some slack time :)

Cheers,
David

> Thank you!
>
> Thomas
>

From david.holmes at oracle.com  Thu Oct 13 10:25:13 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 13 Oct 2016 20:25:13 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
Message-ID: <d3d60958-44da-16ef-7ddd-9fa1ed7cc04c@oracle.com>

In the interests of fairness I should also point out this is technically 
an enhancement not a bug fix.

David

On 13/10/2016 8:08 PM, David Holmes wrote:
> Hi Thomas,
>
> On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>> Hi all,
>>
>> may I please have a review for this tiny change? It just adds some
>> asserts to NMT.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>>
>> We had an ugly memory overwrite caused by this - ultimately our fault,
>> because we fed an invalid memory flag to NMT - but it was difficult to
>> find. An assert would have saved some time.
>
> I'm a little perplexed with asserting that something of MEMFLAGS type
> must be an actual MEMFLAGS value - it implies the caller is coercing
> plain int to MEMFLAGS, and I don't have much sympathy if they mess that
> up. Can't help wondering if there is some clever C++ trick to flag bad
> conversions at compile-time?
>
> The function that takes the index should validate the index, so that is
> fine.
>
> Which one were you actually passing the bad value to? :)
>
> This isn't a strong objection just musing if we can do better. And as
> the hs repos are still closed, and likely to remain so till early next
> week, we have some slack time :)
>
> Cheers,
> David
>
>> Thank you!
>>
>> Thomas
>>

From george.triantafillou at oracle.com  Thu Oct 13 11:40:14 2016
From: george.triantafillou at oracle.com (George Triantafillou)
Date: Thu, 13 Oct 2016 07:40:14 -0400
Subject: RFR(XS) 8166155: Create tests for VM module option handling
In-Reply-To: <3d7981cb-7d27-a086-e46c-ec8f82f23849@oracle.com>
References: <3d7981cb-7d27-a086-e46c-ec8f82f23849@oracle.com>
Message-ID: <5a0075a3-0a36-d3f5-6ed3-2c04d3f7cda3@oracle.com>

After offline feedback from Dmitry Dmitriev, here's an updated webrev:

http://cr.openjdk.java.net/~gtriantafill/8166155/webrev.01/ 

The test was moved to a separate test for VM module option handling.  
Thanks.

-George

On 9/15/2016 2:25 PM, George Triantafillou wrote:
> Please review this change that adds test coverage for the new VM 
> module option handling implemented in 
> https://bugs.openjdk.java.net/browse/JDK-8157038.
>
> JBS: https://bugs.openjdk.java.net/browse/JDK-8166155
> webrev: http://cr.openjdk.java.net/~gtriantafill/8166155/webrev/ 
>
> Tested locally on Linux.
>
> Thanks.
>
> -George
>


From lois.foltan at oracle.com  Thu Oct 13 11:40:24 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Thu, 13 Oct 2016 07:40:24 -0400
Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java
	needs update for JDK-8162401
In-Reply-To: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com>
References: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com>
Message-ID: <57FF72A8.1030002@oracle.com>


On 10/11/2016 5:14 PM, Mandy Chung wrote:
> Harold,
>
> Can you review this test update:
>
> diff --git a/test/runtime/modules/IgnoreModulePropertiesTest.java b/test/runtime/modules/IgnoreModulePropertiesTest.java
> --- a/test/runtime/modules/IgnoreModulePropertiesTest.java
> +++ b/test/runtime/modules/IgnoreModulePropertiesTest.java
> @@ -69,8 +69,9 @@
>       public static void main(String[] args) throws Exception {
>           testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException");
>           testOption("--limit-modules", "java.sqlx", "jdk.module.limitmods", "java.lang.module.ResolutionException");
> -        testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "java.lang.RuntimeException");
> -        testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0", "java.lang.RuntimeException");
> +        testOption("--add-reads", "xyzz=yyzd", "jdk.module.addreads.0", "WARNING: Unknown module: xyzz");
> +        testOption("--add-exports", "java.base/xyzz=yyzd", "jdk.module.addexports.0",
> +                   "WARNING: package xyzz not in java.base");
>           testOption("--patch-module", "=d", "jdk.module.patch.0", "IllegalArgumentException");
>       }
>   }

Hi Mandy,
Looks good.

>
> --add-modules is now a repeating option.  Should this line:
>    testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException");
>
> be changed to "jdk.module.addmods.0", as in addreads, addexports property?

Yes, I think it should.
Lois

>
> Mandy


From coleen.phillimore at oracle.com  Thu Oct 13 12:26:34 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Thu, 13 Oct 2016 08:26:34 -0400
Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <7f3f7665-2771-e471-e7d8-20c9672c996b@oracle.com>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
	<233650b2-2888-8f18-257f-2849f3eaae62@oracle.com>
	<7f3f7665-2771-e471-e7d8-20c9672c996b@oracle.com>
Message-ID: <f63fe47e-2e7d-87ed-fcf6-ce7e525cc9ee@oracle.com>


Thank you Serguei.

On 10/12/16 7:35 PM, serguei.spitsyn at oracle.com wrote:
> Coleen,
>
> The backport looks good to me.
>
> Minor questions to the test.
>
>
> http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev/test/runtime/lambda-features/TestInterfaceInit.java.frames.html 
>
>
> 2 * Copyright (c) 2014, 2015, Oracle and/or its affiliates. All rights 
> reserved. 2015 => 2016
>

My commit script fixes copyrights, so I'll change that in the commit.
>
> 28 * @bug 8098557
>
>   Why the new bug number is not 8163969?
>

I fixed the bug number as you and David suggested.

Thanks again!
Coleen

>
> Thanks,
> Serguei
>
>
>
> On 10/12/16 15:51, Coleen Phillimore wrote:
>>
>> Note, this is also an RFR since the backport wasn't clean.
>> thanks,
>> Coleen
>>
>>
>> On 10/12/16 5:10 PM, Coleen Phillimore wrote:
>>> Summary: Backport change to correct interface initialization.
>>>
>>> There were too many changes to instanceKlass.cpp for a clean 
>>> backport.  Also in JDK8, this corrects interface initialization to 
>>> not initialize the whole interface hierarchy if an interface, not 
>>> class, initiates initialization.  This is to correctly follow JLS 
>>> 12.4.2 step 7.   I filed a compatibility request (in review) to 
>>> document the difference in behavior, which I believe will not be 
>>> noticed.
>>>
>>> Tested with JPRT, including runtime jtreg lambda-features tests, and 
>>> JCK tests.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>>>
>>> Thanks,
>>> Coleen
>>
>


From coleen.phillimore at oracle.com  Thu Oct 13 12:27:11 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Thu, 13 Oct 2016 08:27:11 -0400
Subject: [8u-dev] RFR + Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <67f302ee-98b8-e1f9-aa1f-14e905a13b12@oracle.com>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
	<233650b2-2888-8f18-257f-2849f3eaae62@oracle.com>
	<67f302ee-98b8-e1f9-aa1f-14e905a13b12@oracle.com>
Message-ID: <6beff22e-62a5-5277-dad4-ffb5c8bcbc91@oracle.com>


On 10/12/16 9:47 PM, David Holmes wrote:
> Hi Coleen,
>
> On 13/10/2016 8:51 AM, Coleen Phillimore wrote:
>>
>> Note, this is also an RFR since the backport wasn't clean.
>> thanks,
>> Coleen
>
> Backport of fix itself is good.
>
> I'm assuming you simply copied across the existing test from the JDK9 
> repo. The reference in the test to 8098557 is confusing because 
> 8098557 was never backported and 8163969 effectively replaces it. So 
> as Serguei alluded to I'd replace the @bug 8098557 with 8163969.

Thanks, yes, I fixed it.  Thanks to you and Serguei for noticing it.

Coleen

>
> Thanks,
> David
>
>>
>> On 10/12/16 5:10 PM, Coleen Phillimore wrote:
>>> Summary: Backport change to correct interface initialization.
>>>
>>> There were too many changes to instanceKlass.cpp for a clean
>>> backport.  Also in JDK8, this corrects interface initialization to not
>>> initialize the whole interface hierarchy if an interface, not class,
>>> initiates initialization.  This is to correctly follow JLS 12.4.2
>>> step 7.   I filed a compatibility request (in review) to document the
>>> difference in behavior, which I believe will not be noticed.
>>>
>>> Tested with JPRT, including runtime jtreg lambda-features tests, and
>>> JCK tests.
>>>
>>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
>>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>>>
>>> Thanks,
>>> Coleen
>>


From coleen.phillimore at oracle.com  Thu Oct 13 12:45:36 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Thu, 13 Oct 2016 08:45:36 -0400
Subject: [8u-dev] Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <53c9f169-df27-a193-ba34-d7319b61dd31@oracle.com>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
	<53c9f169-df27-a193-ba34-d7319b61dd31@oracle.com>
Message-ID: <d2c3dde4-66f1-8702-335a-2c6bc7e2a621@oracle.com>


Thanks George.

On 10/12/16 7:25 PM, George Triantafillou wrote:
> Hi Coleen,
>
> Small typo in src/share/vm/oops/instanceKlass.cpp:
>
> 889   // Next, if C is a class rather than an interface, initialize 
> it's super class and super
>
> change to
>
> 889   // Next, if C is a class rather than an interface, initialize 
> its super class and super
>
> Otherwise, looks good.

I always want to type the ' in it's for some reason.  I fixed it.

Coleen

>
> -George
>
> On 10/12/2016 5:10 PM, Coleen Phillimore wrote:
>> Summary: Backport change to correct interface initialization.
>>
>> There were too many changes to instanceKlass.cpp for a clean 
>> backport.  Also in JDK8, this corrects interface initialization to 
>> not initialize the whole interface hierarchy if an interface, not 
>> class, initiates initialization.  This is to correctly follow JLS 
>> 12.4.2 step 7.   I filed a compatibility request (in review) to 
>> document the difference in behavior, which I believe will not be 
>> noticed.
>>
>> Tested with JPRT, including runtime jtreg lambda-features tests, and 
>> JCK tests.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>>
>> Thanks,
>> Coleen
>


From thomas.stuefe at gmail.com  Thu Oct 13 12:53:01 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 13 Oct 2016 14:53:01 +0200
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
Message-ID: <CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>

Hi David,

On Thu, Oct 13, 2016 at 12:08 PM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> may I please have a review for this tiny change? It just adds some asserts
>> to NMT.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>>
>> We had an ugly memory overwrite caused by this - ultimately our fault,
>> because we fed an invalid memory flag to NMT - but it was difficult to
>> find. An assert would have saved some time.
>>
>
> I'm a little perplexed with asserting that something of MEMFLAGS type must
> be an actual MEMFLAGS value - it implies the caller is coercing plain int
> to MEMFLAGS, and I don't have much sympathy if they mess that up. Can't
> help wondering if there is some clever C++ trick to flag bad conversions at
> compile-time?
>
>
The error was caused by an uninitialized variable of type MEMFLAGS. This
was our fault: we have heavily modified allocation.hpp and introduced an
error when merging changes from upstream. That merge error led to a case
where Arena::_flags was not initialized and contained a very large value.

I admit it looks funny. If it bothers you, I could instead check the
returned index to be in the range for the size of the _malloc array in
MallocMemorySnapshot::by_type(). Technically, it would mean the same.


> The function that takes the index should validate the index, so that is
> fine.
>
> Which one were you actually passing the bad value to? :)
>
> This isn't a strong objection just musing if we can do better. And as the
> hs repos are still closed, and likely to remain so till early next week, we
> have some slack time :)
>
>
:) Sure.

Kind Regards, Thomas


> Cheers,
> David
>
> Thank you!
>>
>> Thomas
>>
>>

From thomas.stuefe at gmail.com  Thu Oct 13 12:57:51 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 13 Oct 2016 14:57:51 +0200
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <d3d60958-44da-16ef-7ddd-9fa1ed7cc04c@oracle.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<d3d60958-44da-16ef-7ddd-9fa1ed7cc04c@oracle.com>
Message-ID: <CAA-vtUy9kxHrFZfGXLJKVJmrNhpU-g5KQmpwsOq5OHej=75nNw@mail.gmail.com>

On Thu, Oct 13, 2016 at 12:25 PM, David Holmes <david.holmes at oracle.com>
wrote:

> In the interests of fairness I should also point out this is technically
> an enhancement not a bug fix.
>
> David


You are right, I changed this to an enhancement in Jira.

From harold.seigel at oracle.com  Thu Oct 13 13:00:15 2016
From: harold.seigel at oracle.com (harold seigel)
Date: Thu, 13 Oct 2016 09:00:15 -0400
Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java
	needs update for JDK-8162401
In-Reply-To: <57FF72A8.1030002@oracle.com>
References: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com>
	<57FF72A8.1030002@oracle.com>
Message-ID: <2ae090cf-39ff-9e0b-cce3-33f93ee46f65@oracle.com>

Hi Mandy,

Sorry, I was off yesterday.  Your changes look good.

Harold


On 10/13/2016 7:40 AM, Lois Foltan wrote:
>
> On 10/11/2016 5:14 PM, Mandy Chung wrote:
>> Harold,
>>
>> Can you review this test update:
>>
>> diff --git a/test/runtime/modules/IgnoreModulePropertiesTest.java 
>> b/test/runtime/modules/IgnoreModulePropertiesTest.java
>> --- a/test/runtime/modules/IgnoreModulePropertiesTest.java
>> +++ b/test/runtime/modules/IgnoreModulePropertiesTest.java
>> @@ -69,8 +69,9 @@
>>       public static void main(String[] args) throws Exception {
>>           testOption("--add-modules", "java.sqlx", 
>> "jdk.module.addmods", "java.lang.module.ResolutionException");
>>           testOption("--limit-modules", "java.sqlx", 
>> "jdk.module.limitmods", "java.lang.module.ResolutionException");
>> -        testOption("--add-reads", "xyzz=yyzd", 
>> "jdk.module.addreads.0", "java.lang.RuntimeException");
>> -        testOption("--add-exports", "java.base/xyzz=yyzd", 
>> "jdk.module.addexports.0", "java.lang.RuntimeException");
>> +        testOption("--add-reads", "xyzz=yyzd", 
>> "jdk.module.addreads.0", "WARNING: Unknown module: xyzz");
>> +        testOption("--add-exports", "java.base/xyzz=yyzd", 
>> "jdk.module.addexports.0",
>> +                   "WARNING: package xyzz not in java.base");
>>           testOption("--patch-module", "=d", "jdk.module.patch.0", 
>> "IllegalArgumentException");
>>       }
>>   }
>
> Hi Mandy,
> Looks good.
>
>>
>> --add-modules is now a repeating option.  Should this line:
>>    testOption("--add-modules", "java.sqlx", "jdk.module.addmods", 
>> "java.lang.module.ResolutionException");
>>
>> be changed to "jdk.module.addmods.0", as in addreads, addexports 
>> property?
>
> Yes, I think it should.
> Lois
>
>>
>> Mandy
>


From varming at gmail.com  Thu Oct 13 14:20:34 2016
From: varming at gmail.com (Carsten Varming)
Date: Thu, 13 Oct 2016 10:20:34 -0400
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
	<cb45f93b-fcef-63ee-9206-e944efe2c9f6@oracle.com>
	<104933ba-221f-4007-1f17-f7ce799722a4@oracle.com>
Message-ID: <CAP_pwnUwdbZDt-3nYzib0Bq+3Kv0R9tPdWMiur7JTsOwzzt-Dw@mail.gmail.com>

Dear David,

The updated webrev looks good to me.

Carsten

On Wed, Oct 12, 2016 at 9:18 PM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Dan,
>
> Thanks for looking at this.
>
> On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote:
>
>> On 10/11/16 11:08 PM, David Holmes wrote:
>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
>>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>>>
>>
>> Very nice catch! We should check the ObjectMonitor succession code for
>> similar issues (my task).
>>
>
> Yes. As I said in email I did a quick check through but the succession
> logic is sufficiently different that nothing was obviously wrong in a
> similar way.
>
>
>> src/share/vm/runtime/mutex.cpp
>>     L466:   if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL,
>> UNS(ESelf)) == 0) {
>>         Thanks for fixing this bug also!
>>
>>     L477:   while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) {
>>         So you've changed this load of _OnDeck to use load-acquire
>>         which matches the new store-release on L595:
>>
>>         OrderAccess::release_store_ptr(&_OnDeck, w);
>>
>
> Right.
>
>         What about the other loads of _OnDeck or stores to _OnDeck?
>>         There should at least be a new comment explaining why we don't
>>         need an OrderAccess operation for those. Update: I see you
>>         changed one other load of _OnDeck on L1061. Now I'm really
>>         wanting comments for the other _OnDeck loads and stores. :-)
>>
>>         Update: I see Carsten V. asked about this in a slightly different
>>         way.
>>
>
> See my reply to Carsten re the load's. I did miss one as we have three
> "locking" paths that need to synchronize with the IUnlock code.
>
> As for documenting ... for line 532 I can add something simple like:
>
>  532   ParkEvent * const w = _OnDeck; // raw load as we will just return
> if non-NULL
>
> For the other stores to _OnDeck ... CAS should be obvious. The setting to
> NULL should also be quite clear as only the _OnDeck thread sets to NULL to
> relinquish being _OnDeck once it has acquired the mutex, which happens via
> CAS which has full barriers. None of the plain stores are in the context of:
>
>  some_var = y; // write some shared-state
>  _OnDeck = NULL; // signal some_var has been updated
>
>     L590:     // Pass onDeck to w, ensuring that _EntryList has been set
>> first.
>>         Typo: 'onDeck' -> 'OnDeck'
>>
>>         I suspect you don't want to fix all this CamelCase usage to meet
>>         HotSpot style. I did that for most of the ObjectMonitor code and
>>         it was painful. We could clean it up early in JDK10.
>>
>
> I fixed the typo and also changed ONDECK to OnDeck so that we generally
> refer to OnDeck in commentary unless specifically referring to the _OnDeck
> field.
>
>         Update: I see Carsten has a comment about this comment also. I
>>         don't think I quite agree that we're "passing" _EntryList to w,
>>         but I can be convinced otherwise...
>>
>
> Right, nothing to do with _EntryList just making w the OnDeck thread.
>
> Again, very nice catch! I'd like to see another webrev with the other
>> _OnDeck loads and stores either updated for OrderAccess ops or some
>> comment explaining why it's not needed.
>>
>
> webrev updated in place with one comment and one new use of load-acquire.
> Plus some cosmetic changes.
>
> Thanks again,
> David
>
>
> Dan
>>
>>
>>
>>> In IUnlock we have the following succession code to wakeup the
>>> "onDeck" thread:
>>>
>>>  ParkEvent * List = _EntryList;
>>>   if (List != NULL) {
>>>     // Transfer the head of the EntryList to the OnDeck position.
>>>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>>>     // For a given lock there is at most OnDeck thread at any one
>>> instant.
>>>    WakeOne:
>>>     assert(List == _EntryList, "invariant");
>>>     ParkEvent * const w = List;
>>>     assert(RelaxAssert || w != Thread::current()->_MutexEvent,
>>> "invariant");
>>>     _EntryList = w->ListNext;
>>>     // as a diagnostic measure consider setting w->_ListNext = BAD
>>>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>>>     _OnDeck = w;  // pass OnDeck to w.
>>>
>>> It is critical that the update to _EntryList happens before we set
>>> _OnDeck, as as soon as _OnDeck is set the selected thread (which need
>>> not yet have parked) can acquire the mutex, complete its critical
>>> section and proceed to unlock the mutex, and so execute IUnlock in
>>> parallel with the original thread. If the write to _EntryList has not
>>> yet happened that second thread finds itself still at the head of
>>> _EntryList and so the assert fires. If the write to _EntryList happens
>>> after the load "List = _EntryList", then the first assert can also fire.
>>>
>>> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
>>> with a matching load_acquire(&_OnDeck) in the ILock code:
>>>
>>>   while (_OnDeck != ESelf) {
>>>     ParkCommon(ESelf, 0);
>>>   }
>>>
>>> and corresponding "raw" lock code. Also fixed a couple of typos.
>>>
>>> Thanks,
>>> David
>>>
>>
>>
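
The ordering requirement discussed in the quoted thread can be sketched
in portable C++11 as follows. This is a simplified model under stated
assumptions, not the mutex.cpp code: HandOff, Node, promote_head and
is_on_deck are hypothetical names, and std::atomic stands in for the
OrderAccess release_store/load_acquire operations. The point it shows is
that the plain write that unlinks the list head is sequenced before a
release store of the on-deck pointer, and that the waiter reads that
pointer with an acquire load, so a thread that sees itself on deck also
sees the updated list head.

```cpp
#include <atomic>
#include <cassert>

struct Node { Node* next; };

// Hypothetical, simplified model of the _EntryList / _OnDeck hand-off.
struct HandOff {
  Node* entry_list = nullptr;           // plain field, like _EntryList
  std::atomic<Node*> on_deck{nullptr};  // like _OnDeck

  // Unlocker side: unlink the head first, then publish it as on deck.
  void promote_head() {
    Node* w = entry_list;
    if (w == nullptr) return;
    entry_list = w->next;                          // must be visible first
    on_deck.store(w, std::memory_order_release);   // release store
  }

  // Waiter side: the acquire load pairs with the release store above.
  bool is_on_deck(const Node* self) const {
    return on_deck.load(std::memory_order_acquire) == self;
  }
};
```

With a relaxed (plain) store in promote_head, the newly promoted thread
could in principle observe the old list head, which is the situation the
failing assert in the bug report describes.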

From mandy.chung at oracle.com  Thu Oct 13 14:58:06 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Thu, 13 Oct 2016 07:58:06 -0700
Subject: Review Request: JDK-8167511: IgnoreModulePropertiesTest.java
	needs update for JDK-8162401
In-Reply-To: <57FF72A8.1030002@oracle.com>
References: <99743F21-54FB-4F3C-BBBE-8FFE99E1B3C7@oracle.com>
	<57FF72A8.1030002@oracle.com>
Message-ID: <5AFFF1F1-4B2D-407E-A95C-3F977AAF9FA0@oracle.com>


> On Oct 13, 2016, at 4:40 AM, Lois Foltan <lois.foltan at oracle.com> wrote:
> 
> 
>> 
>> --add-modules is now a repeating option.  Should this line:
>>   testOption("--add-modules", "java.sqlx", "jdk.module.addmods", "java.lang.module.ResolutionException");
>> 
>> be changed to "jdk.module.addmods.0", as in addreads, addexports property?
> 
> Yes, I think it should.

I can change this since I'm on this file.

Thanks
Mandy

From max.ockner at oracle.com  Thu Oct 13 15:35:05 2016
From: max.ockner at oracle.com (Max Ockner)
Date: Thu, 13 Oct 2016 11:35:05 -0400
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
Message-ID: <57FFA9A9.50401@oracle.com>

Hi Thomas,

(Comments below. )

Max

On 10/13/2016 8:53 AM, Thomas Stüfe wrote:
> Hi David,
>
> On Thu, Oct 13, 2016 at 12:08 PM, David Holmes <david.holmes at oracle.com>
> wrote:
>
>> Hi Thomas,
>>
>> On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>>
>>> Hi all,
>>>
>>> may I please have a review for this tiny change? It just adds some asserts
>>> to NMT.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>>>
>>> We had an ugly memory overwrite caused by this - ultimately our fault,
>>> because we fed an invalid memory flag to NMT - but it was difficult to
>>> find. An assert would have saved some time.
It is alarming that a bug in NMT could cause a problem in memory 
management, since it was my understanding that memory allocation 
decisions are not informed by the NMT state.
>> I'm a little perplexed with asserting that something of MEMFLAGS type must
>> be an actual MEMFLAGS value - it implies the caller is coercing plain int
>> to MEMFLAGS, and I don't have much sympathy if they mess that up. Can't
>> help wondering if there is some clever C++ trick to flag bad conversions at
>> compile-time?
>>
>>
> The error was caused by an uninitialized variable of type MEMFLAGS. This
> was our fault: we have heavily modified allocation.hpp and introduced an
> error when merging changes from upstream. That merge error led to a case
> where Arena::_flags was not initialized and contained a very large value.
>
> I admit it looks funny. If it bothers you, I could instead check the
> returned index to be in the range for the size of the _malloc array in
> MallocMemorySnapshot::by_type(). Technically, it would mean the same.
>
>
>
>> The function that takes the index should validate the index, so that is
>> fine.
I agree with this. I think the decision on whether to access a slot 
should occur as close to memory accessing code as possible.

Another note - If you are validating the index immediately before 
consumption, then it looks like there is a second place where you need 
to add an assert. In virtualMemoryTracker.hpp we have:

   inline VirtualMemory* by_type(MEMFLAGS flag) {
     int index = NMTUtil::flag_to_index(flag);
     return &_virtual_memory[index];
   }
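
A minimal sketch of what validating the index right before use could look
like, assuming simplified stand-in types rather than the real NMT
declarations (the enumerator values, is_valid_index and the class layout
below are assumptions made for illustration, and plain assert stands in
for the HotSpot assert macro):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for the NMT types; not the HotSpot declarations.
enum MemFlags { mtJavaHeap = 0, mtClass = 1, mtThread = 2, mt_number_of_types = 3 };

struct VirtualMemory {
  std::size_t reserved = 0;
  std::size_t committed = 0;
};

// True only for indices that address a slot of the per-type array.
inline bool is_valid_index(int index) {
  return index >= 0 && index < mt_number_of_types;
}

class VirtualMemorySnapshot {
  VirtualMemory _virtual_memory[mt_number_of_types];
 public:
  VirtualMemory* by_type(MemFlags flag) {
    int index = static_cast<int>(flag);
    // Check immediately before the array access, so a garbage flag value
    // (for example from an uninitialized field) is caught here.
    assert(is_valid_index(index) && "MEMFLAGS index out of range");
    return &_virtual_memory[index];
  }
};
```
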

>> Which one were you actually passing the bad value to? :)
>>
>> This isn't a strong objection just musing if we can do better. And as the
>> hs repos are still closed, and likely to remain so till early next week, we
>> have some slack time :)
> :) Sure.
>
> Kind Regards, Thomas
>
>
>> Cheers,
>> David
>>
>> Thank you!
>>> Thomas
>>>
>>>


From coleen.phillimore at oracle.com  Thu Oct 13 15:56:50 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Thu, 13 Oct 2016 11:56:50 -0400
Subject: [8u-dev] Request for approval: 8163969: Cyclic interface
	initialization causes JVM crash
In-Reply-To: <20161013141505.GC3354@vimes>
References: <997fe3cb-d9ef-0895-1b1b-93a4782831cf@oracle.com>
	<20161013141505.GC3354@vimes>
Message-ID: <a88f9258-5510-a53b-d560-20fcd69710ff@oracle.com>

Thank you!
Coleen

On 10/13/16 10:15 AM, Rob McKenna wrote:
> Approved.
>
> Review thread: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-October/021575.html
>
> 	-Rob
>
> On 12/10/16 05:10, Coleen Phillimore wrote:
>> Summary: Backport change to correct interface initialization.
>>
>> There were too many changes to instanceKlass.cpp for a clean backport.  Also
>> in JDK8, this corrects interface initialization to not initialize the whole
>> interface hierarchy if an interface, not class, initiates initialization.
>> This is to correctly follow JLS 12.4.2 step 7.   I filed a compatibility
>> request (in review) to document the difference in behavior, which I believe
>> will not be noticed.
>>
>> Tested with JPRT, including runtime jtreg lambda-features tests, and JCK
>> tests.
>>
>> open webrev at http://cr.openjdk.java.net/~coleenp/8163969.8.01/webrev
>> bug link https://bugs.openjdk.java.net/browse/JDK-8163969.8
>>
>> Thanks,
>> Coleen


From daniel.daugherty at oracle.com  Thu Oct 13 16:24:51 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 13 Oct 2016 10:24:51 -0600
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <104933ba-221f-4007-1f17-f7ce799722a4@oracle.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
	<cb45f93b-fcef-63ee-9206-e944efe2c9f6@oracle.com>
	<104933ba-221f-4007-1f17-f7ce799722a4@oracle.com>
Message-ID: <60e91874-853b-ecdd-01f2-e40fc84b6275@oracle.com>

 > webrev updated in place with one comment and one new use of load-acquire.
 > Plus some cosmetic changes.
 >
 > webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/

src/share/vm/runtime/mutex.cpp
     No comments.

Thumbs up!

Dan


On 10/12/16 7:18 PM, David Holmes wrote:
> Hi Dan,
>
> Thanks for looking at this.
>
> On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote:
>> On 10/11/16 11:08 PM, David Holmes wrote:
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
>>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>>
>> Very nice catch! We should check the ObjectMonitor succession code for
>> similar issues (my task).
>
> Yes. As I said in email I did a quick check through but the succession 
> logic is sufficiently different that nothing was obviously wrong in a 
> similar way.
>
>>
>> src/share/vm/runtime/mutex.cpp
>>     L466:   if ((NativeMonitorFlags & 32) && CASPTR (&_OnDeck, NULL,
>> UNS(ESelf)) == 0) {
>>         Thanks for fixing this bug also!
>>
>>     L477:   while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) {
>>         So you've changed this load of _OnDeck to use load-acquire
>>         which matches the new store-release on L595:
>>
>>         OrderAccess::release_store_ptr(&_OnDeck, w);
>
> Right.
>
>>         What about the other loads of _OnDeck or stores to _OnDeck?
>>         There should at least be a new comment explaining why we don't
>>         need an OrderAccess operation for those. Update: I see you
>>         changed one other load of _OnDeck on L1061. Now I'm really
>>         wanting comments for the other _OnDeck loads and stores. :-)
>>
>>         Update: I see Carsten V. asked about this in a slightly 
>> different
>>         way.
>
> See my reply to Carsten re the load's. I did miss one as we have three 
> "locking" paths that need to synchronize with the IUnlock code.
>
> As for documenting ... for line 532 I can add something simple like:
>
>  532   ParkEvent * const w = _OnDeck; // raw load as we will just 
> return if non-NULL
>
> For the other stores to _OnDeck ... CAS should be obvious. The setting 
> to NULL should also be quite clear as only the _OnDeck thread sets to 
> NULL to relinquish being _OnDeck once it has acquired the mutex, which 
> happens via CAS which has full barriers. None of the plain stores are 
> in the context of:
>
>  some_var = y; // write some shared-state
>  _OnDeck = NULL; // signal some_var has been updated
>
>>     L590:     // Pass onDeck to w, ensuring that _EntryList has been set
>> first.
>>         Typo: 'onDeck' -> 'OnDeck'
>>
>>         I suspect you don't want to fix all this CamelCase usage to meet
>>         HotSpot style. I did that for most of the ObjectMonitor code and
>>         it was painful. We could clean it up early in JDK10.
>
> I fixed the typo and also changed ONDECK to OnDeck so that we 
> generally refer to OnDeck in commentary unless specifically referring 
> to the _OnDeck field.
>
>>         Update: I see Carsten has a comment about this comment also. I
>>         don't think I quite agree that we're "passing" _EntryList to w,
>>         but I can be convinced otherwise...
>
> Right, nothing to do with _EntryList just making w the OnDeck thread.
>
>> Again, very nice catch! I'd like to see another webrev with the other
>> _OnDeck loads and stores either updated for OrderAccess ops or some
>> comment explaining why it's not needed.
>
> webrev updated in place with one comment and one new use of 
> load-acquire. Plus some cosmetic changes.
>
> Thanks again,
> David
>
>> Dan
>>
>>
>>>
>>> In IUnlock we have the following succession code to wakeup the
>>> "onDeck" thread:
>>>
>>>  ParkEvent * List = _EntryList;
>>>   if (List != NULL) {
>>>     // Transfer the head of the EntryList to the OnDeck position.
>>>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>>>     // For a given lock there is at most OnDeck thread at any one
>>> instant.
>>>    WakeOne:
>>>     assert(List == _EntryList, "invariant");
>>>     ParkEvent * const w = List;
>>>     assert(RelaxAssert || w != Thread::current()->_MutexEvent,
>>> "invariant");
>>>     _EntryList = w->ListNext;
>>>     // as a diagnostic measure consider setting w->_ListNext = BAD
>>>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>>>     _OnDeck = w;  // pass OnDeck to w.
>>>
>>> It is critical that the update to _EntryList happens before we set
>>> _OnDeck, as as soon as _OnDeck is set the selected thread (which need
>>> not yet have parked) can acquire the mutex, complete its critical
>>> section and proceed to unlock the mutex, and so execute IUnlock in
>>> parallel with the original thread. If the write to _EntryList has not
>>> yet happened that second thread finds itself still at the head of
>>> _EntryList and so the assert fires. If the write to _EntryList happens
>>> after the load "List = _EntryList", then the first assert can also 
>>> fire.
>>>
>>> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
>>> with a matching load_acquire(&_OnDeck) in the ILock code:
>>>
>>>   while (_OnDeck != ESelf) {
>>>     ParkCommon(ESelf, 0);
>>>   }
>>>
>>> and corresponding "raw" lock code. Also fixed a couple of typos.
>>>
>>> Thanks,
>>> David
>>


From christian.tornqvist at oracle.com  Thu Oct 13 18:09:08 2016
From: christian.tornqvist at oracle.com (Christian Tornqvist)
Date: Thu, 13 Oct 2016 14:09:08 -0400
Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with Could
	not open PerfMemory on Windows
Message-ID: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>

Hi everyone,

 
Please review this small fix for an intermittent issue we've seen when
running tests concurrently that use jcmd/jstack. 

When running jcmd, we enumerate the perfdata files and then open them one by
one to read things like main class names etc. If the perfdata file
disappears (because the Java process exited) before we get to it, we end up
with different exceptions depending on where in the code we are.

 
The code at:

http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/share/classes/sun/tools/common/ProcessArgumentMatcher.java#l88

 
handles this. The problem is that if we get all the way to
open_sharedmem_object() in the JVM, we throw a java.lang.Exception, which
isn't caught by that code. The fix is to throw an NPE instead of Exception
and let the existing code handle it.

 
The fix has been tested locally and with 30 JPRT runs (with the concurrency
patch applied). I also managed to reproduce the failure and verify the fix
locally, using a debugger to trigger the race.

 
Webrev:

http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/

 
Bug (unfortunately not visible):

https://bugs.openjdk.java.net/browse/JDK-8159799

 
Thanks,

Christian

 
From david.holmes at oracle.com  Thu Oct 13 22:45:19 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 08:45:19 +1000
Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
	Could not open PerfMemory on Windows
In-Reply-To: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
Message-ID: <b24c170a-1d98-9fc8-a8f3-b9778c3a4898@oracle.com>

Hi Christian,

Great find on getting to the bottom of this!

However ...

On 14/10/2016 4:09 AM, Christian Tornqvist wrote:
> Hi everyone,
>
> Please review this small fix for an intermittent issue we've seen when
> running tests concurrently that use jcmd/jstack.
>
> When running jcmd, we enumerate the perfdata files and then open them one by
> one to read things like main class names etc. If the perfdata file
> disappears (because the Java process exited) before we get to it, we end up
> with different exceptions depending on where in the code we are.
>
> The code at:
>
> http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/share/
> classes/sun/tools/common/ProcessArgumentMatcher.java#l88
>
> handles this, the problem is that if we get all the way to
> open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception which
> isn't caught by this. The fix is to throw a NPE instead of Exception and let
> the existing code handle this.

... that seems the wrong fix. NPE is a very specific exception with a
very clear meaning. I'm not at all sure where the existing NPE may come
from, but it seems to me that there should be a more specific exception
defined for this condition, thrown by the VM and anticipated by
the Java code. Why is there not a FileNotFoundException, for example?
The current NPE seems incidental.

As a quick fix to improve test stability I can agree to this, but I'd
like to see an RFE to properly coordinate the VM and Java sides of this
with a well-defined (set of) exception(s).

Thanks,
David

> Fix has been tested locally and with 30 JPRT runs (with concurrency patch
> applied), also managed to reproduce and verify this fix locally using a
> debugger to trigger the race.
>
>
>
> Webrev:
>
> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/
>
>
>
> Bug (unfortunately not visible):
>
> https://bugs.openjdk.java.net/browse/JDK-8159799
>
>
>
> Thanks,
>
> Christian
>
>
>
>
>

From david.holmes at oracle.com  Thu Oct 13 22:50:34 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 08:50:34 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <CAA-vtUy9kxHrFZfGXLJKVJmrNhpU-g5KQmpwsOq5OHej=75nNw@mail.gmail.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<d3d60958-44da-16ef-7ddd-9fa1ed7cc04c@oracle.com>
	<CAA-vtUy9kxHrFZfGXLJKVJmrNhpU-g5KQmpwsOq5OHej=75nNw@mail.gmail.com>
Message-ID: <18ccfdd5-503f-934b-6ff4-b1ae7237b4dd@oracle.com>

On 13/10/2016 10:57 PM, Thomas Stüfe wrote:
> On Thu, Oct 13, 2016 at 12:25 PM, David Holmes
> <david.holmes at oracle.com> wrote:
>
>     In the interests of fairness I should also point out this is
>     technically an enhancement not a bug fix.
>
>     David
>
>
> You are right, I changed this to an enhancement in Jira.

Which means this has to wait for JDK 10, or else go through the FC
Extension process.

David

From david.holmes at oracle.com  Thu Oct 13 22:57:25 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 08:57:25 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
Message-ID: <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com>

On 13/10/2016 10:53 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Thu, Oct 13, 2016 at 12:08 PM, David Holmes
> <david.holmes at oracle.com> wrote:
>
>     Hi Thomas,
>
>     On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>
>         Hi all,
>
>         may I please have a review for this tiny change? It just adds
>         some asserts to NMT.
>
>         Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>         webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>
>         We had an ugly memory overwrite caused by this - ultimately our
>         fault, because we fed an invalid memory flag to NMT - but it was
>         difficult to find. An assert would have saved some time.
>
>
>     I'm a little perplexed with asserting that something of MEMFLAGS
>     type must be an actual MEMFLAGS value - it implies the caller is
>     coercing plain int to MEMFLAGS, and I don't have much sympathy if
>     they mess that up. Can't help wondering if there is some clever C++
>     trick to flag bad conversions at compile-time?
>
>
> The error was caused by an uninitialized variable of type MEMFLAGS. This
> was our fault: we have heavily modified allocation.hpp and introduced an
> error when merging changes from upstream. The merge error led to a case
> where Arena::_flags was not initialized and contained a very large value.

Ah I see. Lack of default initialization can be annoying :)

> I admit it looks funny. If it bothers you, I could instead check the
> returned index to be in the range for the size of the _malloc array in
> MallocMemorySnapshot::by_type(). Technically, it would mean the same.

So I just realized that here:

     // Map memory type to human readable name
     static const char* flag_to_name(MEMFLAGS flag) {
       assert(flag >= 0 && flag < mt_number_of_types,
              "Invalid flag value %d.", (int)flag);
       return _memory_type_names[flag_to_index(flag)];
     }

we call flag_to_index, so the assert is redundant as it is already in 
flag_to_index. Then presumably we change flag_to_index to something like 
this:

      static inline int flag_to_index(MEMFLAGS flag) {
        int index = (flag & 0xff);
        assert(index >= 0 && index < mt_number_of_types,
               "Invalid flag value %d.", (int)flag);
        return index;
      }

so we're validating the index rather than the flag.
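
For illustration only, here is a minimal standalone sketch of that idea: validate the masked index in one place and let the name lookup rely on it. The enum MemFlag, its three entries, and the helper is_valid_index are hypothetical stand-ins (the real MEMFLAGS has many more types), and plain assert replaces HotSpot's assert macro.

```cpp
#include <cassert>

// Hypothetical stand-in for HotSpot's MEMFLAGS; not the real enum.
enum MemFlag {
  mtExampleA = 0x00,
  mtExampleB = 0x01,
  mtExampleC = 0x02,
  mt_number_of_types = 0x03
};

// Hypothetical name table, indexed by the value flag_to_index returns.
static const char* const memory_type_names[mt_number_of_types] = {
  "A", "B", "C"
};

// True if index can safely be used to subscript memory_type_names.
inline bool is_valid_index(int index) {
  return index >= 0 && index < mt_number_of_types;
}

// The single place where the flag is reduced to an index and checked.
inline int flag_to_index(MemFlag flag) {
  const int index = static_cast<int>(flag) & 0xff;
  assert(is_valid_index(index) && "Invalid flag value");
  return index;
}

// No assert of its own: it relies on the check inside flag_to_index.
inline const char* flag_to_name(MemFlag flag) {
  return memory_type_names[flag_to_index(flag)];
}
```

With this shape, an uninitialized flag whose low byte falls outside the table trips the assert in debug builds before any array is touched, which is the kind of early failure Thomas was asking for.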

Cheers,
David

>
>
>     The function that takes the index should validate the index, so that
>     is fine.
>
>     Which one were you actually passing the bad value to? :)
>
>     This isn't a strong objection just musing if we can do better. And
>     as the hs repos are still closed, and likely to remain so till early
>     next week, we have some slack time :)
>
>
> :) Sure.
>
> Kind Regards, Thomas
>
>
>     Cheers,
>     David
>
>         Thank you!
>
>         Thomas
>
>

From david.holmes at oracle.com  Thu Oct 13 23:13:04 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 09:13:04 +1000
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <CAP_pwnUwdbZDt-3nYzib0Bq+3Kv0R9tPdWMiur7JTsOwzzt-Dw@mail.gmail.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
	<cb45f93b-fcef-63ee-9206-e944efe2c9f6@oracle.com>
	<104933ba-221f-4007-1f17-f7ce799722a4@oracle.com>
	<CAP_pwnUwdbZDt-3nYzib0Bq+3Kv0R9tPdWMiur7JTsOwzzt-Dw@mail.gmail.com>
Message-ID: <e31940bb-f02d-8265-e91e-467161b30541@oracle.com>

Thanks Carsten.

David

On 14/10/2016 12:20 AM, Carsten Varming wrote:
> Dear David,
>
> The updated webrev looks good to me.
>
> Carsten
>
> On Wed, Oct 12, 2016 at 9:18 PM, David Holmes
> <david.holmes at oracle.com> wrote:
>
>     Hi Dan,
>
>     Thanks for looking at this.
>
>     On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote:
>
>         On 10/11/16 11:08 PM, David Holmes wrote:
>
>             Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
>             webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>
>
>         Very nice catch! We should check the ObjectMonitor succession
>         code for similar issues (my task).
>
>
>     Yes. As I said in email I did a quick check through but the
>     succession logic is sufficiently different that nothing was
>     obviously wrong in a similar way.
>
>
>         src/share/vm/runtime/mutex.cpp
>             L466:   if ((NativeMonitorFlags & 32) &&
>                         CASPTR (&_OnDeck, NULL, UNS(ESelf)) == 0) {
>                 Thanks for fixing this bug also!
>
>             L477:   while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) {
>                 So you've changed this load of _OnDeck to use load-acquire
>                 which matches the new store-release on L595:
>
>                 OrderAccess::release_store_ptr(&_OnDeck, w);
>
>
>     Right.
>
>                 What about the other loads of _OnDeck or stores to _OnDeck?
>                 There should at least be a new comment explaining why we
>                 don't need an OrderAccess operation for those. Update: I
>                 see you changed one other load of _OnDeck on L1061. Now I'm
>                 really wanting comments for the other _OnDeck loads and
>                 stores. :-)
>
>                 Update: I see Carsten V. asked about this in a slightly
>                 different way.
>
>
>     See my reply to Carsten re the load's. I did miss one as we have
>     three "locking" paths that need to synchronize with the IUnlock code.
>
>     As for documenting ... for line 532 I can add something simple like:
>
>      532   ParkEvent * const w = _OnDeck; // raw load as we will just return if non-NULL
>
>     For the other stores to _OnDeck ... CAS should be obvious. The
>     setting to NULL should also be quite clear as only the _OnDeck
>     thread sets to NULL to relinquish being _OnDeck once it has acquired
>     the mutex, which happens via CAS which has full barriers. None of
>     the plain stores are in the context of:
>
>      some_var = y; // write some shared-state
>      _OnDeck = NULL; // signal some_var has been updated
>
>             L590:     // Pass onDeck to w, ensuring that _EntryList has been set first.
>                 Typo: 'onDeck' -> 'OnDeck'
>
>                 I suspect you don't want to fix all this CamelCase usage
>                 to meet HotSpot style. I did that for most of the
>                 ObjectMonitor code and it was painful. We could clean it
>                 up early in JDK10.
>
>
>     I fixed the typo and also changed ONDECK to OnDeck so that we
>     generally refer to OnDeck in commentary unless specifically
>     referring to the _OnDeck field.
>
>                 Update: I see Carsten has a comment about this comment
>                 also. I don't think I quite agree that we're "passing"
>                 _EntryList to w, but I can be convinced otherwise...
>
>
>     Right, nothing to do with _EntryList just making w the OnDeck thread.
>
>         Again, very nice catch! I'd like to see another webrev with the
>         other _OnDeck loads and stores either updated for OrderAccess ops
>         or some comment explaining why it's not needed.
>
>
>     webrev updated in place with one comment and one new use of
>     load-acquire. Plus some cosmetic changes.
>
>     Thanks again,
>     David
>
>
>         Dan
>
>
>
>             In IUnlock we have the following succession code to wakeup the
>             "onDeck" thread:
>
>              ParkEvent * List = _EntryList;
>               if (List != NULL) {
>                 // Transfer the head of the EntryList to the OnDeck position.
>                 // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>                 // For a given lock there is at most OnDeck thread at any one
>                 // instant.
>                WakeOne:
>                 assert(List == _EntryList, "invariant");
>                 ParkEvent * const w = List;
>                 assert(RelaxAssert || w != Thread::current()->_MutexEvent,
>                        "invariant");
>                 _EntryList = w->ListNext;
>                 // as a diagnostic measure consider setting w->_ListNext = BAD
>                 assert(UNS(_OnDeck) == _LBIT, "invariant");
>                 _OnDeck = w;  // pass OnDeck to w.
>
>             It is critical that the update to _EntryList happens before
>             we set _OnDeck, as as soon as _OnDeck is set the selected
>             thread (which need not yet have parked) can acquire the mutex,
>             complete its critical section and proceed to unlock the mutex,
>             and so execute IUnlock in parallel with the original thread.
>             If the write to _EntryList has not yet happened that second
>             thread finds itself still at the head of _EntryList and so the
>             assert fires. If the write to _EntryList happens after the
>             load "List = _EntryList", then the first assert can also fire.
>
>             Preferred fix today is to use
>             OrderAccess::release_store(&_OnDeck, w)
>             with a matching load_acquire(&_OnDeck) in the ILock code:
>
>               while (_OnDeck != ESelf) {
>                 ParkCommon(ESelf, 0);
>               }
>
>             and corresponding "raw" lock code. Also fixed a couple of typos.
>
>             Thanks,
>             David
>
>
>
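
The ordering requirement described in the quoted analysis can be sketched outside HotSpot with standard C++ atomics. This is only an illustration under stated assumptions: Node, HandoffSketch, pass_on_deck and await_on_deck are hypothetical names, std::atomic with memory_order_release and memory_order_acquire stands in for OrderAccess::release_store_ptr and OrderAccess::load_ptr_acquire, and a yield loop stands in for ParkCommon.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for ParkEvent.
struct Node {
  Node* next = nullptr;
};

// Hypothetical model of the two mutex fields involved in the handoff.
struct HandoffSketch {
  Node* entry_list = nullptr;           // plain field, like _EntryList
  std::atomic<Node*> on_deck{nullptr};  // published field, like _OnDeck

  // Unlock side: unlink the head first, then publish it as on-deck.
  // The release store keeps the entry_list write from being reordered
  // after the on_deck write.
  void pass_on_deck() {
    Node* const w = entry_list;
    entry_list = w->next;
    on_deck.store(w, std::memory_order_release);
  }

  // Lock side: wait until this node has been made on-deck.
  // The acquire load pairs with the release store above, so once it
  // observes self, the updated entry_list is visible too.
  void await_on_deck(Node* self) {
    while (on_deck.load(std::memory_order_acquire) != self) {
      std::this_thread::yield();
    }
  }
};
```

If one thread calls pass_on_deck while another sits in await_on_deck for the same node, the waiter can no longer observe itself at the head of entry_list after it returns, which is the situation the failing assert was detecting.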

From david.holmes at oracle.com  Thu Oct 13 23:13:32 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 09:13:32 +1000
Subject: RFR: 8166197: assert(RelaxAssert || w !=
	Thread::current()->_MutexEvent) failed: invariant
In-Reply-To: <60e91874-853b-ecdd-01f2-e40fc84b6275@oracle.com>
References: <e6bcda68-2837-2403-1334-2d709b998a16@oracle.com>
	<cb45f93b-fcef-63ee-9206-e944efe2c9f6@oracle.com>
	<104933ba-221f-4007-1f17-f7ce799722a4@oracle.com>
	<60e91874-853b-ecdd-01f2-e40fc84b6275@oracle.com>
Message-ID: <f5c777cf-416a-b53c-7bf0-47e193d20861@oracle.com>

Thanks Dan!

David

On 14/10/2016 2:24 AM, Daniel D. Daugherty wrote:
>> webrev updated in place with one comment and one new use of load-acquire.
>> Plus some cosmetic changes.
>>
>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>
> src/share/vm/runtime/mutex.cpp
>     No comments.
>
> Thumbs up!
>
> Dan
>
>
>
> On 10/12/16 7:18 PM, David Holmes wrote:
>> Hi Dan,
>>
>> Thanks for looking at this.
>>
>> On 13/10/2016 1:03 AM, Daniel D. Daugherty wrote:
>>> On 10/11/16 11:08 PM, David Holmes wrote:
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166197
>>>> webrev: http://cr.openjdk.java.net/~dholmes/8166197/webrev/
>>>
>>> Very nice catch! We should check the ObjectMonitor succession code for
>>> similar issues (my task).
>>
>> Yes. As I said in email I did a quick check through but the succession
>> logic is sufficiently different that nothing was obviously wrong in a
>> similar way.
>>
>>>
>>> src/share/vm/runtime/mutex.cpp
>>>     L466:   if ((NativeMonitorFlags & 32) &&
>>>                 CASPTR (&_OnDeck, NULL, UNS(ESelf)) == 0) {
>>>         Thanks for fixing this bug also!
>>>
>>>     L477:   while (OrderAccess::load_ptr_acquire(&_OnDeck) != ESelf) {
>>>         So you've changed this load of _OnDeck to use load-acquire
>>>         which matches the new store-release on L595:
>>>
>>>         OrderAccess::release_store_ptr(&_OnDeck, w);
>>
>> Right.
>>
>>>         What about the other loads of _OnDeck or stores to _OnDeck?
>>>         There should at least be a new comment explaining why we don't
>>>         need an OrderAccess operation for those. Update: I see you
>>>         changed one other load of _OnDeck on L1061. Now I'm really
>>>         wanting comments for the other _OnDeck loads and stores. :-)
>>>
>>>         Update: I see Carsten V. asked about this in a slightly
>>>         different way.
>>
>> See my reply to Carsten re the load's. I did miss one as we have three
>> "locking" paths that need to synchronize with the IUnlock code.
>>
>> As for documenting ... for line 532 I can add something simple like:
>>
>>  532   ParkEvent * const w = _OnDeck; // raw load as we will just return if non-NULL
>>
>> For the other stores to _OnDeck ... CAS should be obvious. The setting
>> to NULL should also be quite clear as only the _OnDeck thread sets to
>> NULL to relinquish being _OnDeck once it has acquired the mutex, which
>> happens via CAS which has full barriers. None of the plain stores are
>> in the context of:
>>
>>  some_var = y; // write some shared-state
>>  _OnDeck = NULL; // signal some_var has been updated
>>
>>>     L590:     // Pass onDeck to w, ensuring that _EntryList has been set first.
>>>         Typo: 'onDeck' -> 'OnDeck'
>>>
>>>         I suspect you don't want to fix all this CamelCase usage to meet
>>>         HotSpot style. I did that for most of the ObjectMonitor code and
>>>         it was painful. We could clean it up early in JDK10.
>>
>> I fixed the typo and also changed ONDECK to OnDeck so that we
>> generally refer to OnDeck in commentary unless specifically referring
>> to the _OnDeck field.
>>
>>>         Update: I see Carsten has a comment about this comment also. I
>>>         don't think I quite agree that we're "passing" _EntryList to w,
>>>         but I can be convinced otherwise...
>>
>> Right, nothing to do with _EntryList just making w the OnDeck thread.
>>
>>> Again, very nice catch! I'd like to see another webrev with the other
>>> _OnDeck loads and stores either updated for OrderAccess ops or some
>>> comment explaining why it's not needed.
>>
>> webrev updated in place with one comment and one new use of
>> load-acquire. Plus some cosmetic changes.
>>
>> Thanks again,
>> David
>>
>>> Dan
>>>
>>>
>>>>
>>>> In IUnlock we have the following succession code to wakeup the
>>>> "onDeck" thread:
>>>>
>>>>  ParkEvent * List = _EntryList;
>>>>   if (List != NULL) {
>>>>     // Transfer the head of the EntryList to the OnDeck position.
>>>>     // Once OnDeck, a thread stays OnDeck until it acquires the lock.
>>>>     // For a given lock there is at most OnDeck thread at any one
>>>>     // instant.
>>>>    WakeOne:
>>>>     assert(List == _EntryList, "invariant");
>>>>     ParkEvent * const w = List;
>>>>     assert(RelaxAssert || w != Thread::current()->_MutexEvent,
>>>>            "invariant");
>>>>     _EntryList = w->ListNext;
>>>>     // as a diagnostic measure consider setting w->_ListNext = BAD
>>>>     assert(UNS(_OnDeck) == _LBIT, "invariant");
>>>>     _OnDeck = w;  // pass OnDeck to w.
>>>>
>>>> It is critical that the update to _EntryList happens before we set
>>>> _OnDeck, as as soon as _OnDeck is set the selected thread (which need
>>>> not yet have parked) can acquire the mutex, complete its critical
>>>> section and proceed to unlock the mutex, and so execute IUnlock in
>>>> parallel with the original thread. If the write to _EntryList has not
>>>> yet happened that second thread finds itself still at the head of
>>>> _EntryList and so the assert fires. If the write to _EntryList happens
>>>> after the load "List = _EntryList", then the first assert can also
>>>> fire.
>>>>
>>>> Preferred fix today is to use OrderAccess::release_store(&_OnDeck, w)
>>>> with a matching load_acquire(&_OnDeck) in the ILock code:
>>>>
>>>>   while (_OnDeck != ESelf) {
>>>>     ParkCommon(ESelf, 0);
>>>>   }
>>>>
>>>> and corresponding "raw" lock code. Also fixed a couple of typos.
>>>>
>>>> Thanks,
>>>> David
>>>
>

From shafi.s.ahmad at oracle.com  Fri Oct 14 05:28:45 2016
From: shafi.s.ahmad at oracle.com (Shafi Ahmad)
Date: Thu, 13 Oct 2016 22:28:45 -0700 (PDT)
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <5eb7474b-a72e-41c0-b389-bfad82270f18@default>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
	<5eb7474b-a72e-41c0-b389-bfad82270f18@default>
Message-ID: <0fe3deb1-594c-46e6-829b-fe70315d3496@default>

Hi,

May I get some comments on this?

Regards,
Shafi

> -----Original Message-----
> From: Shafi Ahmad
> Sent: Wednesday, October 12, 2016 12:42 PM
> To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net
> Cc: David Holmes
> Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't
> work for OOM caused by inability to create threads'
> 
> Hi Mikael,
> 
> Thanks for reviewing it.
> 
> Once the VM is initialized, there are two OOME scenarios:
>  1) OOME due to unavailability of Java memory [mainly due to the Java
> application].
>  2) OOME due to unavailability of native memory.
> 
> Please correct me if I am wrong, but for 1) java.lang.OutOfMemoryError is
> correct.
> 
> Consider the following scenarios:
> 1) A Java application uses JNI code, the JNI code allocates and frees
> native memory, and we hit OOME.
> 2) A Java application uses JNI code, the JNI code leaks memory, and an
> OOME situation occurs as a result.
> 3) We set the JVM options -Xms and -Xmx in such a way that very little
> native memory is available, and the VM hits OOME.
> 
> I am not sure whether the above scenarios are feasible in the JVM, but if
> any of them is possible, should we consider it an OOME due to the Java
> application or not?
> I consider cases 1) and 2) to be OOME due to the Java application, and so
> added code for java.lang.OutOfMemoryError inside report_vm_out_of_memory.
> 
> My assumption that an OOME after the VM is completely initialized is due to
> the Java application [directly or indirectly] may not always hold true.
> -XX:+CrashOnOutOfMemoryError is mostly for debugging purposes, so I added
> the related code change inside report_vm_out_of_memory.
> Yes, I must not use 'java.lang.OutOfMemoryError' for such a case.
> 
> Please let me know whether I should remove the code change inside
> report_vm_out_of_memory or keep it and add an appropriate reason for the
> OutOfMemoryError.
> 
> Regards,
> Shafi
> 
> > -----Original Message-----
> > From: Mikael Gerdin
> > Sent: Monday, October 10, 2016 7:30 PM
> > To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net
> > Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't
> > work for OOM caused by inability to create threads'
> >
> > Hi,
> >
> > On 2016-10-10 09:24, Shafi Ahmad wrote:
> > > Hi All,
> > >
> > > Please review the simple change for the fix of bug 'JDK-8155004:
> > > CrashOnOutOfMemoryError doesn't work for OOM caused by inability to
> > > create threads'.
> > >
> > > Summary:
> > > In the current implementation there are a few scenarios where we do
> > > not obey the JVM option -XX:+CrashOnOutOfMemoryError.
> > > While analysing this issue I found there are two JVM states in which
> > > OOM can happen:
> > >  1.  OOM during VM initialization - as per our internal discussion, in
> > > this case it is not worth dumping a core file, so this is left as it is.
> > >  2.  OOM once the VM is initialized - for this scenario the code is
> > > already in place in most places, but the corresponding changes are
> > > missing in a few places, so this change covers them.
> > >
> > > Webrev link: http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/
> >
> >
> > There is a lot of confusion in the VM code with the term "out of
> > memory error".
> > In some places it refers to code throwing a java.lang.OutOfMemoryError
> > and expecting running java code to be able to potentially catch that
> > Error and continue running.
> >
> > In other places, such as callers of report_vm_out_of_memory, the
> > situation is much more dire and the calling thread may not even be a
> > JavaThread and as such cannot "throw" an exception.
> > report_vm_out_of_memory is only invoked through the macro
> > vm_exit_out_of_memory, which of course implies that the condition is
> > fatal and we are about to terminate the JVM process altogether.
> >
> > I think that it's incorrect to call code related to
> > java.lang.OutOfMemoryError in report_vm_out_of_memory since the
> > condition may not even be correlated with Java level application behavior.
> >
> > /Mikael
> >
> > > Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004
> > >
> > > Testing: jprt and jtreg (on Linux x86_64)
> > >
> > > Regards,
> > > Shafi
> > >

From staffan.larsen at oracle.com  Fri Oct 14 06:18:35 2016
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Fri, 14 Oct 2016 08:18:35 +0200
Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
	Could not open PerfMemory on Windows
In-Reply-To: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
Message-ID: <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com>

Thanks for getting to the bottom of this!

The fix looks good. Perhaps, as David points out, FileNotFoundException is a better choice but that requires more changes as FileNotFoundException is not used anywhere else in the JVM. 

Thanks,
/Staffan
 
> On 13 Oct 2016, at 20:09, Christian Tornqvist <christian.tornqvist at oracle.com> wrote:
> 
> Hi everyone,
> 
> 
> 
> Please review this small fix for an intermittent issue we've seen when
> running tests concurrently that use jcmd/jstack. 
> 
> When running jcmd, we enumerate the perfdata files and then open them one by
> one to read things like main class names etc. If the perfdata file
> disappears (because the Java process exited) before we get to it, we end up
> with different exceptions depending on where in the code we are.
> 
> 
> 
> The code at:
> 
> http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/share/
> classes/sun/tools/common/ProcessArgumentMatcher.java#l88
> 
> 
> 
> handles this, the problem is that if we get all the way to
> open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception which
> isn't caught by this. The fix is to throw a NPE instead of Exception and let
> the existing code handle this. 
> 
> 
> 
> Fix has been tested locally and with 30 JPRT runs (with concurrency patch
> applied), also managed to reproduce and verify this fix locally using a
> debugger to trigger the race.
> 
> 
> 
> Webrev:
> 
> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/
> 
> 
> 
> Bug (unfortunately not visible):
> 
> https://bugs.openjdk.java.net/browse/JDK-8159799
> 
> 
> 
> Thanks,
> 
> Christian
> 
> 
> 
> 
> 


From david.holmes at oracle.com  Fri Oct 14 06:31:36 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 16:31:36 +1000
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <0fe3deb1-594c-46e6-829b-fe70315d3496@default>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
	<5eb7474b-a72e-41c0-b389-bfad82270f18@default>
	<0fe3deb1-594c-46e6-829b-fe70315d3496@default>
Message-ID: <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>

Hi Shafi,

I stand by my previous comment - in the context of this bug, in relation 
to failure to create a native thread "A call to 
report_java_out_of_memory should only be made on a code path that will 
throw an OOME."

Does this mean we have all OOM (not OOME)** situations covered? Nope. 
But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError seem specific 
to OutOfMemoryError to me - and not that useful for dealing with the JNI 
leak you describe.

Feel free to file an RFE to look into more elaborate/extensive OOM
handling. I'm not sure if NMT hooks into JNI.

Thanks,
David


** OOM - out-of-memory
    OOME - OutOfMemoryError - a Java exception thrown in response to a 
detected OOM condition
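
To make the OOM versus OOME distinction concrete, here is a small sketch of the separation the reviewers argue for. It is a hypothetical model, not HotSpot code: OomAction, OomFlags, on_java_oome and on_vm_oom are invented names, and the only assumption carried over from the thread is that the OutOfMemoryError-specific flag is consulted on paths that will throw an OOME, while the fatal VM path does not involve Java-level error handling.

```cpp
#include <cassert>

// Hypothetical outcomes for the two reporting paths.
enum class OomAction {
  ThrowOome,         // raise java.lang.OutOfMemoryError to Java code
  CrashBeforeThrow,  // crash requested for OutOfMemoryError
  ExitVm             // fatal native OOM: terminate the process
};

// Hypothetical flag holder; crash_on_oome stands in for
// -XX:+CrashOnOutOfMemoryError.
struct OomFlags {
  bool crash_on_oome = false;
};

// Java-level path: only valid where an OOME is about to be thrown,
// so the OutOfMemoryError-specific flag applies here.
inline OomAction on_java_oome(const OomFlags& flags) {
  return flags.crash_on_oome ? OomAction::CrashBeforeThrow
                             : OomAction::ThrowOome;
}

// VM-level path: the calling thread may not even be a JavaThread and
// nothing is thrown, so the OOME flag is deliberately not consulted.
inline OomAction on_vm_oom(const OomFlags& /*flags*/) {
  return OomAction::ExitVm;
}
```

Under this model the thread-creation failure in the bug belongs on the first path, because that code goes on to throw an OOME, whereas extending the second path would be the subject of a separate RFE.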

On 14/10/2016 3:28 PM, Shafi Ahmad wrote:
> Hi,
>
> May I get some comment on this.
>
> Regards,
> Shafi
>
>> -----Original Message-----
>> From: Shafi Ahmad
>> Sent: Wednesday, October 12, 2016 12:42 PM
>> To: Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net
>> Cc: David Holmes
>> Subject: RE: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't
>> work for OOM caused by inability to create threads'
>>
>> Hi Mikael,
>>
>> Thanks for reviewing it.
>>
>> Once the VM is initialized, there are two OOME scenarios:
>>  1) OOME due to unavailability of Java memory [mainly due to the Java
>> application].
>>  2) OOME due to unavailability of native memory.
>>
>> Please correct me if I am wrong, but for 1) java.lang.OutOfMemoryError is
>> correct.
>>
>> Consider the following scenarios:
>> 1) A Java application uses JNI code, the JNI code allocates and frees
>> native memory, and we hit OOME.
>> 2) A Java application uses JNI code, the JNI code leaks memory, and an
>> OOME situation occurs as a result.
>> 3) We set the JVM options -Xms and -Xmx in such a way that very little
>> native memory is available, and the VM hits OOME.
>>
>> I am not sure whether the above scenarios are feasible in the JVM, but if
>> any of them is possible, should we consider it an OOME due to the Java
>> application or not?
>> I consider cases 1) and 2) to be OOME due to the Java application, and so
>> added code for java.lang.OutOfMemoryError inside report_vm_out_of_memory.
>>
>> My assumption that an OOME after the VM is completely initialized is due to
>> the Java application [directly or indirectly] may not always hold true.
>> -XX:+CrashOnOutOfMemoryError is mostly for debugging purposes, so I added
>> the related code change inside report_vm_out_of_memory.
>> Yes, I must not use 'java.lang.OutOfMemoryError' for such a case.
>>
>> Please let me know whether I should remove the code change inside
>> report_vm_out_of_memory or keep it and add an appropriate reason for the
>> OutOfMemoryError.
>>
>> Regards,
>> Shafi
>>
>>> -----Original Message-----
>>> From: Mikael Gerdin
>>> Sent: Monday, October 10, 2016 7:30 PM
>>> To: Shafi Ahmad; hotspot-runtime-dev at openjdk.java.net
>>> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't
>>> work for OOM caused by inability to create threads'
>>>
>>> Hi,
>>>
>>> On 2016-10-10 09:24, Shafi Ahmad wrote:
>>>> Hi All,
>>>>
>>>> Please review the simple change for the fix of bug 'JDK-8155004:
>>>> CrashOnOutOfMemoryError doesn't work for OOM caused by inability to
>>>> create threads'.
>>>>
>>>> Summary:
>>>> In the current implementation there are a few scenarios where we do
>>>> not obey the JVM option -XX:+CrashOnOutOfMemoryError.
>>>> While analyzing this issue I found that there are two JVM states
>>>> in which OOM can happen:
>>>>  1.  OOM during VM initialization - as per our internal discussion,
>>>> it is not worth dumping a core file in this case, so this is left
>>>> as it is.
>>>>  2.  OOM once the VM is initialized - for this scenario the code is
>>>> already in place in most places, but the corresponding changes are
>>>> missing in a few places, so this change covers them.
>>>>
>>>> Webrev link:
>>>> http://cr.openjdk.java.net/~shshahma/8155004/webrev.00/
>>>
>>>
>>> There is a lot of confusion in the VM code with the term "out of
>>> memory error".
>>> In some places it refers to code throwing a java.lang.OutOfMemoryError
>>> and expecting running java code to be able to potentially catch that
>>> Error and continue running.
>>>
>>> In other places, such as callers of report_vm_out_of_memory, the
>>> situation is much more dire and the calling thread may not even be a
>>> JavaThread and as such cannot "throw" an exception.
>>> report_vm_out_of_memory is only invoked through the macro
>>> vm_exit_out_of_memory, which of course implies that the condition is
>>> fatal and we are about to terminate the JVM process altogether.
>>>
>>> I think that it's incorrect to call code related to
>>> java.lang.OutOfMemoryError in report_vm_out_of_memory since the
>>> condition may not even be correlated with Java level application behavior.
>>>
>>> /Mikael
>>>
>>>> Jdk9 bug: https://bugs.openjdk.java.net/browse/JDK-8155004
>>>>
>>>> Testing: jprt and jtreg (on Linux x86_64)
>>>>
>>>> Regards,
>>>> Shafi
>>>>
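
To make the distinction drawn in this thread concrete: an OutOfMemoryError
(OOME) is an ordinary Java Throwable that running Java code may catch and
survive, whereas the vm_exit_out_of_memory path terminates the process and
never reaches Java code at all. The sketch below illustrates only the first,
recoverable kind. The class and method names are made up for illustration, and
it assumes a HotSpot-like VM that rejects the oversized array request with an
OOME rather than satisfying it.

```java
final class CatchableOome {
    // Hypothetical helper: asks for an array that is assumed to be larger
    // than the VM will grant, and reports whether the resulting
    // OutOfMemoryError could be caught by ordinary Java code.
    static boolean allocationFailsRecoverably() {
        try {
            long[] huge = new long[Integer.MAX_VALUE];
            // Only reached if the VM actually granted the request.
            return huge.length < 0;
        } catch (OutOfMemoryError e) {
            // This is the "OOME" case: the thread is still running and the
            // application decides what to do next.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("recovered from OOME: " + allocationFailsRecoverably());
    }
}
```

A fatal native OOM, by contrast, has no Java-level equivalent of the catch
block above, which is why the thread treats the two paths separately.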

From shafi.s.ahmad at oracle.com  Fri Oct 14 07:21:14 2016
From: shafi.s.ahmad at oracle.com (Shafi Ahmad)
Date: Fri, 14 Oct 2016 00:21:14 -0700 (PDT)
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
	<5eb7474b-a72e-41c0-b389-bfad82270f18@default>
	<0fe3deb1-594c-46e6-829b-fe70315d3496@default>
	<4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>
Message-ID: <c822753c-f821-4ee4-a7eb-21483ef27c43@default>

Hi David,

Thanks for the clarification. I will send the updated webrev.

Regards,
Shafi

> -----Original Message-----
> From: David Holmes
> Sent: Friday, October 14, 2016 12:02 PM
> To: Shafi Ahmad; Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't
> work for OOM caused by inability to create threads'
> 
> Hi Shafi,
> 
> I stand by my previous comment - in the context of this bug, in relation to
> failure to create a native thread "A call to report_java_out_of_memory
> should only be made on a code path that will throw an OOME."
> 
> Does this mean we have all OOM (not OOME)** situations covered? Nope.
> But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError
> seem specific to OutOfMemoryError to me - and not that useful for dealing
> with the JNI leak you describe.
> 
> Feel free to file an RFE to look into more elaborate/extensive OOM handling.
> I'm not sure if NMT hooks into JNI.
> 
> Thanks,
> David
> 
> 
> ** OOM - out-of-memory
>     OOME - OutOfMemoryError - a Java exception thrown in response to a
> detected OOM condition
> 

From shafi.s.ahmad at oracle.com  Fri Oct 14 08:55:50 2016
From: shafi.s.ahmad at oracle.com (Shafi Ahmad)
Date: Fri, 14 Oct 2016 01:55:50 -0700 (PDT)
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
	<5eb7474b-a72e-41c0-b389-bfad82270f18@default>
	<0fe3deb1-594c-46e6-829b-fe70315d3496@default>
	<4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>
Message-ID: <442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default>

Please find updated webrev.

http://cr.openjdk.java.net/~shshahma/8155004/webrev.01/

Regards,
Shafi

> -----Original Message-----
> From: David Holmes
> Sent: Friday, October 14, 2016 12:02 PM
> To: Shafi Ahmad; Mikael Gerdin; hotspot-runtime-dev at openjdk.java.net
> Subject: Re: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't
> work for OOM caused by inability to create threads'
> 
> Hi Shafi,
> 
> I stand by my previous comment - in the context of this bug, in relation to
> failure to create a native thread "A call to report_java_out_of_memory
> should only be made on a code path that will throw an OOME."
> 
> Does this mean we have all OOM (not OOME)** situations covered? Nope.
> But HeapDumpOnOutOfMemoryError and CrashOnOutOfMemoryError
> seem specific to OutOfMemoryError to me - and not that useful for dealing
> with the JNI leak you describe.
> 
> Feel free to file an RFE to look into more elaborate/extensive OOM handling.
> I'm not sure if NMT hooks into JNI.
> 
> Thanks,
> David
> 
> 
> ** OOM - out-of-memory
>     OOME - OutOfMemoryError - a Java exception thrown in response to a
> detected OOM condition
> 

From david.holmes at oracle.com  Fri Oct 14 11:26:21 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 21:26:21 +1000
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
	<5eb7474b-a72e-41c0-b389-bfad82270f18@default>
	<0fe3deb1-594c-46e6-829b-fe70315d3496@default>
	<4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>
	<442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default>
Message-ID: <e391a84a-ecdd-4935-5e18-8b49d4bb8ddb@oracle.com>

On 14/10/2016 6:55 PM, Shafi Ahmad wrote:
> Please find updated webrev.
>
> http://cr.openjdk.java.net/~shshahma/8155004/webrev.01/

Ok.

Thanks,
David

> Regards,
> Shafi
>

From mikael.gerdin at oracle.com  Fri Oct 14 11:54:48 2016
From: mikael.gerdin at oracle.com (Mikael Gerdin)
Date: Fri, 14 Oct 2016 13:54:48 +0200
Subject: [9] RFR for JDK-8155004: CrashOnOutOfMemoryError doesn't work for
	OOM caused by inability to create threads'
In-Reply-To: <e391a84a-ecdd-4935-5e18-8b49d4bb8ddb@oracle.com>
References: <ed5fc5e2-14c4-4496-ba04-4f0a68ffe3f7@default>
	<eccc6f7d-184d-4a65-4a99-659c9114918b@oracle.com>
	<5eb7474b-a72e-41c0-b389-bfad82270f18@default>
	<0fe3deb1-594c-46e6-829b-fe70315d3496@default>
	<4cb4bae8-e827-5558-fed1-8a03ea0dec1a@oracle.com>
	<442a6cf8-005f-4c8e-9ff8-e6f7308dffe5@default>
	<e391a84a-ecdd-4935-5e18-8b49d4bb8ddb@oracle.com>
Message-ID: <68331d88-4c52-c737-1b09-bf3cfe81720e@oracle.com>


On 2016-10-14 13:26, David Holmes wrote:
> On 14/10/2016 6:55 PM, Shafi Ahmad wrote:
>> Please find updated webrev.
>>
>> http://cr.openjdk.java.net/~shshahma/8155004/webrev.01/
>
> Ok.

+1

/Mikael
>
> Thanks,
> David
>
>> Regards,
>> Shafi
>>

From christian.tornqvist at oracle.com  Fri Oct 14 12:11:27 2016
From: christian.tornqvist at oracle.com (Christian Tornqvist)
Date: Fri, 14 Oct 2016 08:11:27 -0400
Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
	Could not open PerfMemory on Windows
In-Reply-To: <0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com>
References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
	<0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com>
Message-ID: <186a01d22614$17a3d520$46eb7f60$@oracle.com>

Hi Staffan/David,

I looked at how the other platforms deal with it: they throw an
IllegalArgumentException when they fail to open the file. I think it makes
sense for Windows to do the same. Please see the updated webrev, which throws
an IAE instead of an NPE:

http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/
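
As a rough illustration of why the exception type matters to the caller, here
is a minimal sketch of an enumeration loop that tolerates processes which
disappear between being listed and being opened. This is not the code from
ProcessArgumentMatcher or from the webrev; the class, the readMainClass()
helper and the "alive" set are hypothetical stand-ins for the real
perfdata attach logic.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class VanishedProcessDemo {
    // Hypothetical stand-in for opening a process's perf data: it fails with
    // IllegalArgumentException when the process is no longer alive.
    static String readMainClass(int pid, Set<Integer> alive) {
        if (!alive.contains(pid)) {
            throw new IllegalArgumentException("Could not open PerfMemory for " + pid);
        }
        return "Main" + pid;
    }

    // Walks the previously enumerated pids and skips any that vanished.
    static List<String> mainClasses(List<Integer> enumerated, Set<Integer> alive) {
        List<String> result = new ArrayList<>();
        for (int pid : enumerated) {
            try {
                result.add(readMainClass(pid, alive));
            } catch (IllegalArgumentException e) {
                // The process exited after enumeration; ignore it and move on.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> alive = new HashSet<>(Arrays.asList(1, 3));
        // pid 2 was enumerated but has exited by the time it is opened
        System.out.println(mainClasses(Arrays.asList(1, 2, 3), alive));
        // prints [Main1, Main3]
    }
}
```

If readMainClass() threw an exception type that the loop does not catch, the
whole enumeration would fail instead of skipping the one vanished process.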

Thanks,
Christian 

-----Original Message-----
From: Staffan Larsen [mailto:staffan.larsen at oracle.com] 
Sent: Friday, October 14, 2016 2:19 AM
To: Christian Tornqvist <christian.tornqvist at oracle.com>
Cc: hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
Could not open PerfMemory on Windows

Thanks for getting to the bottom of this!

The fix looks good. Perhaps, as David points out, FileNotFoundException is a
better choice but that requires more changes as FileNotFoundException is not
used anywhere else in the JVM. 

Thanks,
/Staffan
 
> On 13 Oct 2016, at 20:09, Christian Tornqvist <christian.tornqvist at oracle.com> wrote:
> 
> Hi everyone,
> 
> 
> 
> Please review this small fix for an intermittent issue we've seen when 
> running tests concurrently that use jcmd/jstack.
> 
> When running jcmd, we enumerate the perfdata files and then open them 
> one by one to read things like main class names etc. If the perfdata 
> file disappears (because the Java process exited) before we get to it, 
> we end up with different exceptions depending on where in the code we are.
> 
> 
> 
> The code at:
> 
> http://hg.openjdk.java.net/jdk9/hs/jdk/file/3d3f338b5aea/src/jdk.jcmd/share/classes/sun/tools/common/ProcessArgumentMatcher.java#l88
> 
> 
> 
> handles this; the problem is that if we get all the way to
> open_sharedmem_object() in the JVM, we'll throw a java.lang.Exception,
> which isn't caught by this. The fix is to throw an NPE instead of
> Exception and let the existing code handle this.
> 
> 
> 
> Fix has been tested locally and with 30 JPRT runs (with concurrency 
> patch applied), also managed to reproduce and verify this fix locally 
> using a debugger to trigger the race.
> 
> 
> 
> Webrev:
> 
> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.00/
> 
> 
> 
> Bug (unfortunately not visible):
> 
> https://bugs.openjdk.java.net/browse/JDK-8159799
> 
> 
> 
> Thanks,
> 
> Christian
> 
> 
> 
> 
> 


From george.triantafillou at oracle.com  Fri Oct 14 12:12:51 2016
From: george.triantafillou at oracle.com (George Triantafillou)
Date: Fri, 14 Oct 2016 08:12:51 -0400
Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
	Could not open PerfMemory on Windows
In-Reply-To: <186a01d22614$17a3d520$46eb7f60$@oracle.com>
References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
	<0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com>
	<186a01d22614$17a3d520$46eb7f60$@oracle.com>
Message-ID: <b79547d3-041e-0178-8a2b-e849f0b9386b@oracle.com>

Hi Christian,

This looks good!

-George

On 10/14/2016 8:11 AM, Christian Tornqvist wrote:
> Hi Staffan/David,
>
> I looked at how the other platforms deal with it, they throw an
> IllegalArgumentException when they fail to open the file. I think it makes
> sense for Windows to do the same, please see the updated webrev with IAE
> instead of a NPE:
>
> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/
>
> Thanks,
> Christian
>
> -----Original Message-----
> From: Staffan Larsen [mailto:staffan.larsen at oracle.com]
> Sent: Friday, October 14, 2016 2:19 AM
> To: Christian Tornqvist <christian.tornqvist at oracle.com>
> Cc: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
> Could not open PerfMemory on Windows
>
> Thanks for getting to the bottom of this!
>
> The fix looks good. Perhaps, as David points out, FileNotFoundException is a
> better choice but that requires more changes as FileNotFoundException is not
> used anywhere else in the JVM.
>
> Thanks,
> /Staffan


From frederic.parain at oracle.com  Fri Oct 14 12:26:49 2016
From: frederic.parain at oracle.com (Frederic Parain)
Date: Fri, 14 Oct 2016 08:26:49 -0400
Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
	Could not open PerfMemory on Windows
In-Reply-To: <186a01d22614$17a3d520$46eb7f60$@oracle.com>
References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
	<0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com>
	<186a01d22614$17a3d520$46eb7f60$@oracle.com>
Message-ID: <456f7cc0-2f4c-90f8-ea26-2800d934bcdb@oracle.com>

Christian,

Great work to find the root cause of the issue!
The fix with the IAE looks good to me.
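
For readers following the thread, the race being fixed (a file is listed, then vanishes before it is opened) can be sketched in a few lines. This is only an illustration under stated assumptions, not the jcmd or JVM code: open_survivors and simulate_exit are hypothetical names, and plain files stand in for the perfdata files. The point is that a failed open is treated as "the process has gone away" and skipped, rather than surfacing as an unexpected error.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: models a Java process exiting and removing its
// perfdata file after the file has already been enumerated.
void simulate_exit(const std::string& path) {
    std::remove(path.c_str());
}

// Hypothetical helper: try to open each previously enumerated file and
// return the names that could still be opened. A file that disappeared
// between enumeration and open is skipped instead of aborting the scan.
std::vector<std::string> open_survivors(const std::vector<std::string>& enumerated) {
    std::vector<std::string> readable;
    for (const std::string& name : enumerated) {
        assert(!name.empty() && "enumeration should not yield empty names");
        std::ifstream in(name.c_str(), std::ios::binary);
        if (!in.is_open()) {
            continue;  // owner exited in the meantime: expected, not an error
        }
        readable.push_back(name);
    }
    return readable;
}
```

The design choice mirrors the discussion in this thread: whatever the failure is called (NPE, IAE, or a failed stream here), the caller needs one well-defined way to recognise "file no longer there" and move on.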

Thanks,

Fred

On 10/14/2016 08:11 AM, Christian Tornqvist wrote:
> Hi Staffan/David,
>
> I looked at how the other platforms deal with it, they throw an
> IllegalArgumentException when they fail to open the file. I think it makes
> sense for Windows to do the same, please see the updated webrev with IAE
> instead of a NPE:
>
> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/
>
> Thanks,
> Christian

From dmitry.dmitriev at oracle.com  Fri Oct 14 12:46:09 2016
From: dmitry.dmitriev at oracle.com (Dmitry Dmitriev)
Date: Fri, 14 Oct 2016 15:46:09 +0300
Subject: RFR(XS) 8166155: Create tests for VM module option handling
In-Reply-To: <5a0075a3-0a36-d3f5-6ed3-2c04d3f7cda3@oracle.com>
References: <3d7981cb-7d27-a086-e46c-ec8f82f23849@oracle.com>
	<5a0075a3-0a36-d3f5-6ed3-2c04d3f7cda3@oracle.com>
Message-ID: <9e03ff3e-684f-6b12-8589-2ffc4de24ec2@oracle.com>

Hi George,

Thank you for moving the test to a separate file.
A few comments:
1) Why not use createJavaProcessBuilder from the
jdk.test.lib.process.ProcessTools test library class?
2) You can reduce the size of the test by introducing a common test function:
private static void checkInvalidModuleOption(String VMOptionFile,
                                             String expectedOutput) throws Exception {
    ProcessBuilder pb = createJavaProcessBuilder(
        "-XX:VMOptionsFile=" + getAbsolutePathFromSource(VMOptionFile));
    OutputAnalyzer output = new OutputAnalyzer(pb.start());
    output.shouldContain(expectedOutput);
    output.shouldHaveExitValue(1);
}

In this case the main test function will look like this:
checkInvalidModuleOption(ADD_MODULES_BAD1, "Usage");
checkInvalidModuleOption(ADD_MODULES_BAD2, "Unrecognized option");
...

Thanks,
Dmitry

On 13.10.2016 14:40, George Triantafillou wrote:
> After offline feedback from Dmitry Dmitriev, here's an updated webrev:
>
> http://cr.openjdk.java.net/~gtriantafill/8166155/webrev.01/ 
>
> The test was moved to a separate test for VM module option handling.  
> Thanks.
>
> -George
>
> On 9/15/2016 2:25 PM, George Triantafillou wrote:
>> Please review this change that adds test coverage for the new VM 
>> module option handling implemented in 
>> https://bugs.openjdk.java.net/browse/JDK-8157038.
>>
>> JBS: https://bugs.openjdk.java.net/browse/JDK-8166155
>> webrev: http://cr.openjdk.java.net/~gtriantafill/8166155/webrev/ 
>>
>> Tested locally on Linux.
>>
>> Thanks.
>>
>> -George
>>
>


From david.holmes at oracle.com  Fri Oct 14 13:10:37 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 23:10:37 +1000
Subject: RFR(XS): 8159799 - Tests using jcmd fails intermittently with
	Could not open PerfMemory on Windows
In-Reply-To: <186a01d22614$17a3d520$46eb7f60$@oracle.com>
References: <13f501d2257c$e4fcf080$aef6d180$@oracle.com>
	<0880DBFD-7E48-47A9-86D0-A581B30C57BF@oracle.com>
	<186a01d22614$17a3d520$46eb7f60$@oracle.com>
Message-ID: <be4af83f-f297-6ba7-044a-24f3f0d4c125@oracle.com>

On 14/10/2016 10:11 PM, Christian Tornqvist wrote:
> Hi Staffan/David,
>
> I looked at how the other platforms deal with it, they throw an
> IllegalArgumentException when they fail to open the file. I think it makes
> sense for Windows to do the same, please see the updated webrev with IAE
> instead of a NPE:
>
> http://cr.openjdk.java.net/~ctornqvi/webrev/8159799/webrev.01/

I still find IAE a little odd in this context, but okay - consistency 
counts for a lot. :)

Thanks,
David

> Thanks,
> Christian

From david.holmes at oracle.com  Fri Oct 14 06:15:43 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 14 Oct 2016 16:15:43 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OFC81622C2.ABFF22F0-ON49258048.00271D0E-49258048.004FB8B8@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com>
	<OFC81622C2.ABFF22F0-ON49258048.00271D0E-49258048.004FB8B8@notes.na.collabserv.com>
Message-ID: <9bffd66d-abe0-8d3d-262a-4be55b81b9a4@oracle.com>

On 11/10/2016 12:30 AM, Hiroshi H Horii wrote:
> Hi Thomas, David, and all,
>
>> I think you intended to modify cmpxchg_pre_membar not
>> cmpxchg_post_membar!
>
> The previous patch will change only behavior of cmpxchg_pre_membar. But

No it changed the post-membar:

http://cr.openjdk.java.net/~horii/8154736/webrev.04/src/os_cpu/linux_ppc/vm/atomic_linux_ppc.hpp.cdiff.html

   inline void cmpxchg_post_membar(cmpxchg_memory_order order) {
!   if (order == memory_order_conservative) {
       __asm__ __volatile__ (
         /* fence */
         strasm_sync
         );
     }

but I did get confused in what I wrote previously. Given that the release 
must come before the store, the pre barrier must be the one that does 
that, as per the latest code.

> the patch is not good to be reviewed (it was not obvious) and Martin
> suggested that I use lwsync rather than sync.
> I created a new webrev. This webrev also addresses all the points that
> David and Thomas raised.
>
> http://cr.openjdk.java.net/~horii/8154736/webrev.05/
>
> With this change, callers of copy_to_survivor_space can safely touch
> fields of returned obj because OrderAccess::acquire() is called in
> copy_to_survivor_space when CAS fails.

So the intent is that the acquire() pairs with the release() semantics 
of the cmpxchg store that succeeded in the other thread. That makes 
sense, though I really have to question the trade-off in code complexity 
and understandability against any performance gain due to a slight 
reduction in the barrier strengths. Do you have any metrics on this 
latest version?
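
To make the pairing concrete, here is a minimal sketch using C++11 atomics rather than the HotSpot Atomic/OrderAccess API. Obj, forwardee and forward_or_get are hypothetical stand-ins (the real code CASes a forwarding pointer into the mark word); the only point is the shape: the winner publishes with release semantics, and the loser issues an acquire before anyone reads fields of the object it got back.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for a heap object.
struct Obj {
    int payload = 0;                       // stands in for the object's fields
    std::atomic<Obj*> forwardee{nullptr};  // stands in for the mark word
};

// Hypothetical analogue of the copy-and-forward step: try to install `copy`
// as the forwardee of `old_obj`. The loser of the race returns the winner's
// copy instead of its own.
Obj* forward_or_get(Obj* old_obj, Obj* copy) {
    copy->payload = old_obj->payload;      // initialise the copy before publishing

    Obj* expected = nullptr;
    if (old_obj->forwardee.compare_exchange_strong(
            expected, copy,
            std::memory_order_release,     // winner: publish the initialised copy
            std::memory_order_relaxed)) {  // loser: the CAS itself orders nothing
        return copy;
    }

    // Loser path: this acquire pairs with the winner's release, so reads of
    // the returned object's fields see the winner's initialisation.
    std::atomic_thread_fence(std::memory_order_acquire);
    return expected;
}
```

Without the fence on the losing path, nothing in this sketch would order the loser's later reads of payload after the winner's writes, which is the hazard discussed above for callers that touch fields of the returned obj.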

>> Changes in shared code must be algorithmically correct on all platforms.
>> Not just "it will work fine today".
>>
>> Given all the work being done to add missing barriers, removing them
>> must come with a detailed analysis establishing the safety of doing so.
>> And I am not seeing that here.
>
> The latest code in the repository is missing some calls to
> OrderAccess::acquire() before touching fields of new_obj or
> o->forwardee() in PSPromotionManager::copy_and_push_safe_barrier and
> copy_to_survivor_space respectively. I believe this webrev corrects them
> as well.
>
> Some methods call forwardee(). However, they don't touch fields of
> forwardee while copying survived objects to a survivor space.
>   PSMarkSweepDecorator::compact()
>   PSPromotionManager::process_array_chunk()
>   PSPromotionManager::claim_or_forward_internal_depth()

Focusing on the code for now ...

src/os_cpu/linux_ppc/vm/atomic_linux_ppc.hpp
src/os_cpu/aix_ppc/vm/atomic_aix_ppc.hpp

In cmpxchg_post_membar I think it is preferable to maintain the existing 
default of a full fence if not specifically a "release" or "relaxed" 
operation, i.e.:

  inline void cmpxchg_post_membar(cmpxchg_memory_order order) {
    if (order == memory_order_release) {
      // no post membar
    } else if (order != memory_order_relaxed) {
      __asm__ __volatile__ (
        /* fence */
        strasm_sync
        );
    }
  }

as is done for the pre-membar. If nothing else pre-membar and 
post-membar should be consistent in their approach.
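
The consistency argument is easier to see when both selections are written side by side. The sketch below is a model only: the enum and function names are hypothetical, no real barrier instructions are emitted, and the choice of a lightweight (lwsync-style) barrier before a "release" cmpxchg is an assumption based on the lwsync suggestion mentioned earlier in this thread, not something taken from the webrev.

```cpp
#include <cassert>

// Hypothetical stand-ins, not the HotSpot declarations.
enum cmpxchg_order { order_relaxed, order_release, order_conservative };
enum barrier_kind  { no_barrier, light_barrier, full_barrier };

// Barrier issued before the cmpxchg.
barrier_kind pre_membar_for(cmpxchg_order order) {
    if (order == order_release) {
        return light_barrier;   // assumed: lwsync-style barrier for release
    } else if (order != order_relaxed) {
        return full_barrier;    // default: anything else gets the full fence
    }
    return no_barrier;          // relaxed
}

// Barrier issued after the cmpxchg, written in the same shape.
barrier_kind post_membar_for(cmpxchg_order order) {
    if (order == order_release) {
        return no_barrier;      // release needs nothing after the store
    } else if (order != order_relaxed) {
        return full_barrier;    // default: anything else gets the full fence
    }
    return no_barrier;          // relaxed
}
```

Because both functions treat the full fence as the fall-through case, an order value added later gets the conservative behaviour on both sides unless someone deliberately weakens it.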

---

src/share/vm/gc/parallel/psPromotionManager.cpp

507     // call acquire for reading fields of obj in callers

May I suggest:

// acquire() by cas loser is needed to pair with 'release' of cas winner
// so we can safely access data (eg. fields of obj)

---

src/share/vm/gc/parallel/psPromotionManager.inline.hpp

258       // call acquire for reading fields of new_obj in callers
264     // call acquire for reading fields of new_obj in callers

Same as above.

  264     // call acquire for reading fields of new_obj in callers
  265     OrderAccess::acquire();

If I'm reading this right this is the else part of:

  119   // The same test as "o->is_forwarded()"
  120   if (!test_mark->is_marked()) {

and it is less clear what the acquire() is pairing with, but presumably it 
is still the release of a successful cas_forward_to. But given the 
isolation of this code from the modified cas operation I have to wonder 
about performance again - how often will we take this path with the new 
barrier, compared to the paths with the now modified cas operations?

---

Overall the way this proposal has been presented does not instill me 
with great confidence about its correctness, or performance benefit:

1. Replace strong cas with a relaxed cas
Issue: accessing obj fields in logging statement may not be safe.
2. Remove access to obj fields in logging statements
Issue: callee access to obj fields is also unsafe
3. Return NULL on failed cas paths
Issue: changes overall semantics and correctness of returning NULL is 
very unclear.
4. Go back to previous code.
Issue: callee access to obj fields is also unsafe
Suggestion from Kim: fully relaxed seems unsafe but using "release" 
semantics might be okay [this hypothesis warranted detailed discussion 
but there was none]
5. replace relaxed cas with release cas
6. Add in acquire() on cas-losing paths, and general access to forwardee

Aside: we can presumably add back the logging statement in the losing 
path now we have the acquire() in place.

So far the only justification for making these changes to the GC code 
comes from the April discussion [1] where it was stated simply that:

"We've looked at the proposed changes and we are pretty sure that the
cmpxchg done during copy_to_survivor_space in the parallel GC doesn't
require the full fence/acquire semantics." [Martin & Volker]

Reading back through all the emails, including the ones in April, I 
_think_ part of the reasoning here is that we're not doing a CAS that 
publishes a new object that was just created, but that we have 
previously created that object using a full CAS and are now only 
updating the markword of another object with a forwarding pointer. The 
second cas would not need full fence semantics as the other object is 
already visible. However I am not a GC expert and other comments by GC 
folk suggest that is not in fact the case, or at least is not 
necessarily always the case. So I can not establish that what is being 
proposed is correct.

I think the GC experts need to have a discussion to resolve things to 
their mutual satisfaction.

Thanks,
David

[1] 
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2016-April/019079.html

-------


> Regards,
> Hiroshi
> -----------------------
> Hiroshi Horii, Ph.D.
> IBM Research - Tokyo
>

From hui.shi at linaro.org  Sat Oct 15 10:34:19 2016
From: hui.shi at linaro.org (Hui Shi)
Date: Sat, 15 Oct 2016 18:34:19 +0800
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error: Illegal
	threadstate encountered
Message-ID: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>

Hi all,

Could someone help review this fix?

JIRA: https://bugs.openjdk.java.net/browse/JDK-8167421
webrev: http://cr.openjdk.java.net/~hshi/8167421/webrev/

The JVM crashes with an illegal threadstate when running on a single-core
machine (for example, a single-core VM running on an aarch64 box).
The current JNI wrapper generator is missing the store of
_thread_in_native_trans into the thread state when the machine has only
one CPU core.

#
# A fatal error has been detected by the Java Runtime Environment:
#
# Internal Error (safepoint.cpp:716), pid=4329, tid=0x0000ffff89204200
# fatal error: Illegal threadstate encountered: 4
#
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#

--------------- T H R E A D ---------------

Current thread (0x0000ffff684fe000): JavaThread "localhost-startStop-1" daemon [_thread_in_native, id=4341, stack(0x0000ffff89005000,0x0000ffff89205000)]

Stack: [0x0000ffff89005000,0x0000ffff89205000], sp=0x0000ffff89201f60, free space=2035k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V [libjvm.so+0x95ed3c] VMError::report_and_die()+0x130
V [libjvm.so+0x42c04c] report_fatal(char const*, int, char const*)+0x60
V [libjvm.so+0x85bae4] SafepointSynchronize::block(JavaThread*) [clone .part.24]+0x50
V [libjvm.so+0x90381c] JavaThread::check_safepoint_and_suspend_for_native_trans(JavaThread*)+0x1c8
V [libjvm.so+0x903f58] JavaThread::check_special_condition_for_native_trans(JavaThread*)+0x14
J 236 java.util.zip.ZipFile.getEntry(J[BZ)J (0 bytes) @ 0x0000ffff7c1e64f0 [0x0000ffff7c1e63c0+0x130]
J 1167 C1 java.util.jar.JarFile$JarEntryIterator.hasMoreElements()Z (5 bytes) @ 0x0000ffff7c4f1320 [0x0000ffff7c4f1180+0x1a0]
J 840 C1 java.util.jar.JarFile.getInputStream(Ljava/util/zip/ZipEntry;)Ljava/io/InputStream; (89 bytes) @ 0x0000ffff7c402b54 [0x0000ffff7c402180+0x9d4]
J 1187 C1 org.apache.tomcat.util.scan.FileUrlJar.getEntryInputStream()Ljava/io/InputStream; (21 bytes) @ 0x0000ffff7c52a640 [0x0000ffff7c52a4c0+0x180]

Regards
Hui

From aph at redhat.com  Sun Oct 16 09:19:27 2016
From: aph at redhat.com (Andrew Haley)
Date: Sun, 16 Oct 2016 10:19:27 +0100
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error:
	Illegal threadstate encountered
In-Reply-To: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
References: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
Message-ID: <c25d924f-cd0b-e1ab-cf7f-3eab0e50d665@redhat.com>

Hi,

On 15/10/16 11:34, Hui Shi wrote:

> Could someone help review this fix?
> 
> JIRA: https://bugs.openjdk.java.net/browse/JDK-8167421
> webrev: http://cr.openjdk.java.net/~hshi/8167421/webrev/
> 
> JVM crashes with illegal threadstate when running on single core machine
> (for example with single core VM running on aarch64 box).
> Current JNI wrapper generator missing store _thread_in_native_trans into
> thread state when machine has only one CPU core.

Oh, yuck.  Thanks.

I'd accept (and prefer) a patch which got rid of the is_MP() check
altogether.  These days systems are often virtualized, and running
processes can be migrated from one system to another.  In such
circumstances, is_MP() is just a bug.  But that needs wider discussion
because it affects more systems than just AArch64, so your patch is OK
for now.

Andrew.

From david.holmes at oracle.com  Sun Oct 16 20:50:12 2016
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 17 Oct 2016 06:50:12 +1000
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error:
	Illegal threadstate encountered
In-Reply-To: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
References: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
Message-ID: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>

On 15/10/2016 8:34 PM, Hui Shi wrote:
> Hi all,
>
> Could someone help review this fix?
>
> JIRA: https://bugs.openjdk.java.net/browse/JDK-8167421
> webrev: http://cr.openjdk.java.net/~hshi/8167421/webrev/
>
> JVM crashes with illegal threadstate when running on single core machine
> (for example with single core VM running on aarch64 box).
> Current JNI wrapper generator missing store _thread_in_native_trans into
> thread state when machine has only one CPU core.

Fix seems okay - though I'm not an expert on aarch64 assembler. But I have 
to wonder why this chunk of code is different from the functionally 
equivalent code in templateInterpreterGenerator_aarch64.cpp - including 
the difference between using DSB and DMB for the barrier?

  // change thread state
   __ mov(rscratch1, _thread_in_native_trans);
   __ lea(rscratch2, Address(rthread, JavaThread::thread_state_offset()));
   __ stlrw(rscratch1, rscratch2);

   if (os::is_MP()) {
     if (UseMembar) {
       // Force this write out before the read below
       __ dsb(Assembler::SY);
     } else {
       // Write serialization page so VM thread can do a pseudo remote membar.
       // We use the current thread pointer to calculate a thread specific
       // offset to write to within the page. This minimizes bus traffic
       // due to cache line collision.
       __ serialize_memory(rthread, rscratch2);
     }
   }

David


From HORII at jp.ibm.com  Mon Oct 17 01:44:53 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Mon, 17 Oct 2016 10:44:53 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <9bffd66d-abe0-8d3d-262a-4be55b81b9a4@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com> <OFC81622C2.A
	<9bffd66d-abe0-8d3d-262a-4be55b81b9a4@oracle.com>
Message-ID: <OFBB39F73B.41D1EAF7-ON4925804F.00059A25-4925804F.00099D08@notes.na.collabserv.com>

Hi David,

Thank you for your comments.

> Do you have any metrics on this latest version?

The pause time of Young GC (3rd-10th in the evaluation period) in SPECjbb2013 
was shortened by 5.4%, and Critical jOPS (which depends heavily on GC pause 
time) improved by 9.2%. The CPU was a POWER8 (8247-22L) with two cores 
enabled, using 24GB for mx and 20GB for mn.

> So far the only justification for making these changes to the GC code 
> comes from the April discussion [1] where it was stated simply that:
> "We've looked at the proposed changes and we are pretty sure that the
> cmpxchg done during copy_to_survivor_space in the parallel GC doesn't
> require the full fence/acquire semantics." [Martin & Volker]
> 
> Reading back through all the emails, including the ones in April, I 
> _think_ part of the reasoning here is that we're not doing a CAS that 
> publishes a new object that was just created, but that we have 
> previously created that object using a full CAS and are now only 
> updating the markword of another object with a forwarding pointer. The 
> second cas would not need full fence semantics as the other object is 
> already visible. However I am not a GC expert and other comments by GC 
> folk suggest that is not in fact the case, or at least is not 
> necessarily always the case. So I can not establish that what is being 
> proposed is correct.
> 
> I think the GC experts need to have a discussion to resolve things to 
> their mutual satisfaction.

Thank you for your many comments and suggestions. My many mistakes made 
the discussion long, and I am very sorry for that. I would like to hear 
comments from the GC experts.

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


From aph at redhat.com  Mon Oct 17 08:41:34 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 17 Oct 2016 09:41:34 +0100
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error:
	Illegal threadstate encountered
In-Reply-To: <77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
References: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
	<77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
Message-ID: <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com>

On 16/10/16 21:50, David Holmes wrote:

> including the difference between using DSB and DMB for the barrier?

DSB was a mistake.  I wrote this code before I understood the
difference between DSB and DMB; only DMB is needed here.  The
documentation we had was rather thin on detail.  Also, the line above
which changes thread_state uses STLRW, a fully sequentially-consistent
store, so I don't think that any of the code within os::is_MP() is
needed at all.

I have noticed these anomalies before, but didn't do anything because
it's delicate code and very difficult to test.  This might be a good
time to correct both versions.

Andrew.

From martin.doerr at sap.com  Mon Oct 17 16:38:09 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 17 Oct 2016 16:38:09 +0000
Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580
	and 8154867
Message-ID: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>

Hi,

I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9:

8154580 introduced copying the java mirror into the interpreter frame. Some code can now be written more compactly. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed).

8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev.

Webrev is here:
http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/

Please review.

Thanks and best regards,
Martin


From david.holmes at oracle.com  Tue Oct 18 03:59:17 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 18 Oct 2016 13:59:17 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
Message-ID: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>

Hi Lois, Dan, Serguei,

Went to push this today and realized I had left off the updated JNI 
method lookup tests. As I said in the bug report JNI behaves as 
expected, but there weren't any testcases so I added them:

http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/

Thanks,
David

On 11/10/2016 11:55 AM, David Holmes wrote:
> Turns out the only place changes were needed were in JDI.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>
> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>
> The spec change in ObjectReference is very simple and there is a CCC
> request in progress to ratify that change.
>
> The implementation change in ObjectReferenceImpl mirrors the updated
> spec and uses the same format as already present in the class version of
> the check method.
>
> The test is a little more complex. This is obviously an extension to
> what is already tested in InterfaceMethodsTest. However IMT has a number
> of problems with the way it is currently written [1] - specifically it
> doesn't properly separate method lookup from method invocation. So I've
> added the capability to separate lookup and invocation for use with the
> private interface methods - I have not tried to address shortcomings of
> the existing tests. Though I did fix the return value checking logic!
> And did some clarifying comments and renaming in a couple of places.
>
> Still on the test, I can't add the negative tests I would like to add
> because they actually pass due to a different long-standing bug in JDI -
> [2]. So the actual private interface method testing is very simple: can
> I get the Method from the InterfaceType for the interface declaring the
> method? Can I then invoke that method on an instance of a class that
> implements the interface?
>
> Thanks,
> David
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
> [2] https://bugs.openjdk.java.net/browse/JDK-8167416

From thomas.stuefe at gmail.com  Tue Oct 18 05:39:20 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 18 Oct 2016 07:39:20 +0200
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
	<60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com>
Message-ID: <CAA-vtUw6NvFL1_p8u2hXxWq_b=3DxXyWcQPfaeGJfJyhwTbW2A@mail.gmail.com>

Hi David, Max,

I changed the asserts according to Max' suggestion. Instead of checking
inside flag_to_index, now I check before callers of this function use this
value to access memory.
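
Roughly, the idea is the following. This is a minimal standalone sketch, not the code from the webrev: MemFlag, kNumberOfTypes, is_valid_index and Snapshot are simplified stand-ins invented here for MEMFLAGS, mt_number_of_types and MallocMemorySnapshot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for MEMFLAGS; the real enum has more values.
enum MemFlag : int { kHeap = 0, kClass = 1, kThread = 2 };
// Hypothetical stand-in for mt_number_of_types.
constexpr int kNumberOfTypes = 3;

// Pure conversion, no checking here any more.
static inline int flag_to_index(MemFlag flag) {
  return static_cast<int>(flag) & 0xff;
}

static inline bool is_valid_index(int index) {
  return index >= 0 && index < kNumberOfTypes;
}

// Stand-in for a snapshot that keeps one counter per memory type.
struct Snapshot {
  std::size_t _malloc[kNumberOfTypes] = {};

  // The caller validates the index right before using it to access memory.
  std::size_t* by_type(MemFlag flag) {
    int index = flag_to_index(flag);
    assert(is_valid_index(index) && "Invalid flag value");
    return &_malloc[index];
  }
};
```

An uninitialized flag holding some large value would then trip the assert in debug builds instead of silently overwriting memory behind the array.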

http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.01/webrev/index.html

As David correctly writes, this is technically not a bug, so I guess this
will have to wait until Java 10.

Kind Regards, Thomas


On Fri, Oct 14, 2016 at 12:57 AM, David Holmes <david.holmes at oracle.com>
wrote:

> On 13/10/2016 10:53 PM, Thomas Stüfe wrote:
>
>> Hi David,
>>
>> On Thu, Oct 13, 2016 at 12:08 PM, David Holmes <david.holmes at oracle.com>
>> wrote:
>>
>>     Hi Thomas,
>>
>>     On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>>
>>         Hi all,
>>
>>         may I please have a review for this tiny change? It just adds
>>         some asserts to NMT.
>>
>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>>         webrev:
>>         http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>>
>>         We had an ugly memory overwrite caused by this - ultimately our
>>         fault, because we fed an invalid memory flag to NMT - but it was
>>         difficult to find. An assert would have saved some time.
>>
>>
>>     I'm a little perplexed with asserting that something of MEMFLAGS
>>     type must be an actual MEMFLAGS value - it implies the caller is
>>     coercing plain int to MEMFLAGS, and I don't have much sympathy if
>>     they mess that up. Can't help wondering if there is some clever C++
>>     trick to flag bad conversions at compile-time?
>>
>>
>> The error was caused by an uninitialized variable of type MEMFLAGS. This
>> was our fault: we have heavily modified allocation.hpp and introduced an
>> error when merging changes from upstream. This merge error led to a case
>> where Arena::_flags was not initialized and contained a very large value.
>>
>
> Ah I see. Lack of default initialization can be annoying :)
>
>> I admit it looks funny. If it bothers you, I could instead check the
>> returned index to be in the range for the size of the _malloc array in
>> MallocMemorySnapshot::by_type(). Technically, it would mean the same.
>>
>
> So I just realized that here:
>
>   // Map memory type to human readable name
>   static const char* flag_to_name(MEMFLAGS flag) {
>     assert(flag >= 0 && flag < mt_number_of_types,
>            "Invalid flag value %d.", (int)flag);
>     return _memory_type_names[flag_to_index(flag)];
>   }
>
> we call flag_to_index, so the assert is redundant as it is already in
> flag_to_index. Then presumably we change flag_to_index to something like
> this:
>
>      static inline int flag_to_index(MEMFLAGS flag) {
>        int index = (flag & 0xff);
>        assert(index >= 0 && index < mt_number_of_types,
>               "Invalid flag value %d.", (int)flag);
>        return index;
>      }
>
> so we're validating the index rather than the flag.
>
> Cheers,
> David
>
>
>
>>
>>     The function that takes the index should validate the index, so that
>>     is fine.
>>
>>     Which one were you actually passing the bad value to? :)
>>
>>     This isn't a strong objection just musing if we can do better. And
>>     as the hs repos are still closed, and likely to remain so till early
>>     next week, we have some slack time :)
>>
>>
>> :) Sure.
>>
>> Kind Regards, Thomas
>>
>>
>>     Cheers,
>>     David
>>
>>         Thank you!
>>
>>         Thomas

From thomas.stuefe at gmail.com  Tue Oct 18 06:22:08 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 18 Oct 2016 08:22:08 +0200
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
Message-ID: <CAA-vtUxg-VnMmbQ4UCjy0jd-_mHE1eT4vNTjXoCOaiezKt+wMg@mail.gmail.com>

Ping.

On Thu, Oct 13, 2016 at 6:55 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Dear all,
>
> please take a look at the following fix:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>
> ---
>
> In short, this fix provides the ability to cancel hanging error reporting
> steps. This uses the same code paths secondary error handling uses during
> error reporting. With this patch, steps which take too long will be
> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this:
>
> [timeout occurred during error reporting in step "<stepname>"] after xxxx ms.
>
> and we now also get a final message in the hs_err file if we hit the
> ErrorLogTimeout and error reporting stops altogether:
>
> ------ Timeout during error reporting after xxx ms. ------
>
> (in addition to the "time expired, abort" message the WatcherThread writes
> to stderr)
>
> ---
>
> This is something which bugged us for a long time, because we rely heavily
> on the hs_err files for error analysis at customer sites, and there are a
> number of reasons why one step may hang and prevent the follow-up steps
> from running.
>
> It works like this:
>
> Before, when error reporting started, the WatcherThread waited for
> ErrorLogTimeout seconds and then stopped the VM.
>
> Now, the WatcherThread periodically pings error reporting, which checks
> whether the current step has timed out. If it has, a signal is sent to the
> reporting thread, and the thread continues with the next step. This follows
> the same path as secondary crash handling.
>
> Some implementation details:
>
> On Posix platforms, to interrupt the thread, I use pthread_kill. This
> means I must know the pthread id of the reporting thread, which I now store
> at the beginning of error reporting. We already store the reporting thread
> id in first_error_tid, but that I cannot use, because it gets set by
> os::current_thread_id(), which is not always the pthread id. Should we ever
> switch to only using pthread id for posix platforms, this coding can be
> simplified.
>
> On Windows, there is unfortunately no easy way to interrupt a
> non-cooperative thread. I would need a way to cause an SEH exception inside
> the target thread, which then would get handled by secondary error handling
> like on Posix platforms, but that is not easy. It is doable - one can suspend
> the thread and modify the thread context so that it will crash upon resume.
> But that felt a bit heavyweight for this problem. So on Windows, timeout
> handling still works (after ErrorLogTimeout the VM gets shut down), but
> error reporting steps are not interruptible. If we feel this is important,
> this can be added later.
>
> Kind Regards, Thomas
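
The decision the WatcherThread has to make on each ping, as described in the quoted message, can be sketched like this. It is only an illustration of the 1/2 ErrorLogTimeout rule, not the patch itself: TimeoutAction and check_timeouts are names made up here, all times are in milliseconds for simplicity, and the actual interruption (pthread_kill on Posix) is left out.

```cpp
#include <cstdint>

// What the watcher should do after looking at the clocks (hypothetical enum).
enum class TimeoutAction {
  None,            // keep waiting
  InterruptStep,   // current step took too long: signal the reporting thread
  AbortReporting   // whole report took too long: stop the VM
};

// now_ms              current time
// reporting_start_ms  when error reporting began
// step_start_ms       when the current reporting step began
// error_log_timeout_ms stand-in for ErrorLogTimeout, converted to ms
inline TimeoutAction check_timeouts(std::int64_t now_ms,
                                    std::int64_t reporting_start_ms,
                                    std::int64_t step_start_ms,
                                    std::int64_t error_log_timeout_ms) {
  // The global limit takes precedence over the per-step limit.
  if (now_ms - reporting_start_ms >= error_log_timeout_ms) {
    return TimeoutAction::AbortReporting;
  }
  // A single step is canceled after half of the global limit.
  if (now_ms - step_start_ms >= error_log_timeout_ms / 2) {
    return TimeoutAction::InterruptStep;
  }
  return TimeoutAction::None;
}
```

For example, with a limit of 120000 ms, a step that started at 5000 ms would be interrupted once the clock reaches 65000 ms, while reporting as a whole would be aborted at 120000 ms.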

From serguei.spitsyn at oracle.com  Tue Oct 18 07:10:37 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 18 Oct 2016 00:10:37 -0700
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
Message-ID: <7c5cb57d-4554-ee14-2aec-a80eec99d9a9@oracle.com>

David,

It looks good.

Thanks,
Serguei


On 10/17/16 20:59, David Holmes wrote:
> Hi Lois, Dan, Serguei,
>
> Went to push this today and realized I had left off the updated JNI 
> method lookup tests. As I said in the bug report JNI behaves as 
> expected, but there weren't any testcases so I added them:
>
> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/
>
> Thanks,
> David


From david.holmes at oracle.com  Tue Oct 18 07:16:19 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 18 Oct 2016 17:16:19 +1000
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <CAA-vtUxg-VnMmbQ4UCjy0jd-_mHE1eT4vNTjXoCOaiezKt+wMg@mail.gmail.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<CAA-vtUxg-VnMmbQ4UCjy0jd-_mHE1eT4vNTjXoCOaiezKt+wMg@mail.gmail.com>
Message-ID: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com>

Hi Thomas,

I took an initial look but am still mulling over things.

Note that as an enhancement this will need to wait for Java 10 repos to 
open - unless you go through the FC extension process.

Thanks,
David

On 18/10/2016 4:22 PM, Thomas Stüfe wrote:
> Ping.

From david.holmes at oracle.com  Tue Oct 18 07:16:45 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 18 Oct 2016 17:16:45 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <7c5cb57d-4554-ee14-2aec-a80eec99d9a9@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
	<7c5cb57d-4554-ee14-2aec-a80eec99d9a9@oracle.com>
Message-ID: <b9f4c736-1864-6e8c-92d9-1778f831165f@oracle.com>

Thanks Serguei!

David

On 18/10/2016 5:10 PM, serguei.spitsyn at oracle.com wrote:
> David,
>
> It looks good.
>
> Thanks,
> Serguei
>
>
> On 10/17/16 20:59, David Holmes wrote:
>> Hi Lois, Dan, Serguei,
>>
>> Went to push this today and realized I had left off the updated JNI
>> method lookup tests. As I said in the bug report JNI behaves as
>> expected, but there weren't any testcases so I added them:
>>
>> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/
>>
>> Thanks,
>> David

From thomas.stuefe at gmail.com  Tue Oct 18 07:49:49 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 18 Oct 2016 09:49:49 +0200
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<CAA-vtUxg-VnMmbQ4UCjy0jd-_mHE1eT4vNTjXoCOaiezKt+wMg@mail.gmail.com>
	<422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com>
Message-ID: <CAA-vtUx7t80-VcgeCqj4JSsbweaEobMEo6nsTM40GkmRv_QbXA@mail.gmail.com>

Hi David,

thanks!

On Tue, Oct 18, 2016 at 9:16 AM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> I took an initial look but am still mulling over things.
>
> Note that as an enhancement this will need to wait for Java 10 repos to
> open - unless you go through the FC extension process.
>
>
I was afraid that would be the case. Oh well.

Kind Regards, Thomas


> Thanks,
> David

From goetz.lindenmaier at sap.com  Tue Oct 18 11:49:01 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Tue, 18 Oct 2016 11:49:01 +0000
Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after
	8154580	and 8154867
In-Reply-To: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>
References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>
Message-ID: <3a833908865b4e2c975ec51c672b68a6@DEWDFE13DE50.global.corp.sap>

Hi Martin, 

thanks for doing this change, it looks good. 

Maybe you want to add a comment to load_mirror_from_const_method():
// Like load_mirror() on other platforms, except that const_method is passed
// in instead of method (saving one indirection).

Best regards,
  Goetz.



From lois.foltan at oracle.com  Tue Oct 18 12:06:37 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Tue, 18 Oct 2016 08:06:37 -0400
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP,
	JDI and JDB
In-Reply-To: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
Message-ID: <5806104D.2070605@oracle.com>

Looks good David!
Lois

On 10/17/2016 11:59 PM, David Holmes wrote:
> Hi Lois, Dan, Serguei,
>
> Went to push this today and realized I had left off the updated JNI 
> method lookup tests. As I said in the bug report JNI behaves as 
> expected, but there weren't any testcases so I added them:
>
> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/
>
> Thanks,
> David


From felix.yang at linaro.org  Tue Oct 18 12:51:55 2016
From: felix.yang at linaro.org (Felix Yang)
Date: Tue, 18 Oct 2016 20:51:55 +0800
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error:
	Illegal threadstate encountered
In-Reply-To: <661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com>
References: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
	<77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
	<661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com>
Message-ID: <CACc5Y6RAv_cci=AygPZSeDA13sHFxJY2DqiUXy_6np6Ccsd=nA@mail.gmail.com>

Hi,

    Thanks for fixing the bug.
    Is it OK to push this patch into repo:
http://hg.openjdk.java.net/jdk9/hs/hotspot for now?

Thanks,
Felix

On 17 October 2016 at 16:41, Andrew Haley <aph at redhat.com> wrote:

> On 16/10/16 21:50, David Holmes wrote:
>
> > including the difference between using DSB and DMB for the barrier?
>
> DSB was a mistake.  I wrote this code before I understood the
> difference between DSB and DMB; only DMB is needed here.  The
> documentation we had was rather thin on detail.  Also, the line above
> which changes thread_state uses STLRW, a fully sequentially-consistent
> store, so I don't think that any of the code within os::is_MP() is
> needed at all.
>
> I have noticed these anomalies before, but didn't do anything because
> it's delicate code and very difficult to test.  This might be a good
> time to correct both versions.
>
> Andrew.
>
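
In portable C++ terms, the argument quoted above can be illustrated with the following minimal sketch. It is not the HotSpot code: FakeThread, ThreadState, set_state and transition_and_poll are invented for this example. The point is only that a sequentially consistent store followed by a sequentially consistent load needs no explicit fence in between; compilers targeting AArch64 typically lower such a store to a store-release instruction of the STLR family.

```cpp
#include <atomic>

// Hypothetical thread states, not the real JavaThreadState values.
enum ThreadState : int { kInNative = 0, kInNativeTrans = 1, kInJava = 2 };

// Hypothetical stand-in for the per-thread data involved in a transition.
struct FakeThread {
  std::atomic<int> state{kInNative};
  std::atomic<int> safepoint_requested{0};
};

// Publish the new state with a sequentially consistent store.
// No separate barrier is issued afterwards.
inline void set_state(FakeThread& t, ThreadState s) {
  t.state.store(s, std::memory_order_seq_cst);
}

// Change state, then check whether a safepoint was requested.
// Both accesses are seq_cst, so they take part in one total order.
inline bool transition_and_poll(FakeThread& t) {
  set_state(t, kInNativeTrans);
  return t.safepoint_requested.load(std::memory_order_seq_cst) != 0;
}
```

Whether the generated stub really can drop its barrier depends on which instructions it uses for the store and the following load, so this sketch should not be read as proof for the assembly in question.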

From aph at redhat.com  Tue Oct 18 13:03:28 2016
From: aph at redhat.com (Andrew Haley)
Date: Tue, 18 Oct 2016 14:03:28 +0100
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error:
	Illegal threadstate encountered
In-Reply-To: <CACc5Y6RAv_cci=AygPZSeDA13sHFxJY2DqiUXy_6np6Ccsd=nA@mail.gmail.com>
References: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
	<77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
	<661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com>
	<CACc5Y6RAv_cci=AygPZSeDA13sHFxJY2DqiUXy_6np6Ccsd=nA@mail.gmail.com>
Message-ID: <4561489f-c037-7252-bb36-b3446db5b62e@redhat.com>

On 18/10/16 13:51, Felix Yang wrote:
>     Is it OK to push this patch into repo:
> http://hg.openjdk.java.net/jdk9/hs/hotspot for now?

Yes, but whoever does this should also apply it to
http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/ and
http://icedtea.classpath.org/hg/icedtea7-forest/hotspot/.

Andrew.


From daniel.daugherty at oracle.com  Tue Oct 18 14:27:09 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Tue, 18 Oct 2016 08:27:09 -0600
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
Message-ID: <b055b92f-8179-b4dc-05d4-8ed345ed412c@oracle.com>

On 10/17/16 9:59 PM, David Holmes wrote:
> Hi Lois, Dan, Serguei,
>
> Went to push this today and realized I had left off the updated JNI 
> method lookup tests. As I said in the bug report JNI behaves as 
> expected, but there weren't any testcases so I added them:
>
> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/

test/runtime/jni/PrivateInterfaceMethods/PrivateInterfaceMethods.java
     L74:         lookup(A.class.getName(), "onlyA", null); //should succeed
     :
     :
     L90:         lookup(Impl2.class.getName(), "onlyC", NoSuchMethodError.class); //should fail
         nit: please add a space after '//'

     L138:         String desc = " Lookup of " + definingClass + "." + method;
         nit: any particular reason for the space before Lookup?


test/runtime/jni/PrivateInterfaceMethods/libPrivateInterfaceMethods.c
     L78: blank line at the end of the file. jcheck will probably complain.


Thumbs up! Feel free to ignore the nits. No need to see a new
webrev if you fix them.

Dan


>
> Thanks,
> David


From coleen.phillimore at oracle.com  Tue Oct 18 21:56:12 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Tue, 18 Oct 2016 17:56:12 -0400
Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after
	8154580 and 8154867
In-Reply-To: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>
References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>
Message-ID: <7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com>


This seems good.   I think it's a shame to change load_mirror() to 
load_mirror_from_const_method() though because there's load_mirror() 
with the same parameters on all the other platforms and it makes 
platform development a little easier.   But that's up to you, because 
you can generate shorter sequences.

Coleen


On 10/17/16 12:38 PM, Doerr, Martin wrote:
> Hi,
>
> I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9:
>
> 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed).
>
> 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev.
>
> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/
>
> Please review.
>
> Thanks and best regards,
> Martin
>


From david.holmes at oracle.com  Tue Oct 18 23:55:32 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 19 Oct 2016 09:55:32 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <5806104D.2070605@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
	<5806104D.2070605@oracle.com>
Message-ID: <0958e1c9-250b-17e2-69f4-3b7ad9303ea9@oracle.com>

Thanks Lois!

David

On 18/10/2016 10:06 PM, Lois Foltan wrote:
> Looks good David!
> Lois
>
> On 10/17/2016 11:59 PM, David Holmes wrote:
>> Hi Lois, Dan, Serguei,
>>
>> Went to push this today and realized I had left off the updated JNI
>> method lookup tests. As I said in the bug report JNI behaves as
>> expected, but there weren't any testcases so I added them:
>>
>> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/
>>
>> Thanks,
>> David
>>
>> On 11/10/2016 11:55 AM, David Holmes wrote:
>>> Turns out the only place changes were needed were in JDI.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>>
>>> The spec change in ObjectReference is very simple and there is a CCC
>>> request in progress to ratify that change.
>>>
>>> The implementation change in ObjectReferenceImpl mirrors the updated
>>> spec and uses the same format as already present in the class version of
>>> the check method.
>>>
>>> The test is a little more complex. This is obviously an extension to
>>> what is already tested in InterfaceMethodsTest. However IMT has a number
>>> of problems with the way it is currently written [1] - specifically it
>>> doesn't properly separate method lookup from method invocation. So I've
>>> added the capability to separate lookup and invocation for use with the
>>> private interface methods - I have not tried to address shortcomings of
>>> the existing tests. Though I did fix the return value checking logic!
>>> And did some clarifying comments and renaming in a couple of places.
>>>
>>> Still on the test I can't add the negative tests I would like to add
>>> because they actually pass due to a different long standing bug in JDI -
>>> [2]. So the actual private interface method testing is very simple: can
>>> I get the Method from the InterfaceType for the interface declaring the
>>> method? Can I then invoke that method on an instance of a class that
>>> implements the interface?
>>>
>>> Thanks,
>>> David
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>

From david.holmes at oracle.com  Tue Oct 18 23:56:52 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 19 Oct 2016 09:56:52 +1000
Subject: RFR: 8165827: Support private interface methods in JNI, JDWP, JDI
	and JDB
In-Reply-To: <b055b92f-8179-b4dc-05d4-8ed345ed412c@oracle.com>
References: <566649ff-db95-c95c-16c4-291371d28dff@oracle.com>
	<175c676a-7cf6-4a3e-7978-e861088b9310@oracle.com>
	<b055b92f-8179-b4dc-05d4-8ed345ed412c@oracle.com>
Message-ID: <4a476eaa-07be-d029-6fef-c7ebbb357708@oracle.com>

On 19/10/2016 12:27 AM, Daniel D. Daugherty wrote:
> On 10/17/16 9:59 PM, David Holmes wrote:
>> Hi Lois, Dan, Serguei,
>>
>> Went to push this today and realized I had left off the updated JNI
>> method lookup tests. As I said in the bug report JNI behaves as
>> expected, but there weren't any testcases so I added them:
>>
>> http://cr.openjdk.java.net/~dholmes/8165827/webrev.hotspot/
>
> test/runtime/jni/PrivateInterfaceMethods/PrivateInterfaceMethods.java
>     L74:         lookup(A.class.getName(), "onlyA", null); //should succeed
>     :
>     :
>     L90:         lookup(Impl2.class.getName(), "onlyC",
> NoSuchMethodError.class); //should fail
>         nit: please add a space after '//'
>
>     L138:         String desc = " Lookup of " + definingClass + "." +
> method;
>         nit: any particular reason for the space before Lookup?

Just checking your powers of observation :)

>
>
> test/runtime/jni/PrivateInterfaceMethods/libPrivateInterfaceMethods.c
>     L78: blank line at the end of the file. jcheck will probably complain.

Yeah I deal with that at commit time.

>
> Thumbs up! Feel free to ignore the nits. No need to see a new
> webrev if you fix them.

Thanks. Will fix the nits.

David

> Dan
>
>
>>
>> Thanks,
>> David
>>
>> On 11/10/2016 11:55 AM, David Holmes wrote:
>>> Turns out the only place changes were needed were in JDI.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8165827
>>>
>>> webrev: http://cr.openjdk.java.net/~dholmes/8165827/webrev/
>>>
>>> The spec change in ObjectReference is very simple and there is a CCC
>>> request in progress to ratify that change.
>>>
>>> The implementation change in ObjectReferenceImpl mirrors the updated
>>> spec and uses the same format as already present in the class version of
>>> the check method.
>>>
>>> The test is a little more complex. This is obviously an extension to
>>> what is already tested in InterfaceMethodsTest. However IMT has a number
>>> of problems with the way it is currently written [1] - specifically it
>>> doesn't properly separate method lookup from method invocation. So I've
>>> added the capability to separate lookup and invocation for use with the
>>> private interface methods - I have not tried to address shortcomings of
>>> the existing tests. Though I did fix the return value checking logic!
>>> And did some clarifying comments and renaming in a couple of places.
>>>
>>> Still on the test I can't add the negative tests I would like to add
>>> because they actually pass due to a different long standing bug in JDI -
>>> [2]. So the actual private interface method testing is very simple: can
>>> I get the Method from the InterfaceType for the interface declaring the
>>> method? Can I then invoke that method on an instance of a class that
>>> implements the interface?
>>>
>>> Thanks,
>>> David
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8166453
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8167416
>

From david.holmes at oracle.com  Wed Oct 19 01:21:25 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 19 Oct 2016 11:21:25 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <CAA-vtUw6NvFL1_p8u2hXxWq_b=3DxXyWcQPfaeGJfJyhwTbW2A@mail.gmail.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
	<60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com>
	<CAA-vtUw6NvFL1_p8u2hXxWq_b=3DxXyWcQPfaeGJfJyhwTbW2A@mail.gmail.com>
Message-ID: <ef8d1d14-b746-bd04-d958-25f21b770a9d@oracle.com>

On 18/10/2016 3:39 PM, Thomas Stüfe wrote:
> Hi David, Max,
>
> I changed the asserts according to Max' suggestion. Instead of checking
> inside flag_to_index, now I check before callers of this function use
> this value to access memory.

I don't see where Max suggested that?? It doesn't make sense to me to 
have all the callers of flag_to_index check what it returned instead of 
doing it inside flag_to_index.

> http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.01/webrev/index.html
>
> As David correctly writes, this is technically not a bug, so I guess
> this will have to wait until java 10.

Yes, afraid so.

Thanks,
David

> Kind Regards, Thomas
>
>
>
> On Fri, Oct 14, 2016 at 12:57 AM, David Holmes <david.holmes at oracle.com> wrote:
>
>     On 13/10/2016 10:53 PM, Thomas Stüfe wrote:
>
>         Hi David,
>
>         On Thu, Oct 13, 2016 at 12:08 PM, David Holmes
>         <david.holmes at oracle.com> wrote:
>
>             Hi Thomas,
>
>             On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>
>                 Hi all,
>
>                 may I please have a review for this tiny change? It just adds
>                 some asserts to NMT.
>
>                 Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>                 webrev:
>                 http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>
>                 We had an ugly memory overwrite caused by this - ultimately
>                 our fault, because we fed an invalid memory flag to NMT - but
>                 it was difficult to find. An assert would have saved some time.
>
>
>             I'm a little perplexed with asserting that something of MEMFLAGS
>             type must be an actual MEMFLAGS value - it implies the caller is
>             coercing plain int to MEMFLAGS, and I don't have much sympathy if
>             they mess that up. Can't help wondering if there is some clever
>             C++ trick to flag bad conversions at compile-time?
>
>
>         The error was caused by an uninitialized variable of type
>         MEMFLAGS. This was our fault: we have heavily modified
>         allocation.hpp and introduced an error when merging changes
>         from upstream. Due to that merging error, Arena::_flags was
>         not initialized and contained a very large value.
>
>
>     Ah I see. Lack of default initialization can be annoying :)
>
>         I admit it looks funny. If it bothers you, I could instead check the
>         returned index to be in the range for the size of the _malloc array
>         in MallocMemorySnapshot::by_type(). Technically, it would mean the
>         same.
>
>
>     So I just realized that here:
>
>       // Map memory type to human readable name
>       static const char* flag_to_name(MEMFLAGS flag) {
>         assert(flag >= 0 && flag < mt_number_of_types,
>                "Invalid flag value %d.", (int)flag);
>         return _memory_type_names[flag_to_index(flag)];
>       }
>
>     we call flag_to_index, so the assert is redundant as it is already
>     in flag_to_index. Then presumably we change flag_to_index to
>     something like this:
>
>          static inline int flag_to_index(MEMFLAGS flag) {
>            int index = (flag & 0xff);
>            assert(index >= 0 && index < mt_number_of_types,
>                   "Invalid flag value %d.", (int)flag);
>            return index;
>          }
>
>     so we're validating the index rather than the flag.
>
>     Cheers,
>     David
>
>
>
>
>             The function that takes the index should validate the index,
>             so that is fine.
>
>             Which one were you actually passing the bad value to? :)
>
>             This isn't a strong objection just musing if we can do better.
>             And as the hs repos are still closed, and likely to remain so
>             till early next week, we have some slack time :)
>
>
>         :) Sure.
>
>         Kind Regards, Thomas
>
>
>             Cheers,
>             David
>
>                 Thank you!
>
>                 Thomas
>
>
>

From david.holmes at oracle.com  Wed Oct 19 02:01:57 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 19 Oct 2016 12:01:57 +1000
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<526af6b4-3630-c05d-db8f-b489ac7a2167@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
	<8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
	<2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
Message-ID: <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com>

Pushed.

David

On 11/10/2016 11:12 AM, David Holmes wrote:
> Ok. I will sponsor this once hs is open again.
>
> Thanks,
> David
>
> On 6/10/2016 10:10 PM, Alan Burlison wrote:
>> On 04/10/2016 19:37, Alan Burlison wrote:
>>
>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>>>> already being included.
>>>
>>> Cool, I'll pop that in instead - thanks!
>>
>> Done, webrev updated, jprt hotspot testset is clean.
>>

From thomas.stuefe at gmail.com  Wed Oct 19 05:17:01 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 19 Oct 2016 07:17:01 +0200
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <ef8d1d14-b746-bd04-d958-25f21b770a9d@oracle.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
	<60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com>
	<CAA-vtUw6NvFL1_p8u2hXxWq_b=3DxXyWcQPfaeGJfJyhwTbW2A@mail.gmail.com>
	<ef8d1d14-b746-bd04-d958-25f21b770a9d@oracle.com>
Message-ID: <CAA-vtUwDJ8RypD=Kn_xyNkYgcy-H=jPw6z6cqUTK1LZ2cFJESA@mail.gmail.com>

On Wed, Oct 19, 2016 at 3:21 AM, David Holmes <david.holmes at oracle.com>
wrote:

> On 18/10/2016 3:39 PM, Thomas Stüfe wrote:
>
>> Hi David, Max,
>>
>> I changed the asserts according to Max' suggestion. Instead of checking
>> inside flag_to_index, now I check before callers of this function use
>> this value to access memory.
>>
>
> I don't see where Max suggested that??


Max wrote: " I think the decision on whether to access a slot should occur
as close to memory accessing code as possible." and proceeded to suggest
fixing VirtualMemorySnapshot::by_type() as well.


> It doesn't make sense to me to have all the callers of flag_to_index check
> what it returned instead of doing it inside flag_to_index.
>
>
I disagree. IMHO it makes sense to check either the MEMFLAGS enumeration
input argument in flag_to_index() or the returned index before consumption.
In both cases one knows the valid value range. Strictly speaking checking
the index in flag_to_index() cannot be done because it is a faceless int
type whose valid values are not yet known.
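
To make the second option concrete, here is a minimal sketch of checking
the returned index right where it is used to access memory. MemFlags,
MallocCounter and MallocSnapshot are cut-down stand-ins invented for this
illustration; they are not the real NMT declarations, and the real assert
macro in HotSpot takes a format string rather than the plain C idiom used
here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified memory-type enum (not the real MEMFLAGS).
enum MemFlags { mtJavaHeap = 0, mtClass = 1, mtThread = 2, mt_number_of_types = 3 };

// Pure conversion: no validation happens here in this variant.
static inline int flag_to_index(MemFlags flag) { return flag & 0xff; }

struct MallocCounter { std::size_t count; std::size_t size; };

class MallocSnapshot {
  MallocCounter _malloc[mt_number_of_types];
 public:
  MallocSnapshot() : _malloc() {}  // zero-initialize all counters

  // The bounds check sits next to the array access it protects.
  MallocCounter* by_type(MemFlags flag) {
    const int index = flag_to_index(flag);
    assert(index >= 0 && index < mt_number_of_types && "Index out of bounds");
    return &_malloc[index];
  }
};
```

At this point the valid range is visible directly from the array size, at
the cost of repeating the check in every function that indexes an array.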

It is all academic and mostly a matter of taste.


>> http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.01/webrev/index.html
>>
>> As David correctly writes, this is technically not a bug, so I guess
>> this will have to wait until java 10.
>>
>
> Yes, afraid so.
>

The fix is trivial and I will try to get fc extension for this (now that
Goetz explained to me how to do this :). It seems this is done for many
other non-bug issues as well.

..Thomas


> Thanks,
> David
>
> Kind Regards, Thomas
>>
>>
>>
>> On Fri, Oct 14, 2016 at 12:57 AM, David Holmes <david.holmes at oracle.com> wrote:
>>
>>     On 13/10/2016 10:53 PM, Thomas Stüfe wrote:
>>
>>         Hi David,
>>
>>         On Thu, Oct 13, 2016 at 12:08 PM, David Holmes
>>         <david.holmes at oracle.com> wrote:
>>
>>             Hi Thomas,
>>
>>             On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>>
>>                 Hi all,
>>
>>                 may I please have a review for this tiny change? It just adds
>>                 some asserts to NMT.
>>
>>                 Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>>                 webrev:
>>                 http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>>
>>                 We had an ugly memory overwrite caused by this - ultimately
>>                 our fault, because we fed an invalid memory flag to NMT - but
>>                 it was difficult to find. An assert would have saved some time.
>>
>>
>>             I'm a little perplexed with asserting that something of MEMFLAGS
>>             type must be an actual MEMFLAGS value - it implies the caller is
>>             coercing plain int to MEMFLAGS, and I don't have much sympathy if
>>             they mess that up. Can't help wondering if there is some clever
>>             C++ trick to flag bad conversions at compile-time?
>>
>>
>>         The error was caused by an uninitialized variable of type
>>         MEMFLAGS. This was our fault: we have heavily modified
>>         allocation.hpp and introduced an error when merging changes
>>         from upstream. Due to that merging error, Arena::_flags was
>>         not initialized and contained a very large value.
>>
>>
>>     Ah I see. Lack of default initialization can be annoying :)
>>
>>         I admit it looks funny. If it bothers you, I could instead check the
>>         returned index to be in the range for the size of the _malloc array
>>         in MallocMemorySnapshot::by_type(). Technically, it would mean the
>>         same.
>>
>>
>>     So I just realized that here:
>>
>>       // Map memory type to human readable name
>>       static const char* flag_to_name(MEMFLAGS flag) {
>>         assert(flag >= 0 && flag < mt_number_of_types,
>>                "Invalid flag value %d.", (int)flag);
>>         return _memory_type_names[flag_to_index(flag)];
>>       }
>>
>>     we call flag_to_index, so the assert is redundant as it is already
>>     in flag_to_index. Then presumably we change flag_to_index to
>>     something like this:
>>
>>          static inline int flag_to_index(MEMFLAGS flag) {
>>            int index = (flag & 0xff);
>>            assert(index >= 0 && index < mt_number_of_types,
>>                   "Invalid flag value %d.", (int)flag);
>>            return index;
>>          }
>>
>>     so we're validating the index rather than the flag.
>>
>>     Cheers,
>>     David
>>
>>
>>
>>
>>             The function that takes the index should validate the index,
>>             so that is fine.
>>
>>             Which one were you actually passing the bad value to? :)
>>
>>             This isn't a strong objection just musing if we can do better.
>>             And as the hs repos are still closed, and likely to remain so
>>             till early next week, we have some slack time :)
>>
>>
>>         :) Sure.
>>
>>         Kind Regards, Thomas
>>
>>
>>             Cheers,
>>             David
>>
>>                 Thank you!
>>
>>                 Thomas
>>
>>
>>
>>

From felix.yang at linaro.org  Wed Oct 19 05:48:59 2016
From: felix.yang at linaro.org (Felix Yang)
Date: Wed, 19 Oct 2016 13:48:59 +0800
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error:
	Illegal threadstate encountered
In-Reply-To: <4561489f-c037-7252-bb36-b3446db5b62e@redhat.com>
References: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
	<77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
	<661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com>
	<CACc5Y6RAv_cci=AygPZSeDA13sHFxJY2DqiUXy_6np6Ccsd=nA@mail.gmail.com>
	<4561489f-c037-7252-bb36-b3446db5b62e@redhat.com>
Message-ID: <CACc5Y6QTYOnhpPD5FD9FvbyRZOO=73yeqz_DqRsh8uV6C3ybRg@mail.gmail.com>

Hi,

    I have pushed the patch to jdk9/hs/hotspot repo and also backported to
aarch64-port/jdk8u/hotspot repo.
    I checked the code of icedtea7-forest/hotspot and it seems to me that
it does not have the issue, please take a look.

Thanks,
Felix


On 18 October 2016 at 21:03, Andrew Haley <aph at redhat.com> wrote:

> On 18/10/16 13:51, Felix Yang wrote:
> >     Is it OK to push this patch into repo: http://hg.openjdk.java.net/
> > jdk9/hs/hotspot for now?
>
> Yes, but whoever does this should also apply it to
> http://hg.openjdk.java.net/aarch64-port/jdk8u/hotspot/ and
> http://icedtea.classpath.org/hg/icedtea7-forest/hotspot/.
>
> Andrew.
>
>

From thomas.stuefe at gmail.com  Wed Oct 19 06:10:04 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 19 Oct 2016 08:10:04 +0200
Subject: "os" - make this a real namespace?
Message-ID: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>

Hi all,

a small question.

I sometimes stumble over the fact that "os" is a class, not a namespace.
And that we include the platform dependent additions into the middle of
this class.

This has a number of repercussions, like not being able to include the
platform dependent files (os_<os>_<cpu>) directly, not being able to
forward declare functions from the "os" namespace (e.g. os::malloc) etc. I
also cannot split implementations from "os" functions to different
implementation files without problems.

It seems to me all compilers nowadays support namespaces; would it not make
sense to convert "os" to a real namespace?
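
To illustrate the forward-declaration point, here is a small sketch
contrasting the two shapes. The names os_as_class and os_ns are invented
for this example so both variants can live in one file; neither is taken
from the HotSpot sources, and the bodies simply delegate to the C library.

```cpp
#include <cstddef>
#include <cstdlib>

// Shape 1: "os" as a class of static functions (roughly today's layout).
// Every member has to be declared inside this one class body, so a client
// cannot forward-declare os_as_class::malloc without the whole definition.
class os_as_class {
 public:
  static void* malloc(std::size_t size) { return std::malloc(size); }
};

// Shape 2: "os" as a namespace. A namespace can be reopened, so a header
// that needs a single function can forward-declare just that function...
namespace os_ns {
  void* malloc(std::size_t size);
}

// ...and other files can contribute further pieces in separate reopenings.
namespace os_ns {
  void* malloc(std::size_t size) { return std::malloc(size); }
  void free(void* p) { std::free(p); }
}
```

With the namespace form, splitting the implementations over several files
needs nothing more than reopening the namespace in each of them.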

While we are at it, what is the reason for the "<os>" sub classes? e.g.
os::Bsd, os::Aix etc? It makes integrating patches between platforms
difficult and, to me, does not seem to serve any clear purpose.

If the purpose is to be a very low wrapper around OS particularities, it
makes no sense to have them in the "os" namespace and to make them visible
to the shared sections of the VM. E.g. there should be no reason to access
"os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
functions outside implementations specific for Posix platforms.

Thanks, and Kind Regards, Thomas

From david.holmes at oracle.com  Wed Oct 19 06:49:35 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 19 Oct 2016 16:49:35 +1000
Subject: RFR(xxs): 8167650: NMT should check for invalid MEMFLAGS.
In-Reply-To: <CAA-vtUwDJ8RypD=Kn_xyNkYgcy-H=jPw6z6cqUTK1LZ2cFJESA@mail.gmail.com>
References: <CAA-vtUz-=_-Xt86xzt6F2eknz750LiQ1xAFgkn9E0b2+G0HiZw@mail.gmail.com>
	<98572424-37a1-90b9-30ef-3be0691e7bf0@oracle.com>
	<CAA-vtUz0=w9rBbGyvEvwWZFjoFb+QXyix4VeLKdh1+7rYV8dNQ@mail.gmail.com>
	<60cafd21-1945-b7d4-b0bf-db92a93dfdcf@oracle.com>
	<CAA-vtUw6NvFL1_p8u2hXxWq_b=3DxXyWcQPfaeGJfJyhwTbW2A@mail.gmail.com>
	<ef8d1d14-b746-bd04-d958-25f21b770a9d@oracle.com>
	<CAA-vtUwDJ8RypD=Kn_xyNkYgcy-H=jPw6z6cqUTK1LZ2cFJESA@mail.gmail.com>
Message-ID: <e9027772-0b9b-89ec-48ee-e8bcfe585fbe@oracle.com>


On 19/10/2016 3:17 PM, Thomas Stüfe wrote:
>
>
> On Wed, Oct 19, 2016 at 3:21 AM, David Holmes <david.holmes at oracle.com> wrote:
>
>     On 18/10/2016 3:39 PM, Thomas Stüfe wrote:
>
>         Hi David, Max,
>
>         I changed the asserts according to Max' suggestion. Instead of
>         checking inside flag_to_index, now I check before callers of this
>         function use this value to access memory.
>
>
>     I don't see where Max suggested that??
>
>
> Max wrote: " I think the decision on whether to access a slot should
> occur as close to memory accessing code as possible." and proceeded to
> suggest fixing VirtualMemorySnapshot::by_type() as well.

I did not interpret that comment that way, and was puzzled by the 
reference to by_type.
>
>     It doesn't make sense to me to have all the callers of flag_to_index
>     check what it returned instead of doing it inside flag_to_index.
>
>
> I disagree. IMHO it makes sense to check either the MEMFLAGS enumeration
> input argument in flag_to_index() or the returned index before
> consumption. In both cases one knows the valid value range. Strictly
> speaking checking the index in flag_to_index() cannot be done because it
> is a faceless int type whose valid values are not yet known.

The index has to fall in the range 0 <= index < mt_number_of_types, and 
I was suggesting that it makes more sense to verify this once in 
flag_to_index() than in all the callers of flag_to_index.
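
As a rough sketch of that suggestion, the check lives in one place and
every caller inherits it. The enum and the name table below are cut-down
stand-ins made up for illustration, not the real NMT definitions, and the
plain C assert idiom replaces HotSpot's formatted assert macro.

```cpp
#include <cassert>

// Hypothetical, simplified memory-type enum (not the real MEMFLAGS).
enum MemFlags { mtJavaHeap = 0, mtClass = 1, mtThread = 2, mt_number_of_types = 3 };

// Validate once, inside the conversion itself.
static inline int flag_to_index(MemFlags flag) {
  const int index = flag & 0xff;
  assert(index >= 0 && index < mt_number_of_types && "Invalid flag value");
  return index;
}

static const char* const memory_type_names[mt_number_of_types] = {
  "Java Heap", "Class", "Thread"
};

// Callers need no check of their own; the index is already validated.
static inline const char* flag_to_name(MemFlags flag) {
  return memory_type_names[flag_to_index(flag)];
}
```

Note that the 0xff mask runs before the check, so a garbage value whose
low byte happens to be a valid type would still pass.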

David

> It is all academic and mostly a matter of taste.
>
>         http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.01/webrev/index.html
>
>         As David correctly writes, this is technically not a bug, so I guess
>         this will have to wait until java 10.
>
>
>     Yes, afraid so.
>
>
> The fix is trivial and I will try to get fc extension for this (now that
> Goetz explained to me how to do this :). It seems this is done for many
> other non-bug issues as well.
>
> ..Thomas
>
>
>     Thanks,
>     David
>
>         Kind Regards, Thomas
>
>
>
>         On Fri, Oct 14, 2016 at 12:57 AM, David Holmes
>         <david.holmes at oracle.com> wrote:
>
>             On 13/10/2016 10:53 PM, Thomas Stüfe wrote:
>
>                 Hi David,
>
>                 On Thu, Oct 13, 2016 at 12:08 PM, David Holmes
>                 <david.holmes at oracle.com> wrote:
>
>                     Hi Thomas,
>
>                     On 13/10/2016 3:49 PM, Thomas Stüfe wrote:
>
>                         Hi all,
>
>                         may I please have a review for this tiny change?
>                         It just adds some asserts to NMT.
>
>                         Bug: https://bugs.openjdk.java.net/browse/JDK-8167650
>                         webrev:
>                         http://cr.openjdk.java.net/~stuefe/webrevs/8167650-NMT-should-check_MEMFLAGS/webrev.00/webrev/
>
>                         We had an ugly memory overwrite caused by this -
>                         ultimately our fault, because we fed an invalid
>                         memory flag to NMT - but it was difficult to find.
>                         An assert would have saved some time.
>
>
>                     I'm a little perplexed with asserting that something
>                     of MEMFLAGS type must be an actual MEMFLAGS value - it
>                     implies the caller is coercing plain int to MEMFLAGS,
>                     and I don't have much sympathy if they mess that up.
>                     Can't help wondering if there is some clever C++
>                     trick to flag bad conversions at compile-time?
>
>
>                 The error was caused by an uninitialized variable of type
>                 MEMFLAGS. This was our fault: we have heavily modified
>                 allocation.hpp and introduced an error when merging changes
>                 from upstream. Due to that merging error, Arena::_flags was
>                 not initialized and contained a very large value.
>
>
>             Ah I see. Lack of default initialization can be annoying :)
>
>                 I admit it looks funny. If it bothers you, I could
>                 instead check the returned index to be in the range for
>                 the size of the _malloc array in
>                 MallocMemorySnapshot::by_type(). Technically, it would
>                 mean the same.
>
>
>             So I just realized that here:
>
>               // Map memory type to human readable name
>               static const char* flag_to_name(MEMFLAGS flag) {
>                 assert(flag >= 0 && flag < mt_number_of_types,
>                        "Invalid flag value %d.", (int)flag);
>                 return _memory_type_names[flag_to_index(flag)];
>               }
>
>             we call flag_to_index, so the assert is redundant as it is
>         already
>             in flag_to_index. Then presumably we change flag_to_index to
>             something like this:
>
>                  static inline int flag_to_index(MEMFLAGS flag) {
>                    int index = (flag & 0xff);
>                    assert(index >= 0 && index < mt_number_of_types,
>                           "Invalid flag value %d.", (int)flag);
>                    return index;
>                  }
>
>             so we're validating the index rather than the flag.
>
>             Cheers,
>             David
>
>
>
>
>                     The function that takes the index should validate
>         the index,
>                 so that
>                     is fine.
>
>                     Which one were you actually passing the bad value to? :)
>
>                     This isn't a strong objection just musing if we can do
>                 better. And
>                     as the hs repos are still closed, and likely to
>         remain so
>                 till early
>                     next week, we have some slack time :)
>
>
>                 :) Sure.
>
>                 Kind Regards, Thomas
>
>
>                     Cheers,
>                     David
>
>                         Thank you!
>
>                         Thomas
>
>
>
>

From david.holmes at oracle.com  Wed Oct 19 07:02:53 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 19 Oct 2016 17:02:53 +1000
Subject: "os" - make this a real namespace?
In-Reply-To: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
Message-ID: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com>

Hi Thomas,

On 19/10/2016 4:10 PM, Thomas Stüfe wrote:
> Hi all,
>
> a small question.
>
> I sometimes stumble over the fact that "os" is a class, not a namespace.

?? AFAIK everything in hotspot is a class not a namespace - we don't use 
"namespaces".

> And that we include the platform dependent additions into the middle of
> this class.

Build-time specialization. It allows for the os API to actually be 
different on different platforms, as opposed to just being implemented 
differently.

> This has a number of repercussions, like not being able to include the
> platform dependent files (os_<os>_<cpu>) directly, not being able to

I'd call that a feature - they are not intended to be standalone APIs.

> forward declare functions from the "os" namespace (e.g. os::malloc) etc. I
> also cannot split implementations from "os" functions to different
> implementation files without problems.
>
> It seems to me all compilers nowadays support namespaces, would it not make
> sense to convert "os" to a real namespace?

Not being a C++ aficionado I'm not sure exactly what that would entail - 
as far as I know we don't use C++ namespaces anywhere in hotspot.

> While we are at it, what is the reason for the "<os>" sub classes? e.g.
> os::Bsd, os::Aix etc? It makes integrating patches between platforms
> difficult and, to me, does not seem to serve any clear purpose.

Must admit this arrangement has also had me confused at times. I think
it is a way to add a per-OS helper class for the main os API implementation.

> If the purpose is to be a very low wrapper around OS particularities, it
> makes no sense to have them in the "os" namespace and to make them visible
> to the shared sections of the VM. E.g. there should be no reason to access
> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
> functions outside implementations specific for Posix platforms.

Not sure how you make, for example os::BSD accessible from all classes 
in os/bsd/vm yet not be visible anywhere else ??

Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm.

There is a lot of cleanup in this area slated for the future - hopefully 
Java 10. POSIX refactoring etc.

Cheers,
David

> Thanks, and Kind Regards, Thomas
>

From aph at redhat.com  Wed Oct 19 07:50:55 2016
From: aph at redhat.com (Andrew Haley)
Date: Wed, 19 Oct 2016 08:50:55 +0100
Subject: RFR(XS): 8167421: AArch64: in one core system, fatal error:
	Illegal threadstate encountered
In-Reply-To: <CACc5Y6QTYOnhpPD5FD9FvbyRZOO=73yeqz_DqRsh8uV6C3ybRg@mail.gmail.com>
References: <CAF1YaiBaMVQc21Q1JRxD1JBMFcZynsU6g3kVri2EY3OJtzoMoA@mail.gmail.com>
	<77bcc1eb-b43d-1568-4c56-8c091d0da5e9@oracle.com>
	<661c26c9-dd2d-0cc0-9a7c-4d09b5bc118e@redhat.com>
	<CACc5Y6RAv_cci=AygPZSeDA13sHFxJY2DqiUXy_6np6Ccsd=nA@mail.gmail.com>
	<4561489f-c037-7252-bb36-b3446db5b62e@redhat.com>
	<CACc5Y6QTYOnhpPD5FD9FvbyRZOO=73yeqz_DqRsh8uV6C3ybRg@mail.gmail.com>
Message-ID: <7cec6ce2-68b0-b899-9344-bb4738d13f12@redhat.com>

On 19/10/16 06:48, Felix Yang wrote:
>     I have pushed the patch to jdk9/hs/hotspot repo and also backported to
> aarch64-port/jdk8u/hotspot repo.

OK, thanks.

>     I checked the code of icedtea7-forest/hotspot and it seems to me that
> it does not have the issue, please take a look.

No, it doesn't, you are right.  I wonder how that happened.

Andrew.


From thomas.stuefe at gmail.com  Wed Oct 19 09:07:03 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 19 Oct 2016 11:07:03 +0200
Subject: "os" - make this a real namespace?
In-Reply-To: <682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com>
Message-ID: <CAA-vtUzUSQy=um7GkDQX7DscP2T-ZoMRiiM0DQUyuH4rMiu_hA@mail.gmail.com>

Hi David!

On Wed, Oct 19, 2016 at 9:02 AM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> On 19/10/2016 4:10 PM, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> a small question.
>>
>> I sometimes stumble over the fact that "os" is a class, not a namespace.
>>
>
> ?? AFAIK everything in hotspot is a class not a namespace - we don't use
> "namespaces".
>
>
I meant that it is used as one would use a C++ namespace, not a class. As
far as I can see, any class derived from AllStatic is effectively a namespace,
in the sense that it serves as a bracket for a number of related static
functions.
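
To illustrate the point, here is a minimal sketch of a class used purely as
a bracket, next to the same grouping written as a namespace. All names are
made up for illustration; AllStaticDemo merely stands in for HotSpot's
AllStatic, and the page size is an arbitrary constant:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an "all static" marker base class.
class AllStaticDemo {};

// A class used only as a scope: static members, never instantiated.
class demo_os : public AllStaticDemo {
 public:
  static std::size_t page_size() { return 4096; }  // illustrative constant
  static std::size_t align_up(std::size_t n) {
    const std::size_t ps = page_size();
    return (n + ps - 1) / ps * ps;
  }
};

// The same grouping expressed as a namespace.
namespace demo_os_ns {
  inline std::size_t page_size() { return 4096; }  // illustrative constant
  inline std::size_t align_up(std::size_t n) {
    const std::size_t ps = page_size();
    return (n + ps - 1) / ps * ps;
  }
}
```

Callers look the same in both cases (demo_os::align_up(n) versus
demo_os_ns::align_up(n)); the difference lies in how the scope can be
declared and extended.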


>> And that we include the platform dependent additions into the middle of
>> this class.
>>
>
> Build-time specialization. It allows for the os API to actually be
> different on different platforms, as opposed to just being implemented
> differently.
>
>
The "os" API is a shared, platform independent API, so there should be no
difference between platforms. There should be no need for it to export
platform specifics - its whole intent is to hide those specifics.

Looking into the various os_<os>.hpp files, I see:
1) things where the declaration does differ between oses and therefore they
cannot be called in shared code without #ifdefs (e.g. all <os> subclasses).
2) Things where the declaration is shared but the implementation differs.
Again, two cases:
    a) Either implementation is not time critical. In that case the
declaration should live in os.hpp and the implementation should live in
some platform dependent C++ file.
    b) Or implementation is time critical and must be inline, in which case
a separate platform dependent header would be needed.

For (1) and (2b), C++ namespaces would be more convenient. Currently, you are
forced to include the platform specific file inside the class os{}
declaration, because a class can have only one definition:

os.hpp
class os {
...
#include <os_xxx.hpp>

};

With a namespace, you can add functions to the namespace in various
separate places and hence could write:

os.hpp
namespace os { ... functions ... }

os_xxx.hpp
namespace os { ... functions ... }

which would be more natural and
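
A minimal sketch of the namespace variant, squeezed into one file. The
names (demo_os, xxx_raw_cpu_query) and the returned value are purely
hypothetical placeholders, not existing hotspot code:

```cpp
#include <cassert>

// Would live in a shared header (hypothetical demo_os.hpp).
namespace demo_os {
  int processor_count();               // platform independent declaration
}

// Would live in a platform header (hypothetical demo_os_xxx.hpp).
// It reopens the namespace instead of being pasted into a class body.
namespace demo_os {
  inline int xxx_raw_cpu_query() { return 4; }  // placeholder, not a real OS call
}

// Would live in a platform .cpp file.
namespace demo_os {
  int processor_count() {
    const int n = xxx_raw_cpu_query();
    assert(n > 0);
    return n;
  }
}
```

Each of the three parts is a complete, separately includable piece, which is
not possible when the platform part has to sit inside "class os { ... }".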


>> This has a number of repercussions, like not being able to include the
>> platform dependent files (os_<os>_<cpu>) directly, not being able to
>>
>
> I'd call that a feature - they are not intended to be standalone APIs.


Right now os_<os>.hpp exports the os::<os> API, which one may want to use
separately from "os" because it exposes platform dependent APIs which are
conceptually lower than the os namespace. At least that is how I always
interpreted the intention behind os::<os>.

But actually, because the "Aix" class is part of "os", cannot be used
separately and is exposed to the whole of the VM, I always avoided putting
anything into os::Aix if it could be helped. Hence, for AIX, we added
porting_aix.hpp for AIX specific functions which are not to be used outside
os/aix/vm, or mostly just left functions as file scope statics
inside os_aix.cpp. So, os::Aix was pretty useless for me as a porter.


>
>
>> forward declare functions from the "os" namespace (e.g. os::malloc) etc. I
>> also cannot split implementations from "os" functions to different
>> implementation files without problems.
>>
>> It seems to me all compiler nowadays support namespaces, would it not make
>> sense to convert "os" to a real namespace?
>>
>
> Not being a C++ aficionado I'm not sure exactly what that would entail -
> as far as I know we don't use C++ namespaces anywhere in hotspot.
>
>
We are starting to use all kinds of modern C++ features. Templates pop up all
over the place. Namespaces, in contrast, are an old and easily understood
feature. We already use them inside our own port.


>> While we are at it, what is the reason for the "<os>" sub classes? e.g.
>> os::Bsd, os::Aix etc? It makes integrating patches between platforms
>> difficult and, to me, does not seem to serve any clear purpose.
>>
>
> Must admit this arrangement has also had me confused at times. I think it
> is way to add a per-OS helper class for the main os API implementation.
>
>> If the purpose is to be a very low wrapper around OS particularities, it
>> makes no sense to have them in the "os" namespace and to make them visible
>> to the shared sections of the VM. E.g. there should be no reason to access
>> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
>> functions outside implementations specific for Posix platforms.
>>
>
> Not sure how you make, for example os::BSD accessible from all classes in
> os/bsd/vm yet not be visible anywhere else ??
>
>
I think there is no real reason for os::Bsd to exist at all. Either we have
shared functions with platform dependent implementation, then they should
be declared in "os". Or they are completely platform specific, then they
can be moved to a platform specific header outside of "os" like we did with
porting_aix.hpp.

> Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm.

>
> There is a lot of cleanup in this area slated for the future - hopefully
> Java 10. POSIX refactoring etc.
>
>
Sure!

Kind Regards, Thomas


> Cheers,
> David
>
>
> Thanks, and Kind Regards, Thomas
>>
>>

From david.holmes at oracle.com  Wed Oct 19 12:19:25 2016
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 19 Oct 2016 22:19:25 +1000
Subject: "os" - make this a real namespace?
In-Reply-To: <CAA-vtUzUSQy=um7GkDQX7DscP2T-ZoMRiiM0DQUyuH4rMiu_hA@mail.gmail.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com>
	<CAA-vtUzUSQy=um7GkDQX7DscP2T-ZoMRiiM0DQUyuH4rMiu_hA@mail.gmail.com>
Message-ID: <4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com>

Hi Thomas,

On 19/10/2016 7:07 PM, Thomas Stüfe wrote:
> Hi David!
>
> On Wed, Oct 19, 2016 at 9:02 AM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Hi Thomas,
>
>     On 19/10/2016 4:10 PM, Thomas Stüfe wrote:
>
>         Hi all,
>
>         a small question.
>
>         I sometimes stumble over the fact that "os" is a class, not a
>         namespace.
>
>
>     ?? AFAIK everything in hotspot is a class not a namespace - we don't
>     use "namespaces".
>
>
> I meant that it is used as one would use a C++ namespace, not a class.
> As far as I see any class derived from AllStatic is actually a namespace
> in the sense that it serves as bracket for a number of related static
> functions.

Okay ... if that is what you mean by a namespace ... I always thought 
namespaces were like packages, a level up from classes.

>
>         And that we include the platform dependent additions into the
>         middle of this class.
>
>
>     Build-time specialization. It allows for the os API to actually be
>     different on different platforms, as opposed to just being
>     implemented differently.
>
>
> The "os" API is a shared, platform independent API, so there should be
> no difference between platforms. There should be no need for it to
> export platform specifics - its whole intent is to hide those specifics.

But that is your current design perspective of what the os API should 
be. What it is is something with a very long history and which has had 
to accommodate different things over time. At one point a lot of the JDK 
native code would call into the VM for functionality that is now 
directly implemented in JDK native code. There's a lot of historical 
baggage here.

> Looking into the various os_<os>.hpp files, I see:
> 1) things where the declaration does differ between oses and therefore
> they cannot be called in shared code without #ifdefs (e.g. all <os>
> subclasses).
> 2)Things where the declaration is shared but the implementation differs.
> Again, two cases:
>     a) Either implementation is not time critical. In that case the
> declaration should live in os.hpp and the implementation should live in
> some platform dependent C++ file.
>     b) Or implementation is time critical and must be inline, in which
> case a separate platform dependent header would be needed.
>
> For (1) and (2b), C++ namespaces would be more convenient. Now, you are
> forced to include the platform specific file into the class os{}
> declaration, because there can just be one:
>
> os.hpp
> class os {
> ...
> #include <os_xxx.hpp>
>
> };
>
> . With a namespace, you can add functions to the namespace in various
> disjunct places and hence could write:
>
> os.hpp
> namespace os { ... functions ... }
>
> os_xxx.hpp
> namespace os { ... functions ... }
>
> which would be more natural and

Yes I can see that as an alternative way to expand the os API. Though I 
still prefer to group functionality in a class.

>         This has a number of repercussions, like not being able to
>         include the
>         platform dependent files (os_<os>_<cpu>) directly, not being able to
>
>
>     I'd call that a feature - they are not intended to be standalone APIs.
>
>
> Right now os_<os>.hpp exports the os::<os> api, which one may want to
> use separately from "os" because they expose platform dependent APIs
> which are conceptionally lower than the os namespace. At least that is
> how I always did interpret the intention behind os::<os>.

I wouldn't say lower - they extend the os API with platform specific 
functionality and concepts. The idea is that <os> specific code that 
wants to use OS facilities that are specific to that <os> access them 
through the os::<os> class.

> But actually, because the "Aix" class is part of "os", cannot be used
> separately and is exposed to the whole of the VM, I always avoided
> putting anything os::Aix if it could be helped. Hence, for AIX, we added
> porting_aix.hpp for AIX specific functions which are not to be used
> outside os/aix/vm. Or mostly just plain left functions to be file scope
> static inside os_aix.cpp. So, os::Aix was pretty useless for me as a porter.

Seems you made a decision that the os::AIX class didn't meet your ideas 
as to how platform specifics should be handled and so went with an 
alternative design. Wouldn't that make it "useless" because you chose 
not to use it?

>
>         forward declare functions from the "os" namespace (e.g.
>         os::malloc) etc. I
>         also cannot split implementations from "os" functions to different
>         implementation files without problems.
>
>         It seems to me all compiler nowadays support namespaces, would
>         it not make sense to convert "os" to a real namespace?
>
>
>     Not being a C++ aficionado I'm not sure exactly what that would
>     entail - as far as I know we don't use C++ namespaces anywhere in
>     hotspot.
>
>
> We start using all kinds of modern C++ features. Templates pop up all
> over the place. Namespaces in contrast are an old and easily understood
> feature. We already use it inside our own port.
>
>
>         While we are at it, what is the reason for the "<os>" sub
>         classes? e.g.
>         os::Bsd, os::Aix etc? It makes integrating patches between platforms
>         difficult and, to me, does not seem to serve any clear purpose.
>
>
>     Must admit this arrangement has also had me confused at times. I
>     think it is way to add a per-OS helper class for the main os API
>     implementation.
>
>         If the purpose is to be a very low wrapper around OS
>         particularities, it
>         makes no sense to have them in the "os" namespace and to make
>         them visible
>         to the shared sections of the VM. E.g. there should be no reason
>         to access
>         "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
>         functions outside implementations specific for Posix platforms.
>
>
>     Not sure how you make, for example os::BSD accessible from all
>     classes in os/bsd/vm yet not be visible anywhere else ??
>
>
> I think there is no real reason for os::Bsd to exist at all. Either we
> have shared functions with platform dependent implementation, then they
> should be declared in "os". Or they are completely platform specific,
> then they can be moved to a platform specific header outside of "os"
> like we did with porting_aix.hpp.

Sure you could do that. But 20 years ago that wasn't how things were 
designed and we have what we have today. As I said a lot of baggage.

Personally I find the nesting of the concrete os API quite natural: 
os::win32 to me is better than unrelated os and win32 classes or namespaces.

Cheers,
David
-----

>  Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm.
>
>
>     There is a lot of cleanup in this area slated for the future -
>     hopefully Java 10. POSIX refactoring etc.
>
>
> Sure!
>
> Kind Regards, Thomas
>
>
>     Cheers,
>     David
>
>
>         Thanks, and Kind Regards, Thomas
>
>

From thomas.stuefe at gmail.com  Wed Oct 19 13:54:11 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 19 Oct 2016 15:54:11 +0200
Subject: "os" - make this a real namespace?
In-Reply-To: <4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com>
	<CAA-vtUzUSQy=um7GkDQX7DscP2T-ZoMRiiM0DQUyuH4rMiu_hA@mail.gmail.com>
	<4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com>
Message-ID: <CAA-vtUxojOOyYRGgMX37q4UNTHJ9uxgOYBa7qYzhgQJdunWVpQ@mail.gmail.com>

Hi David,

my intent was not to attack the existing code, but to ask about the
original design intentions and possibly come up with ideas to improve it.
See my further answers inline.

On Wed, Oct 19, 2016 at 2:19 PM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> On 19/10/2016 7:07 PM, Thomas Stüfe wrote:
>
>> Hi David!
>>
>> On Wed, Oct 19, 2016 at 9:02 AM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>     On 19/10/2016 4:10 PM, Thomas Stüfe wrote:
>>
>>         Hi all,
>>
>>         a small question.
>>
>>         I sometimes stumble over the fact that "os" is a class, not a
>>         namespace.
>>
>>
>>     ?? AFAIK everything in hotspot is a class not a namespace - we don't
>>     use "namespaces".
>>
>>
>> I meant that it is used as one would use a C++ namespace, not a class.
>> As far as I see any class derived from AllStatic is actually a namespace
>> in the sense that it serves as bracket for a number of related static
>> functions.
>>
>
> Okay ... if that is what you mean by a namespace ... I always thought
> namespaces were like packages, a level up from classes.
>
>
I always used namespace as a common scope for declarations which belong
together, be that classes, global functions, variables. I always thought
this is how class "os" is used in the hotspot.

I also use C++ namespaces to isolate code in large projects I do not own
but where my symbols are, for technical reasons, visible everywhere and I
want to avoid name clashes. A typical porter headache. E.g. we have a
namespace "sap" in our code, just to keep our stuff separate from global
symbols.
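
Roughly like this, as a sketch only. The namespace demo_port and both
functions are invented for illustration and are not taken from our port:

```cpp
#include <cassert>
#include <cstring>

// Global symbol owned by the upstream project (hypothetical).
inline int describe_length(const char* s) {
  return static_cast<int>(std::strlen(s));
}

// Port specific code lives in its own namespace, so the same name can be
// reused without clashing with the global symbol above.
namespace demo_port {
  inline int describe_length(const char* s) {
    assert(s != nullptr);
    return static_cast<int>(std::strlen(s)) + 1;  // also counts the terminating NUL
  }
}
```

Inside demo_port an unqualified call resolves to the port's own version,
while ::describe_length still reaches the upstream one.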


>
>>         And that we include the platform dependent additions into the
>>         middle of this class.
>>
>>
>>     Build-time specialization. It allows for the os API to actually be
>>     different on different platforms, as opposed to just being
>>     implemented differently.
>>
>>
>> The "os" API is a shared, platform independent API, so there should be
>> no difference between platforms. There should be no need for it to
>> export platform specifics - its whole intent is to hide those specifics.
>>
>
> But that is your current design perspective of what the os API should be.
> What it is is something with a very long history and which has had to
> accommodate different things over time. At one point a lot of the JDK
> native code would call into the VM for functionality that is now directly
> implemented in JDK native code. There's a lot of historical baggage here.
>
>
>> Looking into the various os_<os>.hpp files, I see:
>> 1) things where the declaration does differ between oses and therefore
>> they cannot be called in shared code without #ifdefs (e.g. all <os>
>> subclasses).
>> 2)Things where the declaration is shared but the implementation differs.
>> Again, two cases:
>>     a) Either implementation is not time critical. In that case the
>> declaration should live in os.hpp and the implementation should live in
>> some platform dependent C++ file.
>>     b) Or implementation is time critical and must be inline, in which
>> case a separate platform dependent header would be needed.
>>
>> For (1) and (2b), C++ namespaces would be more convenient. Now, you are
>> forced to include the platform specific file into the class os{}
>> declaration, because there can just be one:
>>
>> os.hpp
>> class os {
>> ...
>> #include <os_xxx.hpp>
>>
>> };
>>
>> . With a namespace, you can add functions to the namespace in various
>> disjunct places and hence could write:
>>
>> os.hpp
>> namespace os { ... functions ... }
>>
>> os_xxx.hpp
>> namespace os { ... functions ... }
>>
>> which would be more natural and
>>
>
> Yes I can see that as an alternative way to expand the os API. Though I
> still prefer to group functionality in a class.
>
>
I would argue that the advantage of namespaces here is that the special
handling of platform specific headers is not needed anymore. Now, when
reading any os_<os>/<cpu> header, I need to keep in mind that the content
of this header gets inserted into the middle of a class definition. That is
rather exotic and unexpected, and it has tripped me up a few times already.


>>         This has a number of repercussions, like not being able to
>>         include the
>>         platform dependent files (os_<os>_<cpu>) directly, not being able
>> to
>>
>>
>>     I'd call that a feature - they are not intended to be standalone APIs.
>>
>
There are a number of useful "os" APIs which I would sometimes like to use
without the baggage of including the whole os.hpp header. For instance,
os::malloc(). Normally, I would forward declare them, but this is not
possible for class member functions.
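
A small sketch of what I mean, with hypothetical names (demo_os,
demo_malloc, demo_free) standing in for the real thing, and plain
std::malloc as a placeholder implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Forward declarations only: the caller needs no large header.
namespace demo_os {
  void* demo_malloc(std::size_t size);
  void  demo_free(void* p);
}

// A caller that knows just the two declarations above.
inline bool can_allocate(std::size_t size) {
  void* p = demo_os::demo_malloc(size);
  const bool ok = (p != nullptr);
  demo_os::demo_free(p);
  return ok;
}

// Definitions, which would normally live in a separate .cpp file.
namespace demo_os {
  void* demo_malloc(std::size_t size) {
    assert(size > 0);
    return std::malloc(size);
  }
  void demo_free(void* p) { std::free(p); }
}

// By contrast, a member of a class cannot be declared from the outside:
//   class demo_os_class;
//   void* demo_os_class::demo_malloc(std::size_t);  // ill-formed, incomplete type
```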


>
>>
>> Right now os_<os>.hpp exports the os::<os> api, which one may want to
>> use separately from "os" because they expose platform dependent APIs
>> which are conceptionally lower than the os namespace. At least that is
>> how I always did interpret the intention behind os::<os>.
>>
>
> I wouldn't say lower - they extend the os API with platform specific
> functionality and concepts. The idea is that <os> specific code that wants
> to use OS facilities that are specific to that <os> access them through the
> os::<os> class.


>
>> But actually, because the "Aix" class is part of "os", cannot be used
>> separately and is exposed to the whole of the VM, I always avoided
>> putting anything os::Aix if it could be helped. Hence, for AIX, we added
>> porting_aix.hpp for AIX specific functions which are not to be used
>> outside os/aix/vm. Or mostly just plain left functions to be file scope
>> static inside os_aix.cpp. So, os::Aix was pretty useless for me as a
>> porter.
>>
>
> Seems you made a decision that the os::AIX class didn't meet your ideas as
> to how platform specifics should be handled and so went with an alternative
> design. Wouldn't that make it "useless" because you chose not to use it?


When the AIX port started, the os interface was not well documented. Nor
was there anyone I could ask because OpenJDK did not yet exist. So I had to
deduce the intent of the original authors from the code and try to fill it
with life as best as possible.


>
>>         forward declare functions from the "os" namespace (e.g.
>>         os::malloc) etc. I
>>         also cannot split implementations from "os" functions to different
>>         implementation files without problems.
>>
>>         It seems to me all compiler nowadays support namespaces, would
>>         it not make sense to convert "os" to a real namespace?
>>
>>
>>     Not being a C++ aficionado I'm not sure exactly what that would
>>     entail - as far as I know we don't use C++ namespaces anywhere in
>>     hotspot.
>>
>>
>> We start using all kinds of modern C++ features. Templates pop up all
>> over the place. Namespaces in contrast are an old and easily understood
>> feature. We already use it inside our own port.
>>
>>
>>         While we are at it, what is the reason for the "<os>" sub
>>         classes? e.g.
>>         os::Bsd, os::Aix etc? It makes integrating patches between
>> platforms
>>         difficult and, to me, does not seem to serve any clear purpose.
>>
>>
>>     Must admit this arrangement has also had me confused at times. I
>>     think it is way to add a per-OS helper class for the main os API
>>     implementation.
>>
>>         If the purpose is to be a very low wrapper around OS
>>         particularities, it
>>         makes no sense to have them in the "os" namespace and to make
>>         them visible
>>         to the shared sections of the VM. E.g. there should be no reason
>>         to access
>>         "os::Bsd" functions from outside os/bsd/vm, or to access
>> "os::Posix"
>>         functions outside implementations specific for Posix platforms.
>>
>>
>>     Not sure how you make, for example os::BSD accessible from all
>>     classes in os/bsd/vm yet not be visible anywhere else ??
>>
>>
>> I think there is no real reason for os::Bsd to exist at all. Either we
>> have shared functions with platform dependent implementation, then they
>> should be declared in "os". Or they are completely platform specific,
>> then they can be moved to a platform specific header outside of "os"
>> like we did with porting_aix.hpp.
>>
>
> Sure you could do that. But 20 years ago that wasn't how things were
> designed and we have what we have today. As I said a lot of baggage.
>
> Personally I find the nesting of the concrete os API quite natural:
> os::win32 to me is better than unrelated os and win32 classes or namespaces.
>
>
I think there is code structure, and then there is exposure. Both are
separate things but currently interwoven in the hotspot.

Putting Win32 specifics into os::win32 makes sense. But there is no reason
to expose os::win32 to shared parts of the VM, considering how much the
hotspot programmers at Oracle dislike platform specific #ifdefs in shared
code.

I understand that this is historical? Before namespaces, you had to define
a class as a common bracket, and a class definition must be complete to be
valid, so os::<OS> is automatically visible everywhere. But with namespaces
this could be disentangled. We could have a globally visible "os" namespace
with platform independent shared functions, as well as an "os::win32"
namespace which is only visible in Windows specific implementation files.
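
As a rough sketch of that separation (demo_os, pages_needed and
query_page_size are hypothetical names, and the page size is a placeholder
rather than a real Win32 query):

```cpp
#include <cassert>

// Shared header part: visible to the whole VM.
namespace demo_os {
  int page_size();
}

// Shared code: only demo_os::page_size() is known at this point.
inline int pages_needed(int bytes) {
  assert(bytes >= 0);
  const int ps = demo_os::page_size();
  // demo_os::win32::query_page_size() would not compile here, because the
  // nested namespace has not been declared for shared code.
  return (bytes + ps - 1) / ps;
}

// Platform implementation part: the nested namespace is declared only here.
namespace demo_os {
  namespace win32 {
    inline int query_page_size() { return 4096; }  // placeholder value
  }
  int page_size() { return win32::query_page_size(); }
}
```

The structure (win32 nested in os) stays, but the exposure is limited to the
files that actually see the nested declaration.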

Kind Regards, Thomas


> Cheers,
> David
> -----
>
>
>  Plus it also needs to potentially be visible from os_cpu/bsd_XXX/vm.
>>
>>
>>     There is a lot of cleanup in this area slated for the future -
>>     hopefully Java 10. POSIX refactoring etc.
>>
>>
>> Sure!
>>
>> Kind Regards, Thomas
>>
>>
>>     Cheers,
>>     David
>>
>>
>>         Thanks, and Kind Regards, Thomas
>>
>>
>>

From vladimir.kozlov at oracle.com  Wed Oct 19 17:43:50 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Oct 2016 10:43:50 -0700
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<c7a21cca-2bd4-faa5-beda-ffb6475f58b0@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
	<8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
	<2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
	<5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com>
Message-ID: <9619324d-5360-b3f0-684a-e7f1069656db@oracle.com>

I missed all this review fun :)

Thank you, Alan, for cleaning this up.

The only concern I have is removal of conditional macros.

 > I've also taken the opportunity to strip out most of the '#ifndef(FOO)'
 > probes for the HW capability bit macros in vm_version_solaris_sparc.cpp.
 > They are now redundant as the macros are in the system header files
 > from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related
 > ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and
 > AV2_SPARC_SPARC5. For those I've left the macro probes in place.

Most likely people will try to run JDK 9 on Solaris 10, or in some kind
of VM environment which may not have Solaris 11.1 headers. We had a lot
of such cases before, which is why those macros were added.

"JDK 9 Platform Support" lists only Solaris 11.x and 12.x. Maybe that is
fine, but the original code would cover more cases.
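
For reference, the kind of probe I mean looks roughly like this. It is only
a sketch: DEMO_HWCAP_FMA and its value are made up and are not the real
Solaris definitions:

```cpp
#include <cassert>

// Hypothetical capability bit that newer system headers would provide.
// If the build machine's headers lack it, fall back to a local definition.
#ifndef DEMO_HWCAP_FMA
#define DEMO_HWCAP_FMA 0x0100u   // made-up value, for illustration only
#endif

// Test the (possibly locally defined) bit against the reported capabilities.
inline bool has_fma(unsigned int hwcap_bits) {
  return (hwcap_bits & DEMO_HWCAP_FMA) != 0;
}
```

With the probe in place the code still compiles against older headers; without
it the build depends on the newer header being present.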

Sorry for rambling.

Regards,
Vladimir

On 10/18/16 7:01 PM, David Holmes wrote:
> Pushed.
>
> David
>
> On 11/10/2016 11:12 AM, David Holmes wrote:
>> Ok. I will sponsor this once hs is open again.
>>
>> Thanks,
>> David
>>
>> On 6/10/2016 10:10 PM, Alan Burlison wrote:
>>> On 04/10/2016 19:37, Alan Burlison wrote:
>>>
>>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>>>>> already being included.
>>>>
>>>> Cool, I'll pop that in instead - thanks!
>>>
>>> Done, webrev updated, jprt hotspot testset is clean.
>>>

From david.holmes at oracle.com  Wed Oct 19 23:59:04 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 20 Oct 2016 09:59:04 +1000
Subject: "os" - make this a real namespace?
In-Reply-To: <CAA-vtUxojOOyYRGgMX37q4UNTHJ9uxgOYBa7qYzhgQJdunWVpQ@mail.gmail.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com>
	<CAA-vtUzUSQy=um7GkDQX7DscP2T-ZoMRiiM0DQUyuH4rMiu_hA@mail.gmail.com>
	<4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com>
	<CAA-vtUxojOOyYRGgMX37q4UNTHJ9uxgOYBa7qYzhgQJdunWVpQ@mail.gmail.com>
Message-ID: <f83146ac-eece-4d24-4c80-b4a716021066@oracle.com>

On 19/10/2016 11:54 PM, Thomas Stüfe wrote:
> Hi David,
>
> my intent was not to attack the existing code, but to ask about the
> original design intentions and possibly come up with ideas to improve
> it. See my further answers inline.

Sure - no problem. It is hard to understand the intent of the design 
when the remaining code doesn't even reflect the original design 
intentions (nor is there anyone around involved in that!). And there is 
always scope to redo this as part of the "big OS code cleanup".

The "exposure" concern is not one that I've heard expressed previously 
in relation to the hotspot code.

Cheers,
David

> On Wed, Oct 19, 2016 at 2:19 PM, David Holmes <david.holmes at oracle.com
> <mailto:david.holmes at oracle.com>> wrote:
>
>     Hi Thomas,
>
>     On 19/10/2016 7:07 PM, Thomas Stüfe wrote:
>
>         Hi David!
>
>         On Wed, Oct 19, 2016 at 9:02 AM, David Holmes
>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>         <mailto:david.holmes at oracle.com
>         <mailto:david.holmes at oracle.com>>> wrote:
>
>             Hi Thomas,
>
>             On 19/10/2016 4:10 PM, Thomas Stüfe wrote:
>
>                 Hi all,
>
>                 a small question.
>
>                 I sometimes stumble over the fact that "os" is a class,
>         not a
>                 namespace.
>
>
>             ?? AFAIK everything in hotspot is a class not a namespace -
>         we don't
>             use "namespaces".
>
>
>         I meant that it is used as one would use a C++ namespace, not a
>         class.
>         As far as I see any class derived from AllStatic is actually a
>         namespace
>         in the sense that it serves as bracket for a number of related
>         static
>         functions.
>
>
>     Okay ... if that is what you mean by a namespace ... I always
>     thought namespaces were like packages, a level up from classes.
>
>
> I always used namespace as a common scope for declarations which belong
> together, be that classes, global functions, variables. I always thought
> this is how class "os" is used in the hotspot.
>
> I also use C++ namespace to isolate coding in large projects I do not
> own but where my symbols are, for technical reasons, visible anywhere
> but I want to avoid name clashes. A typical porter headache. E.g. we
> have a namespace "sap" in our coding, just to keep our stuff separate
> from global symbols.
>
>
>
>                 And that we include the platform dependent additions
>         into the
>                 middle of this class.
>
>
>             Build-time specialization. It allows for the os API to
>         actually be
>             different on different platforms, as opposed to just being
>             implemented differently.
>
>
>         The "os" API is a shared, platform independent API, so there
>         should be
>         no difference between platforms. There should be no need for it to
>         export platform specifics - its whole intent is to hide those
>         specifics.
>
>
>     But that is your current design perspective of what the os API
>     should be. What it is is something with a very long history and
>     which has had to accommodate different things over time. At one
>     point a lot of the JDK native code would call into the VM for
>     functionality that is now directly implemented in JDK native code.
>     There's a lot of historical baggage here.
>
>
>         Looking into the various os_<os>.hpp files, I see:
>         1) things where the declaration does differ between oses and
>         therefore
>         they cannot be called in shared code without #ifdefs (e.g. all <os>
>         subclasses).
>         2)Things where the declaration is shared but the implementation
>         differs.
>         Again, two cases:
>             a) Either implementation is not time critical. In that case the
>         declaration should live in os.hpp and the implementation should
>         live in
>         some platform dependent C++ file.
>             b) Or implementation is time critical and must be inline, in
>         which
>         case a separate platform dependent header would be needed.
>
>         For (1) and (2b), C++ namespaces would be more convenient. Now,
>         you are
>         forced to include the platform specific file into the class os{}
>         declaration, because there can just be one:
>
>         os.hpp
>         class os {
>         ...
>         #include <os_xxx.hpp>
>
>         };
>
>         . With a namespace, you can add functions to the namespace in
>         various
>         disjunct places and hence could write:
>
>         os.hpp
>         namespace os { ... functions ... }
>
>         os_xxx.hpp
>         namespace os { ... functions ... }
>
>         which would be more natural and
>
>
>     Yes I can see that as an alternative way to expand the os API.
>     Though I still prefer to group functionality in a class.
>
>
> I would argue that the advantage of namespaces here is that the special
> handling of platform specific headers is not needed anymore. Now, when
> reading any os_<os>/<cpu> header, I need to keep in mind that the
> content of this header gets inserted into the middle of a class
> definition. That is just rather exotic and unexpected and tripped me
> over a few times already.
>
>
>
>                 This has a number of repercussions, like not being able to
>                 include the
>                 platform dependent files (os_<os>_<cpu>) directly, not
>         being able to
>
>
>             I'd call that a feature - they are not intended to be
>         standalone APIs.
>
>
> There are a number of useful "os" APIs which I would sometimes like to
> use without the baggage of including the whole os.hpp header. For
> instance, os::malloc(). Normally, I would forward declare them, but this
> is not possible for class functions.
>
>
>
>
>         Right now os_<os>.hpp exports the os::<os> api, which one may
>         want to
>         use separately from "os" because they expose platform dependent APIs
>         which are conceptually lower than the os namespace. At least
>         that is
>         how I always did interpret the intention behind os::<os>.
>
>
>     I wouldn't say lower - they extend the os API with platform specific
>     functionality and concepts. The idea is that <os> specific code that
>     wants to use OS facilities that are specific to that <os> access
>     them through the os::<os> class.
>
>
>
>         But actually, because the "Aix" class is part of "os", cannot be
>         used
>         separately and is exposed to the whole of the VM, I always avoided
>         putting anything os::Aix if it could be helped. Hence, for AIX,
>         we added
>         porting_aix.hpp for AIX specific functions which are not to be used
>         outside os/aix/vm. Or mostly just plain left functions to be
>         file scope
>         static inside os_aix.cpp. So, os::Aix was pretty useless for me
>         as a porter.
>
>
>     Seems you made a decision that the os::AIX class didn't meet your
>     ideas as to how platform specifics should be handled and so went
>     with an alternative design. Wouldn't that make it "useless" because
>     you chose not to use it?
>
>
> When the AIX port started, the os interface was not well documented. Nor
> was there anyone I could ask because OpenJDK did not yet exist. So I had
> to deduce the intent of the original authors from the code and try to
> fill it with life as best as possible.
>
>
>
>                 forward declare functions from the "os" namespace (e.g.
>                 os::malloc) etc. I
>                 also cannot split implementations from "os" functions to
>         different
>                 implementation files without problems.
>
>                 It seems to me all compilers nowadays support namespaces,
>         would
>                 it not make sense to convert "os" to a real namespace?
>
>
>             Not being a C++ aficionado I'm not sure exactly what that would
>             entail - as far as I know we don't use C++ namespaces
>         anywhere in
>             hotspot.
>
>
>         We start using all kinds of modern C++ features. Templates pop
>         up all
>         over the place. Namespaces in contrast are an old and easily
>         understood
>         feature. We already use it inside our own port.
>
>
>                 While we are at it, what is the reason for the "<os>" sub
>                 classes? e.g.
>                 os::Bsd, os::Aix etc? It makes integrating patches
>         between platforms
>                 difficult and, to me, does not seem to serve any clear
>         purpose.
>
>
>             Must admit this arrangement has also had me confused at times. I
>             think it is a way to add a per-OS helper class for the main os API
>             implementation.
>
>                 If the purpose is to be a very low wrapper around OS
>                 particularities, it
>                 makes no sense to have them in the "os" namespace and to
>         make
>                 them visible
>                 to the shared sections of the VM. E.g. there should be
>         no reason
>                 to access
>                 "os::Bsd" functions from outside os/bsd/vm, or to access
>         "os::Posix"
>                 functions outside implementations specific for Posix
>         platforms.
>
>
>             Not sure how you make, for example os::BSD accessible from all
>             classes in os/bsd/vm yet not be visible anywhere else ??
>
>
>         I think there is no real reason for os::Bsd to exist at all.
>         Either we
>         have shared functions with platform dependent implementation,
>         then they
>         should be declared in "os". Or they are completely platform
>         specific,
>         then they can be moved to a platform specific header outside of "os"
>         like we did with porting_aix.hpp.
>
>
>     Sure you could do that. But 20 years ago that wasn't how things were
>     designed and we have what we have today. As I said a lot of baggage.
>
>     Personally I find the nesting of the concrete os API quite natural:
>     os::win32 to me is better than unrelated os and win32 classes or
>     namespaces.
>
>
> I think there is code structure, and then there is exposure. Both are
> separate things but currently interwoven in the hotspot.
>
> Putting Win32 specifics into os::win32 makes sense. But there is no
> reason to expose os::win32 to shared parts of the VM, considering how
> much the hotspot programmers at Oracle dislike platform specific #ifdefs
> in shared code.
>
> I understand that this is historical? Before namespaces, you had to
> define a class as common bracket, and a class definition must be
> complete to be valid, so os::<OS> is automatically visible everywhere.
> But with namespaces this could be disentangled. We could have a globally
> visible "os" namespace with platform independent shared functions, as
> well as an "os::win32" namespace which is only visible in windows
> specific implementation files.
>
> Kind Regards, Thomas
>
>
>     Cheers,
>     David
>     -----
>
>
>          Plus it also needs to potentially be visible from
>         os_cpu/bsd_XXX/vm.
>
>
>             There is a lot of cleanup in this area slated for the future -
>             hopefully Java 10. POSIX refactoring etc.
>
>
>         Sure!
>
>         Kind Regards, Thomas
>
>
>             Cheers,
>             David
>
>
>                 Thanks, and Kind Regards, Thomas
>
>
>

From david.holmes at oracle.com  Thu Oct 20 00:17:13 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 20 Oct 2016 10:17:13 +1000
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <9619324d-5360-b3f0-684a-e7f1069656db@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<f2d378e5-c55e-fb2d-f630-34fd239620ca@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
	<8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
	<2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
	<5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com>
	<9619324d-5360-b3f0-684a-e7f1069656db@oracle.com>
Message-ID: <f469e56a-6db0-a019-3022-c9861a31f9ce@oracle.com>

On 20/10/2016 3:43 AM, Vladimir Kozlov wrote:
> I missed all this review fun :)
>
> Thank you, Alan, for cleaning this up.
>
> The only concern I have is removal of conditional macros.
>
>> I've also taken the opportunity to strip out most of the '#ifndef(FOO)'
>> probes for the HW capability bit macros in vm_version_solaris_sparc.cpp.
>> They are now redundant as the macros are in the system header files
>> from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related
>> ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and
>> AV2_SPARC_SPARC5. For those I've left the macro probes in place.
>
> Most likely people will try to run JDK 9 on Solaris 10, or in some kind
> of VM environment which may not have Solaris 11.1 headers. We had a lot
> of such cases before; that is why those macros were added.

run or build? running should not be a problem. Building on S10 without a 
devkit has not worked for a while AFAIK.

David

> "JDK 9 Platform Support" list only Solaris 11.x and 12.x. May be it is
> fine, but the original code would cover more running cases.
>
> Sorry for rambling.
>
> Regards,
> Vladimir
>
> On 10/18/16 7:01 PM, David Holmes wrote:
>> Pushed.
>>
>> David
>>
>> On 11/10/2016 11:12 AM, David Holmes wrote:
>>> Ok. I will sponsor this once hs is open again.
>>>
>>> Thanks,
>>> David
>>>
>>> On 6/10/2016 10:10 PM, Alan Burlison wrote:
>>>> On 04/10/2016 19:37, Alan Burlison wrote:
>>>>
>>>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>>>>>> already being included.
>>>>>
>>>>> Cool, I'll pop that in instead - thanks!
>>>>
>>>> Done, webrev updated, jprt hotspot testset is clean.
>>>>

From vladimir.kozlov at oracle.com  Thu Oct 20 02:56:58 2016
From: vladimir.kozlov at oracle.com (Vladimir Kozlov)
Date: Wed, 19 Oct 2016 19:56:58 -0700
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <f469e56a-6db0-a019-3022-c9861a31f9ce@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<b3a7b5a9-b855-277c-dc96-7bec18bbae47@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
	<8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
	<2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
	<5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com>
	<9619324d-5360-b3f0-684a-e7f1069656db@oracle.com>
	<f469e56a-6db0-a019-3022-c9861a31f9ce@oracle.com>
Message-ID: <db7600a8-28ea-022d-a572-fe9d26062198@oracle.com>

On 10/19/16 5:17 PM, David Holmes wrote:
> On 20/10/2016 3:43 AM, Vladimir Kozlov wrote:
>> I missed all this review fun :)
>>
>> Thank you, Alan, for cleaning this up.
>>
>> The only concern I have is removal of conditional macros.
>>
>>> I've also taken the opportunity to strip out most of the '#ifndef(FOO)'
>>> probes for the HW capability bit macros in vm_version_solaris_sparc.cpp.
>>> They are now redundant as the macros are in the system header files
>>> from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related
>>> ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and
>>> AV2_SPARC_SPARC5. For those I've left the macro probes in place.
>>
>> Most likely people will try to run JDK 9 on Solaris 10, or in some kind
>> of VM environment which may not have Solaris 11.1 headers. We had a lot
>> of such cases before; that is why those macros were added.
>
> run or build? running should not be a problem. Building on S10 without a
> devkit has not worked for a while AFAIK.

Ooh yes, you are right - it was a build problem. Those macros were for 
the time when we did not use a devkit yet.

Everything is good then.

Thanks,
Vladimir

>
> David
>
>> "JDK 9 Platform Support" list only Solaris 11.x and 12.x. May be it is
>> fine, but the original code would cover more running cases.
>>
>> Sorry for rambling.
>>
>> Regards,
>> Vladimir
>>
>> On 10/18/16 7:01 PM, David Holmes wrote:
>>> Pushed.
>>>
>>> David
>>>
>>> On 11/10/2016 11:12 AM, David Holmes wrote:
>>>> Ok. I will sponsor this once hs is open again.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 6/10/2016 10:10 PM, Alan Burlison wrote:
>>>>> On 04/10/2016 19:37, Alan Burlison wrote:
>>>>>
>>>>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>>>>>>> already being included.
>>>>>>
>>>>>> Cool, I'll pop that in instead - thanks!
>>>>>
>>>>> Done, webrev updated, jprt hotspot testset is clean.
>>>>>

From thomas.stuefe at gmail.com  Thu Oct 20 07:22:36 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 20 Oct 2016 09:22:36 +0200
Subject: "os" - make this a real namespace?
In-Reply-To: <f83146ac-eece-4d24-4c80-b4a716021066@oracle.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<682766fe-c801-2617-fde2-77bab89fb0d3@oracle.com>
	<CAA-vtUzUSQy=um7GkDQX7DscP2T-ZoMRiiM0DQUyuH4rMiu_hA@mail.gmail.com>
	<4cd640e0-ee80-0447-1d8d-1f7c014729a9@oracle.com>
	<CAA-vtUxojOOyYRGgMX37q4UNTHJ9uxgOYBa7qYzhgQJdunWVpQ@mail.gmail.com>
	<f83146ac-eece-4d24-4c80-b4a716021066@oracle.com>
Message-ID: <CAA-vtUzV6pRpmzYoPJg5mdq9KNVLSZrs0HXr7Qo8xZ8BoqmjxQ@mail.gmail.com>

On Thu, Oct 20, 2016 at 1:59 AM, David Holmes <david.holmes at oracle.com>
wrote:

> On 19/10/2016 11:54 PM, Thomas Stüfe wrote:
>
>> Hi David,
>>
>> my intent was not to attack the existing code, but to ask about the
>> original design intentions and possibly come up with ideas to improve
>> it. See my further answers inline.
>>
>
> Sure - no problem. It is hard to understand the intent of the design when
> the remaining code doesn't even reflect the original design intentions (nor
> is there anyone around involved in that!). And there is always scope to
> redo this as part of the "big OS code cleanup".
>
> The "exposure" concern is not one that I've heard expressed previously in
> relation to the hotspot code.
>
> Cheers,
> David
>
>
When there is time, I may just whip up an example patch. This may be
simpler than talking. I understand this would be something for java10, so
it has to wait until there is at least a repo.
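
In the meantime, the shape of the idea can be sketched roughly as below. This is only an illustration under stated assumptions, not a patch against hotspot: the file names in the comments, the nested namespace "xxx" and the helper page_size_hint() are all hypothetical, and the two shared functions merely forward to the C library.

```cpp
#include <cstddef>
#include <cstdlib>

// "os.hpp" (hypothetical): shared, platform-independent declarations.
namespace os {
  void* malloc(std::size_t size);
  void  free(void* p);
}

// "os_xxx.hpp" (hypothetical): a namespace can be reopened in another
// header, so platform additions need not be #included into a class body.
namespace os {
  namespace xxx {
    int page_size_hint();  // hypothetical platform-only helper
  }
}

// Any other file can forward declare just the one function it needs,
// which is not possible for a static member function of a class "os".
namespace os { void* malloc(std::size_t size); }

// "os_xxx.cpp" (hypothetical): the definitions.
void* os::malloc(std::size_t size) { return std::malloc(size); }
void  os::free(void* p)            { std::free(p); }
int   os::xxx::page_size_hint()    { return 4096; }  // placeholder value
```

Shared code would include only the first header; only platform files would ever see the nested namespace, which is the separation of structure and exposure discussed in the quoted text.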

Kind Regards, Thomas


> On Wed, Oct 19, 2016 at 2:19 PM, David Holmes <david.holmes at oracle.com
>> <mailto:david.holmes at oracle.com>> wrote:
>>
>>     Hi Thomas,
>>
>>     On 19/10/2016 7:07 PM, Thomas Stüfe wrote:
>>
>>         Hi David!
>>
>>         On Wed, Oct 19, 2016 at 9:02 AM, David Holmes
>>         <david.holmes at oracle.com <mailto:david.holmes at oracle.com>
>>         <mailto:david.holmes at oracle.com
>>
>>         <mailto:david.holmes at oracle.com>>> wrote:
>>
>>             Hi Thomas,
>>
>>             On 19/10/2016 4:10 PM, Thomas Stüfe wrote:
>>
>>                 Hi all,
>>
>>                 a small question.
>>
>>                 I sometimes stumble over the fact that "os" is a class,
>>         not a
>>                 namespace.
>>
>>
>>             ?? AFAIK everything in hotspot is a class not a namespace -
>>         we don't
>>             use "namespaces".
>>
>>
>>         I meant that it is used as one would use a C++ namespace, not a
>>         class.
>>         As far as I see any class derived from AllStatic is actually a
>>         namespace
>>         in the sense that it serves as bracket for a number of related
>>         static
>>         functions.
>>
>>
>>     Okay ... if that is what you mean by a namespace ... I always
>>     thought namespaces were like packages, a level up from classes.
>>
>>
>> I always used namespace as a common scope for declarations which belong
>> together, be that classes, global functions, variables. I always thought
>> this is how class "os" is used in the hotspot.
>>
>> I also use C++ namespace to isolate coding in large projects I do not
>> own but where my symbols are, for technical reasons, visible anywhere
>> but I want to avoid name clashes. A typical porter headache. E.g. we
>> have a namespace "sap" in our coding, just to keep our stuff separate
>> from global symbols.
>>
>>
>>
>>                 And that we include the platform dependent additions
>>         into the
>>                 middle of this class.
>>
>>
>>             Build-time specialization. It allows for the os API to
>>         actually be
>>             different on different platforms, as opposed to just being
>>             implemented differently.
>>
>>
>>         The "os" API is a shared, platform independent API, so there
>>         should be
>>         no difference between platforms. There should be no need for it to
>>         export platform specifics - its whole intent is to hide those
>>         specifics.
>>
>>
>>     But that is your current design perspective of what the os API
>>     should be. What it is is something with a very long history and
>>     which has had to accommodate different things over time. At one
>>     point a lot of the JDK native code would call into the VM for
>>     functionality that is now directly implemented in JDK native code.
>>     There's a lot of historical baggage here.
>>
>>
>>         Looking into the various os_<os>.hpp files, I see:
>>         1) things where the declaration does differ between oses and
>>         therefore
>>         they cannot be called in shared code without #ifdefs (e.g. all
>> <os>
>>         subclasses).
>>         2)Things where the declaration is shared but the implementation
>>         differs.
>>         Again, two cases:
>>             a) Either implementation is not time critical. In that case
>> the
>>         declaration should live in os.hpp and the implementation should
>>         live in
>>         some platform dependent C++ file.
>>             b) Or implementation is time critical and must be inline, in
>>         which
>>         case a separate platform dependent header would be needed.
>>
>>         For (1) and (2b), C++ namespaces would be more convenient. Now,
>>         you are
>>         forced to include the platform specific file into the class os{}
>>         declaration, because there can just be one:
>>
>>         os.hpp
>>         class os {
>>         ...
>>         #include <os_xxx.hpp>
>>
>>         };
>>
>>         . With a namespace, you can add functions to the namespace in
>>         various
>>         disjunct places and hence could write:
>>
>>         os.hpp
>>         namespace os { ... functions ... }
>>
>>         os_xxx.hpp
>>         namespace os { ... functions ... }
>>
>>         which would be more natural and
>>
>>
>>     Yes I can see that as an alternative way to expand the os API.
>>     Though I still prefer to group functionality in a class.
>>
>>
>> I would argue that the advantage of namespaces here is that the special
>> handling of platform specific headers is not needed anymore. Now, when
>> reading any os_<os>/<cpu> header, I need to keep in mind that the
>> content of this header gets inserted into the middle of a class
>> definition. That is just rather exotic and unexpected and tripped me
>> over a few times already.
>>
>>
>>
>>                 This has a number of repercussions, like not being able to
>>                 include the
>>                 platform dependent files (os_<os>_<cpu>) directly, not
>>         being able to
>>
>>
>>             I'd call that a feature - they are not intended to be
>>         standalone APIs.
>>
>>
>> There are a number of useful "os" APIs which I would sometimes like to
>> use without the baggage of including the whole os.hpp header. For
>> instance, os::malloc(). Normally, I would forward declare them, but this
>> is not possible for class functions.
>>
>>
>>
>>
>>         Right now os_<os>.hpp exports the os::<os> api, which one may
>>         want to
>>         use separately from "os" because they expose platform dependent
>> APIs
>>         which are conceptually lower than the os namespace. At least
>>         that is
>>         how I always did interpret the intention behind os::<os>.
>>
>>
>>     I wouldn't say lower - they extend the os API with platform specific
>>     functionality and concepts. The idea is that <os> specific code that
>>     wants to use OS facilities that are specific to that <os> access
>>     them through the os::<os> class.
>>
>>
>>
>>         But actually, because the "Aix" class is part of "os", cannot be
>>         used
>>         separately and is exposed to the whole of the VM, I always avoided
>>         putting anything os::Aix if it could be helped. Hence, for AIX,
>>         we added
>>         porting_aix.hpp for AIX specific functions which are not to be
>> used
>>         outside os/aix/vm. Or mostly just plain left functions to be
>>         file scope
>>         static inside os_aix.cpp. So, os::Aix was pretty useless for me
>>         as a porter.
>>
>>
>>     Seems you made a decision that the os::AIX class didn't meet your
>>     ideas as to how platform specifics should be handled and so went
>>     with an alternative design. Wouldn't that make it "useless" because
>>     you chose not to use it?
>>
>>
>> When the AIX port started, the os interface was not well documented. Nor
>> was there anyone I could ask because OpenJDK did not yet exist. So I had
>> to deduce the intent of the original authors from the code and try to
>> fill it with life as best as possible.
>>
>>
>>
>>                 forward declare functions from the "os" namespace (e.g.
>>                 os::malloc) etc. I
>>                 also cannot split implementations from "os" functions to
>>         different
>>                 implementation files without problems.
>>
>>                 It seems to me all compilers nowadays support namespaces,
>>         would
>>                 it not make sense to convert "os" to a real namespace?
>>
>>
>>             Not being a C++ aficionado I'm not sure exactly what that
>> would
>>             entail - as far as I know we don't use C++ namespaces
>>         anywhere in
>>             hotspot.
>>
>>
>>         We start using all kinds of modern C++ features. Templates pop
>>         up all
>>         over the place. Namespaces in contrast are an old and easily
>>         understood
>>         feature. We already use it inside our own port.
>>
>>
>>                 While we are at it, what is the reason for the "<os>" sub
>>                 classes? e.g.
>>                 os::Bsd, os::Aix etc? It makes integrating patches
>>         between platforms
>>                 difficult and, to me, does not seem to serve any clear
>>         purpose.
>>
>>
>>             Must admit this arrangement has also had me confused at
>> times. I
>>             think it is a way to add a per-OS helper class for the main os
>> API
>>             implementation.
>>
>>                 If the purpose is to be a very low wrapper around OS
>>                 particularities, it
>>                 makes no sense to have them in the "os" namespace and to
>>         make
>>                 them visible
>>                 to the shared sections of the VM. E.g. there should be
>>         no reason
>>                 to access
>>                 "os::Bsd" functions from outside os/bsd/vm, or to access
>>         "os::Posix"
>>                 functions outside implementations specific for Posix
>>         platforms.
>>
>>
>>             Not sure how you make, for example os::BSD accessible from all
>>             classes in os/bsd/vm yet not be visible anywhere else ??
>>
>>
>>         I think there is no real reason for os::Bsd to exist at all.
>>         Either we
>>         have shared functions with platform dependent implementation,
>>         then they
>>         should be declared in "os". Or they are completely platform
>>         specific,
>>         then they can be moved to a platform specific header outside of
>> "os"
>>         like we did with porting_aix.hpp.
>>
>>
>>     Sure you could do that. But 20 years ago that wasn't how things were
>>     designed and we have what we have today. As I said a lot of baggage.
>>
>>     Personally I find the nesting of the concrete os API quite natural:
>>     os::win32 to me is better than unrelated os and win32 classes or
>>     namespaces.
>>
>>
>> I think there is code structure, and then there is exposure. Both are
>> separate things but currently interwoven in the hotspot.
>>
>> Putting Win32 specifics into os::win32 makes sense. But there is no
>> reason to expose os::win32 to shared parts of the VM, considering how
>> much the hotspot programmers at Oracle dislike platform specific #ifdefs
>> in shared code.
>>
>> I understand that this is historical? Before namespaces, you had to
>> define a class as common bracket, and a class definition must be
>> complete to be valid, so os::<OS> is automatically visible everywhere.
>> But with namespaces this could be disentangled. We could have a globally
>> visible "os" namespace with platform independent shared functions, as
>> well as an "os::win32" namespace which is only visible in windows
>> specific implementation files.
>>
>> Kind Regards, Thomas
>>
>>
>>     Cheers,
>>     David
>>     -----
>>
>>
>>          Plus it also needs to potentially be visible from
>>         os_cpu/bsd_XXX/vm.
>>
>>
>>             There is a lot of cleanup in this area slated for the future -
>>             hopefully Java 10. POSIX refactoring etc.
>>
>>
>>         Sure!
>>
>>         Kind Regards, Thomas
>>
>>
>>             Cheers,
>>             David
>>
>>
>>                 Thanks, and Kind Regards, Thomas
>>
>>
>>
>>

From thomas.stuefe at gmail.com  Thu Oct 20 08:00:04 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 20 Oct 2016 10:00:04 +0200
Subject: Question about WatcherThreadCrashProtection
Message-ID: <CAA-vtUz1iu1Tb5djK5X3dQf4Emz9oJpk3bAFEGmcjvZNQ4WpEQ@mail.gmail.com>

Hi all,

a small question.

WatcherThreadCrashProtection is a small stack object wrapping
setjmp/longjmp. But I cannot find any place where
WatcherThreadCrashProtection is actually used. Am I overlooking something
or is this dead code?

Thank you,

Thomas
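
For readers unfamiliar with the pattern, here is a minimal sketch of a
small stack object wrapping setjmp/longjmp. It is not the HotSpot class:
the names (demo::CrashProtection, bail_out, store_answer, always_fails)
are invented for illustration, and the explicit bail_out() call stands in
for whatever would trigger the jump in a real VM.

```cpp
#include <cassert>
#include <csetjmp>

namespace demo {

typedef void (*Callback)(void* arg);

// Hypothetical guard object; lives on the caller's stack.
class CrashProtection {
 public:
  // Runs cb(arg). Returns true if cb finished normally, false if it was
  // abandoned through bail_out().
  bool call(Callback cb, void* arg) {
    CrashProtection* const outer = _active;  // remember any enclosing guard
    _active = this;
    bool completed = false;                  // only written after cb returns
    if (setjmp(_jmpbuf) == 0) {
      cb(arg);
      completed = true;
    }
    _active = outer;
    return completed;
  }

  // Jumps back into the innermost active call(), if there is one.
  static void bail_out() {
    if (_active != nullptr) {
      std::longjmp(_active->_jmpbuf, 1);
    }
  }

 private:
  std::jmp_buf _jmpbuf;
  static CrashProtection* _active;
};

CrashProtection* CrashProtection::_active = nullptr;

// Two trivial bodies used to exercise the guard.
inline void store_answer(void* arg) { *static_cast<int*>(arg) = 42; }
inline void always_fails(void*)     { CrashProtection::bail_out(); }

}  // namespace demo
```

A caller creates the guard on its stack and checks the result of call();
a false return means the body was abandoned via longjmp. Jumping over
frames that hold objects with non-trivial destructors is undefined
behaviour in C++, which is why the bodies here are kept trivial.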

From staffan.larsen at oracle.com  Thu Oct 20 08:04:23 2016
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Thu, 20 Oct 2016 10:04:23 +0200
Subject: Question about WatcherThreadCrashProtection
In-Reply-To: <CAA-vtUz1iu1Tb5djK5X3dQf4Emz9oJpk3bAFEGmcjvZNQ4WpEQ@mail.gmail.com>
References: <CAA-vtUz1iu1Tb5djK5X3dQf4Emz9oJpk3bAFEGmcjvZNQ4WpEQ@mail.gmail.com>
Message-ID: <2C8F7BF2-D608-4C5D-87ED-C8B7C6780651@oracle.com>

It is used in some closed code (JFR) that you aren't seeing. ;)

/Staffan

> On 20 Oct 2016, at 10:00, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> 
> Hi all,
> 
> a small question.
> 
> WatcherThreadCrashProtection is a small stack object wrapping
> setjmp/longjmp. But I cannot find any place where
> WatcherThreadCrashProtection is actually used. Am I overlooking something
> or is this dead code?
> 
> Thank you,
> 
> Thomas


From thomas.stuefe at gmail.com  Thu Oct 20 08:07:41 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 20 Oct 2016 10:07:41 +0200
Subject: Question about WatcherThreadCrashProtection
In-Reply-To: <2C8F7BF2-D608-4C5D-87ED-C8B7C6780651@oracle.com>
References: <CAA-vtUz1iu1Tb5djK5X3dQf4Emz9oJpk3bAFEGmcjvZNQ4WpEQ@mail.gmail.com>
	<2C8F7BF2-D608-4C5D-87ED-C8B7C6780651@oracle.com>
Message-ID: <CAA-vtUwNRTeeG96LOjZwhhZYmaCK=vobDj-z0Z1bYexGxUvCvA@mail.gmail.com>

:) Ok, thank you!

On Thu, Oct 20, 2016 at 10:04 AM, Staffan Larsen <staffan.larsen at oracle.com>
wrote:

> It is used in some closed code (JFR) that you aren't seeing. ;)
>
> /Staffan
>
> > On 20 Oct 2016, at 10:00, Thomas Stüfe <thomas.stuefe at gmail.com> wrote:
> >
> > Hi all,
> >
> > a small question.
> >
> > WatcherThreadCrashProtection is a small stack object wrapping
> > setjmp/longjmp. But I cannot find any place where
> > WatcherThreadCrashProtection is actually used. Am I overlooking something
> > or is this dead code?
> >
> > Thank you,
> >
> > Thomas
>
>

From rickard.backman at oracle.com  Thu Oct 20 08:27:03 2016
From: rickard.backman at oracle.com (Rickard =?iso-8859-1?Q?B=E4ckman?=)
Date: Thu, 20 Oct 2016 10:27:03 +0200
Subject: "os" - make this a real namespace?
In-Reply-To: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
Message-ID: <20161020082703.GA29006@rbackman>

Hi Thomas,

I tried something like that a couple of years ago and still think it is
a good idea.

Link to the discussion and patches:

http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html

/R

On 10/19, Thomas Stüfe wrote:
> Hi all,
> 
> a small question.
> 
> I sometimes stumble over the fact that "os" is a class, not a namespace.
> And that we include the platform dependent additions into the middle of
> this class.
> 
> This has a number of repercussions, like not being able to include the
> platform dependent files (os_<os>_<cpu>) directly, not being able to
> forward declare functions from the "os" namespace (e.g. os::malloc) etc. I
> also cannot split implementations from "os" functions to different
> implementation files without problems.
> 
> It seems to me all compilers nowadays support namespaces, would it not make
> sense to convert "os" to a real namespace?
> 
> While we are at it, what is the reason for the "<os>" sub classes? e.g.
> os::Bsd, os::Aix etc? It makes integrating patches between platforms
> difficult and, to me, does not seem to serve any clear purpose.
> 
> If the purpose is to be a very low wrapper around OS particularities, it
> makes no sense to have them in the "os" namespace and to make them visible
> to the shared sections of the VM. E.g. there should be no reason to access
> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
> functions outside implementations specific for Posix platforms.
> 
> Thanks, and Kind Regards, Thomas
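
To make the forward-declaration point concrete, here is a minimal sketch.
All names (demo_os, alloc_bytes, posix::page_size, DemoOsClass) are
invented stand-ins, not HotSpot code, and the fixed page size is just a
placeholder value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// A namespace can be reopened anywhere, so a file that needs only one
// function can forward declare it without pulling in a large header.
namespace demo_os {
  void* alloc_bytes(std::size_t size);   // stands in for os::malloc
}

// Platform-specific parts can sit in a nested namespace whose declaration
// lives in a header that only platform files include.
namespace demo_os {
  namespace posix {
    int page_size();
  }
}

// The definitions may live in different implementation files.
void* demo_os::alloc_bytes(std::size_t size) { return std::malloc(size); }
int   demo_os::posix::page_size()            { return 4096; }  // placeholder

// A class, by contrast, must be complete where it is defined: every member,
// including nested platform classes, has to appear inside the braces.
class DemoOsClass {
 public:
  static void* alloc_bytes(std::size_t size);
};
// Outside the class this would be ill-formed, because members cannot be
// declared (only defined) outside their class:
//   static void* DemoOsClass::alloc_bytes(std::size_t size);
void* DemoOsClass::alloc_bytes(std::size_t size) { return std::malloc(size); }
```

With the namespace form, shared code never has to see demo_os::posix at
all, which is the separation of structure and exposure discussed in this
thread.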

From martin.doerr at sap.com  Thu Oct 20 08:58:24 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 20 Oct 2016 08:58:24 +0000
Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after
	8154580 and 8154867
In-Reply-To: <7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com>
References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>
	<7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com>
Message-ID: <f43fe5c7b0f442719f27310e9e7cf710@DEWDFE13DE14.global.corp.sap>

Hi Coleen,

thank you very much for reviewing my PPC change.

We had originally spent a lot of effort to get the template interpreter fast. I think startup performance is still important.
A large amount of less optimized changes will make it slower over time.
That's why we have reduced reloading constMethod in the PPC implementation. I think this would be good for other platforms as well.
Maybe we should improve them in 10.

Best regards,
Martin


-----Original Message-----
From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen Phillimore
Sent: Dienstag, 18. Oktober 2016 23:56
To: hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867


This seems good.   I think it's a shame to change load_mirror() to 
load_mirror_from_const_method() though because there's load_mirror() 
with the same parameters on all the other platforms and it makes 
platform development a little easier.   But that's up to you to because 
you can generate shorter sequences.

Coleen


On 10/17/16 12:38 PM, Doerr, Martin wrote:
> Hi,
>
> I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9:
>
> 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed).
>
> 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev.
>
> Webrev is here:
> http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/
>
> Please review.
>
> Thanks and best regards,
> Martin
>


From david.holmes at oracle.com  Thu Oct 20 11:37:03 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 20 Oct 2016 21:37:03 +1000
Subject: "os" - make this a real namespace?
In-Reply-To: <20161020082703.GA29006@rbackman>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<20161020082703.GA29006@rbackman>
Message-ID: <2e672d38-14e9-f5d3-9a26-0e4839ae98a4@oracle.com>

On 20/10/2016 6:27 PM, Rickard Bäckman wrote:
> Hi Thomas,
>
> I tried something like that a couple of years ago and still think it is
> a good idea.
>
> Link to the discussion and patches:
>
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html

Yeah but no one else seemed to like your os::pd approach :)

Cheers,
David

> /R
>
> On 10/19, Thomas Stüfe wrote:
>> Hi all,
>>
>> a small question.
>>
>> I sometimes stumble over the fact that "os" is a class, not a namespace.
>> And that we include the platform dependent additions into the middle of
>> this class.
>>
>> This has a number of repercussions, like not being able to include the
>> platform dependent files (os_<os>_<cpu>) directly, not being able to
>> forward declare functions from the "os" namespace (e.g. os::malloc) etc. I
>> also cannot split implementations from "os" functions to different
>> implementation files without problems.
>>
>> It seems to me all compilers nowadays support namespaces, would it not make
>> sense to convert "os" to a real namespace?
>>
>> While we are at it, what is the reason for the "<os>" sub classes? e.g.
>> os::Bsd, os::Aix etc? It makes integrating patches between platforms
>> difficult and, to me, does not seem to serve any clear purpose.
>>
>> If the purpose is to be a very low wrapper around OS particularities, it
>> makes no sense to have them in the "os" namespace and to make them visible
>> to the shared sections of the VM. E.g. there should be no reason to access
>> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
>> functions outside implementations specific for Posix platforms.
>>
>> Thanks, and Kind Regards, Thomas

From david.holmes at oracle.com  Thu Oct 20 13:23:31 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 20 Oct 2016 23:23:31 +1000
Subject: "os" - make this a real namespace?
In-Reply-To: <2e672d38-14e9-f5d3-9a26-0e4839ae98a4@oracle.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<20161020082703.GA29006@rbackman>
	<2e672d38-14e9-f5d3-9a26-0e4839ae98a4@oracle.com>
Message-ID: <41dfc585-c135-ecfe-5ced-81de653a166c@oracle.com>

On 20/10/2016 9:37 PM, David Holmes wrote:
> On 20/10/2016 6:27 PM, Rickard Bäckman wrote:
>> Hi Thomas,
>>
>> I tried something like that a couple of years ago and still think it is
>> a good idea.
>>
>> Link to the discussion and patches:
>>
>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html
>
> Yeah but no one else seemed to like your os::pd approach :)

Sorry that was a bit too tongue in cheek.

David

> Cheers,
> David
>
>> /R
>>
>> On 10/19, Thomas Stüfe wrote:
>>> Hi all,
>>>
>>> a small question.
>>>
>>> I sometimes stumble over the fact that "os" is a class, not a namespace.
>>> And that we include the platform dependent additions into the middle of
>>> this class.
>>>
>>> This has a number of repercussions, like not being able to include the
>>> platform dependent files (os_<os>_<cpu>) directly, not being able to
>>> forward declare functions from the "os" namespace (e.g. os::malloc)
>>> etc. I
>>> also cannot split implementations from "os" functions to different
>>> implementation files without problems.
>>>
>>> It seems to me all compilers nowadays support namespaces, would it not
>>> make
>>> sense to convert "os" to a real namespace?
>>>
>>> While we are at it, what is the reason for the "<os>" sub classes? e.g.
>>> os::Bsd, os::Aix etc? It makes integrating patches between platforms
>>> difficult and, to me, does not seem to serve any clear purpose.
>>>
>>> If the purpose is to be a very low wrapper around OS particularities, it
>>> makes no sense to have them in the "os" namespace and to make them
>>> visible
>>> to the shared sections of the VM. E.g. there should be no reason to
>>> access
>>> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
>>> functions outside implementations specific for Posix platforms.
>>>
>>> Thanks, and Kind Regards, Thomas

From thomas.stuefe at gmail.com  Thu Oct 20 13:36:27 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 20 Oct 2016 15:36:27 +0200
Subject: "os" - make this a real namespace?
In-Reply-To: <20161020082703.GA29006@rbackman>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<20161020082703.GA29006@rbackman>
Message-ID: <CAA-vtUx76=7TqLU1yUqx=9Y-d+xKVbdYApuQdu5Pf+h0f3qDXw@mail.gmail.com>

Hi Rickard,

I definitely like some of the aspects of that patch. But like others I'm
not a big fan of renaming the files - I like the current naming scheme
<os>_<cpu> just fine, I am used to it and it helps me in many places. I
work both in IDEs (CDT) and on the command line with vi and grep, and
having the platform in the file name makes it easier to work with
multiple platforms. I am also quite sure that having different versions of
a file with the same name in some locations would bite us at some places.

Kind Regards, Thomas


On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman <
rickard.backman at oracle.com> wrote:

> Hi Thomas,
>
> I tried something like that a couple of years ago and still think it is
> a good idea.
>
> Link to the discussion and patches:
>
> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html
>
> /R
>
> On 10/19, Thomas Stüfe wrote:
> > Hi all,
> >
> > a small question.
> >
> > I sometimes stumble over the fact that "os" is a class, not a namespace.
> > And that we include the platform dependent additions into the middle of
> > this class.
> >
> > This has a number of repercussions, like not being able to include the
> > platform dependent files (os_<os>_<cpu>) directly, not being able to
> > forward declare functions from the "os" namespace (e.g. os::malloc) etc.
> I
> > also cannot split implementations from "os" functions to different
> > implementation files without problems.
> >
> > It seems to me all compilers nowadays support namespaces, would it not
> make
> > sense to convert "os" to a real namespace?
> >
> > While we are at it, what is the reason for the "<os>" sub classes? e.g.
> > os::Bsd, os::Aix etc? It makes integrating patches between platforms
> > difficult and, to me, does not seem to serve any clear purpose.
> >
> > If the purpose is to be a very low wrapper around OS particularities, it
> > makes no sense to have them in the "os" namespace and to make them
> visible
> > to the shared sections of the VM. E.g. there should be no reason to
> access
> > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
> > functions outside implementations specific for Posix platforms.
> >
> > Thanks, and Kind Regards, Thomas
>

From daniel.daugherty at oracle.com  Thu Oct 20 14:22:29 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Thu, 20 Oct 2016 08:22:29 -0600
Subject: RFR: JDK-8157141 & JDK-8166454: Solaris getisax(2) and meminfo(2)
	cleanup
In-Reply-To: <db7600a8-28ea-022d-a572-fe9d26062198@oracle.com>
References: <2eb28814-45d1-2fd9-7042-d21483588ba7@oracle.com>
	<66d31946-9d43-de44-ec9c-27de6e41f711@oracle.com>
	<c9b11bc5-1018-f0b9-a56c-3d04f2d6d8f1@oracle.com>
	<5F6D7173-A858-4DCB-BAD0-33CB6A4B9238@oracle.com>
	<7fa8ac4a-95d1-c8f0-03be-6693970a0e44@oracle.com>
	<3BB57F37-5502-4437-9AB1-9B3A53CD185E@oracle.com>
	<e760533d-a901-d2f6-5c0f-f0d0c09b681f@oracle.com>
	<CA910A07-45B4-4F14-BA69-59E0775EE37D@oracle.com>
	<a9908963-8d3a-5e63-d614-b87046ed6008@oracle.com>
	<340ae279-7833-a5d3-8653-1016fad830c6@oracle.com>
	<9d250f63-9626-97c1-a401-e433b88891e5@oracle.com>
	<E73A3D9C-D0F7-4792-8195-9704588B20F8@oracle.com>
	<ecbb5968-dabb-548c-4a9e-3d1c37ebe030@oracle.com>
	<8982da51-28b3-2f19-7957-2e96d0646fea@oracle.com>
	<2220239d-eab5-ebbc-86e9-fe313fa62aea@oracle.com>
	<5fe250a9-c611-37e8-89d2-0142367a27c1@oracle.com>
	<9619324d-5360-b3f0-684a-e7f1069656db@oracle.com>
	<f469e56a-6db0-a019-3022-c9861a31f9ce@oracle.com>
	<db7600a8-28ea-022d-a572-fe9d26062198@oracle.com>
Message-ID: <9f2ac85c-8373-4f7f-35f7-1872bccf2cc0@oracle.com>

On 10/19/16 8:56 PM, Vladimir Kozlov wrote:
> On 10/19/16 5:17 PM, David Holmes wrote:
>> On 20/10/2016 3:43 AM, Vladimir Kozlov wrote:
>>> I missed all this review fun :)
>>>
>>> Thank you, Alan, for cleaning this up.
>>>
>>> The only concern I have is removal of conditional macros.
>>>
>>>> I've also taken the opportunity to strip out most of the 
>>>> '#ifndef(FOO)'
>>>> probes for the HW capability bit macros in 
>>>> vm_version_solaris_sparc.cpp.
>>>> They are now redundant as the macros are in the system header 
>>>> files
>>>> from Solaris 11.1 onwards. The only ones that aren't are T7/M7 related
>>>> ones (from Solaris 11.3 onwards), namely AV_SPARC_FMAF and
>>>> AV2_SPARC_SPARC5. For those I've left the macro probes in place.
>>>
>>> Most likely people will try to run JDK 9 on Solaris 10, or in some kind
>>> of VM environment which may not have Solaris 11.1 headers. We had a lot
>>> of such cases before, which is why those macros were added.
>>
>> run or build? running should not be a problem. Building on S10 without a
>> devkit has not worked for a while AFAIK.
>
> Ooh yes, you are right - it was a build problem. Those macros were for
> the time when we did not use a devkit yet.

Just a clarification. You cannot build on S10 anymore either with or
without a devkit. This is why I had to migrate my big Solaris X64
server from Solaris 10u11 -> Solaris 11.2 SRU5.5.

Dan
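
For context, the '#ifndef' probes mentioned in the quoted text follow a
common pattern: supply a fallback definition only when the system headers
on the build machine are too old to provide the macro. A minimal sketch,
using made-up macro names and bit values rather than the real AV_SPARC_*
definitions:

```cpp
#include <cassert>

// Pretend this one came from an older system header.
#define DEMO_HWCAP_OLD_BIT 0x0001u

// Probe: define the newer capability bit only if the system header did
// not already do so. The value is hypothetical.
#ifndef DEMO_HWCAP_NEW_BIT
#define DEMO_HWCAP_NEW_BIT 0x0100u
#endif

// Code that tests the bit now compiles against old and new headers alike.
inline bool demo_has_new_bit(unsigned int hwcap_word) {
  return (hwcap_word & DEMO_HWCAP_NEW_BIT) != 0;
}
```

Once every supported build environment ships headers that define the
macro, the probe is dead weight and can be removed, which is the cleanup
being reviewed here.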


>
> Everything is good then.
>
> Thanks,
> Vladimir
>
>>
>> David
>>
>>> "JDK 9 Platform Support" lists only Solaris 11.x and 12.x. Maybe it is
>>> fine, but the original code would cover more running cases.
>>>
>>> Sorry for rambling.
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 10/18/16 7:01 PM, David Holmes wrote:
>>>> Pushed.
>>>>
>>>> David
>>>>
>>>> On 11/10/2016 11:12 AM, David Holmes wrote:
>>>>> Ok. I will sponsor this once hs is open again.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 6/10/2016 10:10 PM, Alan Burlison wrote:
>>>>>> On 04/10/2016 19:37, Alan Burlison wrote:
>>>>>>
>>>>>>>> It's in globalDefinitions.hpp, on the off chance that's somehow not
>>>>>>>> already being included.
>>>>>>>
>>>>>>> Cool, I'll pop that in instead - thanks!
>>>>>>
>>>>>> Done, webrev updated, jprt hotspot testset is clean.
>>>>>>


From chris.plummer at oracle.com  Thu Oct 20 20:28:09 2016
From: chris.plummer at oracle.com (Chris Plummer)
Date: Thu, 20 Oct 2016 13:28:09 -0700
Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame name
	with <no Java callstack recorded> starting with Java 9 b133
Message-ID: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>

Hello,

Please review the following:

http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
https://bugs.openjdk.java.net/browse/JDK-8166679

The fix is to partially undo the changes for JDK-8159284. There are two 
places where the fix for JDK-8159284 added an extra check of the 
validity of the entry frame, but really only the first one is 
appropriate since for the second one we are not in an entry frame. More 
details can be found near the end of the bug comments.

Note I did a straight patch of the old version of the code. It could 
probably use some formatting and comment cleanup. I decided not to clean 
it up to make it easy to compare the current code with the original. 
I'll clean it up if you feel it would be best to.

Tested by running KitchenSink more times than I can count, since that's 
where JDK-8159284 turned up. However, that's not proving much since I 
could not reproduce JDK-8159284 even without its fix in place (it also 
couldn't be reproduced at the time JDK-8159284 was being 
investigated and fixed). For this reason I can't be 100% sure that 
JDK-8159284 is not being re-introduced with my changes.

Also tested by running a very large set of tests through RBT, close to 
what we do for PIT testing, minus product builds and a few tests that 
take a long time to run.

Lastly, I also tested with the test case in the CR to make sure it now 
passes. Unfortunately it's not possible to add the test case as a jtreg 
test since it requires the installation of the Oracle Studio tools.

thanks,

Chris

From david.holmes at oracle.com  Fri Oct 21 03:09:17 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 21 Oct 2016 13:09:17 +1000
Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame
	name with <no Java callstack recorded> starting with Java 9 b133
In-Reply-To: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
Message-ID: <8e77681f-023a-5725-6361-c357edcdd19b@oracle.com>

Hi Chris,

On 21/10/2016 6:28 AM, Chris Plummer wrote:
> Hello,
>
> Please review the following:
>
> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
> https://bugs.openjdk.java.net/browse/JDK-8166679
>
> The fix is to partially undo the changes for JDK-8159284. There are two
> places where the fix for JDK-8159284 added an extra check of the
> validity of the entry frame, but really only the first one is
> appropriate since for the second one we are not in an entry frame. More
> details can be found near the end of the bug comments.

This all seems reasonable. Addressing the regression is important. If 
this exposes a continuing issue with the reverted code then we can look 
at this further. The lack of reproducibility makes this a difficult area 
to work in.

Thanks,
David

> Note I did a straight patch of the old version of the code. It could
> probably use some formatting and comment cleanup. I decided not to clean
> it up to make it easy to compare the current code with the original.
> I'll clean it up if you feel it would be best to.
>
> Tested by running KitchenSink more times than I can count, since that's
> where JDK-8159284 turned up. However, that's not proving much since I
> could not reproduce JDK-8159284 even without its fix in place (it also
> couldn't be reproduced at the time JDK-8159284 was being
> investigated and fixed). For this reason I can't be 100% sure that
> JDK-8159284 is not being re-introduced with my changes.
>
> Also tested by running a very large set of tests through RBT, close to
> what we do for PIT testing, minus product builds and a few tests that
> take a long time to run.
>
> Lastly, I also tested with the test case in the CR to make sure it now
> passes. Unfortunately it's not possible to add the test case as a jtreg
> test since it requires the installation of the Oracle Studio tools.
>
> thanks,
>
> Chris

From rickard.backman at oracle.com  Fri Oct 21 06:02:28 2016
From: rickard.backman at oracle.com (Rickard =?iso-8859-1?Q?B=E4ckman?=)
Date: Fri, 21 Oct 2016 08:02:28 +0200
Subject: "os" - make this a real namespace?
In-Reply-To: <CAA-vtUx76=7TqLU1yUqx=9Y-d+xKVbdYApuQdu5Pf+h0f3qDXw@mail.gmail.com>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<20161020082703.GA29006@rbackman>
	<CAA-vtUx76=7TqLU1yUqx=9Y-d+xKVbdYApuQdu5Pf+h0f3qDXw@mail.gmail.com>
Message-ID: <20161021060228.GB29006@rbackman>

Yes the naming was just one try. There were multiple other ways of doing
it. Other possibilities were keeping it as is, have one file named
os_thread.hpp per platform that includes the os_thread_x86.hpp and just
have the #include "os_thread.hpp" in files that need it... Macros
*shudder*.

/R

On 10/20, Thomas Stüfe wrote:
> Hi Rickard,
> 
> I definitely like some of the aspects of that patch. But like others I'm
> not a big fan of renaming the files - I like the current naming scheme
> <os>_<cpu> just fine, I am used to it and it helps me in many places. I
> work both in IDEs (CDT) and on the command line with vi and grep, and
> having the platform in the file name makes it easier to work with
> multiple platforms. I am also quite sure that having different versions of
> a file with the same name in some locations would bite us at some places.
> 
> Kind Regards, Thomas
> 
> 
> On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman <
> rickard.backman at oracle.com> wrote:
> 
> > Hi Thomas,
> >
> > I tried something like that a couple of years ago and still think it is
> > a good idea.
> >
> > Link to the discussion and patches:
> >
> > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html
> >
> > /R
> >
> > On 10/19, Thomas Stüfe wrote:
> > > Hi all,
> > >
> > > a small question.
> > >
> > > I sometimes stumble over the fact that "os" is a class, not a namespace.
> > > And that we include the platform dependent additions into the middle of
> > > this class.
> > >
> > > This has a number of repercussions, like not being able to include the
> > > platform dependent files (os_<os>_<cpu>) directly, not being able to
> > > forward declare functions from the "os" namespace (e.g. os::malloc) etc.
> > I
> > > also cannot split implementations from "os" functions to different
> > > implementation files without problems.
> > >
> > > It seems to me all compilers nowadays support namespaces, would it not
> > make
> > > sense to convert "os" to a real namespace?
> > >
> > > While we are at it, what is the reason for the "<os>" sub classes? e.g.
> > > os::Bsd, os::Aix etc? It makes integrating patches between platforms
> > > difficult and, to me, does not seem to serve any clear purpose.
> > >
> > > If the purpose is to be a very low wrapper around OS particularities, it
> > > makes no sense to have them in the "os" namespace and to make them
> > visible
> > > to the shared sections of the VM. E.g. there should be no reason to
> > access
> > > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
> > > functions outside implementations specific for Posix platforms.
> > >
> > > Thanks, and Kind Regards, Thomas
> >

From david.holmes at oracle.com  Fri Oct 21 06:12:29 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 21 Oct 2016 16:12:29 +1000
Subject: "os" - make this a real namespace?
In-Reply-To: <20161021060228.GB29006@rbackman>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<20161020082703.GA29006@rbackman>
	<CAA-vtUx76=7TqLU1yUqx=9Y-d+xKVbdYApuQdu5Pf+h0f3qDXw@mail.gmail.com>
	<20161021060228.GB29006@rbackman>
Message-ID: <e2eaf8f1-f8ab-2696-8667-148f31364c3a@oracle.com>

On 21/10/2016 4:02 PM, Rickard Bäckman wrote:
> Yes the naming was just one try. There were multiple other ways of doing
> it. Other possibilities were keeping it as is, have one file named
> os_thread.hpp per platform that includes the os_thread_x86.hpp and just
> have the #include "os_thread.hpp" in files that need it... Macros
> *shudder*.

Note that we have already abstracted platform specific includes into 
macros. eg:

#include OS_CPU_HEADER(os)
#include OS_HEADER(os)

in os.hpp.
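
For readers who have not seen this style of include, here is a minimal
sketch of how such header-selecting macros can be built from token pasting
and stringification. The DEMO_* names and the hard-coded suffixes are
assumptions made for illustration; a real build would normally pass the
suffixes in as -D flags, and the actual HotSpot definitions may differ.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-ins for suffixes supplied by the build system.
#define DEMO_SUFFIX_OS     _linux
#define DEMO_SUFFIX_OS_CPU _linux_x86

// Two-level helpers so that arguments are macro-expanded before use.
#define DEMO_STR(x)          #x
#define DEMO_XSTR(x)         DEMO_STR(x)
#define DEMO_PASTE_AUX(a, b) a ## b
#define DEMO_PASTE(a, b)     DEMO_PASTE_AUX(a, b)

// Build "<stem><suffix>.hpp" as a string literal usable after #include.
#define DEMO_OS_HEADER(stem) \
  DEMO_XSTR(DEMO_PASTE(stem, DEMO_SUFFIX_OS).hpp)
#define DEMO_OS_CPU_HEADER(stem) \
  DEMO_XSTR(DEMO_PASTE(stem, DEMO_SUFFIX_OS_CPU).hpp)

// In real code one would write:  #include DEMO_OS_HEADER(os)
// Here the generated names are simply returned so they can be inspected.
inline const char* demo_os_header()     { return DEMO_OS_HEADER(os); }
inline const char* demo_os_cpu_header() { return DEMO_OS_CPU_HEADER(thread); }
```

The shared file then carries a single #include line per platform header
instead of a cascade of #ifdef blocks, while the per-platform file names
stay unchanged.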

David

> /R
>
> On 10/20, Thomas Stüfe wrote:
>> Hi Rickard,
>>
>> I definitely like some of the aspects of that patch. But like others I'm
>> not a big fan of renaming the files - I like the current naming scheme
>> <os>_<cpu> just fine, I am used to it and it helps me in many places. I
>> work both in IDEs (CDT) and on the command line with vi and grep, and
>> having the platform in the file name makes it easier to work with
>> multiple platforms. I am also quite sure that having different versions of
>> a file with the same name in some locations would bite us at some places.
>>
>> Kind Regards, Thomas
>>
>>
>> On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman <
>> rickard.backman at oracle.com> wrote:
>>
>>> Hi Thomas,
>>>
>>> I tried something like that a couple of years ago and still think it is
>>> a good idea.
>>>
>>> Link to the discussion and patches:
>>>
>>> http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html
>>>
>>> /R
>>>
>>> On 10/19, Thomas Stüfe wrote:
>>>> Hi all,
>>>>
>>>> a small question.
>>>>
>>>> I sometimes stumble over the fact that "os" is a class, not a namespace.
>>>> And that we include the platform dependent additions into the middle of
>>>> this class.
>>>>
>>>> This has a number of repercussions, like not being able to include the
>>>> platform dependent files (os_<os>_<cpu>) directly, not being able to
>>>> forward declare functions from the "os" namespace (e.g. os::malloc) etc.
>>> I
>>>> also cannot split implementations from "os" functions to different
>>>> implementation files without problems.
>>>>
>>>> It seems to me all compilers nowadays support namespaces, would it not
>>> make
>>>> sense to convert "os" to a real namespace?
>>>>
>>>> While we are at it, what is the reason for the "<os>" sub classes? e.g.
>>>> os::Bsd, os::Aix etc? It makes integrating patches between platforms
>>>> difficult and, to me, does not seem to serve any clear purpose.
>>>>
>>>> If the purpose is to be a very low wrapper around OS particularities, it
>>>> makes no sense to have them in the "os" namespace and to make them
>>> visible
>>>> to the shared sections of the VM. E.g. there should be no reason to
>>> access
>>>> "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
>>>> functions outside implementations specific for Posix platforms.
>>>>
>>>> Thanks, and Kind Regards, Thomas
>>>

From goetz.lindenmaier at sap.com  Fri Oct 21 06:29:13 2016
From: goetz.lindenmaier at sap.com (Lindenmaier, Goetz)
Date: Fri, 21 Oct 2016 06:29:13 +0000
Subject: "os" - make this a real namespace?
In-Reply-To: <20161021060228.GB29006@rbackman>
References: <CAA-vtUx4wB0KtAjDJg0ZMjk6CPjw69wvxuhhTTxMSbSXA2vb+Q@mail.gmail.com>
	<20161020082703.GA29006@rbackman>
	<CAA-vtUx76=7TqLU1yUqx=9Y-d+xKVbdYApuQdu5Pf+h0f3qDXw@mail.gmail.com>
	<20161021060228.GB29006@rbackman>
Message-ID: <bd5d39d3925343a1829b68bde33310da@DEWDFE13DE50.global.corp.sap>

Hi,

remember that the ugly include cascades are gone, as there is now a macro
  #include OS_CPU_HEADER(thread)
including files like thread_linux_x86.hpp.

Therefore I think the need to rename files is no longer that
important.

Best regards,
  Goetz.


> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-
> bounces at openjdk.java.net] On Behalf Of Rickard Bäckman
> Sent: Freitag, 21. Oktober 2016 08:02
> To: Thomas Stüfe <thomas.stuefe at gmail.com>
> Cc: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: "os" - make this a real namespace?
> 
> Yes the naming was just one try. There were multiple other ways of doing
> it. Other possibilities were keeping it as is, have one file named
> os_thread.hpp per platform that includes the os_thread_x86.hpp and just
> have the #include "os_thread.hpp" in files that need it... Macros
> *shudder*.
> 
> /R
> 
> On 10/20, Thomas Stüfe wrote:
> > Hi Rickard,
> >
> > I definitely like some of the aspects of that patch. But like others I'm
> > not a big fan of renaming the files - I like the current naming scheme
> > <os>_<cpu> just fine, I am used to it and it helps me in many places. I
> > work both in IDEs (CDT) and on the command line with vi and grep, and
> > having the platform in the file name makes it easier to work with
> > multiple platforms. I am also quite sure that having different versions of
> > a file with the same name in some locations would bite us at some places.
> >
> > Kind Regards, Thomas
> >
> >
> > On Thu, Oct 20, 2016 at 10:27 AM, Rickard Bäckman <
> > rickard.backman at oracle.com> wrote:
> >
> > > Hi Thomas,
> > >
> > > I tried something like that a couple of years ago and still think it is
> > > a good idea.
> > >
> > > Link to the discussion and patches:
> > >
> > > > http://mail.openjdk.java.net/pipermail/hotspot-dev/2013-March/008884.html
> > >
> > > /R
> > >
> > > > On 10/19, Thomas Stüfe wrote:
> > > > Hi all,
> > > >
> > > > a small question.
> > > >
> > > > I sometimes stumble over the fact that "os" is a class, not a namespace.
> > > > And that we include the platform dependent additions into the middle
> of
> > > > this class.
> > > >
> > > > This has a number of repercussions, like not being able to include the
> > > > platform dependent files (os_<os>_<cpu>) directly, not being able to
> > > > forward declare functions from the "os" namespace (e.g. os::malloc)
> etc.
> > > I
> > > > also cannot split implementations from "os" functions to different
> > > > implementation files without problems.
> > > >
> > > > > It seems to me all compilers nowadays support namespaces, would it
> not
> > > make
> > > > sense to convert "os" to a real namespace?
> > > >
> > > > While we are at it, what is the reason for the "<os>" sub classes? e.g.
> > > > os::Bsd, os::Aix etc? It makes integrating patches between platforms
> > > > difficult and, to me, does not seem to serve any clear purpose.
> > > >
> > > > If the purpose is to be a very low wrapper around OS particularities, it
> > > > makes no sense to have them in the "os" namespace and to make
> them
> > > visible
> > > > to the shared sections of the VM. E.g. there should be no reason to
> > > access
> > > > "os::Bsd" functions from outside os/bsd/vm, or to access "os::Posix"
> > > > functions outside implementations specific for Posix platforms.
> > > >
> > > > Thanks, and Kind Regards, Thomas
> > >
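
The forward-declaration point in the quoted message can be illustrated with a
small sketch. The namespace os_ns and the function allocate_buffer are
hypothetical names chosen for this example, not HotSpot identifiers; the sketch
only relies on standard C++ rules for namespaces and classes.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for a shared header: with a namespace, a client can
// forward declare just the function it needs, without the full definition.
namespace os_ns {
  void* malloc(std::size_t size);   // forward declaration, legal for namespaces
}

// A client that has only seen the forward declaration above.
inline void* allocate_buffer(std::size_t n) {
  return os_ns::malloc(n);
}

// The namespace can be reopened elsewhere (for example in a platform file)
// to supply the definition.
namespace os_ns {
  void* malloc(std::size_t size) {
    return std::malloc(size);
  }
}

// By contrast, a member of a class cannot be declared outside the class body:
//   class os;
//   void* os::malloc(std::size_t);   // error: 'os' is an incomplete type
```

With a class, every user of os::malloc has to see the complete class
definition, which is why the platform additions end up included into the
middle of it.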

From dmitry.samersoff at oracle.com  Fri Oct 21 08:42:21 2016
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Fri, 21 Oct 2016 11:42:21 +0300
Subject: RFR(S): JDK-8165496 assert(_exception_caught == false) failed:
	_exception_caught is out of phase
Message-ID: <2797c9e2-fc5d-8892-d426-d9ae9626e2b3@oracle.com>

Everybody,

Please review a small modification of the fix for JDK-8134434:

http://cr.openjdk.java.net/~dsamersoff/JDK-8165496/webrev.04/

It's possible that we reach rethrow_C when _exception_caught is
already cleared. We do not need to set exception_detected in this
case.

-Dmitry

-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From daniel.daugherty at oracle.com  Fri Oct 21 14:59:47 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 21 Oct 2016 08:59:47 -0600
Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame
	name with <no Java callstack recorded> starting with Java 9 b133
In-Reply-To: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
Message-ID: <def6c2f4-11b3-acf0-8a5f-89ea3c4a7874@oracle.com>

On 10/20/16 2:28 PM, Chris Plummer wrote:
> Hello,
>
> Please review the following:
>
> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/

src/cpu/aarch64/vm/frame_aarch64.cpp
     So we're in a "if (StubRoutines::returns_to_call_stub()" block
     and the assumption was that a frame that returns to a call stub
     must be an entry frame. Hence the use of is_entry_frame_valid().
     However, your investigation revealed that you can be in an
     interpreter frame that returns to a call stub here. That sounds
     both familiar and right :-)

     L209:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
         nit: please remove extra blank here: "( jcw"

     I like the new JavaCallWrapper sanity check. I never thought of
     that when I worked on AsyncGetCallTrace().
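
     As a rough illustration of the JavaCallWrapper sanity check discussed
     here, the bounds test can be written as a small helper. The function
     name jcw_in_bounds and the address typedef are assumptions made for
     this sketch; the comparison mirrors the aarch64 and x86 lines quoted
     in this review (the quoted sparc line uses <= against the stack base).

```cpp
// Hypothetical stand-in for HotSpot's 'address' type.
typedef unsigned char* address;

// Returns true when a candidate JavaCallWrapper address lies strictly
// between the sender's frame pointer and the thread's stack base.
// The stack is assumed to grow downward, so the stack base is the
// highest address of the thread's stack.
inline bool jcw_in_bounds(address jcw, address stack_base, address sender_fp) {
  return (jcw < stack_base) && (jcw > sender_fp);
}
```

     A value outside that window cannot point at a wrapper on this thread's
     stack, so it can be rejected before it is ever dereferenced.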

src/cpu/sparc/vm/frame_sparc.cpp
     old L281:     if (sender.is_entry_frame()) {
     old L282:       return sender.is_entry_frame_valid(thread);
     old L283:     }
         I don't understand this one. Why isn't is_entry_frame_valid()
         correct here? You are in a "if (sender.is_entry_frame())" block.

     I can see wanting to add the JavaCallWrapper sanity check as
     an additional check. If you do that:

     L286       bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
         nit: please remove extra blank here: "( jcw"

src/cpu/x86/vm/frame_x86.cpp
     Again we're in a "if (StubRoutines::returns_to_call_stub()" block
     so I see why is_entry_frame_valid() is not the right call.

     L208:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
         nit: please remove extra blank here: "( jcw"


OK so I understand the AARCH64 and X86 changes. I don't quite
understand the SPARC change... but I can be convinced otherwise.

If you fix the nits, I don't need to see a new webrev.

Dan


> https://bugs.openjdk.java.net/browse/JDK-8166679
>
> The fix is to partially undo the changes for JDK-8159284. There are 
> two places where the fix for JDK-8159284 added an extra check of the 
> validity of the entry frame, but really only the first one is 
> appropriate since for the second one we are not in an entry frame. 
> More details can be found near the end of the bug comments.
>
> Note I did a straight patch of the old version of the code. It could 
> probably use some formatting and comment cleanup. I decided not to 
> clean it up to make it easy to compare the current code with the 
> original. I'll clean it up if you feel it would be best to.
>
> Tested by running KitchenSink more times than I can count, since 
> that's where JDK-8159284 turned up. However, that's not proving much 
> since I could not reproduce JDK-8159284 even without its fix in place 
> (it also couldn't be reproduced at the time JDK-8159284 was being 
> investigated and fixed). For this reason I can't be 100% sure that 
> JDK-8159284 is not being re-introduced with my changes.
>
> Also tested by running a very large set of tests through RBT, close to 
> what we do for PIT testing, minus product builds and a few tests that 
> take a long time to run.
>
> Lastly, I also tested with the test case in the CR to make sure it now 
> passes. Unfortunately it's not possible to add the test case as a 
> jtreg test since it requires the installation of the Oracle Studio tools.
>
> thanks,
>
> Chris


From chris.plummer at oracle.com  Fri Oct 21 19:13:09 2016
From: chris.plummer at oracle.com (Chris Plummer)
Date: Fri, 21 Oct 2016 12:13:09 -0700
Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame
	name with <no Java callstack recorded> starting with Java 9 b133
In-Reply-To: <def6c2f4-11b3-acf0-8a5f-89ea3c4a7874@oracle.com>
References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
	<def6c2f4-11b3-acf0-8a5f-89ea3c4a7874@oracle.com>
Message-ID: <e8e8f01d-d1db-f5b1-b2a1-b9a8254c721a@oracle.com>

Hi Dan,

Thanks for the review. Comments inline below:

On 10/21/16 7:59 AM, Daniel D. Daugherty wrote:
> On 10/20/16 2:28 PM, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
>
> src/cpu/aarch64/vm/frame_aarch64.cpp
>     So we're in a "if (StubRoutines::returns_to_call_stub()" block
>     and the assumption was that a frame that returns to a call stub
>     must be an entry frame. Hence the use of is_entry_frame_valid().
>     However, your investigation revealed that you can be in an
>     interpreter frame that returns to a call stub here. That sounds
>     both familiar and right :-)
>
>     L209:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>         nit: please remove extra blank here: "( jcw"
Ok.
>
>     I like the new JavaCallWrapper sanity check. I never thought of
>     that when I worked on AsyncGetCallTrace().
>
> src/cpu/sparc/vm/frame_sparc.cpp
>     old L281:     if (sender.is_entry_frame()) {
>     old L282:       return sender.is_entry_frame_valid(thread);
>     old L283:     }
>         I don't understand this one. Why isn't is_entry_frame_valid()
>         correct here? You are in a "if (sender.is_entry_frame())" block.
I stared at this one a bit too, since the code is not quite the same as 
x86 and aarch64. I'm not 100% sure I got it right, so I opted to just 
change it to what used to be there, especially since 8159284 never 
turned up on sparc. I did try to go down the path of making sure that 
8166679 (this CR I'm fixing) does occur on Solaris-sparc, but getting 
Dev Studio installed on a Solaris-sparc machine was proving difficult. 
Maybe I should take another stab at that.

As for the similarities and differences between the sparc code and x86, 
for x86 before my changes we had:

       if (StubRoutines::returns_to_call_stub(sender_pc)) {
         ...
         frame sender(sender_sp, sender_unextended_sp, saved_fp, sender_pc);
         return sender.is_entry_frame_valid(thread);
       }

And for sparc:

       frame sender(_SENDER_SP, younger_sp, adjusted_stack);
       if (sender.is_entry_frame()) {
         return sender.is_entry_frame_valid(thread);
       }

So for x86 we are only adding the sender.is_entry_frame_valid() check if 
the "current frame" returns to a stub, but for sparc we are doing the 
check if the "sender frame" is an entry frame. I don't know the reason 
for this difference. Aren't stubs entry frames? If yes, it seems that 
having the check done in this way would cause this CR on sparc just like 
it does on x86.
>
>     I can see wanting to add the JavaCallWrapper sanity check as
>     an additional check. If you do that:
>
>     L286       bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
>         nit: please remove extra blank here: "( jcw"
Ok.
>
> src/cpu/x86/vm/frame_x86.cpp
>     Again we're in a "if (StubRoutines::returns_to_call_stub()" block
>     so I see why is_entry_frame_valid() is not the right call.
>
>     L208:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>         nit: please remove extra blank here: "( jcw"
Ok.
>
>
> OK so I understand the AARCH64 and X86 changes. I don't quite
> understand the SPARC change... but I can be convinced otherwise.
Ok. Let me know what you think now after a bit more explanation. I can 
put some more effort into trying out the test case on sparc if needed.

thanks,

Chris
>
> If you fix the nits, I don't need to see a new webrev.
>
> Dan
>
>
>> https://bugs.openjdk.java.net/browse/JDK-8166679
>>
>> The fix is to partially undo the changes for JDK-8159284. There are 
>> two places where the fix for JDK-8159284 added an extra check of the 
>> validity of the entry frame, but really only the first one is 
>> appropriate since for the second one we are not in an entry frame. 
>> More details can be found near the end of the bug comments.
>>
>> Note I did a straight patch of the old version of the code. It could 
>> probably use some formatting and comment cleanup. I decided not to 
>> clean it up to make it easy to compare the current code with the 
>> original. I'll clean it up if you feel it would be best to.
>>
>> Tested by running KitchenSink more times than I can count, since 
>> that's where JDK-8159284 turned up. However, that's not proving much 
>> since I could not reproduce JDK-8159284 even without its fix in place 
>> (it also couldn't be reproduced at the time JDK-8159284 was being 
>> investigated and fixed). For this reason I can't be 100% sure that 
>> JDK-8159284 is not being re-introduced with my changes.
>>
>> Also tested by running a very large set of tests through RBT, close to 
>> what we do for PIT testing, minus product builds and a few tests that 
>> take a long time to run.
>>
>> Lastly, I also tested with the test case in the CR to make sure it 
>> now passes. Unfortunately it's not possible to add the test case as a 
>> jtreg test since it requires the installation of the Oracle Studio 
>> tools.
>>
>> thanks,
>>
>> Chris
>


From coleen.phillimore at oracle.com  Fri Oct 21 19:42:57 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Fri, 21 Oct 2016 15:42:57 -0400
Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame
	name with <no Java callstack recorded> starting with Java 9 b133
In-Reply-To: <e8e8f01d-d1db-f5b1-b2a1-b9a8254c721a@oracle.com>
References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
	<def6c2f4-11b3-acf0-8a5f-89ea3c4a7874@oracle.com>
	<e8e8f01d-d1db-f5b1-b2a1-b9a8254c721a@oracle.com>
Message-ID: <9e4afc7a-f6cc-fd4f-4935-9574169276a6@oracle.com>


Chris,

This change looks good.  Thank you for the analysis and fixing the 
regression.

On 10/21/16 3:13 PM, Chris Plummer wrote:
> Hi Dan,
>
> Thanks for the review. Comments inline below:
>
> On 10/21/16 7:59 AM, Daniel D. Daugherty wrote:
>> On 10/20/16 2:28 PM, Chris Plummer wrote:
>>> Hello,
>>>
>>> Please review the following:
>>>
>>> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
>>
>> src/cpu/aarch64/vm/frame_aarch64.cpp
>>     So we're in a "if (StubRoutines::returns_to_call_stub()" block
>>     and the assumption was that a frame that returns to a call stub
>>     must be an entry frame. Hence the use of is_entry_frame_valid().
>>     However, your investigation revealed that you can be in an
>>     interpreter frame that returns to a call stub here. That sounds
>>     both familiar and right :-)
>>
>>     L209:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>         nit: please remove extra blank here: "( jcw"
> Ok.
>>
>>     I like the new JavaCallWrapper sanity check. I never thought of
>>     that when I worked on AsyncGetCallTrace().
>>
>> src/cpu/sparc/vm/frame_sparc.cpp
>>     old L281:     if (sender.is_entry_frame()) {
>>     old L282:       return sender.is_entry_frame_valid(thread);
>>     old L283:     }
>>         I don't understand this one. Why isn't is_entry_frame_valid()
>>         correct here? You are in a "if (sender.is_entry_frame())" block.
> I stared at this one a bit too, since the code is not quite the same 
> as x86 and aarch64. I'm not 100% sure I got it right, so I opted to 
> just change it to what used to be there, especially since 8159284 
> never turned up on sparc. I did try to go down the path of making sure 
> that 8166679 (this CR I'm fixing) does occur on Solaris-sparc, but 
> getting Dev Studio installed on a Solaris-sparc machine was proving 
> difficult. Maybe I should take another stab at that.
>
> As for the similarities and differences between the sparc code and x86, 
> for x86 before my changes we had:
>
>       if (StubRoutines::returns_to_call_stub(sender_pc)) {
>         ...
>         frame sender(sender_sp, sender_unextended_sp, saved_fp, 
> sender_pc);
>         return sender.is_entry_frame_valid(thread);
>       }
>
> And for sparc:
>
>       frame sender(_SENDER_SP, younger_sp, adjusted_stack);
>       if (sender.is_entry_frame()) {
>         return sender.is_entry_frame_valid(thread);
>       }
>
> So for x86 we are only adding the sender.is_entry_frame_valid() check 
> if the "current frame" returns to a stub, but for sparc we are doing 
> the check if the "sender frame" is an entry frame. I don't know the 
> reason for this difference. Aren't stubs entry frames? If yes, it seems 
> that having the check done in this way would cause this CR on sparc 
> just like it does on x86.

I looked at this too and decided the platforms were equivalent, only 
coded differently.   On sparc we create a sender frame, and on x86 we look 
at sender_pc before creating a sender frame.  And is_entry_frame is:

inline bool frame::is_entry_frame() const {
   return StubRoutines::returns_to_call_stub(pc());
}
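
The equivalence described above can be sketched with a toy model. Everything
except the shape of is_entry_frame() is a hypothetical stand-in (Frame,
call_stub_return_site, x86_style_check, sparc_style_check); this is not the
HotSpot code, only an illustration that testing the pc before or after
constructing the sender frame gives the same answer.

```cpp
// Toy model; the names below are hypothetical stand-ins, not HotSpot code.
typedef const void* address;

// Pretend this object's address is the call stub's return address.
static const int call_stub_return_site = 0;

inline bool returns_to_call_stub(address pc) {
  return pc == &call_stub_return_site;
}

struct Frame {
  address _pc;
  explicit Frame(address pc) : _pc(pc) {}
  address pc() const { return _pc; }
  // Same shape as the is_entry_frame() quoted above.
  bool is_entry_frame() const { return returns_to_call_stub(pc()); }
};

// x86 style: test sender_pc first, construct the sender frame afterwards.
inline bool x86_style_check(address sender_pc) {
  return returns_to_call_stub(sender_pc);
}

// sparc style: construct the sender frame first, then ask the frame.
inline bool sparc_style_check(address sender_pc) {
  Frame sender(sender_pc);
  return sender.is_entry_frame();
}
```

Both functions reduce to the same returns_to_call_stub() test on the same pc,
so any frame that passes one passes the other.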

Thanks,
Coleen

>>
>>     I can see wanting to add the JavaCallWrapper sanity check as
>>     an additional check. If you do that:
>>
>>     L286       bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
>>         nit: please remove extra blank here: "( jcw"
> Ok.
>>
>> src/cpu/x86/vm/frame_x86.cpp
>>     Again we're in a "if (StubRoutines::returns_to_call_stub()" block
>>     so I see why is_entry_frame_valid() is not the right call.
>>
>>     L208:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>         nit: please remove extra blank here: "( jcw"
> Ok.
>>
>>
>> OK so I understand the AARCH64 and X86 changes. I don't quite
>> understand the SPARC change... but I can be convinced otherwise.
> Ok. Let me know what you think now after a bit more explanation. I can 
> put some more effort into trying out the test case on sparc if needed.
>
> thanks,
>
> Chris
>>
>> If you fix the nits, I don't need to see a new webrev.
>>
>> Dan
>>
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8166679
>>>
>>> The fix is to partially undo the changes for JDK-8159284. There are 
>>> two places where the fix for JDK-8159284 added an extra check of the 
>>> validity of the entry frame, but really only the first one is 
>>> appropriate since for the second one we are not in an entry frame. 
>>> More details can be found near the end of the bug comments.
>>>
>>> Note I did a straight patch of the old version of the code. It could 
>>> probably use some formatting and comment cleanup. I decided not to 
>>> clean it up to make it easy to compare the current code with the 
>>> original. I'll clean it up if you feel it would be best to.
>>>
>>> Tested by running KitchenSink more times than I can count, since 
>>> that's where JDK-8159284 turned up. However, that's not proving much 
>>> since I could not reproduce JDK-8159284 even without its fix in 
>>> place (it also couldn't be reproduced at the time JDK-8159284 was 
>>> being investigated and fixed). For this reason I can't be 100% 
>>> sure that JDK-8159284 is not being re-introduced with my changes.
>>>
>>> Also tested by running a very large set of tests through RBT, close 
>>> to what we do for PIT testing, minus product builds and a few tests 
>>> that take a long time to run.
>>>
>>> Lastly, I also tested with the test case in the CR to make sure it 
>>> now passes. Unfortunately it's not possible to add the test case as 
>>> a jtreg test since it requires the installation of the Oracle Studio 
>>> tools.
>>>
>>> thanks,
>>>
>>> Chris
>>
>


From daniel.daugherty at oracle.com  Fri Oct 21 22:22:27 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 21 Oct 2016 16:22:27 -0600
Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame
	name with <no Java callstack recorded> starting with Java 9 b133
In-Reply-To: <e8e8f01d-d1db-f5b1-b2a1-b9a8254c721a@oracle.com>
References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
	<def6c2f4-11b3-acf0-8a5f-89ea3c4a7874@oracle.com>
	<e8e8f01d-d1db-f5b1-b2a1-b9a8254c721a@oracle.com>
Message-ID: <637e7454-ec9a-caa9-483c-cc7818a4ba89@oracle.com>

 > Ok. Let me know what you think now after a bit more explanation.

I'm good with it. Thumbs up!

Dan


On 10/21/16 1:13 PM, Chris Plummer wrote:
> Hi Dan,
>
> Thanks for the review. Comments inline below:
>
> On 10/21/16 7:59 AM, Daniel D. Daugherty wrote:
>> On 10/20/16 2:28 PM, Chris Plummer wrote:
>>> Hello,
>>>
>>> Please review the following:
>>>
>>> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/
>>
>> src/cpu/aarch64/vm/frame_aarch64.cpp
>>     So we're in a "if (StubRoutines::returns_to_call_stub()" block
>>     and the assumption was that a frame that returns to a call stub
>>     must be an entry frame. Hence the use of is_entry_frame_valid().
>>     However, your investigation revealed that you can be in an
>>     interpreter frame that returns to a call stub here. That sounds
>>     both familiar and right :-)
>>
>>     L209:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>         nit: please remove extra blank here: "( jcw"
> Ok.
>>
>>     I like the new JavaCallWrapper sanity check. I never thought of
>>     that when I worked on AsyncGetCallTrace().
>>
>> src/cpu/sparc/vm/frame_sparc.cpp
>>     old L281:     if (sender.is_entry_frame()) {
>>     old L282:       return sender.is_entry_frame_valid(thread);
>>     old L283:     }
>>         I don't understand this one. Why isn't is_entry_frame_valid()
>>         correct here? You are in a "if (sender.is_entry_frame())" block.
> I stared at this one a bit too, since the code is not quite the same 
> as x86 and aarch64. I'm not 100% sure I got it right, so I opted to 
> just change it to what used to be there, especially since 8159284 
> never turned up on sparc. I did try to go down the path of making sure 
> that 8166679 (this CR I'm fixing) does occur on Solaris-sparc, but 
> getting Dev Studio installed on a Solaris-sparc machine was proving 
> difficult. Maybe I should take another stab at that.
>
> As for the similarities and differences between the sparc code and x86, 
> for x86 before my changes we had:
>
>       if (StubRoutines::returns_to_call_stub(sender_pc)) {
>         ...
>         frame sender(sender_sp, sender_unextended_sp, saved_fp, 
> sender_pc);
>         return sender.is_entry_frame_valid(thread);
>       }
>
> And for sparc:
>
>       frame sender(_SENDER_SP, younger_sp, adjusted_stack);
>       if (sender.is_entry_frame()) {
>         return sender.is_entry_frame_valid(thread);
>       }
>
> So for x86 we are only adding the sender.is_entry_frame_valid() check 
> if the "current frame" returns to a stub, but for sparc we are doing 
> the check if the "sender frame" is an entry frame. I don't know the 
> reason for this difference. Aren't stubs entry frames? If yes, it seems 
> that having the check done in this way would cause this CR on sparc 
> just like it does on x86.
>>
>>     I can see wanting to add the JavaCallWrapper sanity check as
>>     an additional check. If you do that:
>>
>>     L286       bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
>>         nit: please remove extra blank here: "( jcw"
> Ok.
>>
>> src/cpu/x86/vm/frame_x86.cpp
>>     Again we're in a "if (StubRoutines::returns_to_call_stub()" block
>>     so I see why is_entry_frame_valid() is not the right call.
>>
>>     L208:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>         nit: please remove extra blank here: "( jcw"
> Ok.
>>
>>
>> OK so I understand the AARCH64 and X86 changes. I don't quite
>> understand the SPARC change... but I can be convinced otherwise.
> Ok. Let me know what you think now after a bit more explanation. I can 
> put some more effort into trying out the test case on sparc if needed.
>
> thanks,
>
> Chris
>>
>> If you fix the nits, I don't need to see a new webrev.
>>
>> Dan
>>
>>
>>> https://bugs.openjdk.java.net/browse/JDK-8166679
>>>
>>> The fix is to partially undo the changes for JDK-8159284. There are 
>>> two places where the fix for JDK-8159284 added an extra check of the 
>>> validity of the entry frame, but really only the first one is 
>>> appropriate since for the second one we are not in an entry frame. 
>>> More details can be found near the end of the bug comments.
>>>
>>> Note I did a straight patch of the old version of the code. It could 
>>> probably use some formatting and comment cleanup. I decided not to 
>>> clean it up to make it easy to compare the current code with the 
>>> original. I'll clean it up if you feel it would be best to.
>>>
>>> Tested by running KitchenSink more times than I can count, since 
>>> that's where JDK-8159284 turned up. However, that's not proving much 
>>> since I could not reproduce JDK-8159284 even without its fix in 
>>> place (it also couldn't be reproduced at the time JDK-8159284 was 
>>> being investigated and fixed). For this reason I can't be 100% 
>>> sure that JDK-8159284 is not being re-introduced with my changes.
>>>
>>> Also tested by running a very large set of tests through RBT, close 
>>> to what we do for PIT testing, minus product builds and a few tests 
>>> that take a long time to run.
>>>
>>> Lastly, I also tested with the test case in the CR to make sure it 
>>> now passes. Unfortunately it's not possible to add the test case as 
>>> a jtreg test since it requires the installation of the Oracle Studio 
>>> tools.
>>>
>>> thanks,
>>>
>>> Chris
>>
>


From daniel.daugherty at oracle.com  Fri Oct 21 22:27:35 2016
From: daniel.daugherty at oracle.com (Daniel D. Daugherty)
Date: Fri, 21 Oct 2016 16:27:35 -0600
Subject: RFR(S): 8166679: JNI AsyncGetCallTrace replaces topmost frame
	name with <no Java callstack recorded> starting with Java 9 b133
In-Reply-To: <9e4afc7a-f6cc-fd4f-4935-9574169276a6@oracle.com>
References: <3137e96c-d7b5-044b-9873-d570eab77d2b@oracle.com>
	<def6c2f4-11b3-acf0-8a5f-89ea3c4a7874@oracle.com>
	<e8e8f01d-d1db-f5b1-b2a1-b9a8254c721a@oracle.com>
	<9e4afc7a-f6cc-fd4f-4935-9574169276a6@oracle.com>
Message-ID: <d77e71a8-7efc-1c5f-52f2-18bef7c525f4@oracle.com>

On 10/21/16 1:42 PM, Coleen Phillimore wrote:
>
> Chris,
>
> This change looks good.  Thank you for the analysis and fixing the 
> regression.
>
> On 10/21/16 3:13 PM, Chris Plummer wrote:
>> Hi Dan,
>>
>> Thanks for the review. Comments inline below:
>>
>> On 10/21/16 7:59 AM, Daniel D. Daugherty wrote:
>>> On 10/20/16 2:28 PM, Chris Plummer wrote:
>>>> Hello,
>>>>
>>>> Please review the following:
>>>>
>>>> http://cr.openjdk.java.net/~cjplummer/8166679/webrev.00/webrev.hotspot/ 
>>>>
>>>
>>> src/cpu/aarch64/vm/frame_aarch64.cpp
>>>     So we're in a "if (StubRoutines::returns_to_call_stub()" block
>>>     and the assumption was that a frame that returns to a call stub
>>>     must be an entry frame. Hence the use of is_entry_frame_valid().
>>>     However, your investigation revealed that you can be in an
>>>     interpreter frame that returns to a call stub here. That sounds
>>>     both familiar and right :-)
>>>
>>>     L209:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>>         nit: please remove extra blank here: "( jcw"
>> Ok.
>>>
>>>     I like the new JavaCallWrapper sanity check. I never thought of
>>>     that when I worked on AsyncGetCallTrace().
>>>
>>> src/cpu/sparc/vm/frame_sparc.cpp
>>>     old L281:     if (sender.is_entry_frame()) {
>>>     old L282:       return sender.is_entry_frame_valid(thread);
>>>     old L283:     }
>>>         I don't understand this one. Why isn't is_entry_frame_valid()
>>>         correct here? You are in a "if (sender.is_entry_frame())" 
>>> block.
>> I stared at this one a bit too, since the code is not quite the same 
>> as x86 and aarch64. I'm not 100% sure I got it right, so I opted to 
>> just change it to what used to be there, especially since 8159284 
>> never turned up on sparc. I did try to go down the path of making 
>> sure that 8166679 (this CR I'm fixing) does occur on Solaris-sparc, 
>> but getting Dev Studio installed on a Solaris-sparc machine was 
>> proving difficult. Maybe I should take another stab at that.
>>
>> As for the similarities and differences between the sparc code and 
>> x86, for x86 before my changes we had:
>>
>>       if (StubRoutines::returns_to_call_stub(sender_pc)) {
>>         ...
>>         frame sender(sender_sp, sender_unextended_sp, saved_fp, 
>> sender_pc);
>>         return sender.is_entry_frame_valid(thread);
>>       }
>>
>> And for sparc:
>>
>>       frame sender(_SENDER_SP, younger_sp, adjusted_stack);
>>       if (sender.is_entry_frame()) {
>>         return sender.is_entry_frame_valid(thread);
>>       }
>>
>> So for x86 we are only adding the sender.is_entry_frame_valid() check 
>> if the "current frame" returns to a stub, but for sparc we are doing 
>> the check if the "sender frame" is an entry frame. I don't know the 
>> reason for this difference. Aren't stubs entry frames? If yes, it 
>> seems that having the check done in this way would cause this CR on 
>> sparc just like it does on x86.
>
> I looked at this too and decided the platforms were equivalent, only 
> coded differently.   On sparc we create a sender frame, and on x86 we 
> look at sender_pc before creating a sender frame.  And is_entry_frame is:
>
> inline bool frame::is_entry_frame() const {
>   return StubRoutines::returns_to_call_stub(pc());
> }

That just makes this even more "interesting".

So what we're saying here is that for either form of the question:

     Is this an entry frame?

we cannot call is_entry_frame_valid() because that function will
sometimes return false when the "entry frame" is also an interpreter
frame...

Are we possibly fixing this in the wrong place? Dunno. It's Friday
afternoon and maybe I'm just too fried to think this one through...

Dan


>
> Thanks,
> Coleen
>
>>>
>>>     I can see wanting to add the JavaCallWrapper sanity check as
>>>     an additional check. If you do that:
>>>
>>>     L286       bool jcw_safe = (jcw <= thread->stack_base()) && ( jcw > sender_fp);
>>>         nit: please remove extra blank here: "( jcw"
>> Ok.
>>>
>>> src/cpu/x86/vm/frame_x86.cpp
>>>     Again we're in a "if (StubRoutines::returns_to_call_stub()" block
>>>     so I see why is_entry_frame_valid() is not the right call.
>>>
>>>     L208:       bool jcw_safe = (jcw < thread->stack_base()) && ( jcw > (address)sender.fp());
>>>         nit: please remove extra blank here: "( jcw"
>> Ok.
>>>
>>>
>>> OK so I understand the AARCH64 and X86 changes. I don't quite
>>> understand the SPARC change... but I can be convinced otherwise.
>> Ok. Let me know what you think now after a bit more explanation. I 
>> can put some more effort into trying out the test case on sparc if 
>> needed.
>>
>> thanks,
>>
>> Chris
>>>
>>> If you fix the nits, I don't need to see a new webrev.
>>>
>>> Dan
>>>
>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8166679
>>>>
>>>> The fix is to partially undo the changes for JDK-8159284. There are 
>>>> two places where the fix for JDK-8159284 added an extra check of 
>>>> the validity of the entry frame, but really only the first one is 
>>>> appropriate since for the second one we are not in an entry frame. 
>>>> More details can be found near the end of the bug comments.
>>>>
>>>> Note I did a straight patch of the old version of the code. It 
>>>> could probably use some formatting and comment cleanup. I decided 
>>>> not to clean it up to make it easy to compare the current code with 
>>>> the original. I'll clean it up if you feel it would be best to.
>>>>
>>>> Tested by running KitchenSink more times than I can count, since 
>>>> that's where JDK-8159284 turned up. However, that's not proving 
>>>> much since I could not reproduce JDK-8159284 even without its fix 
>>>> in place (it also couldn't be reproduced at the time JDK-8159284 
>>>> was being investigated and fixed). For this reason I can't be 
>>>> 100% sure that JDK-8159284 is not being re-introduced with my changes.
>>>>
>>>> Also tested by running a very large set of tests through RBT, close 
>>>> to what we do for PIT testing, minus product builds and a few tests 
>>>> that take a long time to run.
>>>>
>>>> Lastly, I also tested with the test case in the CR to make sure it 
>>>> now passes. Unfortunately it's not possible to add the test case as 
>>>> a jtreg test since it requires the installation of the Oracle 
>>>> Studio tools.
>>>>
>>>> thanks,
>>>>
>>>> Chris
>>>
>>
>


From thomas.stuefe at gmail.com  Mon Oct 24 13:12:13 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 24 Oct 2016 15:12:13 +0200
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for input
	size=0
Message-ID: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>

Dear all,

please check this tiny bug fix.

Bug report:
https://bugs.openjdk.java.net/browse/JDK-8168542

Webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/

In short, this fixes a corner case for os::realloc() which currently
returns NULL if input size is zero.

But as we have code that interprets a return value of NULL as OOM (see
ReallocateHeap()), this is not a good solution. It is also inconsistent
with how os::malloc() deals with the same situation and potentially with
the way the native C-Runtime deals with it (currently, in a debug build we
will return NULL in case of size=0 whereas in the release build we just
call the native ::realloc() and return whatever it returns.)
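
A minimal sketch of the behaviour argued for above, assuming a zero-size
request is simply treated as a one-byte request so that NULL is only ever
returned on a real allocation failure. The wrapper name is hypothetical and
this is not the code from the webrev.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper, not the actual os::realloc(): a request for zero
// bytes is bumped to one byte, so a NULL result always means out of memory.
inline void* realloc_never_null_on_zero(void* ptr, std::size_t size) {
  if (size == 0) {
    size = 1;
  }
  return std::realloc(ptr, size);
}
```

With this shape, a caller that treats NULL as OOM no longer misreports a
legitimate size=0 request, and debug and release builds behave the same way.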

Thank you,

Thomas

From thomas.stuefe at gmail.com  Mon Oct 24 13:39:43 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Mon, 24 Oct 2016 15:39:43 +0200
Subject: PING: Enhancement Proposal: Reduce metaspace waste by dynamically
	merging and splitting metaspace chunks.
Message-ID: <CAA-vtUwuh4DzFz5Mfyq-5XbVovvxVS3zqE-xmoguH-gHk0VT+w@mail.gmail.com>

(crossposting to runtime-dev in the hope of getting more interest)

Hi all,

Please take a look at this proposed JEP.

https://bugs.openjdk.java.net/browse/JDK-8166690

The JEP proposes an improved allocator for metaspace. That allocator
reduces metaspace wastage for certain corner cases by a lot.

We at SAP already have an implementation of this proposal, but
currently only in our internal code base, not in the OpenJDK. It works
fine. I can provide a prototype based on OpenJDK 9 to look at and play
with, but would like to know whether there is any interest before investing
the work.

Thank you! and Kind Regards,

Thomas


On Tue, Sep 27, 2016 at 10:45 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
wrote:

> Dear all,
>
> please take a look at this Enhancement Proposal for the metaspace
> allocator. I hope these are the right groups for this discussion.
>
> https://bugs.openjdk.java.net/browse/JDK-8166690
>
> Background:
>
> At SAP we at times see OOMs in Metaspace at customer installations
> (usually, with compressed class pointers enabled, in Compressed Class
> Space). The VM attempts to allocate metaspace and fails, hitting the
> CompressedClassSpaceSize limit. Note that we usually set the limit lower
> than the default, typically at 256M.
>
> When analyzing, we observed that a large part of the metaspace is indeed
> free but "locked in" into metaspace chunks of the wrong size: often we
> would find a lot of free small chunks, but the allocation request was for
> medium chunks, and failed.
>
> The reason was that at some point in time a lot of class loaders were
> alive, each with only a few small classes loaded. This would lead to the
> metaspace being swamped with lots of small chunks, because each
> SpaceManager first allocates small chunks and switches to larger chunks
> only after a certain number of allocation requests.
>
> These small chunks are free and wait in the freelist, but cannot be reused
> for allocation requests which require larger chunks, even if they are
> physically adjacent in the virtual space.
>
> We (at SAP) added a patch which allows on-the-fly metaspace chunk merging
> - to merge multiple adjacent smaller chunks to form a larger chunk. This, in
> combination with the reverse direction - splitting a large chunk to get
> smaller chunks - partly negates the "chunks-are-locked-in-into-their-size"
> limitation and provides for better reuse of metaspace chunks. It also
> provides better defragmentation.
>
> I discussed this fix off-list with Coleen Phillimore and Jon Masamitsu,
> and instead of just offering this as a fix, both recommended to open a JEP
> for this, because its scope would be beyond that of a simple fix.
>
> So here is my first JEP :) I hope it follows the right form. Please, if
> you have time, take a look and tell us what you think.
>
> Thank you, and Kind Regards,
>
> Thomas Stüfe

From rachel.protacio at oracle.com  Mon Oct 24 20:17:44 2016
From: rachel.protacio at oracle.com (Rachel Protacio)
Date: Mon, 24 Oct 2016 16:17:44 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
Message-ID: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>

Hi,

Please review this small fix, which removes two nested ResourceMark's 
that were causing problems with defaultmethods logging.

Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/

Tested with JPRT.

Thanks!
Rachel

From max.ockner at oracle.com  Tue Oct 25 01:11:52 2016
From: max.ockner at oracle.com (Max Ockner)
Date: Mon, 24 Oct 2016 21:11:52 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
In-Reply-To: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
References: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
Message-ID: <580EB158.7020509@oracle.com>

Rachel,
Did you mean to remove both ResourceMarks?
(I suppose if it passes JPRT then it might not matter)
Max
On 10/24/2016 4:17 PM, Rachel Protacio wrote:
> Hi,
>
> Please review this small fix, which removes two nested ResourceMark's 
> that were causing problems with defaultmethods logging.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>
> Tested with JPRT.
>
> Thanks!
> Rachel


From david.holmes at oracle.com  Tue Oct 25 04:22:22 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Oct 2016 14:22:22 +1000
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for
	input size=0
In-Reply-To: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
References: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
Message-ID: <d4e088bb-7b5e-94f3-84dc-f0a3deca1f20@oracle.com>

Hi Thomas,

On 24/10/2016 11:12 PM, Thomas Stüfe wrote:
> Dear all,
>
> please check this tiny bug fix.
>
> Bug report:
> https://bugs.openjdk.java.net/browse/JDK-8168542
>
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/
>
> In short, this fixes a corner case for os::realloc() which currently
> returns NULL if input size is zero.
>
> But as we have coding which interprets a return value of NULL as OOM (See
> ReallocateHeap()), this is not a good solution. It is also inconsistent
> with how os::malloc() deals with the same situation and potentially with
> the way the native C-Runtime deals with it (currently, in a debug build we
> will return NULL in case of size=0 whereas in the release build we just
> call the native ::realloc() and return whatever it returns.)

Sorry but I do not like this. A native realloc with a size of zero and a 
non-NULL ptr acts like free(ptr). Our realloc does not do that. A native 
malloc that receives a size of zero "returns either NULL, or a unique 
pointer value that can later be successfully passed to free()". Our 
os::malloc returns 1 - and I see nothing that indicates that can 
successfully be passed to os::free.

So while the current handling of size==0 is a bit inconsistent and 
unclear, it is even less clear that returning 1 is a reasonable thing to do.

To me passing a size of zero (unless expecting it to act like a free!) 
is a bug that should be handled in the caller.

I welcome opinions from others on this.

David

PS. I will be traveling soon and unable to respond to emails until 
Wednesday afternoon at the earliest.

> Thank you,
>
> Thomas
>

From david.holmes at oracle.com  Tue Oct 25 05:41:36 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Oct 2016 15:41:36 +1000
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
In-Reply-To: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
References: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
Message-ID: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>

Hi Rachel,

On 25/10/2016 6:17 AM, Rachel Protacio wrote:
> Hi,
>
> Please review this small fix, which removes two nested ResourceMark's
> that were causing problems with defaultmethods logging.
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>
> Tested with JPRT.

It is tricky to determine who has responsibility for positioning the 
ResourceMarks. Looking at this call chain it initially appeared to me 
that we now had a missing RM for the code at line #80:

813       slot->print_on(logstream);
=>
590   void print_on(outputStream* str) const {
591     print_slot(str, name(), signature());
592   }
=>
79 static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
80   str->print("%s%s", name->as_C_string(), signature->as_C_string());
81 }

but we actually have a RM higher up at:

787   ResourceMark rm(THREAD);

so that is good, but then we also have a nested ResourceMark further down:

  795   if (log_is_enabled(Debug, defaultmethods)) {
  796     ResourceMark rm;

I must admit I'm unclear if ResourceMarks should never be nested, or 
should be nested "carefully" - and if the latter exactly what that means 
and how to recognize it.

Thanks,
David
-----

> Thanks!
> Rachel

From david.holmes at oracle.com  Tue Oct 25 05:46:21 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Oct 2016 15:46:21 +1000
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<CAA-vtUxg-VnMmbQ4UCjy0jd-_mHE1eT4vNTjXoCOaiezKt+wMg@mail.gmail.com>
	<422a7612-79a9-4782-70de-e7b0c8dad9ac@oracle.com>
Message-ID: <776ac549-77cb-3ac1-69f4-b356f8631019@oracle.com>

On 18/10/2016 5:16 PM, David Holmes wrote:
> Hi Thomas,
>
> I took an initial look but am still mulling over things.

Sorry Thomas haven't had a chance to get back to this. Hard to find time 
for future features/enhancements at the moment. :)

Others should feel free to chime in on this. :)

David

> Note that as an enhancement this will need to wait for Java 10 repos to
> open - unless you go through the FC extension process.
>
> Thanks,
> David
>
> On 18/10/2016 4:22 PM, Thomas Stüfe wrote:
>> Ping.
>>
>> On Thu, Oct 13, 2016 at 6:55 AM, Thomas Stüfe <thomas.stuefe at gmail.com>
>> wrote:
>>
>>> Dear all,
>>>
>>> please take a look at the following fix:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>>> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>>>
>>> ---
>>>
>>> In short, this fix provides the ability to cancel hanging error reporting
>>> steps. This uses the same code paths secondary error handling uses during
>>> error reporting. With this patch, steps which take too long will be
>>> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this:
>>>
>>> 4 [timeout occurred during error reporting in step "<stepname>"] after
>>> xxxx ms.
>>> 5
>>>
>>> and we now also get a finish message in the hs-err file if we hit the
>>> ErrorLogTimeout and error reporting will stop altogether:
>>>
>>> 6 ------ Timout during error reporting after xxx ms. ------
>>>
>>> (in addition to the "time expired, abort" message the WatcherThread writes
>>> to stderr)
>>>
>>> ---
>>>
>>> This is something which bugged us for a long time, because we rely heavily
>>> on the hs_err files for error analysis at customer sites, and there are a
>>> number of reasons why one step may hang and prevent the follow-up steps
>>> from running.
>>>
>>> It works like this:
>>>
>>> Before, when error reporting started, the WatcherThread was waiting for
>>> ErrorLogTimeout seconds, then would stop the VM.
>>>
>>> Now, the WatcherThread periodically pings error reporting, which checks
>>> whether the last step has timed out. If it has, it sends a signal to the
>>> reporting thread, and the thread will continue with the next step. This
>>> follows the same path as secondary crash handling.
>>>
>>> Some implementation details:
>>>
>>> On Posix platforms, to interrupt the thread, I use pthread_kill. This
>>> means I must know the pthread id of the reporting thread, which I now store
>>> at the beginning of error reporting. We already store the reporting thread
>>> id in first_error_tid, but that I cannot use, because it gets set by
>>> os::current_thread_id(), which is not always the pthread id. Should we ever
>>> switch to only using pthread id for posix platforms, this code can be
>>> simplified.
>>>
>>> On Windows, there is unfortunately no easy way to interrupt a
>>> non-cooperative thread. I would need a way to cause a SEH inside the target
>>> thread, which then would get handled by secondary error handling like on
>>> Posix platforms, but that is not easy. It is doable - one can suspend the
>>> thread, modify the thread context in a way that it will crash upon resume.
>>> But that felt a bit heavyweight for this problem. So on Windows, timeout
>>> handling still works (after ErrorLogTimeout the VM gets shut down), but
>>> error reporting steps are not interruptible. If we feel this is important,
>>> this can be added later.
>>>
>>> Kind Regards, Thomas
>>>

From thomas.stuefe at gmail.com  Tue Oct 25 05:50:32 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Tue, 25 Oct 2016 07:50:32 +0200
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for
	input size=0
In-Reply-To: <d4e088bb-7b5e-94f3-84dc-f0a3deca1f20@oracle.com>
References: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
	<d4e088bb-7b5e-94f3-84dc-f0a3deca1f20@oracle.com>
Message-ID: <CAA-vtUx6bUA8bA0as0B2Pn3Ws1u_e4iYgiMvniu=Ev8MHLAuvg@mail.gmail.com>

Hi David,

On Tue, Oct 25, 2016 at 6:22 AM, David Holmes <david.holmes at oracle.com>
wrote:

> Hi Thomas,
>
> On 24/10/2016 11:12 PM, Thomas Stüfe wrote:
>
>> Dear all,
>>
>> please check this tiny bug fix.
>>
>> Bug report:
>> https://bugs.openjdk.java.net/browse/JDK-8168542
>>
>> Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/
>>
>> In short, this fixes a corner case for os::realloc() which currently
>> returns NULL if input size is zero.
>>
>> But as we have coding which interprets a return value of NULL as OOM (See
>> ReallocateHeap()), this is not a good solution. It is also inconsistent
>> with how os::malloc() deals with the same situation and potentially with
>> the way the native C-Runtime deals with it (currently, in a debug build we
>> will return NULL in case of size=0 whereas in the release build we just
>> call the native ::realloc() and return whatever it returns.)
>>
>
> Sorry but I do not like this. A native realloc with a size of zero and a
> non-NULL ptr acts like free(ptr). Our realloc does not do that. A native
> malloc that receives a size of zero "returns either NULL, or a unique
> pointer value that can later be successfully passed to free()". Our
> os::malloc returns 1 - and I see nothing that indicates that can
> successfully be passed to os::free.
>
> So while the current handling of size==0 is a bit inconsistent and
> unclear, it is even less clear that returning 1 is a reasonable thing to do.
>
> To me passing a size of zero (unless expecting it to act like a free!) is
> a bug that should be handled in the caller.
>
>
You completely lost me here. I do not return 1, neither does os::malloc().

os::realloc behaviour now is:
- in debug: if size==0, do not free but return NULL immediately. Which is
not a standard behaviour of a normal ::realloc()
- in release builds: if size==0, do whatever the C-Runtime realloc() does,
so it will always free() but either return NULL or a unique pointer.

So, in debug build the behaviour is unexpected and - assuming os::realloc()
mimics ::realloc() - wrong. In release builds it will be correct but
unknown.

My patch changes this behaviour to always - in both release and debug
builds - free() and return a unique pointer. By setting the size to 1:

- for the debug build, I will go through the normal path - allocating 1 byte
of memory (plus NMT/guard pages overhead), copying 1 byte of payload, then
freeing the original memory and returning the allocated 1 byte - which is
unique and can be passed to os::free().
- for the release build I will force the behaviour to be a realloc to size
1 and thus remove the ambiguity introduced by the native realloc.
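
The behaviour described above can be sketched outside the VM roughly as
follows. This is only an illustration under stated assumptions, not the
actual patch: realloc_zero_as_one() is a hypothetical stand-in for
os::realloc(), and it leaves out NMT accounting and the debug-build guard
handling entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stand-in for os::realloc(); NOT the HotSpot implementation.
// A request for 0 bytes is treated as a request for 1 byte, so a NULL
// result can only mean "out of memory", never "size was zero".
void* realloc_zero_as_one(void* old_block, std::size_t size) {
  if (size == 0) {
    size = 1;  // sidestep the implementation-defined size==0 case
  }
  return std::realloc(old_block, size);
}

// Usage sketch: shrinking to zero still yields a pointer that can be freed.
void demo_shrink_to_zero() {
  void* p = std::malloc(16);
  assert(p != nullptr);
  p = realloc_zero_as_one(p, 0);
  assert(p != nullptr);  // a NULL here would indicate OOM
  std::free(p);
}
```

With this shape, a caller that interprets NULL as OOM stays correct for
size 0 as well, at the cost of a one-byte allocation.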


> I welcome opinions from others on this.
>
> David
>
> PS. I will be traveling soon and unable to respond to emails until
> Wednesday afternoon at the earliest.
>
> Thank you,
>>
>> Thomas
>>
>>
Kind Regards, Thomas

From david.holmes at oracle.com  Tue Oct 25 05:55:36 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 25 Oct 2016 15:55:36 +1000
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for
	input size=0
In-Reply-To: <CAA-vtUx6bUA8bA0as0B2Pn3Ws1u_e4iYgiMvniu=Ev8MHLAuvg@mail.gmail.com>
References: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
	<d4e088bb-7b5e-94f3-84dc-f0a3deca1f20@oracle.com>
	<CAA-vtUx6bUA8bA0as0B2Pn3Ws1u_e4iYgiMvniu=Ev8MHLAuvg@mail.gmail.com>
Message-ID: <7241ac80-0d65-bd4e-2908-59cbe3e65ab5@oracle.com>

On 25/10/2016 3:50 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Tue, Oct 25, 2016 at 6:22 AM, David Holmes <david.holmes at oracle.com> wrote:
>
>     Hi Thomas,
>
>     On 24/10/2016 11:12 PM, Thomas Stüfe wrote:
>
>         Dear all,
>
>         please check this tiny bug fix.
>
>         Bug report:
>         https://bugs.openjdk.java.net/browse/JDK-8168542
>
>         Webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/
>
>         In short, this fixes a corner case for os::realloc() which currently
>         returns NULL if input size is zero.
>
>         But as we have coding which interprets a return value of NULL as
>         OOM (See
>         ReallocateHeap()), this is not a good solution. It is also
>         inconsistent
>         with how os::malloc() deals with the same situation and
>         potentially with
>         the way the native C-Runtime deals with it (currently, in a
>         debug build we
>         will return NULL in case of size=0 whereas in the release build
>         we just
>         call the native ::realloc() and return whatever it returns.)
>
>
>     Sorry but I do not like this. A native realloc with a size of zero
>     and a non-NULL ptr acts like free(ptr). Our realloc does not do
>     that. A native malloc that receives a size of zero "returns either
>     NULL, or a unique pointer value that can later be successfully
>     passed to free()". Our os::malloc returns 1 - and I see nothing that
>     indicates that can successfully be passed to os::free.
>
>     So while the current handling of size==0 is a bit inconsistent and
>     unclear, it is even less clear that returning 1 is a reasonable
>     thing to do.
>
>     To me passing a size of zero (unless expecting it to act like a
>     free!) is a bug that should be handled in the caller.
>
>
> You completely lost me here. I do not return 1, neither does os::malloc().

Sorry some kind of visual-neural short-circuit. :)

Okay size 0 becomes size 1.

Let me just recant my email and let someone else step in. Thanks for the 
detailed explanation below.

David

> os::realloc behaviour now is:
> - in debug: if size==0, do not free but return NULL immediately. Which
> is not a standard behaviour of a normal ::realloc()
> - in release builds: if size==0, do whatever the C-Runtime realloc()
> does, so it will always free() but either return NULL or a unique pointer.
>
> So, in debug build the behaviour is unexpected and - assuming
> os::realloc() mimics ::realloc() - wrong. In release builds it will be
> correct but unknown.
>
> My patch changes this behaviour to always - in both release and debug
> builds - free() and return a unique pointer. By setting the size to 1:
>
> - for the debug build, I will go thru the normal path - allocating 1
> byte of memory (plus NMT/guard pages overhead), copying 1 byte of
> payload, then freeing the original memory and returning the alloced 1
> byte - which is unique and can be passed to os::free().
> - for the release build I will force the behaviour to be a realloc to
> size 1 and thus remove the ambiguity introduced by the native realloc.
>
>
>     I welcome opinions from others on this.
>
>     David
>
>     PS. I will be traveling soon and unable to respond to emails until
>     Wednesday afternoon at the earliest.
>
>         Thank you,
>
>         Thomas
>
>
> Kind Regards, Thomas

From Alan.Burlison at oracle.com  Tue Oct 25 08:40:52 2016
From: Alan.Burlison at oracle.com (Alan Burlison)
Date: Tue, 25 Oct 2016 09:40:52 +0100
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for
	input size=0
In-Reply-To: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
References: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
Message-ID: <95dcaecb-4db1-efec-387b-ccd4954e8c9f@oracle.com>

On 24/10/2016 14:12, Thomas Stüfe wrote:

> In short, this fixes a corner case for os::realloc() which currently
> returns NULL if input size is zero.

For reference, here's what POSIX.1-2008 says:

malloc:

"If the space cannot be allocated, a null pointer shall be returned. If 
the size of the space requested is 0, the behavior is 
implementation-defined: either a null pointer shall be returned, or the 
behavior shall be as if the size were some non-zero value, except that 
the behavior is undefined if the returned pointer is used to access an 
object."

realloc:

"If the size of the space requested is zero, the behavior shall be 
implementation-defined: either a null pointer is returned, or the 
behavior shall be as if the size were some non-zero value, except that 
the behavior is undefined if the returned pointer is used to access an 
object. If the space cannot be allocated, the object shall remain 
unchanged."

C11 says basically the same thing.
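
As a rough illustration of what that wording means for callers, here is a
small sketch. The helper names (malloc_zero_returns_null, allocation_failed,
malloc_or_die) are made up for this example and are not HotSpot or libc
functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Probes which of the two permitted behaviours the local C runtime chose
// for malloc(0). Either answer is conforming.
bool malloc_zero_returns_null() {
  void* p = std::malloc(0);
  bool is_null = (p == nullptr);
  std::free(p);  // free(NULL) is a no-op, so this is safe either way
  return is_null;
}

// Portable interpretation of a result: NULL signals failure only when the
// requested size was non-zero.
bool allocation_failed(const void* result, std::size_t requested_size) {
  return result == nullptr && requested_size != 0;
}

// Usage sketch: allocate and treat NULL as fatal only for non-zero sizes.
void* malloc_or_die(std::size_t size) {
  void* p = std::malloc(size);
  assert(!allocation_failed(p, size));
  return p;
}
```

A caller that simply equates NULL with out-of-memory is therefore only
correct if it never passes a size of zero.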

-- 
Alan Burlison
--

From tobias.hartmann at oracle.com  Tue Oct 25 12:43:07 2016
From: tobias.hartmann at oracle.com (Tobias Hartmann)
Date: Tue, 25 Oct 2016 14:43:07 +0200
Subject: [9] RFR(S): 8164612: NoSuchMethodException when method name
	contains NULL or Latin-1 supplement character
In-Reply-To: <5808692E.9090905@oracle.com>
References: <580622AE.9080802@oracle.com> <5808692E.9090905@oracle.com>
Message-ID: <580F535B.8040205@oracle.com>

[Ping]

As Coleen requested, I executed the JCK/VM tests (see comment in bug).

Best regards,
Tobias

On 20.10.2016 08:50, Tobias Hartmann wrote:
> Hi,
> 
> since this is affecting runtime code, could someone from the runtime team please have a look as well?
> 
> Thanks,
> Tobias
> 
> On 18.10.2016 15:25, Tobias Hartmann wrote:
>> Hi,
>>
>> please review the following patch:
>> https://bugs.openjdk.java.net/browse/JDK-8164612
>> http://cr.openjdk.java.net/~thartmann/8164612/webrev.00/
>>
>> The test executes JavaScript code that defines getter methods containing Latin-1 supplement characters (0x80 - 0xFF). Those methods are registered at runtime through anonymous classes via Unsafe_DefineAnonymousClass. When calling a method, the VM fails with a NoSuchMethodException in MethodHandles::resolve_MemberName().
>>
>> The failure happens while looking up the method name symbol in java_lang_String::as_symbol_or_null() [1]:
>> 544    jbyte* position = (length == 0) ? NULL : value->byte_at_addr(0);
>> 545    const char* base = UNICODE::as_utf8(position, length);
>> 546    return SymbolTable::probe(base, length);
>>
>> If Compact Strings is enabled, we pass the Latin-1 encoded method name to UNICODE::as_utf8() and probe for the UTF-8 String in the SymbolTable. Since the Latin-1 method name contains non-ASCII characters, the length of the resulting UTF-8 String is larger (characters >= 0x80 are encoded as two bytes in UTF-8). However, we pass the shorter Latin-1 length to SymbolTable::probe() resulting in a lookup failure.
>>
>> I fixed this by passing the String length by reference to UNICODE::as_utf8(). I also refactored the related code in utf8.cpp, added comments and updated the callers.
>>
>> Tested with regression test and hs-comp PIT RBT (running).
>>
>> Thanks,
>> Tobias,
>>
>> [1] http://hg.openjdk.java.net/jdk9/hs/hotspot/file/652537a80080/src/share/vm/classfile/javaClasses.cpp#l535
>>
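
The length mismatch described in the quoted mail can be reproduced with a
small standalone sketch. The function names below are hypothetical and are
not the ones in utf8.cpp; the sketch also uses standard UTF-8 and does not
model the JVM's modified UTF-8, which additionally encodes the NUL character
in two bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Number of bytes needed to encode `length` Latin-1 bytes as standard UTF-8:
// bytes below 0x80 stay one byte, bytes 0x80..0xFF become two bytes.
std::size_t utf8_length_of_latin1(const unsigned char* latin1, std::size_t length) {
  assert(latin1 != nullptr || length == 0);
  std::size_t result = 0;
  for (std::size_t i = 0; i < length; ++i) {
    result += (latin1[i] < 0x80) ? 1 : 2;
  }
  return result;
}

// Converts Latin-1 bytes to a standard UTF-8 string.
std::string latin1_to_utf8(const unsigned char* latin1, std::size_t length) {
  std::string out;
  out.reserve(utf8_length_of_latin1(latin1, length));
  for (std::size_t i = 0; i < length; ++i) {
    unsigned char c = latin1[i];
    if (c < 0x80) {
      out.push_back(static_cast<char>(c));
    } else {
      out.push_back(static_cast<char>(0xC0 | (c >> 6)));    // leading byte
      out.push_back(static_cast<char>(0x80 | (c & 0x3F)));  // continuation byte
    }
  }
  return out;
}
```

For a four-character name ending in 0xE9, the UTF-8 form is five bytes long,
so a lookup that passes the Latin-1 length of four would compare against a
truncated key.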

From rachel.protacio at oracle.com  Tue Oct 25 15:14:01 2016
From: rachel.protacio at oracle.com (Rachel Protacio)
Date: Tue, 25 Oct 2016 11:14:01 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
In-Reply-To: <91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
References: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
	<91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
Message-ID: <ae0d0c66-ae53-c5e6-f0e8-a7c252cb0c4f@oracle.com>

Hi,

Thanks for taking a look. I think in this particular case the issue was 
that the nested ResourceMarks were around code that wrote to an 
existing outputStream. So the nesting per se isn't what was 
wrong; the issue was placing a ResourceMark around writes to a stream 
whose contents were still needed after that RM went out of scope. So 
line 796 is good because its functionality is self-contained, and the 
ones I deleted were bad because they interfered with the functionality 
of the caller code. (Can someone corroborate this assessment?)

However, as those functions still need RMs in general somewhere up the 
line, I can add a comment of the form

    // The caller of print_slot() (or one of its callers)
    // must use a ResourceMark in order to correctly free the result.

for print_slot(), print_method(), and print_on() at line 590. Does that 
sound good?
Rachel

On 10/25/2016 1:41 AM, David Holmes wrote:
> Hi Rachel,
>
> On 25/10/2016 6:17 AM, Rachel Protacio wrote:
>> Hi,
>>
>> Please review this small fix, which removes two nested ResourceMark's
>> that were causing problems with defaultmethods logging.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>>
>> Tested with JPRT.
>
> It is tricky to determine who has responsibility for positioning the 
> ResourceMarks. Looking at this call chain it initially appeared to me 
> that we now had a missing RM for the code at line #80:
>
> 813       slot->print_on(logstream);
> =>
> 590   void print_on(outputStream* str) const {
> 591     print_slot(str, name(), signature());
> 592   }
> =>
> 79 static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
> 80   str->print("%s%s", name->as_C_string(), signature->as_C_string());
> 81 }
>
> but we actually have a RM higher up at:
>
> 787   ResourceMark rm(THREAD);
>
> so that is good, but then we also have a nested ResourceMark further 
> down:
>
>  795   if (log_is_enabled(Debug, defaultmethods)) {
>  796     ResourceMark rm;
>
> I must admit I'm unclear if ResourceMarks should never be nested, or 
> should be nested "carefully" - and if the latter exactly what that 
> means and how to recognize it.
>
> Thanks,
> David
> -----
>
>> Thanks!
>> Rachel


From rachel.protacio at oracle.com  Tue Oct 25 15:19:42 2016
From: rachel.protacio at oracle.com (Rachel Protacio)
Date: Tue, 25 Oct 2016 11:19:42 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
In-Reply-To: <580EB158.7020509@oracle.com>
References: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
	<580EB158.7020509@oracle.com>
Message-ID: <d0634fdc-ae38-2b1b-5776-3860b322bd57@oracle.com>

Hi,

Thanks for looking - yes, the issue is that both functions are used as 
sub-components of a larger printing function so the RMs should only 
exist at the top level where the stream is created.
Rachel

On 10/24/2016 9:11 PM, Max Ockner wrote:
> Rachel,
> Did you mean to remove both ResourceMarks?
> (I suppose if it passes JPRT then it might not matter)
> Max
> On 10/24/2016 4:17 PM, Rachel Protacio wrote:
>> Hi,
>>
>> Please review this small fix, which removes two nested ResourceMark's 
>> that were causing problems with defaultmethods logging.
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>>
>> Tested with JPRT.
>>
>> Thanks!
>> Rachel
>


From harold.seigel at oracle.com  Tue Oct 25 19:58:23 2016
From: harold.seigel at oracle.com (harold seigel)
Date: Tue, 25 Oct 2016 15:58:23 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
In-Reply-To: <ae0d0c66-ae53-c5e6-f0e8-a7c252cb0c4f@oracle.com>
References: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
	<91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
	<ae0d0c66-ae53-c5e6-f0e8-a7c252cb0c4f@oracle.com>
Message-ID: <4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com>

Hi Rachel,

I think that the ResourceMarks that you removed were the correct ones 
to remove. Based on the assert below, my understanding is that all calls 
to stringStream::write() on a particular Stream need to be done under the 
same ResourceMark with which the Stream was created.  Otherwise, this 
assert in stringStream::write() will trigger:

       assert(rm == NULL || Thread::current()->current_resource_mark() == rm,
              "StringStream is re-allocated with a different ResourceMark..." ...)

These two ResourceMarks needed to be removed because their outputStream 
was constructed with a caller's ResourceMark.  If they specified their 
own ResourceMark then their calls to print(), which eventually calls 
stringStream::write(), would cause the assert to trigger.

    static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
       ResourceMark rm;
       str->print("%s%s", name->as_C_string(), signature->as_C_string());
    }

    static void print_method(outputStream* str, Method* mo, bool with_class=true) {
       ResourceMark rm;
       if (with_class) {
         str->print("%s.", mo->klass_name()->as_C_string());
       }
       print_slot(str, mo->name(), mo->signature());
    }


I think that having a ResourceMark in code like this is okay because 
debug_stream() probably constructs a new Stream object.

       if (log_is_enabled(Debug, defaultmethods)) {
         log_debug(defaultmethods)("Slots that need filling:");
         ResourceMark rm;
         outputStream* logstream = Log(defaultmethods)::debug_stream();
         streamIndentor si(logstream);
         for (int i = 0; i < slots->length(); ++i) {
           logstream->indent();
           slots->at(i)->print_on(logstream);
           logstream->cr();
         }
       }
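
The rule discussed above can be illustrated with a toy model. The types
below (ToyResourceMark, ToyStringStream) are invented for this sketch and
only imitate the shape of the check in the assert; they are not HotSpot's
real classes, and the real check only matters when the stream's buffer
actually has to be re-allocated.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: imitates the idea, not HotSpot's real classes.
struct ToyResourceMark;
static ToyResourceMark* g_current_mark = nullptr;  // innermost active mark

struct ToyResourceMark {
  ToyResourceMark* previous;
  ToyResourceMark() : previous(g_current_mark) { g_current_mark = this; }
  ~ToyResourceMark() { g_current_mark = previous; }
};

struct ToyStringStream {
  ToyResourceMark* created_under;  // mark that was active at construction
  ToyStringStream() : created_under(g_current_mark) {}

  // Mirrors the condition in the quoted assert: growing is only fine under
  // the mark the stream was created with (or if there was none).
  bool can_grow_here() const {
    return created_under == nullptr || created_under == g_current_mark;
  }

  // Where the real code would assert when its buffer must grow.
  void grow() const { assert(can_grow_here()); }
};
```

In this model, a stream created under an outer mark cannot grow inside a
nested mark, while a stream created inside the nested mark can; that matches
the distinction between the removed marks and the one at line 796.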

Harold


On 10/25/2016 11:14 AM, Rachel Protacio wrote:
> Hi,
>
> Thanks for taking a look. I think in this particular case the issue 
> was that the nested ResourceMark's were around code that affected an 
> existing outputStream. So in fact the nesting per se isn't what was 
> wrong, the issue was adding a ResourceMark in the middle of a resource 
> that still needed the content after it went out of scope of the RM. So 
> line 796 is good because its functionality is self-contained, and the 
> ones I deleted were bad because they interfered with the functionality 
> of the caller code. (Can someone corroborate this assessment?)
>
> However, as those functions still need RMs in general somewhere up the 
> line, I can add a comment of the form
>
>    // The caller of print_slot() (or one of its callers)
>    // must use a ResourceMark in order to correctly free the result.
>
> for print_slot(), print_method(), and print_on() at line 590. Does 
> that sound good?
> Rachel
>
> On 10/25/2016 1:41 AM, David Holmes wrote:
>> Hi Rachel,
>>
>> On 25/10/2016 6:17 AM, Rachel Protacio wrote:
>>> Hi,
>>>
>>> Please review this small fix, which removes two nested ResourceMark's
>>> that were causing problems with defaultmethods logging.
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
>>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>>>
>>> Tested with JPRT.
>>
>> It is tricky to determine who has responsibility for positioning the 
>> ResourceMarks. Looking at this call chain it initially appeared to me 
>> that we now had a missing RM for the code at line #80:
>>
>> 813       slot->print_on(logstream);
>> =>
>> 590   void print_on(outputStream* str) const {
>> 591     print_slot(str, name(), signature());
>> 592   }
>> =>
>> 79 static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
>> 80   str->print("%s%s", name->as_C_string(), signature->as_C_string());
>> 81 }
>>
>> but we actually have a RM higher up at:
>>
>> 787   ResourceMark rm(THREAD);
>>
>> so that is good, but then we also have a nested ResourceMark further 
>> down:
>>
>>  795   if (log_is_enabled(Debug, defaultmethods)) {
>>  796     ResourceMark rm;
>>
>> I must admit I'm unclear if ResourceMarks should never be nested, or 
>> should be nested "carefully" - and if the latter exactly what that 
>> means and how to recognize it.
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks!
>>> Rachel
>


From chris.plummer at oracle.com  Tue Oct 25 20:19:11 2016
From: chris.plummer at oracle.com (Chris Plummer)
Date: Tue, 25 Oct 2016 13:19:11 -0700
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for
	input size=0
In-Reply-To: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
References: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
Message-ID: <c7d92209-399c-296b-28bc-19ed5219b49b@oracle.com>

Hi Thomas,

I don't exactly like the behavior our current os::malloc() and 
os::realloc() are attempting, which is to hide the native malloc and 
realloc inconsistencies with size 0 by always making it size 1. Like 
David said, it should be considered a caller bug when this happens. But 
since it already seems to be baked in, and fixing all callers is way 
outside the scope of this bug, your fix seems to be the best approach.

You could actually move your fix inside the #ifndef ASSERT, since it 
will be redundant for the ASSERT case (it's already handled in 
os::malloc). However, it's probably cleaner before the #ifndef ASSERT, 
and makes it clear that, no matter what, a size of 0 is set to 1.
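
A minimal sketch of the normalization discussed above, assuming a
hypothetical wrapper named checked_realloc() around the C library. It is
not the os::realloc() code from the webrev, and the debug-build
bookkeeping behind #ifndef ASSERT is left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper, not the HotSpot os::realloc() implementation.
// A zero-byte request is bumped to one byte before anything else happens,
// so a NULL result can only mean that the allocation itself failed.
void* checked_realloc(void* old_block, std::size_t size) {
  if (size == 0) {
    size = 1;  // normalize up front, the same way for every build flavor
  }
  return std::realloc(old_block, size);
}
```

With this shape a caller that treats NULL as OOM is no longer misled by
a size of 0, whatever the native ::realloc() would have returned for it.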

BTW, you can't push this to 9 since it's a p4. It looks like the Fix 
Version is already set to 10, so I assume that's where it is going.

cheers,

Chris

On 10/24/16 6:12 AM, Thomas Stüfe wrote:
> Dear all,
>
> please check this tiny bug fix.
>
> Bug report:
> https://bugs.openjdk.java.net/browse/JDK-8168542
>
> Webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/
>
> In short, this fixes a corner case for os::realloc() which currently
> returns NULL if input size is zero.
>
> But as we have coding which interprets a return value of NULL as OOM (See
> ReallocateHeap()), this is not a good solution. It is also inconsistent
> with how os::malloc() deals with the same situation and potentially with
> the way the native C-Runtime deals with it (currently, in a debug build we
> will return NULL in case of size=0 whereas in the release build we just
> call the native ::realloc() and return whatever it returns.)
>
> Thank you,
>
> Thomas


From mandy.chung at oracle.com  Tue Oct 25 23:10:40 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Tue, 25 Oct 2016 16:10:40 -0700
Subject: Request Review: JDK-6479237  (cl) Add support for classloader names
Message-ID: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>

Webrev at:
   http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/

Specdiff:
   http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html

This is a long-standing RFE for adding support for class
loader names.  It's #ClassLoaderNames on the JSR 376 issue
list, and the proposal [1] has been implemented in jake
for some time.  This patch brings this change to jdk9.

A short summary:
- New constructors are added in ClassLoader, SecureClassLoader
  and URLClassLoader to specify the class loader name.

- New ClassLoader::getName and StackTraceElement::getClassLoaderName
  methods

- StackTraceElement::toString is updated to include the name
  of the class loader and module of that frame in this format:
     <loader>/<module>/<fully-qualified-name>(<src>:<line>)

The detail is in StackTraceElement::buildLoaderModuleClassName,
which compresses the output string for cases when the loader
has no name or the module is the unnamed module.  Another thing
to mention is that the VM sets the Class object when filling in
the stack trace of a Throwable object.  The library then
builds a String from the Class object for serialization purposes.

Mandy
[1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html

From serguei.spitsyn at oracle.com  Wed Oct 26 03:22:28 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Tue, 25 Oct 2016 20:22:28 -0700
Subject: RFR(S): JDK-8165496 assert(_exception_caught == false) failed:
	_exception_caught is out of phase
In-Reply-To: <2797c9e2-fc5d-8892-d426-d9ae9626e2b3@oracle.com>
References: <2797c9e2-fc5d-8892-d426-d9ae9626e2b3@oracle.com>
Message-ID: <35b7d0b6-abd9-d06b-18b7-6024d324c37a@oracle.com>

Hi Dmitry,

Sorry, I do not see how this fixes the problem.
What are you trying to solve by calling the set_exception_detected() 
conditionally?
The _exception_detected flag at that point has to be set anyway, right?

The root cause of this issue is that the assert is unreasonable and does 
not solve anything.
So the assert has to be replaced with clearing the 
_exception_caught flag.
Please read my comment in the bug report.
I also thought that you agreed with this conclusion. :)

Thanks,
Serguei


On 10/21/16 01:42, Dmitry Samersoff wrote:
> Everybody,
>
> Please review a small modification of the fix for JDK-8134434:
>
> http://cr.openjdk.java.net/~dsamersoff/JDK-8165496/webrev.04/
>
> It's possible that we come to rethrow_C when _exception_caught is
> already cleared. We do not need to set exception_detected in this
> case.
>
> -Dmitry
>


From thomas.stuefe at gmail.com  Wed Oct 26 05:28:16 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 26 Oct 2016 07:28:16 +0200
Subject: RFR(xs): 8168542: os::realloc should return a valid pointer for
	input size=0
In-Reply-To: <c7d92209-399c-296b-28bc-19ed5219b49b@oracle.com>
References: <CAA-vtUxeYkd5qdnkC+AX1S+7zT+=gRJ-ssYV5tk-oXWzCSgXxw@mail.gmail.com>
	<c7d92209-399c-296b-28bc-19ed5219b49b@oracle.com>
Message-ID: <CAA-vtUzXkSvA0NHpYxjQiJGD-3hHPrjaOY=gX1g=JuEusCFrZQ@mail.gmail.com>

Hi Chris,

thank you for the review! I'll put this on the growing pile of jdk10 fixes
and hope we will have a repo soon to fix this in.

Kind Regards, Thomas

On Tue, Oct 25, 2016 at 10:19 PM, Chris Plummer <chris.plummer at oracle.com>
wrote:

> Hi Thomas,
>
> I don't exactly like the behavior our current os::malloc() and
> os::realloc() is attempting, which is to hide the native malloc and realloc
> inconsistencies with size 0 by always making it size 1. Like David said, it
> should be considered a caller bug when this happens. But since it already
> seems to be baked in, and fixing all callers is way outside the scope of
> this bug, your fix seems to be the best approach.
>
> You could actually move your fix inside the #ifndef ASSERT, since it will
> be redundant for the ASSERT case (it's already handled in os::malloc).
> However, it's probably cleaner before the #ifndef ASSERT, and makes it
> clear that no matter what the size is set to 1.
>
> BTW, you can't push this to 9 since it's a p4. It looks like the Fix
> Version is already set to 10, so I assume that's where it is going.
>
> cheers,
>
> Chris
>
>
> On 10/24/16 6:12 AM, Thomas Stüfe wrote:
>
>> Dear all,
>>
>> please check this tiny bug fix.
>>
>> Bug report:
>> https://bugs.openjdk.java.net/browse/JDK-8168542
>>
>> Webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8168542-os_realloc_size_0/webrev.00/webrev/
>>
>> In short, this fixes a corner case for os::realloc() which currently
>> returns NULL if input size is zero.
>>
>> But as we have coding which interprets a return value of NULL as OOM (See
>> ReallocateHeap()), this is not a good solution. It is also inconsistent
>> with how os::malloc() deals with the same situation and potentially with
>> the way the native C-Runtime deals with it (currently, in a debug build we
>> will return NULL in case of size=0 whereas in the release build we just
>> call the native ::realloc() and return whatever it returns.)
>>
>> Thank you,
>>
>> Thomas
>>
>
>
>
>

From chris.plummer at oracle.com  Wed Oct 26 07:00:20 2016
From: chris.plummer at oracle.com (Chris Plummer)
Date: Wed, 26 Oct 2016 00:00:20 -0700
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
Message-ID: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>

Hi Thomas,

See JDK-8156821. I'm curious as to how your changes will impact it, 
since David says you can't interrupt a thread blocked trying to acquire 
a mutex. I suspect that means this enhancement won't help in this case, 
and presumably in general you are not fixing the issue of error 
reporting getting deadlocked, or maybe I'm misinterpreting what David 
said in JDK-8156821.

Otherwise overall your changes look good, but I have a few comments. 
Also, since this is an enhancement, it needs to wait for JDK 10.

I think your test will fail for product builds. You should add 
"@requires vm.debug == true". Also, java files use 4 char indentation, 
not 2 like we use in hotspot C/C++ code. Lastly, it should only have a 
2016 copyright.

A couple of files need the copyright updated to 2016.

Why do set_to_now() and get_timestamp() need to be atomic, and what are 
the consequences of cx8 not being supported?

1282         st->print_raw_cr(buffer);
1283         st->cr();

The old code had an additional st->cr() before the above lines. I assume 
you removed it intentionally.

Is there a reason why you decided to only allow one step to time out? 
What if the cause of a timeout in a step also impacts other steps, or is 
that not common when we see timeouts?

It's not clear to me why you changed a couple of os::sleep() calls to 
os::naked_short_sleep(), and the rationale for the sleep periods. Can 
you please explain?

thanks,

Chris

On 10/12/16 9:55 PM, Thomas Stüfe wrote:
> Dear all,
>
> please take a look at the following fix:
>
> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
> webrev:
> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>
> ---
>
> In short, this fix provides the ability to cancel hanging error reporting
> steps. This uses the same code paths secondary error handling uses during
> error reporting. With this patch, steps which take too long will be
> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like this:
>
> 4 [timeout occurred during error reporting in step "<stepname>"] after xxxx
> ms.
> 5
>
> and we now also get a finish message in the hs-err file if we hit the
> ErrorLogTimeout and error reporting will stop altogether:
>
> 6 ------ Timout during error reporting after xxx ms. ------
>
> (in addition to the "time expired, abort" message the WatcherThread writes
> to stderr)
>
> ---
>
> This is something which bugged us for a long time, because we rely heavily
> on the hs_err files for error analysis at customer sites, and there are a
> number of reasons why one step may hang and prevent the follow-up steps
> from running.
>
> It works like this:
>
> Before, when error reporting started, the WatcherThread was waiting for
> ErrorLogTimeout seconds, then would stop the VM.
>
> Now, the WatcherThread periodically pings error reporting, which checks if
> the last step did timeout. If it does, it sends a signal to the reporting
> thread, and the thread will continue with the next step. This follows the
> same path as secondary crash handling.
>
> Some implementation details:
>
> On Posix platforms, to interrupt the thread, I use pthread_kill. This means
> I must know the pthread id of the reporting thread, which I now store at
> the beginning of error reporting. We already store the reporting thread id
> in first_error_tid, but that I cannot use, because it gets set by
> os::current_thread_id(), which is not always the pthread id. Should we ever
> switch to only using pthread id for posix platforms, this coding can be
> simplified.
>
> On Windows, there is unfortunately no easy way to interrupt a
> non-cooperative thread. I would need a way to cause a SEH inside the target
> thread, which then would get handled by secondary error handling like on
> Posix platforms, but that is not easy. It is doable - one can suspend the
> thread, modify the thread context in a way that it will crash upon resume.
> But that felt a bit heavyweight for this problem. So on windows, timeout
> handling still works (after ErrorLogTimeout the VM gets shut down), but
> error reporting steps are not interruptable. If we feel this is important,
> this can be added later.
>
> Kind Regards, Thomas


From thomas.stuefe at gmail.com  Wed Oct 26 14:45:40 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Wed, 26 Oct 2016 16:45:40 +0200
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
Message-ID: <CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>

Hi Chris,

Thanks for the review!

New webrev:
http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/

Comments inline.

On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer <chris.plummer at oracle.com>
wrote:

> Hi Thomas,
>
> See JDK-8156821. I'm curious as to how your changes will impact it, since
> David says you can't interrupt a thread blocked trying to acquire mutex. I
> suspect that means this enhancement won't help in this case, and presumably
> in general you are not fixing the issue of error reporting getting
> deadlocked, or maybe I'm misinterpreting what David said in JDK-8156821.
>

Not sure what exactly David meant with "You can't "interrupt" a thread that
is blocked trying to acquire a mutex." Maybe he can elaborate :)

My guesses:

1) If he meant "you cannot interrupt a thread blocking in
pthread_mutex_lock()" - not true, you can and my patch works just fine in
this situation. Just tested again, to be sure. This covers crashes in
sections guarded by pthread_mutex, which then try to reacquire the lock in
the error handler.

2) If he meant "you cannot interrupt malloc if it is executing a system
call in the linux kernel" - that may be. I am not a Linux kernel expert but
would have thought that syscalls may block if interrupts are disabled for
certain lengths of time by the syscall author. But in that case I would have
expected the process to hang too and not be killable? Again, I am no
expert.


>
> Otherwise overall your changes look good, but I have a few comments. Also,
> since this is an enhancement, it needs to wait for JDK 10.
>
> I think your test will fail for product builds. You should add "@requires
> vm.debug == true". Also, java files use 4 char indentation, not 2 like we
> use in hotspot C/C++ code. Lastly, it should only have a 2016 copyright.
>
>
Thank you for the hints. Did fix all that. Note that I had disabled the
test for product builds in the code (!Platform.isDebugBuild()) but I added
the vm.debug tag as well as you suggested.


> A couple of files need the copyright updated to 2016.
>
> Why do set_to_now() and get_timestamp() need to be atomic, and what are
> the consequences of cx8 not being supported?
>
>
The error reporting thread sets the timestamp on each STEP start, and the
timestamp is read from another thread, the WatcherThread. Timestamp is
64bit. I wanted to make sure the 64bit value is written and read
atomically, especially on 32bit platforms.

But then, I had to check whether 64bit atomic stores/loads are even
supported by the platform (I actually did not find a 32bit platform
without 64bit atomics, but the comment in atomic.hpp is pretty insistent
and I did not want to risk regressions for other platforms).

Well, if no cx8 support is available, I pretty much just give up and read
and write timestamps directly. As I said, I am not sure if this code path
ever gets executed.

Maybe I was overthinking all this and just reading and writing the (C++
volatile) jlongs would have been enough, but I wanted to prevent sporadic
test errors because of incompletely read 64bit values.
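
A rough sketch of the idea, assuming std::atomic in place of the HotSpot
Atomic class and using hypothetical names (step_start_millis,
record_step_start, step_timed_out) that do not appear in the patch:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the shared timestamp; 0 means "no step running".
static std::atomic<std::int64_t> step_start_millis{0};

// Reporting thread: called at the start of every error reporting step.
void record_step_start(std::int64_t now_millis) {
  step_start_millis.store(now_millis, std::memory_order_release);
}

// Watcher thread: the 64-bit value is loaded in one piece, so a half-written
// timestamp can never be observed, even on a 32-bit platform.
bool step_timed_out(std::int64_t now_millis, std::int64_t timeout_millis) {
  const std::int64_t start = step_start_millis.load(std::memory_order_acquire);
  return start != 0 && (now_millis - start) > timeout_millis;
}
```

In this sketch, step_start_millis.is_lock_free() plays roughly the role of
the cx8 check: it reports whether the platform provides the 64-bit atomic
access natively.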


> 1282         st->print_raw_cr(buffer);
> 1283         st->cr();
>
> The old code had an additional st->cr() before the above lines. I assume
> you removed it intentionally.
>
>
I hope I preserved the number of cr() calls. At least that was my intention:

1260       outputStream* const st = log.is_open() ? &log : &out;
1261       st->cr();

...

and then on every path, a cr (or print_raw_cr) at the end. Where do you see
the missing cr()?


> Is there a reason why you decided to only allow one step to timeout. What
> if the cause of a timeout in a step also impacts other steps, or is that
> not common when we see timeouts?
>
>
That is mostly guesswork. In our (SAP) code we allow for four steps (so
ErrorLogTimeout/4 as step timeout) and additionally allow for "steps known
to be long" where timeouts are disabled altogether. But we also have more
complicated error reporting steps, so when porting the patch to OpenJDK, I
felt the complexity was unneeded.

I think in general you will only have one misbehaving step, but you are
right, more than one step may time out if e.g. the file system is slow. I'm
open to suggestions: the timeout value should be large enough not to be
hit for "normal slow steps" while still leaving enough room for other steps
to finish. What do you think a reasonable timeout value would be?
ErrorLogTimeout/4?


> It's not clear to me why you changed a couple of os::sleep() calls to
> os::naked_short_sleep(), and the rationale for the sleep periods. Can you
> please explain?
>
>
Because os::sleep() does a lot of work under the hood and relies on a bit
of VM infrastructure. I think that is not a good idea in error situations
where potentially everything may be broken already. You want to step
lightly and really only do a naked system sleep. About the sleep periods,
os::naked_short_sleep() has a built-in maximum value of 1000ms, which I have
to stay below to not hit the assert. I used 999ms as the longest interval I
am allowed to sleep nakedly. And after the timeout is hit and before the
WatcherThread calls os::abort, I again sleep 200ms to give the error
reporter thread time to write the "error log aborted due to timeout" into
the error log and to flush the error log. Those 200ms are just guesswork.
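
To illustrate the 999ms choice, here is a small sketch that splits a longer
wait into chunks that each stay below the limit. The helper name
split_into_short_sleeps() and its parameters are assumptions made for this
example; the patch itself simply sleeps in the watcher loop:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: break total_ms into sleep intervals that are each
// strictly shorter than limit_ms, so every interval would pass a check of
// the form "ms < limit_ms".
std::vector<int> split_into_short_sleeps(int total_ms, int limit_ms = 1000) {
  std::vector<int> chunks;
  const int max_chunk = limit_ms - 1;  // 999 for the default limit
  while (total_ms > 0) {
    const int chunk = total_ms < max_chunk ? total_ms : max_chunk;
    chunks.push_back(chunk);
    total_ms -= chunk;
  }
  return chunks;
}
```

A caller would then issue one naked sleep per chunk and check the step
timestamp between chunks.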


> thanks,
>
> Chris
>
>
Thanks for the review!

Kind Regards, Thomas


>
> On 10/12/16 9:55 PM, Thomas Stüfe wrote:
>
>> Dear all,
>>
>> please take a look at the following fix:
>>
>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>> webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>>
>> ---
>>
>> In short, this fix provides the ability to cancel hanging error reporting
>> steps. This uses the same code paths secondary error handling uses during
>> error reporting. With this patch, steps which take too long will be
>> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like
>> this:
>>
>> 4 [timeout occurred during error reporting in step "<stepname>"] after
>> xxxx
>> ms.
>> 5
>>
>> and we now also get a finish message in the hs-err file if we hit the
>> ErrorLogTimeout and error reporting will stop altogether:
>>
>> 6 ------ Timout during error reporting after xxx ms. ------
>>
>> (in addition to the "time expired, abort" message the WatcherThread writes
>> to stderr)
>>
>> ---
>>
>> This is something which bugged us for a long time, because we rely heavily
>> on the hs_err files for error analysis at customer sites, and there are a
>> number of reasons why one step may hang and prevent the follow-up steps
>> from running.
>>
>> It works like this:
>>
>> Before, when error reporting started, the WatcherThread was waiting for
>> ErrorLogTimeout seconds, then would stop the VM.
>>
>> Now, the WatcherThread periodically pings error reporting, which checks if
>> the last step did timeout. If it does, it sends a signal to the reporting
>> thread, and the thread will continue with the next step. This follows the
>> same path as secondary crash handling.
>>
>> Some implementation details:
>>
>> On Posix platforms, to interrupt the thread, I use pthread_kill. This
>> means
>> I must know the pthread id of the reporting thread, which I now store at
>> the beginning of error reporting. We already store the reporting thread id
>> in first_error_tid, but that I cannot use, because it gets set by
>> os::current_thread_id(), which is not always the pthread id. Should we
>> ever
>> switch to only using pthread id for posix platforms, this coding can be
>> simplified.
>>
>> On Windows, there is unfortunately no easy way to interrupt a
>> non-cooperative thread. I would need a way to cause a SEH inside the
>> target
>> thread, which then would get handled by secondary error handling like on
>> Posix platforms, but that is not easy. It is doable - one can suspend the
>> thread, modify the thread context in a way that it will crash upon resume.
>> But that felt a bit heavyweight for this problem. So on windows, timeout
>> handling still works (after ErrorLogTimeout the VM gets shut down), but
>> error reporting steps are not interruptable. If we feel this is important,
>> this can be added later.
>>
>> Kind Regards, Thomas
>>
>
>
>
>

From coleen.phillimore at oracle.com  Wed Oct 26 19:32:13 2016
From: coleen.phillimore at oracle.com (Coleen Phillimore)
Date: Wed, 26 Oct 2016 15:32:13 -0400
Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after
	8154580 and 8154867
In-Reply-To: <f43fe5c7b0f442719f27310e9e7cf710@DEWDFE13DE14.global.corp.sap>
References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>
	<7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com>
	<f43fe5c7b0f442719f27310e9e7cf710@DEWDFE13DE14.global.corp.sap>
Message-ID: <13e4100e-b385-6c71-8222-d36819f2fbdd@oracle.com>


On 10/20/16 4:58 AM, Doerr, Martin wrote:
> Hi Coleen,
>
> thank you very much for reviewing my PPC change.
>
> We had originally spent a lot of effort to get the template interpreter fast. I think startup performance is still important.
> A large amount of less optimized changes will make it slower over time.
> That's why we have reduced reloading constMethod in the PPC implementation. I think this would be good for other platforms as well.
> Maybe we should improve them in 10.

I don't know. I thought load_mirror() made for a nice API.  Does the 
extra indirection matter?  I filed RFE 
https://bugs.openjdk.java.net/browse/JDK-8168795 so we can investigate 
further in 10.

This is approved and I think reviewed so you can check it in anytime.  I 
put a due date of Friday on your bug.   Feel free to change it if that's 
not good.

Thanks,
Coleen

>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Coleen Phillimore
> Sent: Dienstag, 18. Oktober 2016 23:56
> To: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867
>
>
> This seems good.   I think it's a shame to change load_mirror() to
> load_mirror_from_const_method() though because there's load_mirror()
> with the same parameters on all the other platforms and it makes
> platform development a little easier.   But that's up to you because
> you can generate shorter sequences.
>
> Coleen
>
>
> On 10/17/16 12:38 PM, Doerr, Martin wrote:
>> Hi,
>>
>> I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9:
>>
>> 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed).
>>
>> 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev.
>>
>> Webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/
>>
>> Please review.
>>
>> Thanks and best regards,
>> Martin
>>


From david.holmes at oracle.com  Wed Oct 26 19:33:11 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 27 Oct 2016 05:33:11 +1000
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
In-Reply-To: <4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com>
References: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
	<91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
	<ae0d0c66-ae53-c5e6-f0e8-a7c252cb0c4f@oracle.com>
	<4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com>
Message-ID: <151bb79a-8c01-c08e-e37d-094b47182cb7@oracle.com>

Harold/Rachel,

Thanks for clarifying things - I had misconstrued the actual problem.

In summary, we should not use a ResourceObj within a ResourceMark that is 
nested relative to the one under which the ResourceObj was allocated.
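
The hazard can be sketched with toy stand-ins. ToyArena, ToyMark and
ToyStream below are assumptions made for illustration only; they are not
the HotSpot ResourceArea, ResourceMark or stringStream classes, and they
only mimic the "roll back to the mark on scope exit" behavior:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Toy bump allocator: memory is handed out by advancing 'top'.
struct ToyArena {
  std::vector<char> storage = std::vector<char>(1024);
  std::size_t top = 0;
  char* allocate(std::size_t n) {
    assert(top + n <= storage.size());
    char* p = storage.data() + top;
    top += n;
    return p;
  }
};

// Toy mark: remembers 'top' and releases everything above it on scope exit.
struct ToyMark {
  ToyArena& arena;
  std::size_t saved_top;
  explicit ToyMark(ToyArena& a) : arena(a), saved_top(a.top) {}
  ~ToyMark() { arena.top = saved_top; }
};

// Toy stream whose buffer lives in the arena and is regrown there on demand.
struct ToyStream {
  ToyArena& arena;
  char* buf;
  std::size_t cap;
  std::size_t len = 0;
  ToyStream(ToyArena& a, std::size_t initial)
      : arena(a), buf(a.allocate(initial)), cap(initial) {}
  void write(const char* s) {
    const std::size_t n = std::strlen(s);
    if (len + n > cap) {  // regrow: the new buffer belongs to the current mark
      const std::size_t new_cap = (len + n) * 2;
      char* bigger = arena.allocate(new_cap);
      std::memcpy(bigger, buf, len);
      buf = bigger;
      cap = new_cap;
    }
    std::memcpy(buf + len, s, n);
    len += n;
  }
  // True while the whole buffer is still below the arena's current top.
  bool buffer_is_live() const {
    return buf + cap <= arena.storage.data() + arena.top;
  }
};

// Stream created under the outer mark, regrown under a nested mark.
bool nested_mark_breaks_stream() {
  ToyArena arena;
  ToyMark outer(arena);
  ToyStream stream(arena, 4);
  {
    ToyMark nested(arena);
    stream.write("a lengthy method descriptor");  // forces a regrow
  }  // nested mark rolls back and releases the regrown buffer
  return !stream.buffer_is_live();
}

// Same write with only the mark the stream was created under.
bool single_mark_keeps_stream() {
  ToyArena arena;
  ToyMark outer(arena);
  ToyStream stream(arena, 4);
  stream.write("a lengthy method descriptor");
  return stream.buffer_is_live();
}
```

In the toy version the stream silently ends up pointing at released memory;
the assert in stringStream::write() exists to catch that situation early.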

The fix is good and no further changes are needed.

Thanks,
David

On 26/10/2016 5:58 AM, harold seigel wrote:
> Hi Rachel,
>
> I think that the ResourceMarks that you removed were the correct ones.
> My understanding is that based on this assert it looks like all calls to
> stringStream::write() on a particular Stream need to be done using the
> same ResourceMark with which the Stream was created.  Otherwise, this
> assert in stringStream::write() will trigger:
>
>       assert(rm == NULL ||
>              Thread::current()->current_resource_mark() == rm,
>              "StringStream is re-allocated with a different ResourceMark..." ...)
>
> These two ResourceMarks needed to be removed because their outputStream
> was constructed with a caller's ResourceMark.  If they specified their
> own ResourceMark then their calls to print(), which eventually calls
> stringStream::write(), would cause the assert to trigger.
>
>    static void print_slot(outputStream* str, Symbol* name, Symbol*
>    signature) {
>       ResourceMark rm;
>       str->print("%s%s", name->as_C_string(), signature->as_C_string());
>    }
>
>    static void print_method(outputStream* str, Method* mo, bool
>    with_class=true) {
>       ResourceMark rm;
>       if (with_class) {
>         str->print("%s.", mo->klass_name()->as_C_string());
>       }
>       print_slot(str, mo->name(), mo->signature());
>    }
>
>
>
> I think that having a ResourceMark in code like this is okay because
> debug_stream() probably constructs a new Stream object.
>
>       if (log_is_enabled(Debug, defaultmethods)) {
>         log_debug(defaultmethods)("Slots that need filling:");
>         ResourceMark rm;
>         outputStream* logstream = Log(defaultmethods)::debug_stream();
>         streamIndentor si(logstream);
>         for (int i = 0; i < slots->length(); ++i) {
>           logstream->indent();
>           slots->at(i)->print_on(logstream);
>           logstream->cr();
>         }
>       }
>
> Harold
>
>
> On 10/25/2016 11:14 AM, Rachel Protacio wrote:
>> Hi,
>>
>> Thanks for taking a look. I think in this particular case the issue
>> was that the nested ResourceMark's were around code that affected an
>> existing outputStream. So in fact the nesting per se isn't what was
>> wrong, the issue was adding a ResourceMark in the middle of a resource
>> that still needed the content after it went out of scope of the RM. So
>> line 796 is good because its functionality is self-contained, and the
>> ones I deleted were bad because they interfered with the functionality
>> of the caller code. (Can someone corroborate this assessment?)
>>
>> However, as those functions still need RMs in general somewhere up the
>> line, I can add a comment of the form
>>
>>    // The caller of print_slot() (or one of its callers)
>>    // must use a ResourceMark in order to correctly free the result.
>>
>> for print_slot(), print_method(), and print_on() at line 590. Does
>> that sound good?
>> Rachel
>>
>> On 10/25/2016 1:41 AM, David Holmes wrote:
>>> Hi Rachel,
>>>
>>> On 25/10/2016 6:17 AM, Rachel Protacio wrote:
>>>> Hi,
>>>>
>>>> Please review this small fix, which removes two nested ResourceMark's
>>>> that were causing problems with defaultmethods logging.
>>>>
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
>>>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>>>>
>>>> Tested with JPRT.
>>>
>>> It is tricky to determine who has responsibility for positioning the
>>> ResourceMarks. Looking at this call chain it initially appeared to me
>>> that we now had a missing RM for the code at line #80:
>>>
>>> 813       slot->print_on(logstream);
>>> =>
>>> 590   void print_on(outputStream* str) const {
>>> 591     print_slot(str, name(), signature());
>>> 592   }
>>> =>
>>> 79 static void print_slot(outputStream* str, Symbol* name, Symbol*
>>> signature) {
>>> 80   str->print("%s%s", name->as_C_string(), signature->as_C_string());
>>> 81 }
>>>
>>> but we actually have a RM higher up at:
>>>
>>> 787   ResourceMark rm(THREAD);
>>>
>>> so that is good, but then we also have a nested ResourceMark further
>>> down:
>>>
>>>  795   if (log_is_enabled(Debug, defaultmethods)) {
>>>  796     ResourceMark rm;
>>>
>>> I must admit I'm unclear if ResourceMarks should never be nested, or
>>> should be nested "carefully" - and if the latter exactly what that
>>> means and how to recognize it.
>>>
>>> Thanks,
>>> David
>>> -----
>>>
>>>> Thanks!
>>>> Rachel
>>
>

From rachel.protacio at oracle.com  Wed Oct 26 19:59:57 2016
From: rachel.protacio at oracle.com (Rachel Protacio)
Date: Wed, 26 Oct 2016 15:59:57 -0400
Subject: RFR (XS): 8167995: -Xlog:defaultmethods=debug: lengthy method
	descriptor triggers "StringStream is re-allocated with a different
	ResourceMark"
In-Reply-To: <151bb79a-8c01-c08e-e37d-094b47182cb7@oracle.com>
References: <b7b29cec-be78-3ded-6822-5010b8645e4c@oracle.com>
	<91bb7a1d-6aad-ab4d-c249-28dfe371531f@oracle.com>
	<ae0d0c66-ae53-c5e6-f0e8-a7c252cb0c4f@oracle.com>
	<4c9558d8-1484-256d-c0ba-517a9a5a9abf@oracle.com>
	<151bb79a-8c01-c08e-e37d-094b47182cb7@oracle.com>
Message-ID: <0ca26d7b-50f3-0ba1-0edf-24b3dfb79c1d@oracle.com>

Great! Thank you, David and Harold for the reviews. I'll commit.

Rachel


On 10/26/2016 3:33 PM, David Holmes wrote:
> Harold/Rachel,
>
> Thanks for clarifying things - I had misconstrued the actual problem.
>
> In summary we should not use a ResourceObj in the scope of a nested 
> ResourceMark wrt. the allocation of the ResourceObj.
>
> The fix is good and no further changes are needed.
>
> Thanks,
> David
>
> On 26/10/2016 5:58 AM, harold seigel wrote:
>> Hi Rachel,
>>
>> I think that the ResourceMarks that you removed were the correct ones.
>> My understanding is that based on this assert it looks like all calls to
>> stringStream::write() on a particular Stream need to be done using the
>> same ResourceMark with which the Stream was created.  Otherwise, this
>> assert in stringStream::write() will trigger:
>>
>>       assert(rm == NULL ||
>>              Thread::current()->current_resource_mark() == rm,
>>              "StringStream is re-allocated with a different ResourceMark..." ...)
>>
>> These two ResourceMarks needed to be removed because their outputStream
>> was constructed with a caller's ResourceMark.  If they specified their
>> own ResourceMark then their calls to print(), which eventually calls
>> stringStream::write(), would cause the assert to trigger.
>>
>>    static void print_slot(outputStream* str, Symbol* name, Symbol*
>>    signature) {
>>       ResourceMark rm;
>>       str->print("%s%s", name->as_C_string(), signature->as_C_string());
>>    }
>>
>>    static void print_method(outputStream* str, Method* mo, bool
>>    with_class=true) {
>>       ResourceMark rm;
>>       if (with_class) {
>>         str->print("%s.", mo->klass_name()->as_C_string());
>>       }
>>       print_slot(str, mo->name(), mo->signature());
>>    }
>>
>>
>>
>> I think that having a ResourceMark in code like this is okay because
>> debug_stream() probably constructs a new Stream object.
>>
>>       if (log_is_enabled(Debug, defaultmethods)) {
>>         log_debug(defaultmethods)("Slots that need filling:");
>>         ResourceMark rm;
>>         outputStream* logstream = Log(defaultmethods)::debug_stream();
>>         streamIndentor si(logstream);
>>         for (int i = 0; i < slots->length(); ++i) {
>>           logstream->indent();
>>           slots->at(i)->print_on(logstream);
>>           logstream->cr();
>>         }
>>       }
>>
>> Harold
>>
>>
>> On 10/25/2016 11:14 AM, Rachel Protacio wrote:
>>> Hi,
>>>
>>> Thanks for taking a look. I think in this particular case the issue
>>> was that the nested ResourceMark's were around code that affected an
>>> existing outputStream. So in fact the nesting per se isn't what was
>>> wrong, the issue was adding a ResourceMark in the middle of a resource
>>> that still needed the content after it went out of scope of the RM. So
>>> line 796 is good because its functionality is self-contained, and the
>>> ones I deleted were bad because they interfered with the functionality
>>> of the caller code. (Can someone corroborate this assessment?)
>>>
>>> However, as those functions still need RMs in general somewhere up the
>>> line, I can add a comment of the form
>>>
>>>    // The caller of print_slot() (or one of its callers)
>>>    // must use a ResourceMark in order to correctly free the result.
>>>
>>> for print_slot(), print_method(), and print_on() at line 590. Does
>>> that sound good?
>>> Rachel
>>>
>>> On 10/25/2016 1:41 AM, David Holmes wrote:
>>>> Hi Rachel,
>>>>
>>>> On 25/10/2016 6:17 AM, Rachel Protacio wrote:
>>>>> Hi,
>>>>>
>>>>> Please review this small fix, which removes two nested ResourceMark's
>>>>> that were causing problems with defaultmethods logging.
>>>>>
>>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8167995
>>>>> Open webrev: http://cr.openjdk.java.net/~rprotacio/8167995.00/
>>>>>
>>>>> Tested with JPRT.
>>>>
>>>> It is tricky to determine who has responsibility for positioning the
>>>> ResourceMarks. Looking at this call chain it initially appeared to me
>>>> that we now had a missing RM for the code at line #80:
>>>>
>>>> 813       slot->print_on(logstream);
>>>> =>
>>>> 590   void print_on(outputStream* str) const {
>>>> 591     print_slot(str, name(), signature());
>>>> 592   }
>>>> =>
>>>> 79 static void print_slot(outputStream* str, Symbol* name, Symbol*
>>>> signature) {
>>>> 80   str->print("%s%s", name->as_C_string(), 
>>>> signature->as_C_string());
>>>> 81 }
>>>>
>>>> but we actually have a RM higher up at:
>>>>
>>>> 787   ResourceMark rm(THREAD);
>>>>
>>>> so that is good, but then we also have a nested ResourceMark further
>>>> down:
>>>>
>>>>  795   if (log_is_enabled(Debug, defaultmethods)) {
>>>>  796     ResourceMark rm;
>>>>
>>>> I must admit I'm unclear if ResourceMarks should never be nested, or
>>>> should be nested "carefully" - and if the latter exactly what that
>>>> means and how to recognize it.
>>>>
>>>> Thanks,
>>>> David
>>>> -----
>>>>
>>>>> Thanks!
>>>>> Rachel
>>>
>>
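
For illustration, the rule summarised in this thread (do not use a resource-allocated object inside a ResourceMark that is nested relative to the one it was allocated under) can be modelled with a toy arena. This is only a sketch under stated assumptions: ToyArena, ToyMark and ToyStream are hypothetical stand-ins, not HotSpot's Arena, ResourceMark or stringStream, and write() returns false at the point where the real stringStream::write() would hit the assert quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Toy bump allocator (hypothetical, not HotSpot's resource area).
struct ToyArena {
  char        buf[1024];
  std::size_t top = 0;
  int         current_mark = 0;  // id of the innermost live mark, 0 if none
  int         next_id = 1;
  char* alloc(std::size_t n) {
    if (n > sizeof(buf) - top) return nullptr;  // arena exhausted
    char* p = buf + top;
    top += n;
    return p;
  }
};

// Scoped mark: on destruction, releases everything allocated since construction.
class ToyMark {
  ToyArena&   _a;
  std::size_t _saved_top;
  int         _saved_mark;
 public:
  explicit ToyMark(ToyArena& a)
      : _a(a), _saved_top(a.top), _saved_mark(a.current_mark) {
    a.current_mark = a.next_id++;
  }
  ~ToyMark() { _a.top = _saved_top; _a.current_mark = _saved_mark; }
};

// Stream whose buffer lives in the arena; remembers the mark it was created under.
class ToyStream {
  ToyArena&   _a;
  int         _mark;
  char*       _buf;
  std::size_t _len;
  std::size_t _cap;
 public:
  explicit ToyStream(ToyArena& a)
      : _a(a), _mark(a.current_mark), _buf(a.alloc(8)), _len(0), _cap(8) {
    assert(_buf != nullptr);
  }
  bool write(const char* s) {
    // A buffer grown under a different mark would be released when that
    // mark goes out of scope, while the stream still needs it.
    if (_a.current_mark != _mark) return false;
    std::size_t n = std::strlen(s);
    if (_len + n > _cap) {
      std::size_t ncap = (_len + n) * 2;
      char* nb = _a.alloc(ncap);
      if (nb == nullptr) return false;
      std::memcpy(nb, _buf, _len);
      _buf = nb;
      _cap = ncap;
    }
    std::memcpy(_buf + _len, s, n);
    _len += n;
    return true;
  }
  std::string str() const { return std::string(_buf, _len); }
};

// Writes under the mark the stream was created with: accepted.
bool write_under_same_mark() {
  ToyArena a;
  ToyMark rm(a);
  ToyStream s(a);
  return s.write("name") && s.write("(I)V") && s.str() == "name(I)V";
}

// Writes under a nested mark, like the two removed ResourceMarks: rejected.
bool write_under_nested_mark() {
  ToyArena a;
  ToyMark outer(a);
  ToyStream s(a);
  ToyMark inner(a);
  return s.write("name");
}
```

In this model the mark at line 796 is harmless because the stream obtained from debug_stream() is created after that mark, so the stream and all writes to it share the same mark.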


From paul.sandoz at oracle.com  Wed Oct 26 23:08:11 2016
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Wed, 26 Oct 2016 16:08:11 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
Message-ID: <A19C4F67-D5EE-4F00-BAB4-DFB338F2540F@oracle.com>

Hi,

Looks ok, just some doc suggestions below.

Paul.

ClassLoader

 366      * @param  name
 367      *         Class loader name; can be {@code null}

StackTraceElement

 100      * @param classLoaderName the class loader name if the class loader of
 101      *        the class containing the execution point represented by
 102      *        the stack trace is named; can be {@code null}

URLClassLoader

 214      * @param  name class loader name; can be {@code null}

 245      * @param  name class loader name; can be {@code null}

SecureClassLoader

 118      * @param name class loader name; can be {@code null}.

"; otherwise {@code null} if the class loader is not named."

...


StackTraceElement

 206      * @return the name of the class loader of the class containing the execution
 207      *         point represented by this stack trace element; {@code null}
 208      *         if the class loader name is not available.

"{@code null} if the class loader is not named."


 271      *   built-in class loader</a>, or it does not have a name, then

"... or is not named"


> On 25 Oct 2016, at 16:10, Mandy Chung <mandy.chung at oracle.com> wrote:
> 
> Webrev at:
>   http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/
> 
> Specdiff:
>   http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html
> 
> This is a long-standing RFE for adding support for class
> loader names.  It's #ClassLoaderNames on JSR 376 issue
> list where the proposal [1] has been implemented in jake
> for some time.  This patch brings this change to jdk9.
> 
> A short summary:
> - New constructors are added in ClassLoader, SecureClassLoader
>  and URLClassLoader to specify the class loader name.
> 
> - New ClassLoader::getName and StackTraceElement::getClassLoaderName
>  method
> 
> - StackTraceElement::toString is updated to include the name
>  of the class loader and module of that frame in this format:
>     <loader>/<module>/<fully-qualified-name>(<src>:<line>)
> 
> The detail is in StackTraceElement::buildLoaderModuleClassName
> that compress the output string for cases when the loader
> has no name or the module is unnamed module.  Another thing
> to mention is that VM sets the Class object when filling in
> a stack trace of a Throwable object.  Then the library will
> build a String from the Class object for serialization purpose.
> 
> Mandy
> [1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html


From mandy.chung at oracle.com  Wed Oct 26 23:22:12 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 26 Oct 2016 16:22:12 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <A19C4F67-D5EE-4F00-BAB4-DFB338F2540F@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<A19C4F67-D5EE-4F00-BAB4-DFB338F2540F@oracle.com>
Message-ID: <882F2B82-3565-4E67-BDFC-6C1F80F2FF2F@oracle.com>


> On Oct 26, 2016, at 4:08 PM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
> :
> 
> "; otherwise {@code null} if the class loader is not named."
> 
> :
> "{@code null} if the class loader is not named."
> 
> :
> "... or is not named"

Yup that reads better. I will update them.

Thanks.
Mandy

From david.holmes at oracle.com  Thu Oct 27 00:40:03 2016
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 27 Oct 2016 10:40:03 +1000
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
	<CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>
Message-ID: <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com>

On 27/10/2016 12:45 AM, Thomas Stüfe wrote:
> Hi Chris,
>
> Thanks for the review!
>
> New
> webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/

Have not looked at this yet.

> Comments inline.
>
> On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer <chris.plummer at oracle.com
> <mailto:chris.plummer at oracle.com>> wrote:
>
>     Hi Tomas,
>
>     See JDK-8156821. I'm curious as to how your changes will impact it,
>     since David says you can't interrupt a thread blocked trying to
>     acquire mutex. I suspect that means this enhancement won't help in
>     this case, and presumably in general you are not fixing the issue of
>     error reporting getting deadlocked, or maybe I'm misinterpreting
>     what David said in JDK-8156821.

That should be 8156823

>
>
> Not sure what exactly David meant with "You can't "interrupt" a thread
> that is blocked trying to acquire a mutex." Maybe he can elaborate :)
>
> My guesses:
>
> 1) If he meant "you cannot interrupt a thread blocking in
> pthread_mutex_lock()" - not true, you can and my patch works just fine
> in this situation. Just tested again, to be sure. This covers crashes in
> sections guarded by pthread_mutex, which then try to reacquire the lock
> in the error handler.

There is no specified, portable way to get a thread blocked acquiring a 
mutex to stop waiting for the mutex. That is what I meant. 
pthread_mutex_lock is not a cancellation point, nor will it return EINTR 
in response to a signal.

However, if a signal is received by the thread while waiting then POSIX 
semantics indicate that the signal handler will run and then return the 
thread to the waiting state. In our case the crash handler does not 
return so we are into undefined territory there - but our crash handler 
is already not a well-defined signal handler as it is not restricted to 
async-signal-safe functions, so we already run a risk when executing it.

I had not considered this aspect in relation to 8156823, so the proposed 
approach here would also attempt to address that issue.
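
The POSIX behaviour described above can be observed with a small test program. This is a rough sketch, not code from the patch: every name in it is made up, the handler returns normally (unlike our crash handler), and the 50 ms pause is only a best-effort way to let the waiter reach pthread_mutex_lock before the signal is sent.

```cpp
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <atomic>
#include <cassert>

// All names are hypothetical; none of this comes from HotSpot.
static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
static std::atomic<bool> g_about_to_lock(false);
static std::atomic<bool> g_handler_ran(false);
static std::atomic<bool> g_acquired(false);

// Only touches an atomic flag, then returns to the interrupted thread.
static void on_sigusr1(int) { g_handler_ran.store(true); }

static void* waiter(void*) {
  g_about_to_lock.store(true);
  pthread_mutex_lock(&g_lock);    // blocks while the caller holds g_lock
  g_acquired.store(true);
  pthread_mutex_unlock(&g_lock);
  return nullptr;
}

// Returns true if the handler ran while the waiter had not yet acquired the
// mutex, and the waiter acquired it normally once it was released.
bool handler_runs_while_waiter_is_blocked() {
  struct sigaction sa = {};
  sa.sa_handler = on_sigusr1;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGUSR1, &sa, nullptr) != 0) return false;

  g_about_to_lock.store(false);
  g_handler_ran.store(false);
  g_acquired.store(false);

  int rc = pthread_mutex_lock(&g_lock);
  assert(rc == 0);
  (void)rc;

  pthread_t t;
  if (pthread_create(&t, nullptr, waiter, nullptr) != 0) {
    pthread_mutex_unlock(&g_lock);
    return false;
  }
  while (!g_about_to_lock.load()) usleep(1000);
  usleep(50 * 1000);              // best effort: let the waiter block

  pthread_kill(t, SIGUSR1);       // deliver the signal to the waiter thread
  for (int i = 0; i < 2000 && !g_handler_ran.load(); i++) usleep(1000);

  bool ran_before_release = g_handler_ran.load() && !g_acquired.load();
  pthread_mutex_unlock(&g_lock);
  pthread_join(t, nullptr);
  return ran_before_release && g_acquired.load();
}
```

The handler runs, pthread_mutex_lock does not return EINTR, and the waiter only proceeds once the mutex is released. What happens when the handler never returns is exactly the undefined territory mentioned above, and the sketch does not cover it.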

> 2) If he meant "you cannot interrupt malloc if it is executing a system
> call in the linux kernel" - that may be. I am not a linux kernel expert
> but would have thought that syscalls may block if interrupts are
> disabled for certain lengths by the syscall author. But in that case I
> would have expected the process to hang too and to be not killable?
> Again, I am no expert.

Note "interrupt" here is a logical concept not related to hardware level 
interrupts. I don't know at what point going into malloc you will no 
longer get signal handlers run - malloc doesn't use pthread level 
mutexes, but direct futexes, so the same signal responsiveness may not 
be present.


Thanks,
David
-----

>
>
>     Otherwise overall your changes look good, but I have a few comments.
>     Also, since this is an enhancement, it needs to wait for JDK 10.
>
>     I think your test will fail for product builds. You should add
>     "@requires vm.debug == true". Also, java files use 4 char
>     indentation, not 2 like we use in hotspot C/C++ code. Lastly, it
>     should only have a 2016 copyright.
>
>
> Thank you for the hints. Did fix all that. Note that I had disabled the
> test for product builds in the code (!Platform.isDebugBuild()) but I
> added the vm.debug tag as well as you suggested.
>
>
>     A couple of files need the copyright updated to 2016.
>
>     Why do set_to_now() and get_timestamp() need to be atomic, and what
>     are the consequences of cx8 not being supported?
>
>
> The error reporting thread sets the timestamp on each STEP start, and
> the timestamp is read from another thread, the WatcherThread. Timestamp
> is 64bit. I wanted to make sure the 64bit value is written and read
> atomically, especially on 32bit platforms.
>
> But then, I had to check whether 64bit atomic stores/loads are even
> supported by this platform (I actually did not find a 32bit platform
> without 64bit atomics, but the comment in atomic.hpp is pretty
> insistent and I did not want to risk regressions for other platforms).
>
> Well, if no cx8 support was available, I pretty much just give up and
> read and write timestamps directly. As I said, I am not sure if this
> code path gets ever executed.
>
> Maybe I was overthinking all this and just reading and writing the (C++
> volatile) jlongs would have been enough, but I wanted to prevent
> sporadic test errors because of incompletely read 64bit values.
>
>
>     1282         st->print_raw_cr(buffer);
>     1283         st->cr();
>
>     The old code had an additional st->cr() before the above lines. I
>     assume you removed it intentionally.
>
>
> I hope I preserved the numbers of cr(). At least that was my intention:
>
> 1260       outputStream* const st = log.is_open() ? &log : &out;
> 1261       st->cr();
>
> ...
>
> and then on every path, a cr (or print_raw_cr) at the end. Where do you
> see the missing cr()?
>
>
>
>     Is there a reason why you decided to only allow one step to timeout.
>     What if the cause of a timeout in a step also impacts other steps,
>     or is that not common when we see timeouts?
>
>
> That is mostly guesswork. In our (SAP) code we allow for four steps (so
> ErrorLogTimeout/4 as step timeout) and additionally allow for "steps
> known to be long" where timeouts are disabled altogether. But we also
> have more complicated error reporting steps, so when porting the patch
> to OpenJDK, I felt the complexity was unneeded.
>
> I think in general you will only have one misbehaving step, but you are
> right, more than one step may timeout if e.g. the file system is slow.
> I'm open for suggestions: the timeout value should be large enough not
> to be hit for "normal slow steps" while still leave room enough for
> other steps to finish. What do you think a reasonable timeout value
> would be? ErrorLogTimeout/4?
>
>
>
>     It's not clear to me why you changed a couple of os::sleep() calls
>     to os::naked_short_sleep(), and the rationale for the sleep periods.
>     Can you please explain?
>
>
> Because os::sleep() does a lot of work under the hood and relies on a
> bit of VM infrastructure. I think that is not a good idea in error
> situations where potentially everything may be broken already. You want
> to step lightly and really only do a naked system sleep. About the sleep
> periods, os::naked_short_sleep has an inbuilt maximum value of 1000ms, which I
> have to stay below to not hit the assert. I did use 999ms as the longest
> interval I am allowed to sleep nakedly. And after the timeout hit and
> before the WatcherThread calls os::abort, I again sleep 200ms to give
> the error reporter thread time to write the "error log aborted due to
> timeout" into the error log and to flush the error log. Those 200ms are
> just guesswork.
>
>
>     thanks,
>
>     Chris
>
>
> Thanks for the review!
>
> Kind Regards, Thomas
>
>
>
>
>     On 10/12/16 9:55 PM, Thomas Stüfe wrote:
>
>         Dear all,
>
>         please take a look at the following fix:
>
>         Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>         <https://bugs.openjdk.java.net/browse/JDK-8166944>
>         webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>         <http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html>
>
>         ---
>
>         In short, this fix provides the ability to cancel hanging error
>         reporting
>         steps. This uses the same code paths secondary error handling
>         uses during
>         error reporting. With this patch, steps which take too long will be
>         canceled after 1/2 ErrorLogTimeout. In the log file, it will
>         look like this:
>
>         4 [timeout occurred during error reporting in step "<stepname>"]
>         after xxxx
>         ms.
>         5
>
>         and we now also get a finish message in the hs-err file if we
>         hit the
>         ErrorLogTimeout and error reporting will stop altogether:
>
>         6 ------ Timout during error reporting after xxx ms. ------
>
>         (in addition to the "time expired, abort" message the
>         WatcherThread writes
>         to stderr)
>
>         ---
>
>         This is something which bugged us for a long time, because we
>         rely heavily
>         on the hs_err files for error analysis at customer sites, and
>         there are a
>         number of reasons why one step may hang and prevent the
>         follow-up steps
>         from running.
>
>         It works like this:
>
>         Before, when error reporting started, the WatcherThread was
>         waiting for
>         ErrorLogTimeout seconds, then would stop the VM.
>
>         Now, the WatcherThread periodically pings error reporting, which
>         checks if
>         the last step did timeout. If it does, it sends a signal to the
>         reporting
>         thread, and the thread will continue with the next step. This
>         follows the
>         same path as secondary crash handling.
>
>         Some implementation details:
>
>         On Posix platforms, to interrupt the thread, I use pthread_kill.
>         This means
>         I must know the pthread id of the reporting thread, which I now
>         store at
>         the beginning of error reporting. We already store the reporting
>         thread id
>         in first_error_tid, but that I cannot use, because it gets set by
>         os::current_thread_id(), which is not always the pthread id.
>         Should we ever
>         switch to only using pthread id for posix platforms, this coding
>         can be
>         simplified.
>
>         On Windows, there is unfortunately no easy way to interrupt a
>         non-cooperative thread. I would need a way to cause a SEH inside
>         the target
>         thread, which then would get handled by secondary error handling
>         like on
>         Posix platforms, but that is not easy. It is doable - one can
>         suspend the
>         thread, modify the thread context in a way that it will crash
>         upon resume.
>         But that felt a bit heavyweight for this problem. So on windows,
>         timeout
>         handling still works (after ErrorLogTimeout the VM gets shut
>         down), but
>         error reporting steps are not interruptable. If we feel this is
>         important,
>         this can be added later.
>
>         Kind Regards, Thomas
>
>
>
>
>
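
As a rough model of the scheme quoted above (the reporting thread stamps the start of each step, the watcher reads the stamp and decides whether to interrupt the step or give up entirely), here is a minimal sketch. It is not the code in the webrev: the names are hypothetical, times are plain milliseconds passed in by the caller, and std::atomic stands in for the Atomic/cx8 handling discussed above. On a platform where a 64-bit std::atomic is not lock-free it falls back to a lock, which is an assumption worth checking before using anything like this in a crash path.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the per-step timestamp. 0 means "no step running".
static std::atomic<std::int64_t> g_step_start_ms(0);

// Called by the reporting thread at the start of every step.
void record_step_start(std::int64_t now_ms) { g_step_start_ms.store(now_ms); }

enum class TimeoutAction { None, InterruptStep, AbortReporting };

// Called periodically by the watcher. The step timeout is half of the
// overall timeout, matching the "1/2 ErrorLogTimeout" in the description.
TimeoutAction check_timeouts(std::int64_t now_ms,
                             std::int64_t reporting_start_ms,
                             std::int64_t error_log_timeout_ms) {
  assert(error_log_timeout_ms > 0);
  if (now_ms - reporting_start_ms >= error_log_timeout_ms) {
    return TimeoutAction::AbortReporting;   // overall budget used up
  }
  std::int64_t step_start = g_step_start_ms.load();
  if (step_start != 0 && now_ms - step_start >= error_log_timeout_ms / 2) {
    return TimeoutAction::InterruptStep;    // signal the reporting thread
  }
  return TimeoutAction::None;
}
```

Switching to the ErrorLogTimeout/4 value floated in the discussion would only change the divisor in the step check.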

From chris.plummer at oracle.com  Thu Oct 27 03:22:35 2016
From: chris.plummer at oracle.com (Chris Plummer)
Date: Wed, 26 Oct 2016 20:22:35 -0700
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
	<CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>
	<8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com>
Message-ID: <4e390c00-1a96-3b8e-5f67-efce95a29021@oracle.com>

On 10/26/16 5:40 PM, David Holmes wrote:
> On 27/10/2016 12:45 AM, Thomas Stüfe wrote:
>> Hi Chris,
>>
>> Thanks for the review!
>>
>> New
>> webrev: 
>> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/
>
> Have not looked at this yet.
>
>> Comments inline.
>>
>> On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer <chris.plummer at oracle.com
>> <mailto:chris.plummer at oracle.com>> wrote:
>>
>>     Hi Tomas,
>>
>>     See JDK-8156821. I'm curious as to how your changes will impact it,
>>     since David says you can't interrupt a thread blocked trying to
>>     acquire mutex. I suspect that means this enhancement won't help in
>>     this case, and presumably in general you are not fixing the issue of
>>     error reporting getting deadlocked, or maybe I'm misinterpreting
>>     what David said in JDK-8156821.
>
> That should be 8156823
>
>>
>>
>> Not sure what exactly David meant with "You can't "interrupt" a thread
>> that is blocked trying to acquire a mutex." Maybe he can elaborate :)
>>
>> My guesses:
>>
>> 1) If he meant "you cannot interrupt a thread blocking in
>> pthread_mutex_lock()" - not true, you can and my patch works just fine
>> in this situation. Just tested again, to be sure. This covers crashes in
>> sections guarded by pthread_mutex, which then try to reacquire the lock
>> in the error handler.
>
> There is no specified, portable way to get a thread blocked acquiring 
> a mutex to stop waiting for the mutex. That is what I meant. 
> pthread_mutex_lock is not a cancellation point, nor will it return 
> EINTR in response to a signal.
>
> However, if a signal is received by the thread while waiting then 
> POSIX semantics indicate that the signal handler will run and then 
> return the thread to the waiting state. In our case the crash handler 
> does not return so we are into undefined territory there - but our 
> crash handler is already not a well-defined signal handler as it is 
> not restricted to async-signal-safe functions, so we already run a 
> risk when executing it.
>
> I had not considered this aspect in relation to 8156823, so the 
> proposed approach here would also attempt to address that issue.
>
>> 2) If he meant "you cannot interrupt malloc if it is executing a system
>> call in the linux kernel" - that may be. I am not a linux kernel expert
>> but would have thought that syscalls may block if interrupts are
>> disabled for certain lengths by the syscall author. But in that case I
>> would have expected the process to hang too and to be not killable?
>> Again, I am no expert.
>
> Note "interrupt" here is a logical concept not related to hardware 
> level interrupts. I don't know at what point going into malloc you 
> will no longer get signal handlers run - malloc doesn't use pthread 
> level mutexes, but direct futexes, so the same signal responsiveness 
> may not be present.
It probably would not be all that hard to create the malloc crash from 
8156823 and then see how Thomas' changes impact how VMError handles it. 
Just find an appropriate place in the VM to malloc a chunk of memory, 
step all over the bytes before and after it, and then call free.

Chris
>
>
> Thanks,
> David
> -----
>
>>
>>
>>     Otherwise overall your changes look good, but I have a few comments.
>>     Also, since this is an enhancement, it needs to wait for JDK 10.
>>
>>     I think your test will fail for product builds. You should add
>>     "@requires vm.debug == true". Also, java files use 4 char
>>     indentation, not 2 like we use in hotspot C/C++ code. Lastly, it
>>     should only have a 2016 copyright.
>>
>>
>> Thank you for the hints. Did fix all that. Note that I had disabled the
>> test for product builds in the code (!Platform.isDebugBuild()) but I
>> added the vm.debug tag as well as you suggested.
>>
>>
>>     A couple of files need the copyright updated to 2016.
>>
>>     Why do set_to_now() and get_timestamp() need to be atomic, and what
>>     are the consequences of cx8 not being supported?
>>
>>
>> The error reporting thread sets the timestamp on each STEP start, and
>> the timestamp is read from another thread, the WatcherThread. Timestamp
>> is 64bit. I wanted to make sure the 64bit value is written and read
>> atomically, especially on 32bit platforms.
>>
>> But then, I had to check whether 64bit atomic stores/loads are even
>> supported by this platform (I actually did not find a 32bit platform
>> without 64bit atomics, but the comment in atomic.hpp is pretty
>> insistent and I did not want to risk regressions for other platforms).
>>
>> Well, if no cx8 support was available, I pretty much just give up and
>> read and write timestamps directly. As I said, I am not sure if this
>> code path gets ever executed.
>>
>> Maybe I was overthinking all this and just reading and writing the (C++
>> volatile) jlongs would have been enough, but I wanted to prevent
>> sporadic test errors because of incompletely read 64bit values.
>>
>>
>>     1282         st->print_raw_cr(buffer);
>>     1283         st->cr();
>>
>>     The old code had an additional st->cr() before the above lines. I
>>     assume you removed it intentionally.
>>
>>
>> I hope I preserved the numbers of cr(). At least that was my intention:
>>
>> 1260       outputStream* const st = log.is_open() ? &log : &out;
>> 1261       st->cr();
>>
>> ...
>>
>> and then on every path, a cr (or print_raw_cr) at the end. Where do you
>> see the missing cr()?
>>
>>
>>
>>     Is there a reason why you decided to only allow one step to timeout.
>>     What if the cause of a timeout in a step also impacts other steps,
>>     or is that not common when we see timeouts?
>>
>>
>> That is mostly guesswork. In our (SAP) code we allow for four steps (so
>> ErrorLogTimeout/4 as step timeout) and additionally allow for "steps
>> known to be long" where timeouts are disabled altogether. But we also
>> have more complicated error reporting steps, so when porting the patch
>> to OpenJDK, I felt the complexity was unneeded.
>>
>> I think in general you will only have one misbehaving step, but you are
>> right, more than one step may timeout if e.g. the file system is slow.
>> I'm open for suggestions: the timeout value should be large enough not
>> to be hit for "normal slow steps" while still leave room enough for
>> other steps to finish. What do you think a reasonable timeout value
>> would be? ErrorLogTimeout/4?
>>
>>
>>
>>     It's not clear to me why you changed a couple of os::sleep() calls
>>     to os::naked_short_sleep(), and the rationale for the sleep periods.
>>     Can you please explain?
>>
>>
>> Because os::sleep() does a lot of work under the hood and relies on a
>> bit of VM infrastructure. I think that is not a good idea in error
>> situations where potentially everything may be broken already. You want
>> to step lightly and really only do a naked system sleep. About the sleep
>> periods, os::naked_short_sleep has an inbuilt maximum value of 1000ms, which I
>> have to stay below to not hit the assert. I did use 999ms as the longest
>> interval I am allowed to sleep nakedly. And after the timeout hit and
>> before the WatcherThread calls os::abort, I again sleep 200ms to give
>> the error reporter thread time to write the "error log aborted due to
>> timeout" into the error log and to flush the error log. Those 200ms are
>> just guesswork.
>>
>>
>>     thanks,
>>
>>     Chris
>>
>>
>> Thanks for the review!
>>
>> Kind Regards, Thomas
>>
>>
>>
>>
>>     On 10/12/16 9:55 PM, Thomas Stüfe wrote:
>>
>>         Dear all,
>>
>>         please take a look at the following fix:
>>
>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>>         <https://bugs.openjdk.java.net/browse/JDK-8166944>
>>         webrev:
>> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>> <http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html>
>>
>>         ---
>>
>>         In short, this fix provides the ability to cancel hanging error
>>         reporting
>>         steps. This uses the same code paths secondary error handling
>>         uses during
>>         error reporting. With this patch, steps which take too long 
>> will be
>>         canceled after 1/2 ErrorLogTimeout. In the log file, it will
>>         look like this:
>>
>>         4 [timeout occurred during error reporting in step "<stepname>"]
>>         after xxxx
>>         ms.
>>         5
>>
>>         and we now also get a finish message in the hs-err file if we
>>         hit the
>>         ErrorLogTimeout and error reporting will stop altogether:
>>
>>         6 ------ Timout during error reporting after xxx ms. ------
>>
>>         (in addition to the "time expired, abort" message the
>>         WatcherThread writes
>>         to stderr)
>>
>>         ---
>>
>>         This is something which bugged us for a long time, because we
>>         rely heavily
>>         on the hs_err files for error analysis at customer sites, and
>>         there are a
>>         number of reasons why one step may hang and prevent the
>>         follow-up steps
>>         from running.
>>
>>         It works like this:
>>
>>         Before, when error reporting started, the WatcherThread was
>>         waiting for
>>         ErrorLogTimeout seconds, then would stop the VM.
>>
>>         Now, the WatcherThread periodically pings error reporting, which
>>         checks if
>>         the last step did timeout. If it does, it sends a signal to the
>>         reporting
>>         thread, and the thread will continue with the next step. This
>>         follows the
>>         same path as secondary crash handling.
>>
>>         Some implementation details:
>>
>>         On Posix platforms, to interrupt the thread, I use pthread_kill.
>>         This means
>>         I must know the pthread id of the reporting thread, which I now
>>         store at
>>         the beginning of error reporting. We already store the reporting
>>         thread id
>>         in first_error_tid, but that I cannot use, because it gets 
>> set by
>>         os::current_thread_id(), which is not always the pthread id.
>>         Should we ever
>>         switch to only using pthread id for posix platforms, this coding
>>         can be
>>         simplified.
>>
>>         On Windows, there is unfortunately no easy way to interrupt a
>>         non-cooperative thread. I would need a way to cause a SEH inside
>>         the target
>>         thread, which then would get handled by secondary error handling
>>         like on
>>         Posix platforms, but that is not easy. It is doable - one can
>>         suspend the
>>         thread, modify the thread context in a way that it will crash
>>         upon resume.
>>         But that felt a bit heavyweight for this problem. So on windows,
>>         timeout
>>         handling still works (after ErrorLogTimeout the VM gets shut
>>         down), but
>>         error reporting steps are not interruptable. If we feel this is
>>         important,
>>         this can be added later.
>>
>>         Kind Regards, Thomas
>>
>>
>>
>>
>>


From thomas.stuefe at gmail.com  Thu Oct 27 06:09:23 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 27 Oct 2016 08:09:23 +0200
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <113f387c-cea8-7cfc-9d6a-29d0151c8a83@oracle.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
	<CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>
	<113f387c-cea8-7cfc-9d6a-29d0151c8a83@oracle.com>
Message-ID: <CAA-vtUyc2ConhWcs1cp+qWFr5JdNc3OWT=jsUUduxQdN=zNBQw@mail.gmail.com>

Hi Chris,

On Wed, Oct 26, 2016 at 9:27 PM, Chris Plummer <chris.plummer at oracle.com>
wrote:

> Hi Thomas,
>
> On 10/26/16 7:45 AM, Thomas Stüfe wrote:
>
> Hi Chris,
>
> Thanks for the review!
>
> New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/
> 8166944-Hanging-Error-Reporting/webrev.01/webrev/
>
> Comments inline.
>
> On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer <chris.plummer at oracle.com>
> wrote:
>
>> Hi Tomas,
>>
>> See JDK-8156821. I'm curious as to how your changes will impact it, since
>> David says you can't interrupt a thread blocked trying to acquire mutex. I
>> suspect that means this enhancement won't help in this case, and presumably
>> in general you are not fixing the issue of error reporting getting
>> deadlocked, or maybe I'm misinterpreting what David said in JDK-8156821.
>>
>
> Not sure what exactly David meant with "You can't "interrupt" a thread
> that is blocked trying to acquire a mutex." Maybe he can elaborate :)
>
> My guesses:
>
> 1) If he meant "you cannot interrupt a thread blocking in
> pthread_mutex_lock()" - not true, you can and my patch works just fine in
> this situation. Just tested again, to be sure. This covers crashes in
> sections guarded by pthread_mutex, which then try to reacquire the lock in
> the error handler.
>
> 2) If he meant "you cannot interrupt malloc if it is executing a system
> call in the linux kernel" - that may be. I am not a linux kernel expert
> but would have thought that syscalls may block if interrupts are disabled
> for certain lengths by the syscall author. But in that case I would have
> expected the process to hang too and to be not killable? Again, I am no
> expert.
>
> Ok. I'll let David explain once he's available.
>
>
>
>>
>> Otherwise overall your changes look good, but I have a few comments.
>> Also, since this is an enhancement, it needs to wait for JDK 10.
>>
>> I think your test will fail for product builds. You should add "@requires
>> vm.debug == true". Also, java files use 4 char indentation, not 2 like we
>> use in hotspot C/C++ code. Lastly, it should only have a 2016 copyright.
>>
>>
> Thank you for the hints. Did fix all that. Note that I had disabled the
> test for product builds in the code (!Platform.isDebugBuild()) but I
> added the vm.debug tag as well as you suggested.
>
> Ah, sorry I missed that, but IMHO the Platform checks should only be used
> to alter test behavior, not completely disable the entire test. @requires
> is best for disabling a test for certain platforms and builds. You should
> probably remove the Platform checks and also add 'os.family != "windows"'
> to the @requires line.
>

Good point, thank you! Will adjust the test accordingly.


>
>
>
>> A couple of files need the copyright updated to 2016.
>>
>> Why do set_to_now() and get_timestamp() need to be atomic, and what are
>> the consequences of cx8 not being supported?
>>
>>
> The error reporting thread sets the timestamp on each STEP start, and the
> timestamp is read from another thread, the WatcherThread. Timestamp is
> 64bit. I wanted to make sure the 64bit value is written and read
> atomically, especially on 32bit platforms.
>
> But then, I had to check whether 64bit atomic stores/loads are even
> supported by this platform (I actually did not find a 32bit platform
> without 64bit atomics, but the comment in atomic.hpp is pretty insistent
> and I did not want to risk regressions for other platforms).
>
> Well, if no cx8 support was available, I pretty much just give up and read
> and write timestamps directly. As I said, I am not sure if this code path
> ever gets executed.
>
> Maybe I was overthinking all this and just reading and writing the (C++
> volatile) jlongs would have been enough, but I wanted to prevent sporadic
> test errors because of incompletely read 64bit values.
>
> Closed ports may not have cx8 support, although I don't believe any are
> being released with JDK9. Since you just have one writer and one reader, I
> think the only concern is word tearing on the read. For this reason you
> likely need the cx8 support. David would know, so hopefully he can comment
> on this.
>
> Assuming you need cx8 support, theoretically there are platforms where
> your code could fail due to not having cx8 support. You could argue that
> the risk of word tearing is minimal, both in likelihood of it happening
> (race condition on a platform we aren't currently officially supporting),
> and the possible negative behavior if it does (premature timeout or
> possibly no timeout, but only with debug builds after a crash).
>
> The other choice here is to just disable the whole timeout mechanism if
> cx8 is not supported. In fact simply making set_to_now() and
> get_timestamp() no-ops when cx8 is not supported would accomplish that,
> although I'd suggest also adding some more explicit disabling of the code
> wherever the timestamps are referenced.
>

Another thing I thought of would be to change the timestamp to 32bit - I
only need second resolution - and handle somehow the year 2038 overflow.
But the easiest and most pragmatic way is either to ignore the problem
completely for non-cx8 platforms or to do nothing at all there (i.e.
disable the timeout handling, as you suggest).
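
A minimal sketch of the 32bit idea, just to show that the overflow is
manageable. It uses standard C++11 std::atomic instead of HotSpot's Atomic
class, and the names StepStartSeconds, elapsed_seconds and step_timed_out
are made up for illustration; none of this is in the webrev. The point is
that unsigned subtraction is modulo 2^32, so the elapsed time stays correct
even if the second counter wraps between the write and the read:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the step timestamp, kept as 32-bit seconds so
// that plain word-sized loads and stores cannot tear on 32-bit platforms.
class StepStartSeconds {
  std::atomic<uint32_t> _start{0};
 public:
  // Writer side (error reporting thread): record when a step begins.
  void set_to(uint32_t now_seconds) {
    _start.store(now_seconds, std::memory_order_relaxed);
  }
  // Reader side (watcher thread).
  uint32_t get() const {
    return _start.load(std::memory_order_relaxed);
  }
};

// Elapsed seconds between start and now. The cast keeps the result modulo
// 2^32, so a counter wrap between start and now does not matter as long as
// the real elapsed time is below 2^32 seconds.
inline uint32_t elapsed_seconds(uint32_t start, uint32_t now) {
  return static_cast<uint32_t>(now - start);
}

// True once a step that began at 'start' has run for at least
// 'timeout_seconds'.
inline bool step_timed_out(uint32_t start, uint32_t now,
                           uint32_t timeout_seconds) {
  assert(timeout_seconds > 0);
  return elapsed_seconds(start, now) >= timeout_seconds;
}
```

The reader never compares absolute values, only differences, which is why
the wrap needs no special handling.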


>
> BTW, the statics you added should probably all be made fields of VMError
> rather than in the global scope.
>
>
>
>> 1282         st->print_raw_cr(buffer);
>> 1283         st->cr();
>>
>> The old code had an additional st->cr() before the above lines. I assume
>> you removed it intentionally.
>>
>>
> I hope I preserved the numbers of cr(). At least that was my intention:
>
> 1260       outputStream* const st = log.is_open() ? &log : &out;
> 1261       st->cr();
>
> ...
>
> and then on every path, a cr (or print_raw_cr) at the end. Where do you
> see the missing cr()?
>
> Ok. It's just moved up about 20 lines of code now so I missed it.
>
>
>
>
>> Is there a reason why you decided to only allow one step to timeout. What
>> if the cause of a timeout in a step also impacts other steps, or is that
>> not common when we see timeouts?
>>
>>
> That is mostly guesswork. In our (SAP) code we allow for four steps (so
> ErrorLogTimeout/4 as step timeout) and additionally allow for "steps known
> to be long" where timeouts are disabled altogether. But we also have more
> complicated error reporting steps, so when porting the patch to OpenJDK, I
> felt the complexity was unneeded.
>
> I think in general you will only have one misbehaving step, but you are
> right, more than one step may timeout if e.g. the file system is slow. I'm
> open for suggestions: the timeout value should be large enough not to be
> hit for "normal slow steps" while still leave room enough for other steps
> to finish. What do you think a reasonable timeout value would be?
> ErrorLogTimeout/4?
>
> I don't think we've run into the "slow steps" case causing timeout, just
> the deadlocks, so I don't really have any data to give you. If you are
> primarily concerned about deadlocks, then you want ErrorLogTimeout div a
> fairly large number. If you are mostly addressing slow steps, then you div
> with a smallish number. I think I'd prefer /4 over /2, maybe even bigger.
>

I'll think about it. Maybe 1/4 is a good compromise.
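
To make the trade-off concrete, here is a rough sketch of the decision the
WatcherThread would take on each ping, assuming a step timeout of
ErrorLogTimeout divided by some divisor. The function check_timeouts, the
WatcherAction enum and the convention that a step start of 0 means "no
step running" are hypothetical and only for illustration; all times are in
seconds:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of one watcher ping.
enum class WatcherAction { None, InterruptStep, AbortReporting };

// Decide what to do given the current time, the start of error reporting,
// the start of the current step (0 means no step is running), the total
// ErrorLogTimeout and the divisor used to derive the per-step timeout.
inline WatcherAction check_timeouts(int64_t now,
                                    int64_t reporting_start,
                                    int64_t step_start,
                                    int64_t error_log_timeout,
                                    int divisor) {
  assert(divisor > 0);
  // The global timeout always wins: stop error reporting altogether.
  if (now - reporting_start >= error_log_timeout) {
    return WatcherAction::AbortReporting;
  }
  // Per-step timeout, e.g. ErrorLogTimeout/4.
  const int64_t step_timeout = error_log_timeout / divisor;
  if (step_start > 0 && now - step_start >= step_timeout) {
    return WatcherAction::InterruptStep;
  }
  return WatcherAction::None;
}
```

With ErrorLogTimeout=120 and a divisor of 4, a step is interrupted after 30
seconds, which leaves room for up to four hanging steps before the global
timeout hits.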


>
>
>
>
>> It's not clear to me why you changed a couple of os::sleep() calls to
>> os::naked_short_sleep(), and the rationale for the sleep periods. Can you
>> please explain?
>>
>>
> Because os::sleep() does a lot of work under the hood and relies on a bit
> of VM infrastructure. I think that is not a good idea in error situations
> where potentially everything may be broken already. You want to step
> lightly and really only do a naked system sleep. About the sleep periods,
> os::naked_sleep has an inbuilt maximum value of 1000ms, which I have to
> stay below to not hit the assert. I did use 999ms as the longest interval I
> am allowed to sleep nakedly. And after the timeout hit and before the
> WatcherThread calls os::abort, I again sleep 200ms to give the error
> reporter thread time to write the "error log aborted due to timeout" into
> the error log and to flush the error log. Those 200ms are just guesswork.
>
> Ok. I'd like to see someone else comment on the os::naked_sleep() use
> since it's not something I'm familiar enough with.
>
> thanks,
>
> Chris
>

Thanks, Chris.

 Thomas


>
>
>
>> thanks,
>>
>> Chris
>>
>>
> Thanks for the review!
>
> Kind Regards, Thomas
>
>
>
>>
>> On 10/12/16 9:55 PM, Thomas Stüfe wrote:
>>
>>> Dear all,
>>>
>>> please take a look at the following fix:
>>>
>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>>> webrev:
>>> http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>>>
>>> ---
>>>
>>> In short, this fix provides the ability to cancel hanging error reporting
>>> steps. This uses the same code paths secondary error handling uses during
>>> error reporting. With this patch, steps which take too long will be
>>> canceled after 1/2 ErrorLogTimeout. In the log file, it will look like
>>> this:
>>>
>>> 4 [timeout occurred during error reporting in step "<stepname>"] after
>>> xxxx
>>> ms.
>>> 5
>>>
>>> and we now also get a finish message in the hs-err file if we hit the
>>> ErrorLogTimeout and error reporting will stop altogether:
>>>
>>> 6 ------ Timout during error reporting after xxx ms. ------
>>>
>>> (in addition to the "time expired, abort" message the WatcherThread
>>> writes
>>> to stderr)
>>>
>>> ---
>>>
>>> This is something which bugged us for a long time, because we rely
>>> heavily
>>> on the hs_err files for error analysis at customer sites, and there are a
>>> number of reasons why one step may hang and prevent the follow-up steps
>>> from running.
>>>
>>> It works like this:
>>>
>>> Before, when error reporting started, the WatcherThread was waiting for
>>> ErrorLogTimeout seconds, then would stop the VM.
>>>
>>> Now, the WatcherThread periodically pings error reporting, which checks
>>> if
>>> the last step did timeout. If it does, it sends a signal to the reporting
>>> thread, and the thread will continue with the next step. This follows the
>>> same path as secondary crash handling.
>>>
>>> Some implementation details:
>>>
>>> On Posix platforms, to interrupt the thread, I use pthread_kill. This
>>> means
>>> I must know the pthread id of the reporting thread, which I now store at
>>> the beginning of error reporting. We already store the reporting thread
>>> id
>>> in first_error_tid, but that I cannot use, because it gets set by
>>> os::current_thread_id(), which is not always the pthread id. Should we
>>> ever
>>> switch to only using pthread id for posix platforms, this coding can be
>>> simplified.
>>>
>>> On Windows, there is unfortunately no easy way to interrupt a
>>> non-cooperative thread. I would need a way to cause a SEH inside the
>>> target
>>> thread, which then would get handled by secondary error handling like on
>>> Posix platforms, but that is not easy. It is doable - one can suspend the
>>> thread, modify the thread context in a way that it will crash upon
>>> resume.
>>> But that felt a bit heavyweight for this problem. So on windows, timeout
>>> handling still works (after ErrorLogTimeout the VM gets shut down), but
>>> error reporting steps are not interruptable. If we feel this is
>>> important,
>>> this can be added later.
>>>
>>> Kind Regards, Thomas
>>>
>>
>>
>>
>>
>
>

From thomas.stuefe at gmail.com  Thu Oct 27 07:16:39 2016
From: thomas.stuefe at gmail.com (=?UTF-8?Q?Thomas_St=C3=BCfe?=)
Date: Thu, 27 Oct 2016 09:16:39 +0200
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
	<CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>
	<8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com>
Message-ID: <CAA-vtUy6baJQrjFeM94Bty+maBhV2iO=6jbHPKMAMnAgz_jaKg@mail.gmail.com>

Hi David,

On Thu, Oct 27, 2016 at 2:40 AM, David Holmes <david.holmes at oracle.com>
wrote:

> On 27/10/2016 12:45 AM, Thomas Stüfe wrote:
>
>> Hi Chris,
>>
>> Thanks for the review!
>>
>> New webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/
>>
>
> Have not looked at this yet.
>
> Comments inline.
>>
>> On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer <chris.plummer at oracle.com> wrote:
>>
>>     Hi Tomas,
>>
>>     See JDK-8156821. I'm curious as to how your changes will impact it,
>>     since David says you can't interrupt a thread blocked trying to
>>     acquire mutex. I suspect that means this enhancement won't help in
>>     this case, and presumably in general you are not fixing the issue of
>>     error reporting getting deadlocked, or maybe I'm misinterpreting
>>     what David said in JDK-8156821.
>>
>
> That should be 8156823
>
>
>>
>> Not sure what exactly David meant with "You can't "interrupt" a thread
>> that is blocked trying to acquire a mutex." Maybe he can elaborate :)
>>
>> My guesses:
>>
>> 1) If he meant "you cannot interrupt a thread blocking in
>> pthread_mutex_lock()" - not true, you can and my patch works just fine
>> in this situation. Just tested again, to be sure. This covers crashes in
>> sections guarded by pthread_mutex, which then try to reacquire the lock
>> in the error handler.
>>
>
> There is no specified, portable way to get a thread blocked acquiring a
> mutex to stop waiting for the mutex. That is what I meant.
> pthread_mutex_lock is not a cancellation point, nor will it return EINTR in
> response to a signal.
>
> However, if a signal is received by the thread while waiting then POSIX
> semantics indicate that the signal handler will run and then return the
> thread to the waiting state. In our case the crash handler does not return
> so we are into undefined territory there - but our crash handler is already
> not a well-defined signal handler as it is not restricted to
> async-signal-safe functions, so we already run a risk when executing it.
>
>
That was what I meant. Syscalls do not have to be interruptible by design;
they just have to call user signal handlers for asynchronous signals.

And I think my patch does not make matters worse or more unsafe. There is
no new concept here - I use the pre-existing secondary signal handling and
that only in situations in which otherwise the error handling would very
probably hang forever, not producing any error log at all.
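
For reference, the well-defined half of this can be shown with a small
standalone program: a signal sent with pthread_kill to a thread that waits
in pthread_mutex_lock() gets its handler run, and the thread then keeps
waiting until the mutex is released. This is only a sketch of the POSIX
behaviour David describes, not code from the patch; all names are invented,
and it deliberately lets the handler return, so it does not exercise the
undefined part (a crash handler that never returns). The 50 ms sleep is a
best-effort way to let the thread reach the lock before the signal is sent:

```cpp
#include <atomic>
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <time.h>

namespace {

std::atomic<int> handler_ran{0};     // set by the signal handler
std::atomic<int> lock_acquired{0};   // set once the thread owns the mutex
pthread_mutex_t demo_mutex = PTHREAD_MUTEX_INITIALIZER;

// Only touches a lock-free atomic, so it is safe in a signal handler.
void on_sigusr1(int) { handler_ran.store(1); }

void* waiting_thread(void*) {
  pthread_mutex_lock(&demo_mutex);   // blocks while the caller holds it
  lock_acquired.store(1);
  pthread_mutex_unlock(&demo_mutex);
  return nullptr;
}

void sleep_ms(long ms) {
  timespec ts;
  ts.tv_sec = 0;
  ts.tv_nsec = ms * 1000000L;        // callers pass ms < 1000
  nanosleep(&ts, nullptr);
}

}  // namespace

// Returns true if the handler ran before the thread got the mutex and the
// thread still acquired the mutex normally afterwards.
bool handler_runs_while_thread_waits_for_mutex() {
  struct sigaction sa = {};
  sa.sa_handler = on_sigusr1;
  sigemptyset(&sa.sa_mask);
  int rc = sigaction(SIGUSR1, &sa, nullptr);
  assert(rc == 0);
  (void)rc;

  pthread_mutex_lock(&demo_mutex);
  pthread_t t;
  if (pthread_create(&t, nullptr, waiting_thread, nullptr) != 0) {
    pthread_mutex_unlock(&demo_mutex);
    return false;
  }
  sleep_ms(50);                      // let the thread block on the mutex
  pthread_kill(t, SIGUSR1);          // signal directed at that thread

  // Wait (up to about 2 s) for the handler to report in.
  for (int i = 0; i < 200 && handler_ran.load() == 0; i++) {
    sleep_ms(10);
  }
  const bool ran_before_lock =
      handler_ran.load() == 1 && lock_acquired.load() == 0;

  pthread_mutex_unlock(&demo_mutex); // now the thread can proceed
  pthread_join(t, nullptr);
  return ran_before_lock && lock_acquired.load() == 1;
}
```

Build with -pthread where the platform needs it.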


> I had not considered this aspect in relation to 8156823, so the proposed
> approach here would also attempt to address that issue.
>
> 2) If he meant "you cannot interrupt malloc if it is executing a system
>> call in the linux kernel" - that may be. I am not a linux kernel expert
>> but would have thought that syscalls may block if interrupts are
>> disabled for certain lengths by the syscall author. But in that case I
>> would have expected the process to hang too and to be not killable?
>> Again, I am no expert.
>>
>
> Note "interrupt" here is a logical concept not related to hardware level
> interrupts. I don't know at what point going into malloc you will no longer
> get signal handlers run - malloc doesn't use pthread level mutexes, but
> direct futexes, so the same signal responsiveness may not be present.
>

I will have to take a closer look at what glibc does. I always thought
that any locks it takes in user space are interruptible by signals, and
that libc calls only become uninterruptible when they enter kernel syscalls,
and that those kernel syscalls cannot be interrupted (that was what I meant
with interrupts disabled). I may be wrong. I'll have to rethink this.

But whatever the outcome, there may be situations where a thread cannot be
interrupted by pthread_kill, but I think those cases are rare. More often
we just wait in very ordinary situations, be it a pthread mutex deadlock or
a slow file system. For example, one of the more common scenarios is when
you want to print a stack trace and try to load symbol information to
resolve a pc to a name.

Thanks, Thomas


>
> Thanks,
> David
> -----
>
>
>
>>
>>     Otherwise overall your changes look good, but I have a few comments.
>>     Also, since this is an enhancement, it needs to wait for JDK 10.
>>
>>     I think your test will fail for product builds. You should add
>>     "@requires vm.debug == true". Also, java files use 4 char
>>     indentation, not 2 like we use in hotspot C/C++ code. Lastly, it
>>     should only have a 2016 copyright.
>>
>>
>> Thank you for the hints. Did fix all that. Note that I had disabled the
>> test for product builds in the code (!Platform.isDebugBuild()) but I
>> added the vm.debug tag as well as you suggested.
>>
>>
>>     A couple of files need the copyright updated to 2016.
>>
>>     Why do set_to_now() and get_timestamp() need to be atomic, and what
>>     are the consequences of cx8 not being supported?
>>
>>
>> The error reporting thread sets the timestamp on each STEP start, and
>> the timestamp is read from another thread, the WatcherThread. Timestamp
>> is 64bit. I wanted to make sure the 64bit value is written and read
>> atomically, especially on 32bit platforms.
>>
>> But then, I had to check whether 64bit atomic stores/loads are even
>> supported by this platform (I actually did not find a 32bit platform
>> without 64bit atomics, but the comment in atomic.hpp is pretty
>> insistent and I did not want to risk regressions for other platforms).
>>
>> Well, if no cx8 support was available, I pretty much just give up and
>> read and write timestamps directly. As I said, I am not sure if this
>> code path ever gets executed.
>>
>> Maybe I was overthinking all this and just reading and writing the (C++
>> volatile) jlongs would have been enough, but I wanted to prevent
>> sporadic test errors because of incompletely read 64bit values.
>>
>>
>>     1282         st->print_raw_cr(buffer);
>>     1283         st->cr();
>>
>>     The old code had an additional st->cr() before the above lines. I
>>     assume you removed it intentionally.
>>
>>
>> I hope I preserved the numbers of cr(). At least that was my intention:
>>
>> 1260       outputStream* const st = log.is_open() ? &log : &out;
>> 1261       st->cr();
>>
>> ...
>>
>> and then on every path, a cr (or print_raw_cr) at the end. Where do you
>> see the missing cr()?
>>
>>
>>
>>     Is there a reason why you decided to only allow one step to timeout.
>>     What if the cause of a timeout in a step also impacts other steps,
>>     or is that not common when we see timeouts?
>>
>>
>> That is mostly guesswork. In our (SAP) code we allow for four steps (so
>> ErrorLogTimeout/4 as step timeout) and additionally allow for "steps
>> known to be long" where timeouts are disabled altogether. But we also
>> have more complicated error reporting steps, so when porting the patch
>> to OpenJDK, I felt the complexity was unneeded.
>>
>> I think in general you will only have one misbehaving step, but you are
>> right, more than one step may timeout if e.g. the file system is slow.
>> I'm open for suggestions: the timeout value should be large enough not
>> to be hit for "normal slow steps" while still leave room enough for
>> other steps to finish. What do you think a reasonable timeout value
>> would be? ErrorLogTimeout/4?
>>
>>
>>
>>     It's not clear to me why you changed a couple of os::sleep() calls
>>     to os::naked_short_sleep(), and the rationale for the sleep periods.
>>     Can you please explain?
>>
>>
>> Because os::sleep() does a lot of work under the hood and relies on a
>> bit of VM infrastructure. I think that is not a good idea in error
>> situations where potentially everything may be broken already. You want
>> to step lightly and really only do a naked system sleep. About the sleep
>> periods, os::naked_sleep has an inbuilt maximum value of 1000ms, which I
>> have to stay below to not hit the assert. I did use 999ms as the longest
>> interval I am allowed to sleep nakedly. And after the timeout hit and
>> before the WatcherThread calls os::abort, I again sleep 200ms to give
>> the error reporter thread time to write the "error log aborted due to
>> timeout" into the error log and to flush the error log. Those 200ms are
>> just guesswork.
>>
>>
>>     thanks,
>>
>>     Chris
>>
>>
>> Thanks for the review!
>>
>> Kind Regards, Thomas
>>
>>
>>
>>
>>     On 10/12/16 9:55 PM, Thomas Stüfe wrote:
>>
>>         Dear all,
>>
>>         please take a look at the following fix:
>>
>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>>         webrev:
>>         http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>>
>>         ---
>>
>>         In short, this fix provides the ability to cancel hanging error
>>         reporting
>>         steps. This uses the same code paths secondary error handling
>>         uses during
>>         error reporting. With this patch, steps which take too long will
>> be
>>         canceled after 1/2 ErrorLogTimeout. In the log file, it will
>>         look like this:
>>
>>         4 [timeout occurred during error reporting in step "<stepname>"]
>>         after xxxx
>>         ms.
>>         5
>>
>>         and we now also get a finish message in the hs-err file if we
>>         hit the
>>         ErrorLogTimeout and error reporting will stop altogether:
>>
>>         6 ------ Timout during error reporting after xxx ms. ------
>>
>>         (in addition to the "time expired, abort" message the
>>         WatcherThread writes
>>         to stderr)
>>
>>         ---
>>
>>         This is something which bugged us for a long time, because we
>>         rely heavily
>>         on the hs_err files for error analysis at customer sites, and
>>         there are a
>>         number of reasons why one step may hang and prevent the
>>         follow-up steps
>>         from running.
>>
>>         It works like this:
>>
>>         Before, when error reporting started, the WatcherThread was
>>         waiting for
>>         ErrorLogTimeout seconds, then would stop the VM.
>>
>>         Now, the WatcherThread periodically pings error reporting, which
>>         checks if
>>         the last step did timeout. If it does, it sends a signal to the
>>         reporting
>>         thread, and the thread will continue with the next step. This
>>         follows the
>>         same path as secondary crash handling.
>>
>>         Some implementation details:
>>
>>         On Posix platforms, to interrupt the thread, I use pthread_kill.
>>         This means
>>         I must know the pthread id of the reporting thread, which I now
>>         store at
>>         the beginning of error reporting. We already store the reporting
>>         thread id
>>         in first_error_tid, but that I cannot use, because it gets set by
>>         os::current_thread_id(), which is not always the pthread id.
>>         Should we ever
>>         switch to only using pthread id for posix platforms, this coding
>>         can be
>>         simplified.
>>
>>         On Windows, there is unfortunately no easy way to interrupt a
>>         non-cooperative thread. I would need a way to cause a SEH inside
>>         the target
>>         thread, which then would get handled by secondary error handling
>>         like on
>>         Posix platforms, but that is not easy. It is doable - one can
>>         suspend the
>>         thread, modify the thread context in a way that it will crash
>>         upon resume.
>>         But that felt a bit heavyweight for this problem. So on windows,
>>         timeout
>>         handling still works (after ErrorLogTimeout the VM gets shut
>>         down), but
>>         error reporting steps are not interruptable. If we feel this is
>>         important,
>>         this can be added later.
>>
>>         Kind Regards, Thomas
>>
>>
>>
>>
>>
>>

From staffan.larsen at oracle.com  Thu Oct 27 08:36:48 2016
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Thu, 27 Oct 2016 10:36:48 +0200
Subject: RFR(S): 8168305 GC.class_stats should not require
	-XX:+UnlockDiagnosticVMOptions
Message-ID: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>

All,

Please review this small patch to remove the requirement for -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided.

While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead.

I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java

bug: https://bugs.openjdk.java.net/browse/JDK-8168305
webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/

Thanks,
/Staffan

From robbin.ehn at oracle.com  Thu Oct 27 09:23:34 2016
From: robbin.ehn at oracle.com (Robbin Ehn)
Date: Thu, 27 Oct 2016 11:23:34 +0200
Subject: RFR(S): 8168305 GC.class_stats should not require
	-XX:+UnlockDiagnosticVMOptions
In-Reply-To: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
Message-ID: <7e63d940-5ab1-5858-9d0e-c352d16c989a@oracle.com>

Looks good!

/Robbin

On 10/27/2016 10:36 AM, Staffan Larsen wrote:
> All,
>
> Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided.
>
> While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead.
>
> I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8168305
> webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/
>
> Thanks,
> /Staffan
>

From marcus.larsson at oracle.com  Thu Oct 27 09:41:48 2016
From: marcus.larsson at oracle.com (Marcus Larsson)
Date: Thu, 27 Oct 2016 11:41:48 +0200
Subject: RFR(S): 8168305 GC.class_stats should not require
	-XX:+UnlockDiagnosticVMOptions
In-Reply-To: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
Message-ID: <7a8a2e7d-1751-1743-fbfd-a39d818ff2f4@oracle.com>

Hi Staffan,


On 2016-10-27 10:36, Staffan Larsen wrote:
> All,
>
> Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided.
>
> While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead.
>
> I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8168305
> webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/

The command description still mentions the flag, see
src/share/vm/services/diagnosticCommand.hpp:389

Apart from that this looks good to me!

Thanks,
Marcus

>
> Thanks,
> /Staffan


From martin.doerr at sap.com  Thu Oct 27 10:10:13 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Thu, 27 Oct 2016 10:10:13 +0000
Subject: RFR(M): 8168083: PPC64: Cleanup template interpreter after
	8154580 and 8154867
In-Reply-To: <13e4100e-b385-6c71-8222-d36819f2fbdd@oracle.com>
References: <3fe51d73420847cd85508d0a31cdd852@DEWDFE13DE14.global.corp.sap>
	<7ef7bcb6-5092-3b29-e1d6-8d6e4fbb3b69@oracle.com>
	<f43fe5c7b0f442719f27310e9e7cf710@DEWDFE13DE14.global.corp.sap>
	<13e4100e-b385-6c71-8222-d36819f2fbdd@oracle.com>
Message-ID: <39282c5a307648cb931e0c2781cc5810@DEWDFE13DE10.global.corp.sap>

Hi Coleen,

thanks for your email and for opening the bug.

Reloading of ConstMethod is not restricted to load_mirror(). E.g. SPARC's generate_fixed_frame loads it 4 times.
Therefore, I have added a comment to the bug.
I guess the load_mirror change alone is not so relevant, but I appreciate any cleanup there as well.

Thanks and best regards,
Martin


-----Original Message-----
From: Coleen Phillimore [mailto:coleen.phillimore at oracle.com] 
Sent: Mittwoch, 26. Oktober 2016 21:32
To: Doerr, Martin <martin.doerr at sap.com>; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter after 8154580 and 8154867


On 10/20/16 4:58 AM, Doerr, Martin wrote:
> Hi Coleen,
>
> thank you very much for reviewing my PPC change.
>
> We had originally spent a lot of effort to get the template interpreter fast. I think startup performance is still important.
> A large amount of less optimized changes will make it slower over time.
> That's why we have reduced reloading constMethod in the PPC implementation. I think this would be good for other platforms as well.
> Maybe we should improve them in 10.

I don't know. I thought load_mirror() made for a nice API.  Does the extra indirection matter?  I filed RFE
https://bugs.openjdk.java.net/browse/JDK-8168795 so we can investigate further in 10.

This is approved and I think reviewed so you can check it in anytime.  I 
put a due date of Friday on your bug.   Feel free to change it if that's 
not good.

Thanks,
Coleen

>
> Best regards,
> Martin
>
>
> -----Original Message-----
> From: hotspot-runtime-dev 
> [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of 
> Coleen Phillimore
> Sent: Dienstag, 18. Oktober 2016 23:56
> To: hotspot-runtime-dev at openjdk.java.net
> Subject: Re: RFR(M): 8168083: PPC64: Cleanup template interpreter 
> after 8154580 and 8154867
>
>
> This seems good.   I think it's a shame to change load_mirror() to
> load_mirror_from_const_method() though because there's load_mirror() 
> with the same parameters on all the other platforms and it makes
> platform development a little easier.   But that's up to you, because
> you can generate shorter sequences.
>
> Coleen
>
>
> On 10/17/16 12:38 PM, Doerr, Martin wrote:
>> Hi,
>>
>> I'd like to clean up the template interpreter on PPC64 a little bit after changes which were pushed into jdk9:
>>
>> 8154580 introduced copying the java mirror into the interpreter frame. Some code can be implemented shorter. Before this change, the size of the ijava state was designed to be a multiple of 16. We should remove the comment as this is no longer true. I have checked that this is not really required (generate_fixed_frame inserts frame padding if needed).
>>
>> 8154867 is the PPC64 port of "better byte behavior". The shorter TOS states are not treated appropriately (which is not critical because the template interpreter also uses itos for shorter types). This part of the change was requested by Coleen, but it didn't make it into the original webrev.
>>
>> Webrev is here:
>> http://cr.openjdk.java.net/~mdoerr/8168083_PPC64_interp_cleanup/webrev.00/
>>
>> Please review.
>>
>> Thanks and best regards,
>> Martin
>>


From staffan.larsen at oracle.com  Thu Oct 27 11:25:53 2016
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Thu, 27 Oct 2016 13:25:53 +0200
Subject: RFR(S): 8168305 GC.class_stats should not require
	-XX:+UnlockDiagnosticVMOptions
In-Reply-To: <7a8a2e7d-1751-1743-fbfd-a39d818ff2f4@oracle.com>
References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
	<7a8a2e7d-1751-1743-fbfd-a39d818ff2f4@oracle.com>
Message-ID: <E3882CD5-C1AB-40BE-BAD0-FCC10674CBD8@oracle.com>


> On 27 Oct 2016, at 11:41, Marcus Larsson <marcus.larsson at oracle.com> wrote:
> 
> Hi Staffan,
> 
> 
> On 2016-10-27 10:36, Staffan Larsen wrote:
>> All,
>> 
>> Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided.
>> 
>> While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead.
>> 
>> I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java
>> 
>> bug: https://bugs.openjdk.java.net/browse/JDK-8168305
>> webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/
> 
> The command description still mentions the flag, see
> src/share/vm/services/diagnosticCommand.hpp:389

Well spotted!

new webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.02/


> Apart from that this looks good to me!

Thanks.

> 
> Thanks,
> Marcus
> 
>> 
>> Thanks,
>> /Staffan


From thomas.stuefe at gmail.com  Thu Oct 27 12:08:03 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 27 Oct 2016 14:08:03 +0200
Subject: metaspace.cpp: why are counters in ChunkManager updated atomically?
Message-ID: <CAA-vtUx1MTYB91=HV+qjZUPS159Dj23HR3+5zWSaW=_hU4ChWA@mail.gmail.com>

Hi all,

I am currently working on a prototype for
https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question about
the ChunkManager class in metaspace.cpp.

ChunkManager has the _free_chunks_total, _free_chunks_count counters. It
seems the coding goes to some lengths to avoid updating them often, so instead
of updating them when a chunk is freed it attempts to delay and accumulate
updates. This makes changing the MetaChunk allocation quite complicated,
because there are large windows during which the counters are invalid and
do not reflect reality.

I see that the counters are updated atomically, so I assume the reason for
delaying the updates is that atomics are expensive. But I could not find a
good reason why the counters are updated atomically. To me, all
modifications seem to happen under lock protection
(SpaceManager::expand_lock()). What am I overlooking?

Thanks a lot,

Kind Regards, Thomas

From mikael.gerdin at oracle.com  Thu Oct 27 12:41:26 2016
From: mikael.gerdin at oracle.com (Mikael Gerdin)
Date: Thu, 27 Oct 2016 14:41:26 +0200
Subject: metaspace.cpp: why are counters in ChunkManager updated
	atomically?
In-Reply-To: <CAA-vtUx1MTYB91=HV+qjZUPS159Dj23HR3+5zWSaW=_hU4ChWA@mail.gmail.com>
References: <CAA-vtUx1MTYB91=HV+qjZUPS159Dj23HR3+5zWSaW=_hU4ChWA@mail.gmail.com>
Message-ID: <e55e8f20-35ed-1da0-2733-1dea8fde6780@oracle.com>

Hi Thomas,

On 2016-10-27 14:08, Thomas Stüfe wrote:
> Hi all,
>
> I am currently working on a prototype for
> https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question about
> the ChunkManager class in metaspace.cpp.
>
> ChunkManager has the _free_chunks_total, _free_chunks_count counters. It
> seems the coding goes to some lengths to avoid updating them often, so instead
> of updating them when a chunk is freed it attempts to delay and accumulate
> updates. This makes changing the MetaChunk allocation quite complicated,
> because there are large windows during which the counters are invalid and
> do not reflect reality.
>
> I see that the counters are updated atomically, so I assume the reason for
> delaying the updates is that atomics are expensive. But I could not find a
> good reason why the counters are updated atomically. To me, all
> modifications seem to happen under lock protection
> (SpaceManager::expand_lock()). What am I overlooking?

I don't think you are overlooking anything. The fact that these are 
updated with atomics is something that I've noticed as well at some 
point but I don't think I ever got around to fixing that.
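
A minimal sketch of what dropping the atomics could look like, assuming
every update really does happen while one lock is held. FreeChunkCounters
and its methods are hypothetical names for illustration, and std::mutex
stands in for SpaceManager::expand_lock(); this is not the metaspace.cpp
code:

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>

// Hypothetical stand-in for ChunkManager's free-chunk bookkeeping.
// The counters are plain integers; the lock alone keeps them consistent.
class FreeChunkCounters {
 public:
  // Record that a chunk of word_size words was returned to the free list.
  void chunk_freed(std::size_t word_size) {
    std::lock_guard<std::mutex> guard(_lock);
    _free_chunks_total += word_size;
    _free_chunks_count += 1;
  }

  // Record that a chunk of word_size words was handed out again.
  void chunk_taken(std::size_t word_size) {
    std::lock_guard<std::mutex> guard(_lock);
    assert(_free_chunks_count > 0 && _free_chunks_total >= word_size);
    _free_chunks_total -= word_size;
    _free_chunks_count -= 1;
  }

  std::size_t free_chunks_total() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _free_chunks_total;
  }

  std::size_t free_chunks_count() const {
    std::lock_guard<std::mutex> guard(_lock);
    return _free_chunks_count;
  }

 private:
  mutable std::mutex _lock;            // plays the role of expand_lock()
  std::size_t _free_chunks_total = 0;  // total size of free chunks, in words
  std::size_t _free_chunks_count = 0;  // number of free chunks
};
```

Because each update happens immediately and under the lock, there is no
window in which the counters disagree with the free lists.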


I'm not sure I understand where in the code the delayed and accumulated 
updates take place but if you think that's the case then it's probably 
true. I suspect that at this point you are one of the handful of people 
who are familiar with the chunk allocation code :)

Regards
/Mikael

>
> Thanks a lot,
>
> Kind Regards, Thomas
>

From thomas.stuefe at gmail.com  Thu Oct 27 12:56:48 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 27 Oct 2016 14:56:48 +0200
Subject: metaspace.cpp: why are counters in ChunkManager updated
	atomically?
In-Reply-To: <e55e8f20-35ed-1da0-2733-1dea8fde6780@oracle.com>
References: <CAA-vtUx1MTYB91=HV+qjZUPS159Dj23HR3+5zWSaW=_hU4ChWA@mail.gmail.com>
	<e55e8f20-35ed-1da0-2733-1dea8fde6780@oracle.com>
Message-ID: <CAA-vtUy0nQdCYrsb9h9sXHKA+K3t6EeO5RvKJjmdUXkaZzyOBg@mail.gmail.com>

Hi Mikael,

On Thu, Oct 27, 2016 at 2:41 PM, Mikael Gerdin <mikael.gerdin at oracle.com>
wrote:

> Hi Thomas,
>
> On 2016-10-27 14:08, Thomas Stüfe wrote:
>
>> Hi all,
>>
>> I am currently working on a prototype for
>> https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question
>> about
>> the ChunkManager class in metaspace.cpp.
>>
>> ChunkManager has the _free_chunks_total, _free_chunks_count counters. It
>> seems the coding goes to some lengths to avoid updating them often, so
>> instead
>> of updating them when a chunk is freed it attempts to delay and accumulate
>> updates. This makes changing the MetaChunk allocation quite complicated,
>> because there are large windows during which the counters are invalid and
>> do not reflect reality.
>>
>> I see that the counters are updated atomically, so I assume the reason for
>> delaying the updates is that atomics are expensive. But I could not find a
>> good reason why the counters are updated atomically. To me, all
>> modifications seem to happen under lock protection
>> (SpaceManager::expand_lock()). What am I overlooking?
>>
>
> I don't think you are overlooking anything. The fact that these are
> updated with atomics is something that I've noticed as well at some point
> but I don't think I ever got around to fixing that.
>
>
> I'm not sure I understand where in the code the delayed and accumulated
> updates take place but if you think that's the case then it's probably
> true. I suspect that at this point you are one of the handful of people who
> are familiar with the chunk allocation code :)
>
>
:) That is good news, because I can then straighten the updates out, which
makes the changes much simpler. Will probably do this in a separate
fix. Thank you!

Thomas


> Regards
> /Mikael
>
>
>
>> Thanks a lot,
>>
>> Kind Regards, Thomas
>>
>>

From erik.helin at oracle.com  Thu Oct 27 13:26:31 2016
From: erik.helin at oracle.com (Erik Helin)
Date: Thu, 27 Oct 2016 15:26:31 +0200
Subject: metaspace.cpp: why are counters in ChunkManager updated
	atomically?
In-Reply-To: <CAA-vtUy0nQdCYrsb9h9sXHKA+K3t6EeO5RvKJjmdUXkaZzyOBg@mail.gmail.com>
References: <CAA-vtUx1MTYB91=HV+qjZUPS159Dj23HR3+5zWSaW=_hU4ChWA@mail.gmail.com>
	<e55e8f20-35ed-1da0-2733-1dea8fde6780@oracle.com>
	<CAA-vtUy0nQdCYrsb9h9sXHKA+K3t6EeO5RvKJjmdUXkaZzyOBg@mail.gmail.com>
Message-ID: <2121d75b-4762-0961-74a0-edef212b651c@oracle.com>

On 10/27/2016 02:56 PM, Thomas Stüfe wrote:
> Hi Mikael,
>
> On Thu, Oct 27, 2016 at 2:41 PM, Mikael Gerdin <mikael.gerdin at oracle.com>
> wrote:
>
>> Hi Thomas,
>>
>> On 2016-10-27 14:08, Thomas Stüfe wrote:
>>
>>> Hi all,
>>>
>>> I am currently working on a prototype for
>>> https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question
>>> about
>>> the ChunkManager class in metaspace.cpp.
>>>
>>> ChunkManager has the _free_chunks_total, _free_chunks_count counters. It
>>> seems the coding goes to some lengths to avoid updating them often, so
>>> instead
>>> of updating them when a chunk is freed it attempts to delay and accumulate
>>> updates. This makes changing the MetaChunk allocation quite complicated,
>>> because there are large windows during which the counters are invalid and
>>> do not reflect reality.
>>>
>>> I see that the counters are updated atomically, so I assume the reason for
>>> delaying the updates is that atomics are expensive. But I could not find a
>>> good reason why the counters are updated atomically. To me, all
>>> modifications seem to happen under lock protection
>>> (SpaceManager::expand_lock()). What am I overlooking?
>>>
>>
>> I don't think you are overlooking anything. The fact that these are
>> updated with atomics is something that I've noticed as well at some point
>> but I don't think I ever got around to fixing that.
>>
>>
>> I'm not sure I understand where in the code the delayed and accumulated
>> updates take place but if you think that's the case then it's probably
>> true. I suspect that at this point you are one of the handful of people who
>> are familiar with the chunk allocation code :)
>>
>>
> :) That is good news, because I can then straighten the updates out, which
> makes the changes much simpler. Will probably do this in a separate
> fix. Thank you!

I came to the same conclusion as Mikael the last time I checked, but due
to lack of time I didn't get around to fixing it. Please send this out as a
separate patch, it will make reviewing much easier.

Thanks,
Erik

> Thomas
>
>
>> Regards
>> /Mikael
>>
>>
>>
>>> Thanks a lot,
>>>
>>> Kind Regards, Thomas
>>>
>>>

From thomas.stuefe at gmail.com  Thu Oct 27 13:39:31 2016
From: thomas.stuefe at gmail.com (Thomas Stüfe)
Date: Thu, 27 Oct 2016 15:39:31 +0200
Subject: metaspace.cpp: why are counters in ChunkManager updated
	atomically?
In-Reply-To: <2121d75b-4762-0961-74a0-edef212b651c@oracle.com>
References: <CAA-vtUx1MTYB91=HV+qjZUPS159Dj23HR3+5zWSaW=_hU4ChWA@mail.gmail.com>
	<e55e8f20-35ed-1da0-2733-1dea8fde6780@oracle.com>
	<CAA-vtUy0nQdCYrsb9h9sXHKA+K3t6EeO5RvKJjmdUXkaZzyOBg@mail.gmail.com>
	<2121d75b-4762-0961-74a0-edef212b651c@oracle.com>
Message-ID: <CAA-vtUy8Te71NrfPUf4memcHCHpuDeVzXTE_i=uaeFLTtjWhCg@mail.gmail.com>

On Thu, Oct 27, 2016 at 3:26 PM, Erik Helin <erik.helin at oracle.com> wrote:

> On 10/27/2016 02:56 PM, Thomas Stüfe wrote:
>
>> Hi Mikael,
>>
>> On Thu, Oct 27, 2016 at 2:41 PM, Mikael Gerdin <mikael.gerdin at oracle.com>
>> wrote:
>>
>>> Hi Thomas,
>>>
>>> On 2016-10-27 14:08, Thomas Stüfe wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am currently working on a prototype for
>>>> https://bugs.openjdk.java.net/browse/JDK-8166690 and have a question
>>>> about
>>>> the ChunkManager class in metaspace.cpp.
>>>>
>>>> ChunkManager has the _free_chunks_total, _free_chunks_count counters. It
>>>> seems the coding goes to some lengths to avoid updating them often, so
>>>> instead
>>>> of updating them when a chunk is freed it attempts to delay and
>>>> accumulate
>>>> updates. This makes changing the MetaChunk allocation quite complicated,
>>>> because there are large windows during which the counters are invalid
>>>> and
>>>> do not reflect reality.
>>>>
>>>> I see that the counters are updated atomically, so I assume the reason
>>>> for
>>>> delaying the updates is that atomics are expensive. But I could not
>>>> find a
>>>> good reason why the counters are updated atomically. To me, all
>>>> modifications seem to happen under lock protection
>>>> (SpaceManager::expand_lock()). What am I overlooking?
>>>>
>>>>
>>> I don't think you are overlooking anything. The fact that these are
>>> updated with atomics is something that I've noticed as well at some point
>>> but I don't think I ever got around to fixing that.
>>>
>>>
>>> I'm not sure I understand where in the code the delayed and accumulated
>>> updates take place but if you think that's the case then it's probably
>>> true. I suspect that at this point you are one of the handful of people
>>> who
>>> are familiar with the chunk allocation code :)
>>>
>>>
>> :) That is good news, because I can then straighten the updates out, which
>> makes the changes much simpler. Will probably do this in a separate
>> fix. Thank you!
>>
>
> I came to the same conclusion as Mikael the last time I checked, but due
> to lack of time I didn't get around to fixing it. Please send this out as a
> separate patch, it will make reviewing much easier.
>
> Thanks,
> Erik
>
>
Thanks, Erik, will do that.


>
>> Thomas
>>
>>
>>> Regards
>>> /Mikael
>>>
>>>
>>>
>>>> Thanks a lot,
>>>>
>>>> Kind Regards, Thomas
>>>>
>>>>
>>>>

From david.holmes at oracle.com  Thu Oct 27 17:40:49 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 28 Oct 2016 03:40:49 +1000
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <CAA-vtUyc2ConhWcs1cp+qWFr5JdNc3OWT=jsUUduxQdN=zNBQw@mail.gmail.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
	<CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>
	<113f387c-cea8-7cfc-9d6a-29d0151c8a83@oracle.com>
	<CAA-vtUyc2ConhWcs1cp+qWFr5JdNc3OWT=jsUUduxQdN=zNBQw@mail.gmail.com>
Message-ID: <f285e6e5-b0c4-11fe-baa5-54a7a71f324d@oracle.com>

Just picking up a couple of specific discussion points ...

On 27/10/2016 4:09 PM, Thomas Stüfe wrote:
> On Wed, Oct 26, 2016 at 9:27 PM, Chris Plummer <chris.plummer at oracle.com> wrote:
>>     But then, I had to check whether 64bit atomic stores/loads are
>>     even supported by this platform (I actually did not find a 32bit
>>     platform without 64bit atomics, but the comment in atomic.hpp is
>>     pretty insistent and I did not want to risk regressions for other
>>     platforms).
>>
>>     Well, if no cx8 support was available, I pretty much just give up
>>     and read and write timestamps directly. As I said, I am not sure
>>     if this code path ever gets executed.
>>
>>     Maybe I was overthinking all this and just reading and writing the
>>     (C++ volatile) jlongs would have been enough, but I wanted to
>>     prevent sporadic test errors because of incompletely read 64bit
>>     values.
>     Closed ports may not have cx8 support, although I don't believe any
>     are being released with JDK9. Since you just have one writer and one
>     reader, I think the only concern is word tearing on the read. For
>     this reason you likely need the cx8 support. David would know, so
>     hopefully he can comment on this.
>
>     Assuming you need cx8 support, theoretically there are platforms
>     where your code could fail due to not having cx8 support. You could
>     argue that the risk of word tearing is minimal, both in likelihood
>     of it happening (race condition on a platform we aren't currently
>     officially supporting), and the possible negative behavior if it
>     does (premature timeout or possibly no timeout, but only with debug
>     builds after a crash).
>
>     The other choice here is to just disable the whole timeout mechanism
>     if cx8 is not supported. In fact simply making set_to_now() and
>     get_timestamp() no-ops when cx8 is not supported would accomplish
>     that, although I'd suggest also adding some more explicit disabling
>     of the code wherever the timestamps are referenced.
>
>
> Another thing I thought of would be to change the timestamp to 32bit - I
> only need second resolution - and somehow handle the year 2038 overflow.
> But the easiest and most pragmatic way is to either ignore the problem
> completely for non-cx8 or to do anything.

PPC32 did not support CX8, which is why this is present in the codebase. 
(It is also present at the Java level too.)

All platforms the JVM runs on must support 64-bit atomic loads and
stores by some means - to implement Java volatile long semantics. Even
platforms that don't support CX8 have some means to do this, e.g. by using
the FPU. But this isn't necessarily implemented in the Atomic class (it
wasn't for PPC32 because there were no calls to those methods in the VM).

The warnings in the atomic.hpp file were to avoid casually defining
different Atomic ops for jlongs, when they could not be implemented 
efficiently on systems without CX8 support. So the onus was put back on 
the user of the API to check this and define an alternative - rather 
than, for example, forcing use of a global lock on such platforms.

Given you are only using Atomic::load/store I think you can dispense 
with the supports_cx8 check, because, as I said, every platform must 
have some means to support such atomic loads/stores. And we currently 
don't have any ports that don't support CX8.
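
As a rough illustration of the single-writer, single-reader timestamp,
here is a sketch in standard C++ rather than HotSpot's Atomic class.
StepTimestamp, set() and timed_out() are hypothetical names, set_to_now()
and get_timestamp() mirror the names used in the review, and
std::atomic<int64_t> is assumed to provide the tear-free 64-bit loads and
stores that Atomic::load/store would:

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical step timestamp: written by the error reporting thread,
// read by a watcher thread. Zero means "no step is running".
class StepTimestamp {
 public:
  // Writer side: record the start of a step.
  void set_to_now() { set(now_millis()); }
  void set(std::int64_t millis) {
    _millis.store(millis, std::memory_order_relaxed);
  }

  // Reader side: the value is read in one piece, never half old, half new.
  std::int64_t get_timestamp() const {
    return _millis.load(std::memory_order_relaxed);
  }

  // True if a step is running and has used more than timeout_millis.
  bool timed_out(std::int64_t now, std::int64_t timeout_millis) const {
    assert(timeout_millis > 0);
    const std::int64_t start = get_timestamp();
    return start != 0 && now - start > timeout_millis;
  }

  // Wall-clock milliseconds since the system clock's epoch.
  static std::int64_t now_millis() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        system_clock::now().time_since_epoch()).count();
  }

 private:
  std::atomic<std::int64_t> _millis{0};
};
```

Relaxed ordering is used because only the timestamp value itself is
exchanged between the two threads; no other data depends on it.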

>>         It's not clear to me why you changed a couple of os::sleep()
>>         calls to os::naked_short_sleep(), and the rationale for the
>>         sleep periods. Can you please explain?
>>
>>
>>     Because os::sleep() does a lot of work under the hood and relies
>>     on a bit of VM infrastructure. I think that is not a good idea in
>>     error situations where potentially everything may be broken
>>     already. You want to step lightly and really only do a naked
>>     system sleep. About the sleep periods, os::naked_sleep has an
>>     inbuilt maximum value of 1000ms, which I have to stay below to not
>>     hit the assert. I did use 999ms as the longest interval I am
>>     allowed to sleep nakedly. And after the timeout hit and before the
>>     WatcherThread calls os::abort, I again sleep 200ms to give the
>>     error reporter thread time to write the "error log aborted due to
>>     timeout" into the error log and to flush the error log. Those
>>     200ms are just guesswork.
>     Ok. I'd like to see someone else comment on the os::naked_sleep()
>     use since it's not something I'm familiar enough with.

In this case, because we are in the WatcherThread, os::sleep will not
do anything interesting that relies on other VM infrastructure
(modification of osThread wait-state only, calls to javaTimeNanos). For 
the same reason changing to naked_sleep is also fine. We can/should 
relax the assert in naked_sleep so that the sleep time is only limited 
for JavaThreads.
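
For illustration only: as long as a per-call limit like the one quoted
above is in place, a longer wait can be split into slices that each stay
under it. The helper names below are hypothetical, the 999ms cap is taken
from the discussion, and std::this_thread::sleep_for stands in for the
naked OS sleep:

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <thread>

// Longest single sleep, chosen to stay below the 1000ms limit (assumption
// taken from the review discussion).
const int kMaxSliceMs = 999;

// Length of the next sleep slice for the remaining wait time.
int next_slice_ms(int remaining_ms) {
  assert(remaining_ms >= 0);
  return std::min(remaining_ms, kMaxSliceMs);
}

// Sleep for total_ms in slices of at most kMaxSliceMs; returns the number
// of slices used.
int sleep_in_slices(int total_ms) {
  int slices = 0;
  while (total_ms > 0) {
    const int slice = next_slice_ms(total_ms);
    std::this_thread::sleep_for(std::chrono::milliseconds(slice));
    total_ms -= slice;
    ++slices;
  }
  return slices;
}
```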

Thanks,
David

From david.holmes at oracle.com  Thu Oct 27 17:45:17 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 28 Oct 2016 03:45:17 +1000
Subject: RFR(s): 8166944: Hanging Error Reporting steps may lead to torn
	error logs.
In-Reply-To: <CAA-vtUy6baJQrjFeM94Bty+maBhV2iO=6jbHPKMAMnAgz_jaKg@mail.gmail.com>
References: <CAA-vtUzooDdxgiUO-yUA8Q9Fzdz_A6joU12VQCM+5q2cVRhp4A@mail.gmail.com>
	<7d236201-144f-8b65-18c3-6b70971b819a@oracle.com>
	<CAA-vtUwtSHEJXDQvV9dAYwqbAktcvBM11sHXRuY0H4ZCAx_t8Q@mail.gmail.com>
	<8bad2560-9c13-bbd8-0312-128168e8ed4c@oracle.com>
	<CAA-vtUy6baJQrjFeM94Bty+maBhV2iO=6jbHPKMAMnAgz_jaKg@mail.gmail.com>
Message-ID: <abb577c4-2ef7-f61c-c834-6b0fb5b39261@oracle.com>

Hi Thomas,

Totally agree your proposal makes things no better nor worse when it 
comes to what we do from the signal handler, and it may help with those 
deadlock situations. I'm not concerned about digging too deep into 
malloc to see whether it may or may not help in that particular case - 
it either will or it won't.

Overall I think this is looking quite good. I hope we get the JDK10 repo 
very soon ... once the JDK10 project officially takes off (and probably
once the repo consolidation project has settled on a final repo layout).

Thanks,
David

On 27/10/2016 5:16 PM, Thomas Stüfe wrote:
> Hi David,
>
> On Thu, Oct 27, 2016 at 2:40 AM, David Holmes <david.holmes at oracle.com> wrote:
>
>     On 27/10/2016 12:45 AM, Thomas Stüfe wrote:
>
>         Hi Chris,
>
>         Thanks for the review!
>
>         New
>         webrev:
>         http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.01/webrev/
>
>
>     Have not looked at this yet.
>
>         Comments inline.
>
>         On Wed, Oct 26, 2016 at 9:00 AM, Chris Plummer
>         <chris.plummer at oracle.com> wrote:
>
>             Hi Thomas,
>
>             See JDK-8156821. I'm curious as to how your changes will
>         impact it,
>             since David says you can't interrupt a thread blocked trying to
>             acquire mutex. I suspect that means this enhancement won't
>         help in
>             this case, and presumably in general you are not fixing the
>         issue of
>             error reporting getting deadlocked, or maybe I'm misinterpreting
>             what David said in JDK-8156821.
>
>
>     That should be 8156823
>
>
>
>         Not sure what exactly David meant with "You can't "interrupt" a
>         thread
>         that is blocked trying to acquire a mutex." Maybe he can
>         elaborate :)
>
>         My guesses:
>
>         1) If he meant "you cannot interrupt a thread blocking in
>         pthread_mutex_lock()" - not true, you can and my patch works
>         just fine
>         in this situation. Just tested again, to be sure. This covers
>         crashes in
>         sections guarded by pthread_mutex, which then try to reacquire
>         the lock
>         in the error handler.
>
>
>     There is no specified, portable way to get a thread blocked
>     acquiring a mutex to stop waiting for the mutex. That is what I
>     meant. pthread_mutex_lock is not a cancellation point, nor will it
>     return EINTR in response to a signal.
>
>     However, if a signal is received by the thread while waiting then
>     POSIX semantics indicate that the signal handler will run and then
>     return the thread to the waiting state. In our case the crash
>     handler does not return so we are into undefined territory there -
>     but our crash handler is already not a well-defined signal handler
>     as it is not restricted to async-signal-safe functions, so we
>     already run a risk when executing it.
>
>
> That was what I meant. Syscalls do not have to be interruptible by design,
> they just have to call user signal handlers for asynchronous signals.
>
> And I think my patch does not make matters worse or more unsafe. There
> is no new concept here - I use the pre-existing secondary signal
> handling and that only in situations in which otherwise the error
> handling would very probably hang forever, not producing any error log
> at all.
>
>
>     I had not considered this aspect in relation to 8156823, so the
>     proposed approach here would also attempt to address that issue.
>
>         2) If he meant "you cannot interrupt malloc if it is executing a
>         system
>         call in the linux kernel" - that may be. I am not a linux kernel
>         expert
>         but would have thought that syscalls may block if interrupts are
>         disabled for certain lengths by the syscall author. But in that
>         case i
>         would have expected the process to hang too and to be not killable?
>         Again, I am no expert.
>
>
>     Note "interrupt" here is a logical concept not related to hardware
>     level interrupts. I don't know at what point going into malloc you
>     will no longer get signal handlers run - malloc doesn't use pthread
>     level mutexes, but direct futexes, so the same signal responsiveness
>     may not be present.
>
>
> I will have to take a closer look at what the glibc does. I always
> thought that any locks it takes in user space are interruptible by
> signals, and that libc calls only become uninterruptible when it calls
> kernel syscalls - and those kernel syscalls cannot be interrupted (that
> was what I meant with interrupts disabled). It may be wrong. I'll have to
> rethink this.
>
> But whatever the outcome, there may be situations where a thread cannot
> be interrupted by pthread_kill, but I think those cases are rare. More
> often we just wait in very ordinary situations, be it a pthread mutex
> deadlock or a slow file system. e.g. one of the more common scenarios is
> when you want to print a stack trace and try to load symbol information
> to resolve a pc to a name.
>
> Thanks, Thomas
>
>
>
>     Thanks,
>     David
>     -----
>
>
>
>
>             Otherwise overall your changes look good, but I have a few
>         comments.
>             Also, since this is an enhancement, it needs to wait for JDK 10.
>
>             I think your test will fail for product builds. You should add
>             "@requires vm.debug == true". Also, java files use 4 char
>             indentation, not 2 like we use in hotspot C/C++ code. Lastly, it
>             should only have a 2016 copyright.
>
>
>         Thank you for the hints. Did fix all that. Note that I had
>         disabled the
>         test for product builds in the code (!Platform.isDebugBuild()) but I
>         added the vm.debug tag as well as you suggested.
>
>
>             A couple of files need the copyright updated to 2016.
>
>             Why do set_to_now() and get_timestamp() need to be atomic,
>         and what
>             are the consequences of cx8 not being supported?
>
>
>         The error reporting thread sets the timestamp on each STEP
>         start, and
>         the timestamp is read from another thread, the WatcherThread.
>         Timestamp
>         is 64bit. I wanted to make sure the 64bit value is written and read
>         atomically, especially on 32bit platforms.
>
>         But then, I had to check whether 64bit atomic stores/loads are even
>         supported by this platform (I actually did not find a 32bit platform
>         without 64bit atomics, but the comment in atomic.hpp is pretty
>         insistent and I did not want to risk regressions for other
>         platforms).
>
>         Well, if no cx8 support was available, I pretty much just give
>         up and
>         read and write timestamps directly. As I said, I am not sure if this
>         code path ever gets executed.
>
>         Maybe I was overthinking all this and just reading and writing
>         the (C++
>         volatile) jlongs would have been enough, but I wanted to prevent
>         sporadic test errors because of incompletely read 64bit values.
>
>
>             1282         st->print_raw_cr(buffer);
>             1283         st->cr();
>
>             The old code had an additional st->cr() before the above
>         lines. I
>             assume you removed it intentionally.
>
>
>         I hope I preserved the numbers of cr(). At least that was my
>         intention:
>
>         1260       outputStream* const st = log.is_open() ? &log : &out;
>         1261       st->cr();
>
>         ...
>
>         and then on every path, a cr (or print_raw_cr) at the end. Where
>         do you
>         see the missing cr()?
>
>
>
>             Is there a reason why you decided to only allow one step to
>         timeout.
>             What if the cause of a timeout in a step also impacts other
>         steps,
>             or is that not common when we see timeouts?
>
>
>         That is mostly guesswork. In our (SAP) code we allow for four
>         steps (so
>         ErrorLogTimeout/4 as step timeout) and additionally allow for "steps
>         known to be long" where timeouts are disabled altogether. But we
>         also
>         have more complicated error reporting steps, so when porting the
>         patch
>         to OpenJDK, I felt the complexity was unneeded.
>
>         I think in general you will only have one misbehaving step, but
>         you are
>         right, more than one step may timeout if e.g. the file system is
>         slow.
>         I'm open for suggestions: the timeout value should be large
>         enough not
>         to be hit for "normal slow steps" while still leave room enough for
>         other steps to finish. What do you think a reasonable timeout value
>         would be? ErrorLogTimeout/4?
>
>
>
>             It's not clear to me why you changed a couple of os::sleep()
>         calls
>             to os::naked_short_sleep(), and the rationale for the sleep
>         periods.
>             Can you please explain?
>
>
>         Because os::sleep() does a lot of work under the hood and relies
>         on a
>         bit of VM infrastructure. I think that is not a good idea in error
>         situations where potentially everything may be broken already.
>         You want
>         to step lightly and really only do a naked system sleep. About
>         the sleep
>         periods, os::naked_sleep has an inbuilt maximum value of 1000ms,
>         which I
>         have to stay below to not hit the assert. I did use 999ms as the
>         longest
>         interval I am allowed to sleep nakedly. And after the timeout
>         hit and
>         before the WatcherThread calls os::abort, I again sleep 200ms to
>         give
>         the error reporter thread time to write the "error log aborted
>         due to
>         timeout" into the error log and to flush the error log. Those
>         200ms are
>         just guesswork.
>
>
>             thanks,
>
>             Chris
>
>
>         Thanks for the review!
>
>         Kind Regards, Thomas
>
>
>
>
>             On 10/12/16 9:55 PM, Thomas Stüfe wrote:
>
>                 Dear all,
>
>                 please take a look at the following fix:
>
>                 Bug: https://bugs.openjdk.java.net/browse/JDK-8166944
>                 webrev:
>
>         http://cr.openjdk.java.net/~stuefe/webrevs/8166944-Hanging-Error-Reporting/webrev.00/webrev/index.html
>
>                 ---
>
>                 In short, this fix provides the ability to cancel
>         hanging error
>                 reporting
>                 steps. This uses the same code paths secondary error
>         handling
>                 uses during
>                 error reporting. With this patch, steps which take too
>         long will be
>                 canceled after 1/2 ErrorLogTimeout. In the log file, it will
>                 look like this:
>
>                 4 [timeout occurred during error reporting in step
>         "<stepname>"]
>                 after xxxx
>                 ms.
>                 5
>
>                 and we now also get a finish message in the hs-err file
>         if we
>                 hit the
>                 ErrorLogTimeout and error reporting will stop altogether:
>
>                 6 ------ Timout during error reporting after xxx ms. ------
>
>                 (in addition to the "time expired, abort" message the
>                 WatcherThread writes
>                 to stderr)
>
>                 ---
>
>                 This is something which bugged us for a long time, because we
>                 rely heavily on the hs_err files for error analysis at
>                 customer sites, and there are a number of reasons why one
>                 step may hang and prevent the follow-up steps from running.
>
>                 It works like this:
>
>                 Before, when error reporting started, the WatcherThread was
>                 waiting for ErrorLogTimeout seconds, then would stop the VM.
>
>                 Now, the WatcherThread periodically pings error reporting,
>                 which checks if the last step timed out. If it did, it sends
>                 a signal to the reporting thread, and the thread will
>                 continue with the next step. This follows the same path as
>                 secondary crash handling.
>
>                 Some implementation details:
>
>                 On Posix platforms, to interrupt the thread, I use
>                 pthread_kill. This means I must know the pthread id of the
>                 reporting thread, which I now store at the beginning of error
>                 reporting. We already store the reporting thread id in
>                 first_error_tid, but that I cannot use, because it gets set
>                 by os::current_thread_id(), which is not always the pthread
>                 id. Should we ever switch to only using pthread id for posix
>                 platforms, this coding can be simplified.
>
>                 On Windows, there is unfortunately no easy way to interrupt a
>                 non-cooperative thread. I would need a way to cause a SEH
>                 inside the target thread, which then would get handled by
>                 secondary error handling like on Posix platforms, but that is
>                 not easy. It is doable - one can suspend the thread and modify
>                 the thread context in a way that it will crash upon resume.
>                 But that felt a bit heavyweight for this problem. So on
>                 Windows, timeout handling still works (after ErrorLogTimeout
>                 the VM gets shut down), but error reporting steps are not
>                 interruptible. If we feel this is important, this can be added
>                 later.
>
>                 Kind Regards, Thomas
>
>
>
>
>
>
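
For readers who want to see the per-step timeout idea from the quoted description in isolation, here is a rough Java analogue. It is only a sketch under stated assumptions, not the HotSpot code: the class StepWatchdogSketch, the Step interface and the 10 ms polling interval are invented for illustration, and Thread.interrupt() stands in for pthread_kill. Unlike a signal, Java interruption is cooperative, so a step that never blocks would not be cancelled here (similar in spirit to the Windows limitation described above).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StepWatchdogSketch {
    /** A reporting step that may block for a long time (hypothetical interface). */
    public interface Step {
        void run() throws InterruptedException;
    }

    private final Object lock = new Object();
    private boolean stepRunning;   // guarded by lock
    private boolean signalled;     // guarded by lock: at most one interrupt per step
    private long stepStartNanos;   // guarded by lock

    /** Runs the steps in order; a step exceeding stepTimeoutMs is interrupted and skipped. */
    public List<String> report(Map<String, Step> steps, long stepTimeoutMs) {
        List<String> log = new ArrayList<>();
        Thread reporter = Thread.currentThread();
        Thread watcher = new Thread(() -> watch(reporter, stepTimeoutMs));
        watcher.setDaemon(true);
        watcher.start();
        for (Map.Entry<String, Step> e : steps.entrySet()) {
            synchronized (lock) {
                stepRunning = true;
                signalled = false;
                stepStartNanos = System.nanoTime();
            }
            try {
                e.getValue().run();
                log.add(e.getKey() + ": done");
            } catch (InterruptedException timedOut) {
                log.add("[timeout occurred during error reporting in step \"" + e.getKey() + "\"]");
            } finally {
                synchronized (lock) {
                    stepRunning = false;
                }
                Thread.interrupted(); // drop a late interrupt so it cannot hit the next step
            }
        }
        watcher.interrupt(); // stop the watcher
        return log;
    }

    /** Watcher loop: periodically checks whether the current step has run too long. */
    private void watch(Thread reporter, long stepTimeoutMs) {
        while (true) {
            synchronized (lock) {
                long elapsedMs = (System.nanoTime() - stepStartNanos) / 1_000_000L;
                if (stepRunning && !signalled && elapsedMs > stepTimeoutMs) {
                    signalled = true;
                    reporter.interrupt();
                }
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException stop) {
                return;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Step> steps = new LinkedHashMap<>();
        steps.put("fast", () -> { });
        steps.put("hang", () -> Thread.sleep(60_000)); // simulates a hanging step
        steps.put("after", () -> { });
        for (String line : new StepWatchdogSketch().report(steps, 100)) {
            System.out.println(line);
        }
    }
}
```

The point of the design is that the step following the hanging one still runs, which is what keeps the rest of the report useful.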

From david.holmes at oracle.com  Thu Oct 27 18:15:16 2016
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 28 Oct 2016 04:15:16 +1000
Subject: RFR(S): 8168305 GC.class_stats should not require
	-XX:+UnlockDiagnosticVMOptions
In-Reply-To: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
Message-ID: <2f645023-1c9a-38f4-a51d-bcaad8223d6f@oracle.com>

On 27/10/2016 6:36 PM, Staffan Larsen wrote:
> All,
>
> Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided.

Right - it doesn't make sense to have to use UnlockDiagnosticVMOptions 
to run any diagnostic Dcmd. Otherwise it should be a requirement for all.

> While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead.

The only reason a Dcmd should require a specific VM option is if the 
Dcmd will not be able to function unless the VM was started with that 
option - IMHO :) Is that the case for LogTouchedMethods?

Thanks,
David

> I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java
>
> bug: https://bugs.openjdk.java.net/browse/JDK-8168305
> webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/
>
> Thanks,
> /Staffan
>

From staffan.larsen at oracle.com  Thu Oct 27 19:19:27 2016
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Thu, 27 Oct 2016 21:19:27 +0200
Subject: RFR(S): 8168305 GC.class_stats should not require
	-XX:+UnlockDiagnosticVMOptions
In-Reply-To: <2f645023-1c9a-38f4-a51d-bcaad8223d6f@oracle.com>
References: <0684E1C9-0714-4A0C-9D0F-8C3C97AD515D@oracle.com>
	<2f645023-1c9a-38f4-a51d-bcaad8223d6f@oracle.com>
Message-ID: <1437F2CC-C254-47A0-A235-04A1B9E031CF@oracle.com>


> On 27 Oct 2016, at 20:15, David Holmes <david.holmes at oracle.com> wrote:
> 
> On 27/10/2016 6:36 PM, Staffan Larsen wrote:
>> All,
>> 
>> Please review this small patch to remove the requirement -XX:+UnlockDiagnosticVMOptions when running the GC.class_stats diagnostic command. Diagnostic commands are used for diagnosing problems and should not require restarting the JVM with a different command line flag if this can be avoided.
> 
> Right - it doesn't make sense to have to use UnlockDiagnosticVMOptions to run any diagnostic Dcmd. Otherwise it should be a requirement for all.
> 
>> While fixing this I also noticed that VM.print_touched_methods said it required -XX:+UnlockDiagnosticVMOptions, while what it really required was -XX:+LogTouchedMethods (which in turn requires -XX:+UnlockDiagnosticVMOptions) so I changed the error message here. In this case I think it is ok to require a special command line flag since collecting the information comes with an overhead.
> 
> The only reason a Dcmd should require a specific VM option is if the Dcmd will not be able to function unless the VM was started with that option - IMHO :) Is that the case for LogTouchedMethods?

I believe LogTouchedMethods stores a long list of all methods that have ever been run. We don't want to have that enabled by default. It would maybe be a good future enhancement to be able to turn this on and off?

> 
> Thanks,
> David
> 
>> I have verified the fix manually and by running these tests: hotspot/test/runtime/CommandLine/PrintTouchedMethods.java hotspot/test/serviceability/sa/TestInstanceKlassSize.java hotspot/test/serviceability/sa/TestInstanceKlassSizeForInterface.java
>> 
>> bug: https://bugs.openjdk.java.net/browse/JDK-8168305
>> webrev: http://cr.openjdk.java.net/~sla/8168305/webrev.01/
>> 
>> Thanks,
>> /Staffan
>> 


From mandy.chung at oracle.com  Fri Oct 28 02:54:58 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Thu, 27 Oct 2016 19:54:58 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>
Message-ID: <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>


> On Oct 27, 2016, at 3:28 PM, Brent Christian <brent.christian at oracle.com> wrote:
> 
> Hi, Mandy
> 
> It looks pretty good to me.  Just a couple small things:
> 
> * StackTraceElement.java
> 
> 379             ClassLoader loader = cls.getClassLoader0();
> 
> It looks as if 'loader' isn't used??

Good catch.  Leftover code.  Removed.

> * Throwable.java
> 
> 832             // VM to fill in StackTraceElement
> 833             getStackTraceElements(stackTrace);
> 834             // ensure the proper StackTraceElement initialization
> 835             for (StackTraceElement ste : stackTrace) {
> 836                 ste.buildLoaderModuleClassName();
> 837             }
> 
> For my own curiosity, why is this buildLoaderModuleClassName() call needed?

When the VM fills in the stack trace, it sets the Class object in each StackTraceElement. The buildLoaderModuleClassName() call here is needed (1) to build the output string, whose format is described in the javadoc and which is stored in the serial form, and (2) to avoid holding a strong reference to the Class object.  StackTraceElement is serializable, and it can't build the correct string once deserialized.
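
To make the two reasons concrete, here is a minimal sketch. It is not the StackTraceElement code: the class FrameSketch, its field names and the placeholder describe() method are made up for illustration. It only shows the general pattern of building the string eagerly, keeping it in the serial form, and dropping the Class reference.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class FrameSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    // Set by the creator of the frame; transient, so never part of the serial form.
    private transient Class<?> declaringClassObject;
    // Built eagerly from the Class object; this is what gets serialized.
    private String loaderModuleClassName;

    public FrameSketch(Class<?> cls) {
        this.declaringClassObject = cls;
    }

    /** Builds the string once, then drops the strong reference to the Class object. */
    public void buildLoaderModuleClassName() {
        if (declaringClassObject != null) {
            loaderModuleClassName = describe(declaringClassObject);
            declaringClassObject = null;
        }
    }

    /** Placeholder for the real loader/module/class format. */
    private static String describe(Class<?> cls) {
        return cls.getName();
    }

    public String name() {
        return loaderModuleClassName;
    }

    public boolean holdsClass() {
        return declaringClassObject != null;
    }

    /** Serializes and deserializes a frame in memory. */
    public static FrameSketch roundTrip(FrameSketch f) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(f);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FrameSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        FrameSketch built = new FrameSketch(String.class);
        built.buildLoaderModuleClassName();
        System.out.println(roundTrip(built).name());

        // Without the eager build, the deserialized copy has neither Class nor string.
        System.out.println(roundTrip(new FrameSketch(String.class)).name());
    }
}
```

A frame that was never built loses its Class reference on serialization and has nothing left to build the string from, which is the situation the eager call avoids.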

Mandy

From serguei.spitsyn at oracle.com  Fri Oct 28 07:06:39 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Fri, 28 Oct 2016 00:06:39 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
Message-ID: <c41f719f-a584-2887-30c3-afad406679b5@oracle.com>

Hi Mandy,


I have a few comments.

http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java.udiff.html

      private static class BootClassLoader extends BuiltinClassLoader {
          BootClassLoader(URLClassPath bcp) {
- super(null, bcp);
+ super(null, null, bcp);
          }
. . .

          PlatformClassLoader(BootClassLoader parent) {
- super(parent, null);
+ super("platform", parent, null);
          }

. . .

          AppClassLoader(PlatformClassLoader parent, URLClassPath ucp) {
- super(parent, ucp);
+ super("app", parent, ucp);
              this.ucp = ucp;
          }


    Can we give the bootstrap classloader the name "boot" or "bootstrap"?
    Or would this impact too many places, and so be too risky to do?


http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/java/lang/StackTraceElement.java.frames.html

379         ClassLoader loader = cls.getClassLoader0();

The loader is unused.

402     private static String toLoaderModuleClassName(Class<?> cls) {
403         ClassLoader loader = cls.getClassLoader0();
404         Module m = cls.getModule();
405
406         // First element - class loader name
407         String s = "";
408         if (loader != null && !(loader instanceof BuiltinClassLoader) &&
409                 loader.getName() != null) {
410             s = loader.getName() + "/";
411         }
412
413         // Second element - module name and version
414         if (m != null && m.isNamed()) {
415             s = s.isEmpty() ? m.getName() : s + m.getName();
416             // drop version if it's JDK module tied with java.base,
417             // i.e. non-upgradeable
418             if (!HashedModules.contains(m)) {
419                 Optional<ModuleDescriptor.Version> ov = m.getDescriptor().version();
420                 if (ov.isPresent()) {
421                     String version = "@" + ov.get().toString();
422                     s = s.isEmpty() ? version : s + version;
423                 }
424             }
425         }
426
427         // fully-qualified class name
428         return s.isEmpty() ? cls.getName() : s + "/" + cls.getName();
429     }

Also, the lines 415 and 422 can be simplified:

415             s += m.getName();
422                     s += version;

Also, if the loader has a name but (m == null || !m.isNamed()) then it
looks like the sign "/" will be added twice (see L410 and L428). It can
be fixed and simplified with:

Add line before 425: s += "/";
428         return s + cls.getName();

   Also, it is not clear why the loader name is not included for an instance of the BuiltinClassLoader?
   Would it make sense to add a comment explaining it?

Thanks, Serguei

On 10/25/16 16:10, Mandy Chung wrote:
> Webrev at:
>     http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/
>
> Specdiff:
>     http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html
>
> This is a long-standing RFE for adding support for class
> loader names.  It's #ClassLoaderNames on JSR 376 issue
> list where the proposal [1] has been implemented in jake
> for some time.  This patch brings this change to jdk9.
>
> A short summary:
> - New constructors are added in ClassLoader, SecureClassLoader
>    and URLClassLoader to specify the class loader name.
>
> - New ClassLoader::getName and StackTraceElement::getClassLoaderName
>    method
>
> - StackTraceElement::toString is updated to include the name
>    of the class loader and module of that frame in this format:
>       <loader>/<module>/<fully-qualified-name>(<src>:<line>)
>
> The detail is in StackTraceElement::buildLoaderModuleClassName
> that compress the output string for cases when the loader
> has no name or the module is unnamed module.  Another thing
> to mention is that VM sets the Class object when filling in
> a stack trace of a Throwable object.  Then the library will
> build a String from the Class object for serialization purpose.
>
> Mandy
> [1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html


From thomas.schatzl at oracle.com  Fri Oct 28 11:31:53 2016
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Fri, 28 Oct 2016 13:31:53 +0200
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor
	for ppc64
In-Reply-To: <OFBB39F73B.41D1EAF7-ON4925804F.00059A25-4925804F.00099D08@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com>
	<OFC81622C2.A <9bffd66d-abe0-8d3d-262a-4be55b81b9a4@oracle.com>
	<OFBB39F73B.41D1EAF7-ON4925804F.00059A25-4925804F.00099D08@notes.na.collabserv.com>
Message-ID: <1477654313.3851.11.camel@oracle.com>

Hi Hiroshi,

  first, apologies for taking so long to answer. Sorry.

On Mon, 2016-10-17 at 10:44 +0900, Hiroshi H Horii wrote:
> Hi David,
> 
> Thank you for your comments.
> 
> > Do you have any metrics on this latest version?
> > [...]
> >
> > I think the GC experts need to have a discussion to resolve things
> > to their mutual satisfaction.
> 
> Thank you for lots of your comments and suggestions. And lots of my
> mistakes made the discussion long. very sorry. I would like to know
> comments of GC experts.

  we in the gc team have discussed this change quite a bit internally.
Overall, we think this change seems far too risky from both a
functional and performance perspective to go into 9 at this time.

The current proposal lacks a clear analysis of why removing the
barriers is safe; most of the analysis in this thread has been "it is
fine" and "the code is faster and does not crash" on one particular
platform for one particular application, and that seems too little.

We at least expect the change to be not only analyzed "good" in a
review, but also tested thoroughly on all platforms affected (which are
all of them in the latest change). We can of course help with testing
on platforms we support.

We also think the testing needs to include both functional and
performance testing, and the performance testing ought to be using some
well-chosen benchmarks. (It was pointed out very early in the
discussion of this change that specjbb2013 is deprecated, yet that is
the only benchmark that's been reported out.)

The most recent change also penalizes current platforms that do not
implement the release-CAS with an additional acquire. That might not be
an issue for TSO platforms, but others will be affected.

While we think other platforms could quickly adapt to this, this would
force the developers who implement this for other platforms
(arm/aarch64) to re-analyze these issues themselves. We
do not think this is fair. We think this is a change (or set of
changes) that needs to be pushed for all platforms at the same time.

There is also one (minor) question about the change: why isn't the CAS
result value being used for the failing paths of the CAS, rather than
reloaded in copy_to_survivor_space?
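
To illustrate the question in a language-neutral way, here is a small Java analogue (Java 9 or later). It is a sketch only, not the HotSpot C++ code: the class ForwardingSketch and its method names are invented, and AtomicReference stands in for the forwarding pointer in the object header. The first variant mirrors a boolean CAS followed by a reload on failure; the second uses the value the CAS itself observed.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ForwardingSketch {
    // Stand-in for a forwarding pointer slot; null means "not forwarded yet".
    private final AtomicReference<Object> forwardee = new AtomicReference<>();

    /** Variant 1: boolean CAS, then a separate reload on the failing path. */
    public Object forwardWithReload(Object copy) {
        if (forwardee.compareAndSet(null, copy)) {
            return copy;            // we installed our copy
        }
        return forwardee.get();     // somebody else won: reload their value
    }

    /** Variant 2: use the witness value returned by the CAS, no reload needed. */
    public Object forwardWithWitness(Object copy) {
        Object witness = forwardee.compareAndExchange(null, copy);
        return witness == null ? copy : witness;
    }

    public static void main(String[] args) {
        ForwardingSketch slot = new ForwardingSketch();
        Object first = new Object();
        Object second = new Object();
        System.out.println(slot.forwardWithWitness(first) == first);  // first caller wins
        System.out.println(slot.forwardWithWitness(second) == first); // loser sees the winner
        System.out.println(slot.forwardWithReload(second) == first);  // same result via reload
    }
}
```

Both variants return the same forwardee; the difference is only whether the failing path performs an extra load, and with which ordering that load is done.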

Thanks,
  Thomas


From mandy.chung at oracle.com  Fri Oct 28 20:44:19 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Fri, 28 Oct 2016 13:44:19 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <bec2675d-20b6-8c30-a558-23b3de412f73@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>
	<52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>
	<bec2675d-20b6-8c30-a558-23b3de412f73@oracle.com>
Message-ID: <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>


> On Oct 28, 2016, at 11:11 AM, Brent Christian <brent.christian at oracle.com> wrote:
> 
> Should something be done for STEs returned from StackFrameInfo.toStackTraceElement() ?

Good catch - I missed it.  I added package-private static methods in StackTraceElement class for both Throwable and StackFrameInfo to get StackTraceElement(s).

http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.02/

Mandy

From david.holmes at oracle.com  Fri Oct 28 21:09:45 2016
From: david.holmes at oracle.com (David Holmes)
Date: Sat, 29 Oct 2016 07:09:45 +1000
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <c41f719f-a584-2887-30c3-afad406679b5@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<c41f719f-a584-2887-30c3-afad406679b5@oracle.com>
Message-ID: <dc8b491c-4d9c-b08a-894e-afe4de11e0b4@oracle.com>

Hi Mandy,

I know it's rather late in the game to notice this but I only just 
noticed this due to Serguei's comment ...

On 28/10/2016 5:06 PM, serguei.spitsyn at oracle.com wrote:
> Hi Mandy,
>
>
> I have a few comments.
>
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/jdk/internal/loader/ClassLoaders.java.udiff.html
>
>
>      private static class BootClassLoader extends BuiltinClassLoader {
>          BootClassLoader(URLClassPath bcp) {
> - super(null, bcp);
> + super(null, null, bcp);
>          }
> . . .
>
>          PlatformClassLoader(BootClassLoader parent) {
> - super(parent, null);
> + super("platform", parent, null);
>          }
>
> . . .
>
>          AppClassLoader(PlatformClassLoader parent, URLClassPath ucp) {
> - super(parent, ucp);
> + super("app", parent, ucp);
>              this.ucp = ucp;
>          }
>
>
>    Can we give the bootstrap classloader the name "boot" or "bootstrap"?
>    Or this will impact too many places, and so, very risky to do?

Given the BootClassLoader instance is not in fact the boot loader at all,
I think it would have been clearer, and would have avoided potential
confusion, to call this something more representative of its purpose -
perhaps BootResourceloader or BootLoaderHelper or ...

Thanks,
David

>
>
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/jdk/src/java.base/share/classes/java/lang/StackTraceElement.java.frames.html
>
>
> 379         ClassLoader loader = cls.getClassLoader0();
>
> The loader is unused.
>
> 402     private static String toLoaderModuleClassName(Class<?> cls) {
> 403         ClassLoader loader = cls.getClassLoader0();
> 404         Module m = cls.getModule();
> 405
> 406         // First element - class loader name
> 407         String s = "";
> 408         if (loader != null && !(loader instanceof BuiltinClassLoader) &&
> 409                 loader.getName() != null) {
> 410             s = loader.getName() + "/";
> 411         }
> 412
> 413         // Second element - module name and version
> 414         if (m != null && m.isNamed()) {
> 415             s = s.isEmpty() ? m.getName() : s + m.getName();
> 416             // drop version if it's JDK module tied with java.base,
> 417             // i.e. non-upgradeable
> 418             if (!HashedModules.contains(m)) {
> 419                 Optional<ModuleDescriptor.Version> ov = m.getDescriptor().version();
> 420                 if (ov.isPresent()) {
> 421                     String version = "@" + ov.get().toString();
> 422                     s = s.isEmpty() ? version : s + version;
> 423                 }
> 424             }
> 425         }
> 426
> 427         // fully-qualified class name
> 428         return s.isEmpty() ? cls.getName() : s + "/" + cls.getName();
> 429     }
>
> Also, the lines 415 and 422 can be simplified:
>
> 415             s += m.getName();
> 422                     s += version;
>
> Also, if the loader has a name but (m == null || !m.isNamed()) then it
> looks like the sign "/" will be added twice (see L410 and L428). It can
> be fixed and simplified with:
>
> Add line before 425: s += "/";
> 428         return s + cls.getName();
>
>   Also, it is not clear why the loader name is not included for an
> instance of the BuiltinClassLoader?
>   Would it make sense to add a comment explaining it?
>
> Thanks, Serguei
>
> On 10/25/16 16:10, Mandy Chung wrote:
>> Webrev at:
>>     http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/
>>
>> Specdiff:
>>
>> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html
>>
>>
>> This is a long-standing RFE for adding support for class
>> loader names.  It's #ClassLoaderNames on JSR 376 issue
>> list where the proposal [1] has been implemented in jake
>> for some time.  This patch brings this change to jdk9.
>>
>> A short summary:
>> - New constructors are added in ClassLoader, SecureClassLoader
>>    and URLClassLoader to specify the class loader name.
>>
>> - New ClassLoader::getName and StackTraceElement::getClassLoaderName
>>    method
>>
>> - StackTraceElement::toString is updated to include the name
>>    of the class loader and module of that frame in this format:
>>       <loader>/<module>/<fully-qualified-name>(<src>:<line>)
>>
>> The detail is in StackTraceElement::buildLoaderModuleClassName
>> that compress the output string for cases when the loader
>> has no name or the module is unnamed module.  Another thing
>> to mention is that VM sets the Class object when filling in
>> a stack trace of a Throwable object.  Then the library will
>> build a String from the Class object for serialization purpose.
>>
>> Mandy
>> [1]
>> http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html
>>
>

From mandy.chung at oracle.com  Fri Oct 28 21:36:03 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Fri, 28 Oct 2016 14:36:03 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <c41f719f-a584-2887-30c3-afad406679b5@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<c41f719f-a584-2887-30c3-afad406679b5@oracle.com>
Message-ID: <A48237AB-8E6A-4161-8131-A18068D9FEFA@oracle.com>


> On Oct 28, 2016, at 12:06 AM, serguei.spitsyn at oracle.com wrote:
> 
>   Can we give the bootstrap classloader the name "boot" or "bootstrap"?

BootClassLoader is not the bootstrap class loader; it is an implementation detail.  The bootstrap class loader is represented by null, so you can't invoke ClassLoader::getName on it.

>   
> Also, the lines 415 and 422 can be simplified: 415 s += m.getName(); 422 s += version;

OK.  At one point, that was how it was coded.  

> Also, if the loader has a name but (m == null || !m.isNamed())  then it looks like the sign "/" will be added twice (see L410 and L428). It can be fixed and simplified with: Add line before 425: s += "/"; 428 return s + cls.getName();

"<loader>//<classname>" is correct.
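
For reference, the compaction rules in the quoted webrev.00 code can be sketched as a plain string function. This is an illustration only: the class LoaderModuleNameSketch and its format() method are hypothetical, the inputs are plain strings rather than Class, ClassLoader and Module objects, and a null argument simply means "absent" (no loader name, unnamed module, or no version).

```java
public class LoaderModuleNameSketch {
    /**
     * Mirrors the structure of the quoted toLoaderModuleClassName:
     * optional "loader/", optional "module[@version]", then "/" and the class name.
     */
    public static String format(String loaderName, String moduleName,
                                String version, String className) {
        String s = "";
        if (loaderName != null) {
            s = loaderName + "/";
        }
        if (moduleName != null) {
            s += moduleName;
            if (version != null) {
                s += "@" + version;
            }
        }
        return s.isEmpty() ? className : s + "/" + className;
    }

    public static void main(String[] args) {
        System.out.println(format("myloader", "m", "1.0", "p.C")); // myloader/m@1.0/p.C
        System.out.println(format("myloader", null, null, "p.C")); // myloader//p.C
        System.out.println(format(null, "m", "1.0", "p.C"));       // m@1.0/p.C
        System.out.println(format(null, null, null, "p.C"));       // p.C
    }
}
```

The second case is the one under discussion: a named loader with an unnamed module keeps the empty module slot, so the two slashes are intentional.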

> 
>  Also, it is not clear why the loader name is not included for an instance of the BuiltinClassLoader?

To keep the output compact where possible. For example, the class loader names "app" and "platform" can be implied for classes from the JDK.

>  Would it make sense to add a comment explaining it?

There is probably not much to add there.

Mandy

From serguei.spitsyn at oracle.com  Sat Oct 29 10:03:40 2016
From: serguei.spitsyn at oracle.com (serguei.spitsyn at oracle.com)
Date: Sat, 29 Oct 2016 03:03:40 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <A48237AB-8E6A-4161-8131-A18068D9FEFA@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<c41f719f-a584-2887-30c3-afad406679b5@oracle.com>
	<A48237AB-8E6A-4161-8131-A18068D9FEFA@oracle.com>
Message-ID: <58a37318-6f30-5996-c619-ff7b4c23eec2@oracle.com>

Thank you for clarifications, Mandy!
Serguei


On 10/28/16 14:36, Mandy Chung wrote:
>> On Oct 28, 2016, at 12:06 AM, serguei.spitsyn at oracle.com wrote:
>>
>>    Can we give the bootstrap classloader the name "boot" or "bootstrap"?
> BootClassLoader is not the bootstrap class loader; it is an implementation detail.  The bootstrap class loader is represented by null, so you can't invoke ClassLoader::getName on it.
>
>>    
>> Also, the lines 415 and 422 can be simplified: 415 s += m.getName(); 422 s += version;
> OK.  At one point, that was how it was coded.
>
>> Also, if the loader has a name but (m == null || !m.isNamed())  then it looks like the sign "/" will be added twice (see L410 and L428). It can be fixed and simplified with: Add line before 425: s += "/"; 428 return s + cls.getName();
> "<loader>//<classname>" is correct.
>
>>   Also, it is not clear why the loader name is not included for an instance of the BuiltinClassLoader?
> To keep the output compact where possible. For example, the class loader names "app" and "platform" can be implied for classes from the JDK.
>
>>   Would it make sense to add a comment explaining it?
> There is probably not much to add there.
>
> Mandy


From HORII at jp.ibm.com  Sat Oct 29 10:37:18 2016
From: HORII at jp.ibm.com (Hiroshi H Horii)
Date: Sat, 29 Oct 2016 19:37:18 +0900
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <1477654313.3851.11.camel@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com> <OFC81622C2.A
	<9bffd66d-abe0-8d3d-262a-4be55b81b9a4@
	<1477654313.3851.11.camel@oracle.com>
Message-ID: <OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>

Hi Thomas,

>   we in the gc team have discussed this change quite a bit internally.
> Overall, we think this change seems far too risky from both a
> functional and performance perspective to go into 9 at this time.

Thank you for your comments and for the decision.
I completely agree with the decision and would like to keep contributing
to this change for future releases.

> We also think the testing needs to include both functional and
> performance testing, and the performance testing ought to be using some
> well-chosen benchmarks. (It was pointed out very early in the
> discussion of this change that specjbb2013 is deprecated, yet that is
> the only benchmark that's been reported out.)

I see. I will try other workloads and evaluate effects of this change.

> The most recent change also penalizes current platforms that do not
> implement the release-CAS with an additional acquire. That might be not
> an issue for TSO platforms, but others will be affected.
> 
> While we think other platforms could quickly adapt to this, this would
> force that the developer that implements this for other platforms
> (arm/aarch64) to be stuck with re-analyzing these issues. We
> do not think this is fair. We think this is a change (or set of
> changes) that needs to be pushed for all platforms at the same time.

Sure. I would like to ask developers for the other platforms to consider
this change.

> There also one (minor) question about the change: why isn't the CAS
> result value being used for the failing paths of the CAS, rather than
> reloaded in copy_to_survivor_space?

I believe the original code also doesn't use the CAS result, because
the current cas_forward_to doesn't return the CAS result value:

bool oopDesc::cas_forward_to(oop p, markOop compare, cmpxchg_memory_order order)

I guess reloading a forwardee is not expensive because CAS failures are
rare, so maintainability was emphasized.

"Doerr, Martin" <martin.doerr at sap.com> wrote on 10/21/2016 21:57:42:
> The webrev also contains a logging change in 
> psPromotionManager.inline.hpp which I'm not sure if it's still wanted.

For future discussion, here is a webrev that doesn't contain any
changes to log formats:
http://cr.openjdk.java.net/~horii/8154736/webrev.06/

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo


From aph at redhat.com  Sun Oct 30 18:36:38 2016
From: aph at redhat.com (Andrew Haley)
Date: Sun, 30 Oct 2016 18:36:38 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com> <OFC81622C2.A
	<9bffd66d-abe0-8d3d-262a-4be55b81b9a4@
	<1477654313.3851.11.camel@oracle.com>
	<OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>
Message-ID: <a86d39db-574c-1613-f5c0-200e517cdb05@redhat.com>

On 29/10/16 11:37, Hiroshi H Horii wrote:
>> The most recent change also penalizes current platforms that do not
>> implement the release-CAS with an additional acquire. That might be not
>> an issue for TSO platforms, but others will be affected.
>>
>> While we think other platforms could quickly adapt to this, this would
>> force that the developer that implements this for other platforms
>> (arm/aarch64) to be stuck with re-analyzing these issues. We
>> do not think this is fair. We think this is a change (or set of
>> changes) that needs to be pushed for all platforms at the same time.
>
> Sure. I would like to ask developers for the other platforms to consider
> this change.

OK, I will.  Can you please point me to the change and what it means?

And, while we're on the subject, is memory_order_conservative actually
defined anywhere?

Thanks,

Andrew.


From david.holmes at oracle.com  Sun Oct 30 21:26:26 2016
From: david.holmes at oracle.com (David Holmes)
Date: Mon, 31 Oct 2016 07:26:26 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <a86d39db-574c-1613-f5c0-200e517cdb05@redhat.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com> <OFC81622C2.A
	<9bffd66d-abe0-8d3d-262a-4be55b81b9a4@
	<1477654313.3851.11.camel@oracle.com>
	<OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>
	<a86d39db-574c-1613-f5c0-200e517cdb05@redhat.com>
Message-ID: <1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com>

On 31/10/2016 4:36 AM, Andrew Haley wrote:
> On 29/10/16 11:37, Hiroshi H Horii wrote:
>>>> The most recent change also penalizes current platforms that do not
>>>> implement the release-CAS with an additional acquire. That might be not
>>>> an issue for TSO platforms, but others will be affected.
>>>>
>>>> While we think other platforms could quickly adapt to this, this would
>>>> force that the developer that implements this for other platforms
>>>> (arm/aarch64) to be stuck with re-analyzing these issues. We
>>>> do not think this is fair. We think this is a change (or set of
>>>> changes) that needs to be pushed for all platforms at the same time.
>>
>> Sure. I would like to ask developers for the other platforms to consider
>> this change.
>
> OK, I will.  Can you please point me to the change and what it means?
>
> And, while we're on the subject, is memory_order_conservative actually
> defined anywhere?

No. It was chosen to represent the current status quo that the Atomic:: 
ops should all be (by default) full bi-directional fences. It is a place 
holder until this memory order stuff is fleshed out in hotspot. We 
didn't adopt C++ memory_order_seq_cst as it isn't obvious that it actually 
matches our current semantics. At least it isn't obvious to me.

Cheers,
David

> Thanks,
>
> Andrew.
>

From aph at redhat.com  Mon Oct 31 09:32:44 2016
From: aph at redhat.com (Andrew Haley)
Date: Mon, 31 Oct 2016 09:32:44 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com> <OFC81622C2.A
	<9bffd66d-abe0-8d3d-262a-4be55b81b9a4@
	<1477654313.3851.11.camel@oracle.com>
	<OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>
	<a86d39db-574c-1613-f5c0-200e517cdb05@redhat.com>
	<1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com>
Message-ID: <f13b8b58-4aa5-2bf3-8c00-a66d8809b355@redhat.com>

On 30/10/16 21:26, David Holmes wrote:
> On 31/10/2016 4:36 AM, Andrew Haley wrote:
>>
>> And, while we're on the subject, is memory_order_conservative actually
>> defined anywhere?
> 
> No. It was chosen to represent the current status quo that the Atomic:: 
> ops should all be (by default) full bi-directional fences.

Does that mean that a CAS is actually stronger than a load acquire
followed by a store release?  And that a CAS is a release fence even
when it fails and no store happens?

And that a conservative load is a *store* barrier?

> It is a place holder until this memory order stuff is fleshed out in
> hotspot. We didn't adopt C++ memory_order_seq_cst as it isn't
> obvious that it actually matches our current semantics. At least it
> isn't obvious to me.

It's not obvious to me either, because I don't know what our current
semantics are.  But I believe that if we need anything stronger than
sequential consistency we should look at fixing the callers of the
Atomic:: ops.  But I guess the real problem is that we don't know
which callers actually need the super-strong guarantees, or even that
any exist.

Andrew.

From lois.foltan at oracle.com  Mon Oct 31 11:10:52 2016
From: lois.foltan at oracle.com (Lois Foltan)
Date: Mon, 31 Oct 2016 07:10:52 -0400
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>
	<52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>
	<bec2675d-20b6-8c30-a558-23b3de412f73@oracle.com>
	<931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
Message-ID: <581726BC.9080007@oracle.com>


On 10/28/2016 4:44 PM, Mandy Chung wrote:
>> On Oct 28, 2016, at 11:11 AM, Brent Christian <brent.christian at oracle.com> wrote:
>>
>> Should something be done for STEs returned from StackFrameInfo.toStackTraceElement() ?
> Good catch - I missed it.  I added package-private static methods in StackTraceElement class for both Throwable and StackFrameInfo to get StackTraceElement(s).
>
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.02/
>
> Mandy
Looks good.
Lois


From martin.doerr at sap.com  Mon Oct 31 14:38:05 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Mon, 31 Oct 2016 14:38:05 +0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com> <OFC81622C2.A
	<9bffd66d-abe0-8d3d-262a-4be55b81b9a4@
	<1477654313.3851.11.camel@oracle.com>
	<OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>
Message-ID: <5ee98a2421d84934a11ef9f3a24b11de@DEWDFE13DE10.global.corp.sap>

Hi Hiroshi,

when looking over the change for the first time, I had missed that the cmpxchg_post_membar is not safe for future enhancements:
Please use the condition "else if (order != memory_order_relaxed)" for the sync as in cmpxchg_pre_membar. The code should still work reliably if somebody adds new enum values.
I think this is a key property to justify the safety of this change. Adding enum values should not break any platform. This can be established by using maximum conservative barriers for unknown values.
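
To illustrate the point about unknown enum values, here is a minimal sketch in Java rather than HotSpot C++. MemoryOrder, Barrier and both methods are hypothetical names made up for this example, and the choice of which orders need no barrier after the CAS is an assumption of the sketch, not a statement about the webrev. ACQUIRE_RELEASE stands in for a value that somebody adds later.

```java
// Illustrative sketch only: MemoryOrder, Barrier and both methods are
// hypothetical names for this example, not HotSpot code.
final class BarrierSelectionSketch {
    // ACQUIRE_RELEASE plays the role of a value added after the barrier
    // selection code was written.
    enum MemoryOrder { RELAXED, RELEASE, CONSERVATIVE, ACQUIRE_RELEASE }

    enum Barrier { NONE, FULL }

    // Fragile: only the one value known today gets the full barrier, so a
    // newly added value silently gets no barrier at all.
    static Barrier fragilePostBarrier(MemoryOrder order) {
        if (order == MemoryOrder.CONSERVATIVE) {
            return Barrier.FULL;
        }
        return Barrier.NONE;
    }

    // Robust: only orders explicitly known to need nothing after the CAS
    // (an assumption of this sketch) skip the barrier; everything else,
    // including values added later, gets the full barrier.
    static Barrier robustPostBarrier(MemoryOrder order) {
        if (order == MemoryOrder.RELEASE) {
            return Barrier.NONE;
        } else if (order != MemoryOrder.RELAXED) {
            return Barrier.FULL;
        }
        return Barrier.NONE;
    }

    public static void main(String[] args) {
        for (MemoryOrder order : MemoryOrder.values()) {
            System.out.println(order + ": fragile=" + fragilePostBarrier(order)
                    + " robust=" + robustPostBarrier(order));
        }
    }
}
```

With the robust form, the newly added value degrades to the strongest barrier instead of to none, so it can cost performance but not correctness.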

If I remember correctly, some reviewers had complained about the acquire barriers. I think it will be better to present the change without them as this minimizes the impact to Oracle platforms. At least the comment "call acquire for reading fields of new_obj in callers" does not apply to any supported platform (Alpha is not supported) and should be changed.
I think a precise specification of the required ordering semantics is important. This is the second part which is needed to justify the safety of this change.

Best regards,
Martin


From: Hiroshi H Horii [mailto:HORII at jp.ibm.com]
Sent: Samstag, 29. Oktober 2016 12:37
To: Thomas Schatzl <thomas.schatzl at oracle.com>
Cc: David Holmes <david.holmes at oracle.com>; hotspot-compiler-dev <hotspot-compiler-dev at openjdk.java.net>; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net; Kim Barrett <kim.barrett at oracle.com>; Doerr, Martin <martin.doerr at sap.com>; ppc-aix-port-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

Hi Thomas,

>   we in the gc team have discussed this change quite a bit internally.
> Overall, we think this change seems far too risky from both a
> functional and performance perspective to go into 9 at this time.

Thank you for your comments and for reaching a decision.
I completely agree with the decision and would like to keep contributing
to this change for future releases.

> We also think the testing needs to include both functional and
> performance testing, and the performance testing ought to be using some
> well-chosen benchmarks. (It was pointed out very early in the
> discussion of this change that specjbb2013 is deprecated, yet that is
> the only benchmark that's been reported out.)

I see. I will try other workloads and evaluate effects of this change.

> The most recent change also penalizes current platforms that do not
> implement the release-CAS with an additional acquire. That might be not
> an issue for TSO platforms, but others will be affected.
>
> While we think other platforms could quickly adapt to this, this would
> force that the developer that implements this for other platforms
> (arm/aarch64) to be stuck with re-analyzing these issues. We
> do not think this is fair. We think this is a change (or set of
> changes) that needs to be pushed for all platforms at the same time.

Sure. I would like to ask developers for the other platforms to consider
this change.

> There is also one (minor) question about the change: why isn't the CAS
> result value being used for the failing paths of the CAS, rather than
> reloaded in copy_to_survivor_space?

I believe the original code also doesn't use the CAS result, because
the current cas_forward_to doesn't return the CAS result value.

bool oopDesc::cas_forward_to(oop p, markOop compare, cmpxchg_memory_order order)

I guess reloading the forwardee is not expensive because CAS failures are rare,
so maintainability was prioritized.
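
For comparison, a minimal sketch of what using the CAS result on the failing path could look like, written in Java rather than HotSpot C++. ForwardingSketch and forwardTo are hypothetical names, and the AtomicReference is only a stand-in for the forwarding slot of an object; this is not how cas_forward_to is implemented.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only: ForwardingSketch and forwardTo are hypothetical.
final class ForwardingSketch {
    // Stand-in for the forwarding slot of an object; null means
    // "not forwarded yet".
    private final AtomicReference<Object> forwardee = new AtomicReference<>(null);

    // Tries to install newCopy as the forwardee. Returns the winning copy:
    // ours if the CAS succeeded, otherwise the value another thread
    // installed, taken from the CAS result itself instead of a reload.
    Object forwardTo(Object newCopy) {
        Object witness = forwardee.compareAndExchange(null, newCopy);
        return witness == null ? newCopy : witness;
    }

    public static void main(String[] args) {
        ForwardingSketch obj = new ForwardingSketch();
        System.out.println(obj.forwardTo("copy made by thread A"));
        System.out.println(obj.forwardTo("copy made by thread B"));
    }
}
```

The second call loses the race and gets back the first copy without a separate load of the slot.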

"Doerr, Martin" <martin.doerr at sap.com<mailto:martin.doerr at sap.com>> wrote on 10/21/2016 21:57:42:
> The webrev also contains a logging change in
> psPromotionManager.inline.hpp which I'm not sure if it's still wanted.

For future discussion, here is a webrev that does not contain
any changes to log formats.
http://cr.openjdk.java.net/~horii/8154736/webrev.06/

Regards,
Hiroshi
-----------------------
Hiroshi Horii, Ph.D.
IBM Research - Tokyo

From mandy.chung at oracle.com  Mon Oct 31 15:09:17 2016
From: mandy.chung at oracle.com (Mandy Chung)
Date: Mon, 31 Oct 2016 08:09:17 -0700
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <dc8b491c-4d9c-b08a-894e-afe4de11e0b4@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<c41f719f-a584-2887-30c3-afad406679b5@oracle.com>
	<dc8b491c-4d9c-b08a-894e-afe4de11e0b4@oracle.com>
Message-ID: <3431C1FF-9F28-4993-8CB8-8B38AB0B73BF@oracle.com>


> On Oct 28, 2016, at 2:09 PM, David Holmes <david.holmes at oracle.com> wrote:
> 
> :
> 
> Given the BootClassLoader instance is not in fact the boot loader at all I think it would have been clearer and avoid potential confusion to call this something more representative of its purpose - perhaps BootResourceloader or BootLoaderHelper or ...

BootClassLoader is a private class and BootLoader is the internal API to find resources and packages.  IMO their names are fine and the comment in BootLoader is clear.

Mandy

From nipa at codefx.org  Mon Oct 31 15:39:51 2016
From: nipa at codefx.org (Nicolai Parlog)
Date: Mon, 31 Oct 2016 16:39:51 +0100
Subject: How to use @ReservedStackAccess?
Message-ID: <8850ae23-fda8-2481-261e-42b53131eb72@codefx.org>

 Hi!

I've been experimenting with @ReservedStackAccess but couldn't get it
to work. Any help would be highly appreciated.

## SETUP

I'm artificially creating a stack overflow by recursing indefinitely.
I then want to benefit from @ReservedStackAccess by executing some
code outside of an exception handler.

Here's my code:

	public static void main(String[] args) {
		try {
			recurseThenGreet();
		} catch (StackOverflowError err) {
			// to not have the console spammed with output
			System.out.println("Error");
		}
	}

	@ReservedStackAccess
	private static void recurseThenGreet() {
		recurse();
		System.out.println("Hi!");
	}

	private static void recurse() {
		recurse();
	}

I'm using build 9-ea+141-jigsaw-nightly-h5650-20161026. I compile with

	--add-exports java.base/jdk.internal.vm.annotation=ALL-UNNAMED

to make the annotation available and launch with

	-XX:-RestrictReservedStack

to activate the reserved stack for user land code.

## OBSERVED

This is the output I get:

Java HotSpot(TM) 64-Bit Server VM warning: Potentially dangerous stack
overflow in ReservedStackAccess annotated method
org.codefx.demo.java9.internal.stack.ReservingStackFrames_Simple.recurseThenGreet()V[1]
Error

## EXPECTED

I expected "Hi!" to show up somewhere there.

My best guess is that I put the annotation in the wrong place but
experimenting didn't help. Any help would be greatly appreciated!

 Thanks!
 Nicolai


-- 

PGP Key:
    http://keys.gnupg.net/pks/lookup?op=vindex&search=0xCA3BAD2E9CCCD509

Web:
    http://codefx.org
        a blog about software development
    https://www.sitepoint.com/java
        high-quality Java/JVM content
    http://do-foss.de
        Free and Open Source Software for the City of Dortmund

Twitter:
    https://twitter.com/nipafx

From david.holmes at oracle.com  Mon Oct 31 21:30:19 2016
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 1 Nov 2016 07:30:19 +1000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <f13b8b58-4aa5-2bf3-8c00-a66d8809b355@redhat.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-4925
	<c5dcc160-8d30-8160-2c5d-93a62bf2c940@oracle.com> <OFC81622C2.A
	<9bffd66d-abe0-8d3d-262a-4be55b81b9a4@
	<1477654313.3851.11.camel@oracle.com>
	<OF6B2B76B0.120062F1-ON4925805B.00311F7B-4925805B.003A58B5@notes.na.collabserv.com>
	<a86d39db-574c-1613-f5c0-200e517cdb05@redhat.com>
	<1cbb094f-b29b-c6b3-1e50-bed21b140fcb@oracle.com>
	<f13b8b58-4aa5-2bf3-8c00-a66d8809b355@redhat.com>
Message-ID: <b680fce7-b279-f28f-5ba0-5cd891372e01@oracle.com>


On 31/10/2016 7:32 PM, Andrew Haley wrote:
> On 30/10/16 21:26, David Holmes wrote:
>> On 31/10/2016 4:36 AM, Andrew Haley wrote:
>>>
>>> And, while we're on the subject, is memory_order_conservative actually
>>> defined anywhere?
>>
>> No. It was chosen to represent the current status quo that the Atomic::
>> ops should all be (by default) full bi-directional fences.
>
> Does that mean that a CAS is actually stronger than a load acquire
> followed by a store release?  And that a CAS is a release fence even
> when it fails and no store happens?

Yes. Yes.

   // All of the atomic operations that imply a read-modify-write action
   // guarantee a two-way memory barrier across that operation. Historically
   // these semantics reflect the strength of atomic operations that are
   // provided on SPARC/X86. We assume that strength is necessary unless
   // we can prove that a weaker form is sufficiently safe.

But there is some contention as to whether the actual implementations 
obey this completely.

>
> And that a conservative load is a *store* barrier?

Not sure what you mean. Atomic::load is not a r-m-w action so not 
expected to be a two-way memory barrier.

>> It is a place holder until this memory order stuff is fleshed out in
>> hotspot. We didn't adopt C++ memory_order_seq_cst as it isn't
>> obvious that it actually matches our current semantics. At least it
>> isn't obvious to me.
>
> It's not obvious to me either, because I don't know what our current
> semantics are.  But I believe that if we need anything stronger than
> sequential consistency we should look at fixing the callers of the
> Atomic:: ops.  But I guess the real problem is that we don't know
> which callers actually need the super-strong guarantees, or even that
> any exist.

Indeed. I don't know how to reliably analyse all uses to determine what 
"strength" is needed, or what features of that code enable, or reject, 
use of a particular strength. Ref the current discussions.

David

> Andrew.
>

From martin.doerr at sap.com  Fri Oct 21 12:57:52 2016
From: martin.doerr at sap.com (Doerr, Martin)
Date: Fri, 21 Oct 2016 12:57:52 -0000
Subject: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for
	ppc64
In-Reply-To: <a76e06b0-4004-cd32-73e8-cdb5850a96b9@redhat.com>
References: <201604221228.u3MCSXCL020021@d19av07.sagamino.japan.ibm.com>
	<0e47ed4857d94f9bbd99b0738bf1708a@DEWDFE13DE14.global.corp.sap>
	<f5826c30-0e12-8af9-9f78-3e7fd173b899@oracle.com>
	<OFE8C20C07.4A5437DD-ON4925803D.0040476D-4925803D.0041F53D@notes.na.collabserv.com>
	<CAP_pwnWpE9OhRA-XxTjKAq4T2rLjnLXLDomkBvAPdJ1G8XEjQw@mail.gmail.com>
	<f52703e8-67b9-0852-540e-a31e5dca1c1e@oracle.com>
	<OFA2287681.8B1427FA-ON4925803E.0035621E-4925803E.00387EBB@notes.na.collabserv.com>
	<1475236951.6301.72.camel@oracle.com>
	<OF78EB09B0.8B71606C-ON49258040.004F1656-49258040.00512C99@notes.na.collabserv.com>
	<CAP_pwnUsC18TNvRg1_M273tjCav11_Xy=jQCkQC2_KPgztEu2A@mail.gmail.com>
	<6ee4f1c6-f638-c5b9-7475-8fb6aeabf20b@oracle.com>
	<14c2eff4-4f90-caa0-17a7-835e6f1f1167@oracle.com>
	<OFCFB3DB17.F187E7F2-ON49258042.0053F83E-49258043.00035A78@notes.na.collabserv.com>
	<f2fb462a-843b-7310-bb41-9b238071ec3a@oracle.com>
	<D34486A1-FEDC-43D3-BA67-8699981DB511@oracle.com>
	<a76e06b0-4004-cd32-73e8-cdb5850a96b9@redhat.com>
Message-ID: <c1b17fcdf9b7458b8cfdb12b8e8d57d4@DEWDFE13DE14.global.corp.sap>

Hi all,

thank you very much for reviewing. I fully agree with the latest replies.

I think Hiroshi's latest webrev (http://cr.openjdk.java.net/~horii/8154736/webrev.05/) is pretty close to it.
The only remaining issue is the acquire barriers, which could be replaced by a comment like "We rely on memory_order_consume here.".
I'd prefer this, too, even though acquire barriers in failure cases would probably not really hurt.
Cmpxchg Release,Relaxed + Load Consume seems to be the pattern which matches the needs exactly.

The webrev also contains a logging change in psPromotionManager.inline.hpp which I'm not sure if it's still wanted.

Not sure if aarch64 should be addressed in a separate change.

Besides that, it looks good to me.

Best regards,
Martin


-----Original Message-----
From: hotspot-runtime-dev [mailto:hotspot-runtime-dev-bounces at openjdk.java.net] On Behalf Of Andrew Haley
Sent: Dienstag, 11. Oktober 2016 11:26
To: Kim Barrett; David Holmes
Cc: hotspot-compiler-dev; Hiroshi H Horii; Tim Ellison; ppc-aix-port-dev at openjdk.java.net; Michihiro Horie; hotspot-gc-dev at openjdk.java.net; hotspot-runtime-dev at openjdk.java.net
Subject: Re: RFR(M): 8154736: enhancement of cmpxchg and copy_to_survivor for ppc64

On 06/10/16 23:16, Kim Barrett wrote:

> The key issue here is that we copy obj into new_obj, and then make
> new_obj accessible to other threads via the CAS.  Those other
> threads might attempt to access data in new_obj.  This suggests the
> CAS ought to have at least a release fence to ensure the copy is
> complete before the CAS is performed.  No amount of fencing on the
> read side (such as in the work stealing) can remove that need.

I agree.

> And that might be all that is needed.  On the post-CAS side, we load
> the forwardee and then load values from it.  I think we can use
> implicit consume with dependent loads (except on Alpha) plus the
> suggested release fence to get the desired effect.

That's probably true, except that there's not really any such thing as
"implicit consume" in C++.  While all of the hardware we use respects
address dependencies, it's not something that the compiler knows
about, and it's explicitly undefined behaviour in the C++ memory
model.  If we're depending on memory_order_consume, perhaps we ought
to think about adding it to Atomic, even though it's just a volatile
load in older compilers.
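
The publication pattern under discussion can be sketched as follows, in Java rather than HotSpot C++. PublishSketch, Copy, publish and read are hypothetical names for this example. Java has no consume ordering, so the reader uses getAcquire as a stronger stand-in; that substitution is an assumption of this sketch.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch only: all names here are hypothetical.
final class PublishSketch {
    // Stand-in for the copied object (new_obj in the discussion).
    static final class Copy {
        int payload;
    }

    // Stand-in for the slot that makes the copy reachable by other threads.
    private final AtomicReference<Copy> slot = new AtomicReference<>(null);

    // Fills in the copy with plain stores, then publishes it with a
    // release CAS so that those stores are ordered before the publication.
    // Returns true if this call installed its copy.
    boolean publish(int value) {
        Copy copy = new Copy();
        copy.payload = value;
        return slot.compareAndExchangeRelease(null, copy) == null;
    }

    // Loads the published copy with acquire ordering and then reads a field
    // through it. Returns -1 if nothing has been published yet.
    int read() {
        Copy copy = slot.getAcquire();
        return copy == null ? -1 : copy.payload;
    }

    public static void main(String[] args) {
        PublishSketch sketch = new PublishSketch();
        System.out.println(sketch.publish(42));
        System.out.println(sketch.read());
    }
}
```

The release on the writer side is the part that no amount of reader-side fencing can replace; the reader side is where consume versus acquire is the open question.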

Andrew.

From brent.christian at oracle.com  Thu Oct 27 22:28:42 2016
From: brent.christian at oracle.com (Brent Christian)
Date: Thu, 27 Oct 2016 22:28:42 -0000
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
Message-ID: <5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>

Hi, Mandy

It looks pretty good to me.  Just a couple small things:

* StackTraceElement.java

  379             ClassLoader loader = cls.getClassLoader0();

It looks as if 'loader' isn't used...?


* Throwable.java

  832             // VM to fill in StackTraceElement
  833             getStackTraceElements(stackTrace);
  834             // ensure the proper StackTraceElement initialization
  835             for (StackTraceElement ste : stackTrace) {
  836                 ste.buildLoaderModuleClassName();
  837             }

For my own curiosity, why is this buildLoaderModuleClassName() call needed?

Thanks,
-Brent

On 10/25/16 4:10 PM, Mandy Chung wrote:
> Webrev at:
>    http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.00/
>
> Specdiff:
>    http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/specdiff/overview-summary.html
>
> This is a long-standing RFE for adding support for class
> loader names.  It's #ClassLoaderNames on JSR 376 issue
> list where the proposal [1] has been implemented in jake
> for some time.  This patch brings this change to jdk9.
>
> A short summary:
> - New constructors are added in ClassLoader, SecureClassLoader
>   and URLClassLoader to specify the class loader name.
>
> - New ClassLoader::getName and StackTraceElement::getClassLoaderName
>   method
>
> - StackTraceElement::toString is updated to include the name
>   of the class loader and module of that frame in this format:
>      <loader>/<module>/<fully-qualified-name>(<src>:<line>)
>
> The detail is in StackTraceElement::buildLoaderModuleClassName
> that compresses the output string for cases when the loader
> has no name or the module is unnamed module.  Another thing
> to mention is that VM sets the Class object when filling in
> a stack trace of a Throwable object.  Then the library will
> build a String from the Class object for serialization purpose.
>
> Mandy
> [1] http://mail.openjdk.java.net/pipermail/jpms-spec-observers/2016-September/000550.html
>
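
As a rough illustration of the format described in the quoted summary, here is a minimal sketch of how such a string could be assembled. FrameFormatSketch and format are hypothetical names, not the code in the webrev, and the compression rule used here (drop an unnamed loader, keep an empty module slot only when a loader name precedes it) is an assumption of this sketch.

```java
// Illustrative sketch only: not the StackTraceElement implementation.
final class FrameFormatSketch {
    // Builds <loader>/<module>/<fully-qualified-name>(<src>:<line>),
    // leaving out the parts that have no name.
    static String format(String loaderName, String moduleName,
                         String qualifiedName, String fileName, int line) {
        boolean hasLoader = loaderName != null && !loaderName.isEmpty();
        boolean hasModule = moduleName != null && !moduleName.isEmpty();
        StringBuilder sb = new StringBuilder();
        if (hasLoader) {
            sb.append(loaderName).append('/');
        }
        if (hasModule) {
            sb.append(moduleName).append('/');
        } else if (hasLoader) {
            // Keep an empty module slot so the loader name is not
            // mistaken for a module name.
            sb.append('/');
        }
        sb.append(qualifiedName)
          .append('(').append(fileName).append(':').append(line).append(')');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(format("app", "my.module", "com.foo.Main.run", "Main.java", 101));
        System.out.println(format(null, "my.module", "com.foo.Main.run", "Main.java", 101));
        System.out.println(format("app", null, "com.foo.Main.run", "Main.java", 101));
        System.out.println(format(null, null, "com.foo.Main.run", "Main.java", 101));
    }
}
```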

From brent.christian at oracle.com  Fri Oct 28 18:11:28 2016
From: brent.christian at oracle.com (Brent Christian)
Date: Fri, 28 Oct 2016 18:11:28 -0000
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>
	<52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>
Message-ID: <bec2675d-20b6-8c30-a558-23b3de412f73@oracle.com>

On 10/27/16 7:54 PM, Mandy Chung wrote:
>> On Oct 27, 2016, at 3:28 PM, Brent Christian <brent.christian at oracle.com> wrote:
>>
>> * Throwable.java
>>
>> 832             // VM to fill in StackTraceElement
>> 833             getStackTraceElements(stackTrace);
>> 834             // ensure the proper StackTraceElement initialization
>> 835             for (StackTraceElement ste : stackTrace) {
>> 836                 ste.buildLoaderModuleClassName();
>> 837             }
>>
>> For my own curiosity, why is this buildLoaderModuleClassName() call needed?
>
> When the VM fills in the stack trace, it sets the Class object in
> StackTraceElement, and the buildLoaderModuleClassName() call here is to
> (1) build the output string, whose format is described in the javadoc
> and which is stored in the serial form, and (2) avoid holding a strong
> reference to the Class object.  StackTraceElement is serializable and
> it can't build the correct string when deserialized.

Should something be done for STEs returned from 
StackFrameInfo.toStackTraceElement() ?  These are also filled in by the 
VM.  The strong Class reference is probably not such a concern, as the 
StackFrameInfo itself also holds one, but would we run into trouble upon 
trying to deserialize such an STE?

Thanks,
-Brent

From brent.christian at oracle.com  Sat Oct 29 00:14:52 2016
From: brent.christian at oracle.com (Brent Christian)
Date: Sat, 29 Oct 2016 00:14:52 -0000
Subject: Request Review: JDK-6479237 (cl) Add support for classloader names
In-Reply-To: <931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
References: <2C30243B-71D2-49E2-A8B6-2C33B82DB104@oracle.com>
	<5d494d3a-f8ec-0456-8d04-f8998d840fc7@oracle.com>
	<52E8F822-2D4E-4BD7-BE4E-748E47901625@oracle.com>
	<bec2675d-20b6-8c30-a558-23b3de412f73@oracle.com>
	<931BF9A2-6F22-48FF-855E-287BAF10FDC0@oracle.com>
Message-ID: <52930077-0002-cb83-f58d-d4ea6040076a@oracle.com>

On 10/28/16 1:44 PM, Mandy Chung wrote:
>
>> On Oct 28, 2016, at 11:11 AM, Brent Christian <brent.christian at oracle.com> wrote:
>>
>> Should something be done for STEs returned from StackFrameInfo.toStackTraceElement() ?
>
> Good catch - I missed it.  I added package-private static methods in StackTraceElement class for both Throwable and StackFrameInfo to get StackTraceElement(s).
>
> http://cr.openjdk.java.net/~mchung/jdk9/webrevs/6479237/webrev.02/
>

Looks good.

-Brent