From jaroslav.bachorik at oracle.com  Tue Jul  9 03:02:48 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 09 Jul 2013 12:02:48 +0200
Subject: jmx-dev RFR: 8010285 Enforce the requirement of Management
 Interfaces being public
In-Reply-To: <51C0440B.9030601@oracle.com>
References: <51A4BD98.1040704@oracle.com> <51A6171E.8040606@oracle.com>
	<CACBEn44CB3yw3rCs8k8CKVzVuTGubK2mSiFgqrmgspiba1=pGg@mail.gmail.com>
	<51A6382F.3000204@oracle.com> <51A63E82.4050505@oracle.com>
	<51A70081.5050203@oracle.com> <51AD1955.2090109@oracle.com>
	<51AF060D.5070706@oracle.com> <51AF136C.8070806@oracle.com>
	<51AF3494.3070304@oracle.com> <51AF4368.1040403@oracle.com>
	<51AF4ABB.1080005@oracle.com> <51AF7B42.7020902@oracle.com>
	<51AF9A56.4090709@oracle.com> <51B0A937.90607@oracle.com>
	<51B0AAB7.7070802@oracle.com> <51B0B39C.1050305@oracle.com>
	<51B18803.7060406@oracle.com> <51B1A2E2.5030001@oracle.com>
	<51C03013.2020700@oracle.com> <51C0359F.8090201@oracle.com>
	<51C036AF.1030206@oracle.com> <51C0440B.9030601@oracle.com>
Message-ID: <51DBDFC8.4090800@oracle.com>

Please, review the final version of the changes:
http://cr.openjdk.java.net/~jbachorik/8010285/webrev.07

It addresses all the concerns raised during the CCC process.

I will need at least one official OpenJDK reviewer for the integration.

Thanks,

-JB-

From mandy.chung at oracle.com  Tue Jul  9 12:42:43 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Tue, 09 Jul 2013 12:42:43 -0700
Subject: jmx-dev RFR: 8010285 Enforce the requirement of Management
 Interfaces being public
In-Reply-To: <51DBDFC8.4090800@oracle.com>
References: <51A4BD98.1040704@oracle.com> <51A6171E.8040606@oracle.com>
	<CACBEn44CB3yw3rCs8k8CKVzVuTGubK2mSiFgqrmgspiba1=pGg@mail.gmail.com>
	<51A6382F.3000204@oracle.com> <51A63E82.4050505@oracle.com>
	<51A70081.5050203@oracle.com> <51AD1955.2090109@oracle.com>
	<51AF060D.5070706@oracle.com> <51AF136C.8070806@oracle.com>
	<51AF3494.3070304@oracle.com> <51AF4368.1040403@oracle.com>
	<51AF4ABB.1080005@oracle.com> <51AF7B42.7020902@oracle.com>
	<51AF9A56.4090709@oracle.com> <51B0A937.90607@oracle.com>
	<51B0AAB7.7070802@oracle.com> <51B0B39C.1050305@oracle.com>
	<51B18803.7060406@oracle.com> <51B1A2E2.5030001@oracle.com>
	<51C03013.2020700@oracle.com> <51C0359F.8090201@oracle.com>
	<51C036AF.1030206@oracle.com> <51C0440B.9030601@oracle.com>
	<51DBDFC8.4090800@oracle.com>
Message-ID: <51DC67B3.8060103@oracle.com>

On 7/9/13 3:02 AM, Jaroslav Bachorik wrote:
> Please, review the final version of the changes:
> http://cr.openjdk.java.net/~jbachorik/8010285/webrev.07
>

The change looks reasonable. In the class spec for MXBean, I suggest
renaming

    interface ThisIsNotMXBean{}

to something more explicit

    interface NonPublicInterfaceNotMXBean{}
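
A minimal sketch of the rule being enforced. It assumes a JDK 8 or later runtime with no compatibility override enabled, and that javax.management.JMX.isMXBeanInterface(Class) accepts a public interface whose name ends in "MXBean" but rejects an otherwise identical non-public one, such as the renamed example. The enclosing class and PublicThingMXBean are hypothetical names.

```java
import javax.management.JMX;

public class MXBeanVisibilitySketch {
    // Hypothetical public interface following the MXBean naming convention.
    public interface PublicThingMXBean {
        int getValue();
    }

    // Package-private interface with the same naming convention. Under the
    // rule discussed here it should NOT be treated as an MXBean interface.
    interface NonPublicInterfaceNotMXBean {
        int getValue();
    }

    static boolean isPublicAccepted() {
        return JMX.isMXBeanInterface(PublicThingMXBean.class);
    }

    static boolean isNonPublicAccepted() {
        return JMX.isMXBeanInterface(NonPublicInterfaceNotMXBean.class);
    }

    public static void main(String[] args) {
        System.out.println("public: " + isPublicAccepted());
        System.out.println("non-public: " + isNonPublicAccepted());
    }
}
```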

You removed JMX.checkProxyInterface.  I believe the checkPackageAccess method on the given mbean
interface is called somewhere as part of the MBean validation - where is that check being done?

Other than that, it's fine with me.

Mandy

> It addresses all the concerns raised during the CCC process.
>
> I will need at least one official OpenJDK reviewer for the integration.
>
> Thanks,
>
> -JB-


From jaroslav.bachorik at oracle.com  Wed Jul 10 01:33:17 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 10 Jul 2013 10:33:17 +0200
Subject: jmx-dev RFR: 8010285 Enforce the requirement of Management
 Interfaces being public
In-Reply-To: <51DC67B3.8060103@oracle.com>
References: <51A4BD98.1040704@oracle.com> <51A6171E.8040606@oracle.com>
	<CACBEn44CB3yw3rCs8k8CKVzVuTGubK2mSiFgqrmgspiba1=pGg@mail.gmail.com>
	<51A6382F.3000204@oracle.com> <51A63E82.4050505@oracle.com>
	<51A70081.5050203@oracle.com> <51AD1955.2090109@oracle.com>
	<51AF060D.5070706@oracle.com> <51AF136C.8070806@oracle.com>
	<51AF3494.3070304@oracle.com> <51AF4368.1040403@oracle.com>
	<51AF4ABB.1080005@oracle.com> <51AF7B42.7020902@oracle.com>
	<51AF9A56.4090709@oracle.com> <51B0A937.90607@oracle.com>
	<51B0AAB7.7070802@oracle.com> <51B0B39C.1050305@oracle.com>
	<51B18803.7060406@oracle.com> <51B1A2E2.5030001@oracle.com>
	<51C03013.2020700@oracle.com> <51C0359F.8090201@oracle.com>
	<51C036AF.1030206@oracle.com> <51C0440B.9030601@oracle.com>
	<51DBDFC8.4090800@oracle.com> <51DC67B3.8060103@oracle.com>
Message-ID: <51DD1C4D.2060601@oracle.com>

On 07/09/2013 09:42 PM, Mandy Chung wrote:
> On 7/9/13 3:02 AM, Jaroslav Bachorik wrote:
>> Please, review the final version of the changes:
>> http://cr.openjdk.java.net/~jbachorik/8010285/webrev.07
>>
> 
> The change looks reasonable. In the class spec for MXBean, I suggest
> renaming
> 
>    interface ThisIsNotMXBean{}
> 
> to something more explicit
> 
>    interface NonPublicInterfaceNotMXBean{}

Since this was part of the CCC review, which was approved, I am not sure
whether I am allowed to change the class spec. If it is allowed, I have no
objections to the proposal and will change the interface name.

> 
> You removed JMX.checkProxyInterface. I believe the checkPackageAccess
> method on the given MBean interface is called somewhere as part of the
> MBean validation - where is that check being done?

com.sun.jmx.mbeanserver.MBeanIntrospector.getMethods() performs this
check. It is not possible to construct an M(X)Bean proxy without
consulting com.sun.jmx.mbeanserver.MBeanIntrospector.getMethods() first.

This behaviour is covered by a closed vulnerability test.

-JB-

> 
> Other than that, it's fine with me.
> 
> Mandy
> 
>> It addresses all the concerns raised during the CCC process.
>>
>> I will need at least one official OpenJDK reviewer for the integration.
>>
>> Thanks,
>>
>> -JB-
> 


From jaroslav.bachorik at oracle.com  Wed Jul 10 02:10:52 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 10 Jul 2013 11:10:52 +0200
Subject: jmx-dev RFR: 8019826 Test
 com/sun/management/HotSpotDiagnosticMXBean/SetVMOption.java fails with NPE
Message-ID: <51DD251C.5050009@oracle.com>

Please, review this simple fix.

http://cr.openjdk.java.net/~jbachorik/8019826/webrev.00

Firstly, the patch removes a conditional early exit that checks for
build 52 of an unspecified major JVM version - it is no longer needed.
In effect, the condition made the test a no-op until the latest
HotSpot version.

The second fix sets the "mbean" attribute correctly - it was not
properly initialized, and because of this the test failed with an
NPE.
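
For context, here is a minimal sketch of how a test can obtain the diagnostic MXBean eagerly, so that a field like the test's "mbean" is never left null. It assumes a HotSpot-based JDK 7 or later, where com.sun.management.HotSpotDiagnosticMXBean is available through ManagementFactory.getPlatformMXBean(Class). The class and method names are hypothetical, and the sketch only reads the manageable HeapDumpOnOutOfMemoryError flag rather than changing it.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class DiagnosticBeanSketch {
    // Initialized eagerly, so later uses cannot hit a null reference.
    private static final HotSpotDiagnosticMXBean mbean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);

    static HotSpotDiagnosticMXBean bean() {
        return mbean;
    }

    // Reads the current value of a VM flag as a string, e.g. "false".
    static String currentValue(String flag) {
        VMOption option = mbean.getVMOption(flag);
        return option.getValue();
    }

    public static void main(String[] args) {
        System.out.println("HeapDumpOnOutOfMemoryError = "
                + currentValue("HeapDumpOnOutOfMemoryError"));
    }
}
```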

Thanks,

-JB-

From shanliang.jiang at oracle.com  Wed Jul 10 09:48:34 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 10 Jul 2013 18:48:34 +0200
Subject: jmx-dev RFR: 8019826 Test
 com/sun/management/HotSpotDiagnosticMXBean/SetVMOption.java fails with NPE
In-Reply-To: <51DD251C.5050009@oracle.com>
References: <51DD251C.5050009@oracle.com>
Message-ID: <51DD9062.8020602@oracle.com>

It looks fine to me.

Shanliang

Jaroslav Bachorik wrote:
> Please, review this simple fix.
>
> http://cr.openjdk.java.net/~jbachorik/8019826/webrev.00
>
> Firstly, the patch removes a conditional early exit that checks for
> build 52 of an unspecified major JVM version - it is no longer needed.
> In effect, the condition made the test a no-op until the latest
> HotSpot version.
>
> The second fix sets the "mbean" attribute correctly - it was not
> properly initialized, and because of this the test failed with an
> NPE.
>
> Thanks,
>
> -JB-
>   


From mandy.chung at oracle.com  Wed Jul 10 17:52:28 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Thu, 11 Jul 2013 08:52:28 +0800
Subject: jmx-dev RFR: 8010285 Enforce the requirement of Management
 Interfaces being public
In-Reply-To: <51DD1C4D.2060601@oracle.com>
References: <51A4BD98.1040704@oracle.com> <51A6171E.8040606@oracle.com>
	<CACBEn44CB3yw3rCs8k8CKVzVuTGubK2mSiFgqrmgspiba1=pGg@mail.gmail.com>
	<51A6382F.3000204@oracle.com> <51A63E82.4050505@oracle.com>
	<51A70081.5050203@oracle.com> <51AD1955.2090109@oracle.com>
	<51AF060D.5070706@oracle.com> <51AF136C.8070806@oracle.com>
	<51AF3494.3070304@oracle.com> <51AF4368.1040403@oracle.com>
	<51AF4ABB.1080005@oracle.com> <51AF7B42.7020902@oracle.com>
	<51AF9A56.4090709@oracle.com> <51B0A937.90607@oracle.com>
	<51B0AAB7.7070802@oracle.com> <51B0B39C.1050305@oracle.com>
	<51B18803.7060406@oracle.com> <51B1A2E2.5030001@oracle.com>
	<51C03013.2020700@oracle.com> <51C0359F.8090201@oracle.com>
	<51C036AF.1030206@oracle.com> <51C0440B.9030601@oracle.com>
	<51DBDFC8.4090800@oracle.com> <51DC67B3.8060103@oracle.com>
	<51DD1C4D.2060601@oracle.com>
Message-ID: <51DE01CC.7060402@oracle.com>

On 7/10/2013 4:33 PM, Jaroslav Bachorik wrote:
>> >The change looks reasonable. In the class spec for MXBean, I suggest
>> >renaming
>> >
>> >    interface ThisIsNotMXBean{}
>> >
>> >to something more explicit
>> >
>> >    interface NonPublicInterfaceNotMXBean{}
> Since this was part of the CCC review, which was approved, I am not sure
> whether I am allowed to change the class spec. If it is allowed, I have no
> objections to the proposal and will change the interface name.

That is an example interface name and is non-normative (unless you see 
it differently).  This can be revised.
thanks
Mandy

From david.holmes at oracle.com  Wed Jul 10 22:23:36 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 11 Jul 2013 15:23:36 +1000
Subject: jmx-dev RFR: 8019826 Test
 com/sun/management/HotSpotDiagnosticMXBean/SetVMOption.java fails with NPE
In-Reply-To: <51DD251C.5050009@oracle.com>
References: <51DD251C.5050009@oracle.com>
Message-ID: <51DE4158.7080503@oracle.com>

On 10/07/2013 7:10 PM, Jaroslav Bachorik wrote:
> Please, review this simple fix.
>
> http://cr.openjdk.java.net/~jbachorik/8019826/webrev.00
>
> Firstly, the patch removes a conditional early exit that checks for
> build 52 of an unspecified major JVM version - it is no longer needed.
> In effect, the condition made the test a no-op until the latest
> HotSpot version.
>
> The second fix sets the "mbean" attribute correctly - it was not
> properly initialized, and because of this the test failed with an
> NPE.

Looks fine to me.

David
-----

> Thanks,
>
> -JB-
>

From mandy.chung at oracle.com  Wed Jul 10 22:49:32 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Thu, 11 Jul 2013 13:49:32 +0800
Subject: jmx-dev RFR: 8019826 Test
	com/sun/management/HotSpotDiagnosticMXBean/SetVMOption.java
	fails with NPE
In-Reply-To: <51DD251C.5050009@oracle.com>
References: <51DD251C.5050009@oracle.com>
Message-ID: <0B1D8082-0D31-4C40-80DB-2D9D2453F0B3@oracle.com>

Looks good.

Mandy

On Jul 10, 2013, at 5:10 PM, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> Please, review this simple fix.
> 
> http://cr.openjdk.java.net/~jbachorik/8019826/webrev.00
> 
> Firstly, the patch removes a conditional early exit that checks for
> build 52 of an unspecified major JVM version - it is no longer needed.
> In effect, the condition made the test a no-op until the latest
> HotSpot version.
> 
> The second fix sets the "mbean" attribute correctly - it was not
> properly initialized, and because of this the test failed with an
> NPE.
> 
> Thanks,
> 
> -JB-

From jaroslav.bachorik at oracle.com  Thu Jul 11 04:48:02 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 11 Jul 2013 13:48:02 +0200
Subject: jmx-dev RFR: 8019584
 javax/management/remote/mandatory/loading/MissingClassTest.java failed in
 nightly against jdk7u45: java.io.InvalidObjectException: Invalid
 notification: null
Message-ID: <51DE9B72.5030308@oracle.com>

Please, review the change.

http://cr.openjdk.java.net/~jbachorik/8019584/webrev.00/

The combination of the fix for JDK-8014085 and
ObjectInputStream.readFields() not throwing CNFE when trying to
deserialize an object graph containing references to non-available
classes causes an InvalidObjectException to be thrown instead of the CNFE
when processing JMX notifications.

The patch makes the ClientNotificationForwarder ready for
InvalidObjectException - it will correctly report lost notifications but
will not cause the notification processing loop to fail with an unhandled
exception.

Thanks,

-JB-

From jaroslav.bachorik at oracle.com  Mon Jul 15 01:41:10 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 15 Jul 2013 10:41:10 +0200
Subject: jmx-dev RFR: 8019584
 javax/management/remote/mandatory/loading/MissingClassTest.java failed in
 nightly against jdk7u45: java.io.InvalidObjectException: Invalid
 notification: null
Message-ID: <51E3B5A6.4060301@oracle.com>

Please, review the patch for https://jbs.oracle.com/bugs/browse/JDK-8019584

http://cr.openjdk.java.net/~jbachorik/8019584/webrev.00/

The reason for the failure is that the ObjectInputStream.readFields()
method does not throw CNFE as specified when encountering instances of
unknown classes in the object graph to be deserialized. Instead, it leaves the
fields in the default state which in this case is "null" and is not
valid. Hence, the deserialization validation fails.
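
To make the failure mode concrete, here is a minimal sketch of a readObject() that uses readFields() and then validates the result. It uses a hypothetical Envelope class rather than the real JMX notification types. If a field comes back as its default null (which, per the analysis above, is what readFields() does when the field's class cannot be loaded), the validation turns that into an InvalidObjectException. The sketch only simulates the null field by serializing one directly; it does not reproduce the missing-class case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ValidatingReadSketch {
    // Hypothetical stand-in for a serialized notification wrapper.
    static final class Envelope implements Serializable {
        private static final long serialVersionUID = 1L;
        private String payload;

        Envelope(String payload) {
            this.payload = payload;
        }

        String payload() {
            return payload;
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            String p = (String) fields.get("payload", null);
            // null may mean "never set" or "value could not be resolved";
            // the validation cannot tell the two apart.
            if (p == null) {
                throw new InvalidObjectException("Invalid envelope: null");
            }
            payload = p;
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        }
        return bytes.toByteArray();
    }

    static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```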

Since the root cause is in the RMI code, has been there for a very long
time, and changing the behaviour there might have disruptive effects on
various 3rd party applications, I decided to work around this problem in
the JMX code.

The workaround adds InvalidObjectException to the list of expected
exceptions when processing JMX notifications. It is treated the same way
as e.g. CNFE - the exception is logged and the notification will be
reported as missing. This will resolve the problem on the JMX side.
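
A minimal sketch of the policy described above. It uses a hypothetical classify() helper rather than the actual ClientNotifForwarder code. Deserialization failures that only affect individual notifications are mapped to "lost" (log and continue), while anything else ends the fetch loop. Whether InvalidObjectException belongs in the first group is exactly what is under review in this thread.

```java
import java.io.InvalidObjectException;

public class FetchFailurePolicySketch {
    enum Outcome { LOST_NOTIFICATIONS, STOP_FETCHING }

    // Hypothetical policy: which failures only cost some notifications,
    // and which should terminate the notification fetching loop.
    static Outcome classify(Throwable failure) {
        if (failure instanceof ClassNotFoundException
                || failure instanceof InvalidObjectException) {
            // The caller logs it and reports the notifications as lost.
            return Outcome.LOST_NOTIFICATIONS;
        }
        // Anything else (e.g. an I/O failure on the connection) stops fetching.
        return Outcome.STOP_FETCHING;
    }
}
```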

Thanks,

-JB-

From daniel.fuchs at oracle.com  Mon Jul 15 05:56:35 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Mon, 15 Jul 2013 14:56:35 +0200
Subject: jmx-dev RFR: 8019584
 javax/management/remote/mandatory/loading/MissingClassTest.java failed in
 nightly against jdk7u45: java.io.InvalidObjectException: Invalid
 notification: null
In-Reply-To: <51DE9B72.5030308@oracle.com>
References: <51DE9B72.5030308@oracle.com>
Message-ID: <51E3F183.3080106@oracle.com>

Hi Jaroslav,

This looks reasonable. I assume you have run the JCK to verify that
it doesn't break anything else?

best regards,

-- daniel

On 7/11/13 1:48 PM, Jaroslav Bachorik wrote:
> Please, review the change.
>
> http://cr.openjdk.java.net/~jbachorik/8019584/webrev.00/
>
> The combination of the fix for JDK-8014085 and
> ObjectInputStream.readFields() not throwing CNFE when trying to
> deserialize an object graph containing references to non-available
> classes causes an InvalidObjectException to be thrown instead of the CNFE
> when processing JMX notifications.
>
> The patch makes the ClientNotificationForwarder ready for
> InvalidObjectException - it will correctly report lost notifications but
> will not cause the notification processing loop to fail with an unhandled
> exception.
>
> Thanks,
>
> -JB-
>


From david.holmes at oracle.com  Mon Jul 15 18:01:09 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 16 Jul 2013 11:01:09 +1000
Subject: jmx-dev RFR: 8019584
 javax/management/remote/mandatory/loading/MissingClassTest.java failed in
 nightly against jdk7u45: java.io.InvalidObjectException: Invalid
 notification: null
In-Reply-To: <51E3B5A6.4060301@oracle.com>
References: <51E3B5A6.4060301@oracle.com>
Message-ID: <51E49B55.2060603@oracle.com>

On 15/07/2013 6:41 PM, Jaroslav Bachorik wrote:
> Please, review the patch for https://jbs.oracle.com/bugs/browse/JDK-8019584
>
> http://cr.openjdk.java.net/~jbachorik/8019584/webrev.00/
>
> The reason for the failure is that the ObjectInputStream.readFields()
> method does not throw CNFE as specified when encountering instances of
> unknown classes in the object graph to be deserialized. Instead, it leaves the
> fields in the default state which in this case is "null" and is not
> valid. Hence, the deserialization validation fails.
>
> Since the root cause is in the RMI code, has been there for a very long
> time, and changing the behaviour there might have disruptive effects on
> various 3rd party applications, I decided to work around this problem in
> the JMX code.

Can you pinpoint the code that actually fails to propagate the 
ClassNotFoundException - I don't see any issue in OIS.readFields itself 
so this comes from elsewhere. Failing to throw CNFE when deserializing 
seems like a major bug to me.

Thanks,
David


> The workaround adds InvalidObjectException to the list of expected
> exceptions when processing JMX notifications. It is treated the same way
> as e.g. CNFE - the exception is logged and the notification will be
> reported as missing. This will resolve the problem on the JMX side.
>
> Thanks,
>
> -JB-
>

From shanliang.jiang at oracle.com  Tue Jul 16 00:31:52 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Tue, 16 Jul 2013 09:31:52 +0200
Subject: jmx-dev RFR: 8019584
 javax/management/remote/mandatory/loading/MissingClassTest.java failed in
 nightly against jdk7u45: java.io.InvalidObjectException: Invalid
 notification: null
In-Reply-To: <51DE9B72.5030308@oracle.com>
References: <51DE9B72.5030308@oracle.com>
Message-ID: <51E4F6E8.7030200@oracle.com>

Jaroslav,

I am not sure that it is a good idea to simply add
InvalidObjectException to the catch list. I remember that we
carefully analyzed and tested the catch list in order to avoid
unneeded calls of "fetchOneNotif" and to avoid fetching on a dead
connection. Look at
javax.management.remote.rmi.RMIConnector$RMINotifClient.fetchNotifs: we
carefully retrieve the original exception wrapped in an
UnmarshalException, with different protocols, to allow
ClientNotifForwarder to do the right catching.

What I am afraid of is that InvalidObjectException could be thrown in
other situations, e.g. when the connection is cut in the middle of
fetching; the fix would then make ClientNotifForwarder fail to stop
fetching.
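
The unwrapping step mentioned above can be sketched as follows. This is a hypothetical helper, not the code in RMIConnector$RMINotifClient. java.rmi.UnmarshalException carries the underlying exception as its cause, and a caller would classify that cause rather than the wrapper. Note that the cause alone does not tell whether the connection is still usable, which is the concern raised here.

```java
import java.rmi.UnmarshalException;

public class UnmarshalCauseSketch {
    // Returns the exception that made unmarshalling fail, or the argument
    // itself when it is not an UnmarshalException carrying a cause.
    static Throwable unwrap(Throwable t) {
        if (t instanceof UnmarshalException && t.getCause() != null) {
            return t.getCause();
        }
        return t;
    }
}
```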

Shanliang

Jaroslav Bachorik wrote:
> Please, review the change.
>
> http://cr.openjdk.java.net/~jbachorik/8019584/webrev.00/
>
> The combination of the fix for JDK-8014085 and
> ObjectInputStream.readFields() not throwing CNFE when trying to
> deserialize an object graph containing references to non-available
> classes causes an InvalidObjectException to be thrown instead of the CNFE
> when processing JMX notifications.
>
> The patch makes the ClientNotificationForwarder ready for
> InvalidObjectException - it will correctly report lost notifications but
> will not cause the notification processing loop to fail with an unhandled
> exception.
>
> Thanks,
>
> -JB-
>   


From jaroslav.bachorik at oracle.com  Tue Jul 16 00:47:05 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 16 Jul 2013 09:47:05 +0200
Subject: jmx-dev RFR: 8019584
 javax/management/remote/mandatory/loading/MissingClassTest.java failed in
 nightly against jdk7u45: java.io.InvalidObjectException: Invalid
 notification: null
In-Reply-To: <51E4F6E8.7030200@oracle.com>
References: <51DE9B72.5030308@oracle.com> <51E4F6E8.7030200@oracle.com>
Message-ID: <51E4FA79.3000307@oracle.com>

According to the documentation, InvalidObjectException
"Indicates that one or more deserialized objects failed validation
 tests."

meaning that a severed connection should generate a different type of
exception. InvalidObjectException should be reserved for the cases when 
the deserialized data violate the validation rules. But I can't say I 
am certain; when a CNFE can disappear in the process, anything might be 
possible ...

Anyway, if I can't catch the InvalidObjectException it will leave me 
with two options:

1. Just forget about the validation
2. Do the validation but live with the fact that even when the 
validation fails a potential attacker can get access to an instance 
with invalid fields (e.g. using the finalizer trick)

-JB-

On Tue 16 Jul 2013 09:31:52 AM CEST, shanliang wrote:
> Jaroslav,
>
> I am not sure that it is a good idea to simply add
> InvalidObjectException to the catch list. I remember that we
> carefully analyzed and tested the catch list in order to avoid
> unneeded calls of "fetchOneNotif" and to avoid fetching on a dead
> connection. Look at
> javax.management.remote.rmi.RMIConnector$RMINotifClient.fetchNotifs:
> we carefully retrieve the original exception wrapped in an
> UnmarshalException, with different protocols, to allow
> ClientNotifForwarder to do the right catching.
>
> What I am afraid of is that InvalidObjectException could be thrown in
> other situations, e.g. when the connection is cut in the middle of
> fetching; the fix would then make ClientNotifForwarder fail to stop
> fetching.
>
> Shanliang
>
> Jaroslav Bachorik wrote:
>> Please, review the change.
>>
>> http://cr.openjdk.java.net/~jbachorik/8019584/webrev.00/
>>
>> The combination of the fix for JDK-8014085 and
>> ObjectInputStream.readFields() not throwing CNFE when trying to
>> deserialize an object graph containing references to non-available
>> classes causes an InvalidObjectException to be thrown instead of the CNFE
>> when processing JMX notifications.
>>
>> The patch makes the ClientNotificationForwarder ready for
>> InvalidObjectException - it will correctly report lost notifications but
>> will not cause the notification processing loop to fail with an unhandled
>> exception.
>>
>> Thanks,
>>
>> -JB-
>>
>


From jaroslav.bachorik at oracle.com  Tue Jul 16 00:49:21 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 16 Jul 2013 09:49:21 +0200
Subject: jmx-dev RFR: 8019584
 javax/management/remote/mandatory/loading/MissingClassTest.java failed in
 nightly against jdk7u45: java.io.InvalidObjectException: Invalid
 notification: null
In-Reply-To: <51E49B55.2060603@oracle.com>
References: <51E3B5A6.4060301@oracle.com> <51E49B55.2060603@oracle.com>
Message-ID: <51E4FB01.60403@oracle.com>

On Tue 16 Jul 2013 03:01:09 AM CEST, David Holmes wrote:
> On 15/07/2013 6:41 PM, Jaroslav Bachorik wrote:
>> Please, review the patch for
>> https://jbs.oracle.com/bugs/browse/JDK-8019584
>>
>> http://cr.openjdk.java.net/~jbachorik/8019584/webrev.00/
>>
>> The reason for the failure is that the ObjectInputStream.readFields()
>> method does not throw CNFE as specified when encountering instances of
>> unknown classes in the object graph to be deserialized. Instead, it leaves the
>> fields in the default state which in this case is "null" and is not
>> valid. Hence, the deserialization validation fails.
>>
>> Since the root cause is in the RMI code, has been there for a very long
>> time, and changing the behaviour there might have disruptive effects on
>> various 3rd party applications, I decided to work around this problem in
>> the JMX code.
>
> Can you pinpoint the code that actually fails to propagate the
> ClassNotFoundException - I don't see any issue in OIS.readFields
> itself so this comes from elsewhere. Failing to throw CNFE when
> deserializing seems like a major bug to me.

Yes, I agree.

When you take a look at ObjectInputStream.defaultReadObject(), you
can see that it forwards any captured exception on lines 509-512:
---
   ClassNotFoundException ex = handles.lookupException(passHandle);
   if (ex != null) {
     throw ex;
   }
---

On the other hand, GetFieldImpl just nullifies the read field on
lines 2137-2138:
---
   return (handles.lookupException(objHandle) == null) ?
       objVals[off] : null;
---

and the ObjectInputStream.readFields() completely disregards the 
"handles" map and basically swallows any exception discovered during 
the fields deserialization, AFAIK.

-JB-

>
> Thanks,
> David
>
>
>> The workaround adds InvalidObjectException to the list of expected
>> exceptions when processing JMX notifications. It is treated the same way
>> as e.g. CNFE - the exception is logged and the notification will be
>> reported as missing. This will resolve the problem on the JMX side.
>>
>> Thanks,
>>
>> -JB-
>>


From jaroslav.bachorik at oracle.com  Thu Jul 18 02:54:52 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 18 Jul 2013 11:54:52 +0200
Subject: jmx-dev RFR: 8002307
 javax.management.modelmbean.ModelMBeanInfoSupport may expose internal
 representation by storing an externally mutable object
In-Reply-To: <51A4DC90.7050809@oracle.com>
References: <51A4AB45.4070100@oracle.com> <51A4DC90.7050809@oracle.com>
Message-ID: <51E7BB6C.6030704@oracle.com>

Hi,

thanks for the comments.

Here (http://cr.openjdk.java.net/~jbachorik/8002307/webrev.03/) is the 
updated webrev implementing suggestions from Daniel and Shanliang.
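
For readers unfamiliar with the bug class in the subject line, here is a minimal sketch of the usual remedy, defensive copying. It uses a hypothetical DescriptorNamesSketch holder rather than the actual ModelMBeanInfoSupport code. The array is copied when stored and again when returned, so callers cannot mutate the stored state, either through the argument they passed in or through the value they got back.

```java
public final class DescriptorNamesSketch {
    private final String[] names;

    public DescriptorNamesSketch(String[] names) {
        // Copy on the way in: later changes to the caller's array are not seen here.
        this.names = (names == null) ? new String[0] : names.clone();
    }

    public String[] getNames() {
        // Copy on the way out: callers cannot modify the internal array.
        return names.clone();
    }
}
```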

-JB-


From daniel.fuchs at oracle.com  Thu Jul 18 05:11:37 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Thu, 18 Jul 2013 14:11:37 +0200
Subject: jmx-dev RFR: 8002307
 javax.management.modelmbean.ModelMBeanInfoSupport may expose internal
 representation by storing an externally mutable object
In-Reply-To: <51E7BB6C.6030704@oracle.com>
References: <51A4AB45.4070100@oracle.com> <51A4DC90.7050809@oracle.com>
	<51E7BB6C.6030704@oracle.com>
Message-ID: <51E7DB79.2010504@oracle.com>

Hi Jaroslav,

Looks good overall.

Small nit:

You should remove the comment lines 322-327
in ModelMBeanInfoSupport.java since your changes make it obsolete.

Also the copyright year in ImmutableDataTest should be 2013 (not 2005).

No need for another round of review.

-- daniel

On 7/18/13 11:54 AM, Jaroslav Bachorik wrote:
> Hi,
>
> thanks for the comments.
>
> Here (http://cr.openjdk.java.net/~jbachorik/8002307/webrev.03/) is the
> updated webrev implementing suggestions from Daniel and Shanliang.
>
> -JB-
>
>


From jaroslav.bachorik at oracle.com  Mon Jul 22 04:55:42 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 22 Jul 2013 13:55:42 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
Message-ID: <51ED1DBE.3030304@oracle.com>

The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
seems to be failing intermittently.

The test checks the functionality of the
j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
capturing the current value of "getPeakThreadCount()", starting a
predefined number of user threads, stopping them, resetting the
stored peak value and making sure the new peak equals the number of
actually running threads.

The main problem is that it is not possible to prevent the JVM from
starting or stopping arbitrary system threads while the test executes.
This might lead to
small variations of the reported peak (a short-lived system thread is
started while the batch of the user threads is running) or the expected
number of running threads (again, a short-lived system thread is started
at the moment the test asks for the number of running threads).

The patch does not fix those shortcomings, as that is not really possible
given the nature of the JVM threading system. Rather, it tries to
relax the conditions while still maintaining the ability to detect
functional problems - e.g. a peak that decreases without being explicitly
reset, or a wrong number of threads being reported.

The webrev is at:
http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
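
One way to keep such a check meaningful despite stray JVM threads is to assert only bounds that extra threads cannot break. Below is a minimal sketch under that assumption, with hypothetical names; it is not the patch itself. It starts N workers and waits on a latch until they are all running. It then requires the peak to be at least N + 1 (the workers plus the calling thread), since additional system threads can only raise the peak. After the reset it asserts only a trivial lower bound.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class PeakLowerBoundSketch {
    // Starts `count` workers, samples the peak while all of them are alive,
    // then stops them, resets the peak and samples it again.
    // Returns {peakWithWorkers, peakAfterReset}.
    static int[] samplePeaks(int count) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        CountDownLatch started = new CountDownLatch(count);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] workers = new Thread[count];
        for (int i = 0; i < count; i++) {
            workers[i] = new Thread(() -> {
                started.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        started.await();                  // every worker is running now
        int peakWithWorkers = mbean.getPeakThreadCount();
        release.countDown();
        for (Thread t : workers) {
            t.join();
        }
        mbean.resetPeakThreadCount();     // peak := current live thread count
        int peakAfterReset = mbean.getPeakThreadCount();
        return new int[] { peakWithWorkers, peakAfterReset };
    }

    // Relaxed condition: only lower bounds that extra JVM threads cannot violate.
    static boolean boundsHold(int count) throws InterruptedException {
        int[] p = samplePeaks(count);
        return p[0] >= count + 1 && p[1] >= 1;
    }
}
```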

Thanks,

-JB-

From david.holmes at oracle.com  Tue Jul 23 01:19:39 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 23 Jul 2013 18:19:39 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51ED1DBE.3030304@oracle.com>
References: <51ED1DBE.3030304@oracle.com>
Message-ID: <51EE3C9B.3050604@oracle.com>

Hi Jaroslav,

On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
> seems to be failing intermittently.
>
> The test checks the functionality of the
> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
> capturing the current value of "getPeakThreadCount()", starting a
> predefined number of user threads, stopping them, resetting the
> stored peak value and making sure the new peak equals the number of
> actually running threads.
>
> The main problem is that it is not possible to prevent the JVM from
> starting or stopping arbitrary system threads while the test executes.
> This might lead to
> small variations of the reported peak (a short-lived system thread is
> started while the batch of the user threads is running) or the expected
> number of running threads (again, a short-lived system thread is started
> at the moment the test asks for the number of running threads).

Do you know what "system threads" these are? I would not expect VM 
internal threads to be counted in getPeakThreadCount(), but even if they 
are I can't think of any short-lived threads that get created other than 
the Signal handling thread.

> The patch does not fix those shortcomings, as that is not really possible
> given the nature of the JVM threading system. Rather, it tries to
> relax the conditions while still maintaining the ability to detect
> functional problems - e.g. a peak that decreases without being explicitly
> reset, or a wrong number of threads being reported.
>
> The webrev is at:
> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00

Seems reasonable.

David
-----

> Thanks,
>
> -JB-
>

From daniel.fuchs at oracle.com  Tue Jul 23 01:25:28 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Tue, 23 Jul 2013 10:25:28 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51ED1DBE.3030304@oracle.com>
References: <51ED1DBE.3030304@oracle.com>
Message-ID: <51EE3DF8.8060903@oracle.com>

Hi Jaroslav,

This looks like a tough problem as it is altogether possible that
some of the VM daemon threads will terminate during the duration
of the call - and if that's the case, the condition:
    new peak >= old peak + delta
might not even be true.
I am not a VM specialist so I don't know whether there can be
such daemon threads that will be arbitrarily started and stopped
by the VM - but if that happens I don't see how you could work around
it.

There seems to be something strange in the test though: line 209,
you catch InterruptedException just to call
Thread.currentThread().interrupt() and interrupt the thread again??
Did you mean maybe to call Thread.currentThread().interrupted() instead?
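
For reference, the two calls behave differently. Thread.interrupt() sets the interrupt status. The static Thread.interrupted() tests and clears the status of the current thread; calling it as Thread.currentThread().interrupted() compiles, but it is still the static method. Catching InterruptedException and then calling interrupt() is the usual way to hand the status back to the caller. The minimal sketch below, with hypothetical names, shows that pattern and how interrupted() clears the flag. Whether the test should preserve or clear the status at line 209 depends on what the surrounding loop expects.

```java
public class InterruptStatusSketch {
    // Sleeps, but if interrupted, restores the interrupt status so that
    // callers further up the stack can still observe it.
    static boolean sleepPreservingInterrupt(long millis) {
        try {
            Thread.sleep(millis);
            return false;
        } catch (InterruptedException e) {
            // Thread.sleep cleared the status when it threw; set it again.
            Thread.currentThread().interrupt();
            return true;
        }
    }
}
```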

There are other places in this test that seem to be prone to failures
too, for instance:

startThreads(...) {

   while(mbean.getThreadCount() < (current + count)) {
       ...
   }

}

If the VM can start and stop arbitrary threads then this condition
seems dubious. There's the same kind of logic in terminateThreads.
Not sure you can/should do anything about it though - it's
just to point out that these steps might need to be revisited
if the test still fails sporadically...

Also I'm not sure that using volatile for the 'live' array will
work - the array itself is volatile - but does it extend to its
elements?

It might be better to declare the live array as static final and
use a synchronization block on the array itself when accessing it:

private static final boolean[] live = new boolean[ALL_THREADS];
private static boolean isAlive(int i) {
     synchronized(live) { return live[i]; }
}

...

      synchronized(live) {
           live[i] = false;
      }

...

      while (isAlive(id)) {
            ...
      }

...
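
On the volatile question: declaring the array field volatile only makes reads and writes of the array reference volatile, not of its elements. Besides the synchronized approach, java.util.concurrent.atomic.AtomicIntegerArray provides volatile semantics per element. Below is a minimal sketch with hypothetical names, using 1/0 in place of true/false:

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

public class LiveFlagsSketch {
    // Each element is read and written with volatile semantics.
    private final AtomicIntegerArray live;

    LiveFlagsSketch(int threads) {
        live = new AtomicIntegerArray(threads);
        for (int i = 0; i < threads; i++) {
            live.set(i, 1); // 1 = keep running
        }
    }

    boolean isAlive(int i) {
        return live.get(i) == 1;
    }

    void stop(int i) {
        live.set(i, 0); // 0 = worker i should leave its loop
    }
}
```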

best regards,

-- daniel

On 7/22/13 1:55 PM, Jaroslav Bachorik wrote:
> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
> seems to be failing intermittently.
>
> The test checks the functionality of the
> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
> capturing the current value of "getPeakThreadCount()", starting a
> predefined number of user threads, stopping them, resetting the
> stored peak value and making sure the new peak equals the number of
> actually running threads.
>
> The main problem is that it is not possible to prevent the JVM from
> starting or stopping arbitrary system threads while the test executes.
> This might lead to
> small variations of the reported peak (a short-lived system thread is
> started while the batch of the user threads is running) or the expected
> number of running threads (again, a short-lived system thread is started
> at the moment the test asks for the number of running threads).
>
> The patch does not fix those shortcomings as it is not really possible
> to do given the nature of the JVM threading system. It rather tries to
> relax the conditions while still maintaining the ability to detect
> functional problems - eg. decreasing peak without explicitly resetting
> it and reporting false number of threads.
>
> The webrev is at:
> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>
> Thanks,
>
> -JB-
>
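A minimal sketch of the kind of relaxed check described above, assuming only the standard java.lang.management API; the class and method names (RelaxedPeakCheck, sample, plausible) are hypothetical and this is not the code in the webrev. Instead of asserting exact equality, it asserts inequalities that should still hold if unrelated JVM threads start or stop while it runs:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;
import java.util.concurrent.CountDownLatch;

public class RelaxedPeakCheck {
    /**
     * Starts n parked worker threads, samples the MXBean while all of them
     * are alive, then releases and joins them and resets the peak.
     * Returns {liveWhileRunning, peakWhileRunning, peakAfterReset}.
     */
    static int[] sample(int n) {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        CountDownLatch started = new CountDownLatch(n);
        CountDownLatch release = new CountDownLatch(1);
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                started.countDown();
                try {
                    release.await();            // park until sampling is done
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        try {
            started.await();                    // all n workers are alive now
            int live = mbean.getThreadCount();
            int peak = mbean.getPeakThreadCount();
            release.countDown();
            for (Thread t : workers) {
                t.join();
            }
            mbean.resetPeakThreadCount();       // peak := current live count
            return new int[] {live, peak, mbean.getPeakThreadCount()};
        } catch (InterruptedException e) {
            release.countDown();
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sampling", e);
        }
    }

    /**
     * Inequalities only: the caller plus n workers were alive together, and
     * after a reset the peak must at least count the calling thread.
     */
    static boolean plausible(int[] s, int n) {
        return s[0] >= n + 1 && s[1] >= n + 1 && s[2] >= 1;
    }

    public static void main(String[] args) {
        int n = 5;
        int[] s = sample(n);
        System.out.println(plausible(s, n) ? "ok" : "suspicious: " + Arrays.toString(s));
    }
}
```

The trade-off is the one described in the message: exact equality catches more bugs but flakes whenever the JVM starts a thread of its own, while inequalities survive that noise but can no longer detect a small off-by-one in the accounting.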


From jaroslav.bachorik at oracle.com  Tue Jul 23 01:29:22 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 23 Jul 2013 10:29:22 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE3C9B.3050604@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
Message-ID: <51EE3EE2.1000202@oracle.com>

On 07/23/2013 10:19 AM, David Holmes wrote:
> Hi Jaroslav,
> 
> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>> seems to be failing intermittently.
>>
>> The test checks the functionality of the
>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>> capturing the current value of "getPeakThreadCount()", starting a
>> predefined number of user threads, stopping them, resetting the
>> stored peak value and making sure the new peak equals the number of
>> actually running threads.
>>
>> The main problem is that it is not possible to prevent the JVM from
>> starting/stopping arbitrary system threads while the test executes. This
>> might lead to small variations in the reported peak (a short-lived system
>> thread is started while the batch of user threads is running) or in the
>> expected number of running threads (again, a short-lived system thread is
>> started at the moment the test asks for the number of running threads).
> 
> Do you know what "system threads" these are? I would not expect VM
> internal threads to be counted in getPeakThreadCount(), but even if they
> are I can't think of any short-lived threads that get created other than
> the Signal handling thread.

Unfortunately I don't. Capturing the thread dump at the moment of
discovering the discrepancy seems to be too late. I tried monitoring
the JVM under test with external tools, but that just adds more
entropy to the result.

I am completely relying on the JVM native thread accounting to be
correct and accurate - that it reports the thread count peak based on
the real data.

-JB-

> 
>> The patch does not fix those shortcomings, as that is not really possible
>> given the nature of the JVM threading system. Rather, it tries to
>> relax the conditions while still maintaining the ability to detect
>> functional problems, e.g. a peak that decreases without being explicitly
>> reset, or a false number of threads being reported.
>>
>> The webrev is at:
>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
> 
> Seems reasonable.
> 
> David
> -----
> 
>> Thanks,
>>
>> -JB-
>>


From david.holmes at oracle.com  Tue Jul 23 02:15:07 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 23 Jul 2013 19:15:07 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE3DF8.8060903@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3DF8.8060903@oracle.com>
Message-ID: <51EE499B.9060600@oracle.com>

On 23/07/2013 6:25 PM, Daniel Fuchs wrote:
> Hi Jaroslav,
>
> This looks like a tough problem as it is altogether possible that
> some of the VM daemon threads will terminate during the duration
> of the call - and if that's the case, the condition:
>     new peak >= old peak + delta
> might not even be true.
> I am not a VM specialist so I don't know whether there can be
> such daemon threads that will be arbitrarily started and stopped
> by the VM - but if that happens I don't see how you could work around
> it.
>
> There seems to be something strange in the test though: line 209,
> you catch InterruptedException just to call
> Thread.currentThread().interrupt() and interrupt the thread again??
> Did you maybe mean to call Thread.currentThread().interrupted() instead?

No, but good catch, as the way this is done is not quite right. The
re-posting of the interrupt() needs to happen outside the loop; otherwise
the next sleep() will simply throw InterruptedException again. The normal
pattern is:

boolean interrupted = false;
while (...) {
    try {
        Thread.sleep(5);
        ...
    } catch (InterruptedException ie) {
        interrupted = true;
    }
}
if (interrupted)
    Thread.currentThread().interrupt(); // re-assert interrupt state


Of course it is debatable whether there is any point continuing the loop 
if you do get interrupted (which should never happen anyway).
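To make the difference concrete, here is a small runnable sketch (class and method names are hypothetical, not from the test) contrasting re-asserting the interrupt inside the loop with the deferred pattern above. With a stray interrupt pending, the first variant never completes a sleep, because every Thread.sleep() sees the re-set flag and throws immediately:

```java
public class DeferredInterrupt {
    /** Re-asserts the interrupt inside the loop: every later sleep() throws at once. */
    static int pollReassertingInLoop(int rounds, long sleepMillis) {
        int completed = 0;
        for (int i = 0; i < rounds; i++) {
            try {
                Thread.sleep(sleepMillis);
                completed++;
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt(); // flag set again before the next sleep()
            }
        }
        return completed;
    }

    /** Remembers the interrupt and re-asserts it once, after the loop. */
    static int pollWithDeferredInterrupt(int rounds, long sleepMillis) {
        boolean interrupted = false;
        int completed = 0;
        for (int i = 0; i < rounds; i++) {
            try {
                Thread.sleep(sleepMillis);
                completed++;
            } catch (InterruptedException ie) {
                interrupted = true;                 // sleep() already cleared the flag
            }
        }
        if (interrupted) {
            Thread.currentThread().interrupt();     // re-assert interrupt state
        }
        return completed;
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();         // simulate a stray interrupt
        int inLoop = pollReassertingInLoop(3, 1);
        Thread.interrupted();                       // clear the flag between runs
        Thread.currentThread().interrupt();
        int deferred = pollWithDeferredInterrupt(3, 1);
        boolean stillFlagged = Thread.interrupted();
        System.out.println(inLoop + " " + deferred + " " + stillFlagged); // prints "0 2 true"
    }
}
```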

> There are other places in this test that seem prone to failure
> too, for instance:
>
> startThreads(...) {
>
>    while(mbean.getThreadCount() < (current + count)) {
>        ...
>    }
>
> }
>
> If the VM can start and stop arbitrary threads then this condition
> seems dubious. There's the same kind of logic in terminateThreads.
> Not sure you can/should do anything about it though - it's
> just to point out that these steps might need to be revisited
> if the test still fails sporadically...
>
> Also I'm not sure that using volatile for the 'live' array will
> work - the array reference itself is volatile - but does that extend
> to its elements?

No it doesn't.
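A runnable version of Daniel's suggestion, for illustration only: ALL_THREADS and the busy-wait worker body are assumptions standing in for the real test. Every read and write of an element goes through the array's monitor, which provides the happens-before edge that a volatile array reference alone does not give to element writes:

```java
public class LiveFlags {
    // Hypothetical size; the real test defines its own ALL_THREADS.
    private static final int ALL_THREADS = 4;
    // Guarded by synchronized (live). Marking the array reference volatile
    // would not make writes to individual elements visible to other threads.
    private static final boolean[] live = new boolean[ALL_THREADS];

    static boolean isAlive(int i) {
        synchronized (live) {
            return live[i];
        }
    }

    static void setAlive(int i, boolean value) {
        synchronized (live) {
            live[i] = value;
        }
    }

    /** Starts one worker per slot, clears every flag, and joins the workers. */
    static int startAndStopAll() {
        Thread[] workers = new Thread[ALL_THREADS];
        for (int i = 0; i < ALL_THREADS; i++) {
            final int id = i;
            setAlive(id, true);
            workers[i] = new Thread(() -> {
                while (isAlive(id)) {
                    Thread.yield();             // stands in for the worker's real job
                }
            });
            workers[i].start();
        }
        for (int i = 0; i < ALL_THREADS; i++) {
            setAlive(i, false);                 // published under the same lock
        }
        int joined = 0;
        for (Thread t : workers) {
            try {
                t.join();
                joined++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        System.out.println(startAndStopAll() + " workers stopped");
    }
}
```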

David
-----

> It might be better to declare the live array as static final and
> use a synchronization block on the array itself when accessing it:
>
> private static final boolean[] live = new boolean[ALL_THREADS];
> private static boolean isAlive(int i) {
>      synchronized(live) { return live[i]; }
> }
>
> ...
>
>       synchronized(live) {
>            live[i] = false;
>       }
>
> ...
>
>       while (isAlive(id)) {
>             ...
>       }
>
> ...
>
> best regards,
>
> -- daniel
>
> On 7/22/13 1:55 PM, Jaroslav Bachorik wrote:
>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>> seems to be failing intermittently.
>>
>> The test checks the functionality of the
>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>> capturing the current value of "getPeakThreadCount()", starting a
>> predefined number of user threads, stopping them, resetting the
>> stored peak value and making sure the new peak equals the number of
>> actually running threads.
>>
>> The main problem is that it is not possible to prevent the JVM from
>> starting/stopping arbitrary system threads while the test executes. This
>> might lead to small variations in the reported peak (a short-lived system
>> thread is started while the batch of user threads is running) or in the
>> expected number of running threads (again, a short-lived system thread is
>> started at the moment the test asks for the number of running threads).
>>
>> The patch does not fix those shortcomings, as that is not really possible
>> given the nature of the JVM threading system. Rather, it tries to
>> relax the conditions while still maintaining the ability to detect
>> functional problems, e.g. a peak that decreases without being explicitly
>> reset, or a false number of threads being reported.
>>
>> The webrev is at:
>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>
>> Thanks,
>>
>> -JB-
>>
>

From david.holmes at oracle.com  Tue Jul 23 02:19:13 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 23 Jul 2013 19:19:13 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE3EE2.1000202@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com>
Message-ID: <51EE4A91.3000305@oracle.com>

On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
> On 07/23/2013 10:19 AM, David Holmes wrote:
>> Hi Jaroslav,
>>
>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>>> seems to be failing intermittently.
>>>
>>> The test checks the functionality of the
>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>> capturing the current value of "getPeakThreadCount()", starting a
>>> predefined number of user threads, stopping them, resetting the
>>> stored peak value and making sure the new peak equals the number of
>>> actually running threads.
>>>
>>> The main problem is that it is not possible to prevent the JVM from
>>> starting/stopping arbitrary system threads while the test executes. This
>>> might lead to small variations in the reported peak (a short-lived system
>>> thread is started while the batch of user threads is running) or in the
>>> expected number of running threads (again, a short-lived system thread is
>>> started at the moment the test asks for the number of running threads).
>>
>> Do you know what "system threads" these are? I would not expect VM
>> internal threads to be counted in getPeakThreadCount(), but even if they
>> are I can't think of any short-lived threads that get created other than
>> the Signal handling thread.
>
> Unfortunately I don't. Capturing the thread dump at the moment of
> discovering the discrepancy seems to be too late. I tried monitoring
> the JVM under test with external tools, but that just adds more
> entropy to the result.

We'd need to instrument the thread creation logic to keep a separate
record. DTrace probes could probably do it - but the problem is getting
the test to fail.

> I am completely relying on the JVM native thread accounting to be
> correct and accurate - that it reports the thread count peak based on
> the real data.

The spec isn't clear but I would only expect these counters to apply to 
Java threads not VM internal threads (compiler, gc etc). So I'd really 
like to know what thread is messing up this count.

David

> -JB-
>
>>
>>> The patch does not fix those shortcomings, as that is not really possible
>>> given the nature of the JVM threading system. Rather, it tries to
>>> relax the conditions while still maintaining the ability to detect
>>> functional problems, e.g. a peak that decreases without being explicitly
>>> reset, or a false number of threads being reported.
>>>
>>> The webrev is at:
>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>
>> Seems reasonable.
>>
>> David
>> -----
>>
>>> Thanks,
>>>
>>> -JB-
>>>
>

From jaroslav.bachorik at oracle.com  Tue Jul 23 02:25:59 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 23 Jul 2013 11:25:59 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE4A91.3000305@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
Message-ID: <51EE4C27.206@oracle.com>

On Tue 23 Jul 2013 11:19:13 AM CEST, David Holmes wrote:
> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>> Hi Jaroslav,
>>> 
>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java
>>>> test seems to be failing intermittently.
>>>>
>>>> The test checks the functionality of the
>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>>> capturing the current value of "getPeakThreadCount()", starting a
>>>> predefined number of user threads, stopping them, resetting the
>>>> stored peak value and making sure the new peak equals the number of
>>>> actually running threads.
>>>>
>>>> The main problem is that it is not possible to prevent the JVM from
>>>> starting/stopping arbitrary system threads while the test executes. This
>>>> might lead to small variations in the reported peak (a short-lived system
>>>> thread is started while the batch of user threads is running) or in the
>>>> expected number of running threads (again, a short-lived system thread is
>>>> started at the moment the test asks for the number of running threads).
>>> 
>>> Do you know what "system threads" these are? I would not expect
>>> VM internal threads to be counted in getPeakThreadCount(), but
>>> even if they are I can't think of any short-lived threads that
>>> get created other than the Signal handling thread.
>> 
>> Unfortunately I don't. Capturing the thread dump at the moment of
>> discovering the discrepancy seems to be too late. I tried monitoring
>> the JVM under test with external tools, but that just adds more
>> entropy to the result.
> 
> We'd need to instrument the thread creation logic to keep a
> separate record. DTrace probes could probably do it - but the
> problem is getting the test to fail.

Well, while responding to the previous email I thought of yet
another way to try to pinpoint the mysterious thread - I tried the NB
(NetBeans) profiler. It filters out its own threads and can monitor
threads while tracking the call tree at the same time.

The result is that the offender is the j.u.l.LogManager$Cleaner thread.

> 
>> I am completely relying on the JVM native thread accounting to
>> be correct and accurate - that it reports the thread count peak
>> based on the real data.
> 
> The spec isn't clear but I would only expect these counters to
> apply to Java threads not VM internal threads (compiler, gc etc).
> So I'd really like to know what thread is messing up this count.

I hope my previous finding makes this clearer.

-JB-

> 
> David
> 
>> -JB-
>> 
>>> 
>>>> The patch does not fix those shortcomings, as that is not really possible
>>>> given the nature of the JVM threading system. Rather, it tries to
>>>> relax the conditions while still maintaining the ability to detect
>>>> functional problems, e.g. a peak that decreases without being explicitly
>>>> reset, or a false number of threads being reported.
>>>> 
>>>> The webrev is at: 
>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>> 
>>> Seems reasonable.
>>> 
>>> David
>>> -----
>>> 
>>>> Thanks,
>>>> 
>>>> -JB-
>>>> 
>> 



From jaroslav.bachorik at oracle.com  Tue Jul 23 02:35:16 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 23 Jul 2013 11:35:16 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE3DF8.8060903@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3DF8.8060903@oracle.com>
Message-ID: <51EE4E54.5040005@oracle.com>

On 07/23/2013 10:25 AM, Daniel Fuchs wrote:
> Hi Jaroslav,
> 
> This looks like a tough problem as it is altogether possible that
> some of the VM daemon threads will terminate during the duration of
> the call - and if that's the case, the condition:
>     new peak >= old peak + delta
> might not even be true. I am not a VM specialist so I don't know
> whether there can be such daemon threads that will be arbitrarily
> started and stopped by the VM - but if that happens I don't see how
> you could work around it.

As I wrote in my reply to David the offending thread is
j.u.l.LogManager$Cleaner which kicks in randomly.

This would confirm my observations that the discrepancy is always at
most one thread more than expected.

> 
> There seems to be something strange in the test though: line 209, 
> you catch InterruptedException just to call 
> Thread.currentThread().interrupt() and interrupt the thread
> again?? Did you maybe mean to call
> Thread.currentThread().interrupted() instead?

No, it checks whether the thread has been interrupted and cleans the
interrupted flag.

> 
> There are other places in this test that seem prone to failure
> too, for instance:
> 
> startThreads(...) {
> 
> while(mbean.getThreadCount() < (current + count)) { ... }
> 
> }
> 
> If the VM can start and stop arbitrary threads then this condition 
> seems dubious. There's the same kind of logic in terminateThreads. 
> Not sure you can/should do anything about it though - it's just to
> point out that these steps might need to be revisited if the test
> still fails sporadically...
> 
> Also I'm not sure that using volatile for the 'live' array will
> work - the array reference itself is volatile - but does that extend
> to its elements?

No, it does not. But this code has been sitting there for some time.

-JB-

> 
> It might be better to declare the live array as static final and 
> use a synchronization block on the array itself when accessing it:
> 
> private static final boolean[] live = new boolean[ALL_THREADS];
> private static boolean isAlive(int i) {
>     synchronized(live) { return live[i]; }
> }
> 
> ...
> 
> synchronized(live) { live[i] = false; }
> 
> ...
> 
> while (isAlive(id)) { ... }
> 
> ...
> 
> best regards,
> 
> -- daniel
> 
> On 7/22/13 1:55 PM, Jaroslav Bachorik wrote:
>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java
>> test seems to be failing intermittently.
>> 
>> The test checks the functionality of the
>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>> capturing the current value of "getPeakThreadCount()", starting a
>> predefined number of user threads, stopping them, resetting the
>> stored peak value and making sure the new peak equals the number of
>> actually running threads.
>>
>> The main problem is that it is not possible to prevent the JVM from
>> starting/stopping arbitrary system threads while the test executes. This
>> might lead to small variations in the reported peak (a short-lived system
>> thread is started while the batch of user threads is running) or in the
>> expected number of running threads (again, a short-lived system thread is
>> started at the moment the test asks for the number of running threads).
>>
>> The patch does not fix those shortcomings, as that is not really possible
>> given the nature of the JVM threading system. Rather, it tries to
>> relax the conditions while still maintaining the ability to detect
>> functional problems, e.g. a peak that decreases without being explicitly
>> reset, or a false number of threads being reported.
>> 
>> The webrev is at: 
>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>> 
>> Thanks,
>> 
>> -JB-
>> 
> 


From david.holmes at oracle.com  Tue Jul 23 02:39:44 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 23 Jul 2013 19:39:44 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE4A91.3000305@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
Message-ID: <51EE4F60.4000506@oracle.com>

Sorry - I took a closer look at the full test rather than just that 
patch. We already have this code to try and help expose these 
intermittent failures:

    // Nightly testing showed some intermittent failure.
    // Check here to get diagnostic information if some strange
    // behavior occurs.
    checkThreadCount(expectedCount, current, 0);

but the sleep loop you added means this check will rarely fail so we 
won't get to see this unexpected behaviour happening. So this block of 
code could be deleted in my view. Though it is preferable to determine 
exactly why we fail!
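If the diagnostic check is kept, one option is to have it report which threads the MXBean itself sees at the moment of the mismatch, rather than just two numbers, which might help identify the extra thread. A sketch using only ThreadMXBean.getAllThreadIds() and getThreadInfo(long[]); the helper names are hypothetical and this is not the test's actual checkThreadCount(). It has the same limitation mentioned earlier in the thread: a short-lived thread may already be gone by the time the names are collected.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

public class ThreadCountDiagnostics {
    /** Names of the live threads, as reported by the MXBean itself. */
    static List<String> liveThreadNames() {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        for (ThreadInfo info : mbean.getThreadInfo(mbean.getAllThreadIds())) {
            if (info != null) {                 // null if the thread died meanwhile
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    /**
     * Returns null if the live count matches, otherwise a message that
     * includes the thread names seen right after the mismatch.
     */
    static String describeMismatch(int expected) {
        int actual = ManagementFactory.getThreadMXBean().getThreadCount();
        if (actual == expected) {
            return null;
        }
        return "expected " + expected + " live threads, found " + actual
                + ": " + liveThreadNames();
    }

    public static void main(String[] args) {
        System.out.println(liveThreadNames());
    }
}
```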

Also, looking at the sleep() used elsewhere, you may as well follow the
same pattern and abort on interrupt, as an interrupt isn't expected.
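A minimal sketch of an abort-on-interrupt helper, assuming the intent is to fail the test loudly when an unexpected interrupt arrives; the name sleepOrFail is hypothetical, and the exact pattern used elsewhere in the test may differ:

```java
public class AbortOnInterrupt {
    /** Sleeps; an interrupt is unexpected here, so it aborts the test. */
    static void sleepOrFail(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();     // keep the status for callers
            throw new RuntimeException("Unexpected interrupt", e);
        }
    }

    public static void main(String[] args) {
        sleepOrFail(5);
        System.out.println("slept without interruption");
    }
}
```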

Finally with regard to Daniel's comment about the live array he is right 
that the volatile on the array is not sufficient in theory - a thread 
need never see the value of live[i] become false. There are a number of 
reasons why we are unlikely to see that in practice on hotspot. Using 
synchronized will fix that; or an alternative cancellation mechanism 
could be used.
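One possible alternative, sketched here with hypothetical names, is to let each worker's own interrupt status act as its "live" flag, so no shared array is needed; the Java memory model already guarantees that an interrupt is visible to the interrupted thread once it checks its status:

```java
public class InterruptCancellation {
    /** Starts n workers that run until interrupted, then cancels and joins them. */
    static int startAndCancel(int n) {
        Thread[] workers = new Thread[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Thread(() -> {
                // The interrupt status serves as the "live" flag.
                while (!Thread.currentThread().isInterrupted()) {
                    try {
                        Thread.sleep(5);            // stands in for real work
                    } catch (InterruptedException e) {
                        return;                     // cancellation requested
                    }
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.interrupt();
        }
        int joined = 0;
        for (Thread t : workers) {
            try {
                t.join();
                joined++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        System.out.println(startAndCancel(3) + " workers cancelled");
    }
}
```

The catch is that any code in the worker that swallows an interrupt would also swallow the cancellation request, which is another reason to get the sleep()/interrupt handling right.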

Cheers,
David

On 23/07/2013 7:19 PM, David Holmes wrote:
> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>> Hi Jaroslav,
>>>
>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>>>> seems to be failing intermittently.
>>>>
>>>> The test checks the functionality of the
>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>>> capturing the current value of "getPeakThreadCount()", starting a
>>>> predefined number of user threads, stopping them, resetting the
>>>> stored peak value and making sure the new peak equals the number of
>>>> actually running threads.
>>>>
>>>> The main problem is that it is not possible to prevent the JVM from
>>>> starting/stopping arbitrary system threads while the test executes. This
>>>> might lead to small variations in the reported peak (a short-lived system
>>>> thread is started while the batch of user threads is running) or in the
>>>> expected number of running threads (again, a short-lived system thread is
>>>> started at the moment the test asks for the number of running threads).
>>>
>>> Do you know what "system threads" these are? I would not expect VM
>>> internal threads to be counted in getPeakThreadCount(), but even if they
>>> are I can't think of any short-lived threads that get created other than
>>> the Signal handling thread.
>>
>> Unfortunately I don't. Capturing the thread dump at the moment of
>> discovering the discrepancy seems to be too late. I tried monitoring
>> the JVM under test with external tools, but that just adds more
>> entropy to the result.
>
> We'd need to instrument the thread creation logic to keep a separate
> record. DTrace probes could probably do it - but the problem is getting
> the test to fail.
>
>> I am completely relying on the JVM native thread accounting to be
>> correct and accurate - that it reports the thread count peak based on
>> the real data.
>
> The spec isn't clear but I would only expect these counters to apply to
> Java threads not VM internal threads (compiler, gc etc). So I'd really
> like to know what thread is messing up this count.
>
> David
>
>> -JB-
>>
>>>
>>>> The patch does not fix those shortcomings, as that is not really possible
>>>> given the nature of the JVM threading system. Rather, it tries to
>>>> relax the conditions while still maintaining the ability to detect
>>>> functional problems, e.g. a peak that decreases without being explicitly
>>>> reset, or a false number of threads being reported.
>>>>
>>>> The webrev is at:
>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>
>>> Seems reasonable.
>>>
>>> David
>>> -----
>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>>>
>>

From daniel.fuchs at oracle.com  Tue Jul 23 02:44:50 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Tue, 23 Jul 2013 11:44:50 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE4E54.5040005@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3DF8.8060903@oracle.com>
	<51EE4E54.5040005@oracle.com>
Message-ID: <51EE5092.4050100@oracle.com>

On 7/23/13 11:35 AM, Jaroslav Bachorik wrote:
> As I wrote in my reply to David the offending thread is
> j.u.l.LogManager$Cleaner which kicks in randomly.

Argh... Logging again :-)

> This would confirm my observations that the discrepancy is always at
> most one thread more than expected.

What you could do then is call:

   Logger.getLogger("foo").info("Logging initialized");

first thing in the main(). This way the Cleaner thread will
already be there and won't perturb the test.
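A minimal sketch of that suggestion, assuming the test's baseline is taken right after main() starts; the helper name is hypothetical. Whether it actually removes the discrepancy depends on which thread really is the extra one:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.logging.Logger;

public class LoggingWarmUp {
    /**
     * Initializes java.util.logging first, then resets the peak and returns
     * {liveCount, peakCount} as the baseline for the rest of the test.
     */
    static int[] baselineAfterLoggingInit() {
        // Forces LogManager initialization before any thread counts are taken.
        Logger.getLogger("foo").info("Logging initialized");
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        mbean.resetPeakThreadCount();
        int live = mbean.getThreadCount();
        int peak = mbean.getPeakThreadCount();
        return new int[] {live, peak};
    }

    public static void main(String[] args) {
        int[] base = baselineAfterLoggingInit();
        System.out.println("live=" + base[0] + " peak=" + base[1]);
    }
}
```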

>> There seems to be something strange in the test though: line 209,
>> you catch InterruptedException just to call
>> Thread.currentThread().interrupt() and interrupt the thread
>> again?? Did you maybe mean to call
>> Thread.currentThread().interrupted() instead?
>
> No, it checks whether the thread has been interrupted and cleans the
> interrupted flag.

That's what interrupted() will do. But interrupt() will cause the next
call to Thread.sleep() to throw InterruptedException - hence
my question.
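The distinction can be checked in isolation with a small sketch (hypothetical class name): after interrupt() the next sleep() throws straight away and clears the flag, while the static Thread.interrupted() both reports and clears it:

```java
public class InterruptSemantics {
    /** True if sleep() throws right after the thread interrupts itself. */
    static boolean sleepThrowsAfterSelfInterrupt() {
        Thread.currentThread().interrupt();         // sets the interrupt flag
        try {
            Thread.sleep(1000);
            return false;
        } catch (InterruptedException e) {
            return true;                            // thrown at once; flag is now cleared
        }
    }

    /** Results of two consecutive Thread.interrupted() calls after interrupt(). */
    static boolean[] interruptedClearsFlag() {
        Thread.currentThread().interrupt();
        boolean first = Thread.interrupted();       // true, and clears the flag
        boolean second = Thread.interrupted();      // false: already cleared
        return new boolean[] {first, second};
    }

    public static void main(String[] args) {
        boolean[] r = interruptedClearsFlag();
        System.out.println(sleepThrowsAfterSelfInterrupt() + " " + r[0] + " " + r[1]); // prints "true true false"
    }
}
```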

>> Also I'm not sure that using volatile for the 'live' array will
>> work - the array reference itself is volatile - but does that extend
>> to its elements?
>
> No, it does not. But this code has been sitting there for some time.

Well - I'll leave it to you - but personally I would fix it along the
way, just to make sure the test doesn't fail because of it.

cheers,

-- daniel

>
> -JB-
>
>>
>> It might be better to declare the live array as static final and
>> use a synchronization block on the array itself when accessing it:
>>
>> private static final boolean[] live = new boolean[ALL_THREADS];
>> private static boolean isAlive(int i) {
>>     synchronized(live) { return live[i]; }
>> }
>>
>> ...
>>
>> synchronized(live) { live[i] = false; }
>>
>> ...
>>
>> while (isAlive(id)) { ... }
>>
>> ...
>>
>> best regards,
>>
>> -- daniel
>>
>> On 7/22/13 1:55 PM, Jaroslav Bachorik wrote:
>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java
>>> test seems to be failing intermittently.
>>>
>>> The test checks the functionality of the
>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>> capturing the current value of "getPeakThreadCount()", starting a
>>> predefined number of user threads, stopping them, resetting the
>>> stored peak value and making sure the new peak equals the number of
>>> actually running threads.
>>>
>>> The main problem is that it is not possible to prevent the JVM from
>>> starting/stopping arbitrary system threads while the test executes. This
>>> might lead to small variations in the reported peak (a short-lived system
>>> thread is started while the batch of user threads is running) or in the
>>> expected number of running threads (again, a short-lived system thread is
>>> started at the moment the test asks for the number of running threads).
>>>
>>> The patch does not fix those shortcomings, as that is not really possible
>>> given the nature of the JVM threading system. Rather, it tries to
>>> relax the conditions while still maintaining the ability to detect
>>> functional problems, e.g. a peak that decreases without being explicitly
>>> reset, or a false number of threads being reported.
>>>
>>> The webrev is at:
>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>
>>> Thanks,
>>>
>>> -JB-
>>>
>>
>


From david.holmes at oracle.com  Tue Jul 23 02:45:24 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 23 Jul 2013 19:45:24 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE4BD6.7040707@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com>
Message-ID: <51EE50B4.8040000@oracle.com>

On 23/07/2013 7:24 PM, Jaroslav Bachorik wrote:
> The result is that the offender is the j.u.l.LogManager$Cleaner thread. I
> am attaching the profiler snapshot (it can be opened in e.g. jvisualvm)

That doesn't quite make sense. The Cleaner thread is a shutdown hook; it
should not be started unless the VM is shutting down!
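That matches how shutdown hooks behave in general: a hook is an unstarted Thread handed to Runtime.addShutdownHook(), and it stays in the NEW state (not live, so not counted by getThreadCount()) until shutdown begins. A quick sketch with a hypothetical hook, not the actual LogManager$Cleaner code:

```java
public class ShutdownHookState {
    /** Registers a throwaway hook and reports its state before shutdown. */
    static Thread.State registerAndInspect() {
        Thread hook = new Thread(() -> System.out.println("shutting down"));
        Runtime.getRuntime().addShutdownHook(hook);
        Thread.State state = hook.getState();            // NEW: registered but not started
        Runtime.getRuntime().removeShutdownHook(hook);    // tidy up for this sketch
        return state;
    }

    public static void main(String[] args) {
        System.out.println(registerAndInspect());         // prints "NEW"
    }
}
```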

David
-----

> On Tue 23 Jul 2013 11:19:13 AM CEST, David Holmes wrote:
>> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>>> Hi Jaroslav,
>>>>
>>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>>>>> seems to be failing intermittently.
>>>>>
>>>>> The test checks the functionality of the
>>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>>>> capturing the current value of "getPeakThreadCount()", starting a
>>>>> predefined number of user threads, stopping them, resetting the
>>>>> stored peak value and making sure the new peak equals the number of
>>>>> actually running threads.
>>>>>
>>>>> The main problem is that it is not possible to prevent the JVM from
>>>>> starting/stopping arbitrary system threads while the test executes. This
>>>>> might lead to small variations in the reported peak (a short-lived system
>>>>> thread is started while the batch of user threads is running) or in the
>>>>> expected number of running threads (again, a short-lived system thread is
>>>>> started at the moment the test asks for the number of running threads).
>>>>
>>>> Do you know what "system threads" these are? I would not expect VM
>>>> internal threads to be counted in getPeakThreadCount(), but even if
>>>> they
>>>> are I can't think of any short-lived threads that get created other
>>>> than
>>>> the Signal handling thread.
>>>
>>> Unfortunately I don't. Capturing the thread dump at the moment of
>>> discovering the discrepancy seems to be too late. I tried monitoring
>>> the JVM under test with external tools, but that just adds more
>>> entropy to the result.
>>
>> We'd need to instrument the thread creation logic to keep a separate
>> record. DTrace probes could probably do it - but the problem is
>> getting the test to fail.
>
> Well, while responding to the previous email I thought of yet
> another way to try to pinpoint the mysterious thread - I tried the NB
> (NetBeans) profiler. It filters out its own threads and can monitor
> threads while tracking the call tree at the same time.
>
> The result is that the offender is the j.u.l.LogManager$Cleaner thread. I
> am attaching the profiler snapshot (it can be opened in e.g. jvisualvm)
>
>>
>>> I am completely relying on the JVM native thread accounting to be
>>> correct and accurate - that it reports the thread count peak based on
>>> the real data.
>>
>> The spec isn't clear but I would only expect these counters to apply
>> to Java threads not VM internal threads (compiler, gc etc). So I'd
>> really like to know what thread is messing up this count.
>
> I hope my previous finding makes this clearer.
>
> -JB-
>
>>
>> David
>>
>>> -JB-
>>>
>>>>
>>>>> The patch does not fix those shortcomings as it is not really possible
>>>>> to do given the nature of the JVM threading system. It rather tries to
>>>>> relax the conditions while still maintaining the ability to detect
>>>>> functional problems - eg. decreasing peak without explicitly resetting
>>>>> it and reporting false number of threads.
>>>>>
>>>>> The webrev is at:
>>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>>
>>>> Seems reasonable.
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Thanks,
>>>>>
>>>>> -JB-
>>>>>
>>>
>

From daniel.fuchs at oracle.com  Tue Jul 23 02:53:19 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Tue, 23 Jul 2013 11:53:19 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE50B4.8040000@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com> <51EE50B4.8040000@oracle.com>
Message-ID: <51EE528F.2050302@oracle.com>

On 7/23/13 11:45 AM, David Holmes wrote:
> On 23/07/2013 7:24 PM, Jaroslav Bachorik wrote:
>  > The result is that the offender is j.u.l.LogManager$Cleaner thread. I
>  > am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>
> That doesn't quite make sense. The Cleaner thread is a shutdownhook, it
> should not be starting unless the VM is shutting down!

Hummm... Right: the javadoc says "Returns the peak live thread count
since the Java virtual machine started or peak was reset.", so the
Cleaner thread should not be counted.

If it is actually counted it might indicate a real problem in the
implementation of the ThreadMXBean.

-- daniel.


>
> David
> -----
>
>> On Tue 23 Jul 2013 11:19:13 AM CEST, David Holmes wrote:
>>> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>>>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>>>> Hi Jaroslav,
>>>>>
>>>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>>>>>> seems to be failing intermittently.
>>>>>>
>>>>>> The test checks the functionality of the
>>>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>>>>> capturing the current value of "getPeakThreadCount()", starting a
>>>>>> predefined number of the user threads, stopping them and resetting
>>>>>> the
>>>>>> stored peak value and making sure the new peak equals to the
>>>>>> number of
>>>>>> the actually running threads.
>>>>>>
>>>>>> The main problem is that it is not possible to prevent JVM to
>>>>>> start/stop
>>>>>> arbitrary system threads while executing the test. This might lead to
>>>>>> small variations of the reported peak (a short-lived system thread is
>>>>>> started while the batch of the user threads is running) or the
>>>>>> expected
>>>>>> number of running threads (again, a short-lived system thread is
>>>>>> started
>>>>>> at the moment the test asks for the number of running threads).
>>>>>
>>>>> Do you know what "system threads" these are? I would not expect VM
>>>>> internal threads to be counted in getPeakThreadCount(), but even if
>>>>> they
>>>>> are I can't think of any short-lived threads that get created other
>>>>> than
>>>>> the Signal handling thread.
>>>>
>>>> Unfortunatelly I don't. Capturing the thread dump at the moment of
>>>> discovering the discrepancy seems to to be too late. I tried monitoring
>>>> the JVM under the test from external tools but it just brings more
>>>> entropy to the result.
>>>
>>> We'd need to instrument the thread creation logic to keep a separate
>>> record. Dtrace probes could probably do it - but the problem is
>>> getting the test to fail.
>>
>> Well, while responding to the previous email I thought about yet
>> another way to try to pinpoint the mysterious thread - I've tried NB
>> profiler. It filters out it's own threads and can do thread monitoring
>> at the same time as tracking the call tree.
>>
>> The result is that the offender is j.u.l.LogManager$Cleaner thread. I
>> am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>>
>>>
>>>> I am completely relying on the JVM native thread accounting to be
>>>> correct and accurate - that it reports the thread count peak based on
>>>> the real data.
>>>
>>> The spec isn't clear but I would only expect these counters to apply
>>> to Java threads not VM internal threads (compiler, gc etc). So I'd
>>> really like to know what thread is messing up this count.
>>
>> I hope my previous finding makes this clearer.
>>
>> -JB-
>>
>>>
>>> David
>>>
>>>> -JB-
>>>>
>>>>>
>>>>>> The patch does not fix those shortcomings as it is not really
>>>>>> possible
>>>>>> to do given the nature of the JVM threading system. It rather
>>>>>> tries to
>>>>>> relax the conditions while still maintaining the ability to detect
>>>>>> functional problems - eg. decreasing peak without explicitly
>>>>>> resetting
>>>>>> it and reporting false number of threads.
>>>>>>
>>>>>> The webrev is at:
>>>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>>>
>>>>> Seems reasonable.
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>
>>


From david.holmes at oracle.com  Tue Jul 23 02:54:01 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 23 Jul 2013 19:54:01 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE528F.2050302@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com> <51EE50B4.8040000@oracle.com>
	<51EE528F.2050302@oracle.com>
Message-ID: <51EE52B9.6070506@oracle.com>

On 23/07/2013 7:53 PM, Daniel Fuchs wrote:
> On 7/23/13 11:45 AM, David Holmes wrote:
>> On 23/07/2013 7:24 PM, Jaroslav Bachorik wrote:
>>  > The result is that the offender is j.u.l.LogManager$Cleaner thread. I
>>  > am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>>
>> That doesn't quite make sense. The Cleaner thread is a shutdownhook, it
>> should not be starting unless the VM is shutting down!
>
> Hummm... Right: the javadoc says "Returns the peak live thread count
> since the Java virtual machine started or peak was reset." so the
> Cleaner thread should not be counted.

Not sure why you say that. It is a live Java thread - if you happen to 
query the MXBean during VM shutdown then it should be in the count.

> If it is actually counted it might indicate a real problem in the
> implementation of the ThreadMXBean.

My point is: why is the VM apparently shutting down while this test is 
running???
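The point that a shutdown hook should not be running yet can be illustrated with a small sketch. Registering a hook (LogManager$Cleaner is described in this thread as a shutdown hook) only records an unstarted Thread object. The class name `ShutdownHookState` and the hook name `demo-hook` are hypothetical:

```java
public class ShutdownHookState {
    // Registers a hypothetical hook. addShutdownHook() only records the
    // Thread object; the VM starts it when shutdown begins.
    static Thread registerHook() {
        Thread hook = new Thread(() -> System.out.println("hook ran"), "demo-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) {
        Thread hook = registerHook();
        // Until shutdown begins the hook is still NEW, i.e. not a live thread,
        // so it cannot contribute to getThreadCount() or getPeakThreadCount().
        System.out.println("state=" + hook.getState() + ", alive=" + hook.isAlive());
        // prints "state=NEW, alive=false"; "hook ran" appears only at VM exit
    }
}
```

If such a hook shows up as a live thread while the test is still running, the VM has already started shutting down, which is the inconsistency being questioned here.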

David

> -- daniel.
>
>
>>
>> David
>> -----
>>
>>> On Tue 23 Jul 2013 11:19:13 AM CEST, David Holmes wrote:
>>>> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>>>>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>>>>> Hi Jaroslav,
>>>>>>
>>>>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>>>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>>>>>>> seems to be failing intermittently.
>>>>>>>
>>>>>>> The test checks the functionality of the
>>>>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>>>>>> capturing the current value of "getPeakThreadCount()", starting a
>>>>>>> predefined number of the user threads, stopping them and resetting
>>>>>>> the
>>>>>>> stored peak value and making sure the new peak equals to the
>>>>>>> number of
>>>>>>> the actually running threads.
>>>>>>>
>>>>>>> The main problem is that it is not possible to prevent JVM to
>>>>>>> start/stop
>>>>>>> arbitrary system threads while executing the test. This might
>>>>>>> lead to
>>>>>>> small variations of the reported peak (a short-lived system
>>>>>>> thread is
>>>>>>> started while the batch of the user threads is running) or the
>>>>>>> expected
>>>>>>> number of running threads (again, a short-lived system thread is
>>>>>>> started
>>>>>>> at the moment the test asks for the number of running threads).
>>>>>>
>>>>>> Do you know what "system threads" these are? I would not expect VM
>>>>>> internal threads to be counted in getPeakThreadCount(), but even if
>>>>>> they
>>>>>> are I can't think of any short-lived threads that get created other
>>>>>> than
>>>>>> the Signal handling thread.
>>>>>
>>>>> Unfortunatelly I don't. Capturing the thread dump at the moment of
>>>>> discovering the discrepancy seems to to be too late. I tried
>>>>> monitoring
>>>>> the JVM under the test from external tools but it just brings more
>>>>> entropy to the result.
>>>>
>>>> We'd need to instrument the thread creation logic to keep a separate
>>>> record. Dtrace probes could probably do it - but the problem is
>>>> getting the test to fail.
>>>
>>> Well, while responding to the previous email I thought about yet
>>> another way to try to pinpoint the mysterious thread - I've tried NB
>>> profiler. It filters out it's own threads and can do thread monitoring
>>> at the same time as tracking the call tree.
>>>
>>> The result is that the offender is j.u.l.LogManager$Cleaner thread. I
>>> am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>>>
>>>>
>>>>> I am completely relying on the JVM native thread accounting to be
>>>>> correct and accurate - that it reports the thread count peak based on
>>>>> the real data.
>>>>
>>>> The spec isn't clear but I would only expect these counters to apply
>>>> to Java threads not VM internal threads (compiler, gc etc). So I'd
>>>> really like to know what thread is messing up this count.
>>>
>>> I hope my previous finding makes this clearer.
>>>
>>> -JB-
>>>
>>>>
>>>> David
>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>>> The patch does not fix those shortcomings as it is not really
>>>>>>> possible
>>>>>>> to do given the nature of the JVM threading system. It rather
>>>>>>> tries to
>>>>>>> relax the conditions while still maintaining the ability to detect
>>>>>>> functional problems - eg. decreasing peak without explicitly
>>>>>>> resetting
>>>>>>> it and reporting false number of threads.
>>>>>>>
>>>>>>> The webrev is at:
>>>>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>>>>
>>>>>> Seems reasonable.
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>
>>>
>

From jaroslav.bachorik at oracle.com  Tue Jul 23 03:23:38 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 23 Jul 2013 12:23:38 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE4F60.4000506@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4F60.4000506@oracle.com>
Message-ID: <51EE59AA.8010002@oracle.com>

On 07/23/2013 11:39 AM, David Holmes wrote:
> Sorry - I took a closer look at the full test rather than just
> that patch. We already have this code to try and help expose these 
> intermittent failures:
> 
>         // Nightly testing showed some intermittent failure.
>         // Check here to get diagnostic information if some
>         // strange behavior occurs.
>         checkThreadCount(expectedCount, current, 0);

Unfortunately, this does not get us any closer to the culprit. By the
time the code gets to the point of taking the thread dump, the offending
thread is gone, so you only learn that something went wrong.

-JB-

> 
> but the sleep loop you added means this check will rarely fail so
> we won't get to see this unexpected behaviour happening. So this
> block of code could be deleted in my view. Though it is preferable
> to determine exactly why we fail!
> 
> Also looking at the sleep() used elsewhere you may as well follow
> the same pattern and abort on interrupt as it isn't expected.
> 
> Finally with regard to Daniel's comment about the live array he is
> right that the volatile on the array is not sufficient in theory -
> a thread need never see the value of live[i] become false. There
> are a number of reasons why we are unlikely to see that in practice
> on hotspot. Using synchronized will fix that; or an alternative
> cancellation mechanism could be used.
> 
> Cheers, David
> 
> On 23/07/2013 7:19 PM, David Holmes wrote:
>> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>>> Hi Jaroslav,
>>>> 
>>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>>> The
>>>>> java/lang/management/ThreadMXBean/ResetPeakThreadCount.java
>>>>> test seems to be failing intermittently.
>>>>> 
>>>>> The test checks the functionality of the 
>>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does
>>>>> so by capturing the current value of
>>>>> "getPeakThreadCount()", starting a predefined number of the
>>>>> user threads, stopping them and resetting the stored peak
>>>>> value and making sure the new peak equals to the number of 
>>>>> the actually running threads.
>>>>> 
>>>>> The main problem is that it is not possible to prevent JVM
>>>>> to start/stop arbitrary system threads while executing the
>>>>> test. This might lead to small variations of the reported
>>>>> peak (a short-lived system thread is started while the
>>>>> batch of the user threads is running) or the expected 
>>>>> number of running threads (again, a short-lived system
>>>>> thread is started at the moment the test asks for the
>>>>> number of running threads).
>>>> 
>>>> Do you know what "system threads" these are? I would not
>>>> expect VM internal threads to be counted in
>>>> getPeakThreadCount(), but even if they are I can't think of
>>>> any short-lived threads that get created other than the
>>>> Signal handling thread.
>>> 
>>> Unfortunatelly I don't. Capturing the thread dump at the moment
>>> of discovering the discrepancy seems to to be too late. I tried
>>> monitoring the JVM under the test from external tools but it
>>> just brings more entropy to the result.
>> 
>> We'd need to instrument the thread creation logic to keep a
>> separate record. Dtrace probes could probably do it - but the
>> problem is getting the test to fail.
>> 
>>> I am completely relying on the JVM native thread accounting to
>>> be correct and accurate - that it reports the thread count peak
>>> based on the real data.
>> 
>> The spec isn't clear but I would only expect these counters to
>> apply to Java threads not VM internal threads (compiler, gc etc).
>> So I'd really like to know what thread is messing up this count.
>> 
>> David
>> 
>>> -JB-
>>> 
>>>> 
>>>>> The patch does not fix those shortcomings as it is not
>>>>> really possible to do given the nature of the JVM threading
>>>>> system. It rather tries to relax the conditions while still
>>>>> maintaining the ability to detect functional problems - eg.
>>>>> decreasing peak without explicitly resetting it and
>>>>> reporting false number of threads.
>>>>> 
>>>>> The webrev is at: 
>>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>> 
>>>> Seems reasonable.
>>>> 
>>>> David -----
>>>> 
>>>>> Thanks,
>>>>> 
>>>>> -JB-
>>>>> 
>>> 
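The remarks above about the `live` array and about aborting on an unexpected interrupt can be sketched as follows. This is a minimal sketch with a hypothetical worker class (`CancellableWorker` is not the test's MyThread). Each thread gets its own volatile flag instead of sharing a volatile `boolean[]`, and the sleep helper fails loudly on interrupt instead of swallowing it. A `synchronized` accessor or a `java.util.concurrent.atomic.AtomicBoolean` would be equally valid cancellation mechanisms.

```java
public class CancellableWorker extends Thread {
    // One volatile flag per thread. A write to a volatile field happens-before
    // every subsequent read of that same field; writes to the elements of an
    // array that is only referenced through a volatile field get no such
    // guarantee.
    private volatile boolean live = true;

    CancellableWorker(String name) {
        super(name);
    }

    @Override
    public void run() {
        while (live) {
            pause(10);
        }
    }

    void cancel() {
        live = false;
    }

    // Sleep helper that treats an interrupt as unexpected and aborts,
    // instead of silently ignoring it.
    static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            throw new RuntimeException("Unexpected interrupt", e);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CancellableWorker[] workers = new CancellableWorker[5];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new CancellableWorker("worker-" + i);
            workers[i].start();
        }
        for (CancellableWorker w : workers) {
            w.cancel();
        }
        for (CancellableWorker w : workers) {
            w.join();
        }
        System.out.println("all workers terminated");
    }
}
```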

From shanliang.jiang at oracle.com  Tue Jul 23 08:30:17 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Tue, 23 Jul 2013 17:30:17 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51ED1DBE.3030304@oracle.com>
References: <51ED1DBE.3030304@oracle.com>
Message-ID: <51EEA189.5000801@oracle.com>

If it is not possible to prevent the JVM from starting/stopping arbitrary
system threads, then the test may still fail even with the fix, but I
would say the fix improves the test.

Line 176:
    // assuming no system thread is added
so line 177 is still a potential point of failure, however small the chance.

To check a thread's status, it is better to call
    Thread.getState()

For example, we can save all MyThread instances into a list and then
check them one by one:
    for (Thread t : list) {
       while (t.getState() != Thread.State.TERMINATED) {
          Thread.sleep(10);
       }
    }
(a maximum waiting time can be added here)

This is because a MyThread may be descheduled after calling:
    barrier.signal();
but before leaving its run() method, especially when many threads are
stopped at the same time on a slow test machine.
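The state-polling loop, with the maximum waiting time mentioned above, might look like this. This is a sketch under assumptions: the class name, the 10 ms poll interval and the single overall deadline are illustrative choices, not the test's code.

```java
import java.util.ArrayList;
import java.util.List;

public class AwaitTermination {
    // Polls each thread until Thread.getState() reports TERMINATED, giving up
    // once the overall deadline has passed. Returns true only if every thread
    // terminated in time. A thread that was never started stays NEW and will
    // therefore make this method time out.
    static boolean awaitTermination(List<? extends Thread> threads, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        for (Thread t : threads) {
            while (t.getState() != Thread.State.TERMINATED) {
                if (System.nanoTime() - deadline >= 0) {
                    return false;
                }
                Thread.sleep(10);
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Thread t = new Thread(() -> { }, "short-lived-" + i);
            threads.add(t);
            t.start();
        }
        System.out.println("all terminated: " + awaitTermination(threads, 5_000));
    }
}
```

Returning false instead of throwing lets the caller decide whether a timeout should fail the test or just be logged.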

Shanliang

Jaroslav Bachorik wrote:
> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
> seems to be failing intermittently.
>
> The test checks the functionality of the
> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
> capturing the current value of "getPeakThreadCount()", starting a
> predefined number of the user threads, stopping them and resetting the
> stored peak value and making sure the new peak equals to the number of
> the actually running threads.
>
> The main problem is that it is not possible to prevent JVM to start/stop
> arbitrary system threads while executing the test. This might lead to
> small variations of the reported peak (a short-lived system thread is
> started while the batch of the user threads is running) or the expected
> number of running threads (again, a short-lived system thread is started
> at the moment the test asks for the number of running threads).
>
> The patch does not fix those shortcomings as it is not really possible
> to do given the nature of the JVM threading system. It rather tries to
> relax the conditions while still maintaining the ability to detect
> functional problems - eg. decreasing peak without explicitly resetting
> it and reporting false number of threads.
>
> The webrev is at:
> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>
> Thanks,
>
> -JB-
>   
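A minimal sketch of the check described in the quoted RFR may help: capture counts, run a batch of threads, stop them, reset the peak, and read the peak and live counts again. The batch size, thread names and latch-based start/stop are assumptions, not the test's actual code. The final comparison the test makes (peak after reset against live count) is exactly the part that is sensitive to threads that have not fully exited, so the sketch only prints the values.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

public class PeakResetSketch {
    // Returns {countBefore, peakWithBatch, peakAfterReset, countAfterReset}.
    static int[] run(int batch) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int countBefore = mbean.getThreadCount();

        CountDownLatch release = new CountDownLatch(1);
        Thread[] threads = new Thread[batch];
        for (int i = 0; i < batch; i++) {
            threads[i] = new Thread(() -> {
                try {
                    release.await(); // park until the main thread releases the batch
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "batch-" + i);
            threads[i].start();
        }
        // All batch threads are alive here, so the peak must cover them.
        int peakWithBatch = mbean.getPeakThreadCount();

        release.countDown();
        for (Thread t : threads) {
            t.join(); // wait until each batch thread has finished
        }

        mbean.resetPeakThreadCount(); // peak := current number of live threads
        int peakAfterReset = mbean.getPeakThreadCount();
        int countAfterReset = mbean.getThreadCount();
        return new int[] {countBefore, peakWithBatch, peakAfterReset, countAfterReset};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] r = run(10);
        System.out.printf("before=%d peakWithBatch=%d peakAfterReset=%d afterReset=%d%n",
                r[0], r[1], r[2], r[3]);
    }
}
```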


From mandy.chung at oracle.com  Tue Jul 23 23:01:58 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 24 Jul 2013 14:01:58 +0800
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EE52B9.6070506@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com> <51EE50B4.8040000@oracle.com>
	<51EE528F.2050302@oracle.com> <51EE52B9.6070506@oracle.com>
Message-ID: <51EF6DD6.5060806@oracle.com>

On 7/23/2013 5:54 PM, David Holmes wrote:
> On 23/07/2013 7:53 PM, Daniel Fuchs wrote:
>> On 7/23/13 11:45 AM, David Holmes wrote:
>>> On 23/07/2013 7:24 PM, Jaroslav Bachorik wrote:
>>>  > The result is that the offender is j.u.l.LogManager$Cleaner 
>>> thread. I
>>>  > am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>>>
>>> That doesn't quite make sense. The Cleaner thread is a shutdownhook, it
>>> should not be starting unless the VM is shutting down!
>>
>> Hummm... Right: the javadoc says "Returns the peak live thread count
>> since the Java virtual machine started or peak was reset." so the
>> Cleaner thread should not be counted.
>
> Not sure why you say that. It is a live Java thread - if you happen to 
> query the MXBean during VM shutdown then it should be in the count.
>

I am catching up on this thread....

The thread count includes only Java threads that are not hidden. I
believe all VM internal threads are hidden from the external API. This
test runs in othervm mode and, AFAICT, the thread count is expected to be
deterministic. I don't expect the VM to start and terminate threads at
arbitrary times.

I agree with David that we should diagnose why there is one additional
thread started before the reset. If it is the LogManager$Cleaner
thread then, as David said, the VM would be shutting down while the test
is still running, which doesn't quite make sense.

Mandy

>> If it is actually counted it might indicate a real problem in the
>> implementation of the ThreadMXBean.
>
> My point is: why is the VM apparently shutting down while this test is 
> running???
>
> David
>
>> -- daniel.
>>
>>
>>>
>>> David
>>> -----
>>>
>>>> On Tue 23 Jul 2013 11:19:13 AM CEST, David Holmes wrote:
>>>>> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>>>>>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>>>>>> Hi Jaroslav,
>>>>>>>
>>>>>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>>>>>> The java/lang/management/ThreadMXBean/ResetPeakThreadCount.java 
>>>>>>>> test
>>>>>>>> seems to be failing intermittently.
>>>>>>>>
>>>>>>>> The test checks the functionality of the
>>>>>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>>>>>>> capturing the current value of "getPeakThreadCount()", starting a
>>>>>>>> predefined number of the user threads, stopping them and resetting
>>>>>>>> the
>>>>>>>> stored peak value and making sure the new peak equals to the
>>>>>>>> number of
>>>>>>>> the actually running threads.
>>>>>>>>
>>>>>>>> The main problem is that it is not possible to prevent JVM to
>>>>>>>> start/stop
>>>>>>>> arbitrary system threads while executing the test. This might
>>>>>>>> lead to
>>>>>>>> small variations of the reported peak (a short-lived system
>>>>>>>> thread is
>>>>>>>> started while the batch of the user threads is running) or the
>>>>>>>> expected
>>>>>>>> number of running threads (again, a short-lived system thread is
>>>>>>>> started
>>>>>>>> at the moment the test asks for the number of running threads).
>>>>>>>
>>>>>>> Do you know what "system threads" these are? I would not expect VM
>>>>>>> internal threads to be counted in getPeakThreadCount(), but even if
>>>>>>> they
>>>>>>> are I can't think of any short-lived threads that get created other
>>>>>>> than
>>>>>>> the Signal handling thread.
>>>>>>
>>>>>> Unfortunatelly I don't. Capturing the thread dump at the moment of
>>>>>> discovering the discrepancy seems to to be too late. I tried
>>>>>> monitoring
>>>>>> the JVM under the test from external tools but it just brings more
>>>>>> entropy to the result.
>>>>>
>>>>> We'd need to instrument the thread creation logic to keep a separate
>>>>> record. Dtrace probes could probably do it - but the problem is
>>>>> getting the test to fail.
>>>>
>>>> Well, while responding to the previous email I thought about yet
>>>> another way to try to pinpoint the mysterious thread - I've tried NB
>>>> profiler. It filters out it's own threads and can do thread monitoring
>>>> at the same time as tracking the call tree.
>>>>
>>>> The result is that the offender is j.u.l.LogManager$Cleaner thread. I
>>>> am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>>>>
>>>>>
>>>>>> I am completely relying on the JVM native thread accounting to be
>>>>>> correct and accurate - that it reports the thread count peak 
>>>>>> based on
>>>>>> the real data.
>>>>>
>>>>> The spec isn't clear but I would only expect these counters to apply
>>>>> to Java threads not VM internal threads (compiler, gc etc). So I'd
>>>>> really like to know what thread is messing up this count.
>>>>
>>>> I hope my previous finding makes this clearer.
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> David
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>>
>>>>>>>> The patch does not fix those shortcomings as it is not really
>>>>>>>> possible
>>>>>>>> to do given the nature of the JVM threading system. It rather
>>>>>>>> tries to
>>>>>>>> relax the conditions while still maintaining the ability to detect
>>>>>>>> functional problems - eg. decreasing peak without explicitly
>>>>>>>> resetting
>>>>>>>> it and reporting false number of threads.
>>>>>>>>
>>>>>>>> The webrev is at:
>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>>>>>
>>>>>>> Seems reasonable.
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>
>>>>
>>


From daniel.fuchs at oracle.com  Tue Jul 23 23:09:37 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Wed, 24 Jul 2013 08:09:37 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF6DD6.5060806@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com> <51EE50B4.8040000@oracle.com>
	<51EE528F.2050302@oracle.com> <51EE52B9.6070506@oracle.com>
	<51EF6DD6.5060806@oracle.com>
Message-ID: <51EF6FA1.9000103@oracle.com>

On 7/24/13 8:01 AM, Mandy Chung wrote:
> On 7/23/2013 5:54 PM, David Holmes wrote:
>> On 23/07/2013 7:53 PM, Daniel Fuchs wrote:
>>> On 7/23/13 11:45 AM, David Holmes wrote:
>>>> On 23/07/2013 7:24 PM, Jaroslav Bachorik wrote:
>>>>  > The result is that the offender is j.u.l.LogManager$Cleaner 
>>>> thread. I
>>>>  > am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>>>>
>>>> That doesn't quite make sense. The Cleaner thread is a 
>>>> shutdownhook, it
>>>> should not be starting unless the VM is shutting down!
>>>
>>> Hummm... Right: the javadoc says "Returns the peak live thread count
>>> since the Java virtual machine started or peak was reset." so the
>>> Cleaner thread should not be counted.
>>
>> Not sure why you say that. It is a live Java thread - if you happen 
>> to query the MXBean during VM shutdown then it should be in the count.
>>
>
> I am catching up on this thread....
>
> The thread count counts Java threads that are not hidden.  I believe 
> all VM internal threads are hidden from external API. This test runs 
> in othervm mode and AFAICT the thread count is expected to be 
> deterministic.  I don't expect the VM will start and terminate any 
> thread any time.
>
> I agree with David that we should diagnose why there is one additional 
> thread started before the reset.  If it is the LogManager$Cleaner 
> thread, like David said, the VM is shutting down while the test is 
> still running which doesn't quite make sense.

I think Shanliang's suspicion, that a thread might still be alive if it
is descheduled just after calling its barrier.signal(), is a very good
suggestion. I would advise calling thread.join() on all threads in
terminateThreads, just to make sure they are all really dead and not in
some comatose state...
If Shanliang is right, then the test is failing because some of the
threads we think are dead are not actually dead yet, and not because of
some new VM thread that nobody can see :-)
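The join() suggestion could look roughly like this. It is a sketch with hypothetical helpers (`startWorkers` and this `terminateThreads` are stand-ins, not the test's code): signal every worker, then join each one with a timeout so that none is left between "signalled" and "exited run()". One caveat: `join()` guarantees the thread has finished at the Java level. The sketch does not assume that the ThreadMXBean counters are updated at exactly the same instant.

```java
import java.util.concurrent.CountDownLatch;

public class JoinOnTerminate {
    // Starts n hypothetical workers that block until 'stop' is released.
    static Thread[] startWorkers(int n, CountDownLatch stop) {
        Thread[] threads = new Thread[n];
        for (int i = 0; i < n; i++) {
            threads[i] = new Thread(() -> {
                try {
                    stop.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            threads[i].start();
        }
        return threads;
    }

    // Signal all workers, then join each one; fail if any is still alive
    // after the per-thread timeout.
    static void terminateThreads(Thread[] threads, CountDownLatch stop, long joinMillis)
            throws InterruptedException {
        stop.countDown();
        for (Thread t : threads) {
            t.join(joinMillis);
            if (t.isAlive()) {
                throw new IllegalStateException(t.getName() + " still alive after join");
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch stop = new CountDownLatch(1);
        Thread[] workers = startWorkers(5, stop);
        terminateThreads(workers, stop, 5_000);
        System.out.println("all workers joined");
    }
}
```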

-- daniel

>
> Mandy
>
>>> If it is actually counted it might indicate a real problem in the
>>> implementation of the ThreadMXBean.
>>
>> My point is: why is the VM apparently shutting down while this test 
>> is running???
>>
>> David
>>
>>> -- daniel.
>>>
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>> On Tue 23 Jul 2013 11:19:13 AM CEST, David Holmes wrote:
>>>>>> On 23/07/2013 6:29 PM, Jaroslav Bachorik wrote:
>>>>>>> On 07/23/2013 10:19 AM, David Holmes wrote:
>>>>>>>> Hi Jaroslav,
>>>>>>>>
>>>>>>>> On 22/07/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>>>>>>> The 
>>>>>>>>> java/lang/management/ThreadMXBean/ResetPeakThreadCount.java test
>>>>>>>>> seems to be failing intermittently.
>>>>>>>>>
>>>>>>>>> The test checks the functionality of the
>>>>>>>>> j.l.m.ThreadMXBean.resetPeakThreadCount() method. It does so by
>>>>>>>>> capturing the current value of "getPeakThreadCount()", starting a
>>>>>>>>> predefined number of the user threads, stopping them and 
>>>>>>>>> resetting
>>>>>>>>> the
>>>>>>>>> stored peak value and making sure the new peak equals to the
>>>>>>>>> number of
>>>>>>>>> the actually running threads.
>>>>>>>>>
>>>>>>>>> The main problem is that it is not possible to prevent JVM to
>>>>>>>>> start/stop
>>>>>>>>> arbitrary system threads while executing the test. This might
>>>>>>>>> lead to
>>>>>>>>> small variations of the reported peak (a short-lived system
>>>>>>>>> thread is
>>>>>>>>> started while the batch of the user threads is running) or the
>>>>>>>>> expected
>>>>>>>>> number of running threads (again, a short-lived system thread is
>>>>>>>>> started
>>>>>>>>> at the moment the test asks for the number of running threads).
>>>>>>>>
>>>>>>>> Do you know what "system threads" these are? I would not expect VM
>>>>>>>> internal threads to be counted in getPeakThreadCount(), but 
>>>>>>>> even if
>>>>>>>> they
>>>>>>>> are I can't think of any short-lived threads that get created 
>>>>>>>> other
>>>>>>>> than
>>>>>>>> the Signal handling thread.
>>>>>>>
>>>>>>> Unfortunatelly I don't. Capturing the thread dump at the moment of
>>>>>>> discovering the discrepancy seems to to be too late. I tried
>>>>>>> monitoring
>>>>>>> the JVM under the test from external tools but it just brings more
>>>>>>> entropy to the result.
>>>>>>
>>>>>> We'd need to instrument the thread creation logic to keep a separate
>>>>>> record. Dtrace probes could probably do it - but the problem is
>>>>>> getting the test to fail.
>>>>>
>>>>> Well, while responding to the previous email I thought about yet
>>>>> another way to try to pinpoint the mysterious thread - I've tried NB
>>>>> profiler. It filters out it's own threads and can do thread 
>>>>> monitoring
>>>>> at the same time as tracking the call tree.
>>>>>
>>>>> The result is that the offender is j.u.l.LogManager$Cleaner thread. I
>>>>> am attaching the profiler snapshot (can be opened in eg. jvisualvm)
>>>>>
>>>>>>
>>>>>>> I am completely relying on the JVM native thread accounting to be
>>>>>>> correct and accurate - that it reports the thread count peak 
>>>>>>> based on
>>>>>>> the real data.
>>>>>>
>>>>>> The spec isn't clear but I would only expect these counters to apply
>>>>>> to Java threads not VM internal threads (compiler, gc etc). So I'd
>>>>>> really like to know what thread is messing up this count.
>>>>>
>>>>> I hope my previous finding makes this clearer.
>>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>>> The patch does not fix those shortcomings as it is not really
>>>>>>>>> possible
>>>>>>>>> to do given the nature of the JVM threading system. It rather
>>>>>>>>> tries to
>>>>>>>>> relax the conditions while still maintaining the ability to 
>>>>>>>>> detect
>>>>>>>>> functional problems - eg. decreasing peak without explicitly
>>>>>>>>> resetting
>>>>>>>>> it and reporting false number of threads.
>>>>>>>>>
>>>>>>>>> The webrev is at:
>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.00
>>>>>>>>
>>>>>>>> Seems reasonable.
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>
>>>>>
>>>
>


From jaroslav.bachorik at oracle.com  Tue Jul 23 23:18:12 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 24 Jul 2013 08:18:12 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF6DD6.5060806@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com> <51EE50B4.8040000@oracle.com>
	<51EE528F.2050302@oracle.com> <51EE52B9.6070506@oracle.com>
	<51EF6DD6.5060806@oracle.com>
Message-ID: <51EF71A4.3090209@oracle.com>

Thanks everyone for taking the time to dig into this issue.

I've done more testing and it turns out that my initial analysis was
wrong. There are no threads magically appearing and disappearing (that
was an effect of the monitoring tools I used). Rather, there seems to be
an issue with terminating the test threads: after adding a lot of logging
to the original test, I was able to observe that the new test threads
were sometimes started before the terminating test threads had
disappeared.

So I've added more rigorous check for the threads termination - 
checking the thread states instead of just comparing the thread counts. 
By doing this I was able to decrease the chances of failing but it 
still seems that there is some discrepancy between the numbers reported 
by the mbean and eg. the result of Thread.getAllStackTraces(). I am 
logging all the threads reported by Thread.getAllStackTraces() before 
the call to mbean.getThreadCount() and after the call and sometimes it 
just happens that mbean.getThreadCount() reports the thread count which 
is off by 1 in regards to both Thread.getAllStackTraces() calls.
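The before/after logging described above can be sketched as a small probe. This is a hypothetical illustration (the class and method names are not taken from the actual test): it takes a Thread.getAllStackTraces() snapshot on each side of ThreadMXBean.getThreadCount() and flags a sample whose MBean count matches neither snapshot, which is the "off by 1 with respect to both calls" symptom.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical diagnostic probe; names are illustrative, not from ResetPeakThreadCount.
public class ThreadCountProbe {

    /** One sample: live-thread snapshots taken immediately before and after the MBean call. */
    public static final class Sample {
        public final int before;
        public final int mbeanCount;
        public final int after;

        public Sample(int before, int mbeanCount, int after) {
            this.before = before;
            this.mbeanCount = mbeanCount;
            this.after = after;
        }

        /** True if the MBean count agrees with neither surrounding snapshot. */
        public boolean disagrees() {
            return mbeanCount != before && mbeanCount != after;
        }

        @Override
        public String toString() {
            return "before=" + before + " mbean=" + mbeanCount + " after=" + after;
        }
    }

    /** Brackets a single getThreadCount() call with two getAllStackTraces() snapshots. */
    public static Sample sample(ThreadMXBean mbean) {
        int before = Thread.getAllStackTraces().size();
        int count = mbean.getThreadCount();
        int after = Thread.getAllStackTraces().size();
        return new Sample(before, count, after);
    }

    public static void main(String[] args) {
        Sample s = sample(ManagementFactory.getThreadMXBean());
        System.out.println(s + (s.disagrees() ? "  <-- matches neither snapshot" : ""));
    }
}
```

A disagreement that appears only intermittently would fit a timing issue; a constant offset on every sample would instead suggest that the two APIs count slightly different sets of threads.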

I will try the "thread.join()" suggestion from Daniel.

-JB-

From mandy.chung at oracle.com  Tue Jul 23 23:20:56 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 24 Jul 2013 14:20:56 +0800
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF6FA1.9000103@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com> <51EE50B4.8040000@oracle.com>
	<51EE528F.2050302@oracle.com> <51EE52B9.6070506@oracle.com>
	<51EF6DD6.5060806@oracle.com> <51EF6FA1.9000103@oracle.com>
Message-ID: <51EF7248.2070405@oracle.com>


On 7/24/2013 2:09 PM, Daniel Fuchs wrote:
> On 7/24/13 8:01 AM, Mandy Chung wrote:
>> I am catching up on this thread....
>>
>> The thread count counts Java threads that are not hidden.  I believe 
>> all VM internal threads are hidden from external API. This test runs 
>> in othervm mode and AFAICT the thread count is expected to be 
>> deterministic.  I don't expect the VM will start and terminate any 
>> thread any time.
>>
>> I agree with David that we should diagnose why there is one 
>> additional thread started before the reset.  If it is the 
>> LogManager$Cleaner thread, like David said, the VM is shutting down 
>> while the test is still running which doesn't quite make sense.
> I think that Shanliang's suspicion that a thread might be still alive 
> if unscheduled just after having
> called its barrier.signal() is a very good suggestion. I would advise 
> calling thread.join() on all threads in
> terminateThreads, just to make sure they are all really dead and not 
> in some comatose state...
> If Shanliang is right then the test would be failing because some of 
> the threads we think are dead are
> not actually dead yet - and not because of some new VM thread that 
> nobody can see :-)

Thanks for pointing that out.

I agree that the test should be changed to call Thread.join(). There may 
be other java.lang.management tests that should also be fixed to call 
Thread.join.
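A minimal sketch of what calling Thread.join() in terminateThreads might look like. The worker structure here (a per-worker CountDownLatch used as the "please exit" signal, and the JoinOnTerminate class itself) is an assumption for illustration, not the actual ResetPeakThreadCount code; the point is only that the method does not return until every signalled thread has actually died.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Hypothetical worker pool; the latch-based stop signal is an assumption, not the real test's design.
public class JoinOnTerminate {
    private final List<Thread> workers = new ArrayList<>();
    private final List<CountDownLatch> stopSignals = new ArrayList<>();
    private int nextId = 0;

    /** Starts n workers that park until their stop latch is released. */
    public void startThreads(int n) {
        for (int i = 0; i < n; i++) {
            CountDownLatch stop = new CountDownLatch(1);
            Thread t = new Thread(() -> {
                try {
                    stop.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + nextId++);
            t.start();
            workers.add(t);
            stopSignals.add(stop);
        }
    }

    /**
     * Signals the n oldest workers to exit, then joins each one, so that on return
     * every one of them is TERMINATED rather than merely "signalled".
     */
    public List<Thread> terminateThreads(int n) throws InterruptedException {
        List<Thread> victims = new ArrayList<>(workers.subList(0, n));
        for (CountDownLatch stop : stopSignals.subList(0, n)) {
            stop.countDown();
        }
        for (Thread t : victims) {
            t.join(); // blocks until t.isAlive() is false
        }
        workers.subList(0, n).clear();
        stopSignals.subList(0, n).clear();
        return victims;
    }

    public int remaining() {
        return workers.size();
    }

    public static void main(String[] args) throws InterruptedException {
        JoinOnTerminate pool = new JoinOnTerminate();
        pool.startThreads(10);
        for (Thread t : pool.terminateThreads(8)) {
            System.out.println(t.getName() + " -> " + t.getState());
        }
        pool.terminateThreads(pool.remaining());
    }
}
```

Thread.join() guarantees that the joined thread is no longer alive from the Java side; it does not, by itself, say anything about when ThreadMXBean.getThreadCount() reflects that exit.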

Mandy

From jaroslav.bachorik at oracle.com  Tue Jul 23 23:47:36 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 24 Jul 2013 08:47:36 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF7248.2070405@oracle.com>
References: <51ED1DBE.3030304@oracle.com> <51EE3C9B.3050604@oracle.com>
	<51EE3EE2.1000202@oracle.com> <51EE4A91.3000305@oracle.com>
	<51EE4BD6.7040707@oracle.com> <51EE50B4.8040000@oracle.com>
	<51EE528F.2050302@oracle.com> <51EE52B9.6070506@oracle.com>
	<51EF6DD6.5060806@oracle.com> <51EF6FA1.9000103@oracle.com>
	<51EF7248.2070405@oracle.com>
Message-ID: <51EF7888.40100@oracle.com>

On Wed 24 Jul 2013 08:20:56 AM CEST, Mandy Chung wrote:
> Thanks for pointing that out.
>
> I agree that the test should be changed to call Thread.join(). There
> may be other java.lang.management tests that should also be fixed to
> call Thread.join.

I've tried using Thread.join() but I am still getting the thread count 
discrepancy.

Specifically:
1. 10 worker threads have been successfully started -
mbean.getThreadCount() reports 14 and Thread.getAllStackTraces() returns
14 items
---
Thread: Thread[Signal Dispatcher,9,system]
Thread: Thread[worker-5,5,main]
Thread: Thread[worker-7,5,main]
Thread: Thread[worker-9,5,main]
Thread: Thread[worker-12,5,main]
Thread: Thread[worker-11,5,main]
Thread: Thread[Reference Handler,10,system]
Thread: Thread[main,5,main]
Thread: Thread[worker-10,5,main]
Thread: Thread[worker-8,5,main]
Thread: Thread[Finalizer,8,system]
Thread: Thread[worker-6,5,main]
Thread: Thread[worker-13,5,main]
Thread: Thread[worker-4,5,main]
---
2. Terminating 8 threads
3. After the threads have been terminated (waiting on Thread.join() for 
them to die) - mbean.getThreadCount() reports 7 while 
Thread.getAllStackTraces() returns only 6 items
---
Thread: Thread[Signal Dispatcher,9,system]
Thread: Thread[Finalizer,8,system]
Thread: Thread[worker-12,5,main]
Thread: Thread[Reference Handler,10,system]
Thread: Thread[main,5,main]
Thread: Thread[worker-13,5,main]
---

This would almost point to mbean.getThreadCount() reporting a stale 
value. Is that possible?

-JB-

>
> Mandy


From shanliang.jiang at oracle.com  Wed Jul 24 00:21:53 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 24 Jul 2013 09:21:53 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF7888.40100@oracle.com>
References: <51ED1DBE.3030304@oracle.com>
	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>
	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>
	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>
	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>
	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>
	<51EF7888.40100@oracle.com>
Message-ID: <51EF8091.9030603@oracle.com>

Just as a test: after terminating the 8 threads and checking their states
by calling Thread.join() (which should amount to the same thing as checking
Thread.getState()), DO sleep for some time and then call
mbean.getThreadCount(). If it then reports the right number, we may need to
verify the mbean.getThreadCount() method.
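The experiment suggested above could look roughly like this: read the count immediately after the joins, pause, and read it again. The SleepThenRecount class and the 10 ms pause are illustrative choices, not part of the actual test. If the second reading is correct while the first is not, that would point at the timing of the MBean's count update rather than at the test's own bookkeeping.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.TimeUnit;

// Hypothetical experiment harness; not part of the actual test.
public class SleepThenRecount {

    /** Returns {count read immediately, count read after pausing pauseMillis}. */
    public static int[] countTwice(ThreadMXBean mbean, long pauseMillis) throws InterruptedException {
        int immediate = mbean.getThreadCount();
        TimeUnit.MILLISECONDS.sleep(pauseMillis);
        int afterPause = mbean.getThreadCount();
        return new int[] {immediate, afterPause};
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        Thread worker = new Thread(() -> { }, "short-lived-worker");
        worker.start();
        worker.join(); // the worker is dead from the Java side at this point

        int[] counts = countTwice(mbean, 10);
        int live = Thread.getAllStackTraces().size();
        System.out.println("immediate=" + counts[0] + " afterPause=" + counts[1]
                + " getAllStackTraces=" + live);
    }
}
```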

Shanliang


From jaroslav.bachorik at oracle.com  Wed Jul 24 00:35:02 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 24 Jul 2013 09:35:02 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF8091.9030603@oracle.com>
References: <51ED1DBE.3030304@oracle.com>
	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>
	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>
	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>
	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>
	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>
	<51EF7888.40100@oracle.com> <51EF8091.9030603@oracle.com>
Message-ID: <51EF83A6.1040200@oracle.com>

On 07/24/2013 09:21 AM, shanliang wrote:
> Just to be a test, after terminated 8 threads and checked their
> states by calling Thread.join() (must be same to
> Thread.getState()), DO sleep sometime and then call
> mbean.getThreadCount(), if it reports a right number, then we may
> need to verify mbean.getThreadCount() method.

The result is:

Thread: Thread[Signal Dispatcher,9,system]
Thread: Thread[Finalizer,8,system]
Thread: Thread[worker-12,5,main]
Thread: Thread[Reference Handler,10,system]
Thread: Thread[main,5,main]
Thread: Thread[worker-13,5,main]
---> MBean.getThreadCount() = 7
Thread(1): Thread[Signal Dispatcher,9,system]
Thread(1): Thread[Finalizer,8,system]
Thread(1): Thread[worker-12,5,main]
Thread(1): Thread[Reference Handler,10,system]
Thread(1): Thread[main,5,main]
Thread(1): Thread[worker-13,5,main]
---> MBean.getThreadCount() = 6

-JB-



From shanliang.jiang at oracle.com  Wed Jul 24 01:38:02 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 24 Jul 2013 10:38:02 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF83A6.1040200@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
Message-ID: <51EF926A.3060705@oracle.com>

---> MBean.getThreadCount() = 7

................

---> MBean.getThreadCount() = 6

I assume you added a sleep between the two calls; if so, there might be an
issue with MBean.getThreadCount().

Shanliang


From jaroslav.bachorik at oracle.com  Wed Jul 24 01:40:46 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 24 Jul 2013 10:40:46 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF926A.3060705@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com>
Message-ID: <51EF930E.4050507@oracle.com>

On 07/24/2013 10:38 AM, shanliang wrote:
> ---> MBean.getThreadCount() = 7
> 
> ................
> 
> ---> MBean.getThreadCount() = 6
> 
> I suppose that you added sleep between 2 calls, then there might be
> an issue with MBean.getThreadCount()

Actually, I tried it both with a 10 ms sleep and without one. It seems
that the natural latency between those two calls is enough for the
thread count to be updated to the actual value.

-JB-


From shanliang.jiang at oracle.com  Wed Jul 24 01:50:26 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 24 Jul 2013 10:50:26 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF930E.4050507@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
Message-ID: <51EF9552.1020901@oracle.com>

Jaroslav Bachorik wrote:
> Actually I tried it with sleep for 10ms as well as without. It seems
> that the natural latency between those 2 calls is enough to get the
> thread count updated to the actual value.
>   
So we have two kinds of issues here:
1) test-related issues, like the thread state checking; we can fix those in
the test.
2) the MBean.getThreadCount() issue; we can file a bug to track it (attaching
your test case to the bug) and add a workaround (a sleep, or calling it
twice) in the test to make the test pass. Mandy is the expert, so it would
be better to get her opinion.
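The "sleep or call it twice" workaround in point 2 could be made somewhat more robust as a bounded poll: re-read the count until it reaches the expected value or a timeout expires. This sketch assumes the test knows the count it expects; the AwaitThreadCount name, the 10 ms poll interval, and the timeout values are illustrative, not taken from the actual fix.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.TimeUnit;

// Hypothetical test helper; names and timings are illustrative.
public class AwaitThreadCount {

    /**
     * Polls getThreadCount() until it equals expected or timeoutMillis elapses.
     * Returns the last observed value so the caller can report it on failure.
     */
    public static int awaitThreadCount(ThreadMXBean mbean, int expected, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        int observed = mbean.getThreadCount();
        // Overflow-safe comparison of nanoTime values.
        while (observed != expected && System.nanoTime() - deadline < 0) {
            TimeUnit.MILLISECONDS.sleep(10);
            observed = mbean.getThreadCount();
        }
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int expected = Thread.getAllStackTraces().size();
        int observed = awaitThreadCount(mbean, expected, 1000);
        System.out.println(observed == expected
                ? "thread count settled at " + observed
                : "expected " + expected + " threads, MBean still reports " + observed);
    }
}
```

Returning the last observed value (rather than a boolean) lets the test fail with the actual number in its message, which keeps the underlying MBean behaviour visible instead of silently masking it.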

Shanliang

> - -JB-
>
>   
>> Shanliang
>>
>> Jaroslav Bachorik wrote: On 07/24/2013 09:21 AM, shanliang wrote:
>>
>>     
>>>>> Just to be a test, after terminated 8 threads and checked
>>>>> their states by calling Thread.join() (must be same to 
>>>>> Thread.getState()), DO sleep sometime and then call 
>>>>> mbean.getThreadCount(), if it reports a right number, then we
>>>>> may need to verify mbean.getThreadCount() method.
>>>>>
>>>>>           
>> The result is:
>>
>> Thread: Thread[Signal Dispatcher,9,system] Thread:
>> Thread[Finalizer,8,system] Thread: Thread[worker-12,5,main] Thread:
>> Thread[Reference Handler,10,system] Thread: Thread[main,5,main] 
>> Thread: Thread[worker-13,5,main] ---> MBean.getThreadCount() = 7 
>> Thread(1): Thread[Signal Dispatcher,9,system] Thread(1):
>> Thread[Finalizer,8,system] Thread(1): Thread[worker-12,5,main] 
>> Thread(1): Thread[Reference Handler,10,system] Thread(1):
>> Thread[main,5,main] Thread(1): Thread[worker-13,5,main] --->
>> MBean.getThreadCount() = 6
>>
>> -JB-
>>
>>
>>
>>     
>>>>> Shanliang
>>>>>
>>>>> Jaroslav Bachorik wrote:
>>>>>
>>>>>           
>>>>>> On Wed 24 Jul 2013 08:20:56 AM CEST, Mandy Chung wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> On 7/24/2013 2:09 PM, Daniel Fuchs wrote:
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> On 7/24/13 8:01 AM, Mandy Chung wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> I am catching up on this thread....
>>>>>>>>>
>>>>>>>>> The thread count counts Java threads that are not
>>>>>>>>> hidden. I believe all VM internal threads are hidden
>>>>>>>>> from external API. This test runs in othervm mode and
>>>>>>>>> AFAICT the thread count is expected to be
>>>>>>>>> deterministic.  I don't expect the VM will start and
>>>>>>>>> terminate any thread any time.
>>>>>>>>>
>>>>>>>>> I agree with David that we should diagnose why there
>>>>>>>>> is one additional thread started before the reset.
>>>>>>>>> If it is the LogManager$Cleaner thread, like David
>>>>>>>>> said, the VM is shutting down while the test is still
>>>>>>>>> running which doesn't quite make sense.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> I think that Shanliang's suspicion that a thread might
>>>>>>>> be still alive if unscheduled just after having called
>>>>>>>> its barrier.signal() is a very good suggestion. I would
>>>>>>>> advise calling thread.join() on all threads in
>>>>>>>> terminateThreads, just to make sure they are all really
>>>>>>>> dead and not in some comatose state... If Shanliang is
>>>>>>>> right then the test would be failing because some of
>>>>>>>> the threads we think are dead are not actually dead yet
>>>>>>>> - and not because of some new VM thread that nobody can
>>>>>>>> see :-)
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>> Thanks for pointing that out.
>>>>>>>
>>>>>>> I agree that the test should be changed to call
>>>>>>> Thread.join(). There may be other java.lang.management
>>>>>>> tests that should also be fixed to call Thread.join.
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> I've tried using Thread.join() but I am still getting the
>>>>>> thread count discrepancy.
>>>>>>
>>>>>> Specifically:
>>>>>> 1. 10 worker threads have been successfully started -
>>>>>>    mbean.getThreadCount() reports 14 and
>>>>>>    Thread.getAllStackTraces() returns 14 items
>>>>>>    ---
>>>>>>    Thread: Thread[Signal Dispatcher,9,system]
>>>>>>    Thread: Thread[worker-5,5,main]
>>>>>>    Thread: Thread[worker-7,5,main]
>>>>>>    Thread: Thread[worker-9,5,main]
>>>>>>    Thread: Thread[worker-12,5,main]
>>>>>>    Thread: Thread[worker-11,5,main]
>>>>>>    Thread: Thread[Reference Handler,10,system]
>>>>>>    Thread: Thread[main,5,main]
>>>>>>    Thread: Thread[worker-10,5,main]
>>>>>>    Thread: Thread[worker-8,5,main]
>>>>>>    Thread: Thread[Finalizer,8,system]
>>>>>>    Thread: Thread[worker-6,5,main]
>>>>>>    Thread: Thread[worker-13,5,main]
>>>>>>    Thread: Thread[worker-4,5,main]
>>>>>>    ---
>>>>>> 2. Terminating 8 threads
>>>>>> 3. After the threads have been terminated (waiting on
>>>>>>    Thread.join() for them to die) - mbean.getThreadCount()
>>>>>>    reports 7 while Thread.getAllStackTraces() returns only 6
>>>>>>    items
>>>>>>    ---
>>>>>>    Thread: Thread[Signal Dispatcher,9,system]
>>>>>>    Thread: Thread[Finalizer,8,system]
>>>>>>    Thread: Thread[worker-12,5,main]
>>>>>>    Thread: Thread[Reference Handler,10,system]
>>>>>>    Thread: Thread[main,5,main]
>>>>>>    Thread: Thread[worker-13,5,main]
>>>>>>    ---
>>>>>>
>>>>>> This would almost point to mbean.getThreadCount() reporting
>>>>>> a stale value. Is that possible?
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Mandy
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>             
>>     
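A minimal sketch of the `terminateThreads` change suggested above: release the workers, then call `join()` on each one so that every worker is dead from Java's point of view before anything is counted. The `CountDownLatch` stop signal, `startWorkers`, and the worker body are hypothetical stand-ins, not the test's actual code. Note that `join()` only guarantees that `isAlive()` is false; whether `ThreadMXBean` has caught up by then is the separate question taken up later in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class JoinOnTerminate {
    // Hypothetical workers: each one parks on the shared latch until told to stop.
    static List<Thread> startWorkers(int n, CountDownLatch stop) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    stop.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "worker-" + i);
            t.start();
            workers.add(t);
        }
        return workers;
    }

    // Release the workers, then wait for each one to finish.
    static void terminateThreads(List<Thread> workers, CountDownLatch stop)
            throws InterruptedException {
        stop.countDown();
        for (Thread t : workers) {
            t.join(); // returns only after t has terminated, so t.isAlive() is now false
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch stop = new CountDownLatch(1);
        List<Thread> workers = startWorkers(10, stop);
        terminateThreads(workers, stop);
        System.out.println(workers.stream().noneMatch(Thread::isAlive)); // prints "true"
    }
}
```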


From mandy.chung at oracle.com  Wed Jul 24 02:31:57 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 24 Jul 2013 17:31:57 +0800
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF9552.1020901@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com>
Message-ID: <51EF9F0D.7040709@oracle.com>


On 7/24/2013 4:50 PM, shanliang wrote:
> So we have 2 kinds of issues here:
> 1) the test related, like Thread state checking, we can fix them in 
> the test
> 2) MBean.getThreadCount() issue, we can create a bug to trace it (add 
> your test case to the bug), and add a workaround (sleep or call 2 
> times) in the test to make the test pass. Mandy is the expert and 
> better to get her opinion. 

It's probably a race in the VM implementation in determining the thread 
count.   You will need to diagnose the VM implementation and compare the 
thread list and the implementation of getting the thread count (check 
hotspot/src/share/vm/services/threadService.cpp)
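On the Java side, one way to start the comparison suggested here is to take back-to-back snapshots of `ThreadMXBean.getThreadCount()` and `Thread.getAllStackTraces()`, which is essentially what produced the dumps quoted earlier. This is a hypothetical diagnostic sketch; the two calls sample the VM at slightly different instants, so an occasional one-off mismatch may occur even without a VM bug.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;

public class CompareThreadCounts {
    // Returns {ThreadMXBean live count, Thread.getAllStackTraces() size}, read back to back.
    static int[] snapshot() {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int fromMBean = mbean.getThreadCount();
        Map<Thread, StackTraceElement[]> traces = Thread.getAllStackTraces();
        return new int[] { fromMBean, traces.size() };
    }

    public static void main(String[] args) {
        int[] s = snapshot();
        System.out.println("ThreadMXBean.getThreadCount() = " + s[0]);
        System.out.println("Thread.getAllStackTraces()    = " + s[1]);
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            System.out.println("Thread: " + t);
        }
    }
}
```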

Mandy

From david.holmes at oracle.com  Wed Jul 24 04:21:27 2013
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 24 Jul 2013 21:21:27 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EF9F0D.7040709@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
Message-ID: <51EFB8B7.6030204@oracle.com>

On 24/07/2013 7:31 PM, Mandy Chung wrote:
>
> On 7/24/2013 4:50 PM, shanliang wrote:
>> So we have 2 kinds of issues here:
>> 1) the test related, like Thread state checking, we can fix them in
>> the test
>> 2) MBean.getThreadCount() issue, we can create a bug to trace it (add
>> your test case to the bug), and add a workaround (sleep or call 2
>> times) in the test to make the test pass. Mandy is the expert and
>> better to get her opinion.
>
> It's probably a race in the VM implementation in determining the thread
> count.   You will need to diagnose the VM implementation and compare the
> thread list and the implementation of getting the thread count (check
> hotspot/src/share/vm/services/threadService.cpp)

There is a considerable code path between the point where a terminating 
thread causes Thread.join() to be allowed to return, and the point where 
the live thread count gets decremented. So using join() does not help 
here. Arguably JVMTI should have based its counts around the lifecycle 
of the Java thread not the underlying native thread.
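The point about `join()` can be probed with a small loop that starts and joins short-lived threads, then immediately reads the count. This is a sketch that assumes nothing else in the JVM starts or stops threads while it runs. On a VM affected by the race discussed here, some rounds may report a count above the baseline even though every `join()` has returned; on an unaffected VM the result would be expected to be 0. The class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class JoinThenCount {
    // Counts the rounds in which ThreadMXBean still reported more live threads
    // than the baseline right after all of that round's threads were joined.
    static int roundsWithStaleCount(int rounds, int threadsPerRound) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int baseline = mbean.getThreadCount();
        int stale = 0;
        for (int r = 0; r < rounds; r++) {
            Thread[] batch = new Thread[threadsPerRound];
            for (int i = 0; i < threadsPerRound; i++) {
                batch[i] = new Thread(() -> { });
                batch[i].start();
            }
            for (Thread t : batch) {
                t.join(); // Java-level termination only
            }
            if (mbean.getThreadCount() > baseline) {
                stale++;
            }
        }
        return stale;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rounds with a stale count: " + roundsWithStaleCount(1000, 10));
    }
}
```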

David
-----

> Mandy

From david.holmes at oracle.com  Wed Jul 24 04:58:32 2013
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 24 Jul 2013 21:58:32 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFBFA3.90608@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFBFA3.90608@oracle.com>
Message-ID: <51EFC168.6030205@oracle.com>

Aside: it is really annoying that jmx-dev mangles the subject such that 
cross-posts end up creating two different email threads :(

On 24/07/2013 9:50 PM, Jaroslav Bachorik wrote:
> On 07/24/2013 01:21 PM, David Holmes wrote:
>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>
>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>> So we have 2 kinds of issues here:
>>>> 1) the test related, like Thread state checking, we can fix them in
>>>> the test
>>>> 2) MBean.getThreadCount() issue, we can create a bug to trace it (add
>>>> your test case to the bug), and add a workaround (sleep or call 2
>>>> times) in the test to make the test pass. Mandy is the expert and
>>>> better to get her opinion.
>>>
>>> It's probably a race in the VM implementation in determining the thread
>>> count.   You will need to diagnose the VM implementation and compare the
>>> thread list and the implementation of getting the thread count (check
>>> hotspot/src/share/vm/services/threadService.cpp)
>>
>> There is a considerable code path between the point where a terminating
>> thread causes Thread.join() to be allowed to return, and the point where
>> the live thread count gets decremented. So using join() does not help
>> here. Arguably JVMTI should have based its counts around the lifecycle
>> of the Java thread not the underlying native thread.
>
> So, if I understand it correctly, it is not possible to get 100%
> accuracy of the thread related counters in situations when you create
> and terminate a number of threads rapidly.

Correct.

> In that case this test could be fixed with a small waiting period after
> all the joined threads were terminated - just to make sure that all the
> exiting threads were properly collected.

Yes.
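The waiting period can be a fixed sleep, as proposed later in the thread, or a bounded poll that stops as soon as the expected value shows up. A sketch of the polling variant follows; `awaitThreadCount`, the 10 ms back-off, and the timeout are illustrative choices, not part of the test.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.TimeUnit;

public class AwaitThreadCount {
    // Polls ThreadMXBean.getThreadCount() until it equals expected or timeoutMillis elapses.
    // Returns whether the expected value was observed.
    static boolean awaitThreadCount(int expected, long timeoutMillis) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (mbean.getThreadCount() != expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // give up; the caller decides whether this is a failure
            }
            Thread.sleep(10); // short back-off between reads
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        int baseline = ManagementFactory.getThreadMXBean().getThreadCount();
        Thread t = new Thread(() -> { });
        t.start();
        t.join();
        System.out.println(awaitThreadCount(baseline, 5_000));
    }
}
```

Polling keeps the common case fast and still fails if the count never settles, at the cost of a slightly more involved helper than a plain sleep.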

> The only question remains whether a bug should be filed for the
> discrepancy between the thread counters obtained from ThreadMXBean and
> the ones coming from different paths.

I'm unclear what the "different paths" are.

David
-----

> -JB-
>
>>
>> David
>> -----
>>
>>> Mandy
>

From jaroslav.bachorik at oracle.com  Wed Jul 24 05:02:13 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 24 Jul 2013 14:02:13 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFC168.6030205@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFBFA3.90608@oracle.com>
	<51EFC168.6030205@oracle.com>
Message-ID: <51EFC245.7070805@oracle.com>

On 07/24/2013 01:58 PM, David Holmes wrote:
> Aside: it is really annoying that jmx-dev mangles the subject such that
> cross-posts end up creating two different email threads :(
> 
> On 24/07/2013 9:50 PM, Jaroslav Bachorik wrote:
>> On 07/24/2013 01:21 PM, David Holmes wrote:
>>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>>
>>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>>> So we have 2 kinds of issues here:
>>>>> 1) the test related, like Thread state checking, we can fix them in
>>>>> the test
>>>>> 2) MBean.getThreadCount() issue, we can create a bug to trace it (add
>>>>> your test case to the bug), and add a workaround (sleep or call 2
>>>>> times) in the test to make the test pass. Mandy is the expert and
>>>>> better to get her opinion.
>>>>
>>>> It's probably a race in the VM implementation in determining the thread
>>>> count.   You will need to diagnose the VM implementation and compare
>>>> the
>>>> thread list and the implementation of getting the thread count (check
>>>> hotspot/src/share/vm/services/threadService.cpp)
>>>
>>> There is a considerable code path between the point where a terminating
>>> thread causes Thread.join() to be allowed to return, and the point where
>>> the live thread count gets decremented. So using join() does not help
>>> here. Arguably JVMTI should have based its counts around the lifecycle
>>> of the Java thread not the underlying native thread.
>>
>> So, if I understand it correctly, it is not possible to get 100%
>> accuracy of the thread related counters in situations when you create
>> and terminate a number of threads rapidly.
> 
> Correct.
> 
>> In that case this test could be fixed with a small waiting period after
>> all the joined threads were terminated - just to make sure that all the
>> exiting threads were properly collected.
> 
> Yes.
> 
>> The only question remains whether a bug should be filed for the
>> discrepancy between the thread counters obtained from ThreadMXBean and
>> the ones coming from different paths.
> 
> I'm unclear what the "different paths" are.

Hm, there might be only one "different path" in Java -
Thread.dumpStack() and Thread.getAllStackTraces()

-JB-

> 
> David
> -----
> 
>> -JB-
>>
>>>
>>> David
>>> -----
>>>
>>>> Mandy
>>


From jaroslav.bachorik at oracle.com  Wed Jul 24 05:49:34 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 24 Jul 2013 14:49:34 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFC951.70704@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
Message-ID: <51EFCD5E.3090007@oracle.com>

On 07/24/2013 02:32 PM, Chris Hegarty wrote:
> On 24/07/2013 12:21, David Holmes wrote:
>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>
>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>> So we have 2 kinds of issues here:
>>>> 1) the test related, like Thread state checking, we can fix them in
>>>> the test
>>>> 2) MBean.getThreadCount() issue, we can create a bug to trace it (add
>>>> your test case to the bug), and add a workaround (sleep or call 2
>>>> times) in the test to make the test pass. Mandy is the expert and
>>>> better to get her opinion.
>>>
>>> It's probably a race in the VM implementation in determining the thread
>>> count. You will need to diagnose the VM implementation and compare the
>>> thread list and the implementation of getting the thread count (check
>>> hotspot/src/share/vm/services/threadService.cpp)
>>
>> There is a considerable code path between the point where a terminating
>> thread causes Thread.join() to be allowed to return, and the point where
>> the live thread count gets decremented. So using join() does not help
>> here. Arguably JVMTI should have based its counts around the lifecycle
>> of the Java thread not the underlying native thread.
> 
> It appears, from my reading of the code, that this situation ( a thread
> exiting ) should be handled. Or maybe I'm looking at the wrong interface.
> 
> JavaThread::exit(...) {
>   ...
>   ThreadService::current_thread_exiting(this);
>   ...
>   ensure_join(..)
>   ...
> }
> 
> So the exiting thread should be removed from the live thread count
> before Thread.join returns.

Unfortunately, ensure_join(...) is called on line 1860 but
Threads::remove(this), which does the actual cleanup of the live threads
counter, is called only on line 1919. That leaves a window of at least a
few ns in which the thread is reported as terminated in Java but the
counters haven't been updated yet.

-JB-

> 
> -Chris.
> 
>>
>> David
>> -----
>>
>>> Mandy


From jaroslav.bachorik at oracle.com  Wed Jul 24 07:08:02 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 24 Jul 2013 16:08:02 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFD3F5.3060209@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com> <51EFD3F5.3060209@oracle.com>
Message-ID: <51EFDFC2.20503@oracle.com>

On 07/24/2013 03:17 PM, Chris Hegarty wrote:
> On 24/07/2013 13:49, Jaroslav Bachorik wrote:
>> On 07/24/2013 02:32 PM, Chris Hegarty wrote:
>>> On 24/07/2013 12:21, David Holmes wrote:
>>>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>>>
>>>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>>>> So we have 2 kinds of issues here:
>>>>>> 1) the test related, like Thread state checking, we can fix them in
>>>>>> the test
>>>>>> 2) MBean.getThreadCount() issue, we can create a bug to trace it (add
>>>>>> your test case to the bug), and add a workaround (sleep or call 2
>>>>>> times) in the test to make the test pass. Mandy is the expert and
>>>>>> better to get her opinion.
>>>>>
>>>>> It's probably a race in the VM implementation in determining the
>>>>> thread
>>>>> count. You will need to diagnose the VM implementation and compare the
>>>>> thread list and the implementation of getting the thread count (check
>>>>> hotspot/src/share/vm/services/threadService.cpp)
>>>>
>>>> There is a considerable code path between the point where a terminating
>>>> thread causes Thread.join() to be allowed to return, and the point
>>>> where
>>>> the live thread count gets decremented. So using join() does not help
>>>> here. Arguably JVMTI should have based its counts around the lifecycle
>>>> of the Java thread not the underlying native thread.
>>>
>>> It appears, from my reading of the code, that this situation ( a thread
>>> exiting ) should be handled. Or maybe I'm looking at the wrong
>>> interface.
>>>
>>> JavaThread::exit(...) {
>>>    ...
>>>    ThreadService::current_thread_exiting(this);
>>>    ...
>>>    ensure_join(..)
>>>    ...
>>> }
>>>
>>> So the exiting thread should be removed from the live thread count
>>> before Thread.join returns.
>>
>> Unfortunately, ensure_join(...) is called on line 1860 but
>> Threads::remove(this), which does the actual cleanup of the live threads
>> counter, is called only on line 1919, leaving at least a few ns window
>> when the thread is reported as terminated in java but the counters
>> haven't been updated yet.
> 
> Again, maybe I'm missing something but,
> 
> static jlong get_live_thread_count() {
>   return _live_threads_count->get_value() - _exiting_threads_count;
> }
> 
>  ... and current_thread_exiting(..) increments _exiting_threads_count, no?

Well, apparently it does.

I am a complete stranger to the concurrency issues in the hotspot -
would it be possible that in ThreadService::remove_thread(..) the
_exiting_threads_count is decremented but _live_threads_count hasn't
been updated yet when someone calls the get_live_thread_count() function?
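To make the suspected interleaving concrete, here is a hypothetical Java model of the two counters: writers update them under a lock standing in for `Threads_lock`, while the reader computes `live - exiting` without taking it. It is not HotSpot code; the update order in `removeThread` (exiting count first, live count second) follows the scenario asked about above. The `betweenDecrements` hook exists only so the demo can observe the intermediate state deterministically; it stands in for another thread's read landing at that moment.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of _live_threads_count / _exiting_threads_count; not HotSpot code.
public class RacyThreadCounters {
    private final ReentrantLock threadsLock = new ReentrantLock(); // stands in for Threads_lock
    private volatile long live;    // models _live_threads_count
    private volatile long exiting; // models _exiting_threads_count

    void addThread() {
        threadsLock.lock();
        try { live++; } finally { threadsLock.unlock(); }
    }

    // Models current_thread_exiting(): happens before Thread.join() is released.
    void currentThreadExiting() {
        threadsLock.lock();
        try { exiting++; } finally { threadsLock.unlock(); }
    }

    // Models remove_thread(): happens later, possibly after join() has returned.
    void removeThread(Runnable betweenDecrements) {
        threadsLock.lock();
        try {
            exiting--;
            betweenDecrements.run(); // a lock-free read landing here sees a stale live count
            live--;
        } finally { threadsLock.unlock(); }
    }

    // Lock-free read: live and exiting may be observed at different moments.
    long liveThreadCount() {
        return live - exiting;
    }

    public static void main(String[] args) {
        RacyThreadCounters c = new RacyThreadCounters();
        c.addThread();
        c.addThread();
        c.currentThreadExiting();
        System.out.println(c.liveThreadCount()); // prints 1
        long[] seen = new long[1];
        c.removeThread(() -> seen[0] = c.liveThreadCount());
        System.out.println(seen[0]);             // prints 2: one too high
        System.out.println(c.liveThreadCount()); // prints 1
    }
}
```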

-JB-

> 
> -Chris.
> 
>>
>> -JB-
>>
>>>
>>> -Chris.
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>> Mandy
>>


From david.holmes at oracle.com  Wed Jul 24 22:07:01 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 25 Jul 2013 15:07:01 +1000
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFDFC2.20503@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com> <51EFD3F5.3060209@oracle.com>
	<51EFDFC2.20503@oracle.com>
Message-ID: <51F0B275.4060906@oracle.com>

On 25/07/2013 12:08 AM, Jaroslav Bachorik wrote:
> On 07/24/2013 03:17 PM, Chris Hegarty wrote:
>> On 24/07/2013 13:49, Jaroslav Bachorik wrote:
>>> On 07/24/2013 02:32 PM, Chris Hegarty wrote:
>>>> On 24/07/2013 12:21, David Holmes wrote:
>>>>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>>>>
>>>>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>>>>> So we have 2 kinds of issues here:
>>>>>>> 1) the test related, like Thread state checking, we can fix them in
>>>>>>> the test
>>>>>>> 2) MBean.getThreadCount() issue, we can create a bug to trace it (add
>>>>>>> your test case to the bug), and add a workaround (sleep or call 2
>>>>>>> times) in the test to make the test pass. Mandy is the expert and
>>>>>>> better to get her opinion.
>>>>>>
>>>>>> It's probably a race in the VM implementation in determining the
>>>>>> thread
>>>>>> count. You will need to diagnose the VM implementation and compare the
>>>>>> thread list and the implementation of getting the thread count (check
>>>>>> hotspot/src/share/vm/services/threadService.cpp)
>>>>>
>>>>> There is a considerable code path between the point where a terminating
>>>>> thread causes Thread.join() to be allowed to return, and the point
>>>>> where
>>>>> the live thread count gets decremented. So using join() does not help
>>>>> here. Arguably JVMTI should have based its counts around the lifecycle
>>>>> of the Java thread not the underlying native thread.
>>>>
>>>> It appears, from my reading of the code, that this situation ( a thread
>>>> exiting ) should be handled. Or maybe I'm looking at the wrong
>>>> interface.
>>>>
>>>> JavaThread::exit(...) {
>>>>     ...
>>>>     ThreadService::current_thread_exiting(this);
>>>>     ...
>>>>     ensure_join(..)
>>>>     ...
>>>> }
>>>>
>>>> So the exiting thread should be removed from the live thread count
>>>> before Thread.join returns.
>>>
>>> Unfortunately, ensure_join(...) is called on line 1860 but
>>> Threads::remove(this), which does the actual cleanup of the live threads
>>> counter, is called only on line 1919, leaving at least a few ns window
>>> when the thread is reported as terminated in java but the counters
>>> haven't been updated yet.
>>
>> Again, maybe I'm missing something but,
>>
>> static jlong get_live_thread_count()        { return
>> _live_threads_count->get_value() - _exiting_threads_count; }
>>
>>   ... and current_thread_exiting(..) increments _exiting_threads_count, no?
>
> Well, apparently it does.

Yes. Thanks Chris I completely missed the use of the 
_exiting_threads_count to address this very issue.

> I am a complete stranger to the concurrency issues in the hotspot -
> would it be possible that in ThreadService::remove_thread(..) the
> _exiting_threads_count is decremented but _live_threads_count hasn't
> been updated yet when someone calls the get_live_thread_count() function?

Yes. Updates are guarded by acquiring the Threads_lock, but reads are 
not. So it is indeed possible to request the live count between the 
decrement of the exiting count and the decrement of the live count 
itself. Mind you that is an extremely small window of opportunity in 
terms of this bug manifesting as often as it does.

Because get_live_thread_count combines two variables (live minus exiting), it has to
use the same synchronization as is used to update those variables to 
ensure it returns a valid value. We can't grab the Threads_lock directly 
in get_live_thread_count as it is already called from code that holds 
the lock. So we would have to push this out to management.cpp's 
get_long_attribute.
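Continuing with a hypothetical Java model of the same two counters, the fix described here amounts to having the reader take the lock the writers use, at the outermost caller rather than inside the helper that is already called with the lock held. Here `liveThreadCount` plays the role of `get_live_thread_count` and only checks that the lock is held, while `getLiveThreadCountAttribute` loosely mirrors the suggested change in `get_long_attribute`; all names are illustrative and this is not HotSpot code.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the proposed fix; not HotSpot code.
public class LockedThreadCounters {
    private final ReentrantLock threadsLock = new ReentrantLock(); // stands in for Threads_lock
    private long live;    // only touched while holding threadsLock
    private long exiting; // only touched while holding threadsLock

    void addThread() {
        threadsLock.lock();
        try { live++; } finally { threadsLock.unlock(); }
    }

    void currentThreadExiting() {
        threadsLock.lock();
        try { exiting++; } finally { threadsLock.unlock(); }
    }

    void removeThread(Runnable betweenDecrements) {
        threadsLock.lock();
        try {
            exiting--;
            betweenDecrements.run();
            live--;
        } finally { threadsLock.unlock(); }
    }

    // Plays the role of get_live_thread_count(): the caller must already hold the lock.
    long liveThreadCount() {
        if (!threadsLock.isHeldByCurrentThread()) {
            throw new IllegalStateException("threadsLock must be held by the caller");
        }
        return live - exiting;
    }

    // Outer entry point: takes the writers' lock, then reads both fields consistently.
    long getLiveThreadCountAttribute() {
        threadsLock.lock();
        try { return liveThreadCount(); } finally { threadsLock.unlock(); }
    }

    public static void main(String[] args) throws InterruptedException {
        LockedThreadCounters c = new LockedThreadCounters();
        c.addThread();
        c.addThread();
        c.currentThreadExiting();
        long[] seen = new long[1];
        Thread[] reader = new Thread[1];
        c.removeThread(() -> {
            // A concurrent reader started mid-removal blocks until the lock is released.
            reader[0] = new Thread(() -> seen[0] = c.getLiveThreadCountAttribute());
            reader[0].start();
        });
        reader[0].join();
        System.out.println(seen[0]); // prints 1: the half-updated state is never visible
    }
}
```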

David
-----

> -JB-
>
>>
>> -Chris.
>>
>>>
>>> -JB-
>>>
>>>>
>>>> -Chris.
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Mandy
>>>
>

From jaroslav.bachorik at oracle.com  Thu Jul 25 05:28:02 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 25 Jul 2013 14:28:02 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51F0B275.4060906@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com> <51EFD3F5.3060209@oracle.com>
	<51EFDFC2.20503@oracle.com> <51F0B275.4060906@oracle.com>
Message-ID: <51F119D2.4080602@oracle.com>

On 07/25/2013 07:07 AM, David Holmes wrote:
> On 25/07/2013 12:08 AM, Jaroslav Bachorik wrote:
>> On 07/24/2013 03:17 PM, Chris Hegarty wrote:
>>> On 24/07/2013 13:49, Jaroslav Bachorik wrote:
>>>> On 07/24/2013 02:32 PM, Chris Hegarty wrote:
>>>>> On 24/07/2013 12:21, David Holmes wrote:
>>>>>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>>>>>
>>>>>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>>>>>> So we have 2 kinds of issues here:
>>>>>>>> 1) the test related, like Thread state checking, we can fix them in
>>>>>>>> the test
>>>>>>>> 2) MBean.getThreadCount() issue, we can create a bug to trace it
>>>>>>>> (add
>>>>>>>> your test case to the bug), and add a workaround (sleep or call 2
>>>>>>>> times) in the test to make the test pass. Mandy is the expert and
>>>>>>>> better to get her opinion.
>>>>>>>
>>>>>>> It's probably a race in the VM implementation in determining the
>>>>>>> thread
>>>>>>> count. You will need to diagnose the VM implementation and
>>>>>>> compare the
>>>>>>> thread list and the implementation of getting the thread count
>>>>>>> (check
>>>>>>> hotspot/src/share/vm/services/threadService.cpp)
>>>>>>
>>>>>> There is a considerable code path between the point where a
>>>>>> terminating
>>>>>> thread causes Thread.join() to be allowed to return, and the point
>>>>>> where
>>>>>> the live thread count gets decremented. So using join() does not help
>>>>>> here. Arguably JVMTI should have based its counts around the
>>>>>> lifecycle
>>>>>> of the Java thread not the underlying native thread.
>>>>>
>>>>> It appears, from my reading of the code, that this situation ( a
>>>>> thread
>>>>> exiting ) should be handled. Or maybe I'm looking at the wrong
>>>>> interface.
>>>>>
>>>>> JavaThread::exit(...) {
>>>>>     ...
>>>>>     ThreadService::current_thread_exiting(this);
>>>>>     ...
>>>>>     ensure_join(..)
>>>>>     ...
>>>>> }
>>>>>
>>>>> So the exiting thread should be removed from the live thread count
>>>>> before Thread.join returns.
>>>>
>>>> Unfortunately, ensure_join(...) is called on line 1860 but
>>>> Threads::remove(this), which does the actual cleanup of the live
>>>> threads
>>>> counter, is called only on line 1919, leaving at least a few ns window
>>>> when the thread is reported as terminated in java but the counters
>>>> haven't been updated yet.
>>>
>>> Again, maybe I'm missing something but,
>>>
>>> static jlong get_live_thread_count()        { return
>>> _live_threads_count->get_value() - _exiting_threads_count; }
>>>
>>>   ... and current_thread_exiting(..) increments
>>> _exiting_threads_count, no?
>>
>> Well, apparently it does.
> 
> Yes. Thanks Chris I completely missed the use of the
> _exiting_threads_count to address this very issue.
> 
>> I am a complete stranger to the concurrency issues in the hotspot -
>> would it be possible that in ThreadService::remove_thread(..) the
>> _exiting_threads_count is decremented but _live_threads_count hasn't
>> been updated yet when someone calls the get_live_thread_count() function?
> 
> Yes. Updates are guarded by acquiring the Threads_lock, but reads are
> not. So it is indeed possible to request the live count between the
> decrement of the exiting count and the decrement of the live count
> itself. Mind you that is an extremely small window of opportunity in
> terms of this bug manifesting as often as it does.
> 
> Because get_live_thread_count combines two variables (live minus exiting), it has to
> use the same synchronization as is used to update those variables to
> ensure it returns a valid value. We can't grab the Threads_lock directly
> in get_live_thread_count as it is already called from code that holds
> the lock. So we would have to push this out to management.cpp's
> get_long_attribute.

I have filed a separate issue for hotspot/svc (JDK-8021335)

For the time being I propose modifying the test to be less race-prone in
java and adding a timeout of 500ms after terminating a number of threads.

The test modifications are at
http://cr.openjdk.java.net/~jbachorik/8020875/webrev.02
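A sketch of the interim test change as described: join the terminated threads, pause for the proposed 500 ms, and only then read the count. `joinSettleAndCount` and the worker setup are hypothetical; the fixed pause makes the VM-side race unlikely to be observed rather than impossible.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class SettleThenCount {
    // Hypothetical settle delay mirroring the 500 ms proposed above.
    static final long SETTLE_MILLIS = 500;

    // Joins the given threads, pauses, then reads ThreadMXBean's live thread count.
    static int joinSettleAndCount(Thread... threads) throws InterruptedException {
        for (Thread t : threads) {
            t.join();
        }
        Thread.sleep(SETTLE_MILLIS); // time for the VM to finish its own per-thread cleanup
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        int before = mbean.getThreadCount();
        Thread[] workers = new Thread[8];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> { }, "worker-" + i);
            workers[i].start();
        }
        int after = joinSettleAndCount(workers);
        System.out.println("before=" + before + ", after=" + after);
    }
}
```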

Thanks,

-JB-

> 
> David
> -----
> 
>> -JB-
>>
>>>
>>> -Chris.
>>>
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> -Chris.
>>>>>
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> Mandy
>>>>
>>


From daniel.fuchs at oracle.com  Thu Jul 25 05:37:47 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Thu, 25 Jul 2013 14:37:47 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51F119D2.4080602@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com> <51EFD3F5.3060209@oracle.com>
	<51EFDFC2.20503@oracle.com> <51F0B275.4060906@oracle.com>
	<51F119D2.4080602@oracle.com>
Message-ID: <51F11C1B.5090601@oracle.com>

On 7/25/13 2:28 PM, Jaroslav Bachorik wrote:
>
> For the time being I propose modifying the test to be less race-prone in
> java and adding a timeout of 500ms after terminating a number of threads.
>
> The test modifications are at
> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.02
>
> Thanks,

Hi Jaroslav,

This looks good!

-- daniel


From daniel.fuchs at oracle.com  Thu Jul 25 05:41:49 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Thu, 25 Jul 2013 14:41:49 +0200
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51F11C1B.5090601@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com> <51EFD3F5.3060209@oracle.com>
	<51EFDFC2.20503@oracle.com> <51F0B275.4060906@oracle.com>
	<51F119D2.4080602@oracle.com> <51F11C1B.5090601@oracle.com>
Message-ID: <51F11D0D.7060902@oracle.com>

BTW - I wonder if you should add 8021335 in the @bug line.

-- daniel

On 7/25/13 2:37 PM, Daniel Fuchs wrote:
> On 7/25/13 2:28 PM, Jaroslav Bachorik wrote:
>>
>> For the time being I propose modifying the test to be less race-prone in
>> Java and adding a timeout of 500ms after terminating a number of threads.
>>
>> The test modifications are at
>> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.02
>>
>> Thanks,
>
> Hi Jaroslav,
>
> This looks good!
>
> -- daniel
>


From chris.hegarty at oracle.com  Wed Jul 24 05:32:17 2013
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Wed, 24 Jul 2013 13:32:17 +0100
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFB8B7.6030204@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com>
Message-ID: <51EFC951.70704@oracle.com>

On 24/07/2013 12:21, David Holmes wrote:
> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>
>> On 7/24/2013 4:50 PM, shanliang wrote:
>>> So we have 2 kinds of issues here:
>>> 1) the test related, like Thread state checking, we can fix them in
>>> the test
>>> 2) MBean.getThreadCount() issue, we can create a bug to track it (add
>>> your test case to the bug), and add a workaround (sleep or call it
>>> twice) in the test to make the test pass. Mandy is the expert, and it
>>> is better to get her opinion.
>>
>> It's probably a race in the VM implementation in determining the thread
>> count. You will need to diagnose the VM implementation and compare the
>> thread list and the implementation of getting the thread count (check
>> hotspot/src/share/vm/services/threadService.cpp)
>
> There is a considerable code path between the point where a terminating
> thread causes Thread.join() to be allowed to return, and the point where
> the live thread count gets decremented. So using join() does not help
> here. Arguably JVMTI should have based its counts around the lifecycle
> of the Java thread not the underlying native thread.

It appears, from my reading of the code, that this situation (a thread
exiting) should be handled. Or maybe I'm looking at the wrong interface.

JavaThread::exit(...) {
   ...
   ThreadService::current_thread_exiting(this);
   ...
   ensure_join(..)
   ...
}

So the exiting thread should be removed from the live thread count 
before Thread.join returns.
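A minimal sketch of the check the test effectively relies on, using only the standard java.lang.management API (the class and method names are hypothetical, not taken from ResetPeakThreadCount.java): it joins a batch of short-lived threads and reads ThreadMXBean.getThreadCount() immediately afterwards. If the VM-side count lags behind join(), some rounds report a non-zero excess; how often, if ever, depends on the VM and platform.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical repro sketch, not the actual ResetPeakThreadCount test.
public class JoinThenCount {
    static final ThreadMXBean MBEAN = ManagementFactory.getThreadMXBean();

    // Returns how many threads above the baseline are still counted
    // immediately after every worker's join() has returned.
    static int excessAfterJoin(int nThreads) throws InterruptedException {
        int baseline = MBEAN.getThreadCount();
        Thread[] workers = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            workers[i] = new Thread(() -> { }); // empty body: exits at once
            workers[i].start();
        }
        for (Thread w : workers) {
            w.join(); // w is now TERMINATED as far as Java code can tell
        }
        // Deliberately no sleep or retry: this is the read that may race
        // with the VM-side bookkeeping discussed in this thread.
        return MBEAN.getThreadCount() - baseline;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int round = 0; round < 100; round++) {
            int excess = excessAfterJoin(50);
            if (excess != 0) {
                System.out.println("round " + round + ": excess = " + excess);
            }
        }
    }
}
```

Running main() repeatedly is one way to gauge the window on a given machine; a run with no output does not show that the race is absent.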

-Chris.

>
> David
> -----
>
>> Mandy

From chris.hegarty at oracle.com  Wed Jul 24 06:17:41 2013
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Wed, 24 Jul 2013 14:17:41 +0100
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFCD5E.3090007@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com>
Message-ID: <51EFD3F5.3060209@oracle.com>

On 24/07/2013 13:49, Jaroslav Bachorik wrote:
> On 07/24/2013 02:32 PM, Chris Hegarty wrote:
>> On 24/07/2013 12:21, David Holmes wrote:
>>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>>
>>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>>> So we have 2 kinds of issues here:
>>>>> 1) the test related, like Thread state checking, we can fix them in
>>>>> the test
>>>>> 2) MBean.getThreadCount() issue, we can create a bug to track it (add
>>>>> your test case to the bug), and add a workaround (sleep or call it
>>>>> twice) in the test to make the test pass. Mandy is the expert, and it
>>>>> is better to get her opinion.
>>>>
>>>> It's probably a race in the VM implementation in determining the thread
>>>> count. You will need to diagnose the VM implementation and compare the
>>>> thread list and the implementation of getting the thread count (check
>>>> hotspot/src/share/vm/services/threadService.cpp)
>>>
>>> There is a considerable code path between the point where a terminating
>>> thread causes Thread.join() to be allowed to return, and the point where
>>> the live thread count gets decremented. So using join() does not help
>>> here. Arguably JVMTI should have based its counts around the lifecycle
>>> of the Java thread not the underlying native thread.
>>
>> It appears, from my reading of the code, that this situation (a thread
>> exiting) should be handled. Or maybe I'm looking at the wrong interface.
>>
>> JavaThread::exit(...) {
>>    ...
>>    ThreadService::current_thread_exiting(this);
>>    ...
>>    ensure_join(..)
>>    ...
>> }
>>
>> So the exiting thread should be removed from the live thread count
>> before Thread.join returns.
>
> Unfortunately, ensure_join(...) is called on line 1860 but
> Threads::remove(this), which does the actual cleanup of the live threads
> counter, is called only on line 1919, leaving a window of at least a few ns
> during which the thread is reported as terminated in Java but the counters
> haven't been updated yet.

Again, maybe I'm missing something but,

static jlong get_live_thread_count() { return _live_threads_count->get_value() - _exiting_threads_count; }

  ... and current_thread_exiting(..) increments _exiting_threads_count, no?

-Chris.

>
> -JB-
>
>>
>> -Chris.
>>
>>>
>>> David
>>> -----
>>>
>>>> Mandy
>

From chris.hegarty at oracle.com  Wed Jul 24 07:20:22 2013
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Wed, 24 Jul 2013 15:20:22 +0100
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51EFDFC2.20503@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com> <51EFD3F5.3060209@oracle.com>
	<51EFDFC2.20503@oracle.com>
Message-ID: <51EFE2A6.8010103@oracle.com>

On 24/07/2013 15:08, Jaroslav Bachorik wrote:
> On 07/24/2013 03:17 PM, Chris Hegarty wrote:
>> On 24/07/2013 13:49, Jaroslav Bachorik wrote:
>>> On 07/24/2013 02:32 PM, Chris Hegarty wrote:
>>>> On 24/07/2013 12:21, David Holmes wrote:
>>>>> On 24/07/2013 7:31 PM, Mandy Chung wrote:
>>>>>>
>>>>>> On 7/24/2013 4:50 PM, shanliang wrote:
>>>>>>> So we have 2 kinds of issues here:
>>>>>>> 1) the test related, like Thread state checking, we can fix them in
>>>>>>> the test
>>>>>>> 2) MBean.getThreadCount() issue, we can create a bug to track it (add
>>>>>>> your test case to the bug), and add a workaround (sleep or call it
>>>>>>> twice) in the test to make the test pass. Mandy is the expert, and it
>>>>>>> is better to get her opinion.
>>>>>>
>>>>>> It's probably a race in the VM implementation in determining the
>>>>>> thread
>>>>>> count. You will need to diagnose the VM implementation and compare the
>>>>>> thread list and the implementation of getting the thread count (check
>>>>>> hotspot/src/share/vm/services/threadService.cpp)
>>>>>
>>>>> There is a considerable code path between the point where a terminating
>>>>> thread causes Thread.join() to be allowed to return, and the point
>>>>> where
>>>>> the live thread count gets decremented. So using join() does not help
>>>>> here. Arguably JVMTI should have based its counts around the lifecycle
>>>>> of the Java thread not the underlying native thread.
>>>>
>>>> It appears, from my reading of the code, that this situation (a thread
>>>> exiting) should be handled. Or maybe I'm looking at the wrong
>>>> interface.
>>>>
>>>> JavaThread::exit(...) {
>>>>     ...
>>>>     ThreadService::current_thread_exiting(this);
>>>>     ...
>>>>     ensure_join(..)
>>>>     ...
>>>> }
>>>>
>>>> So the exiting thread should be removed from the live thread count
>>>> before Thread.join returns.
>>>
>>> Unfortunately, ensure_join(...) is called on line 1860 but
>>> Threads::remove(this), which does the actual cleanup of the live threads
>>> counter, is called only on line 1919, leaving a window of at least a few ns
>>> during which the thread is reported as terminated in Java but the counters
>>> haven't been updated yet.
>>
>> Again, maybe I'm missing something but,
>>
>> static jlong get_live_thread_count() { return _live_threads_count->get_value() - _exiting_threads_count; }
>>
>>   ... and current_thread_exiting(..) increments _exiting_threads_count, no?
>
> Well, apparently it does.
>
> I am a complete stranger to the concurrency issues in HotSpot -
> would it be possible that in ThreadService::remove_thread(..) the
> _exiting_threads_count is decremented but _live_threads_count hasn't
> been updated yet when someone calls the get_live_thread_count() function?

I am not familiar with the intricate workings of this code, but as a 
casual observer I would say that this must be a bug in the VM. It 
appears that the original authors did take into account exiting threads, 
and went to some lengths to provide accurate diagnostic information. If 
this is not producing the correct results, then I can only imagine there 
is a bug here.

To your specific question: yes, this would appear to be possible. I am not
sure what synchronization, if any, protects this code.
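To make the question above concrete, here is a hypothetical single-threaded model (plain Java, not HotSpot code) of two counters combined the way get_live_thread_count() combines them. It assumes, as the question does, that remove_thread() decrements the exiting count before the live count; the split removeThreadStep1/removeThreadStep2 methods exist only to expose the point where a concurrent reader could run.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model, NOT HotSpot code. The decrement order in
// removeThreadStep1/removeThreadStep2 is the assumption under discussion.
public class ThreadCounterModel {
    private final AtomicLong liveThreads = new AtomicLong();
    private final AtomicLong exitingThreads = new AtomicLong();

    void addThread() { liveThreads.incrementAndGet(); }

    // Mirrors current_thread_exiting(): happens before Thread.join() can return.
    void currentThreadExiting() { exitingThreads.incrementAndGet(); }

    // Assumed removal order; a reader may observe the state between the two steps.
    void removeThreadStep1() { exitingThreads.decrementAndGet(); }
    void removeThreadStep2() { liveThreads.decrementAndGet(); }

    // Same formula as get_live_thread_count(): two separate reads, no common lock.
    long liveThreadCount() { return liveThreads.get() - exitingThreads.get(); }

    public static void main(String[] args) {
        ThreadCounterModel m = new ThreadCounterModel();
        m.addThread();              // main thread
        m.addThread();              // worker thread
        m.currentThreadExiting();   // worker exiting; join() may already return
        System.out.println(m.liveThreadCount()); // 1: correct
        m.removeThreadStep1();
        System.out.println(m.liveThreadCount()); // 2: worker counted again
        m.removeThreadStep2();
        System.out.println(m.liveThreadCount()); // 1: correct again
    }
}
```

Each counter being atomic on its own does not make the pair consistent: even with the opposite decrement order, a reader that loads the live count before a removal and the exiting count after it sees the same transient over-count. Reading both under a common lock, or keeping a single combined counter, would avoid that (a possible direction, not a description of what the VM does).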

-Chris.


>
> -JB-
>
>>
>> -Chris.
>>
>>>
>>> -JB-
>>>
>>>>
>>>> -Chris.
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> Mandy
>>>
>

From chris.hegarty at oracle.com  Thu Jul 25 05:53:08 2013
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Thu, 25 Jul 2013 13:53:08 +0100
Subject: jmx-dev RFR: 8020875
 java/lang/management/ThreadMXBean/ResetPeakThreadCount.java fails
 intermittently
In-Reply-To: <51F119D2.4080602@oracle.com>
References: <51ED1DBE.3030304@oracle.com>	<51EE3C9B.3050604@oracle.com>	<51EE3EE2.1000202@oracle.com>	<51EE4A91.3000305@oracle.com>	<51EE4BD6.7040707@oracle.com>	<51EE50B4.8040000@oracle.com>	<51EE528F.2050302@oracle.com>	<51EE52B9.6070506@oracle.com>	<51EF6DD6.5060806@oracle.com>	<51EF6FA1.9000103@oracle.com>	<51EF7248.2070405@oracle.com>	<51EF7888.40100@oracle.com>
	<51EF8091.9030603@oracle.com> <51EF83A6.1040200@oracle.com>
	<51EF926A.3060705@oracle.com> <51EF930E.4050507@oracle.com>
	<51EF9552.1020901@oracle.com> <51EF9F0D.7040709@oracle.com>
	<51EFB8B7.6030204@oracle.com> <51EFC951.70704@oracle.com>
	<51EFCD5E.3090007@oracle.com> <51EFD3F5.3060209@oracle.com>
	<51EFDFC2.20503@oracle.com> <51F0B275.4060906@oracle.com>
	<51F119D2.4080602@oracle.com>
Message-ID: <51F11FB4.7070200@oracle.com>


On 07/25/2013 01:28 PM, Jaroslav Bachorik wrote:
> ......
>
> I have filed a separate issue for hotspot/svc (JDK-8021335)

Yes, this is probably a separate, and more involved, issue.

> For the time being I propose modifying the test to be less race-prone in
> Java and adding a timeout of 500ms after terminating a number of threads.

Sounds reasonable.
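The proposed change uses a fixed 500ms delay. A bounded polling wait is a common alternative: it returns as soon as the count settles and only waits the full 500ms in the bad case. This is a sketch of that swapped-in technique, not the code in webrev.02; awaitThreadCountAtMost and the 10ms poll interval are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: poll instead of sleeping for a fixed time.
public class ThreadCountWait {
    static final ThreadMXBean MBEAN = ManagementFactory.getThreadMXBean();

    // Returns true once getThreadCount() <= expected,
    // or false if timeoutMs elapses (or the caller is interrupted) first.
    static boolean awaitThreadCountAtMost(int expected, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (MBEAN.getThreadCount() > expected) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // the counter never settled within the bound
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        int baseline = MBEAN.getThreadCount();
        Thread t = new Thread(() -> { });
        t.start();
        t.join();
        // 500ms upper bound, matching the proposed delay; typically returns sooner.
        System.out.println(awaitThreadCountAtMost(baseline, 500));
    }
}
```

Comparing with "at most" rather than "equal" keeps the helper tolerant of the count dropping below the expected value, which an equality check would treat as a failure.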

> The test modifications are at
> http://cr.openjdk.java.net/~jbachorik/8020875/webrev.02

Looks fine.

Trivially, testFailed should be volatile (if you still need it). I
don't like that MyThread is not interruptible, but since the test runs in
othervm mode this might not be such an issue.
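A sketch of the two points above, with hypothetical names (this is not the test's actual MyThread): testFailed is volatile so a failure recorded by a worker is visible to the checking thread, and each worker treats interrupt() as a stop request, so the test can shut its workers down without relying on the othervm process exiting.

```java
// Hypothetical sketch; names do not come from ResetPeakThreadCount.java.
public class InterruptibleWorkers {
    // volatile: a worker's write is guaranteed visible to the main thread
    static volatile boolean testFailed = false;

    static final class Worker extends Thread {
        Worker() {
            setDaemon(true); // a stuck worker will not keep the VM alive
        }

        @Override
        public void run() {
            try {
                while (!isInterrupted()) {
                    Thread.sleep(10); // stand-in for real work; sleep() responds to interrupt()
                }
            } catch (InterruptedException e) {
                // interrupt() is the stop request: just exit
            } catch (RuntimeException | Error t) {
                testFailed = true; // failures in workers would otherwise go unnoticed
            }
        }
    }

    // Starts n workers, asks them to stop via interrupt(), and returns true
    // only if all of them exited within joinTimeoutMs and none failed.
    static boolean startAndInterrupt(int n, long joinTimeoutMs) throws InterruptedException {
        Worker[] workers = new Worker[n];
        for (int i = 0; i < n; i++) {
            workers[i] = new Worker();
            workers[i].start();
        }
        for (Worker w : workers) {
            w.interrupt();
        }
        for (Worker w : workers) {
            w.join(joinTimeoutMs);
            if (w.isAlive()) {
                return false;
            }
        }
        return !testFailed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(startAndInterrupt(5, 1000));
    }
}
```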

-Chris.

>
> Thanks,
>
> -JB-