From jaroslav.bachorik at oracle.com  Tue Oct  1 02:01:00 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 01 Oct 2013 11:01:00 +0200
Subject: jmx-dev RFR 6399961: The management agent doesn't support SSL
 keystore setup when loaded dynamically
Message-ID: <524A8F4C.2000908@oracle.com>

Hi,

Currently it is not possible to configure SSL parameters when loading 
the management agent. The fix is to forward any javax.net.ssl.* 
properties to the target JVM. The javax.net.ssl.* properties provided in 
the agent configuration should never replace any javax.net.ssl.* 
properties defined by the target JVM.
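
The "never replace" rule above can be sketched in a few lines of Java. This is only an illustration, not the code from the webrev: the class name, the method name and the sample paths are hypothetical, and the value of SSL_PREFIX is an assumption.

```java
import java.util.Properties;

// Hypothetical helper illustrating the forwarding rule; not taken from the webrev.
final class SslPropertyForwarder {
    // Assumed value of the prefix constant.
    static final String SSL_PREFIX = "javax.net.ssl.";

    /**
     * Copies javax.net.ssl.* entries from the agent configuration into the
     * target properties, skipping every key the target already defines.
     * Returns the number of entries that were copied.
     */
    static int forward(Properties agentConfig, Properties target) {
        int copied = 0;
        for (String key : agentConfig.stringPropertyNames()) {
            if (key.startsWith(SSL_PREFIX) && target.getProperty(key) == null) {
                target.setProperty(key, agentConfig.getProperty(key));
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Properties agentConfig = new Properties();
        agentConfig.setProperty("javax.net.ssl.keyStore", "/agent/keystore.jks");
        agentConfig.setProperty("javax.net.ssl.trustStore", "/agent/truststore.jks");
        agentConfig.setProperty("com.sun.management.jmxremote.port", "9999");

        Properties target = new Properties();
        target.setProperty("javax.net.ssl.keyStore", "/jvm/keystore.jks");

        forward(agentConfig, target);
        // The value already defined by the target JVM wins.
        System.out.println(target.getProperty("javax.net.ssl.keyStore"));
        // A value missing in the target is taken from the agent configuration.
        System.out.println(target.getProperty("javax.net.ssl.trustStore"));
    }
}
```

In a real agent the target would presumably be the System properties of the JVM the agent is loaded into.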

The issue:  https://bugs.openjdk.java.net/browse/JDK-6399961
Webrev:     http://cr.openjdk.java.net/~jbachorik/6399961/webrev.00

Thanks,

-JB-

From dmitry.samersoff at oracle.com  Tue Oct  1 04:51:46 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Tue, 01 Oct 2013 15:51:46 +0400
Subject: jmx-dev RFR 6399961: The management agent doesn't support SSL
 keystore setup when loaded dynamically
In-Reply-To: <524A8F4C.2000908@oracle.com>
References: <524A8F4C.2000908@oracle.com>
Message-ID: <524AB752.2030900@oracle.com>

Jaroslav,

Agent.java:

99:

 Since you introduce an SSL_PREFIX constant, please add one for
"com.sun.management." as well.

259:

 It's better to keep all property manipulation in agentmain and
startRemoteManagementAgent - different ways of starting the Agent might have
separate property sets and scoping rules.

Do you need to modify jcmd as well?

-Dmitry


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From jaroslav.bachorik at oracle.com  Tue Oct  1 06:03:35 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 01 Oct 2013 15:03:35 +0200
Subject: jmx-dev RFR 6399961: The management agent doesn't support SSL
 keystore setup when loaded dynamically
In-Reply-To: <524AB752.2030900@oracle.com>
References: <524A8F4C.2000908@oracle.com> <524AB752.2030900@oracle.com>
Message-ID: <524AC827.9000301@oracle.com>

Ok, thanks everyone for taking the time to review this. Dmitry's comment
about jcmd got me looking at the properties available when using
jcmd to start the management agent.

It turns out that "com.sun.management.jmxremote.ssl.config.file" already
allows exactly what this issue asks for.

So, please, disregard this change completely.

Cheers,

-JB-


From jaroslav.bachorik at oracle.com  Wed Oct  2 01:47:26 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 02 Oct 2013 10:47:26 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
Message-ID: <524BDD9E.1050100@oracle.com>

Hello,

currently the JVM uptime reported by the RuntimeMXBean is based on 
System.currentTimeMillis() which makes it susceptible to changes of the 
OS time (eg. changing timezone, NTP synchronization etc.). The uptime 
should not depend on the system time and should be calculated using a 
monotonic clock source.
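
The difference between the two clock sources can be illustrated with a small Java sketch. It is an analogy only: the actual change uses the HotSpot tick counter, whereas this sketch uses System.nanoTime() as the monotonic source, and the class and method names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; the real fix lives in HotSpot and the JNI layer.
final class UptimeSketch {
    /** Uptime derived from wall-clock timestamps; goes wrong when the OS time is changed. */
    static long wallClockUptimeMillis(long startWallMillis, long nowWallMillis) {
        return nowWallMillis - startWallMillis;
    }

    /** Uptime derived from two readings of a monotonic nanosecond counter. */
    static long monotonicUptimeMillis(long startNanos, long nowNanos) {
        return TimeUnit.NANOSECONDS.toMillis(nowNanos - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated wall clock: 5 s of real time pass, but the OS clock is set back by 60 s.
        long startWall = 1_000_000L;
        long nowWall = startWall + 5_000L - 60_000L;
        System.out.println("wall-clock uptime: " + wallClockUptimeMillis(startWall, nowWall));

        // Real monotonic readings are unaffected by changes of the OS time.
        long startNanos = System.nanoTime();
        Thread.sleep(10);
        System.out.println("monotonic uptime:  " + monotonicUptimeMillis(startNanos, System.nanoTime()));
    }
}
```

The first line printed is negative, which is the symptom described in the bug title; the second value cannot go backwards when the OS time is adjusted.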

There is already a way to get the actual JVM uptime in ticks. It is
accessible as Management::timestamp(), and the ticks are convertible to
milliseconds using Management::ticks_to_ms(ts_ticks), which makes it very
easy to switch to the monotonic clock based uptime.

The patch consists of the hotspot and jdk parts.

For hotspot, a new constant needs to be introduced in
src/share/vm/services/jmm.h and the actual logic to obtain the uptime in
milliseconds is added in src/share/vm/services/management.cpp.

For the jdk, the changes consist of adding the necessary JNI bridging
methods in order to get the new uptime, introducing the same constant
that is used in hotspot, and changing the mapfile-vers files so that
the native library builds properly.

Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00

Thanks,

-JB-

From staffan.larsen at oracle.com  Wed Oct  2 02:23:34 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 2 Oct 2013 11:23:34 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <524BDD9E.1050100@oracle.com>
References: <524BDD9E.1050100@oracle.com>
Message-ID: <4088022F-6550-4C95-86D5-640F2E737839@oracle.com>

Looks good!

Thanks,
/Staffan


From jaroslav.bachorik at oracle.com  Wed Oct  2 03:55:03 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 02 Oct 2013 12:55:03 +0200
Subject: jmx-dev RFR: 8024613
 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java
 failing intermittently
In-Reply-To: <523C459A.3080303@oracle.com>
References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com>
Message-ID: <524BFB87.10808@oracle.com>

On 20.9.2013 14:54, shanliang wrote:
> Jaroslav,
>
> It is a good idea to use the RMI Testlibrary.
>
> Better to call:
>         agent.close();
>
> at Line 55. Closing the RMIRegistry (rmid.shutdown(rmidPort), Line 55)
> does not ensure that the JMX connector does a full cleanup; it is always
> better to clean up within the test.

Thanks. Implemented.

http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01

-JB-

>
> Shanliang
>
>
> Jaroslav Bachorik wrote:
>> Please, review the following change for JDK-8024613
>>
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613
>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/
>> <http://cr.openjdk.java.net/%7Ejbachorik/8024613/webrev.00/>
>>
>> The patch takes care of intermittent test failures caused by timing
>> issues when starting the RMID process. It could happen that the RMID
>> process hasn't been properly initialized in the timeframe of 5 seconds
>> and the test would fail.
>>
>> The patch replaces the home-brewed RMID process management with the
>> one available in the RMI Testlibrary which is used by more tests and
>> therefore should be more stable.
>>
>> Thanks,
>>
>> -JB-
>


From jaroslav.bachorik at oracle.com  Wed Oct  2 03:57:06 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 02 Oct 2013 12:57:06 +0200
Subject: jmx-dev [ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <523B0B30.4020003@oracle.com>
References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com>
	<52317782.1060300@oracle.com> <523179C8.50606@oracle.com>
	<5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com>
	<5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com>
Message-ID: <524BFC02.4050800@oracle.com>

On 19.9.2013 16:33, Jaroslav Bachorik wrote:
> The updated webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>
> I've moved some of the functionality to the testlibrary.
>
> -JB -
>
> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>> Jaroslav,
>>>
>>> CustomLauncherTest.java:
>>>
>>> 102: this check could be moved to switch at ll. 108
>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>> unset.
>> Good idea. Thanks.
>>
>>> 129: I would prefer not to have a pattern like this one, even in a shell
>>> script. Could you prepare a list of VMs to check and just loop over it?
>>> It makes the test more readable. Also I think nowadays we can always use
>>> the server VM.
>> I tried to mirror the original shell test as closely as possible. It
>> would be nice if we could rely on the "server" vm only. Definitely more
>> readable.
>>
>> -JB-
>>
>>> -Dmitry
>>>
>>>
>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>> Hi Jaroslav,
>>>>>>>
>>>>>>> You need a copyright notice in the new file.
>>>>>>>
>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>> it to
>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to come
>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>
>>>>>>> Is there a reason this test can't run on OSX? I know it would need
>>>>>>> further modification but was wondering if there is something
>>>>>>> inherent in
>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>
>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>> hotspot
>>>>>>> test library's process tools available. :(
>>>>>> We have some, is there an obvious gap?
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>
>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>
>>>>> Please, stand by for the updated webrev.
>>>> I was able to get rid of the JCMD. Using the testlibrary the target
>>>> application can recognize its own PID and print it to its stdout. The
>>>> main application then just reads the stdout to parse the PID. No need
>>>> for JCMD any more.
>>>>
>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>> property. According to the jtreg web documentation
>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>> "test.java"
>>>> system property should be available but in fact is not. But it seems
>>>> that the testlibrary uses "test.jdk" system property too.
>>>>
>>>> The test does not run on OSX because nobody built the launcher
>>>> binary :)
>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>> linux-amd64 launcher while working on the test.
>>>>
>>>> While working with the test library I realized I was missing a crucial
>>>> feature (at least for my purposes) - waiting for a certain message to
>>>> appear in the stdout/stderr of the launched process. Very often I need
>>>> to wait for the target process to get to certain point before the test
>>>> can be allowed to continue - and the point is indicated by a message in
>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>> analyzed after the target process died - and are not suitable for this
>>>> kind of usage.
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> -Chris.
>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>
>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>
>>>>>>>> In order to fix this the test is rewritten in Java - the original
>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>> similarity
>>>>>>>> between the *.sh and *.java file so one needs to go through the new
>>>>>>>> source in whole.
>>>>>>>>
>>>>>>>> The changes in "launcher" files are all about adding permissions to
>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Issue  : JDK-8004926
>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>
>
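
The "wait for a certain message to appear in stdout" need described in the quoted thread can be sketched in Java roughly as follows. This is a minimal sketch, not the testlibrary code: the class name, the method name and the "PID: " line format are hypothetical, and a StringReader stands in for the stdout of a launched process.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

// Hypothetical helper; the real testlibrary API may look quite different.
final class OutputWaiter {
    /**
     * Reads lines from the process output until one satisfies the matcher.
     * Returns the matching line, or null if the stream ends first.
     */
    static String waitForLine(BufferedReader output, Predicate<String> matcher) {
        try {
            String line;
            while ((line = output.readLine()) != null) {
                if (matcher.test(line)) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for new InputStreamReader(process.getInputStream()).
        BufferedReader output =
                new BufferedReader(new StringReader("starting\nPID: 4242\nready\n"));
        // Assumed convention: the target application prints its own PID as "PID: <n>".
        String line = waitForLine(output, l -> l.startsWith("PID: "));
        System.out.println(line.substring("PID: ".length()));
    }
}
```

Unlike the "batch" tools, the caller gets control back as soon as the expected line shows up, while the target process keeps running.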


From jaroslav.bachorik at oracle.com  Wed Oct  2 03:59:33 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 02 Oct 2013 12:59:33 +0200
Subject: jmx-dev [ping] Re: RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <52308A5B.8020206@oracle.com>
References: <52308389.6060001@oracle.com> <523086E2.4050307@oracle.com>
	<52308A5B.8020206@oracle.com>
Message-ID: <524BFC95.60605@oracle.com>

On 11.9.2013 17:20, Jaroslav Bachorik wrote:
> On 09/11/2013 05:06 PM, shanliang wrote:
>> The fix looks OK for me.
>>
>> I am wondering whether, in the case of a loopback address, it is better
>> to always use "127.0.0.1" to generate a connectionId? This would make
>> sure we have a unique id.
>
> I am afraid we are getting the 127.0.1.1 variant from RMI
> (java.rmi.server.RemoteServer#getClientHost()). I don't know what else
> might break if we start fiddling around with it. For now I would rather
> keep it as simple as possible.
>
> -JB-
>
>>
>> Shanliang
>>
>> Jaroslav Bachorik wrote:
>>> Please, review this simple patch for an intermittently failing test.
>>>
>>> The test fails in cases when the connection loopback is resolved to be
>>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu. The
>>> test does not anticipate this possibility and requires the loopback
>>> address to be exactly 127.0.0.1
>>>
>>> The test will end up comparing 127.0.0.1 against 127.0.1.1 and will
>>> consider them unequal even though they are both the same loopback. The
>>> patch adds a bit of flexibility to the test, allowing any two valid
>>> loopback addresses (127.0.0.0/8) to be treated as equal.
>>>
>>> Issue  : JDK-8022220
>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00
>>>
>>> Thanks,
>>>
>>> -JB-
>>>
>>
>


From dmitry.samersoff at oracle.com  Wed Oct  2 07:11:02 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Wed, 02 Oct 2013 18:11:02 +0400
Subject: jmx-dev RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <52308389.6060001@oracle.com>
References: <52308389.6060001@oracle.com>
Message-ID: <524C2976.8020109@oracle.com>

Jaroslav,

Since a loopback address could be resolved to any address in 127.0.0.0/8,
the client and server have to use the same loopback address.

Generally speaking, 127.0.1.1 is not required to be able to talk to
127.0.0.1, and we risk getting a weird failure instead of a clear error
message.

-Dmitry



-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From jaroslav.bachorik at oracle.com  Thu Oct  3 08:02:37 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 03 Oct 2013 17:02:37 +0200
Subject: jmx-dev RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524C2976.8020109@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com>
Message-ID: <524D870D.2080804@oracle.com>

On 2.10.2013 16:11, Dmitry Samersoff wrote:
> Jaroslav,
>
> As a far as loopback address could be resolved to any of 127.0.0.0/8
> client and server have to use the same loopback address.

AFAIK, all the IPs 127.*.*.* equally designate the loopback interface. 
This might start breaking when you have more than one loopback interface 
in the system.
But all of this might be irrelevant here - the IPs are retrieved *after* 
the JMX connection has been established making it clear that they are 
reachable.

>
> Generally speaking it's not required for 127.0.1.1 to be able to talk to
> 127.0.0.1 and we are in risk to get a weird fail instead of clear error
> message.

As I said before, as long as there is only one loopback interface it is
safe to assume that all the loopback IPs are virtually identical. Once
we start considering multiple loopback interfaces, we would also need to
take into account the network interfaces the addresses are assigned to.

But it might hardly matter - it seems that the main culprit for this 
test to fail on this particular configuration was the fact that 
127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and 
makes one question the sanity of the test setup...

-JB-



From chris.hegarty at oracle.com  Thu Oct  3 08:29:48 2013
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Thu, 03 Oct 2013 16:29:48 +0100
Subject: jmx-dev RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524D870D.2080804@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com>
	<524D870D.2080804@oracle.com>
Message-ID: <524D8D6C.9050907@oracle.com>


On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote:
> .......
> But it might hardly matter - it seems that the main culprit for this
> test to fail on this particular configuration was the fact that
> 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and

I have not looked at the specifics, but if you have an InetAddress 
instance you can invoke the isLoopbackAddress() [1][2] method to 
correctly determine if the instance is a valid loopback address.
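
A minimal Java sketch of that check, assuming the addresses are available as literal strings (the class and method names are hypothetical):

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper around InetAddress.isLoopbackAddress().
final class LoopbackCheck {
    static boolean isLoopback(String literal) {
        try {
            // For a literal IP address only the format is validated; no name lookup happens.
            return InetAddress.getByName(literal).isLoopbackAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literal, e);
        }
    }

    public static void main(String[] args) {
        for (String ip : new String[] {"127.0.0.1", "127.0.1.1", "::1", "192.168.0.1"}) {
            System.out.println(ip + " -> " + isLoopback(ip));
        }
    }
}
```

Both 127.0.0.1 and 127.0.1.1 are reported as loopback addresses, as is the IPv6 loopback ::1; 192.168.0.1 is not.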

-Chris.

[1] 
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java
[2] 
http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java

> makes one question the sanity of the test setup...
>
> -JB-

From jaroslav.bachorik at oracle.com  Thu Oct  3 08:37:02 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 03 Oct 2013 17:37:02 +0200
Subject: jmx-dev RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524D8D6C.9050907@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com>
	<524D870D.2080804@oracle.com> <524D8D6C.9050907@oracle.com>
Message-ID: <524D8F1E.9050904@oracle.com>

On 3.10.2013 17:29, Chris Hegarty wrote:
>
>
> On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote:
>> .......
>> But it might hardly matter - it seems that the main culprit for this
>> test to fail on this particular configuration was the fact that
>> 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and
>
> I have not looked at the specifics, but if you have an InetAddress
> instance you can invoke the isLoopbackAddress() [1][2] method to
> correctly determine if the instance is a valid loopback address.

Yes, and exactly this method seems to have failed to recognize 127.0.0.1
as a loopback address - according to the test output.

I really can't see how because it basically compares the left-most byte 
of the IP to 127 ...
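
For illustration, the IPv4 check amounts to roughly the following. This is a hand-written sketch with hypothetical names, not the JDK source:

```java
// Hypothetical re-implementation of the IPv4 loopback test described above.
final class FirstOctet {
    /** True if the raw 4-byte IPv4 address lies in 127.0.0.0/8. */
    static boolean startsWith127(byte[] rawIpv4) {
        return rawIpv4.length == 4 && rawIpv4[0] == 127;
    }

    public static void main(String[] args) {
        System.out.println(startsWith127(new byte[] {127, 0, 0, 1}));  // true
        System.out.println(startsWith127(new byte[] {127, 0, 1, 1}));  // true
        System.out.println(startsWith127(new byte[] {10, 0, 0, 1}));   // false
    }
}
```

With a check this simple, a false result for a genuine 127.0.0.1 is hard to explain by the method itself.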

-JB-

>
> -Chris.
>
> [1]
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java
>
> [2]
> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java
>
>
>> makes one question the sanity of the test setup...
>>
>> -JB-


From chris.hegarty at oracle.com  Thu Oct  3 08:43:20 2013
From: chris.hegarty at oracle.com (Chris Hegarty)
Date: Thu, 03 Oct 2013 16:43:20 +0100
Subject: jmx-dev RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524D8F1E.9050904@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com>
	<524D870D.2080804@oracle.com> <524D8D6C.9050907@oracle.com>
	<524D8F1E.9050904@oracle.com>
Message-ID: <524D9098.6030701@oracle.com>

On 10/03/2013 04:37 PM, Jaroslav Bachorik wrote:
> On 3.10.2013 17:29, Chris Hegarty wrote:
>>
>>
>> On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote:
>>> .......
>>> But it might hardly matter - it seems that the main culprit for this
>>> test to fail on this particular configuration was the fact that
>>> 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and
>>
>> I have not looked at the specifics, but if you have an InetAddress
>> instance you can invoke the isLoopbackAddress() [1][2] method to
>> correctly determine if the instance is a valid loopback address.
>
> Yes, and exactly this method seems to have failed to determine 127.0.0.1
> being a loopback - according to the test output.
>
> I really can't see how because it basically compares the left-most byte
> of the IP to 127 ...

Hmm... if this method fails to make the correct determination then we
have problems ;-) We use isLoopbackAddress in many other networking, and
similar, tests in the jdk.

Sorry, I don't know what to say, there must be some other kind of issue
on your machine, or the address is not truly 127.0.0.1.

-Chris.


>
> -JB-
>
>>
>> -Chris.
>>
>> [1]
>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java
>>
>>
>> [2]
>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java
>>
>>
>>
>>> makes one question the sanity of the test setup...
>>>
>>> -JB-

From dmitry.samersoff at oracle.com  Thu Oct  3 12:09:55 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Thu, 03 Oct 2013 23:09:55 +0400
Subject: jmx-dev RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524D870D.2080804@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com>
	<524D870D.2080804@oracle.com>
Message-ID: <524DC103.7080401@oracle.com>

Jaroslav,

The behavior with multiple loopbacks is not clearly specified [1] and is
up to the OS developers or, more precisely, the kernel setup.

A common practice is to assign 127.*.*.* to interfaces like tun, to be
able to use some socket-related calls even if the interface is not
connected to a peer.

Another common situation is multiple loopback interfaces on a host
computer to support virtual instances.

So in my opinion, it's better to be pessimistic and not assume that
different loopback addresses are able to talk to each other.


[1]
http://tools.ietf.org/html/rfc3330


127.0.0.0/8 - This block is assigned for use as the Internet host
   loopback address.  A datagram sent by a higher level protocol to an
   address anywhere within this block should loop back inside the host.
   This is ordinarily implemented using only 127.0.0.1/32 for loopback,
   but no addresses within this block should ever appear on any network
   anywhere.

-Dmitry



-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From jaroslav.bachorik at oracle.com  Fri Oct  4 02:15:47 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Fri, 04 Oct 2013 11:15:47 +0200
Subject: jmx-dev RFR: 8022220 Intermittent test failures in
	javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524D9098.6030701@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com>
	<524D870D.2080804@oracle.com> <524D8D6C.9050907@oracle.com>
	<524D8F1E.9050904@oracle.com> <524D9098.6030701@oracle.com>
Message-ID: <524E8743.3070901@oracle.com>

On 3.10.2013 17:43, Chris Hegarty wrote:
> On 10/03/2013 04:37 PM, Jaroslav Bachorik wrote:
>> On 3.10.2013 17:29, Chris Hegarty wrote:
>>>
>>>
>>> On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote:
>>>> .......
>>>> But it might hardly matter - it seems that the main culprit for this
>>>> test to fail on this particular configuration was the fact that
>>>> 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and
>>>
>>> I have not looked at the specifics, but if you have an InetAddress
>>> instance you can invoke the isLoopbackAddress() [1][2] method to
>>> correctly determine if the instance is a valid loopback address.
>>
>> Yes, and exactly this method seems to have failed to determine 127.0.0.1
>> being a loopback - according to the test output.
>>
>> I really can't see how because it basically compares the left-most byte
>> of the IP to 127 ...
>
> Hmm... if this method fails to make the correct determination then we
> have problems ;-) We use isLoopbackAddress in many other networking, and
> similar, tests in the jdk.
>
> Sorry, I don't know what to say, there must be some other kind of issue
> on your machine, or address is not truly 127.0.0.1.

Well, it turns out that this issue was reported roughly 7 months after
it actually appeared in the test stabilization run. When digging around
for more info in the logs it became obvious that this problem had already
been covered by a separate issue and fixed for b84. Additionally, there
was some fiddling with /etc/hosts during the test run.

So, as usual, no black magic here ... just a lot of communication noise :/

Thanks, everybody, for taking the time to review this unnecessary change.

-JB-
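
For reference, the comparison the withdrawn patch was aiming for can be
sketched as follows. This is only an illustration, not the code from the
webrev: the class and method names are made up, and it relies solely on
InetAddress.getByName() (which only validates an IP literal, without a
name lookup) and InetAddress.isLoopbackAddress().

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not taken from the webrev.
final class LoopbackCompare {
    private LoopbackCompare() {}

    /** Parses an IP literal; a literal is only validated, no name lookup happens. */
    static InetAddress parse(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid address: " + literal, e);
        }
    }

    /** True when the addresses are equal, or when both are loopback addresses. */
    static boolean sameEndpoint(String a, String b) {
        InetAddress ia = parse(a);
        InetAddress ib = parse(b);
        return ia.equals(ib) || (ia.isLoopbackAddress() && ib.isLoopbackAddress());
    }
}
```

With this, 127.0.0.1 and 127.0.1.1 compare as the same endpoint, while
127.0.0.1 and a non-loopback address such as 10.0.0.1 do not.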

>
> -Chris.
>
>
>>
>> -JB-
>>
>>>
>>> -Chris.
>>>
>>> [1]
>>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java
>>>
>>>
>>>
>>> [2]
>>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java
>>>
>>>
>>>
>>>
>>>> makes one question the sanity of the test setup...
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> -Dmitry
>>>>>
>>>>>
>>>>> On 2013-09-11 18:51, Jaroslav Bachorik wrote:
>>>>>> Please, review this simple patch for an intermittently failing test.
>>>>>>
>>>>>> The test fails in cases when the connection loopback is resolved
>>>>>> to be
>>>>>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu.
>>>>>> The
>>>>>> test does not anticipate this possibility and requires the loopback
>>>>>> address to be exactly 127.0.0.1
>>>>>>
>>>>>> The test will end comparing 127.0.0.1 against 127.0.1.1 and will
>>>>>> consider them non equal even though they are both the same loopback.
>>>>>> The
>>>>>> patch adds a bit of flexibility to the test allowing for any two
>>>>>> valid
>>>>>> loopback addresses (127.0.0.0/8) to be equal.
>>>>>>
>>>>>> Issue  : JDK-8022220
>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>
>>>>>
>>>>
>>


From jaroslav.bachorik at oracle.com  Mon Oct  7 06:59:23 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 15:59:23 +0200
Subject: jmx-dev RFR 7144200:
 java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR
 enabled
Message-ID: <5252BE3B.5020607@oracle.com>

The test captures the number of loaded classes right at the start and
then checks the differences once it has finished. However, it seems that
there might be some asynchronous class loading still going on, initiated
by JFR.

The patch simply adds a loop to wait for the number of loaded classes to
settle before continuing. This should prevent the test from failing
intermittently when JFR is enabled.

Issue:  https://bugs.openjdk.java.net/browse/JDK-7144200
Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/

Cheers,

-JB-
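
A rough sketch of such a settling loop, for readers without access to the
webrev. It is not the patch itself: the class and method names are
hypothetical, the 300 ms quiet interval is the one discussed in this
thread, and the 10 s overall timeout is an assumption added here so the
loop cannot spin forever.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;
import java.util.function.LongSupplier;

// Hypothetical helper, not taken from the webrev.
final class SettleWait {
    private SettleWait() {}

    /**
     * Polls the counter until two consecutive readings, taken quietMillis
     * apart, are equal; gives up once timeoutMillis has elapsed.
     */
    static long waitToSettle(LongSupplier counter, long quietMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long last = counter.getAsLong();
        while (true) {
            try {
                Thread.sleep(quietMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
            long now = counter.getAsLong();
            if (now == last) {
                return now;                 // no change during the quiet interval
            }
            if (System.nanoTime() - deadline > 0) {
                throw new IllegalStateException("counter did not settle in time");
            }
            last = now;
        }
    }

    /** Waits for the total loaded class count to stop changing (300 ms quiet interval). */
    static long settledTotalLoadedClassCount() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return waitToSettle(bean::getTotalLoadedClassCount, 300, 10_000);
    }
}
```

A quiet interval only makes a late class load less likely to be missed;
it cannot rule one out.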

From jaroslav.bachorik at oracle.com  Mon Oct  7 07:14:14 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 16:14:14 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <523B0B30.4020003@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com>
Message-ID: <5252C1B6.2060904@oracle.com>

On 19.9.2013 16:33, Jaroslav Bachorik wrote:
> The updated webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>
> I've moved some of the functionality to the testlibrary.
>
> -JB -
>
> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>> Jaroslav,
>>>
>>> CustomLauncherTest.java:
>>>
>>> 102: this check could be moved to switch at ll. 108
>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>> unset.
>> Good idea. Thanks.
>>
>>> 129: I would prefer don't have pattern like this one ever in shell
>>> script. Could you prepare a list of VM's to check and just loop over it?
>>> It makes test better readable. Also I think nowdays we can always use
>>> server VM.
>> I tried to mirror the original shell test as closely as possible. It
>> would be nice if we could rely on the "server" vm only. Definitely more
>> readable.
>>
>> -JB-
>>
>>> -Dmitry
>>>
>>>
>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>> Hi Jaroslav,
>>>>>>>
>>>>>>> You need a copyright notice in the new file.
>>>>>>>
>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>> it to
>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to come
>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>
>>>>>>> Is there a reason this test can't run on OSX? I know it would need
>>>>>>> further modification but was wondering if there is something
>>>>>>> inherent in
>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>
>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>> hotspot
>>>>>>> test library's process tools available. :(
>>>>>> We have some, is there an obvious gap?
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>
>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>
>>>>> Please, stand by for the updated webrev.
>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>> application can recognize its own PID and print it to its stdout. The
>>>> main application then just reads the stdout to parse the PID. No need
>>>> for JCMD any more.
>>>>
>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>> property. According to the jtreg web documentation
>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>> "test.java"
>>>> system property should be available but in fact is not. But it seems
>>>> that the testlibrary uses "test.jdk" system property too.
>>>>
>>>> The test does not run on OSX because nobody built the launcher
>>>> binary :)
>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>> linux-amd64 launcher while working on the test.
>>>>
>>>> While working with the test library I realized I was missing a crucial
>>>> feature (at least for my purposes) - waiting for a certain message to
>>>> appear in the stdout/stderr of the launched process. Very often I need
>>>> to wait for the target process to get to certain point before the test
>>>> can be allowed to continue - and the point is indicated by a message in
>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>> analyzed after the target process died - and are not suitable for this
>>>> kind of usage.
>>>>
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> -Chris.
>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>
>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>
>>>>>>>> In order to fix this the test is rewritten in Java - the original
>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>> similarity
>>>>>>>> between the *.sh and *.java file so one needs to go through the new
>>>>>>>> source in whole.
>>>>>>>>
>>>>>>>> The changes in "launcher" files are all about adding permissions to
>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Issue  : JDK-8004926
>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>
>
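
The "wait for a certain message in stdout" feature mentioned in the
quoted text above could look roughly like the sketch below. It is not the
testlibrary API: the class name, the method names and the "PID=" marker
are all assumptions made for illustration. The reader would normally wrap
Process.getInputStream() of the launched process.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

// Hypothetical helper, not part of the jdk testlibrary.
final class OutputWatcher {
    private OutputWatcher() {}

    /**
     * Reads lines as they arrive and returns the first one accepted by the
     * matcher, or null if the stream ends first. Unlike "batch" capture, it
     * returns while the process is still running.
     */
    static String waitForLine(BufferedReader out, Predicate<String> matcher) {
        try {
            String line;
            while ((line = out.readLine()) != null) {
                if (matcher.test(line)) {
                    return line;
                }
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Parses a line of the assumed form "PID=<number>". */
    static long parsePid(String line) {
        return Long.parseLong(line.substring("PID=".length()).trim());
    }
}
```

In a real test the blocking read would also need a timeout, for example
by running it on a separate thread.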


From daniel.fuchs at oracle.com  Mon Oct  7 07:22:10 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Mon, 07 Oct 2013 16:22:10 +0200
Subject: jmx-dev RFR 7144200:
 java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR
 enabled
In-Reply-To: <5252BE3B.5020607@oracle.com>
References: <5252BE3B.5020607@oracle.com>
Message-ID: <5252C392.5010909@oracle.com>

Hi Jaroslav,

I am not an expert in classloading but I don't see any obvious
issue with what you propose.

I wonder whether making the test always run in /othervm mode
might make it more stable.

best regards,

-- daniel

On 10/7/13 3:59 PM, Jaroslav Bachorik wrote:
> The test captures the number of loaded classes right at the start and
> then checks the diffs when it's finished. However, it seems that there
> might be some async class loading still going on, initiated by JFR.
>
> The patch simply adds a loop to wait for the number of loaded classes to
> settle before continuing. This should prevent the test failing with JFR
> intermittently.
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-7144200
> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/
>
> Cheers,
>
> -JB-


From dmitry.samersoff at oracle.com  Mon Oct  7 07:31:27 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Mon, 07 Oct 2013 18:31:27 +0400
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252C1B6.2060904@oracle.com>
References: <52308ECC.1050304@oracle.com>	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>	<523B0B30.4020003@oracle.com>
	<5252C1B6.2060904@oracle.com>
Message-ID: <5252C5BF.4060406@oracle.com>

Jaroslav,

Looks good to me. The comments below are just nits, so feel free to
ignore them.

1.
Since FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value ever
passed as the findLibjvm parameter, it would be better to add a no-argument
overload, findLibjvm().

2.
It's better to check File.isFile(): something readable (e.g. a device) is
not always what you want here.

3. It would be good to try
ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so,
in that order, to cover platforms that ship only one VM.

-Dmitry
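
To illustrate nits 1 and 3, a sketch of the lookup is given below. It is
not the code under review: the class name is made up, the Optional return
type is a choice made here, and using the "os.arch" system property for
the library directory name is an assumption (the test computes its own
LIBARCH value).

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical helper, not taken from the webrev.
final class LibjvmLocator {
    // Candidates relative to the ARCH directory, tried in this order.
    private static final String[] CANDIDATES = {
        "libjvm.so", "server/libjvm.so", "client/libjvm.so"
    };

    private LibjvmLocator() {}

    /** Returns the first candidate that exists as a regular file. */
    static Optional<Path> findLibjvm(Path libArchDir) {
        for (String candidate : CANDIDATES) {
            Path p = libArchDir.resolve(candidate);
            if (Files.isRegularFile(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    /** No-argument overload using <test.jdk>/jre/lib/<arch> (arch name is an assumption). */
    static Optional<Path> findLibjvm() {
        String testJdk = System.getProperty("test.jdk", ".");
        String libArch = System.getProperty("os.arch");
        return findLibjvm(Paths.get(testJdk, "jre", "lib", libArch));
    }
}
```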


On 2013-10-07 18:14, Jaroslav Bachorik wrote:
> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>> The updated webrev:
>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>
>> I've moved some of the functionality to the testlibrary.
>>
>> -JB -
>>
>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>> Jaroslav,
>>>>
>>>> CustomLauncherTest.java:
>>>>
>>>> 102: this check could be moved to switch at ll. 108
>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>> unset.
>>> Good idea. Thanks.
>>>
>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>> script. Could you prepare a list of VM's to check and just loop over
>>>> it?
>>>> It makes test better readable. Also I think nowdays we can always use
>>>> server VM.
>>> I tried to mirror the original shell test as closely as possible. It
>>> would be nice if we could rely on the "server" vm only. Definitely more
>>> readable.
>>>
>>> -JB-
>>>
>>>> -Dmitry
>>>>
>>>>
>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>> Hi Jaroslav,
>>>>>>>>
>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>
>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>> it to
>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>> come
>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>
>>>>>>>> Is there a reason this test can't run on OSX? I know it would need
>>>>>>>> further modification but was wondering if there is something
>>>>>>>> inherent in
>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>
>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>> hotspot
>>>>>>>> test library's process tools available. :(
>>>>>>> We have some, is there an obvious gap?
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>
>>>>>>>
>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>
>>>>>> Please, stand by for the updated webrev.
>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>> application can recognize its own PID and print it to its stdout. The
>>>>> main application then just reads the stdout to parse the PID. No need
>>>>> for JCMD any more.
>>>>>
>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>> property. According to the jtreg web documentation
>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>> "test.java"
>>>>> system property should be available but in fact is not. But it seems
>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>
>>>>> The test does not run on OSX because nobody built the launcher
>>>>> binary :)
>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>> linux-amd64 launcher while working on the test.
>>>>>
>>>>> While working with the test library I realized I was missing a crucial
>>>>> feature (at least for my purposes) - waiting for a certain message to
>>>>> appear in the stdout/stderr of the launched process. Very often I need
>>>>> to wait for the target process to get to certain point before the test
>>>>> can be allowed to continue - and the point is indicated by a
>>>>> message in
>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>> analyzed after the target process died - and are not suitable for this
>>>>> kind of usage.
>>>>>
>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>>
>>>>>>> -Chris.
>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>
>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>
>>>>>>>>> In order to fix this the test is rewritten in Java - the original
>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>> similarity
>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>> new
>>>>>>>>> source in whole.
>>>>>>>>>
>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>> permissions to
>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>
>>
> 


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From jaroslav.bachorik at oracle.com  Mon Oct  7 07:34:52 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 16:34:52 +0200
Subject: jmx-dev RFR 7144200:
 java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR
 enabled
In-Reply-To: <5252C392.5010909@oracle.com>
References: <5252BE3B.5020607@oracle.com> <5252C392.5010909@oracle.com>
Message-ID: <5252C68C.4050608@oracle.com>

On 7.10.2013 16:22, Daniel Fuchs wrote:
> Hi Jaroslav,
>
> I am not an expert in classloading but I don't see any obvious
> issue with what you propose.

I hope there is none :) If the number of loaded classes is not changing
the test should continue immediately. The only problem would be classes
loading veeery slowly, so that the count does not increase within a 300 ms
interval. The loop would then wrongly conclude that loading has settled
and we would end up with the same failure as now :(

>
> I wonder whether making the test always run in /othervm mode
> might make it more stable.

I don't know. I was not able to reproduce the problem but from the 
description it sounds like it is spotted only with JFR enabled. So, I 
suppose, running it in othervm would not help at all.

-JB-

>
> best regards,
>
> -- daniel
>
> On 10/7/13 3:59 PM, Jaroslav Bachorik wrote:
>> The test captures the number of loaded classes right at the start and
>> then checks the diffs when it's finished. However, it seems that there
>> might be some async class loading still going on, initiated by JFR.
>>
>> The patch simply adds a loop to wait for the number of loaded classes to
>> settle before continuing. This should prevent the test failing with JFR
>> intermittently.
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-7144200
>> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/
>>
>> Cheers,
>>
>> -JB-
>


From staffan.larsen at oracle.com  Mon Oct  7 07:35:47 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Mon, 7 Oct 2013 16:35:47 +0200
Subject: jmx-dev RFR 7144200:
	java/lang/management/ClassLoadingMXBean/LoadCounts.java
	failed with JFR enabled
In-Reply-To: <5252BE3B.5020607@oracle.com>
References: <5252BE3B.5020607@oracle.com>
Message-ID: <F9D0D308-A172-4747-A0BD-EC8670C8BCCF@oracle.com>

This will make it less likely for the test to fail, but does not guarantee it since there is nothing that says classloading will be done in 300 ms. Any failures will unfortunately be harder to reproduce. (And the test is now 300 ms slower to run.)

A different solution is to only allow the number of classes to increase, but not be strict about the increase being exactly 4. That would of course make the test less stringent, but very stable.

In any case, I think the test has to be marked as /othervm since running other tests simultaneously will impact this test.

S/taffan 
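
The two kinds of check can be contrasted in a few lines. This is only a
sketch with made-up names; what minimum increase (if any) the relaxed
check should require is left as a parameter, since that was not settled
in this thread.

```java
// Hypothetical helper contrasting the strict and the relaxed check.
final class LoadCountCheck {
    private LoadCountCheck() {}

    /** Strict form: the count must have grown by exactly the expected amount. */
    static boolean strictDelta(long before, long after, long expected) {
        return after - before == expected;
    }

    /** Relaxed form: the count must have grown by at least the given minimum. */
    static boolean atLeastDelta(long before, long after, long minimum) {
        return after - before >= minimum;
    }
}
```

With 4 expected classes, a run in which 3 extra classes are loaded in the
background fails the strict check but passes the relaxed one.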

On 7 okt 2013, at 15:59, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> The test captures the number of loaded classes right at the start and then checks the diffs when it's finished. However, it seems that there might be some async class loading still going on, initiated by JFR.
> 
> The patch simply adds a loop to wait for the number of loaded classes to settle before continuing. This should prevent the test failing with JFR intermittently.
> 
> Issue:  https://bugs.openjdk.java.net/browse/JDK-7144200
> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/
> 
> Cheers,
> 
> -JB-


From jaroslav.bachorik at oracle.com  Mon Oct  7 09:39:06 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 18:39:06 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252C5BF.4060406@oracle.com>
References: <52308ECC.1050304@oracle.com>	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>	<523B0B30.4020003@oracle.com>
	<5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com>
Message-ID: <5252E3AA.5010702@oracle.com>

On 7.10.2013 16:31, Dmitry Samersoff wrote:
> Jaroslav,
>
> Looks good to me. The comments below are just nits, so feel free to
> ignore them.
>
> 1.
> Since FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value ever
> passed as the findLibjvm parameter, it would be better to add a no-argument
> overload, findLibjvm().

Ok. It will make the code a bit more readable.

>
> 2.
> It's better to check File.isFile(): something readable (e.g. a device) is
> not always what you want here.

Can you elaborate why checking for the current user being able to read 
the actual library file might be wrong?

>
> 3. It would be good to try
> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so,
> in that order, to cover platforms that ship only one VM.

Ok.

-JB-

>
> -Dmitry
>
>
> On 2013-10-07 18:14, Jaroslav Bachorik wrote:
>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>> The updated webrev:
>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>
>>> I've moved some of the functionality to the testlibrary.
>>>
>>> -JB -
>>>
>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>> Jaroslav,
>>>>>
>>>>> CustomLauncherTest.java:
>>>>>
>>>>> 102: this check could be moved to switch at ll. 108
>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>> unset.
>>>> Good idea. Thanks.
>>>>
>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>> it?
>>>>> It makes test better readable. Also I think nowdays we can always use
>>>>> server VM.
>>>> I tried to mirror the original shell test as closely as possible. It
>>>> would be nice if we could rely on the "server" vm only. Definitely more
>>>> readable.
>>>>
>>>> -JB-
>>>>
>>>>> -Dmitry
>>>>>
>>>>>
>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>> Hi Jaroslav,
>>>>>>>>>
>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>
>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>> it to
>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>> come
>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>
>>>>>>>>> Is there a reason this test can't run on OSX? I know it would need
>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>> inherent in
>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>
>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>> hotspot
>>>>>>>>> test library's process tools available. :(
>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>
>>>>>>>>
>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>
>>>>>>> Please, stand by for the updated webrev.
>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>>> application can recognize its own PID and print it to its stdout. The
>>>>>> main application then just reads the stdout to parse the PID. No need
>>>>>> for JCMD any more.
>>>>>>
>>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>>> property. According to the jtreg web documentation
>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>> "test.java"
>>>>>> system property should be available but in fact is not. But it seems
>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>
>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>> binary :)
>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>> linux-amd64 launcher while working on the test.
>>>>>>
>>>>>> While working with the test library I realized I was missing a crucial
>>>>>> feature (at least for my purposes) - waiting for a certain message to
>>>>>> appear in the stdout/stderr of the launched process. Very often I need
>>>>>> to wait for the target process to get to certain point before the test
>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>> message in
>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>> analyzed after the target process died - and are not suitable for this
>>>>>> kind of usage.
>>>>>>
>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>> -Chris.
>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>
>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>
>>>>>>>>>> In order to fix this the test is rewritten in Java - the original
>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>> similarity
>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>> new
>>>>>>>>>> source in whole.
>>>>>>>>>>
>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>> permissions to
>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>
>>>
>>
>
>


From dmitry.samersoff at oracle.com  Mon Oct  7 09:47:25 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Mon, 07 Oct 2013 20:47:25 +0400
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252E3AA.5010702@oracle.com>
References: <52308ECC.1050304@oracle.com>	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>	<523B0B30.4020003@oracle.com>
	<5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com>
	<5252E3AA.5010702@oracle.com>
Message-ID: <5252E59D.30200@oracle.com>

Jaroslav,

> Can you elaborate why checking for the current user being able to read
> the actual library file might be wrong?

It's not applicable to this particular test case (which is why I marked
it as a nit), but a generic security rule is to always check that we are
dealing with a regular file.

Try to link any block device to libjvm.so and see what happens.

-Dmitry
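
The difference between the two checks can be shown with java.nio.file.
This is a sketch with a made-up class name, not the code from the webrev.
Files.isRegularFile() follows symbolic links by default, so a link named
libjvm.so that points at a device or a directory is rejected even though
it may be readable.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not taken from the webrev.
final class LibraryFileCheck {
    private LibraryFileCheck() {}

    /** Accepts only readable regular files; directories and devices are rejected. */
    static boolean isUsableLibrary(Path p) {
        return Files.isRegularFile(p) && Files.isReadable(p);
    }
}
```

A readability check alone would accept a directory, for example, since
directories are usually readable as well.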


On 2013-10-07 20:39, Jaroslav Bachorik wrote:
> On 7.10.2013 16:31, Dmitry Samersoff wrote:
>> Jaroslav,
>>
>> Looks good to me. The comments below are just nits, so feel free to
>> ignore them.
>>
>> 1.
>> Since FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value ever
>> passed as the findLibjvm parameter, it would be better to add a
>> no-argument overload, findLibjvm().
> 
> Ok. It will make the code a bit more readable.
> 
>>
>> 2.
>> It's better to check File.isFile(): something readable (e.g. a device) is
>> not always what you want here.
> 
> Can you elaborate why checking for the current user being able to read
> the actual library file might be wrong?
> 
>>
>> 3. It would be good to try
>> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so,
>> in that order, to cover platforms that ship only one VM.
> 
> Ok.
> 
> -JB-
> 
>>
>> -Dmitry
>>
>>
>> On 2013-10-07 18:14, Jaroslav Bachorik wrote:
>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>>> The updated webrev:
>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>>
>>>> I've moved some of the functionality to the testlibrary.
>>>>
>>>> -JB -
>>>>
>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>>> Jaroslav,
>>>>>>
>>>>>> CustomLauncherTest.java:
>>>>>>
>>>>>> 102: this check could be moved to switch at ll. 108
>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>>> unset.
>>>>> Good idea. Thanks.
>>>>>
>>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>>> it?
>>>>>> It makes test better readable. Also I think nowdays we can always use
>>>>>> server VM.
>>>>> I tried to mirror the original shell test as closely as possible. It
>>>>> would be nice if we could rely on the "server" vm only. Definitely
>>>>> more
>>>>> readable.
>>>>>
>>>>> -JB-
>>>>>
>>>>>> -Dmitry
>>>>>>
>>>>>>
>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>
>>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>>
>>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>>> it to
>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>>> come
>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>>
>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would
>>>>>>>>>> need
>>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>>> inherent in
>>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>>
>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>>> hotspot
>>>>>>>>>> test library's process tools available. :(
>>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>>
>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>>
>>>>>>>> Please, stand by for the updated webrev.
>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>>>> application can recognize its own PID and print it to its stdout.
>>>>>>> The
>>>>>>> main application then just reads the stdout to parse the PID. No
>>>>>>> need
>>>>>>> for JCMD any more.
>>>>>>>
>>>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>>>> property. According to the jtreg web documentation
>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>>> "test.java"
>>>>>>> system property should be available but in fact is not. But it seems
>>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>>
>>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>>> binary :)
>>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>>> linux-amd64 launcher while working on the test.
>>>>>>>
>>>>>>> While working with the test library I realized I was missing a
>>>>>>> crucial
>>>>>>> feature (at least for my purposes) - waiting for a certain
>>>>>>> message to
>>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>>> need
>>>>>>> to wait for the target process to get to certain point before the
>>>>>>> test
>>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>>> message in
>>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>>> analyzed after the target process died - and are not suitable for
>>>>>>> this
>>>>>>> kind of usage.
>>>>>>>
>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Chris.
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>>
>>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>>
>>>>>>>>>>> In order to fix this the test is rewritten in Java - the
>>>>>>>>>>> original
>>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>>> similarity
>>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>>> new
>>>>>>>>>>> source in whole.
>>>>>>>>>>>
>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>> permissions to
>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
> 


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From jaroslav.bachorik at oracle.com  Mon Oct  7 09:55:53 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 18:55:53 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252E59D.30200@oracle.com>
References: <52308ECC.1050304@oracle.com>	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>	<523B0B30.4020003@oracle.com>
	<5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com>
	<5252E3AA.5010702@oracle.com> <5252E59D.30200@oracle.com>
Message-ID: <5252E799.4090402@oracle.com>

On 7.10.2013 18:47, Dmitry Samersoff wrote:
> Jaroslav,
>
>> Can you elaborate why checking for the current user being able to read
>> the actual library file might be wrong?
>
> It's not applicable to this particular testcase (so I'd marked it as a
> nit) but a generic security rule is to always check that we deal with a
> regular file.
>
> Try to link any block device to libjvm.so and see what happens.

Ok. I see - in that case it would probably be good to check both that it 
is a regular file and that it is readable.
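
A minimal sketch of such a combined check, assuming java.nio.file is 
available to the test. The class and method names (LibraryCheck, 
isUsableLibrary) are made up for illustration and are not taken from the 
webrev:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class LibraryCheck {
    /** True only if the path resolves to a regular file the current user can read. */
    static boolean isUsableLibrary(Path candidate) {
        // Files.isRegularFile follows symbolic links by default, so a link
        // pointing at a device or a directory is rejected here even though
        // the target itself may be readable.
        return Files.isRegularFile(candidate) && Files.isReadable(candidate);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("libjvm", ".so");
        Path dir = Files.createTempDirectory("libdir");
        System.out.println("regular file accepted: " + isUsableLibrary(file));
        System.out.println("directory accepted:    " + isUsableLibrary(dir));
        Files.delete(file);
        Files.delete(dir);
    }
}
```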

-JB-


>
> -Dmitry
>
>
>
> On 2013-10-07 20:39, Jaroslav Bachorik wrote:
>> On 7.10.2013 16:31, Dmitry Samersoff wrote:
>>> Jaroslav,
>>>
>>> Looks good to me, the comments below are just nits - so feel free to
>>> ignore them.
>>>
>>> 1.
>>> As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for
>>> findLibjvm parameter, It's better to create an overload function
>>> findLibjvm().
>>
>> Ok. It will make the code a bit more readable.
>>
>>>
>>> 2.
>>> it's better to check for File.isFile() - readable (e.g. device) is not
>>> always what you want here.
>>
>> Can you elaborate why checking for the current user being able to read
>> the actual library file might be wrong?
>>
>>>
>>> 3. It's good to try
>>> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so
>>> in order for the possible platforms with the only vm
>>
>> Ok.
>>
>> -JB-
>>
>>>
>>> -Dmitry
>>>
>>>
>>> On 2013-10-07 18:14, Jaroslav Bachorik wrote:
>>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>>>> The updated webrev:
>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>>>
>>>>> I've moved some of the functionality to the testlibrary.
>>>>>
>>>>> -JB -
>>>>>
>>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>>>> Jaroslav,
>>>>>>>
>>>>>>> CustomLauncherTest.java:
>>>>>>>
>>>>>>> 102: this check could be moved to switch at ll. 108
>>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>>>> unset.
>>>>>> Good idea. Thanks.
>>>>>>
>>>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>>>> it?
>>>>>>> It makes test better readable. Also I think nowdays we can always use
>>>>>>> server VM.
>>>>>> I tried to mirror the original shell test as closely as possible. It
>>>>>> would be nice if we could rely on the "server" vm only. Definitely
>>>>>> more
>>>>>> readable.
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>> -Dmitry
>>>>>>>
>>>>>>>
>>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>>
>>>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>>>
>>>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>>>> it to
>>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>>>> come
>>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>>>
>>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would
>>>>>>>>>>> need
>>>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>>>> inherent in
>>>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>>>
>>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>>>> hotspot
>>>>>>>>>>> test library's process tools available. :(
>>>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>>>
>>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>>>
>>>>>>>>> Please, stand by for the updated webrev.
>>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>>>>> application can recognize its own PID and print it to its stdout.
>>>>>>>> The
>>>>>>>> main application then just reads the stdout to parse the PID. No
>>>>>>>> need
>>>>>>>> for JCMD any more.
>>>>>>>>
>>>>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>>>>> property. According to the jtreg web documentation
>>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>>>> "test.java"
>>>>>>>> system property should be available but in fact is not. But it seems
>>>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>>>
>>>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>>>> binary :)
>>>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>>>> linux-amd64 launcher while working on the test.
>>>>>>>>
>>>>>>>> While working with the test library I realized I was missing a
>>>>>>>> crucial
>>>>>>>> feature (at least for my purposes) - waiting for a certain
>>>>>>>> message to
>>>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>>>> need
>>>>>>>> to wait for the target process to get to certain point before the
>>>>>>>> test
>>>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>>>> message in
>>>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>>>> analyzed after the target process died - and are not suitable for
>>>>>>>> this
>>>>>>>> kind of usage.
>>>>>>>>
>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Chris.
>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>>>
>>>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>>>
>>>>>>>>>>>> In order to fix this the test is rewritten in Java - the
>>>>>>>>>>>> original
>>>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>>>> similarity
>>>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>>>> new
>>>>>>>>>>>> source in whole.
>>>>>>>>>>>>
>>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>>> permissions to
>>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>


From jaroslav.bachorik at oracle.com  Mon Oct  7 10:10:33 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 19:10:33 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252C5BF.4060406@oracle.com>
References: <52308ECC.1050304@oracle.com>	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>	<523B0B30.4020003@oracle.com>
	<5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com>
Message-ID: <5252EB09.2090103@oracle.com>

On 7.10.2013 16:31, Dmitry Samersoff wrote:
> Jaroslav,
>
> Looks good to me, the comments below are just nits - so feel free to
> ignore them.
>
> 1.
> As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for
> findLibjvm parameter, It's better to create an overload function
> findLibjvm().
>
> 2.
> it's better to check for File.isFile() - readable (e.g. device) is not
> always what you want here.
>
> 3. It's good to try
> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so
> in order for the possible platforms with the only vm

Nits not ignored - 
http://cr.openjdk.java.net/~jbachorik/8004926/webrev.04/ :)
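
For readers without access to the webrev, the idea behind nit 3 can be 
sketched roughly as follows. This is an illustration under assumptions, 
not the code from webrev.04: the class name LibjvmLocator and the 
CANDIDATES list are invented here, and only the three locations named in 
the review are probed, in the order they were listed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class LibjvmLocator {
    // Locations relative to the lib/ARCH directory, probed in this order.
    static final List<String> CANDIDATES =
            Arrays.asList("libjvm.so", "server/libjvm.so", "client/libjvm.so");

    /** Returns the first candidate that is a readable regular file, if any. */
    static Optional<Path> findLibjvm(Path libArchDir) {
        for (String candidate : CANDIDATES) {
            Path p = libArchDir.resolve(candidate);
            if (Files.isRegularFile(p) && Files.isReadable(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }
}
```

Keeping the candidates in a plain list means a platform with only one VM 
flavour is handled by the same loop, without a separate branch per 
platform.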

-JB-

>
> -Dmitry
>
>
> On 2013-10-07 18:14, Jaroslav Bachorik wrote:
>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>> The updated webrev:
>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>
>>> I've moved some of the functionality to the testlibrary.
>>>
>>> -JB -
>>>
>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>> Jaroslav,
>>>>>
>>>>> CustomLauncherTest.java:
>>>>>
>>>>> 102: this check could be moved to switch at ll. 108
>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>> unset.
>>>> Good idea. Thanks.
>>>>
>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>> it?
>>>>> It makes test better readable. Also I think nowdays we can always use
>>>>> server VM.
>>>> I tried to mirror the original shell test as closely as possible. It
>>>> would be nice if we could rely on the "server" vm only. Definitely more
>>>> readable.
>>>>
>>>> -JB-
>>>>
>>>>> -Dmitry
>>>>>
>>>>>
>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>> Hi Jaroslav,
>>>>>>>>>
>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>
>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>> it to
>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>> come
>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>
>>>>>>>>> Is there a reason this test can't run on OSX? I know it would need
>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>> inherent in
>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>
>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>> hotspot
>>>>>>>>> test library's process tools available. :(
>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>
>>>>>>>>
>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>
>>>>>>> Please, stand by for the updated webrev.
>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>>> application can recognize its own PID and print it to its stdout. The
>>>>>> main application then just reads the stdout to parse the PID. No need
>>>>>> for JCMD any more.
>>>>>>
>>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>>> property. According to the jtreg web documentation
>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>> "test.java"
>>>>>> system property should be available but in fact is not. But it seems
>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>
>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>> binary :)
>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>> linux-amd64 launcher while working on the test.
>>>>>>
>>>>>> While working with the test library I realized I was missing a crucial
>>>>>> feature (at least for my purposes) - waiting for a certain message to
>>>>>> appear in the stdout/stderr of the launched process. Very often I need
>>>>>> to wait for the target process to get to certain point before the test
>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>> message in
>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>> analyzed after the target process died - and are not suitable for this
>>>>>> kind of usage.
>>>>>>
>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>> -Chris.
>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>
>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>
>>>>>>>>>> In order to fix this the test is rewritten in Java - the original
>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>> similarity
>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>> new
>>>>>>>>>> source in whole.
>>>>>>>>>>
>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>> permissions to
>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>
>>>
>>
>
>


From dmitry.samersoff at oracle.com  Mon Oct  7 10:12:23 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Mon, 07 Oct 2013 21:12:23 +0400
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252EB09.2090103@oracle.com>
References: <52308ECC.1050304@oracle.com>	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>	<523B0B30.4020003@oracle.com>
	<5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com>
	<5252EB09.2090103@oracle.com>
Message-ID: <5252EB77.8090702@oracle.com>

Jaroslav,

Thumbs up!

Thank you for addressing my comments.

-Dmitry


On 2013-10-07 21:10, Jaroslav Bachorik wrote:
> On 7.10.2013 16:31, Dmitry Samersoff wrote:
>> Jaroslav,
>>
>> Looks good to me, the comments below are just nits - so feel free to
>> ignore them.
>>
>> 1.
>> As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for
>> findLibjvm parameter, It's better to create an overload function
>> findLibjvm().
>>
>> 2.
>> it's better to check for File.isFile() - readable (e.g. device) is not
>> always what you want here.
>>
>> 3. It's good to try
>> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so
>> in order for the possible platforms with the only vm
> 
> Nits not ignored -
> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.04/ :)
> 
> -JB-
> 
>>
>> -Dmitry
>>
>>
>> On 2013-10-07 18:14, Jaroslav Bachorik wrote:
>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>>> The updated webrev:
>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>>
>>>> I've moved some of the functionality to the testlibrary.
>>>>
>>>> -JB -
>>>>
>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>>> Jaroslav,
>>>>>>
>>>>>> CustomLauncherTest.java:
>>>>>>
>>>>>> 102: this check could be moved to switch at ll. 108
>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>>> unset.
>>>>> Good idea. Thanks.
>>>>>
>>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>>> it?
>>>>>> It makes test better readable. Also I think nowdays we can always use
>>>>>> server VM.
>>>>> I tried to mirror the original shell test as closely as possible. It
>>>>> would be nice if we could rely on the "server" vm only. Definitely
>>>>> more
>>>>> readable.
>>>>>
>>>>> -JB-
>>>>>
>>>>>> -Dmitry
>>>>>>
>>>>>>
>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>
>>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>>
>>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>>> it to
>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>>> come
>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>>
>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would
>>>>>>>>>> need
>>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>>> inherent in
>>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>>
>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>>> hotspot
>>>>>>>>>> test library's process tools available. :(
>>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>>
>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>>
>>>>>>>> Please, stand by for the updated webrev.
>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>>>> application can recognize its own PID and print it to its stdout.
>>>>>>> The
>>>>>>> main application then just reads the stdout to parse the PID. No
>>>>>>> need
>>>>>>> for JCMD any more.
>>>>>>>
>>>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>>>> property. According to the jtreg web documentation
>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>>> "test.java"
>>>>>>> system property should be available but in fact is not. But it seems
>>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>>
>>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>>> binary :)
>>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>>> linux-amd64 launcher while working on the test.
>>>>>>>
>>>>>>> While working with the test library I realized I was missing a
>>>>>>> crucial
>>>>>>> feature (at least for my purposes) - waiting for a certain
>>>>>>> message to
>>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>>> need
>>>>>>> to wait for the target process to get to certain point before the
>>>>>>> test
>>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>>> message in
>>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>>> analyzed after the target process died - and are not suitable for
>>>>>>> this
>>>>>>> kind of usage.
>>>>>>>
>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Chris.
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>>
>>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>>
>>>>>>>>>>> In order to fix this the test is rewritten in Java - the
>>>>>>>>>>> original
>>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>>> similarity
>>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>>> new
>>>>>>>>>>> source in whole.
>>>>>>>>>>>
>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>> permissions to
>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
> 


-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From david.holmes at oracle.com  Mon Oct  7 20:42:46 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 08 Oct 2013 13:42:46 +1000
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252C1B6.2060904@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
Message-ID: <52537F36.8020001@oracle.com>

Jaroslav,

Can you summarise the changes please? With the conversion to Java and 
the infrastructure additions I can't tell what is actually fixing the 
original timeout issue :)

Thanks,
David

On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote:
> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>> The updated webrev:
>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>
>> I've moved some of the functionality to the testlibrary.
>>
>> -JB -
>>
>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>> Jaroslav,
>>>>
>>>> CustomLauncherTest.java:
>>>>
>>>> 102: this check could be moved to switch at ll. 108
>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>> unset.
>>> Good idea. Thanks.
>>>
>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>> script. Could you prepare a list of VM's to check and just loop over
>>>> it?
>>>> It makes test better readable. Also I think nowdays we can always use
>>>> server VM.
>>> I tried to mirror the original shell test as closely as possible. It
>>> would be nice if we could rely on the "server" vm only. Definitely more
>>> readable.
>>>
>>> -JB-
>>>
>>>> -Dmitry
>>>>
>>>>
>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>> Hi Jaroslav,
>>>>>>>>
>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>
>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>> it to
>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>> come
>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>
>>>>>>>> Is there a reason this test can't run on OSX? I know it would need
>>>>>>>> further modification but was wondering if there is something
>>>>>>>> inherent in
>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>
>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>> hotspot
>>>>>>>> test library's process tools available. :(
>>>>>>> We have some, is there an obvious gap?
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>
>>>>>>>
>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>
>>>>>> Please, stand by for the updated webrev.
>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>> application can recognize its own PID and print it to its stdout. The
>>>>> main application then just reads the stdout to parse the PID. No need
>>>>> for JCMD any more.
>>>>>
>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>> property. According to the jtreg web documentation
>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>> "test.java"
>>>>> system property should be available but in fact is not. But it seems
>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>
>>>>> The test does not run on OSX because nobody built the launcher
>>>>> binary :)
>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>> linux-amd64 launcher while working on the test.
>>>>>
>>>>> While working with the test library I realized I was missing a crucial
>>>>> feature (at least for my purposes) - waiting for a certain message to
>>>>> appear in the stdout/stderr of the launched process. Very often I need
>>>>> to wait for the target process to get to certain point before the test
>>>>> can be allowed to continue - and the point is indicated by a
>>>>> message in
>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>> analyzed after the target process died - and are not suitable for this
>>>>> kind of usage.
>>>>>
>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>>
>>>>>>> -Chris.
>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>
>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>
>>>>>>>>> In order to fix this the test is rewritten in Java - the original
>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>> similarity
>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>> new
>>>>>>>>> source in whole.
>>>>>>>>>
>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>> permissions to
>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>
>>
>

From david.holmes at oracle.com  Tue Oct  8 00:34:46 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 08 Oct 2013 17:34:46 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <524BDD9E.1050100@oracle.com>
References: <524BDD9E.1050100@oracle.com>
Message-ID: <5253B596.1000206@oracle.com>

Jaroslav,

On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
> Hello,
>
> currently the JVM uptime reported by the RuntimeMXBean is based on
> System.currentTimeMillis() which makes it susceptible to changes of the
> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
> should not depend on the system time and should be calculated using a
> monotonic clock source.
>
> There is already the way to get the actual JVM uptime in ticks. It is
> accessible as Management::timestamp() and the ticks are convertible to
> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it very
> easy to switch to the monotonic clock based uptime.

Maybe I'm missing something but TimeStamp updates using 
os::elapsed_counter(), which on Linux uses gettimeofday, so it is not a 
monotonic clock source.
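
To illustrate the distinction at the Java level only (this says nothing 
about how the hotspot side is or should be implemented): 
System.currentTimeMillis() follows the wall clock, while 
System.nanoTime() is specified for measuring elapsed time and is not 
tied to wall-clock time. A small sketch, with the hypothetical class name 
UptimeClock:

```java
import java.util.concurrent.TimeUnit;

final class UptimeClock {
    // Captured once; System.nanoTime() values are only meaningful as differences.
    private final long startNanos = System.nanoTime();

    /** Elapsed milliseconds since this object was created, independent of wall-clock changes. */
    long uptimeMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        UptimeClock clock = new UptimeClock();
        Thread.sleep(50);
        System.out.println("uptime (ms): " + clock.uptimeMillis());
    }
}
```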

David
-----


> The patch consists of the hotspot and jdk parts.
>
> For the hotspot a new constant needs to be introduced in
> src/share/vm/services/jmm.h and the actual logic to obtain the uptime in
> milliseconds is added in src/share/vm/services/management.cpp.
>
> For the jdk the changes comprise of adding the necessary JNI bridging
> methods in order to get the new uptime, introducing the same constant
> that is used in hotspot and changes to mapfile-vers files in order to
> properly build the native library.
>
> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>
> Thanks,
>
> -JB-

From jaroslav.bachorik at oracle.com  Tue Oct  8 04:33:41 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 08 Oct 2013 13:33:41 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <52537F36.8020001@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
	<52537F36.8020001@oracle.com>
Message-ID: <5253ED95.20706@oracle.com>

On 8.10.2013 05:42, David Holmes wrote:
> Jaroslav,
>
> Can you summarise the changes please? With the conversion to Java and
> the infrastructure additions I can't tell what is actually fixing the
> original timeout issue :)

The timeout was mostly caused by several test cases using the same file 
for communication between Java processes. When those test cases were 
run in parallel the file got silently overwritten and some of the tests 
could end up trying to connect to an incorrect port in the target 
application. I was able to reproduce the timeout by interleaving the 
test runs for CustomLauncherTest.sh and LocalManagementTest.sh and 
adding an artificial delay to CustomLauncherTest.sh to allow 
LocalManagementTest.sh to change the port in the file.

While it could be fixed by using a different file for each test case I 
took the liberty of converting the shell tests to java tests. This 
allows me to remove the communication file and, in the end, make the 
tests more robust.

CustomLauncherTest.java and LocalManagementTest.java are the tests 
converted from shell to java. I decided to convert 
LocalManagementTest.sh as well because it has the same problems as the 
CustomLauncherTest.sh.

The changes in the testlibrary introduce new methods that let the tests 
easily start a process and wait for a certain text to appear in its 
stdout/stderr. Using these methods the caller can wait until the callee 
is fully initialized and, e.g., ready to accept connections.
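
As a rough sketch of the idea only - the names below (OutputWaiter, 
waitForLine, startAndWait) are invented for this mail and are not the 
testlibrary API from the webrev - waiting for a marker line could look 
like this:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.function.Predicate;

final class OutputWaiter {
    /** Reads lines until one satisfies the predicate; returns it, or null at end of stream. */
    static String waitForLine(BufferedReader out, Predicate<String> ready) throws IOException {
        String line;
        while ((line = out.readLine()) != null) {
            if (ready.test(line)) {
                return line;
            }
        }
        return null;
    }

    /** Starts the process with stderr merged into stdout and blocks until the marker line shows up. */
    static Process startAndWait(ProcessBuilder pb, Predicate<String> ready) throws IOException {
        pb.redirectErrorStream(true);
        Process p = pb.start();
        BufferedReader reader = new BufferedReader(new InputStreamReader(p.getInputStream()));
        if (waitForLine(reader, ready) == null) {
            // The stream ended first, so the process will never report readiness.
            p.destroy();
            throw new IOException("process output ended before the expected text appeared");
        }
        return p;
    }
}
```

The sketch has no timeout and stops reading the output once the marker 
is seen, so a real helper would also need to keep draining the stream 
and to give up after a while.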

The changes in launchers make the launchers actually executable + I am 
adding a linux-amd64 launcher (I needed that one to work on the changes 
locally and thought it might be nice to have one more platform covered 
by the test).

I've updated the webrev to include changes to LocalManagementTest and
TEST.groups (both of those tests require JDK) - 
http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05

-JB-

>
> Thanks,
> David
>
> On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote:
>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>> The updated webrev:
>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>
>>> I've moved some of the functionality to the testlibrary.
>>>
>>> -JB -
>>>
>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>> Jaroslav,
>>>>>
>>>>> CustomLauncherTest.java:
>>>>>
>>>>> 102: this check could be moved to switch at ll. 108
>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>> unset.
>>>> Good idea. Thanks.
>>>>
>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>> it?
>>>>> It makes test better readable. Also I think nowdays we can always use
>>>>> server VM.
>>>> I tried to mirror the original shell test as closely as possible. It
>>>> would be nice if we could rely on the "server" vm only. Definitely more
>>>> readable.
>>>>
>>>> -JB-
>>>>
>>>>> -Dmitry
>>>>>
>>>>>
>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>> Hi Jaroslav,
>>>>>>>>>
>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>
>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>> it to
>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>> come
>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>
>>>>>>>>> Is there a reason this test can't run on OSX? I know it would need
>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>> inherent in
>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>
>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>> hotspot
>>>>>>>>> test library's process tools available. :(
>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>
>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>
>>>>>>> Please, stand by for the updated webrev.
>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>>> application can recognize its own PID and print it to its stdout. The
>>>>>> main application then just reads the stdout to parse the PID. No need
>>>>>> for JCMD any more.
>>>>>>
>>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>>> property. According to the jtreg web documentation
>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>> "test.java"
>>>>>> system property should be available but in fact is not. But it seems
>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>
>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>> binary :)
>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>> linux-amd64 launcher while working on the test.
>>>>>>
>>>>>> While working with the test library I realized I was missing a
>>>>>> crucial
>>>>>> feature (at least for my purposes) - waiting for a certain message to
>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>> need
>>>>>> to wait for the target process to get to certain point before the
>>>>>> test
>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>> message in
>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>> analyzed after the target process died - and are not suitable for
>>>>>> this
>>>>>> kind of usage.
>>>>>>
>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>> -Chris.
>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>
>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>
>>>>>>>>>> In order to fix this the test is rewritten in Java - the original
>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>> similarity
>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>> new
>>>>>>>>>> source in whole.
>>>>>>>>>>
>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>> permissions to
>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>
>>>
>>


From jaroslav.bachorik at oracle.com  Tue Oct  8 05:36:36 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 08 Oct 2013 14:36:36 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <5253B596.1000206@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
Message-ID: <5253FC54.4010407@oracle.com>

On 8.10.2013 09:34, David Holmes wrote:
> Jaroslav,
>
> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>> Hello,
>>
>> currently the JVM uptime reported by the RuntimeMXBean is based on
>> System.currentTimeMillis() which makes it susceptible to changes of the
>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>> should not depend on the system time and should be calculated using a
>> monotonic clock source.
>>
>> There is already the way to get the actual JVM uptime in ticks. It is
>> accessible as Management::timestamp() and the ticks are convertible to
>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it very
>> easy to switch to the monotonic clock based uptime.
>
> Maybe I'm missing something but TimeStamp updates using
> os::elapsed_counter() which on Linux uses gettimeofday so is not a
> monotonic clock source.

Hm, yes. I wasn't aware of this Linux/BSD-specific behavior.

Is there any reason, other than the historical one, why a non-monotonic
clock source is used for timestamping? os::javaTimeNanos() uses a
monotonic clock when available - why can't the same be used for
os::elapsed_counter(), especially when a counter based on "gettimeofday"
is not really a counter?

-JB-

>
> David
> -----
>
>
>
>> The patch consists of the hotspot and jdk parts.
>>
>> For the hotspot a new constant needs to be introduced in
>> src/share/vm/services/jmm.h and the actual logic to obtain the uptime in
>> milliseconds is added in src/share/vm/services/management.cpp.
>>
>> For the jdk the changes comprise of adding the necessary JNI bridging
>> methods in order to get the new uptime, introducing the same constant
>> that is used in hotspot and changes to mapfile-vers files in order to
>> properly build the native library.
>>
>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>
>> Thanks,
>>
>> -JB-


From david.holmes at oracle.com  Tue Oct  8 14:46:12 2013
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 09 Oct 2013 07:46:12 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <5253FC54.4010407@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com>
Message-ID: <52547D24.9060806@oracle.com>

On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
> On 8.10.2013 09:34, David Holmes wrote:
>> Jaroslav,
>>
>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>> Hello,
>>>
>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>> System.currentTimeMillis() which makes it susceptible to changes of the
>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>>> should not depend on the system time and should be calculated using a
>>> monotonic clock source.
>>>
>>> There is already the way to get the actual JVM uptime in ticks. It is
>>> accessible as Management::timestamp() and the ticks are convertible to
>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it very
>>> easy to switch to the monotonic clock based uptime.
>>
>> Maybe I'm missing something but TimeStamp updates using
>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>> monotonic clock source.
>
> Hm, yes. I wasn't aware of this Linux/BSD-specific behavior.
>
> Is there any reason, other than the historical one, why a non-monotonic
> clock source is used for timestamping? os::javaTimeNanos() uses a
> monotonic clock when available - why can't the same be used for
> os::elapsed_counter(), especially when a counter based on "gettimeofday"
> is not really a counter?

It is all historical. These elapsed_counters and elapsed_timers make me 
cringe. But changing it has a lot of potential consequences because of 
the way these are used in logging etc. Certainly not something to be 
contemplated at this stage of JDK 8.

Perhaps a simpler fix here is to expose a startUpTimeNanos that can then 
be used for the uptime.
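
To illustrate the idea at the Java level only (this is a sketch, not
the HotSpot change): record a start value from a monotonic nanosecond
clock once and derive the uptime from the difference. The class name
MonotonicUptime and the injectable clock are assumptions made for the
example; System.nanoTime() stands in for the monotonic source.

```java
import java.util.function.LongSupplier;

// Hypothetical illustration, not code from the webrev.
public class MonotonicUptime {
    private final LongSupplier nanoClock;
    private final long startNanos;

    /** Captures the start value once from the supplied nanosecond clock. */
    public MonotonicUptime(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    /** Uptime in milliseconds, derived only from differences of the same clock. */
    public long uptimeMillis() {
        return (nanoClock.getAsLong() - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        // System.nanoTime() is not affected by wall-clock adjustments.
        MonotonicUptime uptime = new MonotonicUptime(System::nanoTime);
        Thread.sleep(50);
        System.out.println("uptime ms: " + uptime.uptimeMillis());
    }
}
```

Because only differences of the same clock are used, a change of the
system time between the two readings cannot make the result negative.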

David

> -JB-
>
>>
>> David
>> -----
>>
>>
>>
>>> The patch consists of the hotspot and jdk parts.
>>>
>>> For the hotspot a new constant needs to be introduced in
>>> src/share/vm/services/jmm.h and the actual logic to obtain the uptime in
>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>
>>> For the jdk the changes comprise of adding the necessary JNI bridging
>>> methods in order to get the new uptime, introducing the same constant
>>> that is used in hotspot and changes to mapfile-vers files in order to
>>> properly build the native library.
>>>
>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>
>>> Thanks,
>>>
>>> -JB-
>

From david.holmes at oracle.com  Wed Oct  9 03:23:54 2013
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 09 Oct 2013 20:23:54 +1000
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5253ED95.20706@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
	<52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com>
Message-ID: <52552EBA.4060308@oracle.com>

Jaroslav,

Thanks for the detailed description of changes - much appreciated.

There is a lot to digest in there. :)

It isn't obvious to me why these tests require a full JDK?

I don't quite follow the libjvm lookup logic - I would expect that you 
would always want to test the libjvm that is currently running - though 
it is hard to determine that.

Thanks,
David

On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote:
> On 8.10.2013 05:42, David Holmes wrote:
>> Jaroslav,
>>
>> Can you summarise the changes please? With the conversion to Java and
>> the infrastructure additions I can't tell what is actually fixing the
>> original timeout issue :)
>
> The timeout was mostly caused by several test cases using the same file
> for communication between Java processes. When those test cases were
> run in parallel the file got silently overwritten and some of the tests
> could end up trying to connect to an incorrect port in the target
> application. I was able to reproduce the timeout by interleaving the
> test runs for CustomLauncherTest.sh and LocalManagementTest.sh and
> adding an artificial delay to CustomLauncherTest.sh to allow
> LocalManagementTest.sh to change the port in the file.
>
> While it could be fixed by using a different file for each test case I
> took the liberty of converting the shell tests to java tests. This
> allows me to remove the communication file and, in the end, make the
> tests more robust.
>
> CustomLauncherTest.java and LocalManagementTest.java are the tests
> converted from shell to java. I decided to convert
> LocalManagementTest.sh as well because it has the same problems as the
> CustomLauncherTest.sh.
>
> The changes in the testlibrary introduce new methods that let the tests
> easily start a process and wait for a certain text to appear in its
> stdout/stderr. Using these methods the caller can wait until the callee
> is fully initialized and, e.g., ready to accept connections.
>
> The changes in launchers make the launchers actually executable + I am
> adding a linux-amd64 launcher (I needed that one to work on the changes
> locally and thought it might be nice to have one more platform covered
> by the test).
>
> I've updated the webrev to include changes to LocalManagementTest and
> TEST.groups (both of those tests require JDK) -
> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05
>
> -JB-
>
>>
>> Thanks,
>> David
>>
>> On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote:
>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>>> The updated webrev:
>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>>
>>>> I've moved some of the functionality to the testlibrary.
>>>>
>>>> -JB -
>>>>
>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>>> Jaroslav,
>>>>>>
>>>>>> CustomLauncherTest.java:
>>>>>>
>>>>>> 102: this check could be moved to switch at ll. 108
>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>>> unset.
>>>>> Good idea. Thanks.
>>>>>
>>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>>> it?
>>>>>> It makes test better readable. Also I think nowdays we can always use
>>>>>> server VM.
>>>>> I tried to mirror the original shell test as closely as possible. It
>>>>> would be nice if we could rely on the "server" vm only. Definitely
>>>>> more
>>>>> readable.
>>>>>
>>>>> -JB-
>>>>>
>>>>>> -Dmitry
>>>>>>
>>>>>>
>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>
>>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>>
>>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>>> it to
>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>>> come
>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>>
>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would
>>>>>>>>>> need
>>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>>> inherent in
>>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>>
>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>>> hotspot
>>>>>>>>>> test library's process tools available. :(
>>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>>
>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>>
>>>>>>>> Please, stand by for the updated webrev.
>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target
>>>>>>> application can recognize its own PID and print it to its stdout.
>>>>>>> The
>>>>>>> main application then just reads the stdout to parse the PID. No
>>>>>>> need
>>>>>>> for JCMD any more.
>>>>>>>
>>>>>>> I could not find a way to remove the dependency on "test.jdk" system
>>>>>>> property. According to the jtreg web documentation
>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>>> "test.java"
>>>>>>> system property should be available but in fact is not. But it seems
>>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>>
>>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>>> binary :)
>>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>>> linux-amd64 launcher while working on the test.
>>>>>>>
>>>>>>> While working with the test library I realized I was missing a
>>>>>>> crucial
>>>>>>> feature (at least for my purposes) - waiting for a certain
>>>>>>> message to
>>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>>> need
>>>>>>> to wait for the target process to get to certain point before the
>>>>>>> test
>>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>>> message in
>>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>>> analyzed after the target process died - and are not suitable for
>>>>>>> this
>>>>>>> kind of usage.
>>>>>>>
>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Chris.
>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>>
>>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>>
>>>>>>>>>>> In order to fix this the test is rewritten in Java - the
>>>>>>>>>>> original
>>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is
>>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>>> similarity
>>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>>> new
>>>>>>>>>>> source in whole.
>>>>>>>>>>>
>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>> permissions to
>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>
>>>>
>>>
>

From jaroslav.bachorik at oracle.com  Wed Oct  9 04:26:27 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Oct 2013 13:26:27 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <52547D24.9060806@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
Message-ID: <52553D63.5000508@oracle.com>

On 8.10.2013 23:46, David Holmes wrote:
> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>> On 8.10.2013 09:34, David Holmes wrote:
>>> Jaroslav,
>>>
>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>> Hello,
>>>>
>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>> System.currentTimeMillis() which makes it susceptible to changes of the
>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>>>> should not depend on the system time and should be calculated using a
>>>> monotonic clock source.
>>>>
>>>> There is already the way to get the actual JVM uptime in ticks. It is
>>>> accessible as Management::timestamp() and the ticks are convertible to
>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it
>>>> very
>>>> easy to switch to the monotonic clock based uptime.
>>>
>>> Maybe I'm missing something but TimeStamp updates using
>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>> monotonic clock source.
>>
>> Hm, yes. I wasn't aware of this Linux/BSD-specific behavior.
>>
>> Is there any reason, other than the historical one, why a non-monotonic
>> clock source is used for timestamping? os::javaTimeNanos() uses a
>> monotonic clock when available - why can't the same be used for
>> os::elapsed_counter(), especially when a counter based on "gettimeofday"
>> is not really a counter?
>
> It is all historical. These elapsed_counters and elapsed_timers make me
> cringe. But changing it has a lot of potential consequences because of
> the way these are used in logging etc. Certainly not something to be
> contemplated at this stage of JDK 8.
>
> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then
> be used for the uptime.

My attempt at this is at 
http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
I am using os::javaTimeNanos() to get the monotonic ticks where possible.

The JDK part stays the same as in webrev.00.

-JB-

>
> David
>
>> -JB-
>>
>>>
>>> David
>>> -----
>>>
>>>
>>>
>>>> The patch consists of the hotspot and jdk parts.
>>>>
>>>> For the hotspot a new constant needs to be introduced in
>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>> uptime in
>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>
>>>> For the jdk the changes comprise of adding the necessary JNI bridging
>>>> methods in order to get the new uptime, introducing the same constant
>>>> that is used in hotspot and changes to mapfile-vers files in order to
>>>> properly build the native library.
>>>>
>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>


From jaroslav.bachorik at oracle.com  Wed Oct  9 04:31:57 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Oct 2013 13:31:57 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <52552EBA.4060308@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
	<52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com>
	<52552EBA.4060308@oracle.com>
Message-ID: <52553EAD.4040506@oracle.com>

On 9.10.2013 12:23, David Holmes wrote:
> Jaroslav,
>
> Thanks for the detailed description of changes - much appreciated.
>
> There is a lot to digest in there. :)

Yep, it started as a simple fix :/

>
> It isn't obvious to me why these tests require a full JDK?

I don't know; LocalManagementTest.sh was listed as one requiring a full
JDK. Its requirements are the same as those of CustomLauncherTest.sh
(now *.java), so it seemed logical to list it there too.

>
> I don't quite follow the libjvm lookup logic - I would expect that you
> would always want to test the libjvm that is currently running - though
> it is hard to determine that.

I'm afraid I can't be of much assistance here - I just took what was in 
the *.sh version and converted it to *.java.

-JB-

>
> Thanks,
> David
>
> On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote:
>> On 8.10.2013 05:42, David Holmes wrote:
>>> Jaroslav,
>>>
>>> Can you summarise the changes please? With the conversion to Java and
>>> the infrastructure additions I can't tell what is actually fixing the
>>> original timeout issue :)
>>
>> The timeout was mostly caused by several test cases using the same file
>> for communication between Java processes. When those test cases were
>> run in parallel the file got silently overwritten and some of the tests
>> could end up trying to connect to an incorrect port in the target
>> application. I was able to reproduce the timeout by interleaving the
>> test runs for CustomLauncherTest.sh and LocalManagementTest.sh and
>> adding an artificial delay to CustomLauncherTest.sh to allow
>> LocalManagementTest.sh to change the port in the file.
>>
>> While it could be fixed by using a different file for each test case I
>> took the liberty of converting the shell tests to java tests. This
>> allows me to remove the communication file and, in the end, make the
>> tests more robust.
>>
>> CustomLauncherTest.java and LocalManagementTest.java are the tests
>> converted from shell to java. I decided to convert
>> LocalManagementTest.sh as well because it has the same problems as the
>> CustomLauncherTest.sh.
>>
>> The changes in the testlibrary introduce new methods that let the tests
>> easily start a process and wait for a certain text to appear in its
>> stdout/stderr. Using these methods the caller can wait until the callee
>> is fully initialized and, e.g., ready to accept connections.
>>
>> The changes in launchers make the launchers actually executable + I am
>> adding a linux-amd64 launcher (I needed that one to work on the changes
>> locally and thought it might be nice to have one more platform covered
>> by the test).
>>
>> I've updated the webrev to include changes to LocalManagementTest and
>> TEST.groups (both of those tests require JDK) -
>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05
>>
>> -JB-
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote:
>>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>>>> The updated webrev:
>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>>>
>>>>> I've moved some of the functionality to the testlibrary.
>>>>>
>>>>> -JB -
>>>>>
>>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>>>> Jaroslav,
>>>>>>>
>>>>>>> CustomLauncherTest.java:
>>>>>>>
>>>>>>> 102: this check could be moved to switch at ll. 108
>>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>>>>> unset.
>>>>>> Good idea. Thanks.
>>>>>>
>>>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>>>> script. Could you prepare a list of VM's to check and just loop over
>>>>>>> it?
>>>>>>> It makes test better readable. Also I think nowdays we can always
>>>>>>> use
>>>>>>> server VM.
>>>>>> I tried to mirror the original shell test as closely as possible. It
>>>>>> would be nice if we could rely on the "server" vm only. Definitely
>>>>>> more
>>>>>> readable.
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>> -Dmitry
>>>>>>>
>>>>>>>
>>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>>
>>>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>>>
>>>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>>>> it to
>>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to
>>>>>>>>>>> come
>>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>>>
>>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would
>>>>>>>>>>> need
>>>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>>>> inherent in
>>>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>>>
>>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the
>>>>>>>>>>> hotspot
>>>>>>>>>>> test library's process tools available. :(
>>>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>>>
>>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>>>
>>>>>>>>> Please, stand by for the updated webrev.
>>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the
>>>>>>>> target
>>>>>>>> application can recognize its own PID and print it to its stdout.
>>>>>>>> The
>>>>>>>> main application then just reads the stdout to parse the PID. No
>>>>>>>> need
>>>>>>>> for JCMD any more.
>>>>>>>>
>>>>>>>> I could not find a way to remove the dependency on "test.jdk"
>>>>>>>> system
>>>>>>>> property. According to the jtreg web documentation
>>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>>>> "test.java"
>>>>>>>> system property should be available but in fact is not. But it
>>>>>>>> seems
>>>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>>>
>>>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>>>> binary :)
>>>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>>>> linux-amd64 launcher while working on the test.
>>>>>>>>
>>>>>>>> While working with the test library I realized I was missing a
>>>>>>>> crucial
>>>>>>>> feature (at least for my purposes) - waiting for a certain
>>>>>>>> message to
>>>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>>>> need
>>>>>>>> to wait for the target process to get to certain point before the
>>>>>>>> test
>>>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>>>> message in
>>>>>>>> stdout/stderr. Currently all the proc tools are designed to work in
>>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>>>> analyzed after the target process died - and are not suitable for
>>>>>>>> this
>>>>>>>> kind of usage.
>>>>>>>>
>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -Chris.
>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>>>
>>>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>>>
>>>>>>>>>>>> In order to fix this the test is rewritten in Java - the
>>>>>>>>>>>> original
>>>>>>>>>>>> functionality and outputs should be 100% preserved. The
>>>>>>>>>>>> patch is
>>>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>>>> similarity
>>>>>>>>>>>> between the *.sh and *.java file so one needs to go through the
>>>>>>>>>>>> new
>>>>>>>>>>>> source in whole.
>>>>>>>>>>>>
>>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>>> permissions to
>>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>>>> Webrev :
>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>


From staffan.larsen at oracle.com  Wed Oct  9 07:10:45 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 9 Oct 2013 16:10:45 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <52553D63.5000508@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
Message-ID: <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>

There is now an awful lot of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns.

This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter().

And I guess the same goes for the VM.uptime Diagnostic Command (class VMUptimeDCmd), which also relies on os::elapsed_counter().

/Staffan


On 9 okt 2013, at 13:26, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 8.10.2013 23:46, David Holmes wrote:
>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>> On 8.10.2013 09:34, David Holmes wrote:
>>>> Jaroslav,
>>>> 
>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>> Hello,
>>>>> 
>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>> System.currentTimeMillis() which makes it susceptible to changes of the
>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>>>>> should not depend on the system time and should be calculated using a
>>>>> monotonic clock source.
>>>>> 
>>>>> There is already the way to get the actual JVM uptime in ticks. It is
>>>>> accessible as Management::timestamp() and the ticks are convertible to
>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it
>>>>> very
>>>>> easy to switch to the monotonic clock based uptime.
>>>> 
>>>> Maybe I'm missing something but TimeStamp updates using
>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>> monotonic clock source.
>>> 
>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>> 
>>> Is there any reason why a non monotonic clock source is used for
>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>> monotonic clock when available - why can't the same be used for
>>> os::elapsed_counter() especially when a counter based on "gettimeofday"
>>> is not really a counter?
>> 
>> It is all historical. These elapsed_counters and elapsed_timers make me
>> cringe. But changing it has a lot of potential consequences because of
>> the way these are used in logging etc. Certainly not something to be
>> contemplated at this stage of JDK 8.
>> 
>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then
>> be used for the uptime.
> 
> My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
> I am using os::javaTimeNanos() to get the monotonic ticks where possible.
> 
> The JDK part stays the same as for webrev.00
> 
> -JB-
> 
>> 
>> David
>> 
>>> -JB-
>>> 
>>>> 
>>>> David
>>>> -----
>>>> 
>>>> 
>>>> 
>>>>> The patch consists of the hotspot and jdk parts.
>>>>> 
>>>>> For the hotspot a new constant needs to be introduced in
>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>> uptime in
>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>> 
>>>>> For the jdk the changes comprise of adding the necessary JNI bridging
>>>>> methods in order to get the new uptime, introducing the same constant
>>>>> that is used in hotspot and changes to mapfile-vers files in order to
>>>>> properly build the native library.
>>>>> 
>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> -JB-
>>> 
> 


From jaroslav.bachorik at oracle.com  Wed Oct  9 07:19:48 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 09 Oct 2013 16:19:48 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
Message-ID: <52556604.3080900@oracle.com>

On 9.10.2013 16:10, Staffan Larsen wrote:
> There is now an awful amount of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns.
>
> This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter().
>
> And I guess the same thing for the VM.uptime Diagnostic Command (class VMUptimeDCmd) which also relies on os::elapsed_counter().

Also the reported GC pauses duration might be wrong since it uses 
Management::timestamp().

At first sight the change looks rather trivial. But, honestly, I'm
not sure which other parts could for whatever reason break once the 
time-of-day timestamp is replaced with a monotonic equivalent. One would 
think that it shouldn't matter but one never knows ...
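
A minimal sketch of the difference being discussed, written in Java rather
than HotSpot C++. It is not the patch under review: the class name
UptimeSketch and its methods are hypothetical, and the two clock sources are
injected only so the effect of a wall-clock change can be simulated.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration; not the HotSpot or JDK code under review. */
final class UptimeSketch {
    private final LongSupplier wallClockMillis; // e.g. System::currentTimeMillis
    private final LongSupplier monotonicNanos;  // e.g. System::nanoTime
    private final long startWallMillis;
    private final long startNanos;

    UptimeSketch(LongSupplier wallClockMillis, LongSupplier monotonicNanos) {
        this.wallClockMillis = wallClockMillis;
        this.monotonicNanos = monotonicNanos;
        // Both start stamps are captured at the same point.
        this.startWallMillis = wallClockMillis.getAsLong();
        this.startNanos = monotonicNanos.getAsLong();
    }

    /** Uptime derived from the wall clock: can go negative if the OS time is set back. */
    long wallClockUptimeMillis() {
        return wallClockMillis.getAsLong() - startWallMillis;
    }

    /** Uptime derived from the monotonic source: unaffected by OS time changes. */
    long monotonicUptimeMillis() {
        return (monotonicNanos.getAsLong() - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) {
        UptimeSketch uptime =
            new UptimeSketch(System::currentTimeMillis, System::nanoTime);
        System.out.println("wall-clock uptime ms: " + uptime.wallClockUptimeMillis());
        System.out.println("monotonic uptime ms:  " + uptime.monotonicUptimeMillis());
    }
}
```

The two values are only comparable as durations. A raw nanosecond stamp has
an arbitrary origin, so it must not be compared with or subtracted from a
time-of-day value.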

Staffan, do you think this kind of change is suitable for the current
phase of the JDK release cycle? I think I could improve the patch in a few
days and then it should probably be able to pass the review before ZBB.
But, it's only P3 ...

-JB-

>
> /Staffan
>
>
> On 9 okt 2013, at 13:26, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>
>> On 8.10.2013 23:46, David Holmes wrote:
>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>> Jaroslav,
>>>>>
>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>> Hello,
>>>>>>
>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>>> System.currentTimeMillis() which makes it susceptible to changes of the
>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>>>>>> should not depend on the system time and should be calculated using a
>>>>>> monotonic clock source.
>>>>>>
>>>>>> There is already the way to get the actual JVM uptime in ticks. It is
>>>>>> accessible as Management::timestamp() and the ticks are convertible to
>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it
>>>>>> very
>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>
>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>>> monotonic clock source.
>>>>
>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>
>>>> Is there any reason why a non monotonic clock source is used for
>>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>>> monotonic clock when available - why can't the same be used for
>>>> os::elapsed_counter() especially when a counter based on "gettimeofday"
>>>> is not really a counter?
>>>
>>> It is all historical. These elapsed_counters and elapsed_timers make me
>>> cringe. But changing it has a lot of potential consequences because of
>>> the way these are used in logging etc. Certainly not something to be
>>> contemplated at this stage of JDK 8.
>>>
>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then
>>> be used for the uptime.
>>
>> My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>> I am using os::javaTimeNanos() to get the monotonic ticks where possible.
>>
>> The JDK part stays the same as for webrev.00
>>
>> -JB-
>>
>>>
>>> David
>>>
>>>> -JB-
>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>
>>>>>
>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>
>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>> uptime in
>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>
>>>>>> For the jdk the changes comprise of adding the necessary JNI bridging
>>>>>> methods in order to get the new uptime, introducing the same constant
>>>>>> that is used in hotspot and changes to mapfile-vers files in order to
>>>>>> properly build the native library.
>>>>>>
>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -JB-
>>>>
>>
>


From staffan.larsen at oracle.com  Wed Oct  9 11:12:47 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 9 Oct 2013 20:12:47 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <52556604.3080900@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
Message-ID: <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>


On 9 okt 2013, at 16:19, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 9.10.2013 16:10, Staffan Larsen wrote:
>> There is now an awful amount of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns.
>> 
>> This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter().
>> 
>> And I guess the same thing for the VM.uptime Diagnostic Command (class VMUptimeDCmd) which also relies on os::elapsed_counter().
> 
> Also the reported GC pauses duration might be wrong since it uses Management::timestamp().
> 
> At first sight the change looks rather trivial. But, honestly, I'm not sure which other parts could for whatever reason break once the time-of-day timestamp is replaced with a monotonic equivalent. One would think that it shouldn't matter but one never knows ...
> 
> Staffan, do you think this kind of change is suitable for the current phase of the JDK release cycle? I think I could improve the patch in a few days and then it should probably be able to pass the review before ZBB. But, it's only P3 ...

I think it is a bit late in the release cycle to clean this up in the way it should be cleaned up. I think we should wait until the first 8 update release and do a more thorough job than we have time for right now.

/Staffan


> 
> -JB-
> 
>> 
>> /Staffan
>> 
>> 
>> On 9 okt 2013, at 13:26, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>> 
>>> On 8.10.2013 23:46, David Holmes wrote:
>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>> Jaroslav,
>>>>>> 
>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>> Hello,
>>>>>>> 
>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>>>> System.currentTimeMillis() which makes it susceptible to changes of the
>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>>>>>>> should not depend on the system time and should be calculated using a
>>>>>>> monotonic clock source.
>>>>>>> 
>>>>>>> There is already the way to get the actual JVM uptime in ticks. It is
>>>>>>> accessible as Management::timestamp() and the ticks are convertible to
>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it
>>>>>>> very
>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>> 
>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>>>> monotonic clock source.
>>>>> 
>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>> 
>>>>> Is there any reason why a non monotonic clock source is used for
>>>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>>>> monotonic clock when available - why can't the same be used for
>>>>> os::elapsed_counter() especially when a counter based on "gettimeofday"
>>>>> is not really a counter?
>>>> 
>>>> It is all historical. These elapsed_counters and elapsed_timers make me
>>>> cringe. But changing it has a lot of potential consequences because of
>>>> the way these are used in logging etc. Certainly not something to be
>>>> contemplated at this stage of JDK 8.
>>>> 
>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then
>>>> be used for the uptime.
>>> 
>>> My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>> I am using os::javaTimeNanos() to get the monotonic ticks where possible.
>>> 
>>> The JDK part stays the same as for webrev.00
>>> 
>>> -JB-
>>> 
>>>> 
>>>> David
>>>> 
>>>>> -JB-
>>>>> 
>>>>>> 
>>>>>> David
>>>>>> -----
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>> 
>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>>> uptime in
>>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>> 
>>>>>>> For the jdk the changes comprise of adding the necessary JNI bridging
>>>>>>> methods in order to get the new uptime, introducing the same constant
>>>>>>> that is used in hotspot and changes to mapfile-vers files in order to
>>>>>>> properly build the native library.
>>>>>>> 
>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> 
>>>>>>> -JB-
>>>>> 
>>> 
>> 
> 


From david.holmes at oracle.com  Wed Oct  9 20:44:52 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 10 Oct 2013 13:44:52 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
Message-ID: <525622B4.5020606@oracle.com>

On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>
> On 9 okt 2013, at 16:19, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>
>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>> There is now an awful amount of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>
>>> This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter().
>>>
>>> And I guess the same thing for the VM.uptime Diagnostic Command (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>
>> Also the reported GC pauses duration might be wrong since it uses Management::timestamp().
>>
>> At first sight the change looks rather trivial. But, honestly, I'm not sure which other parts could for whatever reason break once the time-of-day timestamp is replaced with a monotonic equivalent. One would think that it shouldn't matter but one never knows ...
>>
>> Staffan, do you think this kind of change is suitable for the current phase of the JDK release cycle? I think I could improve the patch in a few days and then it should probably be able to pass the review before ZBB. But, it's only P3 ...
>
> I think it is a bit late in the release cycle to clean this up in the way it should be cleaned up. I think we should wait until the first 8 update release and do a more thorough job than we have time for right now.

I second that. The elapsed_counter/elapsed_timer APIs and
implementations are a tangled mess. But part of the problem has been 
that people want/expect monotonic time-of-day based timestamps (yes a 
contradiction - though some people make sure TOD does not get modified 
on their production systems). The use of timestamps in logging has to be 
examined carefully - mainly GC logging. I recall a "simple" attempted 
change in the past that resulted in trying to compare a nanoTime based 
timestamp with the TOD. :(

David
-----

> /Staffan
>
>
>>
>> -JB-
>>
>>>
>>> /Staffan
>>>
>>>
>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>>>
>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>> Jaroslav,
>>>>>>>
>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>> Hello,
>>>>>>>>
>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>>>>> System.currentTimeMillis() which makes it susceptible to changes of the
>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>>>>>>>> should not depend on the system time and should be calculated using a
>>>>>>>> monotonic clock source.
>>>>>>>>
>>>>>>>> There is already the way to get the actual JVM uptime in ticks. It is
>>>>>>>> accessible as Management::timestamp() and the ticks are convertible to
>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it
>>>>>>>> very
>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>
>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>>>>> monotonic clock source.
>>>>>>
>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>
>>>>>> Is there any reason why a non monotonic clock source is used for
>>>>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>>>>> monotonic clock when available - why can't the same be used for
>>>>>> os::elapsed_counter() especially when a counter based on "gettimeofday"
>>>>>> is not really a counter?
>>>>>
>>>>> It is all historical. These elapsed_counters and elapsed_timers make me
>>>>> cringe. But changing it has a lot of potential consequences because of
>>>>> the way these are used in logging etc. Certainly not something to be
>>>>> contemplated at this stage of JDK 8.
>>>>>
>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then
>>>>> be used for the uptime.
>>>>
>>>> My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>> I am using os::javaTimeNanos() to get the monotonic ticks where possible.
>>>>
>>>> The JDK part stays the same as for webrev.00
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> David
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>
>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>>>> uptime in
>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>>>
>>>>>>>> For the jdk the changes comprise of adding the necessary JNI bridging
>>>>>>>> methods in order to get the new uptime, introducing the same constant
>>>>>>>> that is used in hotspot and changes to mapfile-vers files in order to
>>>>>>>> properly build the native library.
>>>>>>>>
>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -JB-
>>>>>>
>>>>
>>>
>>
>

From david.holmes at oracle.com  Wed Oct  9 21:33:19 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 10 Oct 2013 14:33:19 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <52553D63.5000508@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
Message-ID: <52562E0F.7020108@oracle.com>

On 9/10/2013 9:26 PM, Jaroslav Bachorik wrote:
> On 8.10.2013 23:46, David Holmes wrote:
>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>> On 8.10.2013 09:34, David Holmes wrote:
>>>> Jaroslav,
>>>>
>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>> Hello,
>>>>>
>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>> System.currentTimeMillis() which makes it susceptible to changes of
>>>>> the
>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime
>>>>> should not depend on the system time and should be calculated using a
>>>>> monotonic clock source.
>>>>>
>>>>> There is already the way to get the actual JVM uptime in ticks. It is
>>>>> accessible as Management::timestamp() and the ticks are convertible to
>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it
>>>>> very
>>>>> easy to switch to the monotonic clock based uptime.
>>>>
>>>> Maybe I'm missing something but TimeStamp updates using
>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>> monotonic clock source.
>>>
>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>
>>> Is there any reason why a non monotonic clock source is used for
>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>> monotonic clock when available - why can't the same be used for
>>> os::elapsed_counter() especially when a counter based on "gettimeofday"
>>> is not really a counter?
>>
>> It is all historical. These elapsed_counters and elapsed_timers make me
>> cringe. But changing it has a lot of potential consequences because of
>> the way these are used in logging etc. Certainly not something to be
>> contemplated at this stage of JDK 8.
>>
>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then
>> be used for the uptime.
>
> My attempt at this is at
> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
> I am using os::javaTimeNanos() to get the monotonic ticks where possible.

Only nit with this is that you initialize _begin_vm_creation_ns very 
early compared to the other timestamps. Plus I'm not even certain when 
this global initializer will execute relative to the VM initialization 
sequence! Best to move it into Management::init() to where 
_begin_vm_creation_time is initialized.
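
As a rough analogy in Java (the actual change is in HotSpot C++, where the
execution order of global initializers is the concern), the suggestion
amounts to capturing both start stamps together at one explicit
initialization point. The class StartStamps and its method names are
hypothetical and are not part of any webrev.

```java
/** Hypothetical illustration of capturing both start stamps in one init step. */
final class StartStamps {
    private static long beginVmCreationTimeMillis; // time-of-day start stamp
    private static long beginVmCreationNanos;      // monotonic start stamp
    private static boolean initialized;

    private StartStamps() {}

    /** Called once from a well-defined point in startup; later calls are ignored. */
    static synchronized void init() {
        if (initialized) {
            return;
        }
        // Both stamps are taken at the same point in the startup sequence.
        beginVmCreationTimeMillis = System.currentTimeMillis();
        beginVmCreationNanos = System.nanoTime();
        initialized = true;
    }

    /** Start time as a time-of-day value, for display purposes. */
    static synchronized long startTimeMillis() {
        requireInitialized();
        return beginVmCreationTimeMillis;
    }

    /** Uptime computed from the monotonic stamp only. */
    static synchronized long uptimeMillis() {
        requireInitialized();
        return (System.nanoTime() - beginVmCreationNanos) / 1_000_000L;
    }

    private static void requireInitialized() {
        if (!initialized) {
            throw new IllegalStateException("init() has not been called");
        }
    }
}
```

Failing loudly when init() has not run yet makes an ordering problem visible
instead of silently reporting an uptime measured from an unrelated moment.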

David
-----

> The JDK part stays the same as for webrev.00
>
> -JB-
>
>>
>> David
>>
>>> -JB-
>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>
>>>>
>>>>> The patch consists of the hotspot and jdk parts.
>>>>>
>>>>> For the hotspot a new constant needs to be introduced in
>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>> uptime in
>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>
>>>>> For the jdk the changes comprise of adding the necessary JNI bridging
>>>>> methods in order to get the new uptime, introducing the same constant
>>>>> that is used in hotspot and changes to mapfile-vers files in order to
>>>>> properly build the native library.
>>>>>
>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -JB-
>>>
>

From david.holmes at oracle.com  Wed Oct  9 21:41:25 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 10 Oct 2013 14:41:25 +1000
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <52553EAD.4040506@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
	<52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com>
	<52552EBA.4060308@oracle.com> <52553EAD.4040506@oracle.com>
Message-ID: <52562FF5.5060304@oracle.com>

On 9/10/2013 9:31 PM, Jaroslav Bachorik wrote:
> On 9.10.2013 12:23, David Holmes wrote:
>> Jaroslav,
>>
>> Thanks for the detailed description of changes - much appreciated.
>>
>> There is a lot to digest in there. :)
>
> Yep, it started as a simple fix :/
>
>>
>> It isn't obvious to me why these tests require a full JDK?
>
> IDK, LocalManagementTest.sh was listed as one requiring full JDK. Its
> requirements are the same as the ones of CustomLauncherTest.sh (now
> *.java) so it seemed logical to list it there too.

Ah! Now I see it - it uses tools.jar which implies a full JDK.

>>
>> I don't quite follow the libjvm lookup logic - I would expect that you
>> would always want to test the libjvm that is currently running - though
>> it is hard to determine that.
>
> I'm afraid I can't be of much assistance here - I just took what was in
> the *.sh version and converted it to *.java.

Okay. I expect this will need revisiting at some point.

Thanks,
David
-----


> -JB-
>
>>
>> Thanks,
>> David
>>
>> On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote:
>>> On 8.10.2013 05:42, David Holmes wrote:
>>>> Jaroslav,
>>>>
>>>> Can you summarise the changes please? With the conversion to Java and
>>>> the infrastructure additions I can't tell what is actually fixing the
>>>> original timeout issue :)
>>>
>>> The timeout was mostly caused by using the same file for communication
>>> between java processes in several test cases. When those test cases were
>>> run in parallel the file got rewritten silently and some of the tests
>>> could end up trying to connect to an incorrect port in the target
>>> application. I was able to reproduce the timeout by interleaving the
>>> test runs for CustomLauncherTest.sh and LocalManagementTest.sh and
>>> adding an artificial delay to CustomLauncherTest.sh to allow
>>> LocalManagementTest.sh to change the port in the file.
>>>
>>> While it could be fixed by using a different file for each test case I
>>> took the liberty of converting the shell tests to java tests. This
>>> allows me to remove the communication file and, in the end, make the
>>> tests more robust.
>>>
>>> CustomLauncherTest.java and LocalManagementTest.java are the tests
>>> converted from shell to java. I decided to convert
>>> LocalManagementTest.sh as well because it has the same problems as the
>>> CustomLauncherTest.sh.
>>>
>>> The changes in the testlibrary are about introducing new methods
>>> allowing the tests to easily start a process and wait for a certain text
>>> appearing in its stdout/stderr. Using these methods the caller can wait
>>> till the callee is fully initialized and eg. ready to accept
>>> connections.
>>>
>>> The changes in launchers make the launchers actually executable + I am
>>> adding a linux-amd64 launcher (I needed that one to work on the changes
>>> locally and thought it might be nice to have one more platform covered
>>> by the test).
>>>
>>> I've updated the webrev to include changes to LocalManagementTest and
>>> TEST.groups (both of those tests require JDK) -
>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05
>>>
>>> -JB-
>>>
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote:
>>>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>>>>> The updated webrev:
>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>>>>
>>>>>> I've moved some of the functionality to the testlibrary.
>>>>>>
>>>>>> -JB -
>>>>>>
>>>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>>>>> Jaroslav,
>>>>>>>>
>>>>>>>> CustomLauncherTest.java:
>>>>>>>>
>>>>>>>> 102: this check could be moved to switch at ll. 108
>>>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM
>>>>>>>> remains
>>>>>>>> unset.
>>>>>>> Good idea. Thanks.
>>>>>>>
>>>>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>>>>> script. Could you prepare a list of VM's to check and just loop
>>>>>>>> over
>>>>>>>> it?
>>>>>>>> It makes the test more readable. Also I think nowadays we can always
>>>>>>>> use
>>>>>>>> server VM.
>>>>>>> I tried to mirror the original shell test as closely as possible. It
>>>>>>> would be nice if we could rely on the "server" vm only. Definitely
>>>>>>> more
>>>>>>> readable.
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>> -Dmitry
>>>>>>>>
>>>>>>>>
>>>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>>>
>>>>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>>>>
>>>>>>>>>>>> As written this test can only run on a full JDK - so please add
>>>>>>>>>>>> it to
>>>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really need to
>>>>>>>>>>>> come
>>>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>>>>
>>>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would
>>>>>>>>>>>> need
>>>>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>>>>> inherent in
>>>>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>>>>
>>>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had
>>>>>>>>>>>> the
>>>>>>>>>>>> hotspot
>>>>>>>>>>>> test library's process tools available. :(
>>>>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>>>>
>>>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>>>>
>>>>>>>>>> Hm, thanks for the info. I should have used this library instead.
>>>>>>>>>>
>>>>>>>>>> Please, stand by for the updated webrev.
>>>>>>>>> I was able to get rid of the JCMD. Using the testlibrary the
>>>>>>>>> target
>>>>>>>>> application can recognize its own PID and print it to its stdout.
>>>>>>>>> The
>>>>>>>>> main application then just reads the stdout to parse the PID. No
>>>>>>>>> need
>>>>>>>>> for JCMD any more.
>>>>>>>>>
>>>>>>>>> I could not find a way to remove the dependency on "test.jdk"
>>>>>>>>> system
>>>>>>>>> property. According to the jtreg web documentation
>>>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>>>>> "test.java"
>>>>>>>>> system property should be available but in fact is not. But it
>>>>>>>>> seems
>>>>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>>>>
>>>>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>>>>> binary :)
>>>>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>>>>> linux-amd64 launcher while working on the test.
>>>>>>>>>
>>>>>>>>> While working with the test library I realized I was missing a
>>>>>>>>> crucial
>>>>>>>>> feature (at least for my purposes) - waiting for a certain
>>>>>>>>> message to
>>>>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>>>>> need
>>>>>>>>> to wait for the target process to get to certain point before the
>>>>>>>>> test
>>>>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>>>>> message in
>>>>>>>>> stdout/stderr. Currently all the proc tools are designed to
>>>>>>>>> work in
>>>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>>>>> analyzed after the target process died - and are not suitable for
>>>>>>>>> this
>>>>>>>>> kind of usage.
>>>>>>>>>
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -Chris.
>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>>>>
>>>>>>>>>>>>> In order to fix this the test is rewritten in Java - the
>>>>>>>>>>>>> original
>>>>>>>>>>>>> functionality and outputs should be 100% preserved. The
>>>>>>>>>>>>> patch is
>>>>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>>>>> similarity
>>>>>>>>>>>>> between the *.sh and *.java file so one needs to go through
>>>>>>>>>>>>> the
>>>>>>>>>>>>> new
>>>>>>>>>>>>> source in whole.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>>>> permissions to
>>>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>>>>> Webrev :
>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>
>
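
The quoted summary above describes starting a process and waiting for a
certain text to appear in its stdout/stderr, instead of synchronizing through
a shared file. A minimal sketch of that idea follows. It is not the
jdk.testlibrary code from the webrev: the class OutputWaiter and both method
names are hypothetical, and only standard java.lang.ProcessBuilder and
java.io calls are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

/** Hypothetical helper; not the testlibrary API from the webrev. */
final class OutputWaiter {
    private OutputWaiter() {}

    /** Reads lines until one satisfies the predicate; returns it, or null if the output ends first. */
    static String waitForLine(Reader output, Predicate<String> ready) throws IOException {
        BufferedReader reader = new BufferedReader(output);
        String line;
        while ((line = reader.readLine()) != null) {
            if (ready.test(line)) {
                return line;
            }
        }
        return null;
    }

    /** Starts a process and blocks until its merged stdout/stderr shows a matching line. */
    static Process startAndWait(ProcessBuilder pb, Predicate<String> ready) throws IOException {
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        Reader out = new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8);
        if (waitForLine(out, ready) == null) {
            process.destroy();
            throw new IOException("output ended before the expected line appeared");
        }
        return process;
    }
}
```

A caller could, for example, wait for a line starting with "pid=" and parse
the PID from it. The sketch has no timeout and stops reading once the line is
found, so a real helper would also need to keep draining the output and bound
the wait.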

From jaroslav.bachorik at oracle.com  Thu Oct 10 04:02:24 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 10 Oct 2013 13:02:24 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525622B4.5020606@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com>
Message-ID: <52568940.4000704@oracle.com>

On 10.10.2013 05:44, David Holmes wrote:
> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>
>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>> <jaroslav.bachorik at oracle.com> wrote:
>>
>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>> There is now an awful amount of different timestamps in the
>>>> Management class. Can they be consolidated somehow? At least
>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>
>>>> This discussion also implies that the "elapsed time" we print in the
>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>> os::elapsed_counter().
>>>>
>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>
>>> Also the reported GC pauses duration might be wrong since it uses
>>> Management::timestamp().
>>>
>>> On the first sight the change looks rather trivial. But, honestly,
>>> I'm not sure which other parts could for whatever reason break once
>>> the time-of-day timestamp is replaced with a monotonic equivalent.
>>> One would think that it shouldn't matter but one never knows ...
>>>
>>> Staffan, do you think this kind of change is suitable for the current
>>> phase of JDK release cycle? I think I could improve the patch in few
>>> days and then it should probably be able to pass the review before
>>> ZBB. But, it's only P3  ...
>>
>> I think it is a bit late in the release cycle to clean this up in the
>> way it should be cleaned up. I think we should wait until the first 8
>> update release and do a more thorough job than we have time for right
>> now.
>
> I second that. The elapsed_counter/elapsed_timer APIs and
> implementations are a tangled mess. But part of the problem has been
> that people want/expect monotonic time-of-day based timestamps (yes a
> contradiction - though some people make sure TOD does not get modified
> on their production systems). The use of timestamps in logging has to be
> examined carefully - mainly GC logging. I recall a "simple" attempted
> change in the past that resulted in trying to compare a nanoTime based
> timestamp with the TOD. :(

Actually, if I'm reading the sources right, on Solaris and Windows a
monotonic clock source is used to provide the elapsed_counter() value. It
falls back to TOD only when the monotonic clock source is not available.
On Linux/BSD the TOD is used directly.

This makes me wonder whether changing the Linux/BSD implementation to
follow the same logic would really be that disruptive.
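
To illustrate the difference under discussion, here is a minimal Java
sketch (not HotSpot code). The class and method names are hypothetical,
and the clock sources are passed in as suppliers so that a backwards step
of the wall clock can be simulated:

```java
import java.util.function.LongSupplier;

public class UptimeSketch {
    // Uptime from a time-of-day source, in ms. Wrong if the OS clock is stepped.
    public static long wallClockUptimeMs(long startMs, LongSupplier wallClockMs) {
        return wallClockMs.getAsLong() - startMs;
    }

    // Uptime from a monotonic nanosecond source, in ms.
    public static long monotonicUptimeMs(long startNs, LongSupplier monotonicNs) {
        return (monotonicNs.getAsLong() - startNs) / 1_000_000L;
    }

    public static void main(String[] args) {
        // Simulated scenario: the VM started at wall-clock time 10_000 ms and
        // the OS clock was later set back to 4_000 ms, so the result is negative.
        long simulated = wallClockUptimeMs(10_000L, () -> 4_000L);
        System.out.println("wall-clock uptime after clock step: " + simulated + " ms");

        // Real clock: System.nanoTime() is only meaningful as a difference.
        long startNs = System.nanoTime();
        long upMs = monotonicUptimeMs(startNs, System::nanoTime);
        System.out.println("monotonic uptime: " + upMs + " ms");
    }
}
```

The time-of-day variant yields a negative uptime as soon as the clock is
set back past the start time, which is the symptom reported in the bug.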

-JB-
>
> David
> -----
>
>> /Staffan
>>
>>
>>>
>>> -JB-
>>>
>>>>
>>>> /Staffan
>>>>
>>>>
>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>
>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>> Jaroslav,
>>>>>>>>
>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>> Hello,
>>>>>>>>>
>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>> changes of the
>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The
>>>>>>>>> uptime
>>>>>>>>> should not depend on the system time and should be calculated
>>>>>>>>> using a
>>>>>>>>> monotonic clock source.
>>>>>>>>>
>>>>>>>>> There is already the way to get the actual JVM uptime in ticks.
>>>>>>>>> It is
>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>> convertible to
>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>> making it
>>>>>>>>> very
>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>
>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>>>>>> monotonic clock source.
>>>>>>>
>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>
>>>>>>> Is there any reason why a non monotonic clock source is used for
>>>>>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>>>>>> monotonic clock when available - why can't be the same used for
>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>> "gettimeofday"
>>>>>>> is not really a counter?
>>>>>>
>>>>>> It is all historical. These elapsed_counters and elapsed_timers
>>>>>> make me
>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>> because of
>>>>>> the way these are used in logging etc. Certainly not something to be
>>>>>> contemplated at this stage of JDK 8.
>>>>>>
>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that
>>>>>> can then
>>>>>> be used for the uptime.
>>>>>
>>>>> My attempt at this is at
>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>> possible.
>>>>>
>>>>> The JDK part stays the same as for webrev.00
>>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> David
>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>
>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>>>>> uptime in
>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>>>>
>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI
>>>>>>>>> bridging
>>>>>>>>> methods in order to get the new uptime, introducing the same
>>>>>>>>> constant
>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in
>>>>>>>>> order to
>>>>>>>>> properly build the native library.
>>>>>>>>>
>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>
>>>>>
>>>>
>>>
>>


From staffan.larsen at oracle.com  Thu Oct 10 04:15:49 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Thu, 10 Oct 2013 13:15:49 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <52568940.4000704@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
Message-ID: <23435103-156B-434F-994C-B6F913EE0364@oracle.com>


On 10 okt 2013, at 13:02, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 10.10.2013 05:44, David Holmes wrote:
>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>> 
>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>> <jaroslav.bachorik at oracle.com> wrote:
>>> 
>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>> There is now an awful amount of different timestamps in the
>>>>> Management class. Can they be consolidated somehow? At least
>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>> 
>>>>> This discussion also implies that the "elapsed time" we print in the
>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>> os::elapsed_counter().
>>>>> 
>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>> 
>>>> Also the reported GC pauses duration might be wrong since it uses
>>>> Management::timestamp().
>>>> 
>>>> On the first sight the change looks rather trivial. But, honestly,
>>>> I'm not sure which other parts could for whatever reason break once
>>>> the time-of-day timestamp is replaced with a monotonic equivalent.
>>>> One would think that it shouldn't matter but one never knows ...
>>>> 
>>>> Staffan, do you think this kind of change is suitable for the current
>>>> phase of JDK release cycle? I think I could improve the patch in few
>>>> days and then it should probably be able to pass the review before
>>>> ZBB. But, it's only P3  ...
>>> 
>>> I think it is a bit late in the release cycle to clean this up in the
>>> way it should be cleaned up. I think we should wait until the first 8
>>> update release and do a more thorough job than we have time for right
>>> now.
>> 
>> I second that. The elapsed_counter/elapsed_timer APIs and
>> implementations are a tangled mess. But part of the problem has been
>> that people want/expect monotonic time-of-day based timestamps (yes a
>> contradiction - though some people make sure TOD does not get modified
>> on their production systems). The use of timestamps in logging has to be
>> examined carefully - mainly GC logging. I recall a "simple" attempted
>> change in the past that resulted in trying to compare a nanoTime based
>> timestamp with the TOD. :(
> 
> Actually, if I'm reading the sources right for Solaris and Win the monotonic clock source is used to provide elapsed_counter() value. It falls back to TOD when the monotonic clock source is not available.
> For Linux/BSD the TOD is used directly.
> 
> This makes me wonder if changing the linux/bsd implementation to follow the same logic would be really that disruptive.

Good point. I would like a world where elapsed_counter is monotonic (where possible). Still a bit scary this late in the release, but an interesting experiment.

/Staffan


> 
> -JB-
>> 
>> David
>> -----
>> 
>>> /Staffan
>>> 
>>> 
>>>> 
>>>> -JB-
>>>> 
>>>>> 
>>>>> /Staffan
>>>>> 
>>>>> 
>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>> 
>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>> Jaroslav,
>>>>>>>>> 
>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>> Hello,
>>>>>>>>>> 
>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>> changes of the
>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The
>>>>>>>>>> uptime
>>>>>>>>>> should not depend on the system time and should be calculated
>>>>>>>>>> using a
>>>>>>>>>> monotonic clock source.
>>>>>>>>>> 
>>>>>>>>>> There is already the way to get the actual JVM uptime in ticks.
>>>>>>>>>> It is
>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>> convertible to
>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>> making it
>>>>>>>>>> very
>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>> 
>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>>>>>>> monotonic clock source.
>>>>>>>> 
>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>> 
>>>>>>>> Is there any reason why a non monotonic clock source is used for
>>>>>>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>>>>>>> monotonic clock when available - why can't be the same used for
>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>> "gettimeofday"
>>>>>>>> is not really a counter?
>>>>>>> 
>>>>>>> It is all historical. These elapsed_counters and elapsed_timers
>>>>>>> make me
>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>> because of
>>>>>>> the way these are used in logging etc. Certainly not something to be
>>>>>>> contemplated at this stage of JDK 8.
>>>>>>> 
>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that
>>>>>>> can then
>>>>>>> be used for the uptime.
>>>>>> 
>>>>>> My attempt at this is at
>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>>> possible.
>>>>>> 
>>>>>> The JDK part stays the same as for webrev.00
>>>>>> 
>>>>>> -JB-
>>>>>> 
>>>>>>> 
>>>>>>> David
>>>>>>> 
>>>>>>>> -JB-
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>> 
>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>>>>>> uptime in
>>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>>>>> 
>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI
>>>>>>>>>> bridging
>>>>>>>>>> methods in order to get the new uptime, introducing the same
>>>>>>>>>> constant
>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in
>>>>>>>>>> order to
>>>>>>>>>> properly build the native library.
>>>>>>>>>> 
>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> 
>>>>>>>>>> -JB-
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
> 


From jaroslav.bachorik at oracle.com  Mon Oct 14 07:13:33 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 14 Oct 2013 16:13:33 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <23435103-156B-434F-994C-B6F913EE0364@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
Message-ID: <525BFC0D.8090101@oracle.com>

On 10.10.2013 13:15, Staffan Larsen wrote:
>
> On 10 okt 2013, at 13:02, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>
>> On 10.10.2013 05:44, David Holmes wrote:
>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>
>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>
>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>> There is now an awful amount of different timestamps in the
>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>
>>>>>> This discussion also implies that the "elapsed time" we print in the
>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>> os::elapsed_counter().
>>>>>>
>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>>>
>>>>> Also the reported GC pauses duration might be wrong since it uses
>>>>> Management::timestamp().
>>>>>
>>>>> On the first sight the change looks rather trivial. But, honestly,
>>>>> I'm not sure which other parts could for whatever reason break once
>>>>> the time-of-day timestamp is replaced with a monotonic equivalent.
>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>
>>>>> Staffan, do you think this kind of change is suitable for the current
>>>>> phase of JDK release cycle? I think I could improve the patch in few
>>>>> days and then it should probably be able to pass the review before
>>>>> ZBB. But, it's only P3  ...
>>>>
>>>> I think it is a bit late in the release cycle to clean this up in the
>>>> way it should be cleaned up. I think we should wait until the first 8
>>>> update release and do a more thorough job than we have time for right
>>>> now.
>>>
>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>> implementations are a tangled mess. But part of the problem has been
>>> that people want/expect monotonic time-of-day based timestamps (yes a
>>> contradiction - though some people make sure TOD does not get modified
>>> on their production systems). The use of timestamps in logging has to be
>>> examined carefully - mainly GC logging. I recall a "simple" attempted
>>> change in the past that resulted in trying to compare a nanoTime based
>>> timestamp with the TOD. :(
>>
>> Actually, if I'm reading the sources right for Solaris and Win the monotonic clock source is used to provide elapsed_counter() value. It falls back to TOD when the monotonic clock source is not available.
>> For Linux/BSD the TOD is used directly.
>>
>> This makes me wonder if changing the linux/bsd implementation to follow the same logic would be really that disruptive.
>
> Good point. I would like a world where elapsed_counter is monotonic (where possible). Still a bit scary this late in the release, but an interesting experiment.

The change is rather simple and tests OK. All the means to get a
monotonic timestamp are already there and have proved to work. The core
tests in JPRT passed.

The updated webrev is at 
http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02

-JB-

>
> /Staffan
>
>
>
>
>>
>> -JB-
>>>
>>> David
>>> -----
>>>
>>>> /Staffan
>>>>
>>>>
>>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> /Staffan
>>>>>>
>>>>>>
>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>
>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>> Jaroslav,
>>>>>>>>>>
>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on
>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>> changes of the
>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The
>>>>>>>>>>> uptime
>>>>>>>>>>> should not depend on the system time and should be calculated
>>>>>>>>>>> using a
>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>
>>>>>>>>>>> There is already the way to get the actual JVM uptime in ticks.
>>>>>>>>>>> It is
>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>> convertible to
>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>> making it
>>>>>>>>>>> very
>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>
>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a
>>>>>>>>>> monotonic clock source.
>>>>>>>>>
>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>
>>>>>>>>> Is there any reason why a non monotonic clock source is used for
>>>>>>>>> timestamping except of the historical one? os::javaTimeNanos() uses
>>>>>>>>> monotonic clock when available - why can't be the same used for
>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>> "gettimeofday"
>>>>>>>>> is not really a counter?
>>>>>>>>
>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers
>>>>>>>> make me
>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>> because of
>>>>>>>> the way these are used in logging etc. Certainly not something to be
>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>
>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that
>>>>>>>> can then
>>>>>>>> be used for the uptime.
>>>>>>>
>>>>>>> My attempt at this is at
>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>>>> possible.
>>>>>>>
>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>> David
>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>
>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>>>>>>> uptime in
>>>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>>>>>>
>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI
>>>>>>>>>>> bridging
>>>>>>>>>>> methods in order to get the new uptime, introducing the same
>>>>>>>>>>> constant
>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in
>>>>>>>>>>> order to
>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>
>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>> Webrev:  http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>
>


From jaroslav.bachorik at oracle.com  Mon Oct 14 08:21:52 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 14 Oct 2013 17:21:52 +0200
Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on
	slower machines
Message-ID: <525C0C10.7000104@oracle.com>

Please, review the following simple change.

The test times out on slower machines and I was able to reproduce the 
failure even on a normally fast machine using the fastdebug build. The 
timeout does not occur on every run - more like once in 10-15 runs.

There is nothing really wrong with the test - it just takes a rather long
time to obtain the jvmstat counters. The remedy is to specify a longer
timeout and see if it is enough. I am using 10 minutes for the timeout
in the patch.

Issue : https://bugs.openjdk.java.net/browse/JDK-6804470
Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00

Thanks,

-JB-

From Alan.Bateman at oracle.com  Mon Oct 14 11:11:25 2013
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 14 Oct 2013 19:11:25 +0100
Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on
 slower machines
In-Reply-To: <525C0C10.7000104@oracle.com>
References: <525C0C10.7000104@oracle.com>
Message-ID: <525C33CD.4010505@oracle.com>

On 14/10/2013 16:21, Jaroslav Bachorik wrote:
> Please, review the following simple change.
>
> The test times out on slower machines and I was able to reproduce the 
> failure even on a normally fast machine using the fastdebug build. The 
> timeout does not occur on every run - more like once in 10-15 runs.
>
> There is nothing really wrong with the test - it just takes rather 
> long time to obtain the jvmstat counters. The remedy is to specify a 
> longer timeout and see if it is enough. I am using 10 minutes for the 
> timeout in the patch.
>
> Issue : https://bugs.openjdk.java.net/browse/JDK-6804470
> Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00
>
> Thanks,
>
> -JB-
This looks okay to me but if someone is testing a fastdebug build then 
they really need to specify the -timeoutFactor option to jtreg so as to 
scale the timeouts.
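
For illustration, a test that manages its own waits could scale them in
the same way. This is a minimal sketch only: it assumes the harness
exposes the factor through the test.timeout.factor system property, and
the class and helper names are hypothetical:

```java
public class TimeoutScaling {
    // Scale a base timeout by a factor given as a string; null means no scaling.
    public static long scaledTimeoutMs(long baseMs, String factor) {
        double f = (factor == null) ? 1.0 : Double.parseDouble(factor);
        return Math.round(baseMs * f);
    }

    public static void main(String[] args) {
        // Assumption: the harness sets this property when a timeout factor is given.
        String factor = System.getProperty("test.timeout.factor");
        System.out.println("wait up to " + scaledTimeoutMs(120_000L, factor) + " ms");
    }
}
```

With a factor of 4, a 120 second base wait becomes 480 seconds, so the
slower fastdebug build gets proportionally more time without the test
hard-coding a larger value.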

-Alan.

From david.holmes at oracle.com  Mon Oct 14 23:49:17 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 15 Oct 2013 16:49:17 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525BFC0D.8090101@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com>
Message-ID: <525CE56D.4000708@oracle.com>

Hi Jaroslav,

os_bsd.cpp / os_linux.cpp:

If you don't have a monotonic clock you leave timer_frequency set to 0! 
(So you need to test on a system without a monotonic clock, or else 
force it to act as-if not present.)

That aside, I don't trust clock_getres to give values that actually allow
the timer frequency to be determined. As per the comments in os_linux.cpp:

// It's fixed in newer kernels, however clock_getres() still returns
// 1/HZ. We check if clock_getres() works, but will ignore its reported
// resolution for now. Hopefully as people move to new kernels, this
// won't be a problem.

we don't know which kernels provide real values here and which provide
dummy ones.

On BSD you haven't modified os::elapsed_counter.

Looking at the linux changes I don't think the logic is correct even if 
clock_getres is accurate. In the existing code we have:

elapsed_counter -> elapsed time in microseconds
elapsed_frequency -> 1000 * 1000 (ie micros per second)
elapsed_time -> elapsed_counter*0.000001 -> time in seconds

Now we have:

elapsed_counter -> elapsed time in nanoseconds
elapsed_frequency -> 1x10^9 / whatever clock_getres says
elapsed_time -> counter/frequency -> ???

So elapsed_time is not, in general, going to give the elapsed time in
seconds. And elapsed_time should not depend on the "frequency" at all,
because elapsed_counter is not reporting ticks but an actual elapsed
"time" in nanoseconds.

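As a worked example of the arithmetic above, here is a small Java sketch
(hypothetical names, not the HotSpot code). It assumes, purely for
illustration, that clock_getres reports a resolution of 4,000,000 ns
(that is, 1/HZ with HZ=250):

```java
public class ElapsedTimeSketch {
    static final long NANOSECS_PER_SEC = 1_000_000_000L;

    // elapsed_time as computed in the webrev: counter / frequency.
    public static double elapsedSeconds(long counter, long frequency) {
        return (double) counter / (double) frequency;
    }

    // The frequency the webrev derives from the reported clock resolution.
    public static long frequencyFromResolution(long resolutionNs) {
        return NANOSECS_PER_SEC / resolutionNs;
    }

    public static void main(String[] args) {
        long counterNs = 5 * NANOSECS_PER_SEC; // five seconds, counted in nanoseconds

        // Dividing by nanoseconds-per-second gives 5 seconds, as expected.
        System.out.println(elapsedSeconds(counterNs, NANOSECS_PER_SEC));

        // A reported resolution of 4,000,000 ns gives a "frequency" of 250,
        // so the same counter is reported as 20,000,000 seconds.
        long bogusFrequency = frequencyFromResolution(4_000_000L);
        System.out.println(elapsedSeconds(counterNs, bogusFrequency));
    }
}
```

Since the counter already holds nanoseconds, only a fixed divisor of
nanoseconds-per-second gives the right answer, whatever resolution the
clock reports.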

Also note that we have constants for:

NANOSECS_PER_SEC
NANOSECS_PER_MILLISEC

to aid with time conversions.

The linux webrev contains unrelated UseLargePages changes!


David
-----


On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
> On 10.10.2013 13:15, Staffan Larsen wrote:
>>
>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>> <jaroslav.bachorik at oracle.com> wrote:
>>
>>> On 10.10.2013 05:44, David Holmes wrote:
>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>
>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>
>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>
>>>>>>> This discussion also implies that the "elapsed time" we print in the
>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>> os::elapsed_counter().
>>>>>>>
>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>>>>
>>>>>> Also the reported GC pauses duration might be wrong since it uses
>>>>>> Management::timestamp().
>>>>>>
>>>>>> On the first sight the change looks rather trivial. But, honestly,
>>>>>> I'm not sure which other parts could for whatever reason break once
>>>>>> the time-of-day timestamp is replaced with a monotonic equivalent.
>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>
>>>>>> Staffan, do you think this kind of change is suitable for the current
>>>>>> phase of JDK release cycle? I think I could improve the patch in few
>>>>>> days and then it should probably be able to pass the review before
>>>>>> ZBB. But, it's only P3  ...
>>>>>
>>>>> I think it is a bit late in the release cycle to clean this up in the
>>>>> way it should be cleaned up. I think we should wait until the first 8
>>>>> update release and do a more thorough job than we have time for right
>>>>> now.
>>>>
>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>> implementations are a tangled mess. But part of the problem has been
>>>> that people want/expect monotonic time-of-day based timestamps (yes a
>>>> contradiction - though some people make sure TOD does not get modified
>>>> on their production systems). The use of timestamps in logging has
>>>> to be
>>>> examined carefully - mainly GC logging. I recall a "simple" attempted
>>>> change in the past that resulted in trying to compare a nanoTime based
>>>> timestamp with the TOD. :(
>>>
>>> Actually, if I'm reading the sources right for Solaris and Win the
>>> monotonic clock source is used to provide elapsed_counter() value. It
>>> falls back to TOD when the monotonic clock source is not available.
>>> For Linux/BSD the TOD is used directly.
>>>
>>> This makes me wonder if changing the linux/bsd implementation to
>>> follow the same logic would be really that disruptive.
>>
>> Good point. I would like a world where elapsed_counter is monotonic
>> (where possible). Still a bit scary this late in the release, but an
>> interesting experiment.
>
> The change is rather simple and tests ok. All the means to get a
> monotonic timestamp are already there and proved to work. The core tests
> in JPRT went fine.
>
> The updated webrev is at
> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>
> -JB-
>
>>
>> /Staffan
>>
>>
>>
>>
>>>
>>> -JB-
>>>>
>>>> David
>>>> -----
>>>>
>>>>> /Staffan
>>>>>
>>>>>
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>>
>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>
>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>
>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>>
>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>> based on
>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>> changes of the
>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The
>>>>>>>>>>>> uptime
>>>>>>>>>>>> should not depend on the system time and should be calculated
>>>>>>>>>>>> using a
>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>
>>>>>>>>>>>> There is already the way to get the actual JVM uptime in ticks.
>>>>>>>>>>>> It is
>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>> convertible to
>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>> making it
>>>>>>>>>>>> very
>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>
>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is
>>>>>>>>>>> not a
>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>
>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>
>>>>>>>>>> Is there any reason why a non monotonic clock source is used for
>>>>>>>>>> timestamping except of the historical one? os::javaTimeNanos()
>>>>>>>>>> uses
>>>>>>>>>> monotonic clock when available - why can't be the same used for
>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>> "gettimeofday"
>>>>>>>>>> is not really a counter?
>>>>>>>>>
>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers
>>>>>>>>> make me
>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>> because of
>>>>>>>>> the way these are used in logging etc. Certainly not something
>>>>>>>>> to be
>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>
>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that
>>>>>>>>> can then
>>>>>>>>> be used for the uptime.
>>>>>>>>
>>>>>>>> My attempt at this is at
>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>>>>> possible.
>>>>>>>>
>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>> -----
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>
>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>>>>>>>> uptime in
>>>>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>>>>>>>
>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI
>>>>>>>>>>>> bridging
>>>>>>>>>>>> methods in order to get the new uptime, introducing the same
>>>>>>>>>>>> constant
>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in
>>>>>>>>>>>> order to
>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>
>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>>
>

From jaroslav.bachorik at oracle.com  Tue Oct 15 01:10:03 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 15 Oct 2013 10:10:03 +0200
Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on
 slower machines
In-Reply-To: <525C33CD.4010505@oracle.com>
References: <525C0C10.7000104@oracle.com> <525C33CD.4010505@oracle.com>
Message-ID: <525CF85B.7080301@oracle.com>

On 14.10.2013 20:11, Alan Bateman wrote:
> On 14/10/2013 16:21, Jaroslav Bachorik wrote:
>> Please, review the following simple change.
>>
>> The test times out on slower machines and I was able to reproduce the
>> failure even on a normally fast machine using the fastdebug build. The
>> timeout does not occur on every run - more like once in 10-15 runs.
>>
>> There is nothing really wrong with the test - it just takes a rather
>> long time to obtain the jvmstat counters. The remedy is to specify a
>> longer timeout and see if it is enough. I am using 10 minutes for the
>> timeout in the patch.
>>
>> Issue : https://bugs.openjdk.java.net/browse/JDK-6804470
>> Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00
>>
>> Thanks,
>>
>> -JB-
> This looks okay to me but if someone is testing a fastdebug build then
> they really need to specify the -timeoutFactor option to jtreg so as to
> scale the timeouts.

Thanks for the review. I'm talking to QE about using the -timeoutFactor 
option in the automated test runs if possible.

-JB-

>
> -Alan.


From jaroslav.bachorik at oracle.com  Tue Oct 15 06:01:32 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 15 Oct 2013 15:01:32 +0200
Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on
 slower machines
In-Reply-To: <525C33CD.4010505@oracle.com>
References: <525C0C10.7000104@oracle.com> <525C33CD.4010505@oracle.com>
Message-ID: <525D3CAC.1030905@oracle.com>

On 14.10.2013 20:11, Alan Bateman wrote:
> On 14/10/2013 16:21, Jaroslav Bachorik wrote:
>> Please, review the following simple change.
>>
>> The test times out on slower machines and I was able to reproduce the
>> failure even on a normally fast machine using the fastdebug build. The
>> timeout does not occur on every run - more like once in 10-15 runs.
>>
>> There is nothing really wrong with the test - it just takes a rather
>> long time to obtain the jvmstat counters. The remedy is to specify a
>> longer timeout and see if it is enough. I am using 10 minutes for the
>> timeout in the patch.
>>
>> Issue : https://bugs.openjdk.java.net/browse/JDK-6804470
>> Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00
>>
>> Thanks,
>>
>> -JB-
> This looks okay to me but if someone is testing a fastdebug build then
> they really need to specify the -timeoutFactor option to jtreg so as to
> scale the timeouts.

Just FYI - the SQE does use timeoutFactor 8x for the fastdebug runs. I 
hope this will be enough in combination with the extended timeout in the 
tests.

-JB-

>
> -Alan.


From shanliang.jiang at oracle.com  Wed Oct 16 06:58:31 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 16 Oct 2013 15:58:31 +0200
Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report some
 issue in com.sun.jmx.snmp package
In-Reply-To: <508E8F79.60909@oracle.com>
References: <508E8F79.60909@oracle.com>
Message-ID: <525E9B87.9050406@oracle.com>

Hi,

Please review the following fix. The main issue is that we should clone 
an internal variable before returning it.
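
For readers unfamiliar with this class of findbugs warning, the pattern can 
be sketched as below. This is a generic illustration, not the code from the 
webrev: the class OidHolder and its field are hypothetical names and are not 
taken from com.sun.jmx.snmp.

```java
// Hypothetical holder class, for illustration only.
final class OidHolder {
    private final long[] components;

    OidHolder(long[] components) {
        // Copy on the way in, so later changes made by the caller
        // to its own array are not visible through this object.
        this.components = components.clone();
    }

    long[] getComponents() {
        // Return a copy instead of the internal array; handing out the
        // internal reference would let callers modify this object's state.
        return components.clone();
    }
}
```

Cloning costs one array copy per call, which is usually acceptable for small 
arrays; the caller can then modify the returned array freely.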

webrev:
http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/

bug
https://bugs.openjdk.java.net/browse/JDK-8026028

Thanks,
Shanliang


From jaroslav.bachorik at oracle.com  Wed Oct 16 07:18:39 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 16 Oct 2013 16:18:39 +0200
Subject: jmx-dev RFR 7197919:
 java/lang/management/ThreadMXBean/ThreadBlockedCount.java has concurency
 issues
Message-ID: <525EA03F.2070106@oracle.com>

Please, review this simple test change.

The test tries to get the number of times a certain thread was blocked 
during the test run and intermittently fails with a difference of 1: 
the expected number is 4 but the reported number is 3.

No lock is used when updating the thread statistics (the blocked count 
in this case), so the ThreadMXBean might retrieve stale data. The patch 
tries to work around this problem by retrying a few times with an added 
delay. The test will try to obtain the correct result for at most 
10 seconds; after that it will fail if the retrieved blocked count does 
not equal the expected blocked count.
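
The retry approach described above can be sketched roughly as follows. This 
is a minimal illustration and not the code from the webrev: the helper class 
RetryUntilExpected, its method name and its parameters are assumptions made 
for this example.

```java
import java.util.function.LongSupplier;

// Hypothetical helper; names and parameters are assumptions.
final class RetryUntilExpected {
    /**
     * Polls the supplier until it returns the expected value or the
     * timeout elapses, and returns the last value that was read.
     */
    static long await(LongSupplier actual, long expected,
                      long timeoutMillis, long delayMillis) {
        // The deadline is based on System.nanoTime(), so it is not
        // affected by changes of the wall-clock time.
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long value = actual.getAsLong();
        while (value != expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt
                break;
            }
            value = actual.getAsLong();
        }
        return value;
    }
}
```

A test could pass a supplier that reads ThreadInfo.getBlockedCount() for the 
thread under test, with an expected value of 4 and a timeout of 10000 ms, and 
then fail only if the returned value still differs from the expected one.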

Issue : https://bugs.openjdk.java.net/browse/JDK-7197919
Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00

Thanks,

-JB-

From jaroslav.bachorik at oracle.com  Wed Oct 16 07:44:47 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 16 Oct 2013 16:44:47 +0200
Subject: jmx-dev [PING] Re: RFR: 8024613
 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java
 failing intermittently
In-Reply-To: <524BFB87.10808@oracle.com>
References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com>
	<524BFB87.10808@oracle.com>
Message-ID: <525EA65F.9040509@oracle.com>

On 2.10.2013 12:55, Jaroslav Bachorik wrote:
> On 20.9.2013 14:54, shanliang wrote:
>> Jaroslav,
>>
>> It is a good idea to use the RMI Testlibrary.
>>
>> Better to call:
>>         agent.close();
>>
>> at Line 55. Closing the RMIRegistry (rmid.shutdown(rmidPort), Line 55)
>> does not ensure that the JMX connector is fully cleaned up; it is always
>> better to clean up within the test.
>
> Thanks. Implemented.
>
> http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01
>
> -JB-
>
>>
>> Shanliang
>>
>>
>> Jaroslav Bachorik wrote:
>>> Please, review the following change for JDK-8024613
>>>
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613
>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/
>>>
>>> The patch takes care of intermittent test failures caused by timing
>>> issues when starting the RMID process. It could happen that the RMID
>>> process hasn't been properly initialized in the timeframe of 5 seconds
>>> and the test would fail.
>>>
>>> The patch replaces the home-brewed RMID process management with the
>>> one available in the RMI Testlibrary which is used by more tests and
>>> therefore should be more stable.
>>>
>>> Thanks,
>>>
>>> -JB-
>>
>


From daniel.fuchs at oracle.com  Wed Oct 16 07:49:42 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Wed, 16 Oct 2013 16:49:42 +0200
Subject: jmx-dev [PING] Re: RFR: 8024613
 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java
 failing intermittently
In-Reply-To: <525EA65F.9040509@oracle.com>
References: <523C3F8B.6080002@oracle.com>
	<523C459A.3080303@oracle.com>	<524BFB87.10808@oracle.com>
	<525EA65F.9040509@oracle.com>
Message-ID: <525EA786.6020508@oracle.com>

Hi Jaroslav,

Looks fine to me (not a reviewer).

-- daniel

On 10/16/13 4:44 PM, Jaroslav Bachorik wrote:
> On 2.10.2013 12:55, Jaroslav Bachorik wrote:
>> On 20.9.2013 14:54, shanliang wrote:
>>> Jaroslav,
>>>
>>> It is a good idea to use the RMI Testlibrary.
>>>
>>> Better to call:
>>>         agent.close();
>>>
>>> at Line 55. Closing the RMIRegistry (rmid.shutdown(rmidPort), Line 55)
>>> does not ensure that the JMX connector is fully cleaned up; it is always
>>> better to clean up within the test.
>>
>> Thanks. Implemented.
>>
>> http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01
>>
>> -JB-
>>
>>>
>>> Shanliang
>>>
>>>
>>> Jaroslav Bachorik wrote:
>>>> Please, review the following change for JDK-8024613
>>>>
>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/
>>>>
>>>> The patch takes care of intermittent test failures caused by timing
>>>> issues when starting the RMID process. It could happen that the RMID
>>>> process hasn't been properly initialized in the timeframe of 5 seconds
>>>> and the test would fail.
>>>>
>>>> The patch replaces the home-brewed RMID process management with the
>>>> one available in the RMI Testlibrary which is used by more tests and
>>>> therefore should be more stable.
>>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>>
>>
>


From shanliang.jiang at oracle.com  Wed Oct 16 07:50:13 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Wed, 16 Oct 2013 16:50:13 +0200
Subject: jmx-dev [PING] Re: RFR: 8024613
 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java
 failing intermittently
In-Reply-To: <525EA65F.9040509@oracle.com>
References: <523C3F8B.6080002@oracle.com>
	<523C459A.3080303@oracle.com>	<524BFB87.10808@oracle.com>
	<525EA65F.9040509@oracle.com>
Message-ID: <525EA7A5.7080904@oracle.com>

Looks fine to me.

Shanliang

Jaroslav Bachorik wrote:
> On 2.10.2013 12:55, Jaroslav Bachorik wrote:
>> On 20.9.2013 14:54, shanliang wrote:
>>> Jaroslav,
>>>
>>> It is a good idea to use the RMI Testlibrary.
>>>
>>> Better to call:
>>>         agent.close();
>>>
>>> at Line 55. Closing the RMIRegistry (rmid.shutdown(rmidPort), Line 55)
>>> does not ensure that the JMX connector is fully cleaned up; it is always
>>> better to clean up within the test.
>>
>> Thanks. Implemented.
>>
>> http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01
>>
>> -JB-
>>
>>>
>>> Shanliang
>>>
>>>
>>> Jaroslav Bachorik wrote:
>>>> Please, review the following change for JDK-8024613
>>>>
>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/
>>>>
>>>> The patch takes care of intermittent test failures caused by timing
>>>> issues when starting the RMID process. It could happen that the RMID
>>>> process hasn't been properly initialized in the timeframe of 5 seconds
>>>> and the test would fail.
>>>>
>>>> The patch replaces the home-brewed RMID process management with the
>>>> one available in the RMI Testlibrary which is used by more tests and
>>>> therefore should be more stable.
>>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>>
>>
>


From jaroslav.bachorik at oracle.com  Wed Oct 16 09:16:15 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 16 Oct 2013 18:16:15 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525CE56D.4000708@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com>
Message-ID: <525EBBCF.3020303@oracle.com>

On 15.10.2013 08:49, David Holmes wrote:
> Hi Jaroslav,
>
> os_bsd.cpp / os_linux.cpp:
>
> If you don't have a monotonic clock you leave timer_frequency set to 0!
> (So you need to test on a system without a monotonic clock, or else
> force it to act as-if not present.)
>
> That aside I don't trust clock_getres to give values that actually allow
> the timer frequency to be determined. As per the comments in os_linux.cpp:
>
> // It's fixed in newer kernels, however clock_getres() still returns
> // 1/HZ. We check if clock_getres() works, but will ignore its reported
> // resolution for now. Hopefully as people move to new kernels, this
> // won't be a problem.
>
> we don't know what kernels provide real values here and which provide
> dummy ones.
>
> On BSD you haven't modified os::elapsed_counter.
>
> Looking at the linux changes I don't think the logic is correct even if
> clock_getres is accurate. In the existing code we have:
>
> elapsed_counter -> elapsed time in microseconds
> elapsed_frequency -> 1000 * 1000 (ie micros per second)
> elapsed_time -> elapsed_counter*0.000001 -> time in seconds
>
> Now we have:
>
> elapsed_counter -> elapsed time in nanoseconds
> elapsed_frequency -> 1x10^9 / whatever clock_getres says
> elapsed_time -> counter/frequency -> ???
>
> So elapsed_time is not, in general, going to give the elapsed time in
> seconds. And elapsed_time is not dependent on the "frequency" at all
> because elapsed_counter is not reporting ticks but an actual elapsed
> "time" in nanoseconds.
>
>
> Also note that we have constants for:
>
> NANOSECS_PER_SEC
> NANOSECS_PER_MILLISEC
>
> to aid with time conversions.
>
> The linux webrev contains unrelated UseLargePages changes!

Sorry for the mess with UseLargePages changes :/

I've fixed the problems with the frequency (using a fixed number as 
before) and I kept the changes to a minimum.

I was hesitant to change the elapsed_counter precision from 
microseconds to nanoseconds, but since the Solaris and Windows versions 
already use nanosecond ticks for elapsed_counter I think the change is safe.
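
The relationship between a nanosecond counter and a fixed frequency can be 
sketched in Java as below. This is only an analogue for illustration: the 
real code is C++ in HotSpot, and the class ElapsedCounter and its method 
names are assumptions that mirror the names used in this thread.

```java
// Java analogue, for illustration only; not the HotSpot implementation.
final class ElapsedCounter {
    // The counter reports nanoseconds, so the frequency is fixed at
    // 10^9 ticks per second and does not depend on clock_getres().
    static final long NANOSECS_PER_SEC = 1_000_000_000L;
    static final long NANOSECS_PER_MILLISEC = 1_000_000L;

    // System.nanoTime() is meant for measuring elapsed time and is not
    // related to the wall-clock time.
    private final long startNanos = System.nanoTime();

    long elapsedCounter() {
        return System.nanoTime() - startNanos;
    }

    static long elapsedFrequency() {
        return NANOSECS_PER_SEC;
    }

    static long ticksToMillis(long ticks) {
        return ticks / NANOSECS_PER_MILLISEC;
    }
}
```

An uptime in milliseconds is then ticksToMillis(elapsedCounter()), which 
cannot go negative when the system time is adjusted.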

The updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03

>
>
> David
> -----
>
>
> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
>> On 10.10.2013 13:15, Staffan Larsen wrote:
>>>
>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>>> <jaroslav.bachorik at oracle.com> wrote:
>>>
>>>> On 10.10.2013 05:44, David Holmes wrote:
>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>>
>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>
>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>>
>>>>>>>> This discussion also implies that the "elapsed time" we print in
>>>>>>>> the
>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>>> os::elapsed_counter().
>>>>>>>>
>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>>>>>
>>>>>>> Also the reported GC pauses duration might be wrong since it uses
>>>>>>> Management::timestamp().
>>>>>>>
>>>>>>> At first sight the change looks rather trivial. But, honestly,
>>>>>>> I'm not sure which other parts could for whatever reason break once
>>>>>>> the time-of-day timestamp is replaced with a monotonic equivalent.
>>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>>
>>>>>>> Staffan, do you think this kind of change is suitable for the
>>>>>>> current
>>>>>>> phase of JDK release cycle? I think I could improve the patch in few
>>>>>>> days and then it should probably be able to pass the review before
>>>>>>> ZBB. But, it's only P3  ...
>>>>>>
>>>>>> I think it is a bit late in the release cycle to clean this up in the
>>>>>> way it should be cleaned up. I think we should wait until the first 8
>>>>>> update release and do a more thorough job than we have time for right
>>>>>> now.
>>>>>
>>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>>> implementations are a tangled mess. But part of the problem has been
>>>>> that people want/expect monotonic time-of-day based timestamps (yes a
>>>>> contradiction - though some people make sure TOD does not get modified
>>>>> on their production systems). The use of timestamps in logging has
>>>>> to be
>>>>> examined carefully - mainly GC logging. I recall a "simple" attempted
>>>>> change in the past that resulted in trying to compare a nanoTime based
>>>>> timestamp with the TOD. :(
>>>>
>>>> Actually, if I'm reading the sources right for Solaris and Win the
>>>> monotonic clock source is used to provide elapsed_counter() value. It
>>>> falls back to TOD when the monotonic clock source is not available.
>>>> For Linux/BSD the TOD is used directly.
>>>>
>>>> This makes me wonder if changing the linux/bsd implementation to
>>>> follow the same logic would be really that disruptive.
>>>
>>> Good point. I would like a world where elapsed_counter is monotonic
>>> (where possible). Still a bit scary this late in the release, but an
>>> interesting experiment.
>>
>> The change is rather simple and tests ok. All the means to get a
>> monotonic timestamp are already there and proved to work. The core tests
>> in JPRT went fine.
>>
>> The updated webrev is at
>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>>
>> -JB-
>>
>>>
>>> /Staffan
>>>
>>>
>>>
>>>
>>>>
>>>> -JB-
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>> /Staffan
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>>
>>>>>>>> /Staffan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>
>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>>
>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>
>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>>> based on
>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>>> changes of the
>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The
>>>>>>>>>>>>> uptime
>>>>>>>>>>>>> should not depend on the system time and should be calculated
>>>>>>>>>>>>> using a
>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>
>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in
>>>>>>>>>>>>> ticks.
>>>>>>>>>>>>> It is
>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>>> convertible to
>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>>> making it
>>>>>>>>>>>>> very
>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>>
>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is
>>>>>>>>>>>> not a
>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>
>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>
>>>>>>>>>>> Is there any reason why a non monotonic clock source is used for
>>>>>>>>>>> timestamping except of the historical one? os::javaTimeNanos()
>>>>>>>>>>> uses
>>>>>>>>>>> monotonic clock when available - why can't the same be used for
>>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>>> "gettimeofday"
>>>>>>>>>>> is not really a counter?
>>>>>>>>>>
>>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers
>>>>>>>>>> make me
>>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>>> because of
>>>>>>>>>> the way these are used in logging etc. Certainly not something
>>>>>>>>>> to be
>>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>>
>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that
>>>>>>>>>> can then
>>>>>>>>>> be used for the uptime.
>>>>>>>>>
>>>>>>>>> My attempt at this is at
>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>>>>>> possible.
>>>>>>>>>
>>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>> -----
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the
>>>>>>>>>>>>> uptime in
>>>>>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI
>>>>>>>>>>>>> bridging
>>>>>>>>>>>>> methods in order to get the new uptime, introducing the same
>>>>>>>>>>>>> constant
>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in
>>>>>>>>>>>>> order to
>>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>


From david.holmes at oracle.com  Wed Oct 16 19:26:40 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 17 Oct 2013 12:26:40 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525EBBCF.3020303@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com>
	<525EBBCF.3020303@oracle.com>
Message-ID: <525F4AE0.1000406@oracle.com>

Hi Jaroslav,

Minor nit: os::elapsed_time should really be defined in terms of the 
other functions ie:

return ((double) os::elapsed_counter()) / os::elapsed_frequency();

I also prefer the cast above as it is very clear that we will be doing a 
floating-point division.
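
The effect of the cast can be shown with a small Java analogue (the line 
above is C++ from HotSpot; the class and method names below are hypothetical 
and exist only for this illustration):

```java
// Illustration only; not HotSpot code.
final class ElapsedTime {
    // The cast converts the counter to double first, so the division
    // is a floating-point division and the fraction is kept.
    static double elapsedSeconds(long counter, long frequency) {
        return ((double) counter) / frequency;
    }

    // Without the cast the division is integral; the fraction is lost
    // before the result is widened to double.
    static double truncatedSeconds(long counter, long frequency) {
        return counter / frequency;
    }
}
```

With a counter of 1500000000 ns and a frequency of 1000000000, the first 
method yields 1.5 seconds and the second yields 1.0.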

Aside: AFAICS os::elapsed_time() is never actually used ??

I agree that it appears that changing the frequency should be okay.

Thanks,
David

On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote:
> On 15.10.2013 08:49, David Holmes wrote:
>> Hi Jaroslav,
>>
>> os_bsd.cpp / os_linux.cpp:
>>
>> If you don't have a monotonic clock you leave timer_frequency set to 0!
>> (So you need to test on a system without a monotonic clock, or else
>> force it to act as-if not present.)
>>
>> That aside I don't trust clock_getres to give values that actually allow
>> the timer frequency to be determined. As per the comments in
>> os_linux.cpp:
>>
>> // It's fixed in newer kernels, however clock_getres() still returns
>> // 1/HZ. We check if clock_getres() works, but will ignore its reported
>> // resolution for now. Hopefully as people move to new kernels, this
>> // won't be a problem.
>>
>> we don't know what kernels provide real values here and which provide
>> dummy ones.
>>
>> On BSD you haven't modified os::elapsed_counter.
>>
>> Looking at the linux changes I don't think the logic is correct even if
>> clock_getres is accurate. In the existing code we have:
>>
>> elapsed_counter -> elapsed time in microseconds
>> elapsed_frequency -> 1000 * 1000 (ie micros per second)
>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds
>>
>> Now we have:
>>
>> elapsed_counter -> elapsed time in nanoseconds
>> elapsed_frequency -> 1x10^9 / whatever clock_getres says
>> elapsed_time -> counter/frequency -> ???
>>
>> So elapsed_time is not, in general, going to give the elapsed time in
>> seconds. And elapsed_time is not dependent on the "frequency" at all
>> because elapsed_counter is not reporting ticks but an actual elapsed
>> "time" in nanoseconds.
>>
>>
>> Also note that we have constants for:
>>
>> NANOSECS_PER_SEC
>> NANOSECS_PER_MILLISEC
>>
>> to aid with time conversions.
>>
>> The linux webrev contains unrelated UseLargePages changes!
>
> Sorry for the mess with UseLargePages changes :/
>
> I've fixed the problems with the frequency (using a fixed number as
> before) and I kept the changes to a minimum.
>
> I was hesitating about changing the elapsed_counter precision from
> microseconds to nanoseconds but since solaris and windows versions
> already use nanosecond ticks for elapsed_counter I think the change is
> safe.
>
> The updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>
>>
>>
>> David
>> -----
>>
>>
>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
>>> On 10.10.2013 13:15, Staffan Larsen wrote:
>>>>
>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>
>>>>> On 10.10.2013 05:44, David Holmes wrote:
>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>>>
>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>
>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>>>
>>>>>>>>> This discussion also implies that the "elapsed time" we print in
>>>>>>>>> the
>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>>>> os::elapsed_counter().
>>>>>>>>>
>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>>>>>>
>>>>>>>> Also the reported GC pauses duration might be wrong since it uses
>>>>>>>> Management::timestamp().
>>>>>>>>
>>>>>>>> At first sight the change looks rather trivial. But, honestly,
>>>>>>>> I'm not sure which other parts could for whatever reason break once
>>>>>>>> the time-of-day timestamp is replaced with a monotonic equivalent.
>>>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>>>
>>>>>>>> Staffan, do you think this kind of change is suitable for the
>>>>>>>> current
>>>>>>>> phase of JDK release cycle? I think I could improve the patch in
>>>>>>>> few
>>>>>>>> days and then it should probably be able to pass the review before
>>>>>>>> ZBB. But, it's only P3  ...
>>>>>>>
>>>>>>> I think it is a bit late in the release cycle to clean this up in
>>>>>>> the
>>>>>>> way it should be cleaned up. I think we should wait until the
>>>>>>> first 8
>>>>>>> update release and do a more thorough job than we have time for
>>>>>>> right
>>>>>>> now.
>>>>>>
>>>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>>>> implementations are a tangled mess. But part of the problem has been
>>>>>> that people want/expect monotonic time-of-day based timestamps (yes a
>>>>>> contradiction - though some people make sure TOD does not get
>>>>>> modified
>>>>>> on their production systems). The use of timestamps in logging has
>>>>>> to be
>>>>>> examined carefully - mainly GC logging. I recall a "simple" attempted
>>>>>> change in the past that resulted in trying to compare a nanoTime
>>>>>> based
>>>>>> timestamp with the TOD. :(
>>>>>
>>>>> Actually, if I'm reading the sources right for Solaris and Win the
>>>>> monotonic clock source is used to provide elapsed_counter() value. It
>>>>> falls back to TOD when the monotonic clock source is not available.
>>>>> For Linux/BSD the TOD is used directly.
>>>>>
>>>>> This makes me wonder if changing the linux/bsd implementation to
>>>>> follow the same logic would be really that disruptive.
>>>>
>>>> Good point. I would like a world where elapsed_counter is monotonic
>>>> (where possible). Still a bit scary this late in the release, but an
>>>> interesting experiment.
>>>
>>> The change is rather simple and tests ok. All the means to get a
>>> monotonic timestamp are already there and proved to work. The core tests
>>> in JPRT went fine.
>>>
>>> The updated webrev is at
>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>>>
>>> -JB-
>>>
>>>>
>>>> /Staffan
>>>>
>>>>
>>>>
>>>>
>>>>>
>>>>> -JB-
>>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>>
>>>>>>>>> /Staffan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>>>> changes of the
>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.).
>>>>>>>>>>>>>> The
>>>>>>>>>>>>>> uptime
>>>>>>>>>>>>>> should not depend on the system time and should be calculated
>>>>>>>>>>>>>> using a
>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in
>>>>>>>>>>>>>> ticks.
>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>>>> convertible to
>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>>>> making it
>>>>>>>>>>>>>> very
>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is
>>>>>>>>>>>>> not a
>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>
>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>
>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used
>>>>>>>>>>>> for
>>>>>>>>>>>> timestamping except of the historical one? os::javaTimeNanos()
>>>>>>>>>>>> uses
>>>>>>>>>>>> monotonic clock when available - why can't the same be used for
>>>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>>>> "gettimeofday"
>>>>>>>>>>>> is not really a counter?
>>>>>>>>>>>
>>>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers
>>>>>>>>>>> make me
>>>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>>>> because of
>>>>>>>>>>> the way these are used in logging etc. Certainly not something
>>>>>>>>>>> to be
>>>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>>>
>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that
>>>>>>>>>>> can then
>>>>>>>>>>> be used for the uptime.
>>>>>>>>>>
>>>>>>>>>> My attempt at this is at
>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>>>>>>> possible.
>>>>>>>>>>
>>>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> uptime in
>>>>>>>>>>>>>> milliseconds is added in
>>>>>>>>>>>>>> src/share/vm/services/management.cpp.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI
>>>>>>>>>>>>>> bridging
>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the same
>>>>>>>>>>>>>> constant
>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in
>>>>>>>>>>>>>> order to
>>>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>

From daniel.fuchs at oracle.com  Thu Oct 17 02:53:47 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Thu, 17 Oct 2013 11:53:47 +0200
Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report
 some issue in com.sun.jmx.snmp package
In-Reply-To: <525E9B87.9050406@oracle.com>
References: <508E8F79.60909@oracle.com> <525E9B87.9050406@oracle.com>
Message-ID: <525FB3AB.3040105@oracle.com>

Hi Shanliang,

Looks good!

-- daniel

On 10/16/13 3:58 PM, shanliang wrote:
> Hi,
>
> Please review the following fix, main issue here is that we should clone
> an internal variable before returning.
>
> webrev:
> http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/
>
> bug
> https://bugs.openjdk.java.net/browse/JDK-8026028
>
> Thanks,
> Shanliang
>
>
>
>


From jaroslav.bachorik at oracle.com  Thu Oct 17 03:10:39 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 17 Oct 2013 12:10:39 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525F4AE0.1000406@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com>
	<525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com>
Message-ID: <525FB79F.7070101@oracle.com>

Hi David,

On 17.10.2013 04:26, David Holmes wrote:
> Hi Jaroslav,
>
> Minor nit: os::elapsed_time should really be defined in terms of the
> other functions ie:
>
> return ((double) os::elapsed_counter()) / os::elapsed_frequency();

Ok. I've changed it. It better communicates the way the elapsedTime is 
calculated anyway.
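
A minimal sketch of that relationship, for readers following along. This is
not the HotSpot code: elapsed_counter_ns, elapsed_frequency,
counter_to_seconds and elapsed_time_sec are hypothetical stand-ins, and
std::chrono::steady_clock is assumed here in place of the platform's
monotonic clock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-ins for os::elapsed_counter() and friends; this only
// illustrates the unit relationship, it is not the HotSpot implementation.

// Start point, captured once on first use (an assumption of this sketch).
static std::chrono::steady_clock::time_point start_point() {
  static const auto start = std::chrono::steady_clock::now();
  return start;
}

// Counter: nanoseconds elapsed since the start point, from a monotonic clock.
int64_t elapsed_counter_ns() {
  const auto start = start_point();  // make sure the start is taken first
  const auto delta = std::chrono::steady_clock::now() - start;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(delta).count();
}

// Frequency: counter units per second. The counter is in nanoseconds, so
// this is a fixed 10^9 and does not depend on the clock's resolution.
int64_t elapsed_frequency() {
  return 1000 * 1000 * 1000;
}

// Pure conversion, split out so the arithmetic can be checked in isolation.
double counter_to_seconds(int64_t counter, int64_t frequency) {
  assert(frequency > 0);
  // The explicit cast makes it obvious this is a floating-point division.
  return static_cast<double>(counter) / frequency;
}

// Elapsed time in seconds, defined purely in terms of the other two.
double elapsed_time_sec() {
  return counter_to_seconds(elapsed_counter_ns(), elapsed_frequency());
}
```

Because the counter already reports nanoseconds, the frequency in this sketch
is a constant rather than something derived from clock_getres(), which is the
"fixed number as before" approach mentioned further down in the quoted text.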

>
> I also prefer the cast above as it is very clear that we will be doing a
> floating-point division.
>
> Aside: AFAICS os::elapsed_time() is never actually used ??

Actually, it is os::elapsedTime() and this one is used quite a lot. The 
"elapsed_time()" form is used only in bytecodeHistogram.hpp, 
parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared.

>
> I agree that it appears that changing the frequency should be okay.

Thanks for the feedback.

Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
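
Independently of the webrev (this is not the patch itself), the general shape
discussed in this thread, a nanosecond counter that prefers a monotonic clock
and falls back to time-of-day, could be sketched roughly as below. It assumes
a POSIX system; monotonic_or_tod_nanos and the two constants are hypothetical
names used only for illustration.

```cpp
#include <cstdint>
#include <ctime>        // clock_gettime, CLOCK_MONOTONIC (POSIX)
#include <sys/time.h>   // gettimeofday (POSIX)

const int64_t NANOS_PER_SEC   = 1000000000LL;
const int64_t NANOS_PER_MICRO = 1000LL;

// Hypothetical helper, not a HotSpot function.
// Returns nanoseconds from the monotonic clock when it is usable, otherwise
// from the time-of-day clock (which may jump if the system time is changed).
// If used_monotonic is non-null it reports which source was used.
int64_t monotonic_or_tod_nanos(bool* used_monotonic) {
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
    if (used_monotonic != nullptr) *used_monotonic = true;
    return static_cast<int64_t>(ts.tv_sec) * NANOS_PER_SEC + ts.tv_nsec;
  }
  // Fallback: microsecond time-of-day, scaled up to nanoseconds.
  struct timeval tv;
  gettimeofday(&tv, nullptr);
  if (used_monotonic != nullptr) *used_monotonic = false;
  return static_cast<int64_t>(tv.tv_sec) * NANOS_PER_SEC +
         static_cast<int64_t>(tv.tv_usec) * NANOS_PER_MICRO;
}
```

Both branches return the same unit, so a caller can keep dividing by a fixed
10^9 regardless of which clock source was picked.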

-JB-

>
> Thanks,
> David
>
> On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote:
>> On 15.10.2013 08:49, David Holmes wrote:
>>> Hi Jaroslav,
>>>
>>> os_bsd.cpp / os_linux.cpp:
>>>
>>> If you don't have a monotonic clock you leave timer_frequency set to 0!
>>> (So you need to test on a system without a monotonic clock, or else
>>> force it to act as-if not present.)
>>>
>>> That aside I don't trust clock_getres to give values that actually allow
>>> the timer frequency to be determined. As per the comments in
>>> os_linux.cpp:
>>>
>>> // It's fixed in newer kernels, however clock_getres() still returns
>>> // 1/HZ. We check if clock_getres() works, but will ignore its reported
>>> // resolution for now. Hopefully as people move to new kernels, this
>>> // won't be a problem.
>>>
>>> we don't know what kernels provide real values here and which provide
>>> dummy ones.
>>>
>>> On BSD you haven't modified os::elapsed_counter.
>>>
>>> Looking at the linux changes I don't think the logic is correct even if
>>> clock_getres is accurate. In the existing code we have:
>>>
>>> elapsed_counter -> elapsed time in microseconds
>>> elapsed_frequency -> 1000 * 1000 (ie micros per second)
>>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds
>>>
>>> Now we have:
>>>
>>> elapsed_counter -> elapsed time in nanoseconds
>>> elapsed_frequency -> 1x10^9 / whatever clock_getres says
>>> elapsed_time -> counter/frequency -> ???
>>>
>>> So elapsed_time is not, in general, going to give the elapsed time in
>>> seconds. And elapsed_time is not dependent on the "frequency" at all
>>> because elapsed_counter is not reporting ticks but an actual elapsed
>>> "time" in nanoseconds.
>>>
>>>
>>> Also note that we have constants for:
>>>
>>> NANOSECS_PER_SEC
>>> NANOSECS_PER_MILLISEC
>>>
>>> to aid with time conversions.
>>>
>>> The linux webrev contains unrelated UseLargePages changes!
>>
>> Sorry for the mess with UseLargePages changes :/
>>
>> I've fixed the problems with the frequency (using a fixed number as
>> before) and I kept the changes to minimum.
>>
>> I was hesitating about changing the elapsed_counter precision from
>> microseconds to nanoseconds but since solaris and windows versions
>> already use nanosecond ticks for elapsed_counter I think the change is
>> safe.
>>
>> The updated webrev:
>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>>
>>>
>>>
>>> David
>>> -----
>>>
>>>
>>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
>>>> On 10.10.2013 13:15, Staffan Larsen wrote:
>>>>>
>>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>
>>>>>> On 10.10.2013 05:44, David Holmes wrote:
>>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>>>>
>>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>
>>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>>>>
>>>>>>>>>> This discussion also implies that the "elapsed time" we print in
>>>>>>>>>> the
>>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>
>>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>>>>>>>
>>>>>>>>> Also the reported GC pauses duration might be wrong since it uses
>>>>>>>>> Management::timestamp().
>>>>>>>>>
>>>>>>>>> On the first sight the change looks rather trivial. But, honestly,
>>>>>>>>> I'm not sure which other parts could for whatever reason break
>>>>>>>>> once
>>>>>>>>> the time-of-day timestamp is replaced with a monotonic equivalent.
>>>>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>>>>
>>>>>>>>> Staffan, do you think this kind of change is suitable for the
>>>>>>>>> current
>>>>>>>>> phase of JDK release cycle? I think I could improve the patch in
>>>>>>>>> few
>>>>>>>>> days and then it should probably be able to pass the review before
>>>>>>>>> ZBB. But, it's only P3  ...
>>>>>>>>
>>>>>>>> I think it is a bit late in the release cycle to clean this up in
>>>>>>>> the
>>>>>>>> way it should be cleaned up. I think we should wait until the
>>>>>>>> first 8
>>>>>>>> update release and do a more thorough job than we have time for
>>>>>>>> right
>>>>>>>> now.
>>>>>>>
>>>>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>>>>> implementations are a tangled mess. But part of the problem has been
>>>>>>> that people want/expect monotonic time-of-day based timestamps
>>>>>>> (yes a
>>>>>>> contradiction - though some people make sure TOD does not get
>>>>>>> modified
>>>>>>> on their production systems). The use of timestamps in logging has
>>>>>>> to be
>>>>>>> examined carefully - mainly GC logging. I recall a "simple"
>>>>>>> attempted
>>>>>>> change in the past that resulted in trying to compare a nanoTime
>>>>>>> based
>>>>>>> timestamp with the TOD. :(
>>>>>>
>>>>>> Actually, if I'm reading the sources right for Solaris and Win the
>>>>>> monotonic clock source is used to provide elapsed_counter() value. It
>>>>>> falls back to TOD when the monotonic clock source is not available.
>>>>>> For Linux/BSD the TOD is used directly.
>>>>>>
>>>>>> This makes me wonder if changing the linux/bsd implementation to
>>>>>> follow the same logic would be really that disruptive.
>>>>>
>>>>> Good point. I would like a world where elapsed_counter is monotonic
>>>>> (where possible). Still a bit scary this late in the release, but an
>>>>> interesting experiment.
>>>>
>>>> The change is rather simple and tests ok. All the means to get a
>>>> monotonic timestamp are already there and proved to work. The core
>>>> tests
>>>> in JPRT went fine.
>>>>
>>>> The updated webrev is at
>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> /Staffan
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>>
>>>>>> -JB-
>>>>>>>
>>>>>>> David
>>>>>>> -----
>>>>>>>
>>>>>>>> /Staffan
>>>>>>>>
>>>>>>>>
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> /Staffan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>>>>> changes of the
>>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.).
>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>> uptime
>>>>>>>>>>>>>>> should not depend on the system time and should be
>>>>>>>>>>>>>>> calculated
>>>>>>>>>>>>>>> using a
>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in
>>>>>>>>>>>>>>> ticks.
>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>>>>> convertible to
>>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>>>>> making it
>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is
>>>>>>>>>>>>>> not a
>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used
>>>>>>>>>>>>> for
>>>>>>>>>>>>> timestamping except for the historical one? os::javaTimeNanos()
>>>>>>>>>>>>> uses
>>>>>>>>>>>>> monotonic clock when available - why can't the same be used for
>>>>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>>>>> "gettimeofday"
>>>>>>>>>>>>> is not really a counter?
>>>>>>>>>>>>
>>>>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers
>>>>>>>>>>>> make me
>>>>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>>>>> because of
>>>>>>>>>>>> the way these are used in logging etc. Certainly not something
>>>>>>>>>>>> to be
>>>>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>>>>
>>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that
>>>>>>>>>>>> can then
>>>>>>>>>>>> be used for the uptime.
>>>>>>>>>>>
>>>>>>>>>>> My attempt at this is at
>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>>>>>>>> possible.
>>>>>>>>>>>
>>>>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> David
>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> uptime in
>>>>>>>>>>>>>>> milliseconds is added in
>>>>>>>>>>>>>>> src/share/vm/services/management.cpp.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI
>>>>>>>>>>>>>>> bridging
>>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the same
>>>>>>>>>>>>>>> constant
>>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in
>>>>>>>>>>>>>>> order to
>>>>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>>
>>>>
>>


From david.holmes at oracle.com  Thu Oct 17 04:07:36 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 17 Oct 2013 21:07:36 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525FB79F.7070101@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com>
	<525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com>
	<525FB79F.7070101@oracle.com>
Message-ID: <525FC4F8.1020004@oracle.com>

On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote:
> Hi David,
>
> On 17.10.2013 04:26, David Holmes wrote:
>> Hi Jaroslav,
>>
>> Minor nit: os::elapsed_time should really be defined in terms of the
>> other functions ie:
>>
>> return ((double) os::elapsed_counter()) / os::elapsed_frequency();
>
> Ok. I've changed it. It better communicates the way the elapsedTime is
> calculated anyway.
>
>>
>> I also prefer the cast above as it is very clear that we will be doing a
>> floating-point division.
>>
>> Aside: AFAICS os::elapsed_time() is never actually used ??
>
> Actually, it is os::elapsedTime() and this one is used quite a lot. The
> "elapsed_time()" form is used only in bytecodeHistogram.hpp,
> parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared.

AFAICS they all define their own elapsed_time() functions; they don't use
os::elapsed_time().

>>
>> I agree that it appears that changing the frequency should be okay.
>
> Thanks for the feedback.
>
> Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03

That should be the .04 version :)

Looks okay.

Thanks,
David

> -JB-
>
>>
>> Thanks,
>> David
>>
>> On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote:
>>> On 15.10.2013 08:49, David Holmes wrote:
>>>> Hi Jaroslav,
>>>>
>>>> os_bsd.cpp / os_linux.cpp:
>>>>
>>>> If you don't have a monotonic clock you leave timer_frequency set to 0!
>>>> (So you need to test on a system without a monotonic clock, or else
>>>> force it to act as-if not present.)
>>>>
>>>> That aside I don't trust clock_getres to give values that actually
>>>> allow
>>>> the timer frequency to be determined. As per the comments in
>>>> os_linux.cpp:
>>>>
>>>> // It's fixed in newer kernels, however clock_getres() still returns
>>>> // 1/HZ. We check if clock_getres() works, but will ignore its reported
>>>> // resolution for now. Hopefully as people move to new kernels, this
>>>> // won't be a problem.
>>>>
>>>> we don't know what kernels provide real values here and which provide
>>>> dummy ones.
>>>>
>>>> On BSD you haven't modified os::elapsed_counter.
>>>>
>>>> Looking at the linux changes I don't think the logic is correct even if
>>>> clock_getres is accurate. In the existing code we have:
>>>>
>>>> elapsed_counter -> elapsed time in microseconds
>>>> elapsed_frequency -> 1000 * 1000 (ie micros per second)
>>>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds
>>>>
>>>> Now we have:
>>>>
>>>> elapsed_counter -> elapsed time in nanoseconds
>>>> elapsed_frequency -> 1x10^9 / whatever clock_getres says
>>>> elapsed_time -> counter/frequency -> ???
>>>>
>>>> So elapsed_time is not, in general, going to give the elapsed time in
>>>> seconds. And elapsed_time is not dependent on the "frequency" at all
>>>> because elapsed_counter is not reporting ticks but an actual elapsed
>>>> "time" in nanoseconds.
>>>>
>>>>
>>>> Also note that we have constants for:
>>>>
>>>> NANOSECS_PER_SEC
>>>> NANOSECS_PER_MILLISEC
>>>>
>>>> to aid with time conversions.
>>>>
>>>> The linux webrev contains unrelated UseLargePages changes!
>>>
>>> Sorry for the mess with UseLargePages changes :/
>>>
>>> I've fixed the problems with the frequency (using a fixed number as
>>> before) and I kept the changes to minimum.
>>>
>>> I was hesitating about changing the elapsed_counter precision from
>>> microseconds to nanoseconds but since solaris and windows versions
>>> already use nanosecond ticks for elapsed_counter I think the change is
>>> safe.
>>>
>>> The updated webrev:
>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>>>
>>>>
>>>>
>>>> David
>>>> -----
>>>>
>>>>
>>>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
>>>>> On 10.10.2013 13:15, Staffan Larsen wrote:
>>>>>>
>>>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>
>>>>>>> On 10.10.2013 05:44, David Holmes wrote:
>>>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>>>>>
>>>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>>>>>
>>>>>>>>>>> This discussion also implies that the "elapsed time" we print in
>>>>>>>>>>> the
>>>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>>
>>>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter().
>>>>>>>>>>
>>>>>>>>>> Also the reported GC pauses duration might be wrong since it uses
>>>>>>>>>> Management::timestamp().
>>>>>>>>>>
>>>>>>>>>> On the first sight the change looks rather trivial. But,
>>>>>>>>>> honestly,
>>>>>>>>>> I'm not sure which other parts could for whatever reason break
>>>>>>>>>> once
>>>>>>>>>> the time-of-day timestamp is replaced with a monotonic
>>>>>>>>>> equivalent.
>>>>>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>>>>>
>>>>>>>>>> Staffan, do you think this kind of change is suitable for the
>>>>>>>>>> current
>>>>>>>>>> phase of JDK release cycle? I think I could improve the patch in
>>>>>>>>>> few
>>>>>>>>>> days and then it should probably be able to pass the review
>>>>>>>>>> before
>>>>>>>>>> ZBB. But, it's only P3  ...
>>>>>>>>>
>>>>>>>>> I think it is a bit late in the release cycle to clean this up in
>>>>>>>>> the
>>>>>>>>> way it should be cleaned up. I think we should wait until the
>>>>>>>>> first 8
>>>>>>>>> update release and do a more thorough job than we have time for
>>>>>>>>> right
>>>>>>>>> now.
>>>>>>>>
>>>>>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>>>>>> implementations are a tangled mess. But part of the problem has
>>>>>>>> been
>>>>>>>> that people want/expect monotonic time-of-day based timestamps
>>>>>>>> (yes a
>>>>>>>> contradiction - though some people make sure TOD does not get
>>>>>>>> modified
>>>>>>>> on their production systems). The use of timestamps in logging has
>>>>>>>> to be
>>>>>>>> examined carefully - mainly GC logging. I recall a "simple"
>>>>>>>> attempted
>>>>>>>> change in the past that resulted in trying to compare a nanoTime
>>>>>>>> based
>>>>>>>> timestamp with the TOD. :(
>>>>>>>
>>>>>>> Actually, if I'm reading the sources right for Solaris and Win the
>>>>>>> monotonic clock source is used to provide elapsed_counter()
>>>>>>> value. It
>>>>>>> falls back to TOD when the monotonic clock source is not available.
>>>>>>> For Linux/BSD the TOD is used directly.
>>>>>>>
>>>>>>> This makes me wonder if changing the linux/bsd implementation to
>>>>>>> follow the same logic would be really that disruptive.
>>>>>>
>>>>>> Good point. I would like a world where elapsed_counter is monotonic
>>>>>> (where possible). Still a bit scary this late in the release, but an
>>>>>> interesting experiment.
>>>>>
>>>>> The change is rather simple and tests ok. All the means to get a
>>>>> monotonic timestamp are already there and proved to work. The core
>>>>> tests
>>>>> in JPRT went fine.
>>>>>
>>>>> The updated webrev is at
>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> /Staffan
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>>
>>>>>>> -JB-
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> /Staffan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> /Staffan
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>>>>>> changes of the
>>>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.).
>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>> uptime
>>>>>>>>>>>>>>>> should not depend on the system time and should be
>>>>>>>>>>>>>>>> calculated
>>>>>>>>>>>>>>>> using a
>>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in
>>>>>>>>>>>>>>>> ticks.
>>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>>>>>> convertible to
>>>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>>>>>> making it
>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is
>>>>>>>>>>>>>>> not a
>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> timestamping except for the historical one?
>>>>>>>>>>>>>> os::javaTimeNanos()
>>>>>>>>>>>>>> uses
>>>>>>>>>>>>>> monotonic clock when available - why can't the same be used
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>>>>>> "gettimeofday"
>>>>>>>>>>>>>> is not really a counter?
>>>>>>>>>>>>>
>>>>>>>>>>>>> It is all historical. These elapsed_counters and
>>>>>>>>>>>>> elapsed_timers
>>>>>>>>>>>>> make me
>>>>>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>>>>>> because of
>>>>>>>>>>>>> the way these are used in logging etc. Certainly not something
>>>>>>>>>>>>> to be
>>>>>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos
>>>>>>>>>>>>> that
>>>>>>>>>>>>> can then
>>>>>>>>>>>>> be used for the uptime.
>>>>>>>>>>>>
>>>>>>>>>>>> My attempt at this is at
>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where
>>>>>>>>>>>> possible.
>>>>>>>>>>>>
>>>>>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> uptime in
>>>>>>>>>>>>>>>> milliseconds is added in
>>>>>>>>>>>>>>>> src/share/vm/services/management.cpp.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary
>>>>>>>>>>>>>>>> JNI
>>>>>>>>>>>>>>>> bridging
>>>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the
>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>> constant
>>>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers
>>>>>>>>>>>>>>>> files in
>>>>>>>>>>>>>>>> order to
>>>>>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>
>

From david.holmes at oracle.com  Thu Oct 17 04:13:49 2013
From: david.holmes at oracle.com (David Holmes)
Date: Thu, 17 Oct 2013 21:13:49 +1000
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525FC4F8.1020004@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com>
	<525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com>
	<525FB79F.7070101@oracle.com> <525FC4F8.1020004@oracle.com>
Message-ID: <525FC66D.9040602@oracle.com>

On 17/10/2013 9:07 PM, David Holmes wrote:
> On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote:
>> Hi David,
>>
>> On 17.10.2013 04:26, David Holmes wrote:
>>> Hi Jaroslav,
>>>
>>> Minor nit: os::elapsed_time should really be defined in terms of the
>>> other functions ie:
>>>
>>> return ((double) os::elapsed_counter()) / os::elapsed_frequency();
>>
>> Ok. I've changed it. It better communicates the way the elapsedTime is
>> calculated anyway.
>>
>>>
>>> I also prefer the cast above as it is very clear that we will be doing a
>>> floating-point division.
>>>
>>> Aside: AFAICS os::elapsed_time() is never actually used ??
>>
>> Actually, it is os::elapsedTime() and this one is used quite a lot. The
>> "elapsed_time()" form is used only in bytecodeHistogram.hpp,
>> parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared.
>
> AFAICS they all define their own elapsed_time() functions they don't use
> os::elapsed_time().

Ooops! I mis-grepped. It is os::elapsedTime not os::elapsed_time <sigh> 
Nothing like inconsistent naming :(

David
-----


>>>
>>> I agree that it appears that changing the frequency should be okay.
>>
>> Thanks for the feedback.
>>
>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>
> That should be .04 version :)
>
> Looks okay.
>
> Thanks,
> David
>
>> -JB-
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote:
>>>> On 15.10.2013 08:49, David Holmes wrote:
>>>>> Hi Jaroslav,
>>>>>
>>>>> os_bsd.cpp / os_linux.cpp:
>>>>>
>>>>> If you don't have a monotonic clock you leave timer_frequency set
>>>>> to 0!
>>>>> (So you need to test on a system without a monotonic clock, or else
>>>>> force it to act as-if not present.)
>>>>>
>>>>> That aside I don't trust clock_getres to give values that actually
>>>>> allow
>>>>> the timer frequency to be determined. As per the comments in
>>>>> os_linux.cpp:
>>>>>
>>>>> // It's fixed in newer kernels, however clock_getres() still returns
>>>>> // 1/HZ. We check if clock_getres() works, but will ignore its
>>>>> reported
>>>>> // resolution for now. Hopefully as people move to new kernels, this
>>>>> // won't be a problem.
>>>>>
>>>>> we don't know what kernels provide real values here and which provide
>>>>> dummy ones.
>>>>>
>>>>> On BSD you haven't modified os::elapsed_counter.
>>>>>
>>>>> Looking at the linux changes I don't think the logic is correct
>>>>> even if
>>>>> clock_getres is accurate. In the existing code we have:
>>>>>
>>>>> elapsed_counter -> elapsed time in microseconds
>>>>> elapsed_frequency -> 1000 * 1000 (ie micros per second)
>>>>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds
>>>>>
>>>>> Now we have:
>>>>>
>>>>> elapsed_counter -> elapsed time in nanoseconds
>>>>> elapsed_frequency -> 1x10^9 / whatever clock_getres says
>>>>> elapsed_time -> counter/frequency -> ???
>>>>>
>>>>> So elapsed_time is not, in general, going to give the elapsed time in
>>>>> seconds. And elapsed_time is not dependent on the "frequency" at all
>>>>> because elapsed_counter is not reporting ticks but an actual elapsed
>>>>> "time" in nanoseconds.
>>>>>
>>>>>
>>>>> Also note that we have constants for:
>>>>>
>>>>> NANOSECS_PER_SEC
>>>>> NANOSECS_PER_MILLISEC
>>>>>
>>>>> to aid with time conversions.
>>>>>
>>>>> The linux webrev contains unrelated UseLargePages changes!
>>>>
>>>> Sorry for the mess with UseLargePages changes :/
>>>>
>>>> I've fixed the problems with the frequency (using a fixed number as
>>>> before) and I kept the changes to minimum.
>>>>
>>>> I was hesitating about changing the elapsed_counter precision from
>>>> microseconds to nanoseconds but since solaris and windows versions
>>>> already use nanosecond ticks for elapsed_counter I think the change is
>>>> safe.
>>>>
>>>> The updated webrev:
>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>>>>
>>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>
>>>>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
>>>>>> On 10.10.2013 13:15, Staffan Larsen wrote:
>>>>>>>
>>>>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>
>>>>>>>> On 10.10.2013 05:44, David Holmes wrote:
>>>>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>>>>>>
>>>>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>>>>>>
>>>>>>>>>>>> This discussion also implies that the "elapsed time" we
>>>>>>>>>>>> print in
>>>>>>>>>>>> the
>>>>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>>>
>>>>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>>>>>>> (class VMUptimeDCmd) which also relies on
>>>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>>
>>>>>>>>>>> Also the reported GC pauses duration might be wrong since it
>>>>>>>>>>> uses
>>>>>>>>>>> Management::timestamp().
>>>>>>>>>>>
>>>>>>>>>>> At first sight the change looks rather trivial. But,
>>>>>>>>>>> honestly,
>>>>>>>>>>> I'm not sure which other parts could for whatever reason break
>>>>>>>>>>> once
>>>>>>>>>>> the time-of-day timestamp is replaced with a monotonic
>>>>>>>>>>> equivalent.
>>>>>>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>>>>>>
>>>>>>>>>>> Staffan, do you think this kind of change is suitable for the
>>>>>>>>>>> current
>>>>>>>>>>> phase of JDK release cycle? I think I could improve the patch in
>>>>>>>>>>> a few
>>>>>>>>>>> days and then it should probably be able to pass the review
>>>>>>>>>>> before
>>>>>>>>>>> ZBB. But, it's only P3  ...
>>>>>>>>>>
>>>>>>>>>> I think it is a bit late in the release cycle to clean this up in
>>>>>>>>>> the
>>>>>>>>>> way it should be cleaned up. I think we should wait until the
>>>>>>>>>> first 8
>>>>>>>>>> update release and do a more thorough job than we have time for
>>>>>>>>>> right
>>>>>>>>>> now.
>>>>>>>>>
>>>>>>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>>>>>>> implementations are a tangled mess. But part of the problem has
>>>>>>>>> been
>>>>>>>>> that people want/expect monotonic time-of-day based timestamps
>>>>>>>>> (yes a
>>>>>>>>> contradiction - though some people make sure TOD does not get
>>>>>>>>> modified
>>>>>>>>> on their production systems). The use of timestamps in logging has
>>>>>>>>> to be
>>>>>>>>> examined carefully - mainly GC logging. I recall a "simple"
>>>>>>>>> attempted
>>>>>>>>> change in the past that resulted in trying to compare a nanoTime
>>>>>>>>> based
>>>>>>>>> timestamp with the TOD. :(
>>>>>>>>
>>>>>>>> Actually, if I'm reading the sources right for Solaris and Win the
>>>>>>>> monotonic clock source is used to provide elapsed_counter()
>>>>>>>> value. It
>>>>>>>> falls back to TOD when the monotonic clock source is not available.
>>>>>>>> For Linux/BSD the TOD is used directly.
>>>>>>>>
>>>>>>>> This makes me wonder if changing the linux/bsd implementation to
>>>>>>>> follow the same logic would be really that disruptive.
>>>>>>>
>>>>>>> Good point. I would like a world where elapsed_counter is monotonic
>>>>>>> (where possible). Still a bit scary this late in the release, but an
>>>>>>> interesting experiment.
>>>>>>
>>>>>> The change is rather simple and tests ok. All the means to get a
>>>>>> monotonic timestamp are already there and proved to work. The core
>>>>>> tests
>>>>>> in JPRT went fine.
>>>>>>
>>>>>> The updated webrev is at
>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> /Staffan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> /Staffan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>>>>>>> changes of the
>>>>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.).
>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>> uptime
>>>>>>>>>>>>>>>>> should not depend on the system time and should be
>>>>>>>>>>>>>>>>> calculated
>>>>>>>>>>>>>>>>> using a
>>>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is already a way to get the actual JVM uptime in
>>>>>>>>>>>>>>>>> ticks.
>>>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>>>>>>> convertible to
>>>>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>>>>>>> making it
>>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday
>>>>>>>>>>>>>>>> so is
>>>>>>>>>>>>>>>> not a
>>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> timestamping other than the historical one?
>>>>>>>>>>>>>>> os::javaTimeNanos()
>>>>>>>>>>>>>>> uses
>>>>>>>>>>>>>>> a monotonic clock when available - why can't the same be used
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>>>>>>> "gettimeofday"
>>>>>>>>>>>>>>> is not really a counter?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is all historical. These elapsed_counters and
>>>>>>>>>>>>>> elapsed_timers
>>>>>>>>>>>>>> make me
>>>>>>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>>>>>>> because of
>>>>>>>>>>>>>> the way these are used in logging etc. Certainly not
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> can then
>>>>>>>>>>>>>> be used for the uptime.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My attempt at this is at
>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks
>>>>>>>>>>>>> where
>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> uptime in
>>>>>>>>>>>>>>>>> milliseconds is added in
>>>>>>>>>>>>>>>>> src/share/vm/services/management.cpp.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the jdk the changes consist of adding the necessary
>>>>>>>>>>>>>>>>> JNI
>>>>>>>>>>>>>>>>> bridging
>>>>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the
>>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>> constant
>>>>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers
>>>>>>>>>>>>>>>>> files in
>>>>>>>>>>>>>>>>> order to
>>>>>>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>

From jaroslav.bachorik at oracle.com  Thu Oct 17 05:09:40 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 17 Oct 2013 14:09:40 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
 values
In-Reply-To: <525FC4F8.1020004@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com>
	<525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com>
	<525FB79F.7070101@oracle.com> <525FC4F8.1020004@oracle.com>
Message-ID: <525FD384.4010904@oracle.com>

On 17.10.2013 13:07, David Holmes wrote:
> On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote:
>> Hi David,
>>
>> On 17.10.2013 04:26, David Holmes wrote:
>>> Hi Jaroslav,
>>>
>>> Minor nit: os::elapsed_time should really be defined in terms of the
>>> other functions ie:
>>>
>>> return ((double) os::elapsed_counter()) / os::elapsed_frequency();
>>
>> Ok. I've changed it. It better communicates the way the elapsedTime is
>> calculated anyway.
>>
>>>
>>> I also prefer the cast above as it is very clear that we will be doing a
>>> floating-point division.
>>>
>>> Aside: AFAICS os::elapsed_time() is never actually used ??
>>
>> Actually, it is os::elapsedTime() and this one is used quite a lot. The
>> "elapsed_time()" form is used only in bytecodeHistogram.hpp,
>> parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared.
>
> AFAICS they all define their own elapsed_time() functions; they don't use
> os::elapsed_time().
>
>>>
>>> I agree that it appears that changing the frequency should be okay.
>>
>> Thanks for the feedback.
>>
>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>
> That should be .04 version :)

Yep :( copy-paste ...
http://cr.openjdk.java.net/~jbachorik/6523160/webrev.04
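
For reference, a minimal sketch of what the nit about os::elapsed_time
amounts to, assuming a counter that already reports nanoseconds and a
fixed frequency. This is not the webrev code: std::chrono::steady_clock
stands in for the platform monotonic clock, and kNanosPerSec, g_start and
the free functions below are illustrative names only.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Assumption: stand-in for HotSpot's NANOSECS_PER_SEC constant.
static const int64_t kNanosPerSec = 1000 * 1000 * 1000;

// Assumption: steady_clock stands in for the platform monotonic clock.
static const std::chrono::steady_clock::time_point g_start =
    std::chrono::steady_clock::now();

// Elapsed time since g_start, expressed directly in nanoseconds.
int64_t elapsed_counter() {
  auto delta = std::chrono::steady_clock::now() - g_start;
  return std::chrono::duration_cast<std::chrono::nanoseconds>(delta).count();
}

// Fixed value: the counter is in nanoseconds, so 10^9 counts make a second.
int64_t elapsed_frequency() {
  return kNanosPerSec;
}

// Seconds, defined purely in terms of the two functions above.
double elapsed_time() {
  assert(elapsed_frequency() > 0);
  // The explicit cast makes the floating-point division obvious.
  return ((double) elapsed_counter()) / elapsed_frequency();
}
```

With the frequency fixed to match the counter's unit, elapsed_time() is in
seconds regardless of what the clock reports as its resolution.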

>
> Looks okay.
>
> Thanks,
> David
>
>> -JB-
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote:
>>>> On 15.10.2013 08:49, David Holmes wrote:
>>>>> Hi Jaroslav,
>>>>>
>>>>> os_bsd.cpp / os_linux.cpp:
>>>>>
>>>>> If you don't have a monotonic clock you leave timer_frequency set
>>>>> to 0!
>>>>> (So you need to test on a system without a monotonic clock, or else
>>>>> force it to act as-if not present.)
>>>>>
>>>>> That aside I don't trust clock_getres to give values that actually
>>>>> allow
>>>>> the timer frequency to be determined. As per the comments in
>>>>> os_linux.cpp:
>>>>>
>>>>> // It's fixed in newer kernels, however clock_getres() still returns
>>>>> // 1/HZ. We check if clock_getres() works, but will ignore its
>>>>> reported
>>>>> // resolution for now. Hopefully as people move to new kernels, this
>>>>> // won't be a problem.
>>>>>
>>>>> we don't know what kernels provide real values here and which provide
>>>>> dummy ones.
>>>>>
>>>>> On BSD you haven't modified os::elapsed_counter.
>>>>>
>>>>> Looking at the linux changes I don't think the logic is correct
>>>>> even if
>>>>> clock_getres is accurate. In the existing code we have:
>>>>>
>>>>> elapsed_counter -> elapsed time in microseconds
>>>>> elapsed_frequency -> 1000 * 1000 (ie micros per second)
>>>>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds
>>>>>
>>>>> Now we have:
>>>>>
>>>>> elapsed_counter -> elapsed time in nanoseconds
>>>>> elapsed_frequency -> 1x10^9 / whatever clock_getres says
>>>>> elapsed_time -> counter/frequency -> ???
>>>>>
>>>>> So elapsed_time is not, in general, going to give the elapsed time in
>>>>> seconds. And elapsed_time is not dependent on the "frequency" at all
>>>>> because elapsed_counter is not reporting ticks but an actual elapsed
>>>>> "time" in nanoseconds.
>>>>>
>>>>>
>>>>> Also note that we have constants for:
>>>>>
>>>>> NANOSECS_PER_SEC
>>>>> NANOSECS_PER_MILLISEC
>>>>>
>>>>> to aid with time conversions.
>>>>>
>>>>> The linux webrev contains unrelated UseLargePages changes!
>>>>
>>>> Sorry for the mess with UseLargePages changes :/
>>>>
>>>> I've fixed the problems with the frequency (using a fixed number as
>>>> before) and I kept the changes to a minimum.
>>>>
>>>> I was hesitating about changing the elapsed_counter precision from
>>>> microseconds to nanoseconds but since solaris and windows versions
>>>> already use nanosecond ticks for elapsed_counter I think the change is
>>>> safe.
>>>>
>>>> The updated webrev:
>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>>>>
>>>>>
>>>>>
>>>>> David
>>>>> -----
>>>>>
>>>>>
>>>>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
>>>>>> On 10.10.2013 13:15, Staffan Larsen wrote:
>>>>>>>
>>>>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>
>>>>>>>> On 10.10.2013 05:44, David Holmes wrote:
>>>>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>>>>>>
>>>>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>>>>>>
>>>>>>>>>>>> This discussion also implies that the "elapsed time" we
>>>>>>>>>>>> print in
>>>>>>>>>>>> the
>>>>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>>>
>>>>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>>>>>>> (class VMUptimeDCmd) which also relies on
>>>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>>
>>>>>>>>>>> Also the reported GC pauses duration might be wrong since it
>>>>>>>>>>> uses
>>>>>>>>>>> Management::timestamp().
>>>>>>>>>>>
>>>>>>>>>>> At first sight the change looks rather trivial. But,
>>>>>>>>>>> honestly,
>>>>>>>>>>> I'm not sure which other parts could for whatever reason break
>>>>>>>>>>> once
>>>>>>>>>>> the time-of-day timestamp is replaced with a monotonic
>>>>>>>>>>> equivalent.
>>>>>>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>>>>>>
>>>>>>>>>>> Staffan, do you think this kind of change is suitable for the
>>>>>>>>>>> current
>>>>>>>>>>> phase of JDK release cycle? I think I could improve the patch in
>>>>>>>>>>> a few
>>>>>>>>>>> days and then it should probably be able to pass the review
>>>>>>>>>>> before
>>>>>>>>>>> ZBB. But, it's only P3  ...
>>>>>>>>>>
>>>>>>>>>> I think it is a bit late in the release cycle to clean this up in
>>>>>>>>>> the
>>>>>>>>>> way it should be cleaned up. I think we should wait until the
>>>>>>>>>> first 8
>>>>>>>>>> update release and do a more thorough job than we have time for
>>>>>>>>>> right
>>>>>>>>>> now.
>>>>>>>>>
>>>>>>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>>>>>>> implementations are a tangled mess. But part of the problem has
>>>>>>>>> been
>>>>>>>>> that people want/expect monotonic time-of-day based timestamps
>>>>>>>>> (yes a
>>>>>>>>> contradiction - though some people make sure TOD does not get
>>>>>>>>> modified
>>>>>>>>> on their production systems). The use of timestamps in logging has
>>>>>>>>> to be
>>>>>>>>> examined carefully - mainly GC logging. I recall a "simple"
>>>>>>>>> attempted
>>>>>>>>> change in the past that resulted in trying to compare a nanoTime
>>>>>>>>> based
>>>>>>>>> timestamp with the TOD. :(
>>>>>>>>
>>>>>>>> Actually, if I'm reading the sources right for Solaris and Win the
>>>>>>>> monotonic clock source is used to provide elapsed_counter()
>>>>>>>> value. It
>>>>>>>> falls back to TOD when the monotonic clock source is not available.
>>>>>>>> For Linux/BSD the TOD is used directly.
>>>>>>>>
>>>>>>>> This makes me wonder if changing the linux/bsd implementation to
>>>>>>>> follow the same logic would be really that disruptive.
>>>>>>>
>>>>>>> Good point. I would like a world where elapsed_counter is monotonic
>>>>>>> (where possible). Still a bit scary this late in the release, but an
>>>>>>> interesting experiment.
>>>>>>
>>>>>> The change is rather simple and tests ok. All the means to get a
>>>>>> monotonic timestamp are already there and proved to work. The core
>>>>>> tests
>>>>>> in JPRT went fine.
>>>>>>
>>>>>> The updated webrev is at
>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>> David
>>>>>>>>> -----
>>>>>>>>>
>>>>>>>>>> /Staffan
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> /Staffan
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>>>>>>> changes of the
>>>>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.).
>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>> uptime
>>>>>>>>>>>>>>>>> should not depend on the system time and should be
>>>>>>>>>>>>>>>>> calculated
>>>>>>>>>>>>>>>>> using a
>>>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There is already a way to get the actual JVM uptime in
>>>>>>>>>>>>>>>>> ticks.
>>>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>>>>>>> convertible to
>>>>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>>>>>>> making it
>>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday
>>>>>>>>>>>>>>>> so is
>>>>>>>>>>>>>>>> not a
>>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> timestamping other than the historical one?
>>>>>>>>>>>>>>> os::javaTimeNanos()
>>>>>>>>>>>>>>> uses
>>>>>>>>>>>>>>> a monotonic clock when available - why can't the same be used
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>>>>>>> "gettimeofday"
>>>>>>>>>>>>>>> is not really a counter?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is all historical. These elapsed_counters and
>>>>>>>>>>>>>> elapsed_timers
>>>>>>>>>>>>>> make me
>>>>>>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>>>>>>> because of
>>>>>>>>>>>>>> the way these are used in logging etc. Certainly not
>>>>>>>>>>>>>> something
>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> can then
>>>>>>>>>>>>>> be used for the uptime.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My attempt at this is at
>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks
>>>>>>>>>>>>> where
>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>
>>>>>>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>>>>>>
>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain
>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> uptime in
>>>>>>>>>>>>>>>>> milliseconds is added in
>>>>>>>>>>>>>>>>> src/share/vm/services/management.cpp.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For the jdk the changes consist of adding the necessary
>>>>>>>>>>>>>>>>> JNI
>>>>>>>>>>>>>>>>> bridging
>>>>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the
>>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>> constant
>>>>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers
>>>>>>>>>>>>>>>>> files in
>>>>>>>>>>>>>>>>> order to
>>>>>>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
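
To make the frequency problem raised in the review of os_linux.cpp
concrete, here is a small worked example of the arithmetic. It is only a
sketch: the helper names and the 4,000,000 ns resolution (a 1/HZ value for
a hypothetical HZ of 250) are assumptions chosen for illustration, not
values taken from the webrev.

```cpp
#include <cstdint>

// Assumption: stand-in for HotSpot's NANOSECS_PER_SEC constant.
static const int64_t kNanosPerSec = 1000000000LL;

// Frequency derived as 10^9 divided by the resolution the clock reports.
int64_t frequency_from_resolution(int64_t resolution_ns) {
  return kNanosPerSec / resolution_ns;
}

// "Seconds" computed as counter / frequency, with the counter in nanoseconds.
double seconds_from(int64_t counter_ns, int64_t frequency) {
  return static_cast<double>(counter_ns) / frequency;
}
```

For a counter of 2,000,000,000 ns (two seconds), a reported resolution of
1 ns gives a frequency of 10^9 and the expected 2.0. A reported resolution
of 4,000,000 ns gives a frequency of 250 and a result of 8,000,000, which
is not a time in seconds. Because the counter is already a time in
nanoseconds, only the fixed value 10^9 yields seconds.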


From staffan.larsen at oracle.com  Fri Oct 18 04:02:29 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Fri, 18 Oct 2013 13:02:29 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative
	values
In-Reply-To: <525FD384.4010904@oracle.com>
References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com>
	<5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com>
	<52553D63.5000508@oracle.com>
	<8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com>
	<52556604.3080900@oracle.com>
	<5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com>
	<525622B4.5020606@oracle.com> <52568940.4000704@oracle.com>
	<23435103-156B-434F-994C-B6F913EE0364@oracle.com>
	<525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com>
	<525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com>
	<525FB79F.7070101@oracle.com> <525FC4F8.1020004@oracle.com>
	<525FD384.4010904@oracle.com>
Message-ID: <B5A388E3-EF9A-4723-82F0-8550CE22290B@oracle.com>

Looks good to me.

Thanks,
/Staffan
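
As a summary of the idea behind the change, the sketch below computes an
uptime from two readings of a monotonic clock instead of the time of day,
so adjustments to the system clock cannot make it negative. It assumes
std::chrono::steady_clock as the monotonic source; monotonic_nanos,
uptime_millis and kNanosPerMillisec are hypothetical names and do not come
from the HotSpot sources.

```cpp
#include <chrono>
#include <cstdint>

// Assumption: stand-in for HotSpot's NANOSECS_PER_MILLISEC constant.
static const int64_t kNanosPerMillisec = 1000000LL;

// Hypothetical helper: current reading of a monotonic clock in nanoseconds.
int64_t monotonic_nanos() {
  auto since_epoch = std::chrono::steady_clock::now().time_since_epoch();
  return std::chrono::duration_cast<std::chrono::nanoseconds>(since_epoch)
      .count();
}

// Hypothetical helper: uptime in milliseconds from two monotonic readings.
int64_t uptime_millis(int64_t start_ns, int64_t now_ns) {
  return (now_ns - start_ns) / kNanosPerMillisec;
}
```

A caller would record monotonic_nanos() once at startup and pass it as
start_ns on every later query.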

On 17 okt 2013, at 14:09, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 17.10.2013 13:07, David Holmes wrote:
>> On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote:
>>> Hi David,
>>> 
>>> On 17.10.2013 04:26, David Holmes wrote:
>>>> Hi Jaroslav,
>>>> 
>>>> Minor nit: os::elapsed_time should really be defined in terms of the
>>>> other functions ie:
>>>> 
>>>> return ((double) os::elapsed_counter()) / os::elapsed_frequency();
>>> 
>>> Ok. I've changed it. It better communicates the way the elapsedTime is
>>> calculated anyway.
>>> 
>>>> 
>>>> I also prefer the cast above as it is very clear that we will be doing a
>>>> floating-point division.
>>>> 
>>>> Aside: AFAICS os::elapsed_time() is never actually used ??
>>> 
>>> Actually, it is os::elapsedTime() and this one is used quite a lot. The
>>> "elapsed_time()" form is used only in bytecodeHistogram.hpp,
>>> parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared.
>> 
>> AFAICS they all define their own elapsed_time() functions; they don't use
>> os::elapsed_time().
>> 
>>>> 
>>>> I agree that it appears that changing the frequency should be okay.
>>> 
>>> Thanks for the feedback.
>>> 
>>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>> 
>> That should be .04 version :)
> 
> Yep :( copy-paste ...
> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.04
> 
>> 
>> Looks okay.
>> 
>> Thanks,
>> David
>> 
>>> -JB-
>>> 
>>>> 
>>>> Thanks,
>>>> David
>>>> 
>>>> On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote:
>>>>> On 15.10.2013 08:49, David Holmes wrote:
>>>>>> Hi Jaroslav,
>>>>>> 
>>>>>> os_bsd.cpp / os_linux.cpp:
>>>>>> 
>>>>>> If you don't have a monotonic clock you leave timer_frequency set
>>>>>> to 0!
>>>>>> (So you need to test on a system without a monotonic clock, or else
>>>>>> force it to act as-if not present.)
>>>>>> 
>>>>>> That aside I don't trust clock_getres to give values that actually
>>>>>> allow
>>>>>> the timer frequency to be determined. As per the comments in
>>>>>> os_linux.cpp:
>>>>>> 
>>>>>> // It's fixed in newer kernels, however clock_getres() still returns
>>>>>> // 1/HZ. We check if clock_getres() works, but will ignore its
>>>>>> reported
>>>>>> // resolution for now. Hopefully as people move to new kernels, this
>>>>>> // won't be a problem.
>>>>>> 
>>>>>> we don't know what kernels provide real values here and which provide
>>>>>> dummy ones.
>>>>>> 
>>>>>> On BSD you haven't modified os::elapsed_counter.
>>>>>> 
>>>>>> Looking at the linux changes I don't think the logic is correct
>>>>>> even if
>>>>>> clock_getres is accurate. In the existing code we have:
>>>>>> 
>>>>>> elapsed_counter -> elapsed time in microseconds
>>>>>> elapsed_frequency -> 1000 * 1000 (ie micros per second)
>>>>>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds
>>>>>> 
>>>>>> Now we have:
>>>>>> 
>>>>>> elapsed_counter -> elapsed time in nanoseconds
>>>>>> elapsed_frequency -> 1x10^9 / whatever clock_getres says
>>>>>> elapsed_time -> counter/frequency -> ???
>>>>>> 
>>>>>> So elapsed_time is not, in general, going to give the elapsed time in
>>>>>> seconds. And elapsed_time is not dependent on the "frequency" at all
>>>>>> because elapsed_counter is not reporting ticks but an actual elapsed
>>>>>> "time" in nanoseconds.
>>>>>> 
>>>>>> 
>>>>>> Also note that we have constants for:
>>>>>> 
>>>>>> NANOSECS_PER_SEC
>>>>>> NANOSECS_PER_MILLISEC
>>>>>> 
>>>>>> to aid with time conversions.
>>>>>> 
>>>>>> The linux webrev contains unrelated UseLargePages changes!
>>>>> 
>>>>> Sorry for the mess with UseLargePages changes :/
>>>>> 
>>>>> I've fixed the problems with the frequency (using a fixed number as
>>>>> before) and I kept the changes to a minimum.
>>>>> 
>>>>> I was hesitating about changing the elapsed_counter precision from
>>>>> microseconds to nanoseconds but since solaris and windows versions
>>>>> already use nanosecond ticks for elapsed_counter I think the change is
>>>>> safe.
>>>>> 
>>>>> The updated webrev:
>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> David
>>>>>> -----
>>>>>> 
>>>>>> 
>>>>>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote:
>>>>>>> On 10.10.2013 13:15, Staffan Larsen wrote:
>>>>>>>> 
>>>>>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik
>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>> 
>>>>>>>>> On 10.10.2013 05:44, David Holmes wrote:
>>>>>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote:
>>>>>>>>>>> 
>>>>>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik
>>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote:
>>>>>>>>>>>>> There is now an awful amount of different timestamps in the
>>>>>>>>>>>>> Management class. Can they be consolidated somehow? At least
>>>>>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This discussion also implies that the "elapsed time" we
>>>>>>>>>>>>> print in
>>>>>>>>>>>>> the
>>>>>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses
>>>>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>>>> 
>>>>>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command
>>>>>>>>>>>>> (class VMUptimeDCmd) which also relies on
>>>>>>>>>>>>> os::elapsed_counter().
>>>>>>>>>>>> 
>>>>>>>>>>>> Also the reported GC pauses duration might be wrong since it
>>>>>>>>>>>> uses
>>>>>>>>>>>> Management::timestamp().
>>>>>>>>>>>> 
>>>>>>>>>>>> At first sight the change looks rather trivial. But,
>>>>>>>>>>>> honestly,
>>>>>>>>>>>> I'm not sure which other parts could for whatever reason break
>>>>>>>>>>>> once
>>>>>>>>>>>> the time-of-day timestamp is replaced with a monotonic
>>>>>>>>>>>> equivalent.
>>>>>>>>>>>> One would think that it shouldn't matter but one never knows ...
>>>>>>>>>>>> 
>>>>>>>>>>>> Staffan, do you think this kind of change is suitable for the
>>>>>>>>>>>> current
>>>>>>>>>>>> phase of JDK release cycle? I think I could improve the patch in
>>>>>>>>>>>> a few
>>>>>>>>>>>> days and then it should probably be able to pass the review
>>>>>>>>>>>> before
>>>>>>>>>>>> ZBB. But, it's only P3  ...
>>>>>>>>>>> 
>>>>>>>>>>> I think it is a bit late in the release cycle to clean this up in
>>>>>>>>>>> the
>>>>>>>>>>> way it should be cleaned up. I think we should wait until the
>>>>>>>>>>> first 8
>>>>>>>>>>> update release and do a more thorough job than we have time for
>>>>>>>>>>> right
>>>>>>>>>>> now.
>>>>>>>>>> 
>>>>>>>>>> I second that. The elapsed_counter/elapsed_timer APIs and
>>>>>>>>>> implementations are a tangled mess. But part of the problem has
>>>>>>>>>> been
>>>>>>>>>> that people want/expect monotonic time-of-day based timestamps
>>>>>>>>>> (yes a
>>>>>>>>>> contradiction - though some people make sure TOD does not get
>>>>>>>>>> modified
>>>>>>>>>> on their production systems). The use of timestamps in logging has
>>>>>>>>>> to be
>>>>>>>>>> examined carefully - mainly GC logging. I recall a "simple"
>>>>>>>>>> attempted
>>>>>>>>>> change in the past that resulted in trying to compare a nanoTime
>>>>>>>>>> based
>>>>>>>>>> timestamp with the TOD. :(
>>>>>>>>> 
>>>>>>>>> Actually, if I'm reading the sources right for Solaris and Win the
>>>>>>>>> monotonic clock source is used to provide elapsed_counter()
>>>>>>>>> value. It
>>>>>>>>> falls back to TOD when the monotonic clock source is not available.
>>>>>>>>> For Linux/BSD the TOD is used directly.
>>>>>>>>> 
>>>>>>>>> This makes me wonder if changing the linux/bsd implementation to
>>>>>>>>> follow the same logic would be really that disruptive.
>>>>>>>> 
>>>>>>>> Good point. I would like a world where elapsed_counter is monotonic
>>>>>>>> (where possible). Still a bit scary this late in the release, but an
>>>>>>>> interesting experiment.
>>>>>>> 
>>>>>>> The change is rather simple and tests ok. All the means to get a
>>>>>>> monotonic timestamp are already there and proved to work. The core
>>>>>>> tests
>>>>>>> in JPRT went fine.
>>>>>>> 
>>>>>>> The updated webrev is at
>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02
>>>>>>> 
>>>>>>> -JB-
>>>>>>> 
>>>>>>>> 
>>>>>>>> /Staffan
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -JB-
>>>>>>>>>> 
>>>>>>>>>> David
>>>>>>>>>> -----
>>>>>>>>>> 
>>>>>>>>>>> /Staffan
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> -JB-
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> /Staffan
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik
>>>>>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote:
>>>>>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote:
>>>>>>>>>>>>>>>>> Jaroslav,
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is
>>>>>>>>>>>>>>>>>> based on
>>>>>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to
>>>>>>>>>>>>>>>>>> changes of the
>>>>>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.).
>>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>> uptime
>>>>>>>>>>>>>>>>>> should not depend on the system time and should be
>>>>>>>>>>>>>>>>>> calculated
>>>>>>>>>>>>>>>>>> using a
>>>>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in
>>>>>>>>>>>>>>>>>> ticks.
>>>>>>>>>>>>>>>>>> It is
>>>>>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are
>>>>>>>>>>>>>>>>>> convertible to
>>>>>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus
>>>>>>>>>>>>>>>>>> making it
>>>>>>>>>>>>>>>>>> very
>>>>>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime.
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using
>>>>>>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday
>>>>>>>>>>>>>>>>> so is
>>>>>>>>>>>>>>>>> not a
>>>>>>>>>>>>>>>>> monotonic clock source.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> timestamping except of the historical one?
>>>>>>>>>>>>>>>> os::javaTimeNanos()
>>>>>>>>>>>>>>>> uses
>>>>>>>>>>>>>>>> monotonic clock when available - why can't be the same used
>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>> os::elapsed_counter() especially when a counter based on
>>>>>>>>>>>>>>>> "gettimeofday"
>>>>>>>>>>>>>>>> is not really a counter?
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> It is all historical. These elapsed_counters and
>>>>>>>>>>>>>>> elapsed_timers
>>>>>>>>>>>>>>> make me
>>>>>>>>>>>>>>> cringe. But changing it has a lot of potential consequences
>>>>>>>>>>>>>>> because of
>>>>>>>>>>>>>>> the way these are used in logging etc. Certainly not
>>>>>>>>>>>>>>> something
>>>>>>>>>>>>>>> to be
>>>>>>>>>>>>>>> contemplated at this stage of JDK 8.
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> can then
>>>>>>>>>>>>>>> be used for the uptime.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> My attempt at this is at
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks
>>>>>>>>>>>>>> where
>>>>>>>>>>>>>> possible.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> The JDK part stays the same as for webrev.00
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> David
>>>>>>>>>>>>>>>>> -----
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in
>>>>>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> uptime in
>>>>>>>>>>>>>>>>>> milliseconds is added in
>>>>>>>>>>>>>>>>>> src/share/vm/services/management.cpp.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary
>>>>>>>>>>>>>>>>>> JNI
>>>>>>>>>>>>>>>>>> bridging
>>>>>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the
>>>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>> constant
>>>>>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers
>>>>>>>>>>>>>>>>>> files in
>>>>>>>>>>>>>>>>>> order to
>>>>>>>>>>>>>>>>>> properly build the native library.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Issue:   https://bugs.openjdk.java.net/browse/JDK-6523160
>>>>>>>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
> 
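
For readers following the quoted discussion, the wall-clock versus
monotonic distinction can be sketched at the Java level. This is only an
illustration, not the HotSpot change under review: the class name
MonotonicUptime and its injected clock are hypothetical, and it assumes
System.nanoTime(), which is specified for measuring elapsed time and is
not tied to the wall clock.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not part of the JDK or of the patch under review.
final class MonotonicUptime {
    private final LongSupplier nanoClock; // injected so the clock can be faked in tests
    private final long startNanos;

    MonotonicUptime(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startNanos = nanoClock.getAsLong();
    }

    // Elapsed time since construction; unaffected by changes to the OS time
    // as long as the injected clock is monotonic.
    long uptimeMillis() {
        return (nanoClock.getAsLong() - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicUptime uptime = new MonotonicUptime(System::nanoTime);
        Thread.sleep(50);
        System.out.println("uptime ms: " + uptime.uptimeMillis());
    }
}
```

An uptime computed from System.currentTimeMillis() would instead jump
whenever the OS time is adjusted, which is the problem the original
request describes.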


From staffan.larsen at oracle.com  Fri Oct 18 04:05:17 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Fri, 18 Oct 2013 13:05:17 +0200
Subject: jmx-dev [PING] RFR: 8024613
	javax/management/remote/mandatory/connection/RMIConnector_NPETest.java
	failing intermittently
In-Reply-To: <525EA65F.9040509@oracle.com>
References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com>
	<524BFB87.10808@oracle.com> <525EA65F.9040509@oracle.com>
Message-ID: <306F58ED-6630-4AB0-877E-49083631C4C2@oracle.com>

Looks good!

Thanks,
/Staffan

On 16 okt 2013, at 16:44, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 2.10.2013 12:55, Jaroslav Bachorik wrote:
>> On 20.9.2013 14:54, shanliang wrote:
>>> Jaroslav,
>>> 
>>> It is a good idea to use the RMI Testlibrary.
>>> 
>>> Better to call:
>>>        agent.close();
>>> 
>>> at Line 55,  close the RMIRegistry (rmid.shutdown(rmidPort) Line 55)
>>> does not ensure the JMX connector doing full clean, it is always better
>>> to do clean within a test.
>> 
>> Thanks. Implemented.
>> 
>> http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01
>> 
>> -JB-
>> 
>>> 
>>> Shanliang
>>> 
>>> 
>>> Jaroslav Bachorik wrote:
>>>> Please, review the following change for JDK-8024613
>>>> 
>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/
>>>> <http://cr.openjdk.java.net/%7Ejbachorik/8024613/webrev.00/>
>>>> 
>>>> The patch takes care of intermittent test failures caused by timing
>>>> issues when starting the RMID process. It could happen that the RMID
>>>> process hasn't been properly initialized in the timeframe of 5 seconds
>>>> and the test would fail.
>>>> 
>>>> The patch replaces the home-brewed RMID process management with the
>>>> one available in the RMI Testlibrary which is used by more tests and
>>>> therefore should be more stable.
>>>> 
>>>> Thanks,
>>>> 
>>>> -JB-
>>> 
>> 
> 


From staffan.larsen at oracle.com  Fri Oct 18 04:09:07 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Fri, 18 Oct 2013 13:09:07 +0200
Subject: jmx-dev RFR 7197919:
	java/lang/management/ThreadMXBean/ThreadBlockedCount.java has
	concurency issues
In-Reply-To: <525EA03F.2070106@oracle.com>
References: <525EA03F.2070106@oracle.com>
Message-ID: <5D7A8C19-DB9C-4CE7-B18A-E22C74C0794C@oracle.com>

Looks good!

Nit:  for(int i=0;i<100;i++) should have more spaces:  for (int i = 0; i < 100; i++)

Thanks,
/Staffan

On 16 okt 2013, at 16:18, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> Please, review this simple test change.
> 
> The test tries to get the number of times a certain thread was blocked during the test run and intermittently fails with the difference of 1 - the expected number is 4 but the reported number is 3.
> 
> When updating the thread statistics (the blocked count in this case) no lock is used so there might be stale data when the ThreadMXBean retrieves the stats. The patch tries to workaround this problem by retrying a few times with the added delay. The test will try to obtain the correct result for at most 10 seconds - after that it will fail if the retrieved blocked count does not equal the expected blocked count.
> 
> Issue : https://bugs.openjdk.java.net/browse/JDK-7197919
> Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00
> 
> Thanks,
> 
> -JB-


From shanliang.jiang at oracle.com  Fri Oct 18 07:57:42 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Fri, 18 Oct 2013 16:57:42 +0200
Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report
 some issue in com.sun.jmx.snmp package
In-Reply-To: <525FB3AB.3040105@oracle.com>
References: <508E8F79.60909@oracle.com> <525E9B87.9050406@oracle.com>
	<525FB3AB.3040105@oracle.com>
Message-ID: <52614C66.6040507@oracle.com>

Thanks Paul and Daniel for the review.

Shanliang


Daniel Fuchs wrote:
> Hi Shanliang,
>
> Looks good!
>
> -- daniel
>
> On 10/16/13 3:58 PM, shanliang wrote:
>> Hi,
>>
>> Please review the following fix, main issue here is that we should clone
>> an internal variable before returning.
>>
>> webrev:
>> http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/
>>
>> bug
>> https://bugs.openjdk.java.net/browse/JDK-8026028
>>
>> Thanks,
>> Shanliang
>>
>>
>>
>>
>
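
As a generic illustration of "clone an internal variable before
returning" (this is not the com.sun.jmx.snmp code; the class OidHolder
and its field are made up for the example), a defensive copy might look
like this:

```java
// Hypothetical class illustrating defensive copies; not taken from the webrev.
final class OidHolder {
    private final long[] components;

    OidHolder(long[] components) {
        // Copy on the way in so later changes by the caller are not visible here.
        this.components = components.clone();
    }

    long[] getComponents() {
        // Copy on the way out so callers cannot modify the internal array.
        return components.clone();
    }

    public static void main(String[] args) {
        OidHolder holder = new OidHolder(new long[] {1L, 3L, 6L});
        long[] copy = holder.getComponents();
        copy[0] = 99L; // only the copy changes
        System.out.println(holder.getComponents()[0]);
    }
}
```

Returning the field directly is the kind of exposure findbugs typically
flags, since any caller could then change the object's internal state.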


From mandy.chung at oracle.com  Fri Oct 18 09:42:32 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Fri, 18 Oct 2013 09:42:32 -0700
Subject: jmx-dev RFR 7197919:
 java/lang/management/ThreadMXBean/ThreadBlockedCount.java has concurency
 issues
In-Reply-To: <525EA03F.2070106@oracle.com>
References: <525EA03F.2070106@oracle.com>
Message-ID: <526164F8.1070501@oracle.com>

On 10/16/2013 7:18 AM, Jaroslav Bachorik wrote:
> Please, review this simple test change.
>
> The test tries to get the number of times a certain thread was blocked 
> during the test run and intermittently fails with the difference of 1 
> - the expected number is 4 but the reported number is 3.
>
> When updating the thread statistics (the blocked count in this case) 
> no lock is used so there might be stale data when the ThreadMXBean 
> retrieves the stats. The patch tries to workaround this problem by 
> retrying a few times with the added delay. The test will try to obtain 
> the correct result for at most 10 seconds - after that it will fail if 
> the retrieved blocked count does not equal the expected blocked count.
>
> Issue : https://bugs.openjdk.java.net/browse/JDK-7197919
> Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00

Looks okay.  I notice that the existing code that catches
InterruptedException only sets testFailed to true but continues.  I think
it might be good to fix those places to return when IE is caught, so that
they fail fast like your fix does.

Mandy

From jaroslav.bachorik at oracle.com  Mon Oct 21 01:47:49 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 21 Oct 2013 10:47:49 +0200
Subject: jmx-dev RFR 7197919:
 java/lang/management/ThreadMXBean/ThreadBlockedCount.java has concurency
 issues
In-Reply-To: <526164F8.1070501@oracle.com>
References: <525EA03F.2070106@oracle.com> <526164F8.1070501@oracle.com>
Message-ID: <5264EA35.2070605@oracle.com>

On 18.10.2013 18:42, Mandy Chung wrote:
> On 10/16/2013 7:18 AM, Jaroslav Bachorik wrote:
>> Please, review this simple test change.
>>
>> The test tries to get the number of times a certain thread was blocked
>> during the test run and intermittently fails with the difference of 1
>> - the expected number is 4 but the reported number is 3.
>>
>> When updating the thread statistics (the blocked count in this case)
>> no lock is used so there might be stale data when the ThreadMXBean
>> retrieves the stats. The patch tries to workaround this problem by
>> retrying a few times with the added delay. The test will try to obtain
>> the correct result for at most 10 seconds - after that it will fail if
>> the retrieved blocked count does not equal the expected blocked count.
>>
>> Issue : https://bugs.openjdk.java.net/browse/JDK-7197919
>> Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00
>
> Looks okay.   I notice that existing code that catches
> InterruptedException only sets testFailed to true but continue.  I think
> it might be good to fix them to return if IE is caught to fail-fast like
> what your fix does.

Unfortunately, it's not possible to directly return in those cases. The 
synchronization logic relies on the code passing through all the 
"signal"/"waitForSignal" pairs for the test to finish - otherwise the 
test might just hang. I have at least added loop breaks to fail a bit 
faster in case of IE.

-JB-

>
> Mandy
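
A rough sketch of the retry-with-delay idea from the review request,
combined with stopping early on InterruptedException as discussed above.
The names (RetryingCheck, awaitValue) and the parameters are assumptions
made for illustration; the actual test code in the webrev may differ.

```java
import java.util.function.LongSupplier;

// Hypothetical helper; not the code from the webrev.
final class RetryingCheck {
    // Polls 'actual' until it equals 'expected' or the attempts run out.
    // Returns false on interruption so the caller can fail fast.
    static boolean awaitValue(LongSupplier actual, long expected,
                              int maxAttempts, long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (actual.getAsLong() == expected) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long[] reads = {0};
        // Simulated stale statistic: reports 3 twice, then the expected 4.
        boolean ok = awaitValue(() -> ++reads[0] < 3 ? 3L : 4L, 4L, 100, 100L);
        System.out.println("matched: " + ok + " after " + reads[0] + " reads");
    }
}
```

With 100 attempts and a 100 ms delay the check gives up after roughly
10 seconds, which corresponds to the upper bound mentioned in the
request.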


From jaroslav.bachorik at oracle.com  Mon Oct 21 04:03:04 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 21 Oct 2013 13:03:04 +0200
Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails
	intermittently
Message-ID: <526509E8.4030002@oracle.com>

Hi,

please, review the following small test change:

Issue:  https://bugs.openjdk.java.net/browse/JDK-7140929
Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00

The test fails intermittently, mostly when it is run with the -Xcomp
option. The failure is due to the fixed timeout used in the test when
waiting for the notifications to arrive. Tests of such slow configurations
are run with "timeoutfactor" set, but NotSerializableNotifTest does not
respect the timeoutfactor.

The patch makes the test honor the provided "timeoutfactor" so that it
passes even when -Xcomp is used.

Thanks,

-JB-
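
A minimal sketch of scaling a fixed timeout by the factor, assuming the
factor is exposed to the test as the "test.timeout.factor" system
property (the name used elsewhere in this thread). The class and method
names are hypothetical and the webrev may structure this differently.

```java
// Hypothetical helper; not the code from the webrev.
final class ScaledTimeout {
    // Scales baseMillis by the given factor string; falls back to 1.0
    // when the property is absent or not a number.
    static long scale(long baseMillis, String factorProperty) {
        double factor = 1.0;
        if (factorProperty != null) {
            try {
                factor = Double.parseDouble(factorProperty);
            } catch (NumberFormatException e) {
                factor = 1.0; // keep the unscaled timeout
            }
        }
        return Math.round(baseMillis * factor);
    }

    public static void main(String[] args) {
        long timeout = scale(10_000L, System.getProperty("test.timeout.factor"));
        System.out.println("waiting up to " + timeout + " ms for notifications");
    }
}
```

Run standalone, the property is not set and the timeout stays at the
base 10 seconds.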

From Alan.Bateman at oracle.com  Mon Oct 21 04:20:17 2013
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Mon, 21 Oct 2013 12:20:17 +0100
Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails
	intermittently
In-Reply-To: <526509E8.4030002@oracle.com>
References: <526509E8.4030002@oracle.com>
Message-ID: <52650DF1.2050008@oracle.com>

On 21/10/2013 12:03, Jaroslav Bachorik wrote:
> Hi,
>
> please, review the following small test change:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-7140929
> Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00
>
> The test fails intermittently, mostly when it is run with -Xcomp 
> option. The failure is due to fixed timeout used in the test when 
> waiting for the notifications arrival. Tests of such slow 
> configurations are run with "timeoutfactor" set but the 
> NotSerializableNotifTest does not respect the timeoutfactor.
>
> The patch allows the test to reflect the provided "timeoutfactor" and 
> therefore successfully pass even when -Xcomp is used.
Good to see test.timeout.factor being used (I think a lot of tests could 
benefit from using it). The change in the webrev looks okay, in the 
sense that you have scaled the existing 10s timeout.

-Alan.

From shanliang.jiang at oracle.com  Mon Oct 21 04:45:53 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Mon, 21 Oct 2013 13:45:53 +0200
Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails
	intermittently
In-Reply-To: <526509E8.4030002@oracle.com>
References: <526509E8.4030002@oracle.com>
Message-ID: <526513F1.4080700@oracle.com>

Jaroslav,

Looks fine to me, thanks for fixing the timing.
Next time we may need to fix its fixed port :)

Shanliang

Jaroslav Bachorik wrote:
> Hi,
>
> please, review the following small test change:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-7140929
> Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00
>
> The test fails intermittently, mostly when it is run with -Xcomp 
> option. The failure is due to fixed timeout used in the test when 
> waiting for the notifications arrival. Tests of such slow 
> configurations are run with "timeoutfactor" set but the 
> NotSerializableNotifTest does not respect the timeoutfactor.
>
> The patch allows the test to reflect the provided "timeoutfactor" and 
> therefore successfully pass even when -Xcomp is used.
>
> Thanks,
>
> -JB-


From jaroslav.bachorik at oracle.com  Mon Oct 21 04:55:50 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 21 Oct 2013 13:55:50 +0200
Subject: jmx-dev RFR 6309226: TEST:
 java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't
 check Thread.sleep
Message-ID: <52651646.4050705@oracle.com>

Please, review this small patch for a test failing due to the updated 
implementation in JDK6.

Issue:  https://bugs.openjdk.java.net/browse/JDK-6309226
Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/

The test fails due to the change in mustang where 
ThreadMXBean.getThreadInfo().getWaitedTime() and 
ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep() 
too. Unfortunately, Thread.sleep() is used throughout the test for 
synchronization purposes and this breaks the test.

In the patch I propose to replace Thread.sleep() with a busy wait that
hints the scheduler via Thread.yield(). While not very elegant, it
successfully works around the inclusion of an unknown number of
Thread.sleep() calls (they are made in a loop).

Thanks,

-JB-
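
To illustrate the busy-wait idea, here is a sketch under my own naming
(YieldingWait and spinUntil are not from the webrev). The waiting thread
stays runnable, so the wait should not show up in the waited count or
waited time, at the cost of burning CPU.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not the code from the webrev.
final class YieldingWait {
    // Spins until the condition holds or the timeout elapses.
    // Uses Thread.yield() as a scheduler hint instead of Thread.sleep().
    static boolean spinUntil(BooleanSupplier condition, long timeoutNanos) {
        long deadline = System.nanoTime() + timeoutNanos;
        while (!condition.getAsBoolean()) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out
            }
            Thread.yield();
        }
        return true;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // Wait until at least 10 ms have passed, without sleeping.
        boolean done = spinUntil(() -> System.nanoTime() - start >= 10_000_000L,
                                 1_000_000_000L);
        System.out.println("condition reached: " + done);
    }
}
```

Thread.yield() is only a hint, so whether another thread actually gets
to run depends on the scheduler.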

From paul.sandoz at oracle.com  Thu Oct 17 03:28:14 2013
From: paul.sandoz at oracle.com (Paul Sandoz)
Date: Thu, 17 Oct 2013 12:28:14 +0200
Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report
	some issue in com.sun.jmx.snmp package
In-Reply-To: <525E9B87.9050406@oracle.com>
References: <508E8F79.60909@oracle.com> <525E9B87.9050406@oracle.com>
Message-ID: <AECB9028-185F-4005-96B7-7E56CF83D251@oracle.com>


On Oct 16, 2013, at 3:58 PM, shanliang <shanliang.jiang at Oracle.Com> wrote:

> Hi,
> 
> Please review the following fix, main issue here is that we should clone an internal variable before returning.
> 
> webrev:
> http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/
> 
> bug
> https://bugs.openjdk.java.net/browse/JDK-8026028
> 

+1.

Paul.

From jaroslav.bachorik at oracle.com  Mon Oct 21 07:46:48 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 21 Oct 2013 16:46:48 +0200
Subject: jmx-dev RFR 7112404: 2 tests in
 java/lang/management/ManagementFactory fails with G1 because expect
 non-zero pools
Message-ID: <52653E58.9070508@oracle.com>

Please, review this simple test fix.

Issue:  https://bugs.openjdk.java.net/browse/JDK-7112404
Webrev: http://cr.openjdk.java.net/~jbachorik/7112404/webrev.00

The tests assume the MemoryUsage#committed values to be positive (>0),
while according to the MemoryUsage specification only negative values are
invalid. Therefore the tests should check and fail only when
MemoryUsage#committed is < 0.

Thanks,

-JB-
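
The relaxed check could look roughly like the following. This is a
sketch using the public java.lang.management API; the class and method
names are mine, not those of the tests in the webrev.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical check; not the code from the webrev.
final class CommittedCheck {
    // Zero is a legal committed size (e.g. an empty pool); only negatives are invalid.
    static boolean isValidCommitted(long committed) {
        return committed >= 0;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            if (!isValidCommitted(usage.getCommitted())) {
                throw new RuntimeException("Invalid committed size for " + pool.getName());
            }
            System.out.println(pool.getName() + ": committed=" + usage.getCommitted());
        }
    }
}
```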

From daniel.fuchs at oracle.com  Mon Oct 21 07:56:07 2013
From: daniel.fuchs at oracle.com (Daniel Fuchs)
Date: Mon, 21 Oct 2013 16:56:07 +0200
Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails
	intermittently
In-Reply-To: <526509E8.4030002@oracle.com>
References: <526509E8.4030002@oracle.com>
Message-ID: <52654087.4020905@oracle.com>

On 10/21/13 1:03 PM, Jaroslav Bachorik wrote:
> Hi,
>
> please, review the following small test change:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-7140929
> Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00
>
> The test fails intermittently, mostly when it is run with -Xcomp option.
> The failure is due to fixed timeout used in the test when waiting for
> the notifications arrival. Tests of such slow configurations are run
> with "timeoutfactor" set but the NotSerializableNotifTest does not
> respect the timeoutfactor.
>
> The patch allows the test to reflect the provided "timeoutfactor" and
> therefore successfully pass even when -Xcomp is used.
>
> Thanks,
>
> -JB-

Hi Jaroslav,

Looks good to me. I didn't know timeoutFactor was available as a system
property.

You can probably simplify the code like this:

private static double timeoutFactor;
...

main(...) {
     ...
     timeoutFactor = Double.parseDouble(
                  System.getProperty("test.timeout.factor", "1.0")
              );
}

(no need for the timeoutVal variable)

regards,

-- daniel

From shanliang.jiang at oracle.com  Mon Oct 21 08:14:18 2013
From: shanliang.jiang at oracle.com (shanliang)
Date: Mon, 21 Oct 2013 17:14:18 +0200
Subject: jmx-dev RFR 7112404: 2 tests in
 java/lang/management/ManagementFactory fails with G1 because expect
 non-zero pools
In-Reply-To: <52653E58.9070508@oracle.com>
References: <52653E58.9070508@oracle.com>
Message-ID: <526544CA.1030601@oracle.com>

Looks OK.
    164         // sanity check to have non-zero usage
should be changed to ?
    164         // sanity check to have non-negative usage

Shanliang

Jaroslav Bachorik wrote:
> Please, review this simple test fix.
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-7112404
> Webrev: http://cr.openjdk.java.net/~jbachorik/7112404/webrev.00
>
> The tests assume the MemoryUsage#commited values to be positive (>0) 
> while according to the MemoryUsage only negative values are invalid. 
> Therefore the tests should check and fail only when 
> MemoryUsage#commited is < 0.
>
> Thanks,
>
> -JB-


From david.holmes at oracle.com  Tue Oct 22 00:58:21 2013
From: david.holmes at oracle.com (David Holmes)
Date: Tue, 22 Oct 2013 17:58:21 +1000
Subject: jmx-dev RFR 6309226: TEST:
 java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't
 check Thread.sleep
In-Reply-To: <52651646.4050705@oracle.com>
References: <52651646.4050705@oracle.com>
Message-ID: <5266301D.5040002@oracle.com>

On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote:
> Please, review this small patch for a test failing due to the updated
> implementation in JDK6.
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-6309226
> Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/
>
> The test fails due to the change in mustang where
> ThreadMXBean.getThreadInfo().getWaitedTime() and
> ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep()
> too. Unfortunately, Thread.sleep() is used throughout the test for
> synchronization purposes and this breaks the test.
>
> In the patch I propose to replace Thread.sleep() with busy wait and
> hinting the scheduler by Thread.yield(). While not very elegant it
> successfully works around inclusion of unknown number of Thread.sleep()s
> (they are called in loop).

Not elegant and not completely reliable either. Probably adequate on a
multi-core system, but on a single core and with some schedulers it could
just be a busy spin.

David

> Thanks,
>
> -JB-

From jaroslav.bachorik at oracle.com  Tue Oct 22 04:03:48 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 22 Oct 2013 13:03:48 +0200
Subject: jmx-dev RFR 6309226: TEST:
 java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't
 check Thread.sleep
In-Reply-To: <5266301D.5040002@oracle.com>
References: <52651646.4050705@oracle.com> <5266301D.5040002@oracle.com>
Message-ID: <52665B94.8090902@oracle.com>

On 22.10.2013 09:58, David Holmes wrote:
> On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote:
>> Please, review this small patch for a test failing due to the updated
>> implementation in JDK6.
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-6309226
>> Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/
>>
>> The test fails due to the change in mustang where
>> ThreadMXBean.getThreadInfo().getWaitedTime() and
>> ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep()
>> too. Unfortunately, Thread.sleep() is used throughout the test for
>> synchronization purposes and this breaks the test.
>>
>> In the patch I propose to replace Thread.sleep() with busy wait and
>> hinting the scheduler by Thread.yield(). While not very elegant it
>> successfully works around inclusion of unknown number of Thread.sleep()s
>> (they are called in loop).
>
> Not elegant and not completely reliable either. Probably adequate on a
> multi-core system but single-core and with some schedulers it could just
> be a busy spin.

:/ Ok, so I need to account for the Thread.sleep() calls made outside of
the test code but still reported by the ThreadMXBean. Not that elegant
either, but at least it should be reliable.

http://cr.openjdk.java.net/~jbachorik/6309226/webrev.01

-JB-

>
> David
>
>> Thanks,
>>
>> -JB-


From jaroslav.bachorik at oracle.com  Tue Oct 22 06:47:41 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 22 Oct 2013 15:47:41 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
Message-ID: <526681FD.90604@oracle.com>

Please, review the following test fix:

Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01

The test tries to make sure that the "pool usage threshold" trigger and
the reported pool memory usage do not contradict each other. The
problem is that it is not possible to get the "pool usage threshold
exceeded" flag and the pool memory usage atomically with respect to the GC.
Specifically, the test fails when the "CMS Old Gen" pool is examined and
the usage is retrieved before a GC promotes some objects to the old gen,
but the usage threshold is checked after that GC has promoted them.

The patch makes sure that some instances have been promoted to "CMS Old
Gen" before checking the "pool usage threshold", to get a semi-consistent
view.

Thanks,

-JB-
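
To make the race concrete, here is a sketch of the kind of consistency
predicate involved. The exact condition used by the test is an
assumption on my side, and ThresholdView is a made-up name. It shows why
two separate reads can disagree when a GC runs between them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical illustration; not the code from the webrev.
final class ThresholdView {
    // The flag and the usage agree when "used reaches the threshold"
    // matches the reported "exceeded" state.
    static boolean consistent(long used, long threshold, boolean exceeded) {
        return (used >= threshold) == exceeded;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isUsageThresholdSupported()) {
                continue;
            }
            MemoryUsage usage = pool.getUsage();                 // first read
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            boolean exceeded = pool.isUsageThresholdExceeded();  // second read; a GC may run in between
            long threshold = pool.getUsageThreshold();
            System.out.println(pool.getName() + ": used=" + usage.getUsed()
                    + " threshold=" + threshold + " exceeded=" + exceeded);
            if (threshold > 0) { // a threshold of 0 means checking is disabled
                System.out.println("  consistent=" + consistent(usage.getUsed(), threshold, exceeded));
            }
        }
    }
}
```

For example, consistent(0, 1, true) is false: the usage was read as 0
before the promotion, but the flag was read after it.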

From mandy.chung at oracle.com  Tue Oct 22 12:38:56 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Tue, 22 Oct 2013 12:38:56 -0700
Subject: jmx-dev RFR 7112404: 2 tests in
 java/lang/management/ManagementFactory fails with G1 because expect
 non-zero pools
In-Reply-To: <52653E58.9070508@oracle.com>
References: <52653E58.9070508@oracle.com>
Message-ID: <5266D450.5050506@oracle.com>

On 10/21/13 7:46 AM, Jaroslav Bachorik wrote:
> Please, review this simple test fix.
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-7112404
> Webrev: http://cr.openjdk.java.net/~jbachorik/7112404/webrev.00
>

Looks okay to me.

Mandy

> The tests assume the MemoryUsage#commited values to be positive (>0) 
> while according to the MemoryUsage only negative values are invalid. 
> Therefore the tests should check and fail only when 
> MemoryUsage#commited is < 0.
>
> Thanks,
>
> -JB-


From mandy.chung at oracle.com  Tue Oct 22 13:04:38 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Tue, 22 Oct 2013 13:04:38 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <526681FD.90604@oracle.com>
References: <526681FD.90604@oracle.com>
Message-ID: <5266DA56.6050609@oracle.com>

Hi Jaroslav,

On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
> Please, review the following test fix:
>
> Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>

Have you considered forcing a GC when getUsed() == 0 regardless of which
memory pool it is?  This will avoid special-casing CMS old gen in
the test and will handle similar issues in the future for a different
collector implementation.  To make the test reliable, the test should
still pass if the memory pool has no objects in it (G1 survivor space
case?).

Mandy

> The test tries to make sure that the "pool usage threshold" trigger 
> and the reported pool memory usage are not contradicting each other. 
> The problem is that it is not possible to get the "pool usage 
> threshold exceeded" flag and the pool memory usage atomicly in regard 
> to the GC. Specifically, when "CMS Old Gen" pool is examined and the 
> usage is retrieved before a GC promotes some objects to the old gen 
> but the usage threshold is checked after the GC has promoted some 
> instance into the old gen the test will fail.
>
> The patch makes sure that there are some instances promoted in "CMS 
> Old Gen" before checking the "pool usage threshold" to get 
> semi-consistent view.
>
> Thanks,
>
> -JB-


From david.holmes at oracle.com  Tue Oct 22 17:40:21 2013
From: david.holmes at oracle.com (David Holmes)
Date: Wed, 23 Oct 2013 10:40:21 +1000
Subject: jmx-dev RFR 6309226: TEST:
 java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't
 check Thread.sleep
In-Reply-To: <52665B94.8090902@oracle.com>
References: <52651646.4050705@oracle.com> <5266301D.5040002@oracle.com>
	<52665B94.8090902@oracle.com>
Message-ID: <52671AF5.8050703@oracle.com>

On 22/10/2013 9:03 PM, Jaroslav Bachorik wrote:
> On 22.10.2013 09:58, David Holmes wrote:
>> On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote:
>>> Please, review this small patch for a test failing due to the updated
>>> implementation in JDK6.
>>>
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-6309226
>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/
>>>
>>> The test fails due to the change in mustang where
>>> ThreadMXBean.getThreadInfo().getWaitedTime() and
>>> ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep()
>>> too. Unfortunately, Thread.sleep() is used throughout the test for
>>> synchronization purposes and this breaks the test.
>>>
>>> In the patch I propose to replace Thread.sleep() with busy wait and
>>> hinting the scheduler by Thread.yield(). While not very elegant it
>>> successfully works around inclusion of unknown number of Thread.sleep()s
>>> (they are called in loop).
>>
>> Not elegant and not completely reliable either. Probably adequate on a
>> multi-core system but single-core and with some schedulers it could just
>> be a busy spin.
>
> :/ Ok, so I need to account for the Thread.sleep() calls made outside of
> the test code but still reported by the ThreadMXBean. Not that elegant,
> too, but at least should be reliable.
>
> http://cr.openjdk.java.net/~jbachorik/6309226/webrev.01

Sorry, I can't follow the logic of that test enough to determine whether 
this compensating logic is correct.

Whether this is more reliable depends on whether the 5% tolerance in
timeRangeCheck is enough to account for all the potential inaccuracies
in the time measurements. On a lightly loaded system it may be, but
otherwise ... a context switch after determining the base time and doing
the sleep could add an arbitrary load- and CPU-dependent delay. It might
be less reliable than the yield approach :(

I can't help but wonder whether there is a more explicit synchronization
mechanism that would avoid the need for goSleep. But that becomes a much
bigger task to deal with.

I will leave this for the serviceability team to determine the best 
course of action.

Thanks,
David

> -JB-
>
>>
>> David
>>
>>> Thanks,
>>>
>>> -JB-
>

From jaroslav.bachorik at oracle.com  Wed Oct 23 00:42:08 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 09:42:08 +0200
Subject: jmx-dev RFR 6309226: TEST:
 java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't
 check Thread.sleep
In-Reply-To: <52671AF5.8050703@oracle.com>
References: <52651646.4050705@oracle.com> <5266301D.5040002@oracle.com>
	<52665B94.8090902@oracle.com> <52671AF5.8050703@oracle.com>
Message-ID: <52677DD0.7000808@oracle.com>

On 23.10.2013 02:40, David Holmes wrote:
> On 22/10/2013 9:03 PM, Jaroslav Bachorik wrote:
>> On 22.10.2013 09:58, David Holmes wrote:
>>> On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote:
>>>> Please, review this small patch for a test failing due to the updated
>>>> implementation in JDK6.
>>>>
>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-6309226
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/
>>>>
>>>> The test fails due to the change in mustang where
>>>> ThreadMXBean.getThreadInfo().getWaitedTime() and
>>>> ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep()
>>>> too. Unfortunately, Thread.sleep() is used throughout the test for
>>>> synchronization purposes and this breaks the test.
>>>>
>>>> In the patch I propose to replace Thread.sleep() with busy wait and
>>>> hinting the scheduler by Thread.yield(). While not very elegant it
>>>> successfully works around inclusion of unknown number of
>>>> Thread.sleep()s
>>>> (they are called in loop).
>>>
>>> Not elegant and not completely reliable either. Probably adequate on a
>>> multi-core system but single-core and with some schedulers it could just
>>> be a busy spin.
>>
>> :/ Ok, so I need to account for the Thread.sleep() calls made outside of
>> the test code but still reported by the ThreadMXBean. Not that elegant,
>> too, but at least should be reliable.
>>
>> http://cr.openjdk.java.net/~jbachorik/6309226/webrev.01
>
> Sorry, I can't follow the logic of that test enough to determine whether
> this compensating logic is correct.

It simply adds the number of sleeps and the time spent sleeping in
calls to goSleep() made from the BlockedThread (the one that actually counts).

It seems to be correct - otherwise the test would fail because the 
numbers wouldn't match.
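
A minimal sketch of that kind of compensation, assuming a hypothetical
SleepAccounting helper (the names below are illustrative and are not the
code from the webrev). Every sleep made through goSleep() is recorded so
the expected waited count and time can be raised by the same amount:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper (not the webrev code): records every sleep the test
// performs so the expected ThreadMXBean values can be adjusted to match.
final class SleepAccounting {
    private long sleepCount;  // number of Thread.sleep() calls made via goSleep()
    private long sleepNanos;  // measured wall-clock time spent in those calls

    // Sleeps and records one more sleep plus the time it actually took.
    void goSleep(long millis) {
        long start = System.nanoTime();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        sleepNanos += System.nanoTime() - start;
        sleepCount++;
    }

    // Expected getWaitedCount(): explicit waits plus the recorded sleeps.
    long expectedWaitedCount(long waitCalls) {
        return waitCalls + sleepCount;
    }

    // Expected getWaitedTime() in milliseconds: wait time plus sleep time.
    long expectedWaitedTime(long waitMillis) {
        return waitMillis + sleepNanos / 1_000_000;
    }

    public static void main(String[] args) {
        ThreadMXBean tm = ManagementFactory.getThreadMXBean();
        if (tm.isThreadContentionMonitoringSupported()) {
            tm.setThreadContentionMonitoringEnabled(true); // needed for waited time
        }
        SleepAccounting acc = new SleepAccounting();
        acc.goSleep(20);
        acc.goSleep(20);
        ThreadInfo info = tm.getThreadInfo(Thread.currentThread().getId());
        System.out.println("reported waited count: " + info.getWaitedCount()
                + ", sleeps recorded: " + acc.expectedWaitedCount(0));
        System.out.println("reported waited time:  " + info.getWaitedTime()
                + " ms, sleep time recorded: " + acc.expectedWaitedTime(0) + " ms");
    }
}
```

The measured sleep time only approximates what the ThreadMXBean reports,
which is why the comparison still depends on a tolerance.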


>
> Whether this is more reliable depends on whether the 5% tolerance in
> timeRangeCheck is enough to account for all the potential inaccuracies
> in the time measurements. On a lightly loaded system it may be, but
> otherwise ... a context switch after determining the base time and doing
> the sleep could add an arbitrary load and cpu-dependent delay. It might
> be less reliable than the yield approach :(

I wonder how "yield" in a busy wait would behave on a single-core
architecture. I need the second thread to progress while busy-waiting ...
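
For reference, the busy wait under discussion looks roughly like the
sketch below. The class and method names are hypothetical. Thread.yield()
is only a hint to the scheduler, so whether the other thread actually
gets to run between two passes is not guaranteed:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration of a busy wait that hints the scheduler.
final class YieldWait {
    // Spins until the flag is set, offering the CPU to other threads on
    // each pass. Returns the number of passes made before the flag was seen.
    static long awaitFlag(AtomicBoolean flag) {
        long spins = 0;
        while (!flag.get()) {
            Thread.yield(); // a hint only; the scheduler may ignore it
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean done = new AtomicBoolean(false);
        Thread worker = new Thread(() -> done.set(true));
        worker.start();
        long spins = awaitFlag(done);
        worker.join();
        System.out.println("flag observed after " + spins + " yield(s)");
    }
}
```

Unlike Thread.sleep(), the spinning thread stays runnable here, so the
loop itself should not add to the waited count.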

>
> I can't help wonder whether there is a more explicit synchronization
> mechanism that will avoid the need for goSleep? But that becomes a much
> bigger task to deal with.

Yes. The only task of this fix is to enable the test to run now that
Thread.sleep() is included in the waited time (since sometime in the
JDK6 timeframe). I suppose the test was used successfully before that
change; if there are any problems with timing, additional issues will
be filed and the test will be redesigned.

For now I would like to keep the change simple and really focus on 
making the test executable on JDK7/8.

>
> I will leave this for the serviceability team to determine the best
> course of action.

Thanks for the valuable comments, anyway.

-JB-

>
> Thanks,
> David
>
>> -JB-
>>
>>>
>>> David
>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>


From jaroslav.bachorik at oracle.com  Wed Oct 23 01:02:08 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 10:02:08 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5266DA56.6050609@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com>
Message-ID: <52678280.1070004@oracle.com>

On 22.10.2013 22:04, Mandy Chung wrote:
> Hi Jaroslav,
>
> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>> Please, review the following test fix:
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>
>
> Have you considered to force GC when getUsed() == 0 regardless of which
> memory pool it is?  This will avoid special casing for CMS old gen in
> the test and will handle similar issue in the future for a different
> collector implementation.  To make the test reliable, the test should
> still pass if the memory pool has no object in it (G1 survivor space
> case?).

Hi Mandy,

I don't know whether GC will help for other pools - but I can enable it 
for all pools - it should not hurt.

The test should pass even with no object in the monitored pool, since the
pool should then not report an exceeded threshold.
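
A rough sketch of the "force GC whenever getUsed() == 0" idea, applied to
any pool rather than just "CMS Old Gen". The helper name
usageAfterOptionalGc is hypothetical and this is not the code from the
webrev:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical sketch: request a GC for any pool that reports zero usage.
final class EmptyPoolGc {
    // Returns the pool usage, requesting a GC first if the pool looks empty.
    // The result may still show 0 used, or be null if the pool is invalid.
    static MemoryUsage usageAfterOptionalGc(MemoryPoolMXBean pool) {
        MemoryUsage u = pool.getUsage();
        if (u != null && u.getUsed() == 0) {
            System.gc();         // a request; see the rest of the thread
            u = pool.getUsage(); // re-fetch so the caller sees fresh data
        }
        return u;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isUsageThresholdSupported()) {
                continue; // only pools with a usage threshold are of interest
            }
            MemoryUsage u = usageAfterOptionalGc(pool);
            System.out.println(pool.getName() + ": used="
                    + (u == null ? "n/a" : String.valueOf(u.getUsed())));
        }
    }
}
```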

-JB-

>
> Mandy
>
>> The test tries to make sure that the "pool usage threshold" trigger
>> and the reported pool memory usage are not contradicting each other.
>> The problem is that it is not possible to get the "pool usage
>> threshold exceeded" flag and the pool memory usage atomicly in regard
>> to the GC. Specifically, when "CMS Old Gen" pool is examined and the
>> usage is retrieved before a GC promotes some objects to the old gen
>> but the usage threshold is checked after the GC has promoted some
>> instance into the old gen the test will fail.
>>
>> The patch makes sure that there are some instances promoted in "CMS
>> Old Gen" before checking the "pool usage threshold" to get
>> semi-consistent view.
>>
>> Thanks,
>>
>> -JB-
>


From staffan.larsen at oracle.com  Wed Oct 23 01:08:24 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 23 Oct 2013 10:08:24 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
	isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52678280.1070004@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com>
	<52678280.1070004@oracle.com>
Message-ID: <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>

I think you can simplify the logic for forcing a GC to just a simple call to "System.gc();". AFAIK System.gc() will cause a full collection to happen for all collectors.

/Staffan

On 23 okt 2013, at 10:02, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 22.10.2013 22:04, Mandy Chung wrote:
>> Hi Jaroslav,
>> 
>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>> Please, review the following test fix:
>>> 
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>> 
>> 
>> Have you considered to force GC when getUsed() == 0 regardless of which
>> memory pool it is?  This will avoid special casing for CMS old gen in
>> the test and will handle similar issue in the future for a different
>> collector implementation.  To make the test reliable, the test should
>> still pass if the memory pool has no object in it (G1 survivor space
>> case?).
> 
> Hi Mandy,
> 
> I don't know whether GC will help for other pools - but I can enable it for all pools - it should not hurt.
> 
> The test should pass even with on object in the monitored pool since the pool should not report an exceeded threshold.
> 
> -JB-
> 
>> 
>> Mandy
>> 
>>> The test tries to make sure that the "pool usage threshold" trigger
>>> and the reported pool memory usage are not contradicting each other.
>>> The problem is that it is not possible to get the "pool usage
>>> threshold exceeded" flag and the pool memory usage atomicly in regard
>>> to the GC. Specifically, when "CMS Old Gen" pool is examined and the
>>> usage is retrieved before a GC promotes some objects to the old gen
>>> but the usage threshold is checked after the GC has promoted some
>>> instance into the old gen the test will fail.
>>> 
>>> The patch makes sure that there are some instances promoted in "CMS
>>> Old Gen" before checking the "pool usage threshold" to get
>>> semi-consistent view.
>>> 
>>> Thanks,
>>> 
>>> -JB-
>> 
> 


From jaroslav.bachorik at oracle.com  Wed Oct 23 01:12:57 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 10:12:57 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com>
	<52678280.1070004@oracle.com>
	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>
Message-ID: <52678509.2020002@oracle.com>

On 23.10.2013 10:08, Staffan Larsen wrote:
> I think you can simplify the logic for forcing a GC to just a simple call to "System.gc();". AFAIK System.gc() will cause a full collection to happen for all collectors.

Hm, will it now? I had the impression that it was just hinting the GC 
system to perform GC but it might decide to ignore it. I need to be sure 
that the GC was performed before continuing - otherwise I might get 
inconsistent data again.
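
One way to check whether a collection really happened, sketched here with
a hypothetical GcProbe class, is to compare the collection counts reported
by the GarbageCollectorMXBeans before and after the request:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe: detects a GC by watching the collection counters.
final class GcProbe {
    // Sums the collection counts of all collectors; a count of -1 means
    // "undefined" for that collector and is skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Requests a GC and reports whether any collector's count went up.
    static boolean gcObserved() {
        long before = totalCollections();
        System.gc();
        return totalCollections() > before;
    }

    public static void main(String[] args) {
        System.out.println("GC observed after System.gc(): " + gcObserved());
    }
}
```

A false result does not prove that nothing was collected, only that no
counter changed between the two reads.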

-JB-

>
> /Staffan
>
> On 23 okt 2013, at 10:02, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>
>> On 22.10.2013 22:04, Mandy Chung wrote:
>>> Hi Jaroslav,
>>>
>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>> Please, review the following test fix:
>>>>
>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>
>>>
>>> Have you considered to force GC when getUsed() == 0 regardless of which
>>> memory pool it is?  This will avoid special casing for CMS old gen in
>>> the test and will handle similar issue in the future for a different
>>> collector implementation.  To make the test reliable, the test should
>>> still pass if the memory pool has no object in it (G1 survivor space
>>> case?).
>>
>> Hi Mandy,
>>
>> I don't know whether GC will help for other pools - but I can enable it for all pools - it should not hurt.
>>
>> The test should pass even with on object in the monitored pool since the pool should not report an exceeded threshold.
>>
>> -JB-
>>
>>>
>>> Mandy
>>>
>>>> The test tries to make sure that the "pool usage threshold" trigger
>>>> and the reported pool memory usage are not contradicting each other.
>>>> The problem is that it is not possible to get the "pool usage
>>>> threshold exceeded" flag and the pool memory usage atomicly in regard
>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined and the
>>>> usage is retrieved before a GC promotes some objects to the old gen
>>>> but the usage threshold is checked after the GC has promoted some
>>>> instance into the old gen the test will fail.
>>>>
>>>> The patch makes sure that there are some instances promoted in "CMS
>>>> Old Gen" before checking the "pool usage threshold" to get
>>>> semi-consistent view.
>>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>>
>>
>


From staffan.larsen at oracle.com  Wed Oct 23 01:18:49 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 23 Oct 2013 10:18:49 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
	isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52678509.2020002@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com>
	<52678280.1070004@oracle.com>
	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>
	<52678509.2020002@oracle.com>
Message-ID: <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>


On 23 okt 2013, at 10:12, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 23.10.2013 10:08, Staffan Larsen wrote:
>> I think you can simplify the logic for forcing a GC to just a simple call to "System.gc();". AFAIK System.gc() will cause a full collection to happen for all collectors.
> 
> Hm, will it now? I had the impression that it was just hinting the GC system to perform GC but it might decide to ignore it. I need to be sure that the GC was performed before continuing - otherwise I might get inconsistent data again.

According to the spec it's just a hint, but I think the implementation happens to be a force. But better safe than sorry. :)

/Staffan 

> 
> -JB-
> 
>> 
>> /Staffan
>> 
>> On 23 okt 2013, at 10:02, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>> 
>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>> Hi Jaroslav,
>>>> 
>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>> Please, review the following test fix:
>>>>> 
>>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>> 
>>>> 
>>>> Have you considered to force GC when getUsed() == 0 regardless of which
>>>> memory pool it is?  This will avoid special casing for CMS old gen in
>>>> the test and will handle similar issue in the future for a different
>>>> collector implementation.  To make the test reliable, the test should
>>>> still pass if the memory pool has no object in it (G1 survivor space
>>>> case?).
>>> 
>>> Hi Mandy,
>>> 
>>> I don't know whether GC will help for other pools - but I can enable it for all pools - it should not hurt.
>>> 
>>> The test should pass even with on object in the monitored pool since the pool should not report an exceeded threshold.
>>> 
>>> -JB-
>>> 
>>>> 
>>>> Mandy
>>>> 
>>>>> The test tries to make sure that the "pool usage threshold" trigger
>>>>> and the reported pool memory usage are not contradicting each other.
>>>>> The problem is that it is not possible to get the "pool usage
>>>>> threshold exceeded" flag and the pool memory usage atomicly in regard
>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined and the
>>>>> usage is retrieved before a GC promotes some objects to the old gen
>>>>> but the usage threshold is checked after the GC has promoted some
>>>>> instance into the old gen the test will fail.
>>>>> 
>>>>> The patch makes sure that there are some instances promoted in "CMS
>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>> semi-consistent view.
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> -JB-
>>>> 
>>> 
>> 
> 


From bengt.rutisson at oracle.com  Wed Oct 23 05:40:13 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 14:40:13 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
Message-ID: <5267C3AD.5050306@oracle.com>


Hi Jaroslav,

A couple of questions.

I don't understand why this is a CMS-only problem. Why don't the other
collectors have the same issue? I guess it is less likely that the other
collectors start (or complete) a GC without a lot of allocation going
on. But at least G1 should have the same problem.

Also, from the problem description in the CR I would have guessed that 
you want the GC to happen between these two statements:

p.setUsageThreshold(1);
MemoryUsage u = p.getUsage();

Now you have added the GC just after these statements. I thought that
was what caused the problem: you read the usage data at one point, then
a GC happens, and you compare the cached usage data to the new data that
p.isUsageThresholdExceeded() will fetch.

Looking at the promoteToOldGen() method I assume that the intent is that 
the code should be using the return value. So my guess is that this code:

        if (p.getName().equals("CMS Old Gen")) {
            promoteToOldGen(p, u);
        }

Should be:

        if (p.getName().equals("CMS Old Gen")) {
            u = promoteToOldGen(p, u);
        }

With that, I think it might work. But I still don't understand why this 
is only a CMS problem.

One more question about the promoteToOldGen() and forceGC() methods. I 
don't really know much about how the different beans work, but are we 
sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the same 
pool names? That is, are you sure that forceGC() actually will do anything?
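
The pool-name question can be checked directly, since each
GarbageCollectorMXBean lists the memory pools it manages through
getMemoryPoolNames(). A small sketch (the PoolCollectors class is
hypothetical):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: maps a memory pool name to the collectors managing it.
final class PoolCollectors {
    // Returns the names of all collectors that list poolName as managed.
    static List<String> collectorsFor(String poolName) {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (Arrays.asList(gc.getMemoryPoolNames()).contains(poolName)) {
                names.add(gc.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // An empty list means no collector claims the pool under that name.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(pool.getName() + " -> " + collectorsFor(pool.getName()));
        }
    }
}
```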

As for just doing a System.gc() to force a GC, I think you can rely on
System.gc() doing a full GC in HotSpot unless someone sets
-XX:+DisableExplicitGC on the command line. Considering that you are
relying on HotSpot-specific names for pools, I don't think it is a
limitation for the test to rely on the HotSpot implementation of System.gc().
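
If the test wanted to guard against that flag, it could look at the JVM
input arguments. This is only a sketch with a hypothetical ExplicitGcCheck
class; it assumes that the last occurrence of the flag wins and that the
flag appears in RuntimeMXBean.getInputArguments():

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical check for -XX:+DisableExplicitGC among the JVM arguments.
final class ExplicitGcCheck {
    // Scans the arguments in order; a later +/- setting overrides an
    // earlier one (assumed behaviour, not verified against every JVM).
    static boolean explicitGcDisabled(List<String> jvmArgs) {
        boolean disabled = false;
        for (String arg : jvmArgs) {
            if (arg.equals("-XX:+DisableExplicitGC")) {
                disabled = true;
            } else if (arg.equals("-XX:-DisableExplicitGC")) {
                disabled = false;
            }
        }
        return disabled;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("explicit GC disabled: " + explicitGcDisabled(jvmArgs));
    }
}
```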

Thanks,
Bengt


On 2013-10-23 10:18, Staffan Larsen wrote:
> On 23 okt 2013, at 10:12, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>
>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>> I think you can simplify the logic for forcing a GC to just a simple call to "System.gc();". AFAIK System.gc() will cause a full collection to happen for all collectors.
>> Hm, will it now? I had the impression that it was just hinting the GC system to perform GC but it might decide to ignore it. I need to be sure that the GC was performed before continuing - otherwise I might get inconsistent data again.
> According to the spec it's just a hint, but I think the implementation happens to be a force. But better safe than sorry. :)
>
> /Staffan
>
>> -JB-
>>
>>> /Staffan
>>>
>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>>>
>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>> Hi Jaroslav,
>>>>>
>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>> Please, review the following test fix:
>>>>>>
>>>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>
>>>>> Have you considered to force GC when getUsed() == 0 regardless of which
>>>>> memory pool it is?  This will avoid special casing for CMS old gen in
>>>>> the test and will handle similar issue in the future for a different
>>>>> collector implementation.  To make the test reliable, the test should
>>>>> still pass if the memory pool has no object in it (G1 survivor space
>>>>> case?).
>>>> Hi Mandy,
>>>>
>>>> I don't know whether GC will help for other pools - but I can enable it for all pools - it should not hurt.
>>>>
>>>> The test should pass even with on object in the monitored pool since the pool should not report an exceeded threshold.
>>>>
>>>> -JB-
>>>>
>>>>> Mandy
>>>>>
>>>>>> The test tries to make sure that the "pool usage threshold" trigger
>>>>>> and the reported pool memory usage are not contradicting each other.
>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>> threshold exceeded" flag and the pool memory usage atomicly in regard
>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined and the
>>>>>> usage is retrieved before a GC promotes some objects to the old gen
>>>>>> but the usage threshold is checked after the GC has promoted some
>>>>>> instance into the old gen the test will fail.
>>>>>>
>>>>>> The patch makes sure that there are some instances promoted in "CMS
>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>> semi-consistent view.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -JB-


From jaroslav.bachorik at oracle.com  Wed Oct 23 05:55:43 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 14:55:43 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267C3AD.5050306@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com>
Message-ID: <5267C74F.2010302@oracle.com>

Hi Bengt,

On 23.10.2013 14:40, Bengt Rutisson wrote:
>
> Hi Jaroslav,
>
> A couple of questions.
>
> I don't understand why this is a CMS only problem? Why don't the other
> collectors have the same issue? I guess it is less likely that the other
> collectors start (or complete) a GC without a lot of allocation going
> on. But at least G1 should have the same problem.

I don't really know. If there are other pools that can have the "used" 
value 0 before a GC happens then yes, they are susceptible to the same 
problem.

>
> Also, from the problem description in the CR I would have guessed that
> you want the GC to happen between these two statements:
>
> p.setUsageThreshold(1);
> MemoryUsage u = p.getUsage();

This is nothing but a heuristic here. The problem lies in the fact that
it is not possible to retrieve the pool usage and the "threshold exceeded"
flag consistently in one atomic operation. I might get usable data from
the first call, in which case I don't need to force a GC.
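
To illustrate the race only (this is not what the patch does), the sketch
below reads the usage on both sides of the flag and treats the result as
inconclusive when the two usage reads disagree about the threshold. The
ThresholdConsistency class and its verdict names are hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical illustration of the non-atomic usage/flag reads.
final class ThresholdConsistency {
    enum Verdict { CONSISTENT, INCONSISTENT, INCONCLUSIVE }

    // Compares the flag with usage samples taken before and after it.
    // The flag is documented as "usage reaches or exceeds the threshold".
    static Verdict judge(long usedBefore, boolean exceeded, long usedAfter, long threshold) {
        boolean beforeSays = usedBefore >= threshold;
        boolean afterSays = usedAfter >= threshold;
        if (beforeSays != afterSays) {
            return Verdict.INCONCLUSIVE; // usage crossed the threshold in between
        }
        return exceeded == beforeSays ? Verdict.CONSISTENT : Verdict.INCONSISTENT;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!p.isUsageThresholdSupported()) {
                continue;
            }
            long old = p.getUsageThreshold();
            p.setUsageThreshold(1);
            try {
                MemoryUsage before = p.getUsage();
                boolean exceeded = p.isUsageThresholdExceeded();
                MemoryUsage after = p.getUsage();
                if (before == null || after == null) {
                    continue; // pool is no longer valid
                }
                System.out.println(p.getName() + ": "
                        + judge(before.getUsed(), exceeded, after.getUsed(), 1));
            } finally {
                p.setUsageThreshold(old); // restore the previous threshold
            }
        }
    }
}
```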

>
> Now you have added the GC just after these statements. I thought that
> was what caused the problem. That you read the usage data at one point,
> then a GC happens and you compare the cached usage
> data to the new data that p.isUsageThresholdExceeded() will fetch.
>
> Looking at the promoteToOldGen() method I assume that the intent is that
> the code should be using the return value. So my guess is that this code:
>
>    94         if (p.getName().equals("CMS Old Gen")) {
>    95             promoteToOldGen(p, u);
>    96         }
>
> Should be:
>
>    94         if (p.getName().equals("CMS Old Gen")) {
>    95             u = promoteToOldGen(p, u);
>    96         }

Indeed. It was meant to re-fetch the usage after GC.

>
> With that, I think it might work. But I still don't understand why this
> is only a CMS problem.
>
> One more question about the promoteToOldGen() and forceGC() methods. I
> don't really know much about how the different beans work, but are we
> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the same
> pool names? That is, are you sure that forceGC() actually will do anything?

They use the pool names as reported by the GC infrastructure, so they
should be the same.

>
> As for just doing a System.gc() to force a GC I think you can rely on
> that System.gc() does a full GC in Hotspot unless someone sets
> -XX:+DisableExplicitGC on the command line. Considering that you are
> relying on Hotspot specifc names for pools I don't think it is a
> limitation to the test to rely on the Hotspot implementatoin of
> System.gc().

Good to know. I guess I could simplify the change and just call 
System.gc(), after all.

Thanks,

-JB-

>
> Thanks,
> Bengt
>
>
>
>
> On 2013-10-23 10:18, Staffan Larsen wrote:
>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>> <jaroslav.bachorik at oracle.com> wrote:
>>
>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>> I think you can simplify the logic for forcing a GC to just a simple
>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>> collection to happen for all collectors.
>>> Hm, will it now? I had the impression that it was just hinting the GC
>>> system to perform GC but it might decide to ignore it. I need to be
>>> sure that the GC was performed before continuing - otherwise I might
>>> get inconsistent data again.
>> According to the spec it's just a hint, but I think the implementation
>> happens to be a force. But better safe than sorry. :)
>>
>> /Staffan
>>
>>> -JB-
>>>
>>>> /Staffan
>>>>
>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>
>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>> Hi Jaroslav,
>>>>>>
>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>> Please, review the following test fix:
>>>>>>>
>>>>>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>
>>>>>> Have you considered to force GC when getUsed() == 0 regardless of
>>>>>> which
>>>>>> memory pool it is?  This will avoid special casing for CMS old gen in
>>>>>> the test and will handle similar issue in the future for a different
>>>>>> collector implementation.  To make the test reliable, the test should
>>>>>> still pass if the memory pool has no object in it (G1 survivor space
>>>>>> case?).
>>>>> Hi Mandy,
>>>>>
>>>>> I don't know whether GC will help for other pools - but I can
>>>>> enable it for all pools - it should not hurt.
>>>>>
>>>>> The test should pass even with on object in the monitored pool
>>>>> since the pool should not report an exceeded threshold.
>>>>>
>>>>> -JB-
>>>>>
>>>>>> Mandy
>>>>>>
>>>>>>> The test tries to make sure that the "pool usage threshold" trigger
>>>>>>> and the reported pool memory usage are not contradicting each other.
>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>> threshold exceeded" flag and the pool memory usage atomicly in
>>>>>>> regard
>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined and the
>>>>>>> usage is retrieved before a GC promotes some objects to the old gen
>>>>>>> but the usage threshold is checked after the GC has promoted some
>>>>>>> instance into the old gen the test will fail.
>>>>>>>
>>>>>>> The patch makes sure that there are some instances promoted in "CMS
>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>> semi-consistent view.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> -JB-
>


From bengt.rutisson at oracle.com  Wed Oct 23 06:15:44 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 15:15:44 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267C74F.2010302@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
Message-ID: <5267CC00.7080509@oracle.com>


On 2013-10-23 14:55, Jaroslav Bachorik wrote:
> Hi Bengt,
>
> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>
>> Hi Jaroslav,
>>
>> A couple of questions.
>>
>> I don't understand why this is a CMS only problem? Why don't the other
>> collectors have the same issue? I guess it is less likely that the other
>> collectors start (or complete) a GC without a lot of allocation going
>> on. But at least G1 should have the same problem.
>
> I don't really know. If there are other pools that can have the "used" 
> value 0 before a GC happens then yes, they are susceptible to the same 
> problem.

I think all the "old" pools can have 0 used before a GC happens. But 
except for CMS and G1 it is less likely that a GC happens unless you do 
allocations. As long as they keep the 0 used the test will pass. So, my 
guess is that to be on the safe side all "old" pools should make sure to 
do a full GC first.

>
>>
>> Also, from the problem description in the CR I would have guessed that
>> you want the GC to happen between these two statements:
>>
>> p.setUsageThreshold(1);
>> MemoryUsage u = p.getUsage();
>
> This is all but a heuristic here. The problem lies in the fact that it 
> is not possible to retrieve the pool usage and the "threshold 
> exceeded" flag consistently in one atomic operation. I might get 
> usable data from the first call and then I don't need to force GC.

Right. This is why I think you want to avoid a GC after you have fetched 
getUsage() but before you do isUsageThresholdExceeded(). With your 
suggested patch you are explicitly inserting a GC at that point. To me 
this sounds like the opposite of what you want to do.

>
>>
>> Now you have added the GC just after these statements. I thought that
>> was what caused the problem. That you read the usage data at one point,
>> then a GC happens and you compare the cached usage
>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>
>> Looking at the promoteToOldGen() method I assume that the intent is that
>> the code should be using the return value. So my guess is that this 
>> code:
>>
>>    94         if (p.getName().equals("CMS Old Gen")) {
>>    95             promoteToOldGen(p, u);
>>    96         }
>>
>> Should be:
>>
>>    94         if (p.getName().equals("CMS Old Gen")) {
>>    95             u = promoteToOldGen(p, u);
>>    96         }
>
> Indeed. It was meant to re-fetch the usage after GC.

OK. Good. With this code I think it should work. Now you make sure to 
get the GC before you do getUsage().

>
>>
>> With that, I think it might work. But I still don't understand why this
>> is only a CMS problem.
>>
>> One more question about the promoteToOldGen() and forceGC() methods. I
>> don't really know much about how the different beans work, but are we
>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the same
>> pool names? That is, are you sure that forceGC() actually will do 
>> anything?
>
> They use the pool names as reported by the GC infrastracture so they 
> should be the same.

Ok.

>
>>
>> As for just doing a System.gc() to force a GC I think you can rely on
>> that System.gc() does a full GC in Hotspot unless someone sets
>> -XX:+DisableExplicitGC on the command line. Considering that you are
>> relying on Hotspot specifc names for pools I don't think it is a
>> limitation to the test to rely on the Hotspot implementatoin of
>> System.gc().
>
> Good to know. I guess I could simplify the change and just call 
> System.gc(), after all.

Yes, I think that's simpler.

Thanks,
Bengt

>
> Thanks,
>
> -JB-
>
>>
>> Thanks,
>> Bengt
>>
>>
>>
>>
>> On 2013-10-23 10:18, Staffan Larsen wrote:
>>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>>> <jaroslav.bachorik at oracle.com> wrote:
>>>
>>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>>> I think you can simplify the logic for forcing a GC to just a simple
>>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>>> collection to happen for all collectors.
>>>> Hm, will it now? I had the impression that it was just hinting the GC
>>>> system to perform GC but it might decide to ignore it. I need to be
>>>> sure that the GC was performed before continuing - otherwise I might
>>>> get inconsistent data again.
>>> According to the spec it's just a hint, but I think the implementation
>>> happens to be a force. But better safe than sorry. :)
>>>
>>> /Staffan
>>>
>>>> -JB-
>>>>
>>>>> /Staffan
>>>>>
>>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>
>>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>>> Hi Jaroslav,
>>>>>>>
>>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>>> Please, review the following test fix:
>>>>>>>>
>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>>
>>>>>>> Have you considered to force GC when getUsed() == 0 regardless of
>>>>>>> which
>>>>>>> memory pool it is?  This will avoid special casing for CMS old 
>>>>>>> gen in
>>>>>>> the test and will handle similar issue in the future for a 
>>>>>>> different
>>>>>>> collector implementation.  To make the test reliable, the test 
>>>>>>> should
>>>>>>> still pass if the memory pool has no object in it (G1 survivor 
>>>>>>> space
>>>>>>> case?).
>>>>>> Hi Mandy,
>>>>>>
>>>>>> I don't know whether GC will help for other pools - but I can
>>>>>> enable it for all pools - it should not hurt.
>>>>>>
>>>>>> The test should pass even with on object in the monitored pool
>>>>>> since the pool should not report an exceeded threshold.
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>> Mandy
>>>>>>>
>>>>>>>> The test tries to make sure that the "pool usage threshold" 
>>>>>>>> trigger
>>>>>>>> and the reported pool memory usage are not contradicting each 
>>>>>>>> other.
>>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>>> threshold exceeded" flag and the pool memory usage atomicly in
>>>>>>>> regard
>>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined 
>>>>>>>> and the
>>>>>>>> usage is retrieved before a GC promotes some objects to the old 
>>>>>>>> gen
>>>>>>>> but the usage threshold is checked after the GC has promoted some
>>>>>>>> instance into the old gen the test will fail.
>>>>>>>>
>>>>>>>> The patch makes sure that there are some instances promoted in 
>>>>>>>> "CMS
>>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>>> semi-consistent view.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> -JB-
>>
>


From jaroslav.bachorik at oracle.com  Wed Oct 23 07:32:28 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 16:32:28 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267CC00.7080509@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com>
Message-ID: <5267DDFC.4060607@oracle.com>

On 23.10.2013 15:15, Bengt Rutisson wrote:
>
> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>> Hi Bengt,
>>
>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>
>>> Hi Jaroslav,
>>>
>>> A couple of questions.
>>>
>>> I don't understand why this is a CMS only problem? Why don't the other
>>> collectors have the same issue? I guess it is less likely that the other
>>> collectors start (or complete) a GC without a lot of allocation going
>>> on. But at least G1 should have the same problem.
>>
>> I don't really know. If there are other pools that can have the "used"
>> value 0 before a GC happens then yes, they are susceptible to the same
>> problem.
>
> I think all the "old" pools can have 0 used before a GC happens. But
> except for CMS and G1 it is less likely that a GC happens unless you do
> allocations. As long as they keep the 0 used the test will pass. So, my
> guess is that to be on the safe side all "old" pools should make sure to
> do a full GC first.
>
>>
>>>
>>> Also, from the problem description in the CR I would have guessed that
>>> you want the GC to happen between these two statements:
>>>
>>> p.setUsageThreshold(1);
>>> MemoryUsage u = p.getUsage();
>>
>> This is all but a heuristic here. The problem lies in the fact that it
>> is not possible to retrieve the pool usage and the "threshold
>> exceeded" flag consistently in one atomic operation. I might get
>> usable data from the first call and then I don't need to force GC.
>
> Right. This is why I think you want to avoid a GC after you have fetched
> getUsage() but before you do isUsageThresholdExceeded(). With your
> suggested patch you are explicitly inserting a GC at that point. To me
> this sounds like the opposite of what you want to do.

I've updated the patch. The GC is called even before the first attempt 
to get the pool memory usage and System.gc() is used to perform GC (no 
MXBean checks). This should simplify the change a bit.

http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
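
To make the ordering concrete, here is a minimal sketch of the idea, not
the webrev itself. The class and method names (GcBeforeUsageSketch,
agrees, checkPool) are made up for illustration, and the exact check the
real test performs is an assumption; only the java.lang.management calls
are real API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical sketch: request a GC first, then read usage and flag.
class GcBeforeUsageSketch {
    // Pure check (assumed): the flag should be set exactly when used >= threshold.
    static boolean agrees(long used, long threshold, boolean exceeded) {
        return (used >= threshold) == exceeded;
    }

    // Requests a GC up front so that no explicit GC sits between the two reads.
    static boolean checkPool(MemoryPoolMXBean p) {
        System.gc();                      // a hint per the spec; see the discussion below
        p.setUsageThreshold(1);
        try {
            MemoryUsage u = p.getUsage(); // null if the pool is no longer valid
            boolean exceeded = p.isUsageThresholdExceeded();
            return u == null || agrees(u.getUsed(), 1, exceeded);
        } finally {
            p.setUsageThreshold(0);       // 0 disables threshold checking again
        }
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            if (p.isUsageThresholdSupported()) {
                System.out.println(p.getName() + " consistent: " + checkPool(p));
            }
        }
    }
}
```

A concurrent GC can still run between the two reads, so this narrows the
window rather than closing it.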

-JB-

>
>>
>>>
>>> Now you have added the GC just after these statements. I thought that
>>> was what caused the problem. That you read the usage data at one point,
>>> then a GC happens and you compare the cached usage
>>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>>
>>> Looking at the promoteToOldGen() method I assume that the intent is that
>>> the code should be using the return value. So my guess is that this
>>> code:
>>>
>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>    95             promoteToOldGen(p, u);
>>>    96         }
>>>
>>> Should be:
>>>
>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>    95             u = promoteToOldGen(p, u);
>>>    96         }
>>
>> Indeed. It was meant to re-fetch the usage after GC.
>
> OK. Good. With this code I think it should work. Now you make sure to
> get the GC before you do getUsage().
>
>>
>>>
>>> With that, I think it might work. But I still don't understand why this
>>> is only a CMS problem.
>>>
>>> One more question about the promoteToOldGen() and forceGC() methods. I
>>> don't really know much about how the different beans work, but are we
>>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the same
>>> pool names? That is, are you sure that forceGC() actually will do
>>> anything?
>>
>> They use the pool names as reported by the GC infrastructure so they
>> should be the same.
>
> Ok.
>
>>
>>>
>>> As for just doing a System.gc() to force a GC I think you can rely on
>>> that System.gc() does a full GC in Hotspot unless someone sets
>>> -XX:+DisableExplicitGC on the command line. Considering that you are
>>> relying on Hotspot specific names for pools I don't think it is a
>>> limitation to the test to rely on the Hotspot implementation of
>>> System.gc().
>>
>> Good to know. I guess I could simplify the change and just call
>> System.gc(), after all.
>
> Yes, I think that's simpler.
>
> Thanks,
> Bengt
>
>>
>> Thanks,
>>
>> -JB-
>>
>>>
>>> Thanks,
>>> Bengt
>>>
>>>
>>>
>>>
>>> On 2013-10-23 10:18, Staffan Larsen wrote:
>>>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>
>>>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>>>> I think you can simplify the logic for forcing a GC to just a simple
>>>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>>>> collection to happen for all collectors.
>>>>> Hm, will it now? I had the impression that it was just hinting the GC
>>>>> system to perform GC but it might decide to ignore it. I need to be
>>>>> sure that the GC was performed before continuing - otherwise I might
>>>>> get inconsistent data again.
>>>> According to the spec it's just a hint, but I think the implementation
>>>> happens to be a force. But better safe than sorry. :)
>>>>
>>>> /Staffan
>>>>
>>>>> -JB-
>>>>>
>>>>>> /Staffan
>>>>>>
>>>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>
>>>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>>>> Hi Jaroslav,
>>>>>>>>
>>>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>>>> Please, review the following test fix:
>>>>>>>>>
>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>>>
>>>>>>>> Have you considered to force GC when getUsed() == 0 regardless of
>>>>>>>> which
>>>>>>>> memory pool it is?  This will avoid special casing for CMS old
>>>>>>>> gen in
>>>>>>>> the test and will handle similar issue in the future for a
>>>>>>>> different
>>>>>>>> collector implementation.  To make the test reliable, the test
>>>>>>>> should
>>>>>>>> still pass if the memory pool has no object in it (G1 survivor
>>>>>>>> space
>>>>>>>> case?).
>>>>>>> Hi Mandy,
>>>>>>>
>>>>>>> I don't know whether GC will help for other pools - but I can
>>>>>>> enable it for all pools - it should not hurt.
>>>>>>>
>>>>>>> The test should pass even with no object in the monitored pool
>>>>>>> since the pool should not report an exceeded threshold.
>>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>> Mandy
>>>>>>>>
>>>>>>>>> The test tries to make sure that the "pool usage threshold"
>>>>>>>>> trigger
>>>>>>>>> and the reported pool memory usage are not contradicting each
>>>>>>>>> other.
>>>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>>>> threshold exceeded" flag and the pool memory usage atomically in
>>>>>>>>> regard
>>>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined
>>>>>>>>> and the
>>>>>>>>> usage is retrieved before a GC promotes some objects to the old
>>>>>>>>> gen
>>>>>>>>> but the usage threshold is checked after the GC has promoted some
>>>>>>>>> instance into the old gen the test will fail.
>>>>>>>>>
>>>>>>>>> The patch makes sure that there are some instances promoted in
>>>>>>>>> "CMS
>>>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>>>> semi-consistent view.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> -JB-
>>>
>>
>


From bengt.rutisson at oracle.com  Wed Oct 23 07:43:30 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 16:43:30 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267DDFC.4060607@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
Message-ID: <5267E092.6090006@oracle.com>


Hi Jaroslav,

On 2013-10-23 16:32, Jaroslav Bachorik wrote:
> On 23.10.2013 15:15, Bengt Rutisson wrote:
>>
>> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>>> Hi Bengt,
>>>
>>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>>
>>>> Hi Jaroslav,
>>>>
>>>> A couple of questions.
>>>>
>>>> I don't understand why this is a CMS only problem? Why don't the other
>>>> collectors have the same issue? I guess it is less likely that the 
>>>> other
>>>> collectors start (or complete) a GC without a lot of allocation going
>>>> on. But at least G1 should have the same problem.
>>>
>>> I don't really know. If there are other pools that can have the "used"
>>> value 0 before a GC happens then yes, they are susceptible to the same
>>> problem.
>>
>> I think all the "old" pools can have 0 used before a GC happens. But
>> except for CMS and G1 it is less likely that a GC happens unless you do
>> allocations. As long as they keep the 0 used the test will pass. So, my
>> guess is that to be on the safe side all "old" pools should make sure to
>> do a full GC first.
>>
>>>
>>>>
>>>> Also, from the problem description in the CR I would have guessed that
>>>> you want the GC to happen between these two statements:
>>>>
>>>> p.setUsageThreshold(1);
>>>> MemoryUsage u = p.getUsage();
>>>
>>> This is all but a heuristic here. The problem lies in the fact that it
>>> is not possible to retrieve the pool usage and the "threshold
>>> exceeded" flag consistently in one atomic operation. I might get
>>> usable data from the first call and then I don't need to force GC.
>>
>> Right. This is why I think you want to avoid a GC after you have fetched
>> getUsage() but before you do isUsageThresholdExceeded(). With your
>> suggested patch you are explicitly inserting a GC at that point. To me
>> this sounds like the opposite of what you want to do.
>
> I've updated the patch. The GC is called even before the first attempt 
> to get the pool memory usage and System.gc() is used to perform GC (no 
> MXBean checks). This should simplify the change a bit.
>
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02

Thanks for doing this update so quickly!

Have you been able to verify that this change still fixes the issue? I 
think it should, but it would be good if we could verify it.

This code worries me a little bit:

  114     private static MemoryUsage getUsage(MemoryPoolMXBean p) {
  115         MemoryUsage u = null;
  116         do  {
  117             System.gc();
  118             u = p.getUsage();
  119         } while (u.getUsed() == 0);
  120         return u;
  121     }

I think one call to System.gc() should be enough. And if it is not, if 
we still get 0 as used, I think that another System.gc() call will just 
render the same result. Thus, I'm a bit worried that this will be an 
endless loop.

Since the test actually handles the case where used is 0, I think it is 
enough to just do a single call to System.gc() and then get the usage data.
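
Roughly what I have in mind, as a sketch only. The names SingleGcUsage
and usageAfterGc are placeholders of mine and not taken from the webrev:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical sketch of the loop-free variant.
class SingleGcUsage {
    // One GC request, one read: this cannot spin forever, even if used stays 0.
    static MemoryUsage usageAfterGc(MemoryPoolMXBean p) {
        System.gc();
        return p.getUsage();
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = usageAfterGc(p);
            // used may legitimately be 0 here, e.g. for an empty pool
            System.out.println(p.getName() + " used=" + (u == null ? "n/a" : u.getUsed()));
        }
    }
}
```

The caller then has to cope with used == 0, which the test already does.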

Thanks,
Bengt


>
> -JB-
>
>>
>>>
>>>>
>>>> Now you have added the GC just after these statements. I thought that
>>>> was what caused the problem. That you read the usage data at one 
>>>> point,
>>>> then a GC happens and you compare the cached usage
>>>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>>>
>>>> Looking at the promoteToOldGen() method I assume that the intent is 
>>>> that
>>>> the code should be using the return value. So my guess is that this
>>>> code:
>>>>
>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>    95             promoteToOldGen(p, u);
>>>>    96         }
>>>>
>>>> Should be:
>>>>
>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>    95             u = promoteToOldGen(p, u);
>>>>    96         }
>>>
>>> Indeed. It was meant to re-fetch the usage after GC.
>>
>> OK. Good. With this code I think it should work. Now you make sure to
>> get the GC before you do getUsage().
>>
>>>
>>>>
>>>> With that, I think it might work. But I still don't understand why 
>>>> this
>>>> is only a CMS problem.
>>>>
>>>> One more question about the promoteToOldGen() and forceGC() methods. I
>>>> don't really know much about how the different beans work, but are we
>>>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the 
>>>> same
>>>> pool names? That is, are you sure that forceGC() actually will do
>>>> anything?
>>>
>>> They use the pool names as reported by the GC infrastructure so they
>>> should be the same.
>>
>> Ok.
>>
>>>
>>>>
>>>> As for just doing a System.gc() to force a GC I think you can rely on
>>>> that System.gc() does a full GC in Hotspot unless someone sets
>>>> -XX:+DisableExplicitGC on the command line. Considering that you are
>>>> relying on Hotspot specific names for pools I don't think it is a
>>>> limitation to the test to rely on the Hotspot implementation of
>>>> System.gc().
>>>
>>> Good to know. I guess I could simplify the change and just call
>>> System.gc(), after all.
>>
>> Yes, I think that's simpler.
>>
>> Thanks,
>> Bengt
>>
>>>
>>> Thanks,
>>>
>>> -JB-
>>>
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>>
>>>>
>>>>
>>>> On 2013-10-23 10:18, Staffan Larsen wrote:
>>>>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>
>>>>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>>>>> I think you can simplify the logic for forcing a GC to just a 
>>>>>>> simple
>>>>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>>>>> collection to happen for all collectors.
>>>>>> Hm, will it now? I had the impression that it was just hinting 
>>>>>> the GC
>>>>>> system to perform GC but it might decide to ignore it. I need to be
>>>>>> sure that the GC was performed before continuing - otherwise I might
>>>>>> get inconsistent data again.
>>>>> According to the spec it's just a hint, but I think the 
>>>>> implementation
>>>>> happens to be a force. But better safe than sorry. :)
>>>>>
>>>>> /Staffan
>>>>>
>>>>>> -JB-
>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>
>>>>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>>>>> Hi Jaroslav,
>>>>>>>>>
>>>>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>>>>> Please, review the following test fix:
>>>>>>>>>>
>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>>>>
>>>>>>>>> Have you considered to force GC when getUsed() == 0 regardless of
>>>>>>>>> which
>>>>>>>>> memory pool it is?  This will avoid special casing for CMS old
>>>>>>>>> gen in
>>>>>>>>> the test and will handle similar issue in the future for a
>>>>>>>>> different
>>>>>>>>> collector implementation.  To make the test reliable, the test
>>>>>>>>> should
>>>>>>>>> still pass if the memory pool has no object in it (G1 survivor
>>>>>>>>> space
>>>>>>>>> case?).
>>>>>>>> Hi Mandy,
>>>>>>>>
>>>>>>>> I don't know whether GC will help for other pools - but I can
>>>>>>>> enable it for all pools - it should not hurt.
>>>>>>>>
>>>>>>>> The test should pass even with no object in the monitored pool
>>>>>>>> since the pool should not report an exceeded threshold.
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>> Mandy
>>>>>>>>>
>>>>>>>>>> The test tries to make sure that the "pool usage threshold"
>>>>>>>>>> trigger
>>>>>>>>>> and the reported pool memory usage are not contradicting each
>>>>>>>>>> other.
>>>>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>>>>> threshold exceeded" flag and the pool memory usage atomically in
>>>>>>>>>> regard
>>>>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined
>>>>>>>>>> and the
>>>>>>>>>> usage is retrieved before a GC promotes some objects to the old
>>>>>>>>>> gen
>>>>>>>>>> but the usage threshold is checked after the GC has promoted 
>>>>>>>>>> some
>>>>>>>>>> instance into the old gen the test will fail.
>>>>>>>>>>
>>>>>>>>>> The patch makes sure that there are some instances promoted in
>>>>>>>>>> "CMS
>>>>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>>>>> semi-consistent view.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>
>>>
>>
>


From jaroslav.bachorik at oracle.com  Wed Oct 23 08:07:13 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 17:07:13 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267E092.6090006@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<5267E092.6090006@oracle.com>
Message-ID: <5267E621.2040601@oracle.com>

On 23.10.2013 16:43, Bengt Rutisson wrote:
>
> Hi Jaroslav,
>
> On 2013-10-23 16:32, Jaroslav Bachorik wrote:
>> On 23.10.2013 15:15, Bengt Rutisson wrote:
>>>
>>> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>>>> Hi Bengt,
>>>>
>>>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>>>
>>>>> Hi Jaroslav,
>>>>>
>>>>> A couple of questions.
>>>>>
>>>>> I don't understand why this is a CMS only problem? Why don't the other
>>>>> collectors have the same issue? I guess it is less likely that the
>>>>> other
>>>>> collectors start (or complete) a GC without a lot of allocation going
>>>>> on. But at least G1 should have the same problem.
>>>>
>>>> I don't really know. If there are other pools that can have the "used"
>>>> value 0 before a GC happens then yes, they are susceptible to the same
>>>> problem.
>>>
>>> I think all the "old" pools can have 0 used before a GC happens. But
>>> except for CMS and G1 it is less likely that a GC happens unless you do
>>> allocations. As long as they keep the 0 used the test will pass. So, my
>>> guess is that to be on the safe side all "old" pools should make sure to
>>> do a full GC first.
>>>
>>>>
>>>>>
>>>>> Also, from the problem description in the CR I would have guessed that
>>>>> you want the GC to happen between these two statements:
>>>>>
>>>>> p.setUsageThreshold(1);
>>>>> MemoryUsage u = p.getUsage();
>>>>
>>>> This is all but a heuristic here. The problem lies in the fact that it
>>>> is not possible to retrieve the pool usage and the "threshold
>>>> exceeded" flag consistently in one atomic operation. I might get
>>>> usable data from the first call and then I don't need to force GC.
>>>
>>> Right. This is why I think you want to avoid a GC after you have fetched
>>> getUsage() but before you do isUsageThresholdExceeded(). With your
>>> suggested patch you are explicitly inserting a GC at that point. To me
>>> this sounds like the opposite of what you want to do.
>>
>> I've updated the patch. The GC is called even before the first attempt
>> to get the pool memory usage and System.gc() is used to perform GC (no
>> MXBean checks). This should simplify the change a bit.
>>
>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>
> Thanks for doing this update so quickly!
>
> Have you been able to verify that this change still fixes the issue? I
> think it should, but it would be good if we could verify it.

Yep, it still fixes the problem. Unfortunately, the only way to 
reproduce the problem locally is to run the test under a debugger and 
invoke GC explicitly between getting the pool memory usage and the 
threshold flag.
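
For illustration, the same interleaving can be modelled without a
debugger by using a stand-in pool. This is a simulation sketch only:
FakePool and readsAgree are hypothetical names, nothing here touches the
real MXBeans, and it assumes (as discussed earlier in the thread) that the
flag is computed from the usage at the time of the call.

```java
// Hypothetical simulation of a GC landing between the two reads.
class RaceSketch {
    /** Stand-in for a memory pool; not a real MemoryPoolMXBean. */
    static final class FakePool {
        long used;                 // bytes currently in the simulated pool
        final long threshold;
        FakePool(long threshold) { this.threshold = threshold; }
        long getUsed() { return used; }
        // Assumed model: the flag reflects usage at the time of the call.
        boolean isUsageThresholdExceeded() { return threshold > 0 && used >= threshold; }
        // Simulates a GC promoting some objects into this pool.
        void simulatePromotion(long bytes) { used += bytes; }
    }

    /** True when the cached usage and the flag read afterwards agree. */
    static boolean readsAgree(FakePool p, boolean gcBetweenReads) {
        long cachedUsed = p.getUsed();
        if (gcBetweenReads) {
            p.simulatePromotion(1024);
        }
        boolean exceeded = p.isUsageThresholdExceeded();
        return (cachedUsed >= p.threshold) == exceeded;
    }

    public static void main(String[] args) {
        System.out.println(readsAgree(new FakePool(1), false)); // prints "true"
        System.out.println(readsAgree(new FakePool(1), true));  // prints "false"
    }
}
```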

>
> This code worries me a little bit:
>
>   114     private static MemoryUsage getUsage(MemoryPoolMXBean p) {
>   115         MemoryUsage u = null;
>   116         do  {
>   117             System.gc();
>   118             u = p.getUsage();
>   119         } while (u.getUsed() == 0);
>   120         return u;
>   121     }
>
> I think one call to System.gc() should be enough. And if it is not, if
> we still get 0 as used, I think that another System.gc() call will just
> render the same result. Thus, I'm a bit worried that this will be an
> endless loop.

Sounds reasonable. My motivation was to try to make sure some objects 
are promoted to old gen but it seems redundant and in case of non-oldgen 
pools might not even work :(

>
> Since the test actually handles the case where used is 0, I think it is
> enough to just do a single call to System.gc() and then get the usage data.

Hm, this makes the patch even simpler ...
http://cr.openjdk.java.net/~jbachorik/8020467/webrev.03

-JB-


>
> Thanks,
> Bengt
>
>
>>
>> -JB-
>>
>>>
>>>>
>>>>>
>>>>> Now you have added the GC just after these statements. I thought that
>>>>> was what caused the problem. That you read the usage data at one
>>>>> point,
>>>>> then a GC happens and you compare the cached usage
>>>>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>>>>
>>>>> Looking at the promoteToOldGen() method I assume that the intent is
>>>>> that
>>>>> the code should be using the return value. So my guess is that this
>>>>> code:
>>>>>
>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>    95             promoteToOldGen(p, u);
>>>>>    96         }
>>>>>
>>>>> Should be:
>>>>>
>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>    95             u = promoteToOldGen(p, u);
>>>>>    96         }
>>>>
>>>> Indeed. It was meant to re-fetch the usage after GC.
>>>
>>> OK. Good. With this code I think it should work. Now you make sure to
>>> get the GC before you do getUsage().
>>>
>>>>
>>>>>
>>>>> With that, I think it might work. But I still don't understand why
>>>>> this
>>>>> is only a CMS problem.
>>>>>
>>>>> One more question about the promoteToOldGen() and forceGC() methods. I
>>>>> don't really know much about how the different beans work, but are we
>>>>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the
>>>>> same
>>>>> pool names? That is, are you sure that forceGC() actually will do
>>>>> anything?
>>>>
>>>> They use the pool names as reported by the GC infrastructure so they
>>>> should be the same.
>>>
>>> Ok.
>>>
>>>>
>>>>>
>>>>> As for just doing a System.gc() to force a GC I think you can rely on
>>>>> that System.gc() does a full GC in Hotspot unless someone sets
>>>>> -XX:+DisableExplicitGC on the command line. Considering that you are
>>>>> relying on Hotspot specific names for pools I don't think it is a
>>>>> limitation to the test to rely on the Hotspot implementation of
>>>>> System.gc().
>>>>
>>>> Good to know. I guess I could simplify the change and just call
>>>> System.gc(), after all.
>>>
>>> Yes, I think that's simpler.
>>>
>>> Thanks,
>>> Bengt
>>>
>>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> Thanks,
>>>>> Bengt
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2013-10-23 10:18, Staffan Larsen wrote:
>>>>>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>
>>>>>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>>>>>> I think you can simplify the logic for forcing a GC to just a
>>>>>>>> simple
>>>>>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>>>>>> collection to happen for all collectors.
>>>>>>> Hm, will it now? I had the impression that it was just hinting
>>>>>>> the GC
>>>>>>> system to perform GC but it might decide to ignore it. I need to be
>>>>>>> sure that the GC was performed before continuing - otherwise I might
>>>>>>> get inconsistent data again.
>>>>>> According to the spec it's just a hint, but I think the
>>>>>> implementation
>>>>>> happens to be a force. But better safe than sorry. :)
>>>>>>
>>>>>> /Staffan
>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>> /Staffan
>>>>>>>>
>>>>>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>
>>>>>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>
>>>>>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>> Please, review the following test fix:
>>>>>>>>>>>
>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>>>>>
>>>>>>>>>> Have you considered to force GC when getUsed() == 0 regardless of
>>>>>>>>>> which
>>>>>>>>>> memory pool it is?  This will avoid special casing for CMS old
>>>>>>>>>> gen in
>>>>>>>>>> the test and will handle similar issue in the future for a
>>>>>>>>>> different
>>>>>>>>>> collector implementation.  To make the test reliable, the test
>>>>>>>>>> should
>>>>>>>>>> still pass if the memory pool has no object in it (G1 survivor
>>>>>>>>>> space
>>>>>>>>>> case?).
>>>>>>>>> Hi Mandy,
>>>>>>>>>
>>>>>>>>> I don't know whether GC will help for other pools - but I can
>>>>>>>>> enable it for all pools - it should not hurt.
>>>>>>>>>
>>>>>>>>> The test should pass even with no object in the monitored pool
>>>>>>>>> since the pool should not report an exceeded threshold.
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>> Mandy
>>>>>>>>>>
>>>>>>>>>>> The test tries to make sure that the "pool usage threshold"
>>>>>>>>>>> trigger
>>>>>>>>>>> and the reported pool memory usage are not contradicting each
>>>>>>>>>>> other.
>>>>>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>>>>>> threshold exceeded" flag and the pool memory usage atomically in
>>>>>>>>>>> regard
>>>>>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined
>>>>>>>>>>> and the
>>>>>>>>>>> usage is retrieved before a GC promotes some objects to the old
>>>>>>>>>>> gen
>>>>>>>>>>> but the usage threshold is checked after the GC has promoted
>>>>>>>>>>> some
>>>>>>>>>>> instance into the old gen the test will fail.
>>>>>>>>>>>
>>>>>>>>>>> The patch makes sure that there are some instances promoted in
>>>>>>>>>>> "CMS
>>>>>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>>>>>> semi-consistent view.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>
>>>>
>>>
>>
>


From bengt.rutisson at oracle.com  Wed Oct 23 08:31:18 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 17:31:18 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267E621.2040601@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<5267E092.6090006@oracle.com> <5267E621.2040601@oracle.com>
Message-ID: <5267EBC6.8010609@oracle.com>


Hi again Jaroslav,

On 2013-10-23 17:07, Jaroslav Bachorik wrote:
> On 23.10.2013 16:43, Bengt Rutisson wrote:
>>
>> Hi Jaroslav,
>>
>> On 2013-10-23 16:32, Jaroslav Bachorik wrote:
>>> On 23.10.2013 15:15, Bengt Rutisson wrote:
>>>>
>>>> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>>>>> Hi Bengt,
>>>>>
>>>>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>>>>
>>>>>> Hi Jaroslav,
>>>>>>
>>>>>> A couple of questions.
>>>>>>
>>>>>> I don't understand why this is a CMS only problem? Why don't the 
>>>>>> other
>>>>>> collectors have the same issue? I guess it is less likely that the
>>>>>> other
>>>>>> collectors start (or complete) a GC without a lot of allocation 
>>>>>> going
>>>>>> on. But at least G1 should have the same problem.
>>>>>
>>>>> I don't really know. If there are other pools that can have the 
>>>>> "used"
>>>>> value 0 before a GC happens then yes, they are susceptible to the 
>>>>> same
>>>>> problem.
>>>>
>>>> I think all the "old" pools can have 0 used before a GC happens. But
>>>> except for CMS and G1 it is less likely that a GC happens unless 
>>>> you do
>>>> allocations. As long as they keep the 0 used the test will pass. 
>>>> So, my
>>>> guess is that to be on the safe side all "old" pools should make 
>>>> sure to
>>>> do a full GC first.
>>>>
>>>>>
>>>>>>
>>>>>> Also, from the problem description in the CR I would have guessed 
>>>>>> that
>>>>>> you want the GC to happen between these two statements:
>>>>>>
>>>>>> p.setUsageThreshold(1);
>>>>>> MemoryUsage u = p.getUsage();
>>>>>
>>>>> This is all but a heuristic here. The problem lies in the fact 
>>>>> that it
>>>>> is not possible to retrieve the pool usage and the "threshold
>>>>> exceeded" flag consistently in one atomic operation. I might get
>>>>> usable data from the first call and then I don't need to force GC.
>>>>
>>>> Right. This is why I think you want to avoid a GC after you have 
>>>> fetched
>>>> getUsage() but before you do isUsageThresholdExceeded(). With your
>>>> suggested patch you are explicitly inserting a GC at that point. To me
>>>> this sounds like the opposite of what you want to do.
>>>
>>> I've updated the patch. The GC is called even before the first attempt
>>> to get the pool memory usage and System.gc() is used to perform GC (no
>>> MXBean checks). This should simplify the change a bit.
>>>
>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>>
>> Thanks for doing this update so quickly!
>>
>> Have you been able to verify that this change still fixes the issue? I
>> think it should, but it would be good if we could verify it.
>
> Yep, it still fixes the problem. Unfortunately, the only way to 
> reproduce the problem locally is to run the test under a debugger and 
> invoke GC explicitly between getting the pool memory usage and the 
> threshold flag.
>
>>
>> This code worries me a little bit:
>>
>>   114     private static MemoryUsage getUsage(MemoryPoolMXBean p) {
>>   115         MemoryUsage u = null;
>>   116         do  {
>>   117             System.gc();
>>   118             u = p.getUsage();
>>   119         } while (u.getUsed() == 0);
>>   120         return u;
>>   121     }
>>
>> I think one call to System.gc() should be enough. And if it is not, if
>> we still get 0 as used, I think that another System.gc() call will just
>> render the same result. Thus, I'm a bit worried that this will be an
>> endless loop.
>
> Sounds reasonable. My motivation was to try to make sure some objects 
> are promoted to old gen but it seems redundant and in case of 
> non-oldgen pools might not even work :(
>
>>
>> Since the test actually handles the case where used is 0, I think it is
>> enough to just do a single call to System.gc() and then get the usage 
>> data.
>
> Hm, this makes the patch even simpler ...
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.03

Yes, I think this looks simple and good. :-)

Thanks,
Bengt


>
> -JB-
>
>
>>
>> Thanks,
>> Bengt
>>
>>
>>>
>>> -JB-
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Now you have added the GC just after these statements. I thought 
>>>>>> that
>>>>>> was what caused the problem. That you read the usage data at one
>>>>>> point,
>>>>>> then a GC happens and you compare the cached usage
>>>>>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>>>>>
>>>>>> Looking at the promoteToOldGen() method I assume that the intent is
>>>>>> that
>>>>>> the code should be using the return value. So my guess is that this
>>>>>> code:
>>>>>>
>>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>>    95             promoteToOldGen(p, u);
>>>>>>    96         }
>>>>>>
>>>>>> Should be:
>>>>>>
>>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>>    95             u = promoteToOldGen(p, u);
>>>>>>    96         }
>>>>>
>>>>> Indeed. It was meant to re-fetch the usage after GC.
>>>>
>>>> OK. Good. With this code I think it should work. Now you make sure to
>>>> get the GC before you do getUsage().
>>>>
>>>>>
>>>>>>
>>>>>> With that, I think it might work. But I still don't understand why
>>>>>> this
>>>>>> is only a CMS problem.
>>>>>>
>>>>>> One more question about the promoteToOldGen() and forceGC() 
>>>>>> methods. I
>>>>>> don't really know much about how the different beans work, but 
>>>>>> are we
>>>>>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the
>>>>>> same
>>>>>> pool names? That is, are you sure that forceGC() actually will do
>>>>>> anything?
>>>>>
>>>>> They use the pool names as reported by the GC infrastructure so they
>>>>> should be the same.
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>>>
>>>>>> As for just doing a System.gc() to force a GC I think you can 
>>>>>> rely on
>>>>>> that System.gc() does a full GC in Hotspot unless someone sets
>>>>>> -XX:+DisableExplicitGC on the command line. Considering that you are
>>>>>> relying on Hotspot specific names for pools I don't think it is a
>>>>>> limitation to the test to rely on the Hotspot implementation of
>>>>>> System.gc().
>>>>>
>>>>> Good to know. I guess I could simplify the change and just call
>>>>> System.gc(), after all.
>>>>
>>>> Yes, I think that's simpler.
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Bengt
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2013-10-23 10:18, Staffan Larsen wrote:
>>>>>>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>
>>>>>>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>>>>>>> I think you can simplify the logic for forcing a GC to just a
>>>>>>>>> simple
>>>>>>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>>>>>>> collection to happen for all collectors.
>>>>>>>> Hm, will it now? I had the impression that it was just hinting
>>>>>>>> the GC
>>>>>>>> system to perform GC but it might decide to ignore it. I need 
>>>>>>>> to be
>>>>>>>> sure that the GC was performed before continuing - otherwise I 
>>>>>>>> might
>>>>>>>> get inconsistent data again.
>>>>>>> According to the spec it's just a hint, but I think the
>>>>>>> implementation
>>>>>>> happens to be a force. But better safe than sorry. :)
>>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>> /Staffan
>>>>>>>>>
>>>>>>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>>
>>>>>>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>>> Please, review the following test fix:
>>>>>>>>>>>>
>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>>>>>>
>>>>>>>>>>> Have you considered to force GC when getUsed() == 0 
>>>>>>>>>>> regardless of
>>>>>>>>>>> which
>>>>>>>>>>> memory pool it is?  This will avoid special casing for CMS old
>>>>>>>>>>> gen in
>>>>>>>>>>> the test and will handle similar issue in the future for a
>>>>>>>>>>> different
>>>>>>>>>>> collector implementation.  To make the test reliable, the test
>>>>>>>>>>> should
>>>>>>>>>>> still pass if the memory pool has no object in it (G1 survivor
>>>>>>>>>>> space
>>>>>>>>>>> case?).
>>>>>>>>>> Hi Mandy,
>>>>>>>>>>
>>>>>>>>>> I don't know whether GC will help for other pools - but I can
>>>>>>>>>> enable it for all pools - it should not hurt.
>>>>>>>>>>
>>>>>>>>>> The test should pass even with no object in the monitored pool
>>>>>>>>>> since the pool should not report an exceeded threshold.
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>> Mandy
>>>>>>>>>>>
>>>>>>>>>>>> The test tries to make sure that the "pool usage threshold"
>>>>>>>>>>>> trigger
>>>>>>>>>>>> and the reported pool memory usage are not contradicting each
>>>>>>>>>>>> other.
>>>>>>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>>>>>>> threshold exceeded" flag and the pool memory usage atomically in
>>>>>>>>>>>> regard
>>>>>>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined
>>>>>>>>>>>> and the
>>>>>>>>>>>> usage is retrieved before a GC promotes some objects to the 
>>>>>>>>>>>> old
>>>>>>>>>>>> gen
>>>>>>>>>>>> but the usage threshold is checked after the GC has promoted
>>>>>>>>>>>> some
>>>>>>>>>>>> instance into the old gen the test will fail.
>>>>>>>>>>>>
>>>>>>>>>>>> The patch makes sure that there are some instances promoted in
>>>>>>>>>>>> "CMS
>>>>>>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>>>>>>> semi-consistent view.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>
>>>>>
>>>>
>>>
>>
>


From mandy.chung at oracle.com  Wed Oct 23 16:02:09 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 23 Oct 2013 16:02:09 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267DDFC.4060607@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
Message-ID: <52685571.1090407@oracle.com>


On 10/23/2013 7:32 AM, Jaroslav Bachorik wrote:
> I've updated the patch. The GC is called even before the first attempt 
> to get the pool memory usage and System.gc() is used to perform GC (no 
> MXBean checks). This should simplify the change a bit.
>
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02

This change is okay.  It will force a GC once for each memory pool that
supports a usage threshold (I think 3 memory pools), which is not a huge
issue.  Perhaps a more reliable option is to make it an othervm test that
allocates a large object and calls GC once before the verification.
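
A minimal sketch of that alternative, assuming the test runs in its own
VM. The class name, the field name and the 16 MB size are hypothetical
and not taken from the webrev; System.gc() is only a hint per the spec,
so the sketch relies on the Hotspot behaviour discussed in this thread.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class OldGenWarmup {
    // Hypothetical size, chosen only to be large enough to matter.
    static final int LARGE_OBJECT_BYTES = 16 * 1024 * 1024;

    // Static field keeps the object reachable for the whole test run.
    static byte[] largeObject;

    // Allocate the large object and request a single collection up front.
    static void warmUp() {
        largeObject = new byte[LARGE_OBJECT_BYTES];
        System.gc(); // a hint per the spec; assumed to collect on Hotspot
    }

    public static void main(String[] args) {
        warmUp();
        // Report the heap pools that support a usage threshold.
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            if (p.getType() == MemoryType.HEAP && p.isUsageThresholdSupported()) {
                MemoryUsage u = p.getUsage(); // null if the pool is not valid
                if (u != null) {
                    System.out.println(p.getName() + " used=" + u.getUsed());
                }
            }
        }
    }
}
```

The point of the static field is only that the object stays live, so
something is left in the heap after the collection.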

Mandy

From jaroslav.bachorik at oracle.com  Thu Oct 24 07:01:43 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 24 Oct 2013 16:01:43 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52685571.1090407@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com>
Message-ID: <52692847.1030806@oracle.com>

Hi Mandy,

On 24.10.2013 01:02, Mandy Chung wrote:
>
> On 10/23/2013 7:32 AM, Jaroslav Bachorik wrote:
>> I've updated the patch. The GC is called even before the first attempt
>> to get the pool memory usage and System.gc() is used to perform GC (no
>> MXBean checks). This should simplify the change a bit.
>>
>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>
> This change is okay.  It will force GC once per each memory pool that
> supports usage threshold (I think 3 memory pools) which is not a huge
> issue.  Perhaps a more reliable option is to make it an othervm test and
> allocating large object and calling GC once before the verification.

Running it as othervm might improve repeatability but I don't quite
follow the trick with the large object. That would be effective for the
oldgen pools only, I suppose? There were concerns raised during the
review that other pools might be susceptible to the same timing-related
problems (theoretically). So, if you don't feel strongly about it I would
leave the rest of the test as it is - that is, calling System.gc() before
checking the pool thresholds.

-JB-

>
> Mandy


From jaroslav.bachorik at oracle.com  Thu Oct 24 07:10:12 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 24 Oct 2013 16:10:12 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <52562FF5.5060304@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
	<52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com>
	<52552EBA.4060308@oracle.com> <52553EAD.4040506@oracle.com>
	<52562FF5.5060304@oracle.com>
Message-ID: <52692A44.9050004@oracle.com>

Hi David,

On 10.10.2013 06:41, David Holmes wrote:
> On 9/10/2013 9:31 PM, Jaroslav Bachorik wrote:
>> On 9.10.2013 12:23, David Holmes wrote:
>>> Jaroslav,
>>>
>>> Thanks for the detailed description of changes - much appreciated.
>>>
>>> There is a lot to digest in there. :)
>>
>> Yep, it started as a simple fix :/
>>
>>>
>>> It isn't obvious to me why these tests require a full JDK?
>>
>> IDK, LocalManagementTest.sh was listed as one requiring full JDK. Its
>> requirements are the same as the ones of CustomLauncherTest.sh (now
>> *.java) so it seemed logical to list it there too.
>
> Ah! Now I see it - it uses tools.jar which implies a full JDK.
>
>>>
>>> I don't quite follow the libjvm lookup logic - I would expect that you
>>> would always want to test the libjvm that is currently running - though
>>> it is hard to determine that.
>>
>> I'm afraid I can't be of much assistance here - I just took what was in
>> the *.sh version and converted it to *.java.
>
> Okay. I expect this will need revisiting at some point.

So, does this mean "ok, go"?

Thanks,

-JB-

>
> Thanks,
> David
> -----
>
>
>> -JB-
>>
>>>
>>> Thanks,
>>> David
>>>
>>> On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote:
>>>> On 8.10.2013 05:42, David Holmes wrote:
>>>>> Jaroslav,
>>>>>
>>>>> Can you summarise the changes please? With the conversion to Java and
>>>>> the infrastructure additions I can't tell what is actually fixing the
>>>>> original timeout issue :)
>>>>
>>>> The timeout was mostly caused by using the same file for communication
>>>> between java processes in more test cases. When those test cases were
>>>> run in parallel the file got rewritten silently and some of the tests
>>>> could end up trying to connect to incorrect port in the target
>>>> application. I was able to reproduce the timeout by interleaving the
>>>> test runs for CustomLauncherTest.sh and LocalManagementTest.sh and
>>>> adding an artificial delay to CustomLauncherTest.sh to allow
>>>> LocalManagementTest.sh to change the port in the file.
>>>>
>>>> While it could be fixed by using a different file for each test case I
>>>> took the liberty of converting the shell tests to java tests. This
>>>> allows me to remove the communication file and, in the end, make the
>>>> tests more robust.
>>>>
>>>> CustomLauncherTest.java and LocalManagementTest.java are the tests
>>>> converted from shell to java. I decided to convert
>>>> LocalManagementTest.sh as well because it has the same problems as the
>>>> CustomLauncherTest.sh.
>>>>
>>>> The changes in the testlibrary are about introducing new methods
>>>> allowing the tests to easily start a process and wait for a certain text
>>>> to appear in its stdout/stderr. Using these methods the caller can wait
>>>> till the callee is fully initialized and, e.g., ready to accept
>>>> connections.
>>>>
>>>> The changes in launchers make the launchers actually executable + I am
>>>> adding a linux-amd64 launcher (I needed that one to work on the changes
>>>> locally and thought it might be nice to have one more platform covered
>>>> by the test).
>>>>
>>>> I've updated the webrev to include changes to LocalManagementTest and
>>>> TEST.groups (both of those tests require JDK) -
>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>> On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote:
>>>>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>>>>>>> The updated webrev:
>>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>>>>>>
>>>>>>> I've moved some of the functionality to the testlibrary.
>>>>>>>
>>>>>>> -JB -
>>>>>>>
>>>>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>>>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>>>>>>> Jaroslav,
>>>>>>>>>
>>>>>>>>> CustomLauncherTest.java:
>>>>>>>>>
>>>>>>>>> 102: this check could be moved to switch at ll. 108
>>>>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM
>>>>>>>>> remains
>>>>>>>>> unset.
>>>>>>>> Good idea. Thanks.
>>>>>>>>
>>>>>>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>>>>>>> script. Could you prepare a list of VM's to check and just loop
>>>>>>>>> over
>>>>>>>>> it?
>>>>>>>>> It makes the test more readable. Also I think nowadays we can always
>>>>>>>>> use
>>>>>>>>> server VM.
>>>>>>>> I tried to mirror the original shell test as closely as
>>>>>>>> possible. It
>>>>>>>> would be nice if we could rely on the "server" vm only. Definitely
>>>>>>>> more
>>>>>>>> readable.
>>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>> -Dmitry
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote:
>>>>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote:
>>>>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote:
>>>>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>>>>
>>>>>>>>>>>>> You need a copyright notice in the new file.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As written this test can only run on a full JDK - so please
>>>>>>>>>>>>> add
>>>>>>>>>>>>> it to
>>>>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really
>>>>>>>>>>>>> need to
>>>>>>>>>>>>> come
>>>>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would
>>>>>>>>>>>>> need
>>>>>>>>>>>>> further modification but was wondering if there is something
>>>>>>>>>>>>> inherent in
>>>>>>>>>>>>> the test that makes it inapplicable to OSX.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had
>>>>>>>>>>>>> the
>>>>>>>>>>>>> hotspot
>>>>>>>>>>>>> test library's process tools available. :(
>>>>>>>>>>>> We have some, is there an obvious gap?
>>>>>>>>>>>>
>>>>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/
>>>>>>>>>>>>
>>>>>>>>>>> Hm, thanks for the info. I should have used this library
>>>>>>>>>>> instead.
>>>>>>>>>>>
>>>>>>>>>>> Please, stand by for the updated webrev.
>>>>>>>>>> I was able to get rid of the JCMD. Using the testlibrary the
>>>>>>>>>> target
>>>>>>>>>> application can recognize its own PID and print it to its stdout.
>>>>>>>>>> The
>>>>>>>>>> main application then just reads the stdout to parse the PID. No
>>>>>>>>>> need
>>>>>>>>>> for JCMD any more.
>>>>>>>>>>
>>>>>>>>>> I could not find a way to remove the dependency on "test.jdk"
>>>>>>>>>> system
>>>>>>>>>> property. According to the jtreg web documentation
>>>>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a
>>>>>>>>>> "test.java"
>>>>>>>>>> system property should be available but in fact is not. But it
>>>>>>>>>> seems
>>>>>>>>>> that the testlibrary uses "test.jdk" system property too.
>>>>>>>>>>
>>>>>>>>>> The test does not run on OSX because nobody built the launcher
>>>>>>>>>> binary :)
>>>>>>>>>> I think it is a kind of DIY so I took the liberty of adding a
>>>>>>>>>> linux-amd64 launcher while working on the test.
>>>>>>>>>>
>>>>>>>>>> While working with the test library I realized I was missing a
>>>>>>>>>> crucial
>>>>>>>>>> feature (at least for my purposes) - waiting for a certain
>>>>>>>>>> message to
>>>>>>>>>> appear in the stdout/stderr of the launched process. Very often I
>>>>>>>>>> need
>>>>>>>>>> to wait for the target process to get to certain point before the
>>>>>>>>>> test
>>>>>>>>>> can be allowed to continue - and the point is indicated by a
>>>>>>>>>> message in
>>>>>>>>>> stdout/stderr. Currently all the proc tools are designed to
>>>>>>>>>> work in
>>>>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and
>>>>>>>>>> analyzed after the target process died - and are not suitable for
>>>>>>>>>> this
>>>>>>>>>> kind of usage.
>>>>>>>>>>
>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01
>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Chris.
>>>>>>>>>>>>
>>>>>>>>>>>>> David
>>>>>>>>>>>>> -----
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>>>>> Please, review the patch for an intermittently failing test.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The test is a shell test, using files for the interprocess
>>>>>>>>>>>>>> synchronization. This leads to intermittent failures.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In order to fix this the test is rewritten in Java - the
>>>>>>>>>>>>>> original
>>>>>>>>>>>>>> functionality and outputs should be 100% preserved. The
>>>>>>>>>>>>>> patch is
>>>>>>>>>>>>>> unfortunately a bit difficult to follow since there is no
>>>>>>>>>>>>>> similarity
>>>>>>>>>>>>>> between the *.sh and *.java file so one needs to go through
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>> new
>>>>>>>>>>>>>> source in whole.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>>>>> permissions to
>>>>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Issue  : JDK-8004926
>>>>>>>>>>>>>> Webrev :
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -JB-
>>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>


From mandy.chung at oracle.com  Thu Oct 24 12:33:08 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Thu, 24 Oct 2013 12:33:08 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52692847.1030806@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
Message-ID: <526975F4.8060707@oracle.com>


On 10/24/2013 7:01 AM, Jaroslav Bachorik wrote:
> Hi Mandy,
>
> On 24.10.2013 01:02, Mandy Chung wrote:
>>
>> On 10/23/2013 7:32 AM, Jaroslav Bachorik wrote:
>>> I've updated the patch. The GC is called even before the first attempt
>>> to get the pool memory usage and System.gc() is used to perform GC (no
>>> MXBean checks). This should simplify the change a bit.
>>>
>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>>
>> This change is okay.  It will force GC once per each memory pool that
>> supports usage threshold (I think 3 memory pools) which is not a huge
>> issue.  Perhaps a more reliable option is to make it an othervm test and
>> allocating large object and calling GC once before the verification.
>
> Running it as othervm might improve repeatability but I don't quite 
> follow the trick with large object. That would be effective for the 
> oldgen pools only, I suppose? There were concerns raised during the 
> review that other pools might be susceptible to the same timing 
> related problems (theoretically).

This test was written before the samevm/agentvm support.  In general we
want the tests to be reliable.  You want the System.gc() call to reduce
the probability of the race in which the initially empty pool is filled
with objects between the calls to getUsage() and
isUsageThresholdExceeded(), but this assumes that some large object has
been allocated and promoted to the old gen (not done in this test,
though).  The other possibility is that the old gen is cleared, although
that might be rare in practice.  Holding on to a large object will ensure
that the old gen is always filled with something, which makes the test
more reliable.
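
To make the race concrete, here is a rough sketch, not the logic of the
actual test: it reads the usage both before and after reading the flag
and only draws a conclusion when both readings fall on the same side of
the threshold. The class, the enum and the classify() helper are
hypothetical names introduced for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class ThresholdSnapshot {
    enum Verdict { CONSISTENT, INCONSISTENT, INCONCLUSIVE }

    // Compare the flag with two usage readings taken around it.
    static Verdict classify(long usedBefore, long usedAfter,
                            long threshold, boolean exceeded) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        boolean before = usedBefore >= threshold;
        boolean after = usedAfter >= threshold;
        if (before != after) {
            // Usage crossed the threshold while we were looking (the race).
            return Verdict.INCONCLUSIVE;
        }
        return exceeded == before ? Verdict.CONSISTENT : Verdict.INCONSISTENT;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            // Skip pools without threshold support or with checking disabled.
            if (!p.isUsageThresholdSupported() || p.getUsageThreshold() <= 0) {
                continue;
            }
            MemoryUsage before = p.getUsage();
            boolean exceeded = p.isUsageThresholdExceeded();
            MemoryUsage after = p.getUsage();
            if (before == null || after == null) {
                continue; // pool is not valid
            }
            System.out.println(p.getName() + ": "
                    + classify(before.getUsed(), after.getUsed(),
                               p.getUsageThreshold(), exceeded));
        }
    }
}
```

Bracketing the flag with two readings does not remove the race; it only
lets a check tell "the pool changed under me" apart from a real mismatch.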

> So, if you don't feel strongly about it I would leave the rest of the 
> test as it is - that is calling System.gc() before checking the pool 
> thresholds.

I just worry that this test will fail intermittently again some day.
Since in practice the runtime has space allocated, I think running it in
othervm would be adequate.

Mandy

From david.holmes at oracle.com  Thu Oct 24 15:54:49 2013
From: david.holmes at oracle.com (David Holmes)
Date: Fri, 25 Oct 2013 08:54:49 +1000
Subject: jmx-dev [ping][ping] Re: RFR: 8004926
 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <52692A44.9050004@oracle.com>
References: <52308ECC.1050304@oracle.com>
	<523138CB.9040401@oracle.com>	<52317782.1060300@oracle.com>
	<523179C8.50606@oracle.com>	<5231CCE4.7060902@oracle.com>
	<5231DA1B.4070706@oracle.com>	<5231DE69.7090309@oracle.com>
	<523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
	<52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com>
	<52552EBA.4060308@oracle.com> <52553EAD.4040506@oracle.com>
	<52562FF5.5060304@oracle.com> <52692A44.9050004@oracle.com>
Message-ID: <5269A539.8020401@oracle.com>

Good to go.

Thanks,
David

On 25/10/2013 12:10 AM, Jaroslav Bachorik wrote:
> Hi David,
>
> On 10.10.2013 06:41, David Holmes wrote:
>> On 9/10/2013 9:31 PM, Jaroslav Bachorik wrote:
>>> On 9.10.2013 12:23, David Holmes wrote:
>>>> Jaroslav,
>>>>
>>>> Thanks for the detailed description of changes - much appreciated.
>>>>
>>>> There is a lot to digest in there. :)
>>>
>>> Yep, it started as a simple fix :/
>>>
>>>>
>>>> It isn't obvious to me why these tests require a full JDK?
>>>
>>> IDK, LocalManagementTest.sh was listed as one requiring full JDK. Its
>>> requirements are the same as the ones of CustomLauncherTest.sh (now
>>> *.java) so it seemed logical to list it there too.
>>
>> Ah! Now I see it - it uses tools.jar which implies a full JDK.
>>
>>>>
>>>> I don't quite follow the libjvm lookup logic - I would expect that you
>>>> would always want to test the libjvm that is currently running - though
>>>> it is hard to determine that.
>>>
>>> I'm afraid I can't be of much assistance here - I just took what was in
>>> the *.sh version and converted it to *.java.
>>
>> Okay. I expect this will need revisiting at some point.
>
> So, does this mean "ok, go"?
>
> Thanks,
>
> -JB-

From jaroslav.bachorik at oracle.com  Tue Oct 29 10:28:37 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 29 Oct 2013 18:28:37 +0100
Subject: jmx-dev RFR 8027358:
 sun/management/jmxremote/bootstrap/LocalManagementTest.java failing since
 JDK-8004926
Message-ID: <526FF045.9060803@oracle.com>

Please, review this test fix.

In agentvm mode the test can not rely on the co-location of the test 
class and the auxiliary classes the test class wants to start. It is 
necessary to explicitly provide the test class path when starting an 
external java process.
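
A minimal sketch of the idea, assuming jtreg exposes the class path in
the "test.class.path" system property and the JDK under test in
"test.jdk"; the class name, the buildCommand() helper and the "TestApp"
main class are placeholders and not the code in the webrev.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ExternalJavaLauncher {
    // Build a java command line that passes the class path explicitly.
    static List<String> buildCommand(String javaHome, String classPath,
                                     String mainClass) {
        List<String> cmd = new ArrayList<>();
        cmd.add(javaHome + File.separator + "bin" + File.separator + "java");
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }

    public static void main(String[] args) {
        // Fall back to the current VM's settings when not run under jtreg.
        String javaHome = System.getProperty("test.jdk",
                System.getProperty("java.home"));
        String classPath = System.getProperty("test.class.path",
                System.getProperty("java.class.path"));
        // "TestApp" is a placeholder for the auxiliary class to start.
        ProcessBuilder pb = new ProcessBuilder(
                buildCommand(javaHome, classPath, "TestApp"));
        // Only print the command; starting it is left to the caller.
        System.out.println(pb.command());
    }
}
```

Passing -cp explicitly means the child process does not depend on where
the harness happened to place the compiled test classes.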

Issue : https://bugs.openjdk.java.net/browse/JDK-8027358
Webrev: http://cr.openjdk.java.net/~jbachorik/8027358/webrev.00/

Thanks,

-JB-

From Alan.Bateman at oracle.com  Tue Oct 29 13:35:07 2013
From: Alan.Bateman at oracle.com (Alan Bateman)
Date: Tue, 29 Oct 2013 20:35:07 +0000
Subject: jmx-dev RFR 8027358:
 sun/management/jmxremote/bootstrap/LocalManagementTest.java failing since
 JDK-8004926
In-Reply-To: <526FF045.9060803@oracle.com>
References: <526FF045.9060803@oracle.com>
Message-ID: <52701BFB.1030107@oracle.com>

On 29/10/2013 17:28, Jaroslav Bachorik wrote:
> Please, review this test fix.
>
> In agentvm mode the test can not rely on the co-location of the test 
> class and the auxiliary classes the test class wants to start. It is 
> necessary to explicitly provide the test class path when starting an 
> external java process.
>
> Issue : https://bugs.openjdk.java.net/browse/JDK-8027358
> Webrev: http://cr.openjdk.java.net/~jbachorik/8027358/webrev.00/
This looks okay to me.

-Alan.

From jaroslav.bachorik at oracle.com  Wed Oct 30 04:23:55 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 30 Oct 2013 12:23:55 +0100
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <526975F4.8060707@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
	<526975F4.8060707@oracle.com>
Message-ID: <5270EC4B.9080205@oracle.com>

On 24.10.2013 21:33, Mandy Chung wrote:
>
> On 10/24/2013 7:01 AM, Jaroslav Bachorik wrote:
>> Hi Mandy,
>>
>> On 24.10.2013 01:02, Mandy Chung wrote:
>>>
>>> On 10/23/2013 7:32 AM, Jaroslav Bachorik wrote:
>>>> I've updated the patch. The GC is called even before the first attempt
>>>> to get the pool memory usage and System.gc() is used to perform GC (no
>>>> MXBean checks). This should simplify the change a bit.
>>>>
>>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>>>
>>> This change is okay.  It will force GC once per each memory pool that
>>> supports usage threshold (I think 3 memory pools) which is not a huge
>>> issue.  Perhaps a more reliable option is to make it an othervm test and
>>> allocating large object and calling GC once before the verification.
>>
>> Running it as othervm might improve repeatability but I don't quite
>> follow the trick with large object. That would be effective for the
>> oldgen pools only, I suppose? There were concerns raised during the
>> review that other pools might be susceptible to the same timing
>> related problems (theoretically).
>
> This test was written before the samevm/agentvm support.  In general we
> want the tests to be reliable.   You want the System.gc() call to reduce
> the probability of the race in which the initially empty pool is filled
> with objects between the calls to getUsage() and
> isUsageThresholdExceeded(), but this assumes that some large object has
> been allocated and promoted to the old gen (not done in this test
> though).  The other possibility is that the old gen is cleared, although
> that might be rare in practice.  Holding on to a large object will ensure
> that the old gen is always filled with something, making it more reliable.
>
>> So, if you don't feel strongly about it I would leave the rest of the
>> test as it is - that is calling System.gc() before checking the pool
>> thresholds.
>
> I just worry that this test will fail some day intermittently again.
> Since in practice the runtime has space allocated, I think running it in
> othervm would be adequate.

Ok. I've added a big object and an initial call to System.gc(). But I'm 
leaving the calls to System.gc() right before checking the pools as well 
- just to be sure.

http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04
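
Roughly, the shape of the change is as follows. This is only a sketch and not the test from the webrev: the class name ThresholdSketch, the helper checkPools and the 16 MB size are made up for illustration, and the sketch prints the values rather than asserting on them.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class ThresholdSketch {

    // Kept in a static field so the object stays reachable for the whole run.
    static byte[] bigObject;

    // Returns the number of pools that were inspected.
    static int checkPools() {
        // Hypothetical size; the intent is that the heap is never empty.
        bigObject = new byte[16 * 1024 * 1024];
        System.gc(); // initial GC before any pool is inspected

        int checked = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid() || !pool.isUsageThresholdSupported()) {
                continue; // isUsageThresholdExceeded() would throw otherwise
            }
            System.gc(); // GC right before the check, to narrow the race window
            long threshold = pool.getUsageThreshold(); // 0 means disabled
            long used = pool.getUsage().getUsed();
            boolean exceeded = pool.isUsageThresholdExceeded();
            System.out.println(pool.getName() + ": used=" + used
                    + " threshold=" + threshold + " exceeded=" + exceeded);
            checked++;
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("pools checked: " + checkPools());
    }
}
```

Note that getUsage() and isUsageThresholdExceeded() are still two separate calls, so the GC only makes an inconsistent reading less likely; it does not rule it out.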

-JB-

>
> Mandy


From mandy.chung at oracle.com  Wed Oct 30 09:30:17 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 30 Oct 2013 09:30:17 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5270EC4B.9080205@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
	<526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com>
Message-ID: <52713419.5040809@oracle.com>


On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote:
> Ok. I've added a big object and an initial call to System.gc(). But 
> I'm leaving the calls to System.gc() right before checking the pools 
> as well - just to be sure.
>
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04
>

The update looks okay and I think System.gc() at line 90 is no longer 
needed as the failure was due to the empty old gen.

thanks for the update.
Mandy

From jaroslav.bachorik at oracle.com  Wed Oct 30 09:58:13 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 30 Oct 2013 17:58:13 +0100
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52713419.5040809@oracle.com>
References: <526681FD.90604@oracle.com>
	<5266DA56.6050609@oracle.com>	<52678280.1070004@oracle.com>	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
	<526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com>
	<52713419.5040809@oracle.com>
Message-ID: <52713AA5.3050102@oracle.com>

On 30.10.2013 17:30, Mandy Chung wrote:
>
> On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote:
>> Ok. I've added a big object and an initial call to System.gc(). But
>> I'm leaving the calls to System.gc() right before checking the pools
>> as well - just to be sure.
>>
>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04
>>
>
> The update looks okay and I think System.gc() at line 90 is no longer
> needed as the failure was due to the empty old gen.
>
> thanks for the update.

Thanks for the review. I've left the System.gc() at line 90 intact - 
when discussing this with Bengt during the review he was concerned that 
other pools might be susceptible to this kind of problem and having a 
full GC right before the check could lessen the probability of running 
into the data races described in this issue.

-JB-

> Mandy


From staffan.larsen at oracle.com  Wed Oct 30 23:32:28 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Thu, 31 Oct 2013 07:32:28 +0100
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
	isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <CA+kOe08NtG23wd2KDHf-bDYCNJKkJcenOqEb4xzQXYoffbVgsg@mail.gmail.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com>
	<52678280.1070004@oracle.com>
	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>
	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
	<526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com>
	<52713419.5040809@oracle.com> <52713AA5.3050102@oracle.com>
	<CA+kOe08NtG23wd2KDHf-bDYCNJKkJcenOqEb4xzQXYoffbVgsg@mail.gmail.com>
Message-ID: <9C102C02-3375-4F93-9969-432431DCBE7B@oracle.com>

Quoting Bengt from earlier in this conversation:

"As for just doing a System.gc() to force a GC I think you can rely on that System.gc() does a full GC in Hotspot unless someone sets -XX:+DisableExplicitGC on the command line. Considering that you are relying on Hotspot specific names for pools I don't think it is a limitation to the test to rely on the Hotspot implementation of System.gc()."

The spec for System.gc() doesn't promise anything, but all the collectors in Hotspot are implemented to do a full GC when System.gc() is called.

Thanks,
/Staffan

On 30 okt 2013, at 21:02, Martin Buchholz <martinrb at google.com> wrote:

> Technically, System.gc() doesn't promise anything.  I believe it may merely initiate a gc if the gc implementation is concurrent.  Check out awaitFullGc in my beloved GcFinalization
> 
> https://code.google.com/p/guava-libraries/source/browse/guava-testlib/src/com/google/common/testing/GcFinalization.java?spec=svn196edb139d49d373abbce013008da0206b83f0ca&r=ae6bc9be431d7601b1f4713679efea126673378e
> 
> 
> On Wed, Oct 30, 2013 at 9:58 AM, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
> On 30.10.2013 17:30, Mandy Chung wrote:
> 
> On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote:
> Ok. I've added a big object and an initial call to System.gc(). But
> I'm leaving the calls to System.gc() right before checking the pools
> as well - just to be sure.
> 
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04
> 
> 
> The update looks okay and I think System.gc() at line 90 is no longer
> needed as the failure was due to the empty old gen.
> 
> thanks for the update.
> 
> Thanks for the review. I've left the System.gc() at line 90 intact - when discussing this with Bengt during the review he was concerned that other pools might be susceptible to this kind of problem and having a full GC right before the check could lessen the probability of running into the data races described in this issue.
> 
> -JB-
> 
> Mandy
> 
> 


From jaroslav.bachorik at oracle.com  Thu Oct 31 03:27:04 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 31 Oct 2013 11:27:04 +0100
Subject: jmx-dev RFR 7144200:
 java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR
 enabled
In-Reply-To: <F9D0D308-A172-4747-A0BD-EC8670C8BCCF@oracle.com>
References: <5252BE3B.5020607@oracle.com>
	<F9D0D308-A172-4747-A0BD-EC8670C8BCCF@oracle.com>
Message-ID: <52723078.2010507@oracle.com>

On 7.10.2013 16:35, Staffan Larsen wrote:
> This will make it less likely for the test to fail, but does not guarantee it since there is nothing that says classloading will be done in 300 ms. Any failures will unfortunately be harder to reproduce. (And the test is now 300 ms slower to run.)
>
> A different solution is to only allow the number of classes to increase, but not be strict about the increase being exactly 4. That would of course make the test less stringent, but very stable.

I've implemented the check for non-decrementing class count.

I talked to SQE about not running this test with JFR but it seems that 
it is not currently possible to exclude single tests from parametrized runs.

Also, the test is marked as /othervm

http://cr.openjdk.java.net/~jbachorik/7144200/webrev.02
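
For illustration, a relaxed check of this kind could look like the sketch below. It is not the code from the webrev: the class name LoadCountSketch and the helper countDidNotDecrease are hypothetical, using getTotalLoadedClassCount() is an assumption, and java.util.concurrent.Phaser is just an arbitrary class to load.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

/*
 * A jtreg header for running in a separate VM would look like:
 *
 * @test
 * @run main/othervm LoadCountSketch
 */
public class LoadCountSketch {

    // Returns true if the total loaded class count did not go down.
    static boolean countDidNotDecrease() throws ClassNotFoundException {
        ClassLoadingMXBean mbean = ManagementFactory.getClassLoadingMXBean();
        long before = mbean.getTotalLoadedClassCount();

        // Trigger some class loading; any class will do for the sketch.
        Class.forName("java.util.concurrent.Phaser");

        long after = mbean.getTotalLoadedClassCount();
        // Only require that the count has not decreased. Background class
        // loading (for example by JFR) may add an arbitrary number of classes.
        return after >= before;
    }

    public static void main(String[] args) throws Exception {
        if (!countDidNotDecrease()) {
            throw new RuntimeException("loaded class count decreased");
        }
        System.out.println("OK");
    }
}
```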

Cheers,

-JB-

>
> In any case, I think the test has to be marked as /othervm since running other tests simultaneously will impact this test.
>
> S/taffan
>
> On 7 okt 2013, at 15:59, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>
>> The test captures the number of loaded classes right at the start and then checks the diffs when it's finished. However, it seems that there might be some async class loading still going on, initiated by JFR.
>>
>> The patch simply adds a loop to wait for the number of loaded classes to settle before continuing. This should prevent the test failing with JFR intermittently.
>>
>> Issue:  https://bugs.openjdk.java.net/browse/JDK-7144200
>> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/
>>
>> Cheers,
>>
>> -JB-
>


From staffan.larsen at oracle.com  Thu Oct 31 03:43:29 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Thu, 31 Oct 2013 11:43:29 +0100
Subject: jmx-dev RFR 7144200:
	java/lang/management/ClassLoadingMXBean/LoadCounts.java
	failed with JFR enabled
In-Reply-To: <52723078.2010507@oracle.com>
References: <5252BE3B.5020607@oracle.com>
	<F9D0D308-A172-4747-A0BD-EC8670C8BCCF@oracle.com>
	<52723078.2010507@oracle.com>
Message-ID: <7B3294CA-2BA9-458E-82D8-6491306B8392@oracle.com>

Looks good!

Thanks,
/Staffan

On 31 okt 2013, at 11:27, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:

> On 7.10.2013 16:35, Staffan Larsen wrote:
>> This will make it less likely for the test to fail, but does not guarantee it since there is nothing that says classloading will be done in 300 ms. Any failures will unfortunately be harder to reproduce. (And the test is now 300 ms slower to run.)
>> 
>> A different solution is to only allow the number of classes to increase, but not be strict about the increase being exactly 4. That would of course make the test less stringent, but very stable.
> 
> I've implemented the check for non-decrementing class count.
> 
> I talked to SQE about not running this test with JFR but it seems that it is not currently possible to exclude single tests from parametrized runs.
> 
> Also, the test is marked as /othervm
> 
> http://cr.openjdk.java.net/~jbachorik/7144200/webrev.02
> 
> Cheers,
> 
> -JB-
> 
>> 
>> In any case, I think the test has to be marked as /othervm since running other tests simultaneously will impact this test.
>> 
>> S/taffan
>> 
>> On 7 okt 2013, at 15:59, Jaroslav Bachorik <jaroslav.bachorik at oracle.com> wrote:
>> 
>>> The test captures the number of loaded classes right at the start and then checks the diffs when it's finished. However, it seems that there might be some async class loading still going on, initiated by JFR.
>>> 
>>> The patch simply adds a loop to wait for the number of loaded classes to settle before continuing. This should prevent the test failing with JFR intermittently.
>>> 
>>> Issue:  https://bugs.openjdk.java.net/browse/JDK-7144200
>>> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/
>>> 
>>> Cheers,
>>> 
>>> -JB-
>> 
> 


From martinrb at google.com  Wed Oct 30 13:02:23 2013
From: martinrb at google.com (Martin Buchholz)
Date: Wed, 30 Oct 2013 13:02:23 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52713AA5.3050102@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com>
	<52678280.1070004@oracle.com>
	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>
	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
	<526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com>
	<52713419.5040809@oracle.com> <52713AA5.3050102@oracle.com>
Message-ID: <CA+kOe08NtG23wd2KDHf-bDYCNJKkJcenOqEb4xzQXYoffbVgsg@mail.gmail.com>

Technically, System.gc() doesn't promise anything.  I believe it may merely
initiate a gc if the gc implementation is concurrent.  Check out
awaitFullGc in my beloved GcFinalization

https://code.google.com/p/guava-libraries/source/browse/guava-testlib/src/com/google/common/testing/GcFinalization.java?spec=svn196edb139d49d373abbce013008da0206b83f0ca&r=ae6bc9be431d7601b1f4713679efea126673378e
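
A much simplified sketch of the waiting idea, not Guava's actual implementation: keep requesting a GC until a weak reference to a fresh object has been cleared, or give up after a timeout. The class name AwaitGcSketch and the method awaitGc are made up for illustration. A cleared weak reference only shows that some collection ran to completion, not that every pool was fully collected.

```java
import java.lang.ref.WeakReference;

public class AwaitGcSketch {

    // Returns true once a weakly reachable object has been collected,
    // false if that did not happen within the timeout.
    static boolean awaitGc(long timeoutMillis) throws InterruptedException {
        WeakReference<Object> ref = new WeakReference<>(new Object());
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;

        while (ref.get() != null) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out, e.g. explicit GC is disabled
            }
            System.gc();      // only a request; the loop observes the effect
            Thread.sleep(10);
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("gc observed: " + awaitGc(5000));
    }
}
```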


On Wed, Oct 30, 2013 at 9:58 AM, Jaroslav Bachorik <
jaroslav.bachorik at oracle.com> wrote:

> On 30.10.2013 17:30, Mandy Chung wrote:
>
>>
>> On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote:
>>
>>> Ok. I've added a big object and an initial call to System.gc(). But
>>> I'm leaving the calls to System.gc() right before checking the pools
>>> as well - just to be sure.
>>>
>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04
>>>
>>>
>> The update looks okay and I think System.gc() at line 90 is no longer
>> needed as the failure was due to the empty old gen.
>>
>> thanks for the update.
>>
>
> Thanks for the review. I've left the System.gc() at line 90 intact - when
> discussing this with Bengt during the review he was concerned that other
> pools might be susceptible to this kind of problem and having a full GC
> right before the check could lessen the probability of running into the
> data races described in this issue.
>
> -JB-
>
>  Mandy
>>
>
>

From martinrb at google.com  Thu Oct 31 00:08:31 2013
From: martinrb at google.com (Martin Buchholz)
Date: Thu, 31 Oct 2013 00:08:31 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and
 isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <9C102C02-3375-4F93-9969-432431DCBE7B@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com>
	<52678280.1070004@oracle.com>
	<4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com>
	<52678509.2020002@oracle.com>
	<68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
	<5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
	<5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
	<52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
	<526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com>
	<52713419.5040809@oracle.com> <52713AA5.3050102@oracle.com>
	<CA+kOe08NtG23wd2KDHf-bDYCNJKkJcenOqEb4xzQXYoffbVgsg@mail.gmail.com>
	<9C102C02-3375-4F93-9969-432431DCBE7B@oracle.com>
Message-ID: <CA+kOe080GONOQCCeUdA=PeB4qmvO2TMk_m6fmdf0vs8VU0L0eA@mail.gmail.com>

On Wed, Oct 30, 2013 at 11:32 PM, Staffan Larsen
<staffan.larsen at oracle.com> wrote:

> Quoting Bengt from earlier in this conversation:
>
> "As for just doing a System.gc() to force a GC I think you can rely on
> that System.gc() does a full GC in Hotspot unless someone sets
> -XX:+DisableExplicitGC on the command line. Considering that you are
> relying on Hotspot specific names for pools I don't think it is a limitation
> to the test to rely on the Hotspot implementation of System.gc()."
>
>
A full synchronous gc is a stronger condition than a full gc.


> The spec for System.gc() doesn't promise anything, but all the
> collectors in Hotspot are implemented to do a full GC when System.gc() is
> called.
>
>
I'm not a GC expert and I have no proof, but that is not my understanding.
 I believe that a concurrent gc (CMS) remains concurrent even if initiated
by System.gc().


Hmmm.... checking hotspot flags I see:

java -XX:+PrintFlagsFinal
     bool ExplicitGCInvokesConcurrent                   = false  {product}
     bool ExplicitGCInvokesConcurrentAndUnloadsClasses  = false  {product}
which suggests you are right for default gc operation.
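
The same flags can also be read from inside a running VM. A small sketch, assuming a HotSpot-based JVM where com.sun.management.HotSpotDiagnosticMXBean is available; the class name ExplicitGcFlags and the helper flagValue are hypothetical, and a flag that the running VM does not know is reported as "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class ExplicitGcFlags {

    // Returns the current value of a VM flag, or "unavailable" if the
    // running VM does not have a flag with that name.
    static String flagValue(String name) {
        HotSpotDiagnosticMXBean hs =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hs.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        String[] flags = {
            "DisableExplicitGC",
            "ExplicitGCInvokesConcurrent"
        };
        for (String flag : flags) {
            System.out.println(flag + " = " + flagValue(flag));
        }
    }
}
```

A test that depends on System.gc() doing a synchronous full GC could use such a check to skip itself when either flag is set.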