From jaroslav.bachorik at oracle.com Tue Oct 1 02:01:00 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Tue, 01 Oct 2013 11:01:00 +0200
Subject: jmx-dev RFR 6399961: The management agent doesn't support SSL keystore setup when loaded dynamically
Message-ID: <524A8F4C.2000908@oracle.com>

Hi,

Currently it is not possible to configure SSL parameters when loading the management agent. The fix is to forward any javax.net.ssl.* properties to the target JVM. The javax.net.ssl.* properties provided in the agent configuration should never replace any javax.net.ssl.* properties defined by the target JVM.

The issue: https://bugs.openjdk.java.net/browse/JDK-6399961
Webrev: http://cr.openjdk.java.net/~jbachorik/6399961/webrev.00

Thanks,

-JB-

From dmitry.samersoff at oracle.com Tue Oct 1 04:51:46 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Tue, 01 Oct 2013 15:51:46 +0400
Subject: jmx-dev RFR 6399961: The management agent doesn't support SSL keystore setup when loaded dynamically
In-Reply-To: <524A8F4C.2000908@oracle.com>
References: <524A8F4C.2000908@oracle.com>
Message-ID: <524AB752.2030900@oracle.com>

Jaroslav,

Agent.java:

99:

Since you introduce the SSL_PREFIX constant, please add one for "com.sun.management."

259:

It's better to keep all property manipulations in agentmain and startRemoteManagementAgent - different methods of starting the Agent might have separate property sets and scoping rules.

Do you need to modify jcmd as well?

-Dmitry

On 2013-10-01 13:01, Jaroslav Bachorik wrote:
> Hi,
>
> Currently it is not possible to configure SSL parameters when loading
> the management agent. The fix is to forward any javax.net.ssl.*
> properties to the target JVM. The javax.net.ssl.* properties provided in
> the agent configuration should never replace any javax.net.ssl.*
> properties defined by the target JVM.
> > The issue: https://bugs.openjdk.java.net/browse/JDK-6399961 > Webrev: http://cr.openjdk.java.net/~jbachorik/6399961/webrev.00 > > Thanks, > > -JB- -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. From jaroslav.bachorik at oracle.com Tue Oct 1 06:03:35 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 01 Oct 2013 15:03:35 +0200 Subject: jmx-dev RFR 6399961: The management agent doesn't support SSL keystore setup when loaded dynamically In-Reply-To: <524AB752.2030900@oracle.com> References: <524A8F4C.2000908@oracle.com> <524AB752.2030900@oracle.com> Message-ID: <524AC827.9000301@oracle.com> Ok, thanks everyone for taking time reviewing this. Dmitry's comment about jcmd got me looking at the properties available when using the jcmd to start the management agent. It turns out that "com.sun.management.jmxremote.ssl.config.file" allows exactly the thing this issue talks about. So, please, disregard this change completely. Cheers, -JB- On 1.10.2013 13:51, Dmitry Samersoff wrote: > Jaroslav, > > Agent.java: > > 99: > > As far as you intorduce SSL_PREFIX constant, please add one for > "com.sun.management." > > 259: > > It's better to keep all property manipulations in agentmain and > startRemoteManagementAgent - different methods to start Agent might have > separate property set and scoping rules. > > Do you need to modify jcmd as well? > > -Dmitry > > > On 2013-10-01 13:01, Jaroslav Bachorik wrote: >> Hi, >> >> Currently it is not possible to configure SSL parameters when loading >> the management agent. The fix is to forward any javax.net.ssl.* >> properties to the target JVM. The javax.net.ssl.* properties provided in >> the agent configuration should never replace any javax.net.ssl.* >> properties defined by the target JVM. 
>>
>> The issue: https://bugs.openjdk.java.net/browse/JDK-6399961
>> Webrev: http://cr.openjdk.java.net/~jbachorik/6399961/webrev.00
>>
>> Thanks,
>>
>> -JB-
>
>

From jaroslav.bachorik at oracle.com Wed Oct 2 01:47:26 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 02 Oct 2013 10:47:26 +0200
Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values
Message-ID: <524BDD9E.1050100@oracle.com>

Hello,

currently the JVM uptime reported by the RuntimeMXBean is based on System.currentTimeMillis(), which makes it susceptible to changes of the OS time (e.g. changing timezone, NTP synchronization etc.). The uptime should not depend on the system time and should be calculated using a monotonic clock source.

There is already a way to get the actual JVM uptime in ticks. It is accessible as Management::timestamp() and the ticks are convertible to milliseconds using Management::ticks_to_ms(ts_ticks), which makes it very easy to switch to a monotonic-clock-based uptime.

The patch consists of the hotspot and jdk parts.

For hotspot, a new constant needs to be introduced in src/share/vm/services/jmm.h and the actual logic to obtain the uptime in milliseconds is added in src/share/vm/services/management.cpp.

For the jdk, the changes consist of adding the necessary JNI bridging methods in order to get the new uptime, introducing the same constant that is used in hotspot, and changing the mapfile-vers files in order to properly build the native library.
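
The difference between a wall-clock-based and a monotonic uptime can be sketched in plain Java. This is only an illustration of the idea, not the patch under review: the class UptimeSketch and its methods are hypothetical, and System.nanoTime() is used here as a stand-in for the HotSpot-internal Management::timestamp().

```java
import java.util.concurrent.TimeUnit;

public class UptimeSketch {

    // Uptime computed from two wall-clock readings (milliseconds since the epoch).
    // The result is negative if the OS clock was set back after the JVM started.
    public static long wallClockUptimeMillis(long startMillis, long nowMillis) {
        return nowMillis - startMillis;
    }

    // Uptime computed from two System.nanoTime() readings, which are not tied
    // to the OS wall-clock time.
    public static long monotonicUptimeMillis(long startNanos, long nowNanos) {
        return TimeUnit.NANOSECONDS.toMillis(nowNanos - startNanos);
    }

    public static void main(String[] args) {
        long startMillis = System.currentTimeMillis();
        long startNanos = System.nanoTime();

        // Simulate the OS clock being set back by one hour after JVM start.
        long skewedNowMillis = System.currentTimeMillis() - TimeUnit.HOURS.toMillis(1);

        System.out.println("wall-clock uptime (ms): "
                + wallClockUptimeMillis(startMillis, skewedNowMillis));
        System.out.println("monotonic uptime (ms):  "
                + monotonicUptimeMillis(startNanos, System.nanoTime()));
    }
}
```

With the simulated one-hour clock adjustment the wall-clock variant reports a negative value, which is the symptom named in the subject line, while the nanoTime-based variant stays non-negative.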
Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 Thanks, -JB- From staffan.larsen at oracle.com Wed Oct 2 02:23:34 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Wed, 2 Oct 2013 11:23:34 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <524BDD9E.1050100@oracle.com> References: <524BDD9E.1050100@oracle.com> Message-ID: <4088022F-6550-4C95-86D5-640F2E737839@oracle.com> Looks good! Thanks, /Staffan On 2 okt 2013, at 10:47, Jaroslav Bachorik wrote: > Hello, > > currently the JVM uptime reported by the RuntimeMXBean is based on System.currentTimeMillis() which makes it susceptible to changes of the OS time (eg. changing timezone, NTP synchronization etc.). The uptime should not depend on the system time and should be calculated using a monotonic clock source. > > There is already the way to get the actual JVM uptime in ticks. It is accessible as Management::timestamp() and the ticks are convertible to milliseconds using Management::ticks_to_ms(ts_ticks) thus making it very easy to switch to the monotonic clock based uptime. > > The patch consists of the hotspot and jdk parts. > > For the hotspot a new constant needs to be introduced in src/share/vm/services/jmm.h and the actual logic to obtain the uptime in milliseconds is added in src/share/vm/services/management.cpp. > > For the jdk the changes comprise of adding the necessary JNI bridging methods in order to get the new uptime, introducing the same constant that is used in hotspot and changes to mapfile-vers files in order to properly build the native library. 
> > Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 > Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 > > Thanks, > > -JB- From jaroslav.bachorik at oracle.com Wed Oct 2 03:55:03 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 02 Oct 2013 12:55:03 +0200 Subject: jmx-dev RFR: 8024613 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java failing intermittently In-Reply-To: <523C459A.3080303@oracle.com> References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com> Message-ID: <524BFB87.10808@oracle.com> On 20.9.2013 14:54, shanliang wrote: > Jaroslav, > > It is a good idea to use the RMI Testlibrary. > > Better to call: > agent.close(); > > at Line 55, close the RMIRegistry (rmid.shutdown(rmidPort) Line 55) > does not ensure the JMX connector doing full clean, it is always better > to do clean within a test. Thanks. Implemented. http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01 -JB- > > Shanliang > > > Jaroslav Bachorik wrote: >> Please, review the following change for JDK-8024613 >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613 >> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/ >> >> >> The patch takes care of intermittent test failures caused by timing >> issues when starting the RMID process. It could happen that the RMID >> process hasn't been properly initialized in the timeframe of 5 seconds >> and the test would fail. >> >> The patch replaces the home-brewed RMID process management with the >> one available in the RMI Testlibrary which is used by more tests and >> therefore should be more stable. 
>> >> Thanks, >> >> -JB- > From jaroslav.bachorik at oracle.com Wed Oct 2 03:57:06 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 02 Oct 2013 12:57:06 +0200 Subject: jmx-dev [ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <523B0B30.4020003@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> Message-ID: <524BFC02.4050800@oracle.com> On 19.9.2013 16:33, Jaroslav Bachorik wrote: > The updated webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 > > I've moved some of the functionality to the testlibrary. > > -JB - > > On 12.9.2013 17:31, Jaroslav Bachorik wrote: >> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>> Jaroslav, >>> >>> CustomLauncherTest.java: >>> >>> 102: this check could be moved to switch at ll. 108 >>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>> unset. >> Good idea. Thanks. >> >>> 129: I would prefer don't have pattern like this one ever in shell >>> script. Could you prepare a list of VM's to check and just loop over it? >>> It makes test better readable. Also I think nowdays we can always use >>> server VM. >> I tried to mirror the original shell test as closely as possible. It >> would be nice if we could rely on the "server" vm only. Definitely more >> readable. >> >> -JB- >> >>> -Dmitry >>> >>> >>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>> Hi Jaroslav, >>>>>>> >>>>>>> You need a copyright notice in the new file. 
>>>>>>> >>>>>>> As written this test can only run on a full JDK - so please add >>>>>>> it to >>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to come >>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>> >>>>>>> Is there a reason this test can't run on OSX? I know it would need >>>>>>> further modification but was wondering if there is something >>>>>>> inherent in >>>>>>> the test that makes it inapplicable to OSX. >>>>>>> >>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>> hotspot >>>>>>> test library's process tools available. :( >>>>>> We have some, is there an obvious gap? >>>>>> >>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>> >>>>> Hm, thanks for the info. I should have used this library instead. >>>>> >>>>> Please, stand by for the updated webrev. >>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>> application can recognize its own PID and print it to its stdout. The >>>> main application then just reads the stdout to parse the PID. No need >>>> for JCMD any more. >>>> >>>> I could not find a way to remove the dependency on "test.jdk" system >>>> property. According to the jtreg web documentation >>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>> "test.java" >>>> system property should be available but in fact is not. But it seems >>>> that the testlibrary uses "test.jdk" system property too. >>>> >>>> The test does not run on OSX because nobody built the launcher >>>> binary :) >>>> I think it is a kind of DIY so I took the liberty of adding a >>>> linux-amd64 launcher while working on the test. >>>> >>>> While working with the test library I realized I was missing a crucial >>>> feature (at least for my purposes) - waiting for a certain message to >>>> appear in the stdout/stderr of the launched process. 
Very often I need >>>> to wait for the target process to get to certain point before the test >>>> can be allowed to continue - and the point is indicated by a message in >>>> stdout/stderr. Currently all the proc tools are designed to work in >>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>> analyzed after the target process died - and are not suitable for this >>>> kind of usage. >>>> >>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>> >>>>> -JB- >>>>> >>>>>> >>>>>> -Chris. >>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>> >>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>> >>>>>>>> In order to fix this the test is rewritten in Java - the original >>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>> similarity >>>>>>>> between the *.sh and *.java file so one needs to go through the new >>>>>>>> source in whole. >>>>>>>> >>>>>>>> The changes in "launcher" files are all about adding permissions to >>>>>>>> execute (0755) and as such the webrev shows no differences. 
>>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Issue : JDK-8004926 >>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>> >>>>>>>> -JB- >>>>>>>> >>> > From jaroslav.bachorik at oracle.com Wed Oct 2 03:59:33 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 02 Oct 2013 12:59:33 +0200 Subject: jmx-dev [ping] Re: RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java In-Reply-To: <52308A5B.8020206@oracle.com> References: <52308389.6060001@oracle.com> <523086E2.4050307@oracle.com> <52308A5B.8020206@oracle.com> Message-ID: <524BFC95.60605@oracle.com> On 11.9.2013 17:20, Jaroslav Bachorik wrote: > On 09/11/2013 05:06 PM, shanliang wrote: >> The fix looks OK for me. >> >> I am wondering that in case of loopback address, is it better to always >> using "127.0.0.1" to generate a connectionId? this will make sure to >> have a unique id. > > I am afraid we are getting the 127.0.1.1 variant from RMI > (java.rmi.server.RemoteServer#getClientHost()). I don't know what else > might break if we start fiddling around with it. For now I would better > keep it the simplest possible. > > -JB- > >> >> Shanliang >> >> Jaroslav Bachorik wrote: >>> Please, review this simple patch for an intermittently failing test. >>> >>> The test fails in cases when the connection loopback is resolved to be >>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu. The >>> test does not anticipate this possibility and requires the loopback >>> address to be exactly 127.0.0.1 >>> >>> The test will end comparing 127.0.0.1 against 127.0.1.1 and will >>> consider them non equal even though they are both the same loopback. The >>> patch adds a bit of flexibility to the test allowing for any two valid >>> loopback addresses (127.0.0.0/8) to be equal. 
>>>
>>> Issue : JDK-8022220
>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00
>>>
>>> Thanks,
>>>
>>> -JB-
>>>
>>
>

From dmitry.samersoff at oracle.com Wed Oct 2 07:11:02 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Wed, 02 Oct 2013 18:11:02 +0400
Subject: jmx-dev RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <52308389.6060001@oracle.com>
References: <52308389.6060001@oracle.com>
Message-ID: <524C2976.8020109@oracle.com>

Jaroslav,

Since the loopback address could be resolved to any address in 127.0.0.0/8, the client and server have to use the same loopback address.

Generally speaking, it's not required for 127.0.1.1 to be able to talk to 127.0.0.1, and we risk getting a weird failure instead of a clear error message.

-Dmitry

On 2013-09-11 18:51, Jaroslav Bachorik wrote:
> Please, review this simple patch for an intermittently failing test.
>
> The test fails in cases when the connection loopback is resolved to be
> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu. The
> test does not anticipate this possibility and requires the loopback
> address to be exactly 127.0.0.1
>
> The test will end comparing 127.0.0.1 against 127.0.1.1 and will
> consider them non equal even though they are both the same loopback. The
> patch adds a bit of flexibility to the test allowing for any two valid
> loopback addresses (127.0.0.0/8) to be equal.
>
> Issue : JDK-8022220
> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00
>
> Thanks,
>
> -JB-
>

-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.
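
The relaxed comparison described in the quoted review request (any two addresses in 127.0.0.0/8 count as equal) might look roughly like the following. This is a sketch under assumptions, not the code from the webrev: the class LoopbackCompare and the method sameEndpoint are hypothetical names, and only the standard InetAddress API is used.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoopbackCompare {

    // Returns true if both strings denote the same address, or if both
    // denote loopback addresses (for IPv4, anything in 127.0.0.0/8).
    public static boolean sameEndpoint(String first, String second) {
        try {
            // For literal IP addresses getByName() only validates the format;
            // no name service lookup is performed.
            InetAddress a = InetAddress.getByName(first);
            InetAddress b = InetAddress.getByName(second);
            return a.equals(b) || (a.isLoopbackAddress() && b.isLoopbackAddress());
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Not a valid address: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameEndpoint("127.0.0.1", "127.0.1.1"));
        System.out.println(sameEndpoint("127.0.0.1", "10.0.0.1"));
    }
}
```

Note that such a check only says that both addresses are loopback addresses. It does not show that the two addresses can actually reach each other, which is the concern raised in the reply above.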
From jaroslav.bachorik at oracle.com Thu Oct 3 08:02:37 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 03 Oct 2013 17:02:37 +0200
Subject: jmx-dev RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524C2976.8020109@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com>
Message-ID: <524D870D.2080804@oracle.com>

On 2.10.2013 16:11, Dmitry Samersoff wrote:
> Jaroslav,
>
> As a far as loopback address could be resolved to any of 127.0.0.0/8
> client and server have to use the same loopback address.

AFAIK, all the IPs 127.*.*.* equally designate the loopback interface. This might start breaking when you have more than one loopback interface in the system.

But all of this might be irrelevant here - the IPs are retrieved *after* the JMX connection has been established, making it clear that they are reachable.

>
> Generally speaking it's not required for 127.0.1.1 to be able to talk to
> 127.0.0.1 and we are in risk to get a weird fail instead of clear error
> message.

As I said before, as long as there is only one loopback interface it is safe to assume that all the loopback IPs are virtually identical. When we start considering multiple loopback interfaces we would also need to take into account the assigned network interfaces.

But it might hardly matter - it seems that the main culprit for this test to fail on this particular configuration was the fact that 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and makes one question the sanity of the test setup...

-JB-

>
> -Dmitry
>
>
> On 2013-09-11 18:51, Jaroslav Bachorik wrote:
>> Please, review this simple patch for an intermittently failing test.
>>
>> The test fails in cases when the connection loopback is resolved to be
>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu.
The >> test does not anticipate this possibility and requires the loopback >> address to be exactly 127.0.0.1 >> >> The test will end comparing 127.0.0.1 against 127.0.1.1 and will >> consider them non equal even though they are both the same loopback. The >> patch adds a bit of flexibility to the test allowing for any two valid >> loopback addresses (127.0.0.0/8) to be equal. >> >> Issue : JDK-8022220 >> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00 >> >> Thanks, >> >> -JB- >> > > From chris.hegarty at oracle.com Thu Oct 3 08:29:48 2013 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Thu, 03 Oct 2013 16:29:48 +0100 Subject: jmx-dev RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java In-Reply-To: <524D870D.2080804@oracle.com> References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com> <524D870D.2080804@oracle.com> Message-ID: <524D8D6C.9050907@oracle.com> On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote: > ....... > But it might hardly matter - it seems that the main culprit for this > test to fail on this particular configuration was the fact that > 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and I have not looked at the specifics, but if you have an InetAddress instance you can invoke the isLoopbackAddress() [1][2] method to correctly determine if the instance is a valid loopback address. -Chris. [1] http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java [2] http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java > makes one question the sanity of the test setup... > > -JB- > >> >> -Dmitry >> >> >> On 2013-09-11 18:51, Jaroslav Bachorik wrote: >>> Please, review this simple patch for an intermittently failing test. 
>>> >>> The test fails in cases when the connection loopback is resolved to be >>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu. The >>> test does not anticipate this possibility and requires the loopback >>> address to be exactly 127.0.0.1 >>> >>> The test will end comparing 127.0.0.1 against 127.0.1.1 and will >>> consider them non equal even though they are both the same loopback. The >>> patch adds a bit of flexibility to the test allowing for any two valid >>> loopback addresses (127.0.0.0/8) to be equal. >>> >>> Issue : JDK-8022220 >>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00 >>> >>> Thanks, >>> >>> -JB- >>> >> >> > From jaroslav.bachorik at oracle.com Thu Oct 3 08:37:02 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 03 Oct 2013 17:37:02 +0200 Subject: jmx-dev RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java In-Reply-To: <524D8D6C.9050907@oracle.com> References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com> <524D870D.2080804@oracle.com> <524D8D6C.9050907@oracle.com> Message-ID: <524D8F1E.9050904@oracle.com> On 3.10.2013 17:29, Chris Hegarty wrote: > > > On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote: >> ....... >> But it might hardly matter - it seems that the main culprit for this >> test to fail on this particular configuration was the fact that >> 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and > > I have not looked at the specifics, but if you have an InetAddress > instance you can invoke the isLoopbackAddress() [1][2] method to > correctly determine if the instance is a valid loopback address. Yes, and exactly this method seems to have failed to determine 127.0.0.1 being a loopback - according to the test output. I really can't see how because it basically compares the left-most byte of the IP to 127 ... -JB- > > -Chris. 
> > [1] > http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java > > [2] > http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java > > >> makes one question the sanity of the test setup... >> >> -JB- >> >>> >>> -Dmitry >>> >>> >>> On 2013-09-11 18:51, Jaroslav Bachorik wrote: >>>> Please, review this simple patch for an intermittently failing test. >>>> >>>> The test fails in cases when the connection loopback is resolved to be >>>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu. >>>> The >>>> test does not anticipate this possibility and requires the loopback >>>> address to be exactly 127.0.0.1 >>>> >>>> The test will end comparing 127.0.0.1 against 127.0.1.1 and will >>>> consider them non equal even though they are both the same loopback. >>>> The >>>> patch adds a bit of flexibility to the test allowing for any two valid >>>> loopback addresses (127.0.0.0/8) to be equal. >>>> >>>> Issue : JDK-8022220 >>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00 >>>> >>>> Thanks, >>>> >>>> -JB- >>>> >>> >>> >> From chris.hegarty at oracle.com Thu Oct 3 08:43:20 2013 From: chris.hegarty at oracle.com (Chris Hegarty) Date: Thu, 03 Oct 2013 16:43:20 +0100 Subject: jmx-dev RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java In-Reply-To: <524D8F1E.9050904@oracle.com> References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com> <524D870D.2080804@oracle.com> <524D8D6C.9050907@oracle.com> <524D8F1E.9050904@oracle.com> Message-ID: <524D9098.6030701@oracle.com> On 10/03/2013 04:37 PM, Jaroslav Bachorik wrote: > On 3.10.2013 17:29, Chris Hegarty wrote: >> >> >> On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote: >>> ....... 
>>> But it might hardly matter - it seems that the main culprit for this
>>> test to fail on this particular configuration was the fact that
>>> 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and
>>
>> I have not looked at the specifics, but if you have an InetAddress
>> instance you can invoke the isLoopbackAddress() [1][2] method to
>> correctly determine if the instance is a valid loopback address.
>
> Yes, and exactly this method seems to have failed to determine 127.0.0.1
> being a loopback - according to the test output.
>
> I really can't see how because it basically compares the left-most byte
> of the IP to 127 ...

Hmm... if this method fails to make the correct determination then we have problems ;-) We use isLoopbackAddress in many other networking, and similar, tests in the jdk.

Sorry, I don't know what to say, there must be some other kind of issue on your machine, or the address is not truly 127.0.0.1.

-Chris.

>
> -JB-
>
>>
>> -Chris.
>>
>> [1]
>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java
>>
>>
>> [2]
>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java
>>
>>
>>
>>> makes one question the sanity of the test setup...
>>>
>>> -JB-
>>>
>>>>
>>>> -Dmitry
>>>>
>>>>
>>>> On 2013-09-11 18:51, Jaroslav Bachorik wrote:
>>>>> Please, review this simple patch for an intermittently failing test.
>>>>>
>>>>> The test fails in cases when the connection loopback is resolved to be
>>>>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu.
>>>>> The
>>>>> test does not anticipate this possibility and requires the loopback
>>>>> address to be exactly 127.0.0.1
>>>>>
>>>>> The test will end comparing 127.0.0.1 against 127.0.1.1 and will
>>>>> consider them non equal even though they are both the same loopback.
>>>>> The
>>>>> patch adds a bit of flexibility to the test allowing for any two valid
>>>>> loopback addresses (127.0.0.0/8) to be equal.
>>>>>
>>>>> Issue : JDK-8022220
>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -JB-
>>>>>
>>>>
>>>>
>>>
>

From dmitry.samersoff at oracle.com Thu Oct 3 12:09:55 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Thu, 03 Oct 2013 23:09:55 +0400
Subject: jmx-dev RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java
In-Reply-To: <524D870D.2080804@oracle.com>
References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com> <524D870D.2080804@oracle.com>
Message-ID: <524DC103.7080401@oracle.com>

Jaroslav,

The behavior of multiple loopbacks is not clearly specified [1] and is up to the OS developers or, more precisely, the kernel setup.

A common practice is to assign 127.*.*.* to interfaces like tun, to be able to use some socket-related calls even if the interface is not connected to a peer.

Another common situation is multiple loopback interfaces on a host computer to support virtual instances.

So in my opinion, it's better to be pessimistic and not assume that different loopback addresses are able to talk to each other.

[1] http://tools.ietf.org/html/rfc3330

   127.0.0.0/8 - This block is assigned for use as the Internet host
   loopback address. A datagram sent by a higher level protocol to an
   address anywhere within this block should loop back inside the host.
   This is ordinarily implemented using only 127.0.0.1/32 for loopback,
   but no addresses within this block should ever appear on any network
   anywhere.

-Dmitry

On 2013-10-03 19:02, Jaroslav Bachorik wrote:
> On 2.10.2013 16:11, Dmitry Samersoff wrote:
>> Jaroslav,
>>
>> As a far as loopback address could be resolved to any of 127.0.0.0/8
>> client and server have to use the same loopback address.
>
> AFAIK, all the IPs 127.*.*.* equally designate the loopback interface.
> This might start breaking when you have more than one loopback interface > in the system. > But all of this might be irrelevant here - the IPs are retrieved *after* > the JMX connection has been established making it clear that they are > reachable. > >> >> Generally speaking it's not required for 127.0.1.1 to be able to talk to >> 127.0.0.1 and we are in risk to get a weird fail instead of clear error >> message. > > As I said before as long as there is only one loopback interface it is > safe to assume that all the loopback IPs are virtually identical. When > we start considering multiple loopback interfaces we would need to take > into account the also the assigned network interfaces. > > But it might hardly matter - it seems that the main culprit for this > test to fail on this particular configuration was the fact that > 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and > makes one question the sanity of the test setup... > > -JB- > >> >> -Dmitry >> >> >> On 2013-09-11 18:51, Jaroslav Bachorik wrote: >>> Please, review this simple patch for an intermittently failing test. >>> >>> The test fails in cases when the connection loopback is resolved to be >>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu. The >>> test does not anticipate this possibility and requires the loopback >>> address to be exactly 127.0.0.1 >>> >>> The test will end comparing 127.0.0.1 against 127.0.1.1 and will >>> consider them non equal even though they are both the same loopback. The >>> patch adds a bit of flexibility to the test allowing for any two valid >>> loopback addresses (127.0.0.0/8) to be equal. >>> >>> Issue : JDK-8022220 >>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00 >>> >>> Thanks, >>> >>> -JB- >>> >> >> > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. 
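
Whether a given host has loopback addresses spread over several interfaces, as discussed in the message above, can be inspected from Java. The following is a minimal sketch and not part of any patch in this thread: the class LoopbackInterfaces and the method describeLoopbacks are hypothetical names, and the output depends entirely on the machine it runs on.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class LoopbackInterfaces {

    // Lists "interface -> address" for every loopback address found on any
    // network interface, including interfaces that are not themselves
    // flagged as loopback devices.
    public static List<String> describeLoopbacks() {
        List<String> result = new ArrayList<>();
        try {
            Enumeration<NetworkInterface> all = NetworkInterface.getNetworkInterfaces();
            if (all == null) {
                return result; // no interfaces reported
            }
            for (NetworkInterface nif : Collections.list(all)) {
                for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                    if (addr.isLoopbackAddress()) {
                        result.add(nif.getName() + " -> " + addr.getHostAddress());
                    }
                }
            }
        } catch (SocketException e) {
            // Interface enumeration failed; return whatever was collected so far.
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : describeLoopbacks()) {
            System.out.println(line);
        }
    }
}
```

If more than one interface name shows up in the output, the assumption that all loopback addresses are interchangeable deserves a closer look on that host.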
From jaroslav.bachorik at oracle.com Fri Oct 4 02:15:47 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Fri, 04 Oct 2013 11:15:47 +0200 Subject: jmx-dev RFR: 8022220 Intermittent test failures in javax/management/remote/mandatory/connection/RMIConnectionIdTest.java In-Reply-To: <524D9098.6030701@oracle.com> References: <52308389.6060001@oracle.com> <524C2976.8020109@oracle.com> <524D870D.2080804@oracle.com> <524D8D6C.9050907@oracle.com> <524D8F1E.9050904@oracle.com> <524D9098.6030701@oracle.com> Message-ID: <524E8743.3070901@oracle.com> On 3.10.2013 17:43, Chris Hegarty wrote: > On 10/03/2013 04:37 PM, Jaroslav Bachorik wrote: >> On 3.10.2013 17:29, Chris Hegarty wrote: >>> >>> >>> On 10/03/2013 04:02 PM, Jaroslav Bachorik wrote: >>>> ....... >>>> But it might hardly matter - it seems that the main culprit for this >>>> test to fail on this particular configuration was the fact that >>>> 127.0.0.1 was *NOT* detected as a loopback IP. This is pretty weird and >>> >>> I have not looked at the specifics, but if you have an InetAddress >>> instance you can invoke the isLoopbackAddress() [1][2] method to >>> correctly determine if the instance is a valid loopback address. >> >> Yes, and exactly this method seems to have failed to determine 127.0.0.1 >> being a loopback - according to the test output. >> >> I really can't see how because it basically compares the left-most byte >> of the IP to 127 ... > > Hmm... if this method fails to make the correct determination then we > have problems ;-) We use isLoopbackAddress in may other networking, and > similar, tests in the jdk. > > Sorry, I don't know what to say, there must be some other kind of issue > on your machine, or address is not truly 127.0.0.1. Well, it turns out that this issue was reported roughly 7 months after it actually appeared in the test stabilization run. 
When digging around for more info in the logs it became obvious that this
problem has been covered by a separate issue and fixed for b84.
Additionally, there was some fiddling with /etc/hosts during the test run.

So, as usual, no black magic here ... just a lot of communication noise :/

Thanks, everybody, for taking the time to review this unnecessary change.

-JB-

>
> -Chris.
>
>
>>
>> -JB-
>>
>>>
>>> -Chris.
>>>
>>> [1]
>>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet4Address.java
>>>
>>>
>>>
>>> [2]
>>> http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/54e099776f08/src/share/classes/java/net/Inet6Address.java
>>>
>>>
>>>
>>>
>>>> makes one question the sanity of the test setup...
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> -Dmitry
>>>>>
>>>>>
>>>>> On 2013-09-11 18:51, Jaroslav Bachorik wrote:
>>>>>> Please, review this simple patch for an intermittently failing test.
>>>>>>
>>>>>> The test fails in cases when the connection loopback is resolved
>>>>>> to be
>>>>>> 127.0.1.1 - it may happen under certain circumstances in eg. Ubuntu.
>>>>>> The
>>>>>> test does not anticipate this possibility and requires the loopback
>>>>>> address to be exactly 127.0.0.1
>>>>>>
>>>>>> The test will end comparing 127.0.0.1 against 127.0.1.1 and will
>>>>>> consider them non equal even though they are both the same loopback.
>>>>>> The
>>>>>> patch adds a bit of flexibility to the test allowing for any two
>>>>>> valid
>>>>>> loopback addresses (127.0.0.0/8) to be equal.
>>>>>>
>>>>>> Issue : JDK-8022220
>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8022220/webrev.00
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> -JB-
>>>>>>
>>>>>
>>>>>
>>>>
>>

From jaroslav.bachorik at oracle.com Mon Oct 7 06:59:23 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 15:59:23 +0200
Subject: jmx-dev RFR 7144200: java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR enabled
Message-ID: <5252BE3B.5020607@oracle.com>

The test captures the number of loaded classes right at the start and
then checks the diffs when it's finished. However, it seems that there
might be some async class loading still going on, initiated by JFR.

The patch simply adds a loop to wait for the number of loaded classes to
settle before continuing. This should prevent the test from failing
intermittently with JFR enabled.

Issue: https://bugs.openjdk.java.net/browse/JDK-7144200
Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/

Cheers,

-JB-

From jaroslav.bachorik at oracle.com Mon Oct 7 07:14:14 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 16:14:14 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <523B0B30.4020003@oracle.com>
References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com>
Message-ID: <5252C1B6.2060904@oracle.com>

On 19.9.2013 16:33, Jaroslav Bachorik wrote:
> The updated webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>
> I've moved some of the functionality to the testlibrary.
>
> -JB -
>
> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>> Jaroslav,
>>>
>>> CustomLauncherTest.java:
>>>
>>> 102: this check could be moved to switch at ll.
108 >>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>> unset. >> Good idea. Thanks. >> >>> 129: I would prefer don't have pattern like this one ever in shell >>> script. Could you prepare a list of VM's to check and just loop over it? >>> It makes test better readable. Also I think nowdays we can always use >>> server VM. >> I tried to mirror the original shell test as closely as possible. It >> would be nice if we could rely on the "server" vm only. Definitely more >> readable. >> >> -JB- >> >>> -Dmitry >>> >>> >>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>> Hi Jaroslav, >>>>>>> >>>>>>> You need a copyright notice in the new file. >>>>>>> >>>>>>> As written this test can only run on a full JDK - so please add >>>>>>> it to >>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to come >>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>> >>>>>>> Is there a reason this test can't run on OSX? I know it would need >>>>>>> further modification but was wondering if there is something >>>>>>> inherent in >>>>>>> the test that makes it inapplicable to OSX. >>>>>>> >>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>> hotspot >>>>>>> test library's process tools available. :( >>>>>> We have some, is there an obvious gap? >>>>>> >>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>> >>>>> Hm, thanks for the info. I should have used this library instead. >>>>> >>>>> Please, stand by for the updated webrev. >>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>> application can recognize its own PID and print it to its stdout. The >>>> main application then just reads the stdout to parse the PID. No need >>>> for JCMD any more. 
>>>> >>>> I could not find a way to remove the dependency on "test.jdk" system >>>> property. According to the jtreg web documentation >>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>> "test.java" >>>> system property should be available but in fact is not. But it seems >>>> that the testlibrary uses "test.jdk" system property too. >>>> >>>> The test does not run on OSX because nobody built the launcher >>>> binary :) >>>> I think it is a kind of DIY so I took the liberty of adding a >>>> linux-amd64 launcher while working on the test. >>>> >>>> While working with the test library I realized I was missing a crucial >>>> feature (at least for my purposes) - waiting for a certain message to >>>> appear in the stdout/stderr of the launched process. Very often I need >>>> to wait for the target process to get to certain point before the test >>>> can be allowed to continue - and the point is indicated by a message in >>>> stdout/stderr. Currently all the proc tools are designed to work in >>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>> analyzed after the target process died - and are not suitable for this >>>> kind of usage. >>>> >>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>> >>>>> -JB- >>>>> >>>>>> >>>>>> -Chris. >>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>> >>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>> >>>>>>>> In order to fix this the test is rewritten in Java - the original >>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>> similarity >>>>>>>> between the *.sh and *.java file so one needs to go through the new >>>>>>>> source in whole. 
>>>>>>>> >>>>>>>> The changes in "launcher" files are all about adding permissions to >>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Issue : JDK-8004926 >>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>> >>>>>>>> -JB- >>>>>>>> >>> > From daniel.fuchs at oracle.com Mon Oct 7 07:22:10 2013 From: daniel.fuchs at oracle.com (Daniel Fuchs) Date: Mon, 07 Oct 2013 16:22:10 +0200 Subject: jmx-dev RFR 7144200: java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR enabled In-Reply-To: <5252BE3B.5020607@oracle.com> References: <5252BE3B.5020607@oracle.com> Message-ID: <5252C392.5010909@oracle.com> Hi Jaroslav, I am not an expert in classloading but I don't see any obvious issue with what you propose. I wonder whether making the test always run in /othervm mode might make it more stable. best regards, -- daniel On 10/7/13 3:59 PM, Jaroslav Bachorik wrote: > The test captures the number of loaded classes right at the start and > then checks the diffs when it's finished. However, it seems that there > might by some async class loading still going on, initiated by JFR. > > The patch simply adds a loop to wait for the number of loaded classes to > settle before continuing. This should prevent the test failing with JFR > intermittently. 
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-7144200
> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/
>
> Cheers,
>
> -JB-

From dmitry.samersoff at oracle.com Mon Oct 7 07:31:27 2013
From: dmitry.samersoff at oracle.com (Dmitry Samersoff)
Date: Mon, 07 Oct 2013 18:31:27 +0400
Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252C1B6.2060904@oracle.com>
References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com>
Message-ID: <5252C5BF.4060406@oracle.com>

Jaroslav,

Looks good to me. The comments below are just nits, so feel free to
ignore them.

1.
As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value ever
passed as the findLibjvm parameter, it's better to create an overloaded
function findLibjvm().

2.
It's better to check File.isFile() - something that is merely readable
(e.g. a device) is not always what you want here.

3. It's good to try
ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so
in that order, to cover possible platforms that ship only one VM.

-Dmitry

On 2013-10-07 18:14, Jaroslav Bachorik wrote:
> On 19.9.2013 16:33, Jaroslav Bachorik wrote:
>> The updated webrev:
>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03
>>
>> I've moved some of the functionality to the testlibrary.
>>
>> -JB -
>>
>> On 12.9.2013 17:31, Jaroslav Bachorik wrote:
>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote:
>>>> Jaroslav,
>>>>
>>>> CustomLauncherTest.java:
>>>>
>>>> 102: this check could be moved to switch at ll. 108
>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains
>>>> unset.
>>> Good idea. Thanks.
>>>
>>>> 129: I would prefer don't have pattern like this one ever in shell
>>>> script.
Could you prepare a list of VM's to check and just loop over >>>> it? >>>> It makes test better readable. Also I think nowdays we can always use >>>> server VM. >>> I tried to mirror the original shell test as closely as possible. It >>> would be nice if we could rely on the "server" vm only. Definitely more >>> readable. >>> >>> -JB- >>> >>>> -Dmitry >>>> >>>> >>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>> Hi Jaroslav, >>>>>>>> >>>>>>>> You need a copyright notice in the new file. >>>>>>>> >>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>> it to >>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>> come >>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>> >>>>>>>> Is there a reason this test can't run on OSX? I know it would need >>>>>>>> further modification but was wondering if there is something >>>>>>>> inherent in >>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>> >>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>> hotspot >>>>>>>> test library's process tools available. :( >>>>>>> We have some, is there an obvious gap? >>>>>>> >>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>> >>>>>>> >>>>>> Hm, thanks for the info. I should have used this library instead. >>>>>> >>>>>> Please, stand by for the updated webrev. >>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>> application can recognize its own PID and print it to its stdout. The >>>>> main application then just reads the stdout to parse the PID. No need >>>>> for JCMD any more. >>>>> >>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>> property. 
According to the jtreg web documentation >>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>> "test.java" >>>>> system property should be available but in fact is not. But it seems >>>>> that the testlibrary uses "test.jdk" system property too. >>>>> >>>>> The test does not run on OSX because nobody built the launcher >>>>> binary :) >>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>> linux-amd64 launcher while working on the test. >>>>> >>>>> While working with the test library I realized I was missing a crucial >>>>> feature (at least for my purposes) - waiting for a certain message to >>>>> appear in the stdout/stderr of the launched process. Very often I need >>>>> to wait for the target process to get to certain point before the test >>>>> can be allowed to continue - and the point is indicated by a >>>>> message in >>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>> analyzed after the target process died - and are not suitable for this >>>>> kind of usage. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>> >>>>>> -JB- >>>>>> >>>>>>> >>>>>>> -Chris. >>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>> >>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>> >>>>>>>>> In order to fix this the test is rewritten in Java - the original >>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>> similarity >>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>> new >>>>>>>>> source in whole. 
>>>>>>>>> >>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>> permissions to >>>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Issue : JDK-8004926 >>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>> >> > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. From jaroslav.bachorik at oracle.com Mon Oct 7 07:34:52 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 07 Oct 2013 16:34:52 +0200 Subject: jmx-dev RFR 7144200: java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR enabled In-Reply-To: <5252C392.5010909@oracle.com> References: <5252BE3B.5020607@oracle.com> <5252C392.5010909@oracle.com> Message-ID: <5252C68C.4050608@oracle.com> On 7.10.2013 16:22, Daniel Fuchs wrote: > Hi Jaroslav, > > I am not an expert in classloading but I don't see any obvious > issue with what you propose. I hope there is none :) If the number of loaded classes is not changing the test should continue immediately. The only problem could be loading the classes veeery slowly - not increasing the number of the loaded classes in 300ms interval. We could get a false positive and end up with the same failure as now :( > > I wonder whether making the test always run in /othervm mode > might make it more stable. I don't know. I was not able to reproduce the problem but from the description it sounds like it is spotted only with JFR enabled. So, I suppose, running it in othervm would not help at all. -JB- > > best regards, > > -- daniel > > On 10/7/13 3:59 PM, Jaroslav Bachorik wrote: >> The test captures the number of loaded classes right at the start and >> then checks the diffs when it's finished. However, it seems that there >> might by some async class loading still going on, initiated by JFR. 
>> >> The patch simply adds a loop to wait for the number of loaded classes to >> settle before continuing. This should prevent the test failing with JFR >> intermittently. >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-7144200 >> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/ >> >> Cheers, >> >> -JB- > From staffan.larsen at oracle.com Mon Oct 7 07:35:47 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Mon, 7 Oct 2013 16:35:47 +0200 Subject: jmx-dev RFR 7144200: java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR enabled In-Reply-To: <5252BE3B.5020607@oracle.com> References: <5252BE3B.5020607@oracle.com> Message-ID: This will make it less likely for the test to fail, but does not guarantee it since there is nothing that says classloading will be done in 300 ms. Any failures will unfortunately be harder to reproduce. (And the test is now 300 ms slower to run.) A different solution is to only allow the number of classes to increase, but not be strict about the increase being exactly 4. That would of course make the test less stringent, but very stable. In any case, I think the test has to be marked as /othervm since running other tests simultaneously will impact this test. S/taffan On 7 okt 2013, at 15:59, Jaroslav Bachorik wrote: > The test captures the number of loaded classes right at the start and then checks the diffs when it's finished. However, it seems that there might by some async class loading still going on, initiated by JFR. > > The patch simply adds a loop to wait for the number of loaded classes to settle before continuing. This should prevent the test failing with JFR intermittently. 
> > Issue: https://bugs.openjdk.java.net/browse/JDK-7144200 > Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/ > > Cheers, > > -JB- From jaroslav.bachorik at oracle.com Mon Oct 7 09:39:06 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 07 Oct 2013 18:39:06 +0200 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <5252C5BF.4060406@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com> Message-ID: <5252E3AA.5010702@oracle.com> On 7.10.2013 16:31, Dmitry Samersoff wrote: > Jarsolav, > > Looks good for me, comments below is just a nits - so fill free to > ignore it. > > 1. > As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for > findLibjvm parameter, It's better to create an overload function > findLibjvm(). Ok. It will make the code a further bit readable. > > 2. > it's better to check for File.isFile() - readable (e.g. device) is not > always what you whant here. Can you elaborate why checking for the current user being able to read the actual library file might be wrong? > > 3. It's good to try > ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so > in order for the possible platforms with the only vm Ok. -JB- > > -Dmitry > > > On 2013-10-07 18:14, Jaroslav Bachorik wrote: >> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>> The updated webrev: >>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>> >>> I've moved some of the functionality to the testlibrary. 
>>> >>> -JB - >>> >>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>> Jaroslav, >>>>> >>>>> CustomLauncherTest.java: >>>>> >>>>> 102: this check could be moved to switch at ll. 108 >>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>>>> unset. >>>> Good idea. Thanks. >>>> >>>>> 129: I would prefer don't have pattern like this one ever in shell >>>>> script. Could you prepare a list of VM's to check and just loop over >>>>> it? >>>>> It makes test better readable. Also I think nowdays we can always use >>>>> server VM. >>>> I tried to mirror the original shell test as closely as possible. It >>>> would be nice if we could rely on the "server" vm only. Definitely more >>>> readable. >>>> >>>> -JB- >>>> >>>>> -Dmitry >>>>> >>>>> >>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>> Hi Jaroslav, >>>>>>>>> >>>>>>>>> You need a copyright notice in the new file. >>>>>>>>> >>>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>>> it to >>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>>> come >>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>> >>>>>>>>> Is there a reason this test can't run on OSX? I know it would need >>>>>>>>> further modification but was wondering if there is something >>>>>>>>> inherent in >>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>> >>>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>>> hotspot >>>>>>>>> test library's process tools available. :( >>>>>>>> We have some, is there an obvious gap? >>>>>>>> >>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>> >>>>>>>> >>>>>>> Hm, thanks for the info. 
I should have used this library instead. >>>>>>> >>>>>>> Please, stand by for the updated webrev. >>>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>>> application can recognize its own PID and print it to its stdout. The >>>>>> main application then just reads the stdout to parse the PID. No need >>>>>> for JCMD any more. >>>>>> >>>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>>> property. According to the jtreg web documentation >>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>> "test.java" >>>>>> system property should be available but in fact is not. But it seems >>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>> >>>>>> The test does not run on OSX because nobody built the launcher >>>>>> binary :) >>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>> linux-amd64 launcher while working on the test. >>>>>> >>>>>> While working with the test library I realized I was missing a crucial >>>>>> feature (at least for my purposes) - waiting for a certain message to >>>>>> appear in the stdout/stderr of the launched process. Very often I need >>>>>> to wait for the target process to get to certain point before the test >>>>>> can be allowed to continue - and the point is indicated by a >>>>>> message in >>>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>>> analyzed after the target process died - and are not suitable for this >>>>>> kind of usage. >>>>>> >>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> >>>>>>>> -Chris. >>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>> Please, review the patch for an intermittently failing test. 
>>>>>>>>>> >>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>>> >>>>>>>>>> In order to fix this the test is rewritten in Java - the original >>>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>> similarity >>>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>>> new >>>>>>>>>> source in whole. >>>>>>>>>> >>>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>>> permissions to >>>>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Issue : JDK-8004926 >>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>> >>> >> > > From dmitry.samersoff at oracle.com Mon Oct 7 09:47:25 2013 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Mon, 07 Oct 2013 20:47:25 +0400 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <5252E3AA.5010702@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com> <5252E3AA.5010702@oracle.com> Message-ID: <5252E59D.30200@oracle.com> Jaroslav, > Can you elaborate why checking for the current user being able to read > the actual library file might be wrong? It's not applicable to this particular testcase (so I'd marked it as a nit) but a generic security rule is to always check that we deal with a regular file. Try to link any block device to libjvm.so and see what happens. 
-Dmitry On 2013-10-07 20:39, Jaroslav Bachorik wrote: > On 7.10.2013 16:31, Dmitry Samersoff wrote: >> Jarsolav, >> >> Looks good for me, comments below is just a nits - so fill free to >> ignore it. >> >> 1. >> As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for >> findLibjvm parameter, It's better to create an overload function >> findLibjvm(). > > Ok. It will make the code a further bit readable. > >> >> 2. >> it's better to check for File.isFile() - readable (e.g. device) is not >> always what you whant here. > > Can you elaborate why checking for the current user being able to read > the actual library file might be wrong? > >> >> 3. It's good to try >> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so >> in order for the possible platforms with the only vm > > Ok. > > -JB- > >> >> -Dmitry >> >> >> On 2013-10-07 18:14, Jaroslav Bachorik wrote: >>> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>>> The updated webrev: >>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>>> >>>> I've moved some of the functionality to the testlibrary. >>>> >>>> -JB - >>>> >>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>>> Jaroslav, >>>>>> >>>>>> CustomLauncherTest.java: >>>>>> >>>>>> 102: this check could be moved to switch at ll. 108 >>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>>>>> unset. >>>>> Good idea. Thanks. >>>>> >>>>>> 129: I would prefer don't have pattern like this one ever in shell >>>>>> script. Could you prepare a list of VM's to check and just loop over >>>>>> it? >>>>>> It makes test better readable. Also I think nowdays we can always use >>>>>> server VM. >>>>> I tried to mirror the original shell test as closely as possible. It >>>>> would be nice if we could rely on the "server" vm only. Definitely >>>>> more >>>>> readable. 
>>>>> >>>>> -JB- >>>>> >>>>>> -Dmitry >>>>>> >>>>>> >>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>>> Hi Jaroslav, >>>>>>>>>> >>>>>>>>>> You need a copyright notice in the new file. >>>>>>>>>> >>>>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>>>> it to >>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>>>> come >>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>>> >>>>>>>>>> Is there a reason this test can't run on OSX? I know it would >>>>>>>>>> need >>>>>>>>>> further modification but was wondering if there is something >>>>>>>>>> inherent in >>>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>>> >>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>>>> hotspot >>>>>>>>>> test library's process tools available. :( >>>>>>>>> We have some, is there an obvious gap? >>>>>>>>> >>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> Hm, thanks for the info. I should have used this library instead. >>>>>>>> >>>>>>>> Please, stand by for the updated webrev. >>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>>>> application can recognize its own PID and print it to its stdout. >>>>>>> The >>>>>>> main application then just reads the stdout to parse the PID. No >>>>>>> need >>>>>>> for JCMD any more. >>>>>>> >>>>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>>>> property. According to the jtreg web documentation >>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>>> "test.java" >>>>>>> system property should be available but in fact is not. 
But it seems >>>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>>> >>>>>>> The test does not run on OSX because nobody built the launcher >>>>>>> binary :) >>>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>>> linux-amd64 launcher while working on the test. >>>>>>> >>>>>>> While working with the test library I realized I was missing a >>>>>>> crucial >>>>>>> feature (at least for my purposes) - waiting for a certain >>>>>>> message to >>>>>>> appear in the stdout/stderr of the launched process. Very often I >>>>>>> need >>>>>>> to wait for the target process to get to certain point before the >>>>>>> test >>>>>>> can be allowed to continue - and the point is indicated by a >>>>>>> message in >>>>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>>>> analyzed after the target process died - and are not suitable for >>>>>>> this >>>>>>> kind of usage. >>>>>>> >>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>>> >>>>>>>>> -Chris. >>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>>>> >>>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>>>> >>>>>>>>>>> In order to fix this the test is rewritten in Java - the >>>>>>>>>>> original >>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>>> similarity >>>>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>>>> new >>>>>>>>>>> source in whole. 
>>>>>>>>>>>
>>>>>>>>>>> The changes in "launcher" files are all about adding
>>>>>>>>>>> permissions to
>>>>>>>>>>> execute (0755) and as such the webrev shows no differences.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Issue : JDK-8004926
>>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>>>>>>>
>>>>>>
>>>>
>>>
>>
>>
>

-- 
Dmitry Samersoff
Oracle Java development team, Saint Petersburg, Russia
* I would love to change the world, but they won't give me the sources.

From jaroslav.bachorik at oracle.com Mon Oct 7 09:55:53 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Mon, 07 Oct 2013 18:55:53 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <5252E59D.30200@oracle.com>
References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com> <5252E3AA.5010702@oracle.com> <5252E59D.30200@oracle.com>
Message-ID: <5252E799.4090402@oracle.com>

On 7.10.2013 18:47, Dmitry Samersoff wrote:
> Jaroslav,
>
>> Can you elaborate why checking for the current user being able to read
>> the actual library file might be wrong?
>
> It's not applicable to this particular testcase (so I'd marked it as a
> nit) but a generic security rule is to always check that we deal with a
> regular file.
>
> Try to link any block device to libjvm.so and see what happens.

Ok. I see - in that case it would probably be good to check both that it
is a regular file and that it is readable.
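
A minimal sketch of how review points 1-3 could fit together, assuming a lookup that tries the three candidate locations in order and accepts only a path that is a regular file and also readable. The class LibjvmLookup and its methods are hypothetical illustrations, not the code from webrev.03; Files.isRegularFile and Files.isReadable are the java.nio.file counterparts of the File.isFile() and File.canRead() checks.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical helper for illustration; names are not from the webrev.
class LibjvmLookup {

    // Candidate locations relative to the lib/ARCH directory, in lookup order.
    private static final String[] CANDIDATES = {
        "libjvm.so", "server/libjvm.so", "client/libjvm.so"
    };

    // Accepts only regular files that the current user can read, so that
    // directories, devices and similar special files are rejected.
    static boolean isUsable(Path candidate) {
        return Files.isRegularFile(candidate) && Files.isReadable(candidate);
    }

    // Returns the first usable candidate below libArchDir, if there is one.
    static Optional<Path> findLibjvm(Path libArchDir) {
        for (String relative : CANDIDATES) {
            Path candidate = libArchDir.resolve(relative);
            if (isUsable(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("libarch");
        // A directory named libjvm.so is not a regular file and is skipped.
        Files.createDirectory(dir.resolve("libjvm.so"));
        Files.createDirectories(dir.resolve("server"));
        Files.createFile(dir.resolve("server").resolve("libjvm.so"));
        // The lookup falls through to the server/ candidate.
        System.out.println(findLibjvm(dir));
    }
}
```

A no-argument overload, as suggested in point 1, would only need to compute the lib/ARCH directory from the test JDK location and delegate to findLibjvm(Path).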
-JB- > > -Dmitry > > > > On 2013-10-07 20:39, Jaroslav Bachorik wrote: >> On 7.10.2013 16:31, Dmitry Samersoff wrote: >>> Jarsolav, >>> >>> Looks good for me, comments below is just a nits - so fill free to >>> ignore it. >>> >>> 1. >>> As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for >>> findLibjvm parameter, It's better to create an overload function >>> findLibjvm(). >> >> Ok. It will make the code a further bit readable. >> >>> >>> 2. >>> it's better to check for File.isFile() - readable (e.g. device) is not >>> always what you whant here. >> >> Can you elaborate why checking for the current user being able to read >> the actual library file might be wrong? >> >>> >>> 3. It's good to try >>> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so >>> in order for the possible platforms with the only vm >> >> Ok. >> >> -JB- >> >>> >>> -Dmitry >>> >>> >>> On 2013-10-07 18:14, Jaroslav Bachorik wrote: >>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>>>> The updated webrev: >>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>>>> >>>>> I've moved some of the functionality to the testlibrary. >>>>> >>>>> -JB - >>>>> >>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>>>> Jaroslav, >>>>>>> >>>>>>> CustomLauncherTest.java: >>>>>>> >>>>>>> 102: this check could be moved to switch at ll. 108 >>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>>>>>> unset. >>>>>> Good idea. Thanks. >>>>>> >>>>>>> 129: I would prefer don't have pattern like this one ever in shell >>>>>>> script. Could you prepare a list of VM's to check and just loop over >>>>>>> it? >>>>>>> It makes test better readable. Also I think nowdays we can always use >>>>>>> server VM. >>>>>> I tried to mirror the original shell test as closely as possible. It >>>>>> would be nice if we could rely on the "server" vm only. Definitely >>>>>> more >>>>>> readable. 
>>>>>> >>>>>> -JB- >>>>>> >>>>>>> -Dmitry >>>>>>> >>>>>>> >>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>>>> Hi Jaroslav, >>>>>>>>>>> >>>>>>>>>>> You need a copyright notice in the new file. >>>>>>>>>>> >>>>>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>>>>> it to >>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>>>>> come >>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>>>> >>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would >>>>>>>>>>> need >>>>>>>>>>> further modification but was wondering if there is something >>>>>>>>>>> inherent in >>>>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>>>> >>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>>>>> hotspot >>>>>>>>>>> test library's process tools available. :( >>>>>>>>>> We have some, is there an obvious gap? >>>>>>>>>> >>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> Hm, thanks for the info. I should have used this library instead. >>>>>>>>> >>>>>>>>> Please, stand by for the updated webrev. >>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>>>>> application can recognize its own PID and print it to its stdout. >>>>>>>> The >>>>>>>> main application then just reads the stdout to parse the PID. No >>>>>>>> need >>>>>>>> for JCMD any more. >>>>>>>> >>>>>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>>>>> property. According to the jtreg web documentation >>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>>>> "test.java" >>>>>>>> system property should be available but in fact is not. 
But it seems >>>>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>>>> >>>>>>>> The test does not run on OSX because nobody built the launcher >>>>>>>> binary :) >>>>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>>>> linux-amd64 launcher while working on the test. >>>>>>>> >>>>>>>> While working with the test library I realized I was missing a >>>>>>>> crucial >>>>>>>> feature (at least for my purposes) - waiting for a certain >>>>>>>> message to >>>>>>>> appear in the stdout/stderr of the launched process. Very often I >>>>>>>> need >>>>>>>> to wait for the target process to get to certain point before the >>>>>>>> test >>>>>>>> can be allowed to continue - and the point is indicated by a >>>>>>>> message in >>>>>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>>>>> analyzed after the target process died - and are not suitable for >>>>>>>> this >>>>>>>> kind of usage. >>>>>>>> >>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>>> >>>>>>>>>> -Chris. >>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>>>>> >>>>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>>>>> >>>>>>>>>>>> In order to fix this the test is rewritten in Java - the >>>>>>>>>>>> original >>>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>>>> similarity >>>>>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>>>>> new >>>>>>>>>>>> source in whole. 
>>>>>>>>>>>> >>>>>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>>>>> permissions to >>>>>>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> Issue : JDK-8004926 >>>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>>>>> >>>>>>>>>>>> -JB- >>>>>>>>>>>> >>>>>>> >>>>> >>>> >>> >>> >> > > From jaroslav.bachorik at oracle.com Mon Oct 7 10:10:33 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 07 Oct 2013 19:10:33 +0200 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <5252C5BF.4060406@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com> Message-ID: <5252EB09.2090103@oracle.com> On 7.10.2013 16:31, Dmitry Samersoff wrote: > Jarsolav, > > Looks good for me, comments below is just a nits - so fill free to > ignore it. > > 1. > As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for > findLibjvm parameter, It's better to create an overload function > findLibjvm(). > > 2. > it's better to check for File.isFile() - readable (e.g. device) is not > always what you whant here. > > 3. It's good to try > ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so > in order for the possible platforms with the only vm Nits not ignored - http://cr.openjdk.java.net/~jbachorik/8004926/webrev.04/ :) -JB- > > -Dmitry > > > On 2013-10-07 18:14, Jaroslav Bachorik wrote: >> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>> The updated webrev: >>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>> >>> I've moved some of the functionality to the testlibrary. 
>>> >>> -JB - >>> >>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>> Jaroslav, >>>>> >>>>> CustomLauncherTest.java: >>>>> >>>>> 102: this check could be moved to switch at ll. 108 >>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>>>> unset. >>>> Good idea. Thanks. >>>> >>>>> 129: I would prefer don't have pattern like this one ever in shell >>>>> script. Could you prepare a list of VM's to check and just loop over >>>>> it? >>>>> It makes test better readable. Also I think nowdays we can always use >>>>> server VM. >>>> I tried to mirror the original shell test as closely as possible. It >>>> would be nice if we could rely on the "server" vm only. Definitely more >>>> readable. >>>> >>>> -JB- >>>> >>>>> -Dmitry >>>>> >>>>> >>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>> Hi Jaroslav, >>>>>>>>> >>>>>>>>> You need a copyright notice in the new file. >>>>>>>>> >>>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>>> it to >>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>>> come >>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>> >>>>>>>>> Is there a reason this test can't run on OSX? I know it would need >>>>>>>>> further modification but was wondering if there is something >>>>>>>>> inherent in >>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>> >>>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>>> hotspot >>>>>>>>> test library's process tools available. :( >>>>>>>> We have some, is there an obvious gap? >>>>>>>> >>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>> >>>>>>>> >>>>>>> Hm, thanks for the info. 
I should have used this library instead. >>>>>>> >>>>>>> Please, stand by for the updated webrev. >>>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>>> application can recognize its own PID and print it to its stdout. The >>>>>> main application then just reads the stdout to parse the PID. No need >>>>>> for JCMD any more. >>>>>> >>>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>>> property. According to the jtreg web documentation >>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>> "test.java" >>>>>> system property should be available but in fact is not. But it seems >>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>> >>>>>> The test does not run on OSX because nobody built the launcher >>>>>> binary :) >>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>> linux-amd64 launcher while working on the test. >>>>>> >>>>>> While working with the test library I realized I was missing a crucial >>>>>> feature (at least for my purposes) - waiting for a certain message to >>>>>> appear in the stdout/stderr of the launched process. Very often I need >>>>>> to wait for the target process to get to certain point before the test >>>>>> can be allowed to continue - and the point is indicated by a >>>>>> message in >>>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>>> analyzed after the target process died - and are not suitable for this >>>>>> kind of usage. >>>>>> >>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> >>>>>>>> -Chris. >>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>> Please, review the patch for an intermittently failing test. 
>>>>>>>>>> >>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>>> >>>>>>>>>> In order to fix this the test is rewritten in Java - the original >>>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>> similarity >>>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>>> new >>>>>>>>>> source in whole. >>>>>>>>>> >>>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>>> permissions to >>>>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Issue : JDK-8004926 >>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>> >>> >> > > From dmitry.samersoff at oracle.com Mon Oct 7 10:12:23 2013 From: dmitry.samersoff at oracle.com (Dmitry Samersoff) Date: Mon, 07 Oct 2013 21:12:23 +0400 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <5252EB09.2090103@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <5252C5BF.4060406@oracle.com> <5252EB09.2090103@oracle.com> Message-ID: <5252EB77.8090702@oracle.com> Jaroslav, Thumbs up! Thank you for addressing my comments. -Dmitry On 2013-10-07 21:10, Jaroslav Bachorik wrote: > On 7.10.2013 16:31, Dmitry Samersoff wrote: >> Jarsolav, >> >> Looks good for me, comments below is just a nits - so fill free to >> ignore it. >> >> 1. 
>> As FS.getPath(TEST_JDK, "jre", "lib", LIBARCH) is the only value for >> findLibjvm parameter, It's better to create an overload function >> findLibjvm(). >> >> 2. >> it's better to check for File.isFile() - readable (e.g. device) is not >> always what you whant here. >> >> 3. It's good to try >> ARCH/libjvm.so, ARCH/server/libjvm.so, ARCH/client/libjvm.so >> in order for the possible platforms with the only vm > > Nits not ignored - > http://cr.openjdk.java.net/~jbachorik/8004926/webrev.04/ :) > > -JB- > >> >> -Dmitry >> >> >> On 2013-10-07 18:14, Jaroslav Bachorik wrote: >>> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>>> The updated webrev: >>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>>> >>>> I've moved some of the functionality to the testlibrary. >>>> >>>> -JB - >>>> >>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>>> Jaroslav, >>>>>> >>>>>> CustomLauncherTest.java: >>>>>> >>>>>> 102: this check could be moved to switch at ll. 108 >>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>>>>> unset. >>>>> Good idea. Thanks. >>>>> >>>>>> 129: I would prefer don't have pattern like this one ever in shell >>>>>> script. Could you prepare a list of VM's to check and just loop over >>>>>> it? >>>>>> It makes test better readable. Also I think nowdays we can always use >>>>>> server VM. >>>>> I tried to mirror the original shell test as closely as possible. It >>>>> would be nice if we could rely on the "server" vm only. Definitely >>>>> more >>>>> readable. >>>>> >>>>> -JB- >>>>> >>>>>> -Dmitry >>>>>> >>>>>> >>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>>> Hi Jaroslav, >>>>>>>>>> >>>>>>>>>> You need a copyright notice in the new file. 
>>>>>>>>>> >>>>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>>>> it to >>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>>>> come >>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>>> >>>>>>>>>> Is there a reason this test can't run on OSX? I know it would >>>>>>>>>> need >>>>>>>>>> further modification but was wondering if there is something >>>>>>>>>> inherent in >>>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>>> >>>>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>>>> hotspot >>>>>>>>>> test library's process tools available. :( >>>>>>>>> We have some, is there an obvious gap? >>>>>>>>> >>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> Hm, thanks for the info. I should have used this library instead. >>>>>>>> >>>>>>>> Please, stand by for the updated webrev. >>>>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>>>> application can recognize its own PID and print it to its stdout. >>>>>>> The >>>>>>> main application then just reads the stdout to parse the PID. No >>>>>>> need >>>>>>> for JCMD any more. >>>>>>> >>>>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>>>> property. According to the jtreg web documentation >>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>>> "test.java" >>>>>>> system property should be available but in fact is not. But it seems >>>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>>> >>>>>>> The test does not run on OSX because nobody built the launcher >>>>>>> binary :) >>>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>>> linux-amd64 launcher while working on the test. 
>>>>>>> >>>>>>> While working with the test library I realized I was missing a >>>>>>> crucial >>>>>>> feature (at least for my purposes) - waiting for a certain >>>>>>> message to >>>>>>> appear in the stdout/stderr of the launched process. Very often I >>>>>>> need >>>>>>> to wait for the target process to get to certain point before the >>>>>>> test >>>>>>> can be allowed to continue - and the point is indicated by a >>>>>>> message in >>>>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>>>> analyzed after the target process died - and are not suitable for >>>>>>> this >>>>>>> kind of usage. >>>>>>> >>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>>> >>>>>>>>> -Chris. >>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>>>> >>>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>>>> >>>>>>>>>>> In order to fix this the test is rewritten in Java - the >>>>>>>>>>> original >>>>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>>> similarity >>>>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>>>> new >>>>>>>>>>> source in whole. >>>>>>>>>>> >>>>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>>>> permissions to >>>>>>>>>>> execute (0755) and as such the webrev shows no differences. 
>>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> Issue : JDK-8004926 >>>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>> >>>> >>> >> >> > -- Dmitry Samersoff Oracle Java development team, Saint Petersburg, Russia * I would love to change the world, but they won't give me the sources. From david.holmes at oracle.com Mon Oct 7 20:42:46 2013 From: david.holmes at oracle.com (David Holmes) Date: Tue, 08 Oct 2013 13:42:46 +1000 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <5252C1B6.2060904@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> Message-ID: <52537F36.8020001@oracle.com> Jaroslav, Can you summarise the changes please? With the conversion to Java and the infrastructure additions I can't tell what is actually fixing the original timeout issue :) Thanks, David On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote: > On 19.9.2013 16:33, Jaroslav Bachorik wrote: >> The updated webrev: >> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >> >> I've moved some of the functionality to the testlibrary. >> >> -JB - >> >> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>> Jaroslav, >>>> >>>> CustomLauncherTest.java: >>>> >>>> 102: this check could be moved to switch at ll. 108 >>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>>> unset. >>> Good idea. Thanks. >>> >>>> 129: I would prefer don't have pattern like this one ever in shell >>>> script. Could you prepare a list of VM's to check and just loop over >>>> it? >>>> It makes test better readable. 
Also I think nowdays we can always use >>>> server VM. >>> I tried to mirror the original shell test as closely as possible. It >>> would be nice if we could rely on the "server" vm only. Definitely more >>> readable. >>> >>> -JB- >>> >>>> -Dmitry >>>> >>>> >>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>> Hi Jaroslav, >>>>>>>> >>>>>>>> You need a copyright notice in the new file. >>>>>>>> >>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>> it to >>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>> come >>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>> >>>>>>>> Is there a reason this test can't run on OSX? I know it would need >>>>>>>> further modification but was wondering if there is something >>>>>>>> inherent in >>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>> >>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>> hotspot >>>>>>>> test library's process tools available. :( >>>>>>> We have some, is there an obvious gap? >>>>>>> >>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>> >>>>>>> >>>>>> Hm, thanks for the info. I should have used this library instead. >>>>>> >>>>>> Please, stand by for the updated webrev. >>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>> application can recognize its own PID and print it to its stdout. The >>>>> main application then just reads the stdout to parse the PID. No need >>>>> for JCMD any more. >>>>> >>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>> property. 
According to the jtreg web documentation >>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>> "test.java" >>>>> system property should be available but in fact is not. But it seems >>>>> that the testlibrary uses "test.jdk" system property too. >>>>> >>>>> The test does not run on OSX because nobody built the launcher >>>>> binary :) >>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>> linux-amd64 launcher while working on the test. >>>>> >>>>> While working with the test library I realized I was missing a crucial >>>>> feature (at least for my purposes) - waiting for a certain message to >>>>> appear in the stdout/stderr of the launched process. Very often I need >>>>> to wait for the target process to get to certain point before the test >>>>> can be allowed to continue - and the point is indicated by a >>>>> message in >>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>> analyzed after the target process died - and are not suitable for this >>>>> kind of usage. >>>>> >>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>> >>>>>> -JB- >>>>>> >>>>>>> >>>>>>> -Chris. >>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>> >>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>> >>>>>>>>> In order to fix this the test is rewritten in Java - the original >>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>> similarity >>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>> new >>>>>>>>> source in whole. 
>>>>>>>>> >>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>> permissions to >>>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> Issue : JDK-8004926 >>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>> >> > From david.holmes at oracle.com Tue Oct 8 00:34:46 2013 From: david.holmes at oracle.com (David Holmes) Date: Tue, 08 Oct 2013 17:34:46 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <524BDD9E.1050100@oracle.com> References: <524BDD9E.1050100@oracle.com> Message-ID: <5253B596.1000206@oracle.com> Jaroslav, On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: > Hello, > > currently the JVM uptime reported by the RuntimeMXBean is based on > System.currentTimeMillis() which makes it susceptible to changes of the > OS time (eg. changing timezone, NTP synchronization etc.). The uptime > should not depend on the system time and should be calculated using a > monotonic clock source. > > There is already the way to get the actual JVM uptime in ticks. It is > accessible as Management::timestamp() and the ticks are convertible to > milliseconds using Management::ticks_to_ms(ts_ticks) thus making it very > easy to switch to the monotonic clock based uptime. Maybe I'm missing something, but TimeStamp updates using os::elapsed_counter(), which on Linux uses gettimeofday, so it is not a monotonic clock source. David ----- > The patch consists of the hotspot and jdk parts. > > For the hotspot a new constant needs to be introduced in > src/share/vm/services/jmm.h and the actual logic to obtain the uptime in > milliseconds is added in src/share/vm/services/management.cpp.
> > For the jdk the changes comprise of adding the necessary JNI bridging > methods in order to get the new uptime, introducing the same constant > that is used in hotspot and changes to mapfile-vers files in order to > properly build the native library. > > Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 > Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 > > Thanks, > > -JB- From jaroslav.bachorik at oracle.com Tue Oct 8 04:33:41 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 08 Oct 2013 13:33:41 +0200 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <52537F36.8020001@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <52537F36.8020001@oracle.com> Message-ID: <5253ED95.20706@oracle.com> On 8.10.2013 05:42, David Holmes wrote: > Jaroslav, > > Can you summarise the changes please? With the conversion to Java and > the infrastructure additions I can't tell what is actually fixing the > original timeout issue :) The timeout was mostly caused by using the same file for communication between java processes in more than one test case. When those test cases were run in parallel the file got rewritten silently and some of the tests could end up trying to connect to an incorrect port in the target application. I was able to reproduce the timeout by interleaving the test runs for CustomLauncherTest.sh and LocalManagementTest.sh and adding an artificial delay to CustomLauncherTest.sh to allow LocalManagementTest.sh to change the port in the file. While it could be fixed by using a different file for each test case I took the liberty of converting the shell tests to java tests.
This allows me to remove the communication file and, in the end, make the tests more robust. CustomLauncherTest.java and LocalManagementTest.java are the tests converted from shell to java. I decided to convert LocalManagementTest.sh as well because it has the same problems as the CustomLauncherTest.sh. The changes in the testlibrary are about introducing new methods allowing the tests to easily start a process and wait for certain text to appear in its stdout/stderr. Using these methods the caller can wait till the callee is fully initialized and, e.g., ready to accept connections. The changes in launchers make the launchers actually executable + I am adding a linux-amd64 launcher (I needed that one to work on the changes locally and thought it might be nice to have one more platform covered by the test). I've updated the webrev to include changes to LocalManagementTest and TEST.groups (both of those tests require JDK) - http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05 -JB- > > Thanks, > David > > On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote: >> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>> The updated webrev: >>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>> >>> I've moved some of the functionality to the testlibrary. >>> >>> -JB - >>> >>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>> Jaroslav, >>>>> >>>>> CustomLauncherTest.java: >>>>> >>>>> 102: this check could be moved to switch at ll. 108 >>>>> otherwise test fails on "sunos" and "linux" because PLATFORM remains >>>>> unset. >>>> Good idea. Thanks. >>>> >>>>> 129: I would prefer don't have pattern like this one ever in shell >>>>> script. Could you prepare a list of VM's to check and just loop over >>>>> it? >>>>> It makes test better readable. Also I think nowdays we can always use >>>>> server VM. >>>> I tried to mirror the original shell test as closely as possible.
It >>>> would be nice if we could rely on the "server" vm only. Definitely more >>>> readable. >>>> >>>> -JB- >>>> >>>>> -Dmitry >>>>> >>>>> >>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>> Hi Jaroslav, >>>>>>>>> >>>>>>>>> You need a copyright notice in the new file. >>>>>>>>> >>>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>>> it to >>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>>> come >>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>> >>>>>>>>> Is there a reason this test can't run on OSX? I know it would need >>>>>>>>> further modification but was wondering if there is something >>>>>>>>> inherent in >>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>> >>>>>>>>> I think the test would be a lot simpler if the jdk tests had the >>>>>>>>> hotspot >>>>>>>>> test library's process tools available. :( >>>>>>>> We have some, is there an obvious gap? >>>>>>>> >>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> Hm, thanks for the info. I should have used this library instead. >>>>>>> >>>>>>> Please, stand by for the updated webrev. >>>>>> I was able to get rid off the JCMD. Using the testlibrary the target >>>>>> application can recognize its own PID and print it to its stdout. The >>>>>> main application then just reads the stdout to parse the PID. No need >>>>>> for JCMD any more. >>>>>> >>>>>> I could not find a way to remove the dependency on "test.jdk" system >>>>>> property. According to the jtreg web documentation >>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>> "test.java" >>>>>> system property should be available but in fact is not. 
But it seems >>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>> >>>>>> The test does not run on OSX because nobody built the launcher >>>>>> binary :) >>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>> linux-amd64 launcher while working on the test. >>>>>> >>>>>> While working with the test library I realized I was missing a >>>>>> crucial >>>>>> feature (at least for my purposes) - waiting for a certain message to >>>>>> appear in the stdout/stderr of the launched process. Very often I >>>>>> need >>>>>> to wait for the target process to get to certain point before the >>>>>> test >>>>>> can be allowed to continue - and the point is indicated by a >>>>>> message in >>>>>> stdout/stderr. Currently all the proc tools are designed to work in >>>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>>> analyzed after the target process died - and are not suitable for >>>>>> this >>>>>> kind of usage. >>>>>> >>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> >>>>>>>> -Chris. >>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>>> >>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>>> >>>>>>>>>> In order to fix this the test is rewritten in Java - the original >>>>>>>>>> functionality and outputs should be 100% preserved. The patch is >>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>> similarity >>>>>>>>>> between the *.sh and *.java file so one needs to go through the >>>>>>>>>> new >>>>>>>>>> source in whole. >>>>>>>>>> >>>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>>> permissions to >>>>>>>>>> execute (0755) and as such the webrev shows no differences. 
>>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> Issue : JDK-8004926 >>>>>>>>>> Webrev : http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>> >>> >> From jaroslav.bachorik at oracle.com Tue Oct 8 05:36:36 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 08 Oct 2013 14:36:36 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <5253B596.1000206@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> Message-ID: <5253FC54.4010407@oracle.com> On 8.10.2013 09:34, David Holmes wrote: > Jaroslav, > > On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >> Hello, >> >> currently the JVM uptime reported by the RuntimeMXBean is based on >> System.currentTimeMillis() which makes it susceptible to changes of the >> OS time (eg. changing timezone, NTP synchronization etc.). The uptime >> should not depend on the system time and should be calculated using a >> monotonic clock source. >> >> There is already the way to get the actual JVM uptime in ticks. It is >> accessible as Management::timestamp() and the ticks are convertible to >> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it very >> easy to switch to the monotonic clock based uptime. > > Maybe I'm missing something but TiumeStamp updates using > os::elapsed_counter() which on Linux uses gettimeofday so is not a > monotonic clock source. Hm, yes. I wasn't aware of this Linux/BSD-specific detail. Is there any reason, other than a historical one, why a non-monotonic clock source is used for timestamping? os::javaTimeNanos() uses a monotonic clock when available - why can't the same be used for os::elapsed_counter(), especially when a counter based on "gettimeofday" is not really a counter? -JB- > > David > ----- > > > >> The patch consists of the hotspot and jdk parts.
>> >> For the hotspot a new constant needs to be introduced in >> src/share/vm/services/jmm.h and the actual logic to obtain the uptime in >> milliseconds is added in src/share/vm/services/management.cpp. >> >> For the jdk the changes comprise of adding the necessary JNI bridging >> methods in order to get the new uptime, introducing the same constant >> that is used in hotspot and changes to mapfile-vers files in order to >> properly build the native library. >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >> >> Thanks, >> >> -JB- From david.holmes at oracle.com Tue Oct 8 14:46:12 2013 From: david.holmes at oracle.com (David Holmes) Date: Wed, 09 Oct 2013 07:46:12 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <5253FC54.4010407@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> Message-ID: <52547D24.9060806@oracle.com> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: > On 8.10.2013 09:34, David Holmes wrote: >> Jaroslav, >> >> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>> Hello, >>> >>> currently the JVM uptime reported by the RuntimeMXBean is based on >>> System.currentTimeMillis() which makes it susceptible to changes of the >>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime >>> should not depend on the system time and should be calculated using a >>> monotonic clock source. >>> >>> There is already the way to get the actual JVM uptime in ticks. It is >>> accessible as Management::timestamp() and the ticks are convertible to >>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it very >>> easy to switch to the monotonic clock based uptime. >> >> Maybe I'm missing something but TiumeStamp updates using >> os::elapsed_counter() which on Linux uses gettimeofday so is not a >> monotonic clock source. > > Hm, yes. 
I wasn't aware of this linux/bsd specific. > > Is there any reason why a non monotonic clock source is used for > timestamping except of the historical one? os::javaTimeNanos() uses > montonic clock when available - why can't be the same used for > os::elapsed_counter() especially when a counter based on "gettimeofday" > is not really a counter? It is all historical. These elapsed_counters and elapsed_timers make me cringe. But changing it has a lot of potential consequences because of the way these are used in logging etc. Certainly not something to be contemplated at this stage of JDK 8. Perhaps a simpler fix here is to expose a startUpTimeNanos that can then be used for the uptime. David > -JB- > >> >> David >> ----- >> >> >> >>> The patch consists of the hotspot and jdk parts. >>> >>> For the hotspot a new constant needs to be introduced in >>> src/share/vm/services/jmm.h and the actual logic to obtain the uptime in >>> milliseconds is added in src/share/vm/services/management.cpp. >>> >>> For the jdk the changes comprise of adding the necessary JNI bridging >>> methods in order to get the new uptime, introducing the same constant >>> that is used in hotspot and changes to mapfile-vers files in order to >>> properly build the native library. 
>>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>> >>> Thanks, >>> >>> -JB- > From david.holmes at oracle.com Wed Oct 9 03:23:54 2013 From: david.holmes at oracle.com (David Holmes) Date: Wed, 09 Oct 2013 20:23:54 +1000 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <5253ED95.20706@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com> Message-ID: <52552EBA.4060308@oracle.com> Jaroslav, Thanks for the detailed description of changes - much appreciated. There is a lot to digest in there. :) It isn't obvious to me why these tests require a full JDK? I don't quite follow the libjvm lookup logic - I would expect that you would always want to test the libjvm that is currently running - though it is hard to determine that. Thanks, David On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote: > On 8.10.2013 05:42, David Holmes wrote: >> Jaroslav, >> >> Can you summarise the changes please? With the conversion to Java and >> the infrastructure additions I can't tell what is actually fixing the >> original timeout issue :) > > The timeout was most caused by using the same file for communication > between java processes in more test cases. When those test cases were > run in parallel the file got rewritten silently and some of the tests > could end up trying to connect to incorrect port in the target > application.
From jaroslav.bachorik at oracle.com Wed Oct 9 04:26:27 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 09 Oct 2013 13:26:27 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <52547D24.9060806@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> Message-ID: <52553D63.5000508@oracle.com> On 8.10.2013 23:46, David Holmes wrote: > On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >> On 8.10.2013 09:34, David Holmes wrote: >>> Jaroslav, >>> >>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>> Hello, >>>> >>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>> System.currentTimeMillis() which makes it susceptible to changes of the >>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime >>>> should not depend on the system time and should be calculated using a >>>> monotonic clock source. >>>> >>>> There is already the way to get the actual JVM uptime in ticks.
It is >>>> accessible as Management::timestamp() and the ticks are convertible to >>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it >>>> very >>>> easy to switch to the monotonic clock based uptime. >>> >>> Maybe I'm missing something but TiumeStamp updates using >>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>> monotonic clock source. >> >> Hm, yes. I wasn't aware of this linux/bsd specific. >> >> Is there any reason why a non monotonic clock source is used for >> timestamping except of the historical one? os::javaTimeNanos() uses >> montonic clock when available - why can't be the same used for >> os::elapsed_counter() especially when a counter based on "gettimeofday" >> is not really a counter? > > It is all historical. These elapsed_counters and elapsed_timers make me > cringe. But changing it has a lot of potential consequences because of > the way these are used in logging etc. Certainly not something to be > contemplated at this stage of JDK 8. > > Perhaps a simpler fix here is to expose a startUpTimeNanos that can then > be used for the uptime. My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot I am using os::javaTimeNanos() to get the monotonic ticks where possible. The JDK part stays the same as for webrev.00 -JB- > > David > >> -JB- >> >>> >>> David >>> ----- >>> >>> >>> >>>> The patch consists of the hotspot and jdk parts. >>>> >>>> For the hotspot a new constant needs to be introduced in >>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>> uptime in >>>> milliseconds is added in src/share/vm/services/management.cpp. >>>> >>>> For the jdk the changes comprise of adding the necessary JNI bridging >>>> methods in order to get the new uptime, introducing the same constant >>>> that is used in hotspot and changes to mapfile-vers files in order to >>>> properly build the native library. 
>>>> >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>> >>>> Thanks, >>>> >>>> -JB- >> From jaroslav.bachorik at oracle.com Wed Oct 9 04:31:57 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 09 Oct 2013 13:31:57 +0200 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <52552EBA.4060308@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com> <52552EBA.4060308@oracle.com> Message-ID: <52553EAD.4040506@oracle.com> On 9.10.2013 12:23, David Holmes wrote: > Jaroslav, > > Thanks for the details description of changes - much appreciated. > > There is a lot to digest in there. :) Yep, it started as a simple fix :/ > > It isn't obvious to me why these tests require a full JDK? IDK, LocalManagementTest.sh was listed as one requiring full JDK. Its requirements are the same as the ones of CustomLauncherTest.sh (now *.java) so it seemed logical to list it there too. > > I don't quite follow the libjvm lookup logic - I would expect that you > would always want to test the libjvm that is currently running - though > it is hard to determine that. I'm afraid I can't be of much assistance here - I just took what was in the *.sh version and converted it to *.java. -JB- > > Thanks, > David > > On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote: >> On 8.10.2013 05:42, David Holmes wrote: >>> Jaroslav, >>> >>> Can you summarise the changes please? 
From staffan.larsen at oracle.com Wed Oct 9 07:10:45 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Wed, 9 Oct 2013 16:10:45 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <52553D63.5000508@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> Message-ID: <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> There is now an awful amount of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns. This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter(). And I guess the same thing for the VM.uptime Diagnostic Command (class VMUptimeDCmd) which also relies on os::elapsed_counter(). /Staffan On 9 okt 2013, at 13:26, Jaroslav Bachorik wrote: > On 8.10.2013 23:46, David Holmes wrote: >> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>> On 8.10.2013 09:34, David Holmes wrote: >>>> Jaroslav, >>>> >>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>> Hello, >>>>> >>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>> System.currentTimeMillis() which makes it susceptible to changes of the >>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime >>>>> should not depend on the system time and should be calculated using a >>>>> monotonic clock source. >>>>> >>>>> There is already the way to get the actual JVM uptime in ticks.
It is >>>>> accessible as Management::timestamp() and the ticks are convertible to >>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it >>>>> very >>>>> easy to switch to the monotonic clock based uptime. >>>> >>>> Maybe I'm missing something but TiumeStamp updates using >>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>>> monotonic clock source. >>> >>> Hm, yes. I wasn't aware of this linux/bsd specific. >>> >>> Is there any reason why a non monotonic clock source is used for >>> timestamping except of the historical one? os::javaTimeNanos() uses >>> montonic clock when available - why can't be the same used for >>> os::elapsed_counter() especially when a counter based on "gettimeofday" >>> is not really a counter? >> >> It is all historical. These elapsed_counters and elapsed_timers make me >> cringe. But changing it has a lot of potential consequences because of >> the way these are used in logging etc. Certainly not something to be >> contemplated at this stage of JDK 8. >> >> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then >> be used for the uptime. > > My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot > I am using os::javaTimeNanos() to get the monotonic ticks where possible. > > The JDK part stays the same as for webrev.00 > > -JB- > >> >> David >> >>> -JB- >>> >>>> >>>> David >>>> ----- >>>> >>>> >>>> >>>>> The patch consists of the hotspot and jdk parts. >>>>> >>>>> For the hotspot a new constant needs to be introduced in >>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>> uptime in >>>>> milliseconds is added in src/share/vm/services/management.cpp. >>>>> >>>>> For the jdk the changes comprise of adding the necessary JNI bridging >>>>> methods in order to get the new uptime, introducing the same constant >>>>> that is used in hotspot and changes to mapfile-vers files in order to >>>>> properly build the native library. 
>>>>> >>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>> >>>>> Thanks, >>>>> >>>>> -JB- >>> > From jaroslav.bachorik at oracle.com Wed Oct 9 07:19:48 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 09 Oct 2013 16:19:48 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> Message-ID: <52556604.3080900@oracle.com> On 9.10.2013 16:10, Staffan Larsen wrote: > There is now an awful amount of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns. > > This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter(). > > And I guess the same thing for the VM.uptime Diagnostic Command (class VMUptimeDCmd) which also relies on os::elapsed_counter(). Also, the reported GC pause durations might be wrong since they use Management::timestamp(). At first sight the change looks rather trivial. But, honestly, I'm not sure which other parts could for whatever reason break once the time-of-day timestamp is replaced with a monotonic equivalent. One would think that it shouldn't matter but one never knows ... Staffan, do you think this kind of change is suitable for the current phase of the JDK release cycle? I think I could improve the patch in a few days and then it should probably be able to pass the review before ZBB. But, it's only P3 ...
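To make the concern concrete, here is a minimal sketch in plain Java. It is an illustration only, not the patch in the webrev; the class UptimeSketch and its method names are hypothetical. It assumes System.nanoTime() is backed by a monotonic source, which, as noted in this thread, os::javaTimeNanos() only uses when one is available. The sketch contrasts an uptime computed from a time-of-day value with one computed from elapsed-time readings.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not part of the HotSpot or JDK change under review.
class UptimeSketch {
    // Both values are captured once at "startup".
    private final long startNanos;       // elapsed-time reading, only meaningful as a difference
    private final long startWallMillis;  // time-of-day reading, follows OS clock adjustments

    UptimeSketch(long startNanos, long startWallMillis) {
        this.startNanos = startNanos;
        this.startWallMillis = startWallMillis;
    }

    // Convenience factory using the real clocks.
    static UptimeSketch startNow() {
        return new UptimeSketch(System.nanoTime(), System.currentTimeMillis());
    }

    // Uptime from the elapsed-time source; unaffected by changes to the OS time.
    long monotonicUptimeMillis(long nowNanos) {
        return TimeUnit.NANOSECONDS.toMillis(nowNanos - startNanos);
    }

    // Uptime from the time-of-day source; goes negative if the clock is set back far enough.
    long wallClockUptimeMillis(long nowWallMillis) {
        return nowWallMillis - startWallMillis;
    }

    public static void main(String[] args) {
        UptimeSketch u = new UptimeSketch(1_000_000_000L, 50_000L);
        // Simulate 2 s of real elapsed time during which the OS clock was set back by 10 s.
        long nowNanos = 3_000_000_000L;
        long nowWallMillis = 42_000L;
        System.out.println(u.monotonicUptimeMillis(nowNanos));      // prints 2000
        System.out.println(u.wallClockUptimeMillis(nowWallMillis)); // prints -8000
    }
}
```

Passing the "now" readings in as parameters keeps the two calculations deterministic, so the negative value can be reproduced without actually changing the system clock.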
-JB- > > /Staffan > > > On 9 okt 2013, at 13:26, Jaroslav Bachorik wrote: > >> On 8.10.2013 23:46, David Holmes wrote: >>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>> On 8.10.2013 09:34, David Holmes wrote: >>>>> Jaroslav, >>>>> >>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>> Hello, >>>>>> >>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>>> System.currentTimeMillis() which makes it susceptible to changes of the >>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime >>>>>> should not depend on the system time and should be calculated using a >>>>>> monotonic clock source. >>>>>> >>>>>> There is already the way to get the actual JVM uptime in ticks. It is >>>>>> accessible as Management::timestamp() and the ticks are convertible to >>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it >>>>>> very >>>>>> easy to switch to the monotonic clock based uptime. >>>>> >>>>> Maybe I'm missing something but TiumeStamp updates using >>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>>>> monotonic clock source. >>>> >>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>> >>>> Is there any reason why a non monotonic clock source is used for >>>> timestamping except of the historical one? os::javaTimeNanos() uses >>>> montonic clock when available - why can't be the same used for >>>> os::elapsed_counter() especially when a counter based on "gettimeofday" >>>> is not really a counter? >>> >>> It is all historical. These elapsed_counters and elapsed_timers make me >>> cringe. But changing it has a lot of potential consequences because of >>> the way these are used in logging etc. Certainly not something to be >>> contemplated at this stage of JDK 8. >>> >>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then >>> be used for the uptime. 
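The startUpTimeNanos suggestion quoted above can be sketched as follows. This is only an illustration in Java, not the HotSpot C++ change under review: the class name MonotonicUptime and the injected LongSupplier clock are hypothetical, and System.nanoTime() is assumed to be backed by a monotonic source on the platform in question.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration: uptime derived from a start value captured once. */
public final class MonotonicUptime {
    private final LongSupplier nanoClock;  // assumed monotonic, e.g. System::nanoTime
    private final long startUpTimeNanos;   // captured once, at construction

    public MonotonicUptime(LongSupplier nanoClock) {
        this.nanoClock = nanoClock;
        this.startUpTimeNanos = nanoClock.getAsLong();
    }

    /** Milliseconds elapsed since construction; never reads the wall clock. */
    public long uptimeMillis() {
        return (nanoClock.getAsLong() - startUpTimeNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        MonotonicUptime uptime = new MonotonicUptime(System::nanoTime);
        Thread.sleep(50);
        System.out.println("uptime ms: " + uptime.uptimeMillis());
    }
}
```

Because the start value and the current value come from the same clock, a change of the OS time between the two reads does not influence the difference.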
>> >> My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >> I am using os::javaTimeNanos() to get the monotonic ticks where possible. >> >> The JDK part stays the same as for webrev.00 >> >> -JB- >> >>> >>> David >>> >>>> -JB- >>>> >>>>> >>>>> David >>>>> ----- >>>>> >>>>> >>>>> >>>>>> The patch consists of the hotspot and jdk parts. >>>>>> >>>>>> For the hotspot a new constant needs to be introduced in >>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>> uptime in >>>>>> milliseconds is added in src/share/vm/services/management.cpp. >>>>>> >>>>>> For the jdk the changes comprise of adding the necessary JNI bridging >>>>>> methods in order to get the new uptime, introducing the same constant >>>>>> that is used in hotspot and changes to mapfile-vers files in order to >>>>>> properly build the native library. >>>>>> >>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>> >>>>>> Thanks, >>>>>> >>>>>> -JB- >>>> >> > From staffan.larsen at oracle.com Wed Oct 9 11:12:47 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Wed, 9 Oct 2013 20:12:47 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <52556604.3080900@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> Message-ID: <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> On 9 okt 2013, at 16:19, Jaroslav Bachorik wrote: > On 9.10.2013 16:10, Staffan Larsen wrote: >> There is now an awful amount of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns. 
>> >> This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter(). >> >> And I guess the same thing for the VM.uptime Diagnostic Command (class VMUptimeDCmd) which also relies on os::elapsed_counter(). > > Also the reported GC pause duration might be wrong since it uses Management::timestamp(). > > At first sight the change looks rather trivial. But, honestly, I'm not sure which other parts could for whatever reason break once the time-of-day timestamp is replaced with a monotonic equivalent. One would think that it shouldn't matter but one never knows ... > > Staffan, do you think this kind of change is suitable for the current phase of the JDK release cycle? I think I could improve the patch in a few days and then it should probably be able to pass the review before ZBB. But, it's only P3 ... I think it is a bit late in the release cycle to clean this up in the way it should be cleaned up. I think we should wait until the first JDK 8 update release and do a more thorough job than we have time for right now. /Staffan > > -JB- > >> >> /Staffan >> >> >> On 9 Oct 2013, at 13:26, Jaroslav Bachorik wrote: >> >>> On 8.10.2013 23:46, David Holmes wrote: >>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>> Jaroslav, >>>>>> >>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>> Hello, >>>>>>> >>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>>>> System.currentTimeMillis() which makes it susceptible to changes of the >>>>>>> OS time (e.g. changing timezone, NTP synchronization etc.). The uptime >>>>>>> should not depend on the system time and should be calculated using a >>>>>>> monotonic clock source. >>>>>>> >>>>>>> There is already a way to get the actual JVM uptime in ticks.
It is >>>>>>> accessible as Management::timestamp() and the ticks are convertible to >>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it >>>>>>> very >>>>>>> easy to switch to the monotonic clock based uptime. >>>>>> >>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>>>>> monotonic clock source. >>>>> >>>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>>> >>>>> Is there any reason why a non monotonic clock source is used for >>>>> timestamping except of the historical one? os::javaTimeNanos() uses >>>>> montonic clock when available - why can't be the same used for >>>>> os::elapsed_counter() especially when a counter based on "gettimeofday" >>>>> is not really a counter? >>>> >>>> It is all historical. These elapsed_counters and elapsed_timers make me >>>> cringe. But changing it has a lot of potential consequences because of >>>> the way these are used in logging etc. Certainly not something to be >>>> contemplated at this stage of JDK 8. >>>> >>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then >>>> be used for the uptime. >>> >>> My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>> I am using os::javaTimeNanos() to get the monotonic ticks where possible. >>> >>> The JDK part stays the same as for webrev.00 >>> >>> -JB- >>> >>>> >>>> David >>>> >>>>> -JB- >>>>> >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>> >>>>>> >>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>> >>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>>> uptime in >>>>>>> milliseconds is added in src/share/vm/services/management.cpp. 
>>>>>>> >>>>>>> For the jdk the changes comprise of adding the necessary JNI bridging >>>>>>> methods in order to get the new uptime, introducing the same constant >>>>>>> that is used in hotspot and changes to mapfile-vers files in order to >>>>>>> properly build the native library. >>>>>>> >>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>> >>>>>>> Thanks, >>>>>>> >>>>>>> -JB- >>>>> >>> >> > From david.holmes at oracle.com Wed Oct 9 20:44:52 2013 From: david.holmes at oracle.com (David Holmes) Date: Thu, 10 Oct 2013 13:44:52 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> Message-ID: <525622B4.5020606@oracle.com> On 10/10/2013 4:12 AM, Staffan Larsen wrote: > > On 9 okt 2013, at 16:19, Jaroslav Bachorik wrote: > >> On 9.10.2013 16:10, Staffan Larsen wrote: >>> There is now an awful amount of different timestamps in the Management class. Can they be consolidated somehow? At least _begin_vm_creation_time and the new _begin_vm_creation_ns. >>> >>> This discussion also implies that the "elapsed time" we print in the hs_err file is also wrong. It uses os::elapsedTime() which uses os::elapsed_counter(). >>> >>> And I guess the same thing for the VM.uptime Diagnostic Command (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >> >> Also the reported GC pauses duration might be wrong since it uses Management::timestamp(). >> >> On the first sight the change looks rather trivial. 
But, honestly, I'm not sure which other parts could for whatever reason break once the time-of-day timestamp is replaced with a monotonic equivalent. One would think that it shouldn't matter but one never knows ... >> >> Staffan, do you think this kind of change is suitable for the current phase of the JDK release cycle? I think I could improve the patch in a few days and then it should probably be able to pass the review before ZBB. But, it's only P3 ... > > I think it is a bit late in the release cycle to clean this up in the way it should be cleaned up. I think we should wait until the first JDK 8 update release and do a more thorough job than we have time for right now. I second that. The elapsed_counter/elapsed_timer APIs and implementations are a tangled mess. But part of the problem has been that people want/expect monotonic time-of-day based timestamps (yes, a contradiction - though some people make sure TOD does not get modified on their production systems). The use of timestamps in logging has to be examined carefully - mainly GC logging. I recall a "simple" attempted change in the past that resulted in trying to compare a nanoTime based timestamp with the TOD. :( David ----- > /Staffan > > >> >> -JB- >> >>> >>> /Staffan >>> >>> >>> On 9 Oct 2013, at 13:26, Jaroslav Bachorik wrote: >>> >>>> On 8.10.2013 23:46, David Holmes wrote: >>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>> Jaroslav, >>>>>>> >>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>> Hello, >>>>>>>> >>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>>>>> System.currentTimeMillis() which makes it susceptible to changes of the >>>>>>>> OS time (e.g. changing timezone, NTP synchronization etc.). The uptime >>>>>>>> should not depend on the system time and should be calculated using a >>>>>>>> monotonic clock source. >>>>>>>> >>>>>>>> There is already a way to get the actual JVM uptime in ticks.
It is >>>>>>>> accessible as Management::timestamp() and the ticks are convertible to >>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it >>>>>>>> very >>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>> >>>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>>>>>> monotonic clock source. >>>>>> >>>>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>>>> >>>>>> Is there any reason why a non monotonic clock source is used for >>>>>> timestamping except of the historical one? os::javaTimeNanos() uses >>>>>> montonic clock when available - why can't be the same used for >>>>>> os::elapsed_counter() especially when a counter based on "gettimeofday" >>>>>> is not really a counter? >>>>> >>>>> It is all historical. These elapsed_counters and elapsed_timers make me >>>>> cringe. But changing it has a lot of potential consequences because of >>>>> the way these are used in logging etc. Certainly not something to be >>>>> contemplated at this stage of JDK 8. >>>>> >>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then >>>>> be used for the uptime. >>>> >>>> My attempt at this is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>> I am using os::javaTimeNanos() to get the monotonic ticks where possible. >>>> >>>> The JDK part stays the same as for webrev.00 >>>> >>>> -JB- >>>> >>>>> >>>>> David >>>>> >>>>>> -JB- >>>>>> >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>> >>>>>>> >>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>> >>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>>>> uptime in >>>>>>>> milliseconds is added in src/share/vm/services/management.cpp. 
>>>>>>>> >>>>>>>> For the jdk the changes comprise of adding the necessary JNI bridging >>>>>>>> methods in order to get the new uptime, introducing the same constant >>>>>>>> that is used in hotspot and changes to mapfile-vers files in order to >>>>>>>> properly build the native library. >>>>>>>> >>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> -JB- >>>>>> >>>> >>> >> > From david.holmes at oracle.com Wed Oct 9 21:33:19 2013 From: david.holmes at oracle.com (David Holmes) Date: Thu, 10 Oct 2013 14:33:19 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <52553D63.5000508@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> Message-ID: <52562E0F.7020108@oracle.com> On 9/10/2013 9:26 PM, Jaroslav Bachorik wrote: > On 8.10.2013 23:46, David Holmes wrote: >> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>> On 8.10.2013 09:34, David Holmes wrote: >>>> Jaroslav, >>>> >>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>> Hello, >>>>> >>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>> System.currentTimeMillis() which makes it susceptible to changes of >>>>> the >>>>> OS time (eg. changing timezone, NTP synchronization etc.). The uptime >>>>> should not depend on the system time and should be calculated using a >>>>> monotonic clock source. >>>>> >>>>> There is already the way to get the actual JVM uptime in ticks. It is >>>>> accessible as Management::timestamp() and the ticks are convertible to >>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus making it >>>>> very >>>>> easy to switch to the monotonic clock based uptime. 
>>>> >>>> Maybe I'm missing something but TimeStamp updates using >>>> os::elapsed_counter() which on Linux uses gettimeofday and so is not a >>>> monotonic clock source. >>> >>> Hm, yes. I wasn't aware of this Linux/BSD specific behavior. >>> >>> Is there any reason why a non-monotonic clock source is used for >>> timestamping other than the historical one? os::javaTimeNanos() uses >>> a monotonic clock when available - why can't the same be used for >>> os::elapsed_counter() especially when a counter based on "gettimeofday" >>> is not really a counter? >> >> It is all historical. These elapsed_counters and elapsed_timers make me >> cringe. But changing it has a lot of potential consequences because of >> the way these are used in logging etc. Certainly not something to be >> contemplated at this stage of JDK 8. >> >> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then >> be used for the uptime. > > My attempt at this is at > http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot > I am using os::javaTimeNanos() to get the monotonic ticks where possible. My only nit with this is that you initialize _begin_vm_creation_ns very early compared to the other timestamps. Plus I'm not even certain when this global initializer will execute relative to the VM initialization sequence! Best to move it into Management::init(), where _begin_vm_creation_time is initialized. David ----- > The JDK part stays the same as for webrev.00 > > -JB- > >> >> David >> >>> -JB- >>> >>>> >>>> David >>>> ----- >>>> >>>> >>>> >>>>> The patch consists of the hotspot and jdk parts. >>>>> >>>>> For the hotspot a new constant needs to be introduced in >>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>> uptime in >>>>> milliseconds is added in src/share/vm/services/management.cpp.
>>>>> >>>>> For the jdk the changes comprise of adding the necessary JNI bridging >>>>> methods in order to get the new uptime, introducing the same constant >>>>> that is used in hotspot and changes to mapfile-vers files in order to >>>>> properly build the native library. >>>>> >>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>> >>>>> Thanks, >>>>> >>>>> -JB- >>> > From david.holmes at oracle.com Wed Oct 9 21:41:25 2013 From: david.holmes at oracle.com (David Holmes) Date: Thu, 10 Oct 2013 14:41:25 +1000 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <52553EAD.4040506@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com> <52552EBA.4060308@oracle.com> <52553EAD.4040506@oracle.com> Message-ID: <52562FF5.5060304@oracle.com> On 9/10/2013 9:31 PM, Jaroslav Bachorik wrote: > On 9.10.2013 12:23, David Holmes wrote: >> Jaroslav, >> >> Thanks for the details description of changes - much appreciated. >> >> There is a lot to digest in there. :) > > Yep, it started as a simple fix :/ > >> >> It isn't obvious to me why these tests require a full JDK? > > IDK, LocalManagementTest.sh was listed as one requiring full JDK. Its > requirements are the same as the ones of CustomLauncherTest.sh (now > *.java) so it seemed logical to list it there too. Ah! Now I see it - it uses tools.jar which implies a full JDK. >> >> I don't quite follow the libjvm lookup logic - I would expect that you >> would always want to test the libjvm that is currently running - though >> it is hard to determine that. 
> > I'm afraid I can't be of much assistance here - I just took what was in > the *.sh version and converted it to *.java. Okay. I expect this will need revisiting at some point. Thanks, David ----- > -JB- > >> >> Thanks, >> David >> >> On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote: >>> On 8.10.2013 05:42, David Holmes wrote: >>>> Jaroslav, >>>> >>>> Can you summarise the changes please? With the conversion to Java and >>>> the infrastructure additions I can't tell what is actually fixing the >>>> original timeout issue :) >>> >>> The timeout was mostly caused by using the same file for communication >>> between java processes in several test cases. When those test cases were >>> run in parallel the file got rewritten silently and some of the tests >>> could end up trying to connect to an incorrect port in the target >>> application. I was able to reproduce the timeout by interleaving the >>> test runs for CustomLauncherTest.sh and LocalManagementTest.sh and >>> adding an artificial delay to CustomLauncherTest.sh to allow >>> LocalManagementTest.sh to change the port in the file. >>> >>> While it could be fixed by using a different file for each test case I >>> took the liberty of converting the shell tests to Java tests. This >>> allows me to remove the communication file and, in the end, make the >>> tests more robust. >>> >>> CustomLauncherTest.java and LocalManagementTest.java are the tests >>> converted from shell to Java. I decided to convert >>> LocalManagementTest.sh as well because it has the same problems as >>> CustomLauncherTest.sh. >>> >>> The changes in the testlibrary are about introducing new methods >>> allowing the tests to easily start a process and wait for certain text >>> to appear in its stdout/stderr. Using these methods the caller can wait >>> till the callee is fully initialized and e.g. ready to accept >>> connections.
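The "wait for a certain text in stdout/stderr" approach described above could look roughly like the sketch below. It is not the testlibrary code from the webrev: the class OutputWaiter, the method waitForLine and the "ready" marker are hypothetical names chosen for illustration, and only standard java.lang.ProcessBuilder and java.io calls are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.util.function.Predicate;

/** Hypothetical helper: block until a launched process prints an expected line. */
public final class OutputWaiter {

    /** Reads lines until one satisfies the predicate; returns it, or null if the stream ends first. */
    public static String waitForLine(Reader output, Predicate<String> ready) throws IOException {
        BufferedReader reader = new BufferedReader(output);
        String line;
        while ((line = reader.readLine()) != null) {
            if (ready.test(line)) {
                return line;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: OutputWaiter <command> [args...]");
            return;
        }
        ProcessBuilder pb = new ProcessBuilder(args);
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader sees both
        Process child = pb.start();
        // "ready" is an assumed marker; a real test would match whatever the target prints.
        String hit = waitForLine(new InputStreamReader(child.getInputStream()),
                                 l -> l.contains("ready"));
        System.out.println(hit != null ? "child is ready: " + hit
                                       : "child exited before becoming ready");
        child.destroy();
    }
}
```

This sketch has no timeout, so a target that stays alive without ever printing the marker would block the caller; a real test helper would need to bound the wait.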
>>> >>> The changes in launchers make the launchers actually executable + I am >>> adding a linux-amd64 launcher (I needed that one to work on the changes >>> locally and thought it might be nice to have one more platform covered >>> by the test). >>> >>> I've updated the webrev to include changes to LocalManagementTest and >>> TEST.groups (both of those tests require a JDK) - >>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05 >>> >>> -JB- >>> >>>> >>>> Thanks, >>>> David >>>> >>>> On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote: >>>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>>>>> The updated webrev: >>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>>>>> >>>>>> I've moved some of the functionality to the testlibrary. >>>>>> >>>>>> -JB - >>>>>> >>>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>>>>> Jaroslav, >>>>>>>> >>>>>>>> CustomLauncherTest.java: >>>>>>>> >>>>>>>> 102: this check could be moved to the switch at ll. 108 >>>>>>>> otherwise the test fails on "sunos" and "linux" because PLATFORM >>>>>>>> remains >>>>>>>> unset. >>>>>>> Good idea. Thanks. >>>>>>> >>>>>>>> 129: I would prefer not to have a pattern like this one ever in a shell >>>>>>>> script. Could you prepare a list of VMs to check and just loop >>>>>>>> over >>>>>>>> it? >>>>>>>> It makes the test more readable. Also I think nowadays we can always >>>>>>>> use >>>>>>>> the server VM. >>>>>>> I tried to mirror the original shell test as closely as possible. It >>>>>>> would be nice if we could rely on the "server" vm only. Definitely >>>>>>> more >>>>>>> readable.
>>>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> -Dmitry >>>>>>>> >>>>>>>> >>>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>>>>> Hi Jaroslav, >>>>>>>>>>>> >>>>>>>>>>>> You need a copyright notice in the new file. >>>>>>>>>>>> >>>>>>>>>>>> As written this test can only run on a full JDK - so please add >>>>>>>>>>>> it to >>>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really needs to >>>>>>>>>>>> come >>>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>>>>> >>>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would >>>>>>>>>>>> need >>>>>>>>>>>> further modification but was wondering if there is something >>>>>>>>>>>> inherent in >>>>>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>>>>> >>>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had >>>>>>>>>>>> the >>>>>>>>>>>> hotspot >>>>>>>>>>>> test library's process tools available. :( >>>>>>>>>>> We have some, is there an obvious gap? >>>>>>>>>>> >>>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> Hm, thanks for the info. I should have used this library instead. >>>>>>>>>> >>>>>>>>>> Please, stand by for the updated webrev. >>>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the >>>>>>>>> target >>>>>>>>> application can recognize its own PID and print it to its stdout. >>>>>>>>> The >>>>>>>>> main application then just reads the stdout to parse the PID. No >>>>>>>>> need >>>>>>>>> for JCMD any more. >>>>>>>>> >>>>>>>>> I could not find a way to remove the dependency on "test.jdk" >>>>>>>>> system >>>>>>>>> property. 
According to the jtreg web documentation >>>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>>>>> "test.java" >>>>>>>>> system property should be available but in fact is not. But it >>>>>>>>> seems >>>>>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>>>>> >>>>>>>>> The test does not run on OSX because nobody built the launcher >>>>>>>>> binary :) >>>>>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>>>>> linux-amd64 launcher while working on the test. >>>>>>>>> >>>>>>>>> While working with the test library I realized I was missing a >>>>>>>>> crucial >>>>>>>>> feature (at least for my purposes) - waiting for a certain >>>>>>>>> message to >>>>>>>>> appear in the stdout/stderr of the launched process. Very often I >>>>>>>>> need >>>>>>>>> to wait for the target process to get to certain point before the >>>>>>>>> test >>>>>>>>> can be allowed to continue - and the point is indicated by a >>>>>>>>> message in >>>>>>>>> stdout/stderr. Currently all the proc tools are designed to >>>>>>>>> work in >>>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings and >>>>>>>>> analyzed after the target process died - and are not suitable for >>>>>>>>> this >>>>>>>>> kind of usage. >>>>>>>>> >>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Chris. >>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>>>>>> >>>>>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>>>>> synchronization. This leads to intermittent failures. >>>>>>>>>>>>> >>>>>>>>>>>>> In order to fix this the test is rewritten in Java - the >>>>>>>>>>>>> original >>>>>>>>>>>>> functionality and outputs should be 100% preserved. 
The >>>>>>>>>>>>> patch is >>>>>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>>>>> similarity >>>>>>>>>>>>> between the *.sh and *.java file so one needs to go through >>>>>>>>>>>>> the >>>>>>>>>>>>> new >>>>>>>>>>>>> source in whole. >>>>>>>>>>>>> >>>>>>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>>>>>> permissions to >>>>>>>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> Issue : JDK-8004926 >>>>>>>>>>>>> Webrev : >>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>>>>>> >>>>>>>>>>>>> -JB- >>>>>>>>>>>>> >>>>>>>> >>>>>> >>>>> >>> > From jaroslav.bachorik at oracle.com Thu Oct 10 04:02:24 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 10 Oct 2013 13:02:24 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525622B4.5020606@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> Message-ID: <52568940.4000704@oracle.com> On 10.10.2013 05:44, David Holmes wrote: > On 10/10/2013 4:12 AM, Staffan Larsen wrote: >> >> On 9 okt 2013, at 16:19, Jaroslav Bachorik >> wrote: >> >>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>> There is now an awful amount of different timestamps in the >>>> Management class. Can they be consolidated somehow? At least >>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>> >>>> This discussion also implies that the "elapsed time" we print in the >>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>> os::elapsed_counter(). 
>>>> >>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >>> >>> Also the reported GC pause duration might be wrong since it uses >>> Management::timestamp(). >>> >>> At first sight the change looks rather trivial. But, honestly, >>> I'm not sure which other parts could for whatever reason break once >>> the time-of-day timestamp is replaced with a monotonic equivalent. >>> One would think that it shouldn't matter but one never knows ... >>> >>> Staffan, do you think this kind of change is suitable for the current >>> phase of the JDK release cycle? I think I could improve the patch in a few >>> days and then it should probably be able to pass the review before >>> ZBB. But, it's only P3 ... >> >> I think it is a bit late in the release cycle to clean this up in the >> way it should be cleaned up. I think we should wait until the first JDK 8 >> update release and do a more thorough job than we have time for right >> now. > > I second that. The elapsed_counter/elapsed_timer APIs and > implementations are a tangled mess. But part of the problem has been > that people want/expect monotonic time-of-day based timestamps (yes, a > contradiction - though some people make sure TOD does not get modified > on their production systems). The use of timestamps in logging has to be > examined carefully - mainly GC logging. I recall a "simple" attempted > change in the past that resulted in trying to compare a nanoTime based > timestamp with the TOD. :( Actually, if I'm reading the sources right, for Solaris and Windows a monotonic clock source is used to provide the elapsed_counter() value. It falls back to TOD when the monotonic clock source is not available. For Linux/BSD the TOD is used directly. This makes me wonder if changing the Linux/BSD implementation to follow the same logic would really be that disruptive.
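The "monotonic source when available, time-of-day otherwise" logic described above can be sketched like this. It is a Java illustration only, not the os::elapsed_counter() implementation from HotSpot: the class ElapsedCounter and both injected clocks are hypothetical, and passing null for the monotonic clock stands in for a platform that does not provide one.

```java
import java.util.function.LongSupplier;

/** Hypothetical illustration of a counter that prefers a monotonic clock and falls back to time-of-day. */
public final class ElapsedCounter {
    private final boolean monotonic;
    private final LongSupplier source;

    /** monotonicNanos may be null when no monotonic clock is available. */
    public ElapsedCounter(LongSupplier monotonicNanos, LongSupplier timeOfDayNanos) {
        this.monotonic = monotonicNanos != null;
        this.source = this.monotonic ? monotonicNanos : timeOfDayNanos;
    }

    /** True when readings cannot jump because of wall-clock adjustments. */
    public boolean isMonotonic() {
        return monotonic;
    }

    public long nanos() {
        return source.getAsLong();
    }

    public static void main(String[] args) {
        ElapsedCounter counter = new ElapsedCounter(
                System::nanoTime,
                () -> System.currentTimeMillis() * 1_000_000L);
        long start = counter.nanos();
        System.out.println("monotonic=" + counter.isMonotonic()
                + " elapsed ns=" + (counter.nanos() - start));
    }
}
```

Exposing isMonotonic() lets callers such as logging code know whether two readings may safely be subtracted, which relates to the concern raised earlier in the thread about mixing nanoTime-based and TOD-based timestamps.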
-JB- > > David > ----- > >> /Staffan >> >> >>> >>> -JB- >>> >>>> >>>> /Staffan >>>> >>>> >>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>> wrote: >>>> >>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>> Jaroslav, >>>>>>>> >>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>> changes of the >>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The >>>>>>>>> uptime >>>>>>>>> should not depend on the system time and should be calculated >>>>>>>>> using a >>>>>>>>> monotonic clock source. >>>>>>>>> >>>>>>>>> There is already the way to get the actual JVM uptime in ticks. >>>>>>>>> It is >>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>> convertible to >>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>> making it >>>>>>>>> very >>>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>>> >>>>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>>>>>>> monotonic clock source. >>>>>>> >>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>>>>> >>>>>>> Is there any reason why a non monotonic clock source is used for >>>>>>> timestamping except of the historical one? os::javaTimeNanos() uses >>>>>>> montonic clock when available - why can't be the same used for >>>>>>> os::elapsed_counter() especially when a counter based on >>>>>>> "gettimeofday" >>>>>>> is not really a counter? >>>>>> >>>>>> It is all historical. These elapsed_counters and elapsed_timers >>>>>> make me >>>>>> cringe. But changing it has a lot of potential consequences >>>>>> because of >>>>>> the way these are used in logging etc. 
Certainly not something to be >>>>>> contemplated at this stage of JDK 8. >>>>>> >>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that >>>>>> can then >>>>>> be used for the uptime. >>>>> >>>>> My attempt at this is at >>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>>> I am using os::javaTimeNanos() to get the monotonic ticks where >>>>> possible. >>>>> >>>>> The JDK part stays the same as for webrev.00 >>>>> >>>>> -JB- >>>>> >>>>>> >>>>>> David >>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> >>>>>>>> David >>>>>>>> ----- >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>> >>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>>>>> uptime in >>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp. >>>>>>>>> >>>>>>>>> For the jdk the changes comprise of adding the necessary JNI >>>>>>>>> bridging >>>>>>>>> methods in order to get the new uptime, introducing the same >>>>>>>>> constant >>>>>>>>> that is used in hotspot and changes to mapfile-vers files in >>>>>>>>> order to >>>>>>>>> properly build the native library. 
>>>>>>>>> >>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> >>>>>>>>> -JB- >>>>>>> >>>>> >>>> >>> >> From staffan.larsen at oracle.com Thu Oct 10 04:15:49 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Thu, 10 Oct 2013 13:15:49 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <52568940.4000704@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> Message-ID: <23435103-156B-434F-994C-B6F913EE0364@oracle.com> On 10 okt 2013, at 13:02, Jaroslav Bachorik wrote: > On 10.10.2013 05:44, David Holmes wrote: >> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>> >>> On 9 okt 2013, at 16:19, Jaroslav Bachorik >>> wrote: >>> >>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>> There is now an awful amount of different timestamps in the >>>>> Management class. Can they be consolidated somehow? At least >>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>> >>>>> This discussion also implies that the "elapsed time" we print in the >>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>>> os::elapsed_counter(). >>>>> >>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >>>> >>>> Also the reported GC pauses duration might be wrong since it uses >>>> Management::timestamp(). >>>> >>>> On the first sight the change looks rather trivial. 
But, honestly, >>>> I'm not sure which other parts could for whatever reason break once >>>> the time-of-day timestamp is replaced with a monotonic equivalent. >>>> One would think that it shouldn't matter but one never knows ... >>>> >>>> Staffan, do you think this kind of change is suitable for the current >>>> phase of JDK release cycle? I think I could improve the patch in few >>>> days and then it should probably be able to pass the review before >>>> ZBB. But, it's only P3 ... >>> >>> I think it is a bit late in the release cycle to clean this up in the >>> way it should be cleaned up. I think we should wait until the first 8 >>> update release and do a more thorough job than we have time for right >>> now. >> >> I second that. The elapsed_counter/elpased_timer APIs and >> implementations are a tangled mess. But part of the problem has been >> that people want/expect monotonic time-of-day based timestamps (yes a >> contradiction - though some people make sure TOD does not get modified >> on their production systems). The use of timestamps in logging has to be >> examined carefully - mainly GC logging. I recall a "simple" attempted >> change in the past that resulted in trying to compare a nanoTime based >> timestamp with the TOD. :( > > Actually, if I'm reading the sources right for Solaris and Win the monotonic clock source is used to provide elapsed_counter() value. It falls back to TOD when the monotonic clock source is not available. > For Linux/BSD the TOD is used directly. > > This makes me wonder if changing the linux/bsd implementation to follow the same logic would be really that disruptive. Good point. I would like a world where elapsed_counter is monotonic (where possible). Still a bit scary this late in the release, but an interesting experiment. 
/Staffan > > -JB- >> >> David >> ----- >> >>> /Staffan >>> >>> >>>> >>>> -JB- >>>> >>>>> >>>>> /Staffan >>>>> >>>>> >>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>> wrote: >>>>> >>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>> Jaroslav, >>>>>>>>> >>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>> Hello, >>>>>>>>>> >>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>> changes of the >>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The >>>>>>>>>> uptime >>>>>>>>>> should not depend on the system time and should be calculated >>>>>>>>>> using a >>>>>>>>>> monotonic clock source. >>>>>>>>>> >>>>>>>>>> There is already the way to get the actual JVM uptime in ticks. >>>>>>>>>> It is >>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>> convertible to >>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>> making it >>>>>>>>>> very >>>>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>>>> >>>>>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>>>>>>>> monotonic clock source. >>>>>>>> >>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>>>>>> >>>>>>>> Is there any reason why a non monotonic clock source is used for >>>>>>>> timestamping except of the historical one? os::javaTimeNanos() uses >>>>>>>> montonic clock when available - why can't be the same used for >>>>>>>> os::elapsed_counter() especially when a counter based on >>>>>>>> "gettimeofday" >>>>>>>> is not really a counter? >>>>>>> >>>>>>> It is all historical. These elapsed_counters and elapsed_timers >>>>>>> make me >>>>>>> cringe. 
But changing it has a lot of potential consequences >>>>>>> because of >>>>>>> the way these are used in logging etc. Certainly not something to be >>>>>>> contemplated at this stage of JDK 8. >>>>>>> >>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that >>>>>>> can then >>>>>>> be used for the uptime. >>>>>> >>>>>> My attempt at this is at >>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where >>>>>> possible. >>>>>> >>>>>> The JDK part stays the same as for webrev.00 >>>>>> >>>>>> -JB- >>>>>> >>>>>>> >>>>>>> David >>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>>> >>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>>>>>> uptime in >>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp. >>>>>>>>>> >>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI >>>>>>>>>> bridging >>>>>>>>>> methods in order to get the new uptime, introducing the same >>>>>>>>>> constant >>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in >>>>>>>>>> order to >>>>>>>>>> properly build the native library. 
>>>>>>>>>> >>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> >>>>>>>>>> -JB- >>>>>>>> >>>>>> >>>>> >>>> >>> > From jaroslav.bachorik at oracle.com Mon Oct 14 07:13:33 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 14 Oct 2013 16:13:33 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <23435103-156B-434F-994C-B6F913EE0364@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> Message-ID: <525BFC0D.8090101@oracle.com> On 10.10.2013 13:15, Staffan Larsen wrote: > > On 10 okt 2013, at 13:02, Jaroslav Bachorik wrote: > >> On 10.10.2013 05:44, David Holmes wrote: >>> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>>> >>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik >>>> wrote: >>>> >>>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>>> There is now an awful amount of different timestamps in the >>>>>> Management class. Can they be consolidated somehow? At least >>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>>> >>>>>> This discussion also implies that the "elapsed time" we print in the >>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>>>> os::elapsed_counter(). >>>>>> >>>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >>>>> >>>>> Also the reported GC pauses duration might be wrong since it uses >>>>> Management::timestamp(). 
>>>>> >>>>> On the first sight the change looks rather trivial. But, honestly, >>>>> I'm not sure which other parts could for whatever reason break once >>>>> the time-of-day timestamp is replaced with a monotonic equivalent. >>>>> One would think that it shouldn't matter but one never knows ... >>>>> >>>>> Staffan, do you think this kind of change is suitable for the current >>>>> phase of JDK release cycle? I think I could improve the patch in few >>>>> days and then it should probably be able to pass the review before >>>>> ZBB. But, it's only P3 ... >>>> >>>> I think it is a bit late in the release cycle to clean this up in the >>>> way it should be cleaned up. I think we should wait until the first 8 >>>> update release and do a more thorough job than we have time for right >>>> now. >>> >>> I second that. The elapsed_counter/elpased_timer APIs and >>> implementations are a tangled mess. But part of the problem has been >>> that people want/expect monotonic time-of-day based timestamps (yes a >>> contradiction - though some people make sure TOD does not get modified >>> on their production systems). The use of timestamps in logging has to be >>> examined carefully - mainly GC logging. I recall a "simple" attempted >>> change in the past that resulted in trying to compare a nanoTime based >>> timestamp with the TOD. :( >> >> Actually, if I'm reading the sources right for Solaris and Win the monotonic clock source is used to provide elapsed_counter() value. It falls back to TOD when the monotonic clock source is not available. >> For Linux/BSD the TOD is used directly. >> >> This makes me wonder if changing the linux/bsd implementation to follow the same logic would be really that disruptive. > > Good point. I would like a world where elapsed_counter is monotonic (where possible). Still a bit scary this late in the release, but an interesting experiment. The change is rather simple and tests ok. 
All the means to get a monotonic timestamp are already there and proved to work. The core tests in JPRT went fine. The updated webrev is at http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02 -JB- > > /Staffan > > > > >> >> -JB- >>> >>> David >>> ----- >>> >>>> /Staffan >>>> >>>> >>>>> >>>>> -JB- >>>>> >>>>>> >>>>>> /Staffan >>>>>> >>>>>> >>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>>> wrote: >>>>>> >>>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>>> Jaroslav, >>>>>>>>>> >>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>>> Hello, >>>>>>>>>>> >>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is based on >>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>>> changes of the >>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The >>>>>>>>>>> uptime >>>>>>>>>>> should not depend on the system time and should be calculated >>>>>>>>>>> using a >>>>>>>>>>> monotonic clock source. >>>>>>>>>>> >>>>>>>>>>> There is already the way to get the actual JVM uptime in ticks. >>>>>>>>>>> It is >>>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>>> convertible to >>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>>> making it >>>>>>>>>>> very >>>>>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>>>>> >>>>>>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is not a >>>>>>>>>> monotonic clock source. >>>>>>>>> >>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>>>>>>> >>>>>>>>> Is there any reason why a non monotonic clock source is used for >>>>>>>>> timestamping except of the historical one? 
os::javaTimeNanos() uses >>>>>>>>> montonic clock when available - why can't be the same used for >>>>>>>>> os::elapsed_counter() especially when a counter based on >>>>>>>>> "gettimeofday" >>>>>>>>> is not really a counter? >>>>>>>> >>>>>>>> It is all historical. These elapsed_counters and elapsed_timers >>>>>>>> make me >>>>>>>> cringe. But changing it has a lot of potential consequences >>>>>>>> because of >>>>>>>> the way these are used in logging etc. Certainly not something to be >>>>>>>> contemplated at this stage of JDK 8. >>>>>>>> >>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that >>>>>>>> can then >>>>>>>> be used for the uptime. >>>>>>> >>>>>>> My attempt at this is at >>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where >>>>>>> possible. >>>>>>> >>>>>>> The JDK part stays the same as for webrev.00 >>>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> >>>>>>>> David >>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>>>> >>>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>>>>>>> uptime in >>>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp. >>>>>>>>>>> >>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI >>>>>>>>>>> bridging >>>>>>>>>>> methods in order to get the new uptime, introducing the same >>>>>>>>>>> constant >>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in >>>>>>>>>>> order to >>>>>>>>>>> properly build the native library. 
>>>>>>>>>>> >>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >> > From jaroslav.bachorik at oracle.com Mon Oct 14 08:21:52 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 14 Oct 2013 17:21:52 +0200 Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on slower machines Message-ID: <525C0C10.7000104@oracle.com> Please, review the following simple change. The test times out on slower machines and I was able to reproduce the failure even on a normally fast machine using the fastdebug build. The timeout does not occur on every run - more like once in 10-15 runs. There is nothing really wrong with the test - it just takes rather long time to obtain the jvmstat counters. The remedy is to specify a longer timeout and see if it is enough. I am using 10 minutes for the timeout in the patch. Issue : https://bugs.openjdk.java.net/browse/JDK-6804470 Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00 Thanks, -JB- From Alan.Bateman at oracle.com Mon Oct 14 11:11:25 2013 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 14 Oct 2013 19:11:25 +0100 Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on slower machines In-Reply-To: <525C0C10.7000104@oracle.com> References: <525C0C10.7000104@oracle.com> Message-ID: <525C33CD.4010505@oracle.com> On 14/10/2013 16:21, Jaroslav Bachorik wrote: > Please, review the following simple change. > > The test times out on slower machines and I was able to reproduce the > failure even on a normally fast machine using the fastdebug build. The > timeout does not occur on every run - more like once in 10-15 runs. > > There is nothing really wrong with the test - it just takes rather > long time to obtain the jvmstat counters. 
The remedy is to specify a > longer timeout and see if it is enough. I am using 10 minutes for the > timeout in the patch. > > Issue : https://bugs.openjdk.java.net/browse/JDK-6804470 > Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00 > > Thanks, > > -JB- This looks okay to me but if someone is testing a fastdebug build then they really need to specify the -timeoutFactor option to jtreg so as to scale the timeouts. -Alan. From david.holmes at oracle.com Mon Oct 14 23:49:17 2013 From: david.holmes at oracle.com (David Holmes) Date: Tue, 15 Oct 2013 16:49:17 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525BFC0D.8090101@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> Message-ID: <525CE56D.4000708@oracle.com> Hi Jaroslav, os_bsd.cpp / os_linux.cpp: If you don't have a monotonic clock you leave timer_frequency set to 0! (So you need to test on a system without a monotonic clock, or else force it to act as-if not present.) That aside I don't trust clock_getres to give values that actually allow the timer frequency to be determined. As per the comments in os_linux.cpp: // It's fixed in newer kernels, however clock_getres() still returns // 1/HZ. We check if clock_getres() works, but will ignore its reported // resolution for now. Hopefully as people move to new kernels, this // won't be a problem. we don't know what kernels provide real values here and which provide dummy ones. On BSD you haven't modified os::elapsed_counter. 
Looking at the linux changes I don't think the logic is correct even if clock_getres is accurate. In the existing code we have:

elapsed_counter -> elapsed time in microseconds
elapsed_frequency -> 1000 * 1000 (ie micros per second)
elapsed_time -> elapsed_counter*0.000001 -> time in seconds

Now we have:

elapsed_counter -> elapsed time in nanoseconds
elapsed_frequency -> 1x10^9 / whatever clock_getres says
elapsed_time -> counter/frequency -> ???

So elapsed_time is not, in general, going to give the elapsed time in seconds. And elapsed_time is not dependent on the "frequency" at all because elapsed_counter is not reporting ticks but an actual elapsed "time" in nanoseconds.

Also note that we have constants for:

NANOSECS_PER_SEC
NANOSECS_PER_MILLISEC

to aid with time conversions.

The linux webrev contains unrelated UseLargePages changes!

David
-----

On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote: > On 10.10.2013 13:15, Staffan Larsen wrote: >> >> On 10 okt 2013, at 13:02, Jaroslav Bachorik >> wrote: >> >>> On 10.10.2013 05:44, David Holmes wrote: >>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>>>> >>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik >>>>> wrote: >>>>> >>>>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>>>> There is now an awful amount of different timestamps in the >>>>>>> Management class. Can they be consolidated somehow? At least >>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>>>> >>>>>>> This discussion also implies that the "elapsed time" we print in the >>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>>>>> os::elapsed_counter(). >>>>>>> >>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >>>>>> >>>>>> Also the reported GC pauses duration might be wrong since it uses >>>>>> Management::timestamp(). >>>>>> >>>>>> On the first sight the change looks rather trivial.
But, honestly, >>>>>> I'm not sure which other parts could for whatever reason break once >>>>>> the time-of-day timestamp is replaced with a monotonic equivalent. >>>>>> One would think that it shouldn't matter but one never knows ... >>>>>> >>>>>> Staffan, do you think this kind of change is suitable for the current >>>>>> phase of JDK release cycle? I think I could improve the patch in few >>>>>> days and then it should probably be able to pass the review before >>>>>> ZBB. But, it's only P3 ... >>>>> >>>>> I think it is a bit late in the release cycle to clean this up in the >>>>> way it should be cleaned up. I think we should wait until the first 8 >>>>> update release and do a more thorough job than we have time for right >>>>> now. >>>> >>>> I second that. The elapsed_counter/elpased_timer APIs and >>>> implementations are a tangled mess. But part of the problem has been >>>> that people want/expect monotonic time-of-day based timestamps (yes a >>>> contradiction - though some people make sure TOD does not get modified >>>> on their production systems). The use of timestamps in logging has >>>> to be >>>> examined carefully - mainly GC logging. I recall a "simple" attempted >>>> change in the past that resulted in trying to compare a nanoTime based >>>> timestamp with the TOD. :( >>> >>> Actually, if I'm reading the sources right for Solaris and Win the >>> monotonic clock source is used to provide elapsed_counter() value. It >>> falls back to TOD when the monotonic clock source is not available. >>> For Linux/BSD the TOD is used directly. >>> >>> This makes me wonder if changing the linux/bsd implementation to >>> follow the same logic would be really that disruptive. >> >> Good point. I would like a world where elapsed_counter is monotonic >> (where possible). Still a bit scary this late in the release, but an >> interesting experiment. > > The change is rather simple and tests ok. 
All the means to get a > monotonic timestamp are already there and proved to work. The core tests > in JPRT went fine. > > The updated webrev is at > http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02 > > -JB- > >> >> /Staffan >> >> >> >> >>> >>> -JB- >>>> >>>> David >>>> ----- >>>> >>>>> /Staffan >>>>> >>>>> >>>>>> >>>>>> -JB- >>>>>> >>>>>>> >>>>>>> /Staffan >>>>>>> >>>>>>> >>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>>>> wrote: >>>>>>> >>>>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>>>> Jaroslav, >>>>>>>>>>> >>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>> Hello, >>>>>>>>>>>> >>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is >>>>>>>>>>>> based on >>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>>>> changes of the >>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The >>>>>>>>>>>> uptime >>>>>>>>>>>> should not depend on the system time and should be calculated >>>>>>>>>>>> using a >>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>> >>>>>>>>>>>> There is already the way to get the actual JVM uptime in ticks. >>>>>>>>>>>> It is >>>>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>>>> convertible to >>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>>>> making it >>>>>>>>>>>> very >>>>>>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>>>>>> >>>>>>>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is >>>>>>>>>>> not a >>>>>>>>>>> monotonic clock source. >>>>>>>>>> >>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>>>>>>>> >>>>>>>>>> Is there any reason why a non monotonic clock source is used for >>>>>>>>>> timestamping except of the historical one? 
os::javaTimeNanos() >>>>>>>>>> uses >>>>>>>>>> montonic clock when available - why can't be the same used for >>>>>>>>>> os::elapsed_counter() especially when a counter based on >>>>>>>>>> "gettimeofday" >>>>>>>>>> is not really a counter? >>>>>>>>> >>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers >>>>>>>>> make me >>>>>>>>> cringe. But changing it has a lot of potential consequences >>>>>>>>> because of >>>>>>>>> the way these are used in logging etc. Certainly not something >>>>>>>>> to be >>>>>>>>> contemplated at this stage of JDK 8. >>>>>>>>> >>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that >>>>>>>>> can then >>>>>>>>> be used for the uptime. >>>>>>>> >>>>>>>> My attempt at this is at >>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where >>>>>>>> possible. >>>>>>>> >>>>>>>> The JDK part stays the same as for webrev.00 >>>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>>> >>>>>>>>> David >>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> ----- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>>>>> >>>>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>>>>>>>> uptime in >>>>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp. >>>>>>>>>>>> >>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI >>>>>>>>>>>> bridging >>>>>>>>>>>> methods in order to get the new uptime, introducing the same >>>>>>>>>>>> constant >>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in >>>>>>>>>>>> order to >>>>>>>>>>>> properly build the native library. 
>>>>>>>>>>>> >>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>>>> Webrev: >>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> >>>>>>>>>>>> -JB- >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>> >> > From jaroslav.bachorik at oracle.com Tue Oct 15 01:10:03 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 15 Oct 2013 10:10:03 +0200 Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on slower machines In-Reply-To: <525C33CD.4010505@oracle.com> References: <525C0C10.7000104@oracle.com> <525C33CD.4010505@oracle.com> Message-ID: <525CF85B.7080301@oracle.com> On 14.10.2013 20:11, Alan Bateman wrote: > On 14/10/2013 16:21, Jaroslav Bachorik wrote: >> Please, review the following simple change. >> >> The test times out on slower machines and I was able to reproduce the >> failure even on a normally fast machine using the fastdebug build. The >> timeout does not occur on every run - more like once in 10-15 runs. >> >> There is nothing really wrong with the test - it just takes rather >> long time to obtain the jvmstat counters. The remedy is to specify a >> longer timeout and see if it is enough. I am using 10 minutes for the >> timeout in the patch. >> >> Issue : https://bugs.openjdk.java.net/browse/JDK-6804470 >> Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00 >> >> Thanks, >> >> -JB- > This looks okay to me but if someone is testing a fastdebug build then > they really need to specify the -timeoutFactor option to jtreg so as to > scale the timeouts. Thanks for the review. I'm talking to QE about using the -timeoutFactor option in the automated test runs if possible. -JB- > > -Alan. 
From jaroslav.bachorik at oracle.com Tue Oct 15 06:01:32 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 15 Oct 2013 15:01:32 +0200 Subject: jmx-dev RFR: 6804470 JvmstatCountersTest.java test times out on slower machines In-Reply-To: <525C33CD.4010505@oracle.com> References: <525C0C10.7000104@oracle.com> <525C33CD.4010505@oracle.com> Message-ID: <525D3CAC.1030905@oracle.com> On 14.10.2013 20:11, Alan Bateman wrote: > On 14/10/2013 16:21, Jaroslav Bachorik wrote: >> Please, review the following simple change. >> >> The test times out on slower machines and I was able to reproduce the >> failure even on a normally fast machine using the fastdebug build. The >> timeout does not occur on every run - more like once in 10-15 runs. >> >> There is nothing really wrong with the test - it just takes rather >> long time to obtain the jvmstat counters. The remedy is to specify a >> longer timeout and see if it is enough. I am using 10 minutes for the >> timeout in the patch. >> >> Issue : https://bugs.openjdk.java.net/browse/JDK-6804470 >> Webrev: http://cr.openjdk.java.net/~jbachorik/6804470/webrev.00 >> >> Thanks, >> >> -JB- > This looks okay to me but if someone is testing a fastdebug build then > they really need to specify the -timeoutFactor option to jtreg so as to > scale the timeouts. Just FYI - the SQE does use timeoutFactor 8x for the fastdebug runs. I hope this will be enough in combination with the extended timeout in the tests. -JB- > > -Alan. From shanliang.jiang at oracle.com Wed Oct 16 06:58:31 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Wed, 16 Oct 2013 15:58:31 +0200 Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report some issue in com.sun.jmx.snmp package In-Reply-To: <508E8F79.60909@oracle.com> References: <508E8F79.60909@oracle.com> Message-ID: <525E9B87.9050406@oracle.com> Hi, Please review the following fix, main issue here is that we should clone an internal variable before returning. 
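As a rough illustration of the "clone before returning" pattern mentioned here, the following Java sketch uses a made-up class. SnmpValueHolder and its field are hypothetical and are not taken from the com.sun.jmx.snmp sources or from the webrev.

```java
// Hypothetical class for illustration only; it is not one of the
// classes touched by the fix.
final class SnmpValueHolder {
    private final long[] components;

    SnmpValueHolder(long[] components) {
        // Copy on the way in, so later changes to the caller's array
        // are not visible through this object.
        this.components = components.clone();
    }

    /** Returns a copy, so callers cannot modify the internal array. */
    long[] getComponents() {
        return components.clone();
    }

    public static void main(String[] args) {
        SnmpValueHolder holder = new SnmpValueHolder(new long[] {1, 3, 6, 1});
        long[] copy = holder.getComponents();
        copy[0] = 99; // modifies only the returned copy
        System.out.println(holder.getComponents()[0]); // prints 1
    }
}
```

Returning the field directly would let any caller rewrite the object's internal state, which is the kind of exposure findbugs typically reports.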
webrev: http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/ bug https://bugs.openjdk.java.net/browse/JDK-8026028 Thanks, Shanliang From jaroslav.bachorik at oracle.com Wed Oct 16 07:18:39 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 16 Oct 2013 16:18:39 +0200 Subject: jmx-dev RFR 7197919: java/lang/management/ThreadMXBean/ThreadBlockedCount.java has concurency issues Message-ID: <525EA03F.2070106@oracle.com> Please, review this simple test change. The test tries to get the number of times a certain thread was blocked during the test run and intermittently fails with the difference of 1 - the expected number is 4 but the reported number is 3. When updating the thread statistics (the blocked count in this case) no lock is used so there might be stale data when the ThreadMXBean retrieves the stats. The patch tries to workaround this problem by retrying a few times with the added delay. The test will try to obtain the correct result for at most 10 seconds - after that it will fail if the retrieved blocked count does not equal the expected blocked count. Issue : https://bugs.openjdk.java.net/browse/JDK-7197919 Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00 Thanks, -JB- From jaroslav.bachorik at oracle.com Wed Oct 16 07:44:47 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 16 Oct 2013 16:44:47 +0200 Subject: jmx-dev [PING] Re: RFR: 8024613 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java failing intermittently In-Reply-To: <524BFB87.10808@oracle.com> References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com> <524BFB87.10808@oracle.com> Message-ID: <525EA65F.9040509@oracle.com> On 2.10.2013 12:55, Jaroslav Bachorik wrote: > On 20.9.2013 14:54, shanliang wrote: >> Jaroslav, >> >> It is a good idea to use the RMI Testlibrary. 
>> >> Better to call: >> agent.close(); >> >> at Line 55, close the RMIRegistry (rmid.shutdown(rmidPort) Line 55) >> does not ensure the JMX connector doing full clean, it is always better >> to do clean within a test. > > Thanks. Implemented. > > http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01 > > -JB- > >> >> Shanliang >> >> >> Jaroslav Bachorik wrote: >>> Please, review the following change for JDK-8024613 >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613 >>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/ >>> >>> >>> The patch takes care of intermittent test failures caused by timing >>> issues when starting the RMID process. It could happen that the RMID >>> process hasn't been properly initialized in the timeframe of 5 seconds >>> and the test would fail. >>> >>> The patch replaces the home-brewed RMID process management with the >>> one available in the RMI Testlibrary which is used by more tests and >>> therefore should be more stable. >>> >>> Thanks, >>> >>> -JB- >> > From daniel.fuchs at oracle.com Wed Oct 16 07:49:42 2013 From: daniel.fuchs at oracle.com (Daniel Fuchs) Date: Wed, 16 Oct 2013 16:49:42 +0200 Subject: jmx-dev [PING] Re: RFR: 8024613 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java failing intermittently In-Reply-To: <525EA65F.9040509@oracle.com> References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com> <524BFB87.10808@oracle.com> <525EA65F.9040509@oracle.com> Message-ID: <525EA786.6020508@oracle.com> Hi Jaroslav, Looks fine to me (not a reviewer). -- daniel On 10/16/13 4:44 PM, Jaroslav Bachorik wrote: > On 2.10.2013 12:55, Jaroslav Bachorik wrote: >> On 20.9.2013 14:54, shanliang wrote: >>> Jaroslav, >>> >>> It is a good idea to use the RMI Testlibrary. 
>>> >>> Better to call: >>> agent.close(); >>> >>> at Line 55, close the RMIRegistry (rmid.shutdown(rmidPort) Line 55) >>> does not ensure the JMX connector doing full clean, it is always better >>> to do clean within a test. >> >> Thanks. Implemented. >> >> http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01 >> >> -JB- >> >>> >>> Shanliang >>> >>> >>> Jaroslav Bachorik wrote: >>>> Please, review the following change for JDK-8024613 >>>> >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613 >>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/ >>>> >>>> >>>> The patch takes care of intermittent test failures caused by timing >>>> issues when starting the RMID process. It could happen that the RMID >>>> process hasn't been properly initialized in the timeframe of 5 seconds >>>> and the test would fail. >>>> >>>> The patch replaces the home-brewed RMID process management with the >>>> one available in the RMI Testlibrary which is used by more tests and >>>> therefore should be more stable. >>>> >>>> Thanks, >>>> >>>> -JB- >>> >> > From shanliang.jiang at oracle.com Wed Oct 16 07:50:13 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Wed, 16 Oct 2013 16:50:13 +0200 Subject: jmx-dev [PING] Re: RFR: 8024613 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java failing intermittently In-Reply-To: <525EA65F.9040509@oracle.com> References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com> <524BFB87.10808@oracle.com> <525EA65F.9040509@oracle.com> Message-ID: <525EA7A5.7080904@oracle.com> Looks fine to me. Shanliang Jaroslav Bachorik wrote: > On 2.10.2013 12:55, Jaroslav Bachorik wrote: >> On 20.9.2013 14:54, shanliang wrote: >>> Jaroslav, >>> >>> It is a good idea to use the RMI Testlibrary. 
>>> >>> Better to call: >>> agent.close(); >>> >>> at Line 55, close the RMIRegistry (rmid.shutdown(rmidPort) Line 55) >>> does not ensure the JMX connector doing full clean, it is always better >>> to do clean within a test. >> >> Thanks. Implemented. >> >> http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01 >> >> -JB- >> >>> >>> Shanliang >>> >>> >>> Jaroslav Bachorik wrote: >>>> Please, review the following change for JDK-8024613 >>>> >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613 >>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/ >>>> >>>> >>>> The patch takes care of intermittent test failures caused by timing >>>> issues when starting the RMID process. It could happen that the RMID >>>> process hasn't been properly initialized in the timeframe of 5 seconds >>>> and the test would fail. >>>> >>>> The patch replaces the home-brewed RMID process management with the >>>> one available in the RMI Testlibrary which is used by more tests and >>>> therefore should be more stable. 
>>>> >>>> Thanks, >>>> >>>> -JB- >>> >> > From jaroslav.bachorik at oracle.com Wed Oct 16 09:16:15 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 16 Oct 2013 18:16:15 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525CE56D.4000708@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com> Message-ID: <525EBBCF.3020303@oracle.com> On 15.10.2013 08:49, David Holmes wrote: > Hi Jaroslav, > > os_bsd.cpp / os_linux.cpp: > > If you don't have a monotonic clock you leave timer_frequency set to 0! > (So you need to test on a system without a monotonic clock, or else > force it to act as-if not present.) > > That aside I don't trust clock_getres to give values that actually allow > the timer frequency to be determined. As per the comments in os_linux.cpp: > > // It's fixed in newer kernels, however clock_getres() still returns > // 1/HZ. We check if clock_getres() works, but will ignore its reported > // resolution for now. Hopefully as people move to new kernels, this > // won't be a problem. > > we don't know what kernels provide real values here and which provide > dummy ones. > > On BSD you haven't modified os::elapsed_counter. > > Looking at the linux changes I don't think the logic is correct even if > clock_getres is accurate. 
In the existing code we have: > > elapsed_counter -> elapsed time in microseconds > elapsed_frequency -> 1000 * 1000 (i.e. micros per second) > elapsed_time -> elapsed_counter*0.000001 -> time in seconds > > Now we have: > > elapsed_counter -> elapsed time in nanoseconds > elapsed_frequency -> 1x10^9 / whatever clock_getres says > elapsed_time -> counter/frequency -> ??? > > So elapsed_time is not, in general, going to give the elapsed time in > seconds. And elapsed_time is not dependent on the "frequency" at all > because elapsed_counter is not reporting ticks but an actual elapsed > "time" in nanoseconds. > > > Also note that we have constants for: > > NANOSECS_PER_SEC > NANOSECS_PER_MILLISEC > > to aid with time conversions. > > The linux webrev contains unrelated UseLargePages changes! Sorry for the mess with UseLargePages changes :/ I've fixed the problems with the frequency (using a fixed number as before) and I kept the changes to a minimum. I was hesitating about changing the elapsed_counter precision from microseconds to nanoseconds, but since the Solaris and Windows versions already use nanosecond ticks for elapsed_counter I think the change is safe. The updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 > > > David > ----- > > > On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote: >> On 10.10.2013 13:15, Staffan Larsen wrote: >>> >>> On 10 Oct 2013, at 13:02, Jaroslav Bachorik >>> wrote: >>> >>>> On 10.10.2013 05:44, David Holmes wrote: >>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>>>>> >>>>>> On 9 Oct 2013, at 16:19, Jaroslav Bachorik >>>>>> wrote: >>>>>> >>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>>>>> There is now an awful amount of different timestamps in the >>>>>>>> Management class. Can they be consolidated somehow? At least >>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>>>>> >>>>>>>> This discussion also implies that the "elapsed time" we print in >>>>>>>> the >>>>>>>> hs_err file is also wrong.
It uses os::elapsedTime() which uses >>>>>>>> os::elapsed_counter(). >>>>>>>> >>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >>>>>>> >>>>>>> Also the reported GC pauses duration might be wrong since it uses >>>>>>> Management::timestamp(). >>>>>>> >>>>>>> On the first sight the change looks rather trivial. But, honestly, >>>>>>> I'm not sure which other parts could for whatever reason break once >>>>>>> the time-of-day timestamp is replaced with a monotonic equivalent. >>>>>>> One would think that it shouldn't matter but one never knows ... >>>>>>> >>>>>>> Staffan, do you think this kind of change is suitable for the >>>>>>> current >>>>>>> phase of JDK release cycle? I think I could improve the patch in few >>>>>>> days and then it should probably be able to pass the review before >>>>>>> ZBB. But, it's only P3 ... >>>>>> >>>>>> I think it is a bit late in the release cycle to clean this up in the >>>>>> way it should be cleaned up. I think we should wait until the first 8 >>>>>> update release and do a more thorough job than we have time for right >>>>>> now. >>>>> >>>>> I second that. The elapsed_counter/elpased_timer APIs and >>>>> implementations are a tangled mess. But part of the problem has been >>>>> that people want/expect monotonic time-of-day based timestamps (yes a >>>>> contradiction - though some people make sure TOD does not get modified >>>>> on their production systems). The use of timestamps in logging has >>>>> to be >>>>> examined carefully - mainly GC logging. I recall a "simple" attempted >>>>> change in the past that resulted in trying to compare a nanoTime based >>>>> timestamp with the TOD. :( >>>> >>>> Actually, if I'm reading the sources right for Solaris and Win the >>>> monotonic clock source is used to provide elapsed_counter() value. It >>>> falls back to TOD when the monotonic clock source is not available. 
>>>> For Linux/BSD the TOD is used directly. >>>> >>>> This makes me wonder if changing the linux/bsd implementation to >>>> follow the same logic would be really that disruptive. >>> >>> Good point. I would like a world where elapsed_counter is monotonic >>> (where possible). Still a bit scary this late in the release, but an >>> interesting experiment. >> >> The change is rather simple and tests ok. All the means to get a >> monotonic timestamp are already there and proved to work. The core tests >> in JPRT went fine. >> >> The updated webrev is at >> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02 >> >> -JB- >> >>> >>> /Staffan >>> >>> >>> >>> >>>> >>>> -JB- >>>>> >>>>> David >>>>> ----- >>>>> >>>>>> /Staffan >>>>>> >>>>>> >>>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> >>>>>>>> /Staffan >>>>>>>> >>>>>>>> >>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>>>>> Jaroslav, >>>>>>>>>>>> >>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>> Hello, >>>>>>>>>>>>> >>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is >>>>>>>>>>>>> based on >>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>>>>> changes of the >>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). The >>>>>>>>>>>>> uptime >>>>>>>>>>>>> should not depend on the system time and should be calculated >>>>>>>>>>>>> using a >>>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>>> >>>>>>>>>>>>> There is already the way to get the actual JVM uptime in >>>>>>>>>>>>> ticks. 
>>>>>>>>>>>>> It is >>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>>>>> convertible to >>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>>>>> making it >>>>>>>>>>>>> very >>>>>>>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>>>>>>> >>>>>>>>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is >>>>>>>>>>>> not a >>>>>>>>>>>> monotonic clock source. >>>>>>>>>>> >>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific. >>>>>>>>>>> >>>>>>>>>>> Is there any reason why a non monotonic clock source is used for >>>>>>>>>>> timestamping except of the historical one? os::javaTimeNanos() >>>>>>>>>>> uses >>>>>>>>>>> montonic clock when available - why can't be the same used for >>>>>>>>>>> os::elapsed_counter() especially when a counter based on >>>>>>>>>>> "gettimeofday" >>>>>>>>>>> is not really a counter? >>>>>>>>>> >>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers >>>>>>>>>> make me >>>>>>>>>> cringe. But changing it has a lot of potential consequences >>>>>>>>>> because of >>>>>>>>>> the way these are used in logging etc. Certainly not something >>>>>>>>>> to be >>>>>>>>>> contemplated at this stage of JDK 8. >>>>>>>>>> >>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that >>>>>>>>>> can then >>>>>>>>>> be used for the uptime. >>>>>>>>> >>>>>>>>> My attempt at this is at >>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where >>>>>>>>> possible. >>>>>>>>> >>>>>>>>> The JDK part stays the same as for webrev.00 >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> ----- >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> The patch consists of the hotspot and jdk parts. 
>>>>>>>>>>>>> >>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain the >>>>>>>>>>>>> uptime in >>>>>>>>>>>>> milliseconds is added in src/share/vm/services/management.cpp. >>>>>>>>>>>>> >>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI >>>>>>>>>>>>> bridging >>>>>>>>>>>>> methods in order to get the new uptime, introducing the same >>>>>>>>>>>>> constant >>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in >>>>>>>>>>>>> order to >>>>>>>>>>>>> properly build the native library. >>>>>>>>>>>>> >>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>>>>> Webrev: >>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> >>>>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>> >> From david.holmes at oracle.com Wed Oct 16 19:26:40 2013 From: david.holmes at oracle.com (David Holmes) Date: Thu, 17 Oct 2013 12:26:40 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525EBBCF.3020303@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com> <525EBBCF.3020303@oracle.com> Message-ID: <525F4AE0.1000406@oracle.com> Hi Jaroslav, Minor nit: os::elapsed_time should really be defined in terms of the other functions ie: return ((double) os::elapsed_counter()) / os::elapsed_frequency(); I also prefer the cast above as it is very clear that we will be doing a floating-point division. 
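To illustrate why the explicit cast matters, here is a minimal sketch rendered in Java rather than HotSpot's C++ (the integer versus floating-point division semantics are the same in both). The class name, method names and the fixed nanosecond frequency are assumptions made for the example; none of them exist in HotSpot.

```java
// Hypothetical Java rendering of the C++ expression above; the class and
// method names are illustrative and do not exist in HotSpot.
final class ElapsedTimeSketch {
    // Ticks per second; a fixed nanosecond frequency is assumed here.
    static final long FREQUENCY = 1_000_000_000L;

    // Integer division: the fractional part of a second is discarded.
    static long wholeSeconds(long counter) {
        return counter / FREQUENCY;
    }

    // Cast first, then divide: the division is done in floating point.
    static double seconds(long counter) {
        return ((double) counter) / FREQUENCY;
    }
}
```

For a counter of 1,500,000,000 ticks, the integer form yields 1 while the cast form yields 1.5.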
Aside: AFAICS os::elapsed_time() is never actually used? I agree that it appears that changing the frequency should be okay. Thanks, David On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote: > On 15.10.2013 08:49, David Holmes wrote: >> Hi Jaroslav, >> >> os_bsd.cpp / os_linux.cpp: >> >> If you don't have a monotonic clock you leave timer_frequency set to 0! >> (So you need to test on a system without a monotonic clock, or else >> force it to act as-if not present.) >> >> That aside I don't trust clock_getres to give values that actually allow >> the timer frequency to be determined. As per the comments in >> os_linux.cpp: >> >> // It's fixed in newer kernels, however clock_getres() still returns >> // 1/HZ. We check if clock_getres() works, but will ignore its reported >> // resolution for now. Hopefully as people move to new kernels, this >> // won't be a problem. >> >> we don't know what kernels provide real values here and which provide >> dummy ones. >> >> On BSD you haven't modified os::elapsed_counter. >> >> Looking at the linux changes I don't think the logic is correct even if >> clock_getres is accurate. In the existing code we have: >> >> elapsed_counter -> elapsed time in microseconds >> elapsed_frequency -> 1000 * 1000 (i.e. micros per second) >> elapsed_time -> elapsed_counter*0.000001 -> time in seconds >> >> Now we have: >> >> elapsed_counter -> elapsed time in nanoseconds >> elapsed_frequency -> 1x10^9 / whatever clock_getres says >> elapsed_time -> counter/frequency -> ??? >> >> So elapsed_time is not, in general, going to give the elapsed time in >> seconds. And elapsed_time is not dependent on the "frequency" at all >> because elapsed_counter is not reporting ticks but an actual elapsed >> "time" in nanoseconds. >> >> >> Also note that we have constants for: >> >> NANOSECS_PER_SEC >> NANOSECS_PER_MILLISEC >> >> to aid with time conversions. >> >> The linux webrev contains unrelated UseLargePages changes!
> > Sorry for the mess with UseLargePages changes :/ > > I've fixed the problems with the frequency (using a fixed number as > before) and I kept the changes to minimum. > > I was hesitating about changing the elapsed_counter precision from > microseconds to nanoseconds but since solaris and windows versions > already use nanosecond ticks for elapsed_counter I think the change is > safe. > > The update webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 > >> >> >> David >> ----- >> >> >> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote: >>> On 10.10.2013 13:15, Staffan Larsen wrote: >>>> >>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik >>>> wrote: >>>> >>>>> On 10.10.2013 05:44, David Holmes wrote: >>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>>>>>> >>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik >>>>>>> wrote: >>>>>>> >>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>>>>>> There is now an awful amount of different timestamps in the >>>>>>>>> Management class. Can they be consolidated somehow? At least >>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>>>>>> >>>>>>>>> This discussion also implies that the "elapsed time" we print in >>>>>>>>> the >>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>>>>>>> os::elapsed_counter(). >>>>>>>>> >>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >>>>>>>> >>>>>>>> Also the reported GC pauses duration might be wrong since it uses >>>>>>>> Management::timestamp(). >>>>>>>> >>>>>>>> On the first sight the change looks rather trivial. But, honestly, >>>>>>>> I'm not sure which other parts could for whatever reason break once >>>>>>>> the time-of-day timestamp is replaced with a monotonic equivalent. >>>>>>>> One would think that it shouldn't matter but one never knows ... 
>>>>>>>> >>>>>>>> Staffan, do you think this kind of change is suitable for the >>>>>>>> current >>>>>>>> phase of JDK release cycle? I think I could improve the patch in >>>>>>>> few >>>>>>>> days and then it should probably be able to pass the review before >>>>>>>> ZBB. But, it's only P3 ... >>>>>>> >>>>>>> I think it is a bit late in the release cycle to clean this up in >>>>>>> the >>>>>>> way it should be cleaned up. I think we should wait until the >>>>>>> first 8 >>>>>>> update release and do a more thorough job than we have time for >>>>>>> right >>>>>>> now. >>>>>> >>>>>> I second that. The elapsed_counter/elpased_timer APIs and >>>>>> implementations are a tangled mess. But part of the problem has been >>>>>> that people want/expect monotonic time-of-day based timestamps (yes a >>>>>> contradiction - though some people make sure TOD does not get >>>>>> modified >>>>>> on their production systems). The use of timestamps in logging has >>>>>> to be >>>>>> examined carefully - mainly GC logging. I recall a "simple" attempted >>>>>> change in the past that resulted in trying to compare a nanoTime >>>>>> based >>>>>> timestamp with the TOD. :( >>>>> >>>>> Actually, if I'm reading the sources right for Solaris and Win the >>>>> monotonic clock source is used to provide elapsed_counter() value. It >>>>> falls back to TOD when the monotonic clock source is not available. >>>>> For Linux/BSD the TOD is used directly. >>>>> >>>>> This makes me wonder if changing the linux/bsd implementation to >>>>> follow the same logic would be really that disruptive. >>>> >>>> Good point. I would like a world where elapsed_counter is monotonic >>>> (where possible). Still a bit scary this late in the release, but an >>>> interesting experiment. >>> >>> The change is rather simple and tests ok. All the means to get a >>> monotonic timestamp are already there and proved to work. The core tests >>> in JPRT went fine. 
>>> >>> The updated webrev is at >>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02 >>> >>> -JB- >>> >>>> >>>> /Staffan >>>> >>>> >>>> >>>> >>>>> >>>>> -JB- >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>>> /Staffan >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> -JB- >>>>>>>> >>>>>>>>> >>>>>>>>> /Staffan >>>>>>>>> >>>>>>>>> >>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>>>>>> Jaroslav, >>>>>>>>>>>>> >>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>> >>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is >>>>>>>>>>>>>> based on >>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>>>>>> changes of the >>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). >>>>>>>>>>>>>> The >>>>>>>>>>>>>> uptime >>>>>>>>>>>>>> should not depend on the system time and should be calculated >>>>>>>>>>>>>> using a >>>>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>>>> >>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in >>>>>>>>>>>>>> ticks. >>>>>>>>>>>>>> It is >>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>>>>>> convertible to >>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>>>>>> making it >>>>>>>>>>>>>> very >>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>>>>>>>> >>>>>>>>>>>>> Maybe I'm missing something but TiumeStamp updates using >>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is >>>>>>>>>>>>> not a >>>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>> >>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific. 
>>>>>>>>>>>> >>>>>>>>>>>> Is there any reason why a non monotonic clock source is used >>>>>>>>>>>> for >>>>>>>>>>>> timestamping except of the historical one? os::javaTimeNanos() >>>>>>>>>>>> uses >>>>>>>>>>>> montonic clock when available - why can't be the same used for >>>>>>>>>>>> os::elapsed_counter() especially when a counter based on >>>>>>>>>>>> "gettimeofday" >>>>>>>>>>>> is not really a counter? >>>>>>>>>>> >>>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers >>>>>>>>>>> make me >>>>>>>>>>> cringe. But changing it has a lot of potential consequences >>>>>>>>>>> because of >>>>>>>>>>> the way these are used in logging etc. Certainly not something >>>>>>>>>>> to be >>>>>>>>>>> contemplated at this stage of JDK 8. >>>>>>>>>>> >>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that >>>>>>>>>>> can then >>>>>>>>>>> be used for the uptime. >>>>>>>>>> >>>>>>>>>> My attempt at this is at >>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where >>>>>>>>>> possible. >>>>>>>>>> >>>>>>>>>> The JDK part stays the same as for webrev.00 >>>>>>>>>> >>>>>>>>>> -JB- >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> David >>>>>>>>>>> >>>>>>>>>>>> -JB- >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> David >>>>>>>>>>>>> ----- >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>>>>>>> >>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain >>>>>>>>>>>>>> the >>>>>>>>>>>>>> uptime in >>>>>>>>>>>>>> milliseconds is added in >>>>>>>>>>>>>> src/share/vm/services/management.cpp. 
>>>>>>>>>>>>>> >>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI >>>>>>>>>>>>>> bridging >>>>>>>>>>>>>> methods in order to get the new uptime, introducing the same >>>>>>>>>>>>>> constant >>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in >>>>>>>>>>>>>> order to >>>>>>>>>>>>>> properly build the native library. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>>>>>> >>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>> >>>>>>>>>>>>>> -JB- >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>>> >>> > From daniel.fuchs at oracle.com Thu Oct 17 02:53:47 2013 From: daniel.fuchs at oracle.com (Daniel Fuchs) Date: Thu, 17 Oct 2013 11:53:47 +0200 Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report some issue in com.sun.jmx.snmp package In-Reply-To: <525E9B87.9050406@oracle.com> References: <508E8F79.60909@oracle.com> <525E9B87.9050406@oracle.com> Message-ID: <525FB3AB.3040105@oracle.com> Hi Shanliang, Looks good! -- daniel On 10/16/13 3:58 PM, shanliang wrote: > Hi, > > Please review the following fix, main issue here is that we should clone > an internal variable before returning. 
> > webrev: > http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/ > > bug > https://bugs.openjdk.java.net/browse/JDK-8026028 > > Thanks, > Shanliang > > > > From jaroslav.bachorik at oracle.com Thu Oct 17 03:10:39 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 17 Oct 2013 12:10:39 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525F4AE0.1000406@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com> <525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com> Message-ID: <525FB79F.7070101@oracle.com> Hi David, On 17.10.2013 04:26, David Holmes wrote: > Hi Jaroslav, > > Minor nit: os::elapsed_time should really be defined in terms of the > other functions ie: > > return ((double) os::elapsed_counter()) / os::elapsed_frequency(); Ok. I've changed it. It better communicates the way the elapsedTime is calculated anyway. > > I also prefer the cast above as it is very clear that we will be doing a > floating-point division. > > Aside: AFAICS os::elapsed_time() is never actually used ?? Actually, it is os::elapsedTime() and this one is used quite a lot. The "elapsed_time()" form is used only in bytecodeHistogram.hpp, parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared. > > I agree that it appears that changing the frequency should be okay. Thanks for the feedback. 
Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 -JB- > > Thanks, > David > > On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote: >> On 15.10.2013 08:49, David Holmes wrote: >>> Hi Jaroslav, >>> >>> os_bsd.cpp / os_linux.cpp: >>> >>> If you don't have a monotonic clock you leave timer_frequency set to 0! >>> (So you need to test on a system without a monotonic clock, or else >>> force it to act as-if not present.) >>> >>> That aside I don't trust clock_getres to give values that actually allow >>> the timer frequency to be determined. As per the comments in >>> os_linux.cpp: >>> >>> // It's fixed in newer kernels, however clock_getres() still returns >>> // 1/HZ. We check if clock_getres() works, but will ignore its reported >>> // resolution for now. Hopefully as people move to new kernels, this >>> // won't be a problem. >>> >>> we don't know what kernels provide real values here and which provide >>> dummy ones. >>> >>> On BSD you haven't modified os::elapsed_counter. >>> >>> Looking at the linux changes I don't think the logic is correct even if >>> clock_getres is accurate. In the existing code we have: >>> >>> elapsed_counter -> elapsed time in microseconds >>> elapsed_frequency -> 1000 * 1000 (i.e. micros per second) >>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds >>> >>> Now we have: >>> >>> elapsed_counter -> elapsed time in nanoseconds >>> elapsed_frequency -> 1x10^9 / whatever clock_getres says >>> elapsed_time -> counter/frequency -> ??? >>> >>> So elapsed_time is not, in general, going to give the elapsed time in >>> seconds. And elapsed_time is not dependent on the "frequency" at all >>> because elapsed_counter is not reporting ticks but an actual elapsed >>> "time" in nanoseconds. >>> >>> >>> Also note that we have constants for: >>> >>> NANOSECS_PER_SEC >>> NANOSECS_PER_MILLISEC >>> >>> to aid with time conversions. >>> >>> The linux webrev contains unrelated UseLargePages changes!
>> >> Sorry for the mess with UseLargePages changes :/ >> >> I've fixed the problems with the frequency (using a fixed number as >> before) and I kept the changes to minimum. >> >> I was hesitating about changing the elapsed_counter precision from >> microseconds to nanoseconds but since solaris and windows versions >> already use nanosecond ticks for elapsed_counter I think the change is >> safe. >> >> The update webrev: >> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 >> >>> >>> >>> David >>> ----- >>> >>> >>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote: >>>> On 10.10.2013 13:15, Staffan Larsen wrote: >>>>> >>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik >>>>> wrote: >>>>> >>>>>> On 10.10.2013 05:44, David Holmes wrote: >>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>>>>>>> >>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>>>>>>> There is now an awful amount of different timestamps in the >>>>>>>>>> Management class. Can they be consolidated somehow? At least >>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>>>>>>> >>>>>>>>>> This discussion also implies that the "elapsed time" we print in >>>>>>>>>> the >>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>>>>>>>> os::elapsed_counter(). >>>>>>>>>> >>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>>>>>>> (class VMUptimeDCmd) which also relies on os::elapsed_counter(). >>>>>>>>> >>>>>>>>> Also the reported GC pauses duration might be wrong since it uses >>>>>>>>> Management::timestamp(). >>>>>>>>> >>>>>>>>> On the first sight the change looks rather trivial. But, honestly, >>>>>>>>> I'm not sure which other parts could for whatever reason break >>>>>>>>> once >>>>>>>>> the time-of-day timestamp is replaced with a monotonic equivalent. >>>>>>>>> One would think that it shouldn't matter but one never knows ... 
>>>>>>>>> >>>>>>>>> Staffan, do you think this kind of change is suitable for the >>>>>>>>> current >>>>>>>>> phase of JDK release cycle? I think I could improve the patch in >>>>>>>>> few >>>>>>>>> days and then it should probably be able to pass the review before >>>>>>>>> ZBB. But, it's only P3 ... >>>>>>>> >>>>>>>> I think it is a bit late in the release cycle to clean this up in >>>>>>>> the >>>>>>>> way it should be cleaned up. I think we should wait until the >>>>>>>> first 8 >>>>>>>> update release and do a more thorough job than we have time for >>>>>>>> right >>>>>>>> now. >>>>>>> >>>>>>> I second that. The elapsed_counter/elapsed_timer APIs and >>>>>>> implementations are a tangled mess. But part of the problem has been >>>>>>> that people want/expect monotonic time-of-day based timestamps >>>>>>> (yes a >>>>>>> contradiction - though some people make sure TOD does not get >>>>>>> modified >>>>>>> on their production systems). The use of timestamps in logging has >>>>>>> to be >>>>>>> examined carefully - mainly GC logging. I recall a "simple" >>>>>>> attempted >>>>>>> change in the past that resulted in trying to compare a nanoTime >>>>>>> based >>>>>>> timestamp with the TOD. :( >>>>>> >>>>>> Actually, if I'm reading the sources right for Solaris and Win the >>>>>> monotonic clock source is used to provide elapsed_counter() value. It >>>>>> falls back to TOD when the monotonic clock source is not available. >>>>>> For Linux/BSD the TOD is used directly. >>>>>> >>>>>> This makes me wonder if changing the linux/bsd implementation to >>>>>> follow the same logic would be really that disruptive. >>>>> >>>>> Good point. I would like a world where elapsed_counter is monotonic >>>>> (where possible). Still a bit scary this late in the release, but an >>>>> interesting experiment. >>>> >>>> The change is rather simple and tests ok. All the means to get a >>>> monotonic timestamp are already there and proved to work. The core >>>> tests >>>> in JPRT went fine.
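The "monotonic clock where available, time-of-day otherwise" logic described above might look roughly like the following on a POSIX system. This is a sketch under assumptions, not the code from any of the webrevs: counter_nanos_sketch and its used_monotonic out-parameter are hypothetical, and only the standard clock_gettime() and gettimeofday() calls are relied upon.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/time.h>
#include <time.h>

// Illustrative only: returns nanoseconds from a monotonic clock when one is
// available, otherwise from gettimeofday(). The optional out-parameter
// reports which source was used. Not the actual os_linux.cpp / os_bsd.cpp code.
inline int64_t counter_nanos_sketch(bool* used_monotonic) {
#ifdef CLOCK_MONOTONIC
  struct timespec ts;
  if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
    if (used_monotonic != NULL) *used_monotonic = true;
    return (int64_t) ts.tv_sec * 1000000000LL + ts.tv_nsec;
  }
#endif
  // Fallback: time of day, which can jump when the system clock is adjusted.
  struct timeval tv;
  int rc = gettimeofday(&tv, NULL);
  assert(rc == 0);
  (void) rc;  // avoid an unused-variable warning when asserts are disabled
  if (used_monotonic != NULL) *used_monotonic = false;
  return (int64_t) tv.tv_sec * 1000000000LL + (int64_t) tv.tv_usec * 1000;
}
```

Both branches return nanoseconds, so a caller can keep a fixed frequency of 10^9 ticks per second whichever source is used.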
>>>> >>>> The updated webrev is at >>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02 >>>> >>>> -JB- >>>> >>>>> >>>>> /Staffan >>>>> >>>>> >>>>> >>>>> >>>>>> >>>>>> -JB- >>>>>>> >>>>>>> David >>>>>>> ----- >>>>>>> >>>>>>>> /Staffan >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>>> >>>>>>>>>> /Staffan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>>>>>>> Jaroslav, >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is >>>>>>>>>>>>>>> based on >>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>>>>>>> changes of the >>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). >>>>>>>>>>>>>>> The >>>>>>>>>>>>>>> uptime >>>>>>>>>>>>>>> should not depend on the system time and should be >>>>>>>>>>>>>>> calculated >>>>>>>>>>>>>>> using a >>>>>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in >>>>>>>>>>>>>>> ticks. >>>>>>>>>>>>>>> It is >>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>>>>>>> convertible to >>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>>>>>>> making it >>>>>>>>>>>>>>> very >>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using >>>>>>>>>>>>>> os::elapsed_counter() which on Linux uses gettimeofday so is >>>>>>>>>>>>>> not a >>>>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>>> >>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>> >>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used >>>>>>>>>>>>> for >>>>>>>>>>>>> timestamping except for the historical one? os::javaTimeNanos() >>>>>>>>>>>>> uses >>>>>>>>>>>>> a monotonic clock when available - why can't the same be used for >>>>>>>>>>>>> os::elapsed_counter() especially when a counter based on >>>>>>>>>>>>> "gettimeofday" >>>>>>>>>>>>> is not really a counter? >>>>>>>>>>>> >>>>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers >>>>>>>>>>>> make me >>>>>>>>>>>> cringe. But changing it has a lot of potential consequences >>>>>>>>>>>> because of >>>>>>>>>>>> the way these are used in logging etc. Certainly not something >>>>>>>>>>>> to be >>>>>>>>>>>> contemplated at this stage of JDK 8. >>>>>>>>>>>> >>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that >>>>>>>>>>>> can then >>>>>>>>>>>> be used for the uptime. >>>>>>>>>>> >>>>>>>>>>> My attempt at this is at >>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot >>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where >>>>>>>>>>> possible. >>>>>>>>>>> >>>>>>>>>>> The JDK part stays the same as for webrev.00 >>>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>>> -JB- >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> uptime in >>>>>>>>>>>>>>> milliseconds is added in >>>>>>>>>>>>>>> src/share/vm/services/management.cpp.
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary JNI >>>>>>>>>>>>>>> bridging >>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the same >>>>>>>>>>>>>>> constant >>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers files in >>>>>>>>>>>>>>> order to >>>>>>>>>>>>>>> properly build the native library. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -JB- >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>> >>>>> >>>> >> From david.holmes at oracle.com Thu Oct 17 04:07:36 2013 From: david.holmes at oracle.com (David Holmes) Date: Thu, 17 Oct 2013 21:07:36 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525FB79F.7070101@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com> <525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com> <525FB79F.7070101@oracle.com> Message-ID: <525FC4F8.1020004@oracle.com> On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote: > Hi David, > > On 17.10.2013 04:26, David Holmes wrote: >> Hi Jaroslav, >> >> Minor nit: os::elapsed_time should really be defined in terms of the >> other functions ie: >> >> return ((double) os::elapsed_counter()) / os::elapsed_frequency(); > > Ok. I've changed it. It better communicates the way the elapsedTime is > calculated anyway. 
> >> >> I also prefer the cast above as it is very clear that we will be doing a >> floating-point division. >> >> Aside: AFAICS os::elapsed_time() is never actually used ?? > > Actually, it is os::elapsedTime() and this one is used quite a lot. The > "elapsed_time()" form is used only in bytecodeHistogram.hpp, > parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared. AFAICS they all define their own elapsed_time() functions they don't use os::elapsed_time(). >> >> I agree that it appears that changing the frequency should be okay. > > Thanks for the feedback. > > Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 That should be .04 version :) Looks okay. Thanks, David
From david.holmes at oracle.com Thu Oct 17 04:13:49 2013 From: david.holmes at oracle.com (David Holmes) Date: Thu, 17 Oct 2013 21:13:49 +1000 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525FC4F8.1020004@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com> <525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com> <525FB79F.7070101@oracle.com> <525FC4F8.1020004@oracle.com> Message-ID: <525FC66D.9040602@oracle.com> On 17/10/2013 9:07 PM, David Holmes wrote: > On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote: >> Hi David, >> >> On 17.10.2013 04:26, David Holmes wrote: >>> Hi Jaroslav, >>> >>> Minor nit: os::elapsed_time should really be defined in terms of the >>> other functions ie: >>> >>> return ((double) os::elapsed_counter()) /
os::elapsed_frequency(); >> >> Ok. I've changed it. It better communicates the way the elapsedTime is >> calculated anyway. >> >>> >>> I also prefer the cast above as it is very clear that we will be doing a >>> floating-point division. >>> >>> Aside: AFAICS os::elapsed_time() is never actually used ?? >> >> Actually, it is os::elapsedTime() and this one is used quite a lot. The >> "elapsed_time()" form is used only in bytecodeHistogram.hpp, >> parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared. > > AFAICS they all define their own elapsed_time() functions they don't use > os::elapsed_time(). Ooops! I mis-grepped. It is os::elapsedTime not os::elapsed_time Nothing like inconsistent naming :( David -----
From jaroslav.bachorik at oracle.com Thu Oct 17 05:09:40 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 17 Oct 2013 14:09:40 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525FC4F8.1020004@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com> <525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com> <525FB79F.7070101@oracle.com> <525FC4F8.1020004@oracle.com> Message-ID: <525FD384.4010904@oracle.com> On 17.10.2013 13:07, David Holmes wrote: > On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote: >> Hi David, >> >> On 17.10.2013 04:26, David Holmes wrote: >>> Hi Jaroslav, >>> >>> Minor nit: os::elapsed_time should really be defined in terms of the >>> other functions ie: >>> >>> return ((double) os::elapsed_counter()) / os::elapsed_frequency(); >> >> Ok. I've changed it. It better communicates the way the elapsedTime is >> calculated anyway. >> >>> >>> I also prefer the cast above as it is very clear that we will be doing a >>> floating-point division. >>> >>> Aside: AFAICS os::elapsed_time() is never actually used ?? >> >> Actually, it is os::elapsedTime() and this one is used quite a lot.
The >> "elapsed_time()" form is used only in bytecodeHistogram.hpp, >> parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared. > > AFAICS they all define their own elapsed_time() functions they don't use > os::elapsed_time(). > >>> >>> I agree that it appears that changing the frequency should be okay. >> >> Thanks for the feedback. >> >> Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 > > That should be .04 version :) Yep :( copy-paste ... http://cr.openjdk.java.net/~jbachorik/6523160/webrev.04 > > Looks okay. > > Thanks, > David > >> -JB- >> >>> >>> Thanks, >>> David >>> >>> On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote: >>>> On 15.10.2013 08:49, David Holmes wrote: >>>>> Hi Jaroslav, >>>>> >>>>> os_bsd.cpp / os_linux.cpp: >>>>> >>>>> If you don't have a monotonic clock you leave timer_frequency set >>>>> to 0! >>>>> (So you need to test on a system without a monotonic clock, or else >>>>> force it to act as-if not present.) >>>>> >>>>> That aside I don't trust clock_getres to give values that actually >>>>> allow >>>>> the timer frequency to be determined. As per the comments in >>>>> os_linux.cpp: >>>>> >>>>> // It's fixed in newer kernels, however clock_getres() still returns >>>>> // 1/HZ. We check if clock_getres() works, but will ignore its >>>>> reported >>>>> // resolution for now. Hopefully as people move to new kernels, this >>>>> // won't be a problem. >>>>> >>>>> we don't know what kernels provide real values here and which provide >>>>> dummy ones. >>>>> >>>>> On BSD you haven't modified os::elapsed_counter. >>>>> >>>>> Looking at the linux changes I don't think the logic is correct >>>>> even if >>>>> clock_getres is accurate. 
In the existing code we have: >>>>> >>>>> elapsed_counter -> elapsed time in microseconds >>>>> elapsed_frequency -> 1000 * 1000 (ie micros per second) >>>>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds >>>>> >>>>> Now we have: >>>>> >>>>> elapsed_counter -> elapsed time in nanoseconds >>>>> elapsed_frequency -> 1x10^9 / whatever clock_getres says >>>>> elapsed_time -> counter/frequency -> ??? >>>>> >>>>> So elapsed_time not, in general, going to give the elapsed time in >>>>> seconds. And elapsed_time is not dependent on the "frequency" at all >>>>> because elapsed_counter is not reporting ticks but an actual elapsed >>>>> "time" in nanoseconds. >>>>> >>>>> >>>>> Also note that we constants for: >>>>> >>>>> NANOSECS_PER_SEC >>>>> NANOSECS_PER_MILLISEC >>>>> >>>>> to aid with time conversions. >>>>> >>>>> The linux webrev contains unrelated UseLargePages changes! >>>> >>>> Sorry for the mess with UseLargePages changes :/ >>>> >>>> I've fixed the problems with the frequency (using a fixed number as >>>> before) and I kept the changes to minimum. >>>> >>>> I was hesitating about changing the elapsed_counter precision from >>>> microseconds to nanoseconds but since solaris and windows versions >>>> already use nanosecond ticks for elapsed_counter I think the change is >>>> safe. >>>> >>>> The update webrev: >>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 >>>> >>>>> >>>>> >>>>> David >>>>> ----- >>>>> >>>>> >>>>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote: >>>>>> On 10.10.2013 13:15, Staffan Larsen wrote: >>>>>>> >>>>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik >>>>>>> wrote: >>>>>>> >>>>>>>> On 10.10.2013 05:44, David Holmes wrote: >>>>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>>>>>>>>> >>>>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>>>>>>>>> There is now an awful amount of different timestamps in the >>>>>>>>>>>> Management class. 
Can they be consolidated somehow? At least >>>>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>>>>>>>>> >>>>>>>>>>>> This discussion also implies that the "elapsed time" we >>>>>>>>>>>> print in >>>>>>>>>>>> the >>>>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>>>>>>>>>> os::elapsed_counter(). >>>>>>>>>>>> >>>>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>>>>>>>>> (class VMUptimeDCmd) which also relies on >>>>>>>>>>>> os::elapsed_counter(). >>>>>>>>>>> >>>>>>>>>>> Also the reported GC pauses duration might be wrong since it >>>>>>>>>>> uses >>>>>>>>>>> Management::timestamp(). >>>>>>>>>>> >>>>>>>>>>> On the first sight the change looks rather trivial. But, >>>>>>>>>>> honestly, >>>>>>>>>>> I'm not sure which other parts could for whatever reason break >>>>>>>>>>> once >>>>>>>>>>> the time-of-day timestamp is replaced with a monotonic >>>>>>>>>>> equivalent. >>>>>>>>>>> One would think that it shouldn't matter but one never knows ... >>>>>>>>>>> >>>>>>>>>>> Staffan, do you think this kind of change is suitable for the >>>>>>>>>>> current >>>>>>>>>>> phase of JDK release cycle? I think I could improve the patch in >>>>>>>>>>> few >>>>>>>>>>> days and then it should probably be able to pass the review >>>>>>>>>>> before >>>>>>>>>>> ZBB. But, it's only P3 ... >>>>>>>>>> >>>>>>>>>> I think it is a bit late in the release cycle to clean this up in >>>>>>>>>> the >>>>>>>>>> way it should be cleaned up. I think we should wait until the >>>>>>>>>> first 8 >>>>>>>>>> update release and do a more thorough job than we have time for >>>>>>>>>> right >>>>>>>>>> now. >>>>>>>>> >>>>>>>>> I second that. The elapsed_counter/elpased_timer APIs and >>>>>>>>> implementations are a tangled mess. 
But part of the problem has >>>>>>>>> been >>>>>>>>> that people want/expect monotonic time-of-day based timestamps >>>>>>>>> (yes a >>>>>>>>> contradiction - though some people make sure TOD does not get >>>>>>>>> modified >>>>>>>>> on their production systems). The use of timestamps in logging has >>>>>>>>> to be >>>>>>>>> examined carefully - mainly GC logging. I recall a "simple" >>>>>>>>> attempted >>>>>>>>> change in the past that resulted in trying to compare a nanoTime >>>>>>>>> based >>>>>>>>> timestamp with the TOD. :( >>>>>>>> >>>>>>>> Actually, if I'm reading the sources right for Solaris and Win the >>>>>>>> monotonic clock source is used to provide elapsed_counter() >>>>>>>> value. It >>>>>>>> falls back to TOD when the monotonic clock source is not available. >>>>>>>> For Linux/BSD the TOD is used directly. >>>>>>>> >>>>>>>> This makes me wonder if changing the linux/bsd implementation to >>>>>>>> follow the same logic would be really that disruptive. >>>>>>> >>>>>>> Good point. I would like a world where elapsed_counter is monotonic >>>>>>> (where possible). Still a bit scary this late in the release, but an >>>>>>> interesting experiment. >>>>>> >>>>>> The change is rather simple and tests ok. All the means to get a >>>>>> monotonic timestamp are already there and proved to work. The core >>>>>> tests >>>>>> in JPRT went fine. 
>>>>>> >>>>>> The updated webrev is at >>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02 >>>>>> >>>>>> -JB- >>>>>> >>>>>>> >>>>>>> /Staffan >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> -JB- >>>>>>>>> >>>>>>>>> David >>>>>>>>> ----- >>>>>>>>> >>>>>>>>>> /Staffan >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -JB- >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> /Staffan >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>>>>>>>>> Jaroslav, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is >>>>>>>>>>>>>>>>> based on >>>>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>>>>>>>>> changes of the >>>>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). >>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>> uptime >>>>>>>>>>>>>>>>> should not depend on the system time and should be >>>>>>>>>>>>>>>>> calculated >>>>>>>>>>>>>>>>> using a >>>>>>>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in >>>>>>>>>>>>>>>>> ticks. >>>>>>>>>>>>>>>>> It is >>>>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>>>>>>>>> convertible to >>>>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>>>>>>>>> making it >>>>>>>>>>>>>>>>> very >>>>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime. 
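The distinction drawn in the quoted proposal above can be shown with a minimal Java sketch. This is not the HotSpot change under review: the class `UptimeSketch` and its method names are hypothetical, and the only assumption is the documented behaviour of `System.currentTimeMillis()` (wall clock, follows changes of the OS time) and `System.nanoTime()` (intended for measuring elapsed time).

```java
public class UptimeSketch {
    // Wall-clock start: shifts whenever the OS time is adjusted (NTP step, manual change).
    private final long startWallMillis = System.currentTimeMillis();
    // Elapsed-time start: only meaningful as a difference between two readings.
    private final long startNanos = System.nanoTime();

    /** Uptime derived from the wall clock; may turn negative if the OS clock is set back. */
    public long wallClockUptimeMillis() {
        return System.currentTimeMillis() - startWallMillis;
    }

    /** Uptime derived from the elapsed-time source, converted from nanoseconds to milliseconds. */
    public long monotonicUptimeMillis() {
        return (System.nanoTime() - startNanos) / 1_000_000L;
    }

    public static void main(String[] args) throws InterruptedException {
        UptimeSketch sketch = new UptimeSketch();
        Thread.sleep(50);
        System.out.println("wall-clock uptime ms: " + sketch.wallClockUptimeMillis());
        System.out.println("monotonic uptime ms:  " + sketch.monotonicUptimeMillis());
    }
}
```

Both values normally agree; they diverge only when the system time is changed while the process is running, which is the situation described in the bug.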
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using os::elapsed_counter() which on Linux uses gettimeofday so is not a monotonic clock source.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used for timestamping except for the historical one? os::javaTimeNanos() uses monotonic clock when available - why can't the same be used for os::elapsed_counter() especially when a counter based on "gettimeofday" is not really a counter?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers make me cringe. But changing it has a lot of potential consequences because of the way these are used in logging etc. Certainly not something to be contemplated at this stage of JDK 8.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then be used for the uptime.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My attempt at this is at
>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where possible.
>>>>>>>>>>>>> >>>>>>>>>>>>> The JDK part stays the same as for webrev.00 >>>>>>>>>>>>> >>>>>>>>>>>>> -JB- >>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> >>>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain >>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>> uptime in >>>>>>>>>>>>>>>>> milliseconds is added in >>>>>>>>>>>>>>>>> src/share/vm/services/management.cpp. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary >>>>>>>>>>>>>>>>> JNI >>>>>>>>>>>>>>>>> bridging >>>>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the >>>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>> constant >>>>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers >>>>>>>>>>>>>>>>> files in >>>>>>>>>>>>>>>>> order to >>>>>>>>>>>>>>>>> properly build the native library. 
>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >> From staffan.larsen at oracle.com Fri Oct 18 04:02:29 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Fri, 18 Oct 2013 13:02:29 +0200 Subject: jmx-dev RFR 6523160: RuntimeMXBean.getUptime() returns negative values In-Reply-To: <525FD384.4010904@oracle.com> References: <524BDD9E.1050100@oracle.com> <5253B596.1000206@oracle.com> <5253FC54.4010407@oracle.com> <52547D24.9060806@oracle.com> <52553D63.5000508@oracle.com> <8AF79393-C5A2-4458-AF72-2B90A85F11C3@oracle.com> <52556604.3080900@oracle.com> <5AFB7AC3-43C0-4A48-B716-1434CD7DBA93@oracle.com> <525622B4.5020606@oracle.com> <52568940.4000704@oracle.com> <23435103-156B-434F-994C-B6F913EE0364@oracle.com> <525BFC0D.8090101@oracle.com> <525CE56D.4000708@oracle.com> <525EBBCF.3020303@oracle.com> <525F4AE0.1000406@oracle.com> <525FB79F.7070101@oracle.com> <525FC4F8.1020004@oracle.com> <525FD384.4010904@oracle.com> Message-ID: Looks good to me. Thanks, /Staffan On 17 okt 2013, at 14:09, Jaroslav Bachorik wrote: > On 17.10.2013 13:07, David Holmes wrote: >> On 17/10/2013 8:10 PM, Jaroslav Bachorik wrote: >>> Hi David, >>> >>> On 17.10.2013 04:26, David Holmes wrote: >>>> Hi Jaroslav, >>>> >>>> Minor nit: os::elapsed_time should really be defined in terms of the >>>> other functions ie: >>>> >>>> return ((double) os::elapsed_counter()) / os::elapsed_frequency(); >>> >>> Ok. I've changed it. It better communicates the way the elapsedTime is >>> calculated anyway. >>> >>>> >>>> I also prefer the cast above as it is very clear that we will be doing a >>>> floating-point division. 
>>>> >>>> Aside: AFAICS os::elapsed_time() is never actually used ?? >>> >>> Actually, it is os::elapsedTime() and this one is used quite a lot. The >>> "elapsed_time()" form is used only in bytecodeHistogram.hpp, >>> parNewGeneration.hpp and g1CollectedHeap.hpp, where it is also declared. >> >> AFAICS they all define their own elapsed_time() functions they don't use >> os::elapsed_time(). >> >>>> >>>> I agree that it appears that changing the frequency should be okay. >>> >>> Thanks for the feedback. >>> >>> Updated webrev: http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 >> >> That should be .04 version :) > > Yep :( copy-paste ... > http://cr.openjdk.java.net/~jbachorik/6523160/webrev.04 > >> >> Looks okay. >> >> Thanks, >> David >> >>> -JB- >>> >>>> >>>> Thanks, >>>> David >>>> >>>> On 17/10/2013 2:16 AM, Jaroslav Bachorik wrote: >>>>> On 15.10.2013 08:49, David Holmes wrote: >>>>>> Hi Jaroslav, >>>>>> >>>>>> os_bsd.cpp / os_linux.cpp: >>>>>> >>>>>> If you don't have a monotonic clock you leave timer_frequency set >>>>>> to 0! >>>>>> (So you need to test on a system without a monotonic clock, or else >>>>>> force it to act as-if not present.) >>>>>> >>>>>> That aside I don't trust clock_getres to give values that actually >>>>>> allow >>>>>> the timer frequency to be determined. As per the comments in >>>>>> os_linux.cpp: >>>>>> >>>>>> // It's fixed in newer kernels, however clock_getres() still returns >>>>>> // 1/HZ. We check if clock_getres() works, but will ignore its >>>>>> reported >>>>>> // resolution for now. Hopefully as people move to new kernels, this >>>>>> // won't be a problem. >>>>>> >>>>>> we don't know what kernels provide real values here and which provide >>>>>> dummy ones. >>>>>> >>>>>> On BSD you haven't modified os::elapsed_counter. >>>>>> >>>>>> Looking at the linux changes I don't think the logic is correct >>>>>> even if >>>>>> clock_getres is accurate. 
In the existing code we have: >>>>>> >>>>>> elapsed_counter -> elapsed time in microseconds >>>>>> elapsed_frequency -> 1000 * 1000 (ie micros per second) >>>>>> elapsed_time -> elapsed_counter*0.000001 -> time in seconds >>>>>> >>>>>> Now we have: >>>>>> >>>>>> elapsed_counter -> elapsed time in nanoseconds >>>>>> elapsed_frequency -> 1x10^9 / whatever clock_getres says >>>>>> elapsed_time -> counter/frequency -> ??? >>>>>> >>>>>> So elapsed_time not, in general, going to give the elapsed time in >>>>>> seconds. And elapsed_time is not dependent on the "frequency" at all >>>>>> because elapsed_counter is not reporting ticks but an actual elapsed >>>>>> "time" in nanoseconds. >>>>>> >>>>>> >>>>>> Also note that we constants for: >>>>>> >>>>>> NANOSECS_PER_SEC >>>>>> NANOSECS_PER_MILLISEC >>>>>> >>>>>> to aid with time conversions. >>>>>> >>>>>> The linux webrev contains unrelated UseLargePages changes! >>>>> >>>>> Sorry for the mess with UseLargePages changes :/ >>>>> >>>>> I've fixed the problems with the frequency (using a fixed number as >>>>> before) and I kept the changes to minimum. >>>>> >>>>> I was hesitating about changing the elapsed_counter precision from >>>>> microseconds to nanoseconds but since solaris and windows versions >>>>> already use nanosecond ticks for elapsed_counter I think the change is >>>>> safe. 
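The unit bookkeeping from the quoted review can be written out as a small Java sketch. The names are hypothetical stand-ins for the HotSpot functions discussed, not the actual code; the sketch assumes a counter that reports nanoseconds and therefore a fixed frequency of 10^9 ticks per second, independent of whatever resolution the clock reports.

```java
public class ElapsedSketch {
    // Conversion constants named after the ones mentioned in the review.
    public static final long NANOSECS_PER_SEC = 1_000_000_000L;
    public static final long NANOSECS_PER_MILLISEC = 1_000_000L;

    /** Assumption: the counter is in nanoseconds, so the frequency is a fixed 1e9. */
    public static long elapsedFrequency() {
        return NANOSECS_PER_SEC;
    }

    /** Seconds = counter / frequency; the cast makes the floating-point division explicit. */
    public static double elapsedSeconds(long counter) {
        return ((double) counter) / elapsedFrequency();
    }

    /** Whole milliseconds contained in a nanosecond counter value. */
    public static long elapsedMillis(long counter) {
        return counter / NANOSECS_PER_MILLISEC;
    }

    public static void main(String[] args) {
        long counter = 2_500_000_000L; // 2.5 seconds expressed in nanoseconds
        System.out.println(elapsedSeconds(counter)); // prints 2.5
        System.out.println(elapsedMillis(counter));  // prints 2500
    }
}
```

If the frequency were instead derived from the reported clock resolution, the quotient would only be in seconds when that resolution happened to be exactly one nanosecond, which is the objection raised in the review.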
>>>>> >>>>> The update webrev: >>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.03 >>>>> >>>>>> >>>>>> >>>>>> David >>>>>> ----- >>>>>> >>>>>> >>>>>> On 15/10/2013 12:13 AM, Jaroslav Bachorik wrote: >>>>>>> On 10.10.2013 13:15, Staffan Larsen wrote: >>>>>>>> >>>>>>>> On 10 okt 2013, at 13:02, Jaroslav Bachorik >>>>>>>> wrote: >>>>>>>> >>>>>>>>> On 10.10.2013 05:44, David Holmes wrote: >>>>>>>>>> On 10/10/2013 4:12 AM, Staffan Larsen wrote: >>>>>>>>>>> >>>>>>>>>>> On 9 okt 2013, at 16:19, Jaroslav Bachorik >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> On 9.10.2013 16:10, Staffan Larsen wrote: >>>>>>>>>>>>> There is now an awful amount of different timestamps in the >>>>>>>>>>>>> Management class. Can they be consolidated somehow? At least >>>>>>>>>>>>> _begin_vm_creation_time and the new _begin_vm_creation_ns. >>>>>>>>>>>>> >>>>>>>>>>>>> This discussion also implies that the "elapsed time" we >>>>>>>>>>>>> print in >>>>>>>>>>>>> the >>>>>>>>>>>>> hs_err file is also wrong. It uses os::elapsedTime() which uses >>>>>>>>>>>>> os::elapsed_counter(). >>>>>>>>>>>>> >>>>>>>>>>>>> And I guess the same thing for the VM.uptime Diagnostic Command >>>>>>>>>>>>> (class VMUptimeDCmd) which also relies on >>>>>>>>>>>>> os::elapsed_counter(). >>>>>>>>>>>> >>>>>>>>>>>> Also the reported GC pauses duration might be wrong since it >>>>>>>>>>>> uses >>>>>>>>>>>> Management::timestamp(). >>>>>>>>>>>> >>>>>>>>>>>> On the first sight the change looks rather trivial. But, >>>>>>>>>>>> honestly, >>>>>>>>>>>> I'm not sure which other parts could for whatever reason break >>>>>>>>>>>> once >>>>>>>>>>>> the time-of-day timestamp is replaced with a monotonic >>>>>>>>>>>> equivalent. >>>>>>>>>>>> One would think that it shouldn't matter but one never knows ... >>>>>>>>>>>> >>>>>>>>>>>> Staffan, do you think this kind of change is suitable for the >>>>>>>>>>>> current >>>>>>>>>>>> phase of JDK release cycle? 
I think I could improve the patch in >>>>>>>>>>>> few >>>>>>>>>>>> days and then it should probably be able to pass the review >>>>>>>>>>>> before >>>>>>>>>>>> ZBB. But, it's only P3 ... >>>>>>>>>>> >>>>>>>>>>> I think it is a bit late in the release cycle to clean this up in >>>>>>>>>>> the >>>>>>>>>>> way it should be cleaned up. I think we should wait until the >>>>>>>>>>> first 8 >>>>>>>>>>> update release and do a more thorough job than we have time for >>>>>>>>>>> right >>>>>>>>>>> now. >>>>>>>>>> >>>>>>>>>> I second that. The elapsed_counter/elpased_timer APIs and >>>>>>>>>> implementations are a tangled mess. But part of the problem has >>>>>>>>>> been >>>>>>>>>> that people want/expect monotonic time-of-day based timestamps >>>>>>>>>> (yes a >>>>>>>>>> contradiction - though some people make sure TOD does not get >>>>>>>>>> modified >>>>>>>>>> on their production systems). The use of timestamps in logging has >>>>>>>>>> to be >>>>>>>>>> examined carefully - mainly GC logging. I recall a "simple" >>>>>>>>>> attempted >>>>>>>>>> change in the past that resulted in trying to compare a nanoTime >>>>>>>>>> based >>>>>>>>>> timestamp with the TOD. :( >>>>>>>>> >>>>>>>>> Actually, if I'm reading the sources right for Solaris and Win the >>>>>>>>> monotonic clock source is used to provide elapsed_counter() >>>>>>>>> value. It >>>>>>>>> falls back to TOD when the monotonic clock source is not available. >>>>>>>>> For Linux/BSD the TOD is used directly. >>>>>>>>> >>>>>>>>> This makes me wonder if changing the linux/bsd implementation to >>>>>>>>> follow the same logic would be really that disruptive. >>>>>>>> >>>>>>>> Good point. I would like a world where elapsed_counter is monotonic >>>>>>>> (where possible). Still a bit scary this late in the release, but an >>>>>>>> interesting experiment. >>>>>>> >>>>>>> The change is rather simple and tests ok. All the means to get a >>>>>>> monotonic timestamp are already there and proved to work. 
The core >>>>>>> tests >>>>>>> in JPRT went fine. >>>>>>> >>>>>>> The updated webrev is at >>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.02 >>>>>>> >>>>>>> -JB- >>>>>>> >>>>>>>> >>>>>>>> /Staffan >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> -JB- >>>>>>>>>> >>>>>>>>>> David >>>>>>>>>> ----- >>>>>>>>>> >>>>>>>>>>> /Staffan >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> -JB- >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> /Staffan >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 9 okt 2013, at 13:26, Jaroslav Bachorik >>>>>>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> On 8.10.2013 23:46, David Holmes wrote: >>>>>>>>>>>>>>> On 8/10/2013 10:36 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>>>>> On 8.10.2013 09:34, David Holmes wrote: >>>>>>>>>>>>>>>>> Jaroslav, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 2/10/2013 6:47 PM, Jaroslav Bachorik wrote: >>>>>>>>>>>>>>>>>> Hello, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> currently the JVM uptime reported by the RuntimeMXBean is >>>>>>>>>>>>>>>>>> based on >>>>>>>>>>>>>>>>>> System.currentTimeMillis() which makes it susceptible to >>>>>>>>>>>>>>>>>> changes of the >>>>>>>>>>>>>>>>>> OS time (eg. changing timezone, NTP synchronization etc.). >>>>>>>>>>>>>>>>>> The >>>>>>>>>>>>>>>>>> uptime >>>>>>>>>>>>>>>>>> should not depend on the system time and should be >>>>>>>>>>>>>>>>>> calculated >>>>>>>>>>>>>>>>>> using a >>>>>>>>>>>>>>>>>> monotonic clock source. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> There is already the way to get the actual JVM uptime in >>>>>>>>>>>>>>>>>> ticks. >>>>>>>>>>>>>>>>>> It is >>>>>>>>>>>>>>>>>> accessible as Management::timestamp() and the ticks are >>>>>>>>>>>>>>>>>> convertible to >>>>>>>>>>>>>>>>>> milliseconds using Management::ticks_to_ms(ts_ticks) thus >>>>>>>>>>>>>>>>>> making it >>>>>>>>>>>>>>>>>> very >>>>>>>>>>>>>>>>>> easy to switch to the monotonic clock based uptime. 
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Maybe I'm missing something but TimeStamp updates using os::elapsed_counter() which on Linux uses gettimeofday so is not a monotonic clock source.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hm, yes. I wasn't aware of this linux/bsd specific.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Is there any reason why a non monotonic clock source is used for timestamping except for the historical one? os::javaTimeNanos() uses monotonic clock when available - why can't the same be used for os::elapsed_counter() especially when a counter based on "gettimeofday" is not really a counter?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is all historical. These elapsed_counters and elapsed_timers make me cringe. But changing it has a lot of potential consequences because of the way these are used in logging etc. Certainly not something to be contemplated at this stage of JDK 8.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Perhaps a simpler fix here is to expose a startUpTimeNanos that can then be used for the uptime.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> My attempt at this is at
>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.01/hotspot
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am using os::javaTimeNanos() to get the monotonic ticks where possible.
>>>>>>>>>>>>>> >>>>>>>>>>>>>> The JDK part stays the same as for webrev.00 >>>>>>>>>>>>>> >>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> David >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> David >>>>>>>>>>>>>>>>> ----- >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The patch consists of the hotspot and jdk parts. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For the hotspot a new constant needs to be introduced in >>>>>>>>>>>>>>>>>> src/share/vm/services/jmm.h and the actual logic to obtain >>>>>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> uptime in >>>>>>>>>>>>>>>>>> milliseconds is added in >>>>>>>>>>>>>>>>>> src/share/vm/services/management.cpp. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> For the jdk the changes comprise of adding the necessary >>>>>>>>>>>>>>>>>> JNI >>>>>>>>>>>>>>>>>> bridging >>>>>>>>>>>>>>>>>> methods in order to get the new uptime, introducing the >>>>>>>>>>>>>>>>>> same >>>>>>>>>>>>>>>>>> constant >>>>>>>>>>>>>>>>>> that is used in hotspot and changes to mapfile-vers >>>>>>>>>>>>>>>>>> files in >>>>>>>>>>>>>>>>>> order to >>>>>>>>>>>>>>>>>> properly build the native library. 
>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6523160 >>>>>>>>>>>>>>>>>> Webrev: >>>>>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/6523160/webrev.00 >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> > From staffan.larsen at oracle.com Fri Oct 18 04:05:17 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Fri, 18 Oct 2013 13:05:17 +0200 Subject: jmx-dev [PING] RFR: 8024613 javax/management/remote/mandatory/connection/RMIConnector_NPETest.java failing intermittently In-Reply-To: <525EA65F.9040509@oracle.com> References: <523C3F8B.6080002@oracle.com> <523C459A.3080303@oracle.com> <524BFB87.10808@oracle.com> <525EA65F.9040509@oracle.com> Message-ID: <306F58ED-6630-4AB0-877E-49083631C4C2@oracle.com> Looks good! Thanks, /Staffan On 16 okt 2013, at 16:44, Jaroslav Bachorik wrote: > On 2.10.2013 12:55, Jaroslav Bachorik wrote: >> On 20.9.2013 14:54, shanliang wrote: >>> Jaroslav, >>> >>> It is a good idea to use the RMI Testlibrary. >>> >>> Better to call: >>> agent.close(); >>> >>> at Line 55, close the RMIRegistry (rmid.shutdown(rmidPort) Line 55) >>> does not ensure the JMX connector doing full clean, it is always better >>> to do clean within a test. >> >> Thanks. Implemented. >> >> http://cr.openjdk.java.net/~jbachorik/8024613/webrev.01 >> >> -JB- >> >>> >>> Shanliang >>> >>> >>> Jaroslav Bachorik wrote: >>>> Please, review the following change for JDK-8024613 >>>> >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8024613 >>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8024613/webrev.00/ >>>> >>>> >>>> The patch takes care of intermittent test failures caused by timing >>>> issues when starting the RMID process. It could happen that the RMID >>>> process hasn't been properly initialized in the timeframe of 5 seconds >>>> and the test would fail. 
>>>> >>>> The patch replaces the home-brewed RMID process management with the >>>> one available in the RMI Testlibrary which is used by more tests and >>>> therefore should be more stable. >>>> >>>> Thanks, >>>> >>>> -JB- >>> >> > From staffan.larsen at oracle.com Fri Oct 18 04:09:07 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Fri, 18 Oct 2013 13:09:07 +0200 Subject: jmx-dev RFR 7197919: java/lang/management/ThreadMXBean/ThreadBlockedCount.java has concurency issues In-Reply-To: <525EA03F.2070106@oracle.com> References: <525EA03F.2070106@oracle.com> Message-ID: <5D7A8C19-DB9C-4CE7-B18A-E22C74C0794C@oracle.com> Looks good! Nit: for(int i=0;i<100;i++) should have more spaces: for (int i = 0; i < 100; i++) Thanks, /Staffan On 16 okt 2013, at 16:18, Jaroslav Bachorik wrote: > Please, review this simple test change. > > The test tries to get the number of times a certain thread was blocked during the test run and intermittently fails with the difference of 1 - the expected number is 4 but the reported number is 3. > > When updating the thread statistics (the blocked count in this case) no lock is used so there might be stale data when the ThreadMXBean retrieves the stats. The patch tries to workaround this problem by retrying a few times with the added delay. The test will try to obtain the correct result for at most 10 seconds - after that it will fail if the retrieved blocked count does not equal the expected blocked count. 
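The retry-with-delay workaround described in the quoted message above could look roughly like the following Java sketch. It is an illustration under stated assumptions, not the code from the webrev: the class `RetrySketch`, the helper `pollUntilEquals` and its parameters are hypothetical, and the counter in `main` merely simulates a statistic that becomes visible late.

```java
import java.util.function.LongSupplier;

public class RetrySketch {
    /**
     * Re-reads 'actual' until it equals 'expected' or 'timeoutMillis' has elapsed,
     * sleeping 'delayMillis' between attempts. Returns the last observed value so
     * the caller can compare it with the expected one and report a failure.
     */
    public static long pollUntilEquals(LongSupplier actual, long expected,
                                       long timeoutMillis, long delayMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        long observed = actual.getAsLong();
        while (observed != expected && System.nanoTime() - deadline < 0) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Fail fast: restore the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return observed;
            }
            observed = actual.getAsLong();
        }
        return observed;
    }

    public static void main(String[] args) {
        // Simulated stale statistic: reports 3 on the first two reads, then 4.
        long[] reads = {0};
        LongSupplier blockedCount = () -> ++reads[0] < 3 ? 3 : 4;
        long result = pollUntilEquals(blockedCount, 4, 10_000, 100);
        System.out.println(result == 4 ? "passed" : "failed, got " + result);
    }
}
```

The deadline is computed from `System.nanoTime()` so that the 10 second budget is not affected by changes of the system time.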
> > Issue : https://bugs.openjdk.java.net/browse/JDK-7197919 > Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00 > > Thanks, > > -JB- From shanliang.jiang at oracle.com Fri Oct 18 07:57:42 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Fri, 18 Oct 2013 16:57:42 +0200 Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report some issue in com.sun.jmx.snmp package In-Reply-To: <525FB3AB.3040105@oracle.com> References: <508E8F79.60909@oracle.com> <525E9B87.9050406@oracle.com> <525FB3AB.3040105@oracle.com> Message-ID: <52614C66.6040507@oracle.com> Thanks Paul and Daniel for the review. Shanliang Daniel Fuchs wrote: > Hi Shanliang, > > Looks good! > > -- daniel > > On 10/16/13 3:58 PM, shanliang wrote: >> Hi, >> >> Please review the following fix, main issue here is that we should clone >> an internal variable before returning. >> >> webrev: >> http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/ >> >> bug >> https://bugs.openjdk.java.net/browse/JDK-8026028 >> >> Thanks, >> Shanliang >> >> >> >> > From mandy.chung at oracle.com Fri Oct 18 09:42:32 2013 From: mandy.chung at oracle.com (Mandy Chung) Date: Fri, 18 Oct 2013 09:42:32 -0700 Subject: jmx-dev RFR 7197919: java/lang/management/ThreadMXBean/ThreadBlockedCount.java has concurency issues In-Reply-To: <525EA03F.2070106@oracle.com> References: <525EA03F.2070106@oracle.com> Message-ID: <526164F8.1070501@oracle.com> On 10/16/2013 7:18 AM, Jaroslav Bachorik wrote: > Please, review this simple test change. > > The test tries to get the number of times a certain thread was blocked > during the test run and intermittently fails with the difference of 1 > - the expected number is 4 but the reported number is 3. > > When updating the thread statistics (the blocked count in this case) > no lock is used so there might be stale data when the ThreadMXBean > retrieves the stats. The patch tries to workaround this problem by > retrying a few times with the added delay. 
The test will try to obtain > the correct result for at most 10 seconds - after that it will fail if > the retrieved blocked count does not equal the expected blocked count. > > Issue : https://bugs.openjdk.java.net/browse/JDK-7197919 > Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00 Looks okay. I notice that existing code that catches InterruptedException only sets testFailed to true but continue. I think it might be good to fix them to return if IE is caught to fail-fast like what your fix does. Mandy From jaroslav.bachorik at oracle.com Mon Oct 21 01:47:49 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 21 Oct 2013 10:47:49 +0200 Subject: jmx-dev RFR 7197919: java/lang/management/ThreadMXBean/ThreadBlockedCount.java has concurency issues In-Reply-To: <526164F8.1070501@oracle.com> References: <525EA03F.2070106@oracle.com> <526164F8.1070501@oracle.com> Message-ID: <5264EA35.2070605@oracle.com> On 18.10.2013 18:42, Mandy Chung wrote: > On 10/16/2013 7:18 AM, Jaroslav Bachorik wrote: >> Please, review this simple test change. >> >> The test tries to get the number of times a certain thread was blocked >> during the test run and intermittently fails with the difference of 1 >> - the expected number is 4 but the reported number is 3. >> >> When updating the thread statistics (the blocked count in this case) >> no lock is used so there might be stale data when the ThreadMXBean >> retrieves the stats. The patch tries to workaround this problem by >> retrying a few times with the added delay. The test will try to obtain >> the correct result for at most 10 seconds - after that it will fail if >> the retrieved blocked count does not equal the expected blocked count. >> >> Issue : https://bugs.openjdk.java.net/browse/JDK-7197919 >> Webrev: http://cr.openjdk.java.net/~jbachorik/7197919/webrev.00 > > Looks okay. I notice that existing code that catches > InterruptedException only sets testFailed to true but continue. 
I think > it might be good to fix them to return if IE is caught to fail-fast like > what your fix does. Unfortunately, it's not possible to directly return in those cases. The synchronization logic relies on the code passing through all the "signal"/"waitForSignal" pairs for the test to finish - otherwise the test might just hang. I have at least added loop breaks to fail a bit faster in case of IE. -JB- > > Mandy From jaroslav.bachorik at oracle.com Mon Oct 21 04:03:04 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 21 Oct 2013 13:03:04 +0200 Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails intermittently Message-ID: <526509E8.4030002@oracle.com> Hi, please, review the following small test change: Issue: https://bugs.openjdk.java.net/browse/JDK-7140929 Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00 The test fails intermittently, mostly when it is run with -Xcomp option. The failure is due to fixed timeout used in the test when waiting for the notifications arrival. Tests of such slow configurations are run with "timeoutfactor" set but the NotSerializableNotifTest does not respect the timeoutfactor. The patch allows the test to reflect the provided "timeoutfactor" and therefore successfully pass even when -Xcomp is used. Thanks, -JB- From Alan.Bateman at oracle.com Mon Oct 21 04:20:17 2013 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Mon, 21 Oct 2013 12:20:17 +0100 Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails intermittently In-Reply-To: <526509E8.4030002@oracle.com> References: <526509E8.4030002@oracle.com> Message-ID: <52650DF1.2050008@oracle.com> On 21/10/2013 12:03, Jaroslav Bachorik wrote: > Hi, > > please, review the following small test change: > > Issue: https://bugs.openjdk.java.net/browse/JDK-7140929 > Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00 > > The test fails intermittently, mostly when it is run with -Xcomp > option. 
The failure is due to fixed timeout used in the test when > waiting for the notifications arrival. Tests of such slow > configurations are run with "timeoutfactor" set but the > NotSerializableNotifTest does not respect the timeoutfactor. > > The patch allows the test to reflect the provided "timeoutfactor" and > therefore successfully pass even when -Xcomp is used. Good to see test.timeout.factor being used (I think a lot of tests could benefit from using it). The change in the webrev looks okay, in the sense that you have scaled the existing 10s timeout. -Alan. From shanliang.jiang at oracle.com Mon Oct 21 04:45:53 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Mon, 21 Oct 2013 13:45:53 +0200 Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails intermittently In-Reply-To: <526509E8.4030002@oracle.com> References: <526509E8.4030002@oracle.com> Message-ID: <526513F1.4080700@oracle.com> Jaroslav, Look fine to me, thanks to fix the timing. Next time we may need to fix its fixed port:) Shanliang Jaroslav Bachorik wrote: > Hi, > > please, review the following small test change: > > Issue: https://bugs.openjdk.java.net/browse/JDK-7140929 > Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00 > > The test fails intermittently, mostly when it is run with -Xcomp > option. The failure is due to fixed timeout used in the test when > waiting for the notifications arrival. Tests of such slow > configurations are run with "timeoutfactor" set but the > NotSerializableNotifTest does not respect the timeoutfactor. > > The patch allows the test to reflect the provided "timeoutfactor" and > therefore successfully pass even when -Xcomp is used. 
> > Thanks, > > -JB- From jaroslav.bachorik at oracle.com Mon Oct 21 04:55:50 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 21 Oct 2013 13:55:50 +0200 Subject: jmx-dev RFR 6309226: TEST: java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't check Thread.sleep Message-ID: <52651646.4050705@oracle.com> Please, review this small patch for a test failing due to the updated implementation in JDK6. Issue: https://bugs.openjdk.java.net/browse/JDK-6309226 Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/ The test fails due to the change in mustang where ThreadMXBean.getThreadInfo().getWaitedTime() and ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep() too. Unfortunately, Thread.sleep() is used throughout the test for synchronization purposes and this breaks the test. In the patch I propose to replace Thread.sleep() with busy wait and hinting the scheduler by Thread.yield(). While not very elegant it successfully works around inclusion of unknown number of Thread.sleep()s (they are called in loop). Thanks, -JB- From paul.sandoz at oracle.com Thu Oct 17 03:28:14 2013 From: paul.sandoz at oracle.com (Paul Sandoz) Date: Thu, 17 Oct 2013 12:28:14 +0200 Subject: jmx-dev Codereview request: 8026028 [findbugs] findbugs report some issue in com.sun.jmx.snmp package In-Reply-To: <525E9B87.9050406@oracle.com> References: <508E8F79.60909@oracle.com> <525E9B87.9050406@oracle.com> Message-ID: On Oct 16, 2013, at 3:58 PM, shanliang wrote: > Hi, > > Please review the following fix, main issue here is that we should clone an internal variable before returning. > > webrev: > http://cr.openjdk.java.net/~sjiang/JDK-8026028/00/ > > bug > https://bugs.openjdk.java.net/browse/JDK-8026028 > +1. Paul.
From jaroslav.bachorik at oracle.com Mon Oct 21 07:46:48 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Mon, 21 Oct 2013 16:46:48 +0200 Subject: jmx-dev RFR 7112404: 2 tests in java/lang/management/ManagementFactory fails with G1 because expect non-zero pools Message-ID: <52653E58.9070508@oracle.com> Please, review this simple test fix. Issue: https://bugs.openjdk.java.net/browse/JDK-7112404 Webrev: http://cr.openjdk.java.net/~jbachorik/7112404/webrev.00 The tests assume the MemoryUsage#commited values to be positive (>0) while according to the MemoryUsage only negative values are invalid. Therefore the tests should check and fail only when MemoryUsage#commited is < 0. Thanks, -JB- From daniel.fuchs at oracle.com Mon Oct 21 07:56:07 2013 From: daniel.fuchs at oracle.com (Daniel Fuchs) Date: Mon, 21 Oct 2013 16:56:07 +0200 Subject: jmx-dev RFR 7140929: NotSerializableNotifTest.java fails intermittently In-Reply-To: <526509E8.4030002@oracle.com> References: <526509E8.4030002@oracle.com> Message-ID: <52654087.4020905@oracle.com> On 10/21/13 1:03 PM, Jaroslav Bachorik wrote: > Hi, > > please, review the following small test change: > > Issue: https://bugs.openjdk.java.net/browse/JDK-7140929 > Webrev: http://cr.openjdk.java.net/~jbachorik/7140929/webrev.00 > > The test fails intermittently, mostly when it is run with -Xcomp option. > The failure is due to fixed timeout used in the test when waiting for > the notifications arrival. Tests of such slow configurations are run > with "timeoutfactor" set but the NotSerializableNotifTest does not > respect the timeoutfactor. > > The patch allows the test to reflect the provided "timeoutfactor" and > therefore successfully pass even when -Xcomp is used.
> > Thanks, > > -JB- Hi Jaroslav, Looks good to me. I didn't know timeoutFactor was available as a system property. You can probably simplify the code like this: private static double timeoutFactor; ... main(...) { ... timeoutFactor = Double.parseDouble( System.getProperty("test.timeout.factor", "1.0") ); } (no need for the timeoutVal variable) regards, -- daniel From shanliang.jiang at oracle.com Mon Oct 21 08:14:18 2013 From: shanliang.jiang at oracle.com (shanliang) Date: Mon, 21 Oct 2013 17:14:18 +0200 Subject: jmx-dev RFR 7112404: 2 tests in java/lang/management/ManagementFactory fails with G1 because expect non-zero pools In-Reply-To: <52653E58.9070508@oracle.com> References: <52653E58.9070508@oracle.com> Message-ID: <526544CA.1030601@oracle.com> Looks OK. 164 // sanity check to have non-zero usage should be changed to ? 164 // sanity check to have non-negative usage Shanliang Jaroslav Bachorik wrote: > Please, review this simple test fix. > > Issue: https://bugs.openjdk.java.net/browse/JDK-7112404 > Webrev: http://cr.openjdk.java.net/~jbachorik/7112404/webrev.00 > > The tests assume the MemoryUsage#commited values to be positive (>0) > while according to the MemoryUsage only negative values are invalid. > Therefore the tests should check and fail only when > MemoryUsage#commited is < 0. > > Thanks, > > -JB- From david.holmes at oracle.com Tue Oct 22 00:58:21 2013 From: david.holmes at oracle.com (David Holmes) Date: Tue, 22 Oct 2013 17:58:21 +1000 Subject: jmx-dev RFR 6309226: TEST: java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't check Thread.sleep In-Reply-To: <52651646.4050705@oracle.com> References: <52651646.4050705@oracle.com> Message-ID: <5266301D.5040002@oracle.com> On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote: > Please, review this small patch for a test failing due to the updated > implementation in JDK6.
> > Issue: https://bugs.openjdk.java.net/browse/JDK-6309226 > Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/ > > The test fails due to the change in mustang where > ThreadMXBean.getThreadInfo().getWaitedTime() and > ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep() > too. Unfortunately, Thread.sleep() is used throughout the test for > synchronization purposes and this breaks the test. > > In the patch I propose to replace Thread.sleep() with busy wait and > hinting the scheduler by Thread.yield(). While not very elegant it > successfully works around inclusion of unknown number of Thread.sleep()s > (they are called in loop). Not elegant and not completely reliable either. Probably adequate on a multi-core system but single-core and with some schedulers it could just be a busy spin. David > Thanks, > > -JB- From jaroslav.bachorik at oracle.com Tue Oct 22 04:03:48 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 22 Oct 2013 13:03:48 +0200 Subject: jmx-dev RFR 6309226: TEST: java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't check Thread.sleep In-Reply-To: <5266301D.5040002@oracle.com> References: <52651646.4050705@oracle.com> <5266301D.5040002@oracle.com> Message-ID: <52665B94.8090902@oracle.com> On 22.10.2013 09:58, David Holmes wrote: > On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote: >> Please, review this small patch for a test failing due to the updated >> implementation in JDK6. >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-6309226 >> Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/ >> >> The test fails due to the change in mustang where >> ThreadMXBean.getThreadInfo().getWaitedTime() and >> ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep() >> too. Unfortunately, Thread.sleep() is used throughout the test for >> synchronization purposes and this breaks the test. 
>> >> In the patch I propose to replace Thread.sleep() with busy wait and >> hinting the scheduler by Thread.yield(). While not very elegant it >> successfully works around inclusion of unknown number of Thread.sleep()s >> (they are called in loop). > > Not elegant and not completely reliable either. Probably adequate on a > multi-core system but single-core and with some schedulers it could just > be a busy spin. :/ Ok, so I need to account for the Thread.sleep() calls made outside of the test code but still reported by the ThreadMXBean. Not that elegant either, but at least it should be reliable. http://cr.openjdk.java.net/~jbachorik/6309226/webrev.01 -JB- > > David > >> Thanks, >> >> -JB- From jaroslav.bachorik at oracle.com Tue Oct 22 06:47:41 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 22 Oct 2013 15:47:41 +0200 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool Message-ID: <526681FD.90604@oracle.com> Please, review the following test fix: Issue: https://bugs.openjdk.java.net/browse/JDK-8020467 Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01 The test tries to make sure that the "pool usage threshold" trigger and the reported pool memory usage are not contradicting each other. The problem is that it is not possible to get the "pool usage threshold exceeded" flag and the pool memory usage atomically with regard to the GC. Specifically, when the "CMS Old Gen" pool is examined and the usage is retrieved before a GC promotes some objects to the old gen, but the usage threshold is checked after the GC has promoted some instances into the old gen, the test will fail. The patch makes sure that there are some instances promoted in "CMS Old Gen" before checking the "pool usage threshold" to get a semi-consistent view.
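For illustration, the race described above (usage read before a GC, threshold flag read after it) can be tolerated roughly as follows. This is only a minimal sketch and not the code from the webrev: the class PoolThresholdCheck and its methods contradicts and poolLooksConsistent are hypothetical names. The idea assumed here is to read the usage both before and after reading the flag and to report a contradiction only when both readings fall on the same side of the threshold. It narrows the window but does not close it.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not taken from the webrev.
class PoolThresholdCheck {

    // Pure decision logic: true only if the flag disagrees with BOTH usage
    // readings. If usage crossed the threshold between the two readings
    // (for example because a GC promoted objects), either flag value is
    // plausible and no contradiction is reported.
    static boolean contradicts(long usedBefore, boolean exceeded,
                               long usedAfter, long threshold) {
        boolean aboveBefore = usedBefore >= threshold;
        boolean aboveAfter = usedAfter >= threshold;
        if (aboveBefore != aboveAfter) {
            return false;
        }
        return exceeded != aboveBefore;
    }

    // Applies the check to a real pool, restoring the previous threshold.
    static boolean poolLooksConsistent(MemoryPoolMXBean p, long threshold) {
        if (!p.isUsageThresholdSupported()) {
            return true; // nothing to compare for this pool
        }
        long saved = p.getUsageThreshold();
        p.setUsageThreshold(threshold);
        try {
            MemoryUsage before = p.getUsage();
            boolean exceeded = p.isUsageThresholdExceeded();
            MemoryUsage after = p.getUsage();
            if (before == null || after == null) {
                return true; // pool is not valid, no data to compare
            }
            return !contradicts(before.getUsed(), exceeded,
                                after.getUsed(), threshold);
        } finally {
            p.setUsageThreshold(saved);
        }
    }
}
```

A caller could loop over ManagementFactory.getMemoryPoolMXBeans() and call PoolThresholdCheck.poolLooksConsistent(p, 1) for each pool; whether that is preferable to forcing a promotion first is a judgment call for the test author.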
Thanks, -JB- From mandy.chung at oracle.com Tue Oct 22 12:38:56 2013 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 22 Oct 2013 12:38:56 -0700 Subject: jmx-dev RFR 7112404: 2 tests in java/lang/management/ManagementFactory fails with G1 because expect non-zero pools In-Reply-To: <52653E58.9070508@oracle.com> References: <52653E58.9070508@oracle.com> Message-ID: <5266D450.5050506@oracle.com> On 10/21/13 7:46 AM, Jaroslav Bachorik wrote: > Please, review this simple test fix. > > Issue: https://bugs.openjdk.java.net/browse/JDK-7112404 > Webrev: http://cr.openjdk.java.net/~jbachorik/7112404/webrev.00 > Looks okay to me. Mandy > The tests assume the MemoryUsage#commited values to be positive (>0) > while according to the MemoryUsage only negative values are invalid. > Therefore the tests should check and fail only when > MemoryUsage#commited is < 0. > > Thanks, > > -JB- From mandy.chung at oracle.com Tue Oct 22 13:04:38 2013 From: mandy.chung at oracle.com (Mandy Chung) Date: Tue, 22 Oct 2013 13:04:38 -0700 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <526681FD.90604@oracle.com> References: <526681FD.90604@oracle.com> Message-ID: <5266DA56.6050609@oracle.com> Hi Jaroslav, On 10/22/13 6:47 AM, Jaroslav Bachorik wrote: > Please, review the following test fix: > > Issue: https://bugs.openjdk.java.net/browse/JDK-8020467 > Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01 > Have you considered to force GC when getUsed() == 0 regardless of which memory pool it is? This will avoid special casing for CMS old gen in the test and will handle similar issue in the future for a different collector implementation. To make the test reliable, the test should still pass if the memory pool has no object in it (G1 survivor space case?). 
Mandy > The test tries to make sure that the "pool usage threshold" trigger > and the reported pool memory usage are not contradicting each other. > The problem is that it is not possible to get the "pool usage > threshold exceeded" flag and the pool memory usage atomicly in regard > to the GC. Specifically, when "CMS Old Gen" pool is examined and the > usage is retrieved before a GC promotes some objects to the old gen > but the usage threshold is checked after the GC has promoted some > instance into the old gen the test will fail. > > The patch makes sure that there are some instances promoted in "CMS > Old Gen" before checking the "pool usage threshold" to get > semi-consistent view. > > Thanks, > > -JB- From david.holmes at oracle.com Tue Oct 22 17:40:21 2013 From: david.holmes at oracle.com (David Holmes) Date: Wed, 23 Oct 2013 10:40:21 +1000 Subject: jmx-dev RFR 6309226: TEST: java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't check Thread.sleep In-Reply-To: <52665B94.8090902@oracle.com> References: <52651646.4050705@oracle.com> <5266301D.5040002@oracle.com> <52665B94.8090902@oracle.com> Message-ID: <52671AF5.8050703@oracle.com> On 22/10/2013 9:03 PM, Jaroslav Bachorik wrote: > On 22.10.2013 09:58, David Holmes wrote: >> On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote: >>> Please, review this small patch for a test failing due to the updated >>> implementation in JDK6. >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-6309226 >>> Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/ >>> >>> The test fails due to the change in mustang where >>> ThreadMXBean.getThreadInfo().getWaitedTime() and >>> ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep() >>> too. Unfortunately, Thread.sleep() is used throughout the test for >>> synchronization purposes and this breaks the test. >>> >>> In the patch I propose to replace Thread.sleep() with busy wait and >>> hinting the scheduler by Thread.yield(). 
While not very elegant it >>> successfully works around inclusion of unknown number of Thread.sleep()s >>> (they are called in loop). >> >> Not elegant and not completely reliable either. Probably adequate on a >> multi-core system but single-core and with some schedulers it could just >> be a busy spin. > > :/ Ok, so I need to account for the Thread.sleep() calls made outside of > the test code but still reported by the ThreadMXBean. Not that elegant, > too, but at least should be reliable. > > http://cr.openjdk.java.net/~jbachorik/6309226/webrev.01 Sorry, I can't follow the logic of that test enough to determine whether this compensating logic is correct. Whether this is more reliable depends on whether the 5% tolerance in timeRangeCheck is enough to account for all the potential inaccuracies in the time measurements. On a lightly loaded system it may be, but otherwise ... a context switch after determining the base time and doing the sleep could add an arbitrary load and cpu-dependent delay. It might be less reliable than the yield approach :( I can't help wonder whether there is a more explicit synchronization mechanism that will avoid the need for goSleep? But that becomes a much bigger task to deal with. I will leave this for the serviceability team to determine the best course of action. 
Thanks, David > -JB- > >> >> David >> >>> Thanks, >>> >>> -JB- > From jaroslav.bachorik at oracle.com Wed Oct 23 00:42:08 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 23 Oct 2013 09:42:08 +0200 Subject: jmx-dev RFR 6309226: TEST: java/lang/management/ThreadMXBean/SynchronizationStatistics.java didn't check Thread.sleep In-Reply-To: <52671AF5.8050703@oracle.com> References: <52651646.4050705@oracle.com> <5266301D.5040002@oracle.com> <52665B94.8090902@oracle.com> <52671AF5.8050703@oracle.com> Message-ID: <52677DD0.7000808@oracle.com> On 23.10.2013 02:40, David Holmes wrote: > On 22/10/2013 9:03 PM, Jaroslav Bachorik wrote: >> On 22.10.2013 09:58, David Holmes wrote: >>> On 21/10/2013 9:55 PM, Jaroslav Bachorik wrote: >>>> Please, review this small patch for a test failing due to the updated >>>> implementation in JDK6. >>>> >>>> Issue: https://bugs.openjdk.java.net/browse/JDK-6309226 >>>> Webrev: http://cr.openjdk.java.net/~jbachorik/6309226/webrev.00/ >>>> >>>> The test fails due to the change in mustang where >>>> ThreadMXBean.getThreadInfo().getWaitedTime() and >>>> ThreadMXBean.getThreadInfo().getWaitedCount() include Thread.sleep() >>>> too. Unfortunately, Thread.sleep() is used throughout the test for >>>> synchronization purposes and this breaks the test. >>>> >>>> In the patch I propose to replace Thread.sleep() with busy wait and >>>> hinting the scheduler by Thread.yield(). While not very elegant it >>>> successfully works around inclusion of unknown number of >>>> Thread.sleep()s >>>> (they are called in loop). >>> >>> Not elegant and not completely reliable either. Probably adequate on a >>> multi-core system but single-core and with some schedulers it could just >>> be a busy spin. >> >> :/ Ok, so I need to account for the Thread.sleep() calls made outside of >> the test code but still reported by the ThreadMXBean. Not that elegant, >> too, but at least should be reliable. 
>> >> http://cr.openjdk.java.net/~jbachorik/6309226/webrev.01 > > Sorry, I can't follow the logic of that test enough to determine whether > this compensating logic is correct. It simply adds the number of times and the time spent in sleeping during calls to goSleep() from the BlockedThread (the one that actually counts). It seems to be correct - otherwise the test would fail because the numbers wouldn't match. > > Whether this is more reliable depends on whether the 5% tolerance in > timeRangeCheck is enough to account for all the potential inaccuracies > in the time measurements. On a lightly loaded system it may be, but > otherwise ... a context switch after determining the base time and doing > the sleep could add an arbitrary load and cpu-dependent delay. It might > be less reliable than the yield approach :( I wonder how would "yield" in busy wait behave on a single core architecture. I need the second thread to progress while busy-waiting ... > > I can't help wonder whether there is a more explicit synchronization > mechanism that will avoid the need for goSleep? But that becomes a much > bigger task to deal with. Yes. The only task of this fix is to enable the test to be run even after Thread.sleep() started to be included in the waited time (sometime in JDK6 timeframe). I suppose the test was successfully used before the change and if there are any problems with timing additional issues will be filed and the test will be redesigned. For now I would like to keep the change simple and really focus on making the test executable on JDK7/8. > > I will leave this for the serviceability team to determine the best > course of action. Thanks for valuable comments, anyway. 
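To make the compensation idea concrete: the sleeps issued through a wrapper can be counted and added to the expected waited count, since ThreadInfo.getWaitedCount() is reported in this thread to include Thread.sleep(). The following is a minimal sketch under that assumption, not the webrev code. The class SleepAccounting and the methods recordedSleeps, reportedWaitedCount and expectedWaitedCount are hypothetical; goSleep only mirrors the name used in the discussion.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch for illustration; not taken from the webrev.
class SleepAccounting {
    private static final AtomicLong sleepCalls = new AtomicLong();

    // Sleeps and records the call so the caller can add it to its
    // expected waited count afterwards.
    static void goSleep(long millis) {
        sleepCalls.incrementAndGet();
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
        }
    }

    // Number of sleeps issued through goSleep() so far.
    static long recordedSleeps() {
        return sleepCalls.get();
    }

    // Waited count the JVM currently reports for the calling thread.
    static long reportedWaitedCount() {
        ThreadInfo info = ManagementFactory.getThreadMXBean()
                .getThreadInfo(Thread.currentThread().getId());
        return info.getWaitedCount();
    }

    // Expected waited count: the waits the test performs on purpose
    // plus the sleeps recorded by goSleep().
    static long expectedWaitedCount(long explicitWaits) {
        return explicitWaits + recordedSleeps();
    }
}
```

The time spent sleeping could be accumulated the same way; that part is subject to the tolerance concerns raised above and is left out of the sketch.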
-JB- > > Thanks, > David > >> -JB- >> >>> >>> David >>> >>>> Thanks, >>>> >>>> -JB- >> From jaroslav.bachorik at oracle.com Wed Oct 23 01:02:08 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 23 Oct 2013 10:02:08 +0200 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <5266DA56.6050609@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> Message-ID: <52678280.1070004@oracle.com> On 22.10.2013 22:04, Mandy Chung wrote: > Hi Jaroslav, > > On 10/22/13 6:47 AM, Jaroslav Bachorik wrote: >> Please, review the following test fix: >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467 >> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01 >> > > Have you considered to force GC when getUsed() == 0 regardless of which > memory pool it is? This will avoid special casing for CMS old gen in > the test and will handle similar issue in the future for a different > collector implementation. To make the test reliable, the test should > still pass if the memory pool has no object in it (G1 survivor space > case?). Hi Mandy, I don't know whether GC will help for other pools - but I can enable it for all pools - it should not hurt. The test should pass even with no object in the monitored pool since the pool should not report an exceeded threshold. -JB- > > Mandy > >> The test tries to make sure that the "pool usage threshold" trigger >> and the reported pool memory usage are not contradicting each other. >> The problem is that it is not possible to get the "pool usage >> threshold exceeded" flag and the pool memory usage atomically in regard >> to the GC. Specifically, when "CMS Old Gen" pool is examined and the >> usage is retrieved before a GC promotes some objects to the old gen >> but the usage threshold is checked after the GC has promoted some >> instance into the old gen the test will fail.
>> >> The patch makes sure that there are some instances promoted in "CMS >> Old Gen" before checking the "pool usage threshold" to get >> semi-consistent view. >> >> Thanks, >> >> -JB- > From staffan.larsen at oracle.com Wed Oct 23 01:08:24 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Wed, 23 Oct 2013 10:08:24 +0200 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <52678280.1070004@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> Message-ID: <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> I think you can simplify the logic for forcing a GC to just a simple call to "System.gc();". AFAIK System.gc() will cause a full collection to happen for all collectors. /Staffan On 23 okt 2013, at 10:02, Jaroslav Bachorik wrote: > On 22.10.2013 22:04, Mandy Chung wrote: >> Hi Jaroslav, >> >> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote: >>> Please, review the following test fix: >>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467 >>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01 >>> >> >> Have you considered to force GC when getUsed() == 0 regardless of which >> memory pool it is? This will avoid special casing for CMS old gen in >> the test and will handle similar issue in the future for a different >> collector implementation. To make the test reliable, the test should >> still pass if the memory pool has no object in it (G1 survivor space >> case?). > > Hi Mandy, > > I don't know whether GC will help for other pools - but I can enable it for all pools - it should not hurt. > > The test should pass even with on object in the monitored pool since the pool should not report an exceeded threshold. > > -JB- > >> >> Mandy >> >>> The test tries to make sure that the "pool usage threshold" trigger >>> and the reported pool memory usage are not contradicting each other. 
>>> The problem is that it is not possible to get the "pool usage >>> threshold exceeded" flag and the pool memory usage atomicly in regard >>> to the GC. Specifically, when "CMS Old Gen" pool is examined and the >>> usage is retrieved before a GC promotes some objects to the old gen >>> but the usage threshold is checked after the GC has promoted some >>> instance into the old gen the test will fail. >>> >>> The patch makes sure that there are some instances promoted in "CMS >>> Old Gen" before checking the "pool usage threshold" to get >>> semi-consistent view. >>> >>> Thanks, >>> >>> -JB- >> > From jaroslav.bachorik at oracle.com Wed Oct 23 01:12:57 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 23 Oct 2013 10:12:57 +0200 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> Message-ID: <52678509.2020002@oracle.com> On 23.10.2013 10:08, Staffan Larsen wrote: > I think you can simplify the logic for forcing a GC to just a simple call to "System.gc();". AFAIK System.gc() will cause a full collection to happen for all collectors. Hm, will it now? I had the impression that it was just hinting the GC system to perform GC but it might decide to ignore it. I need to be sure that the GC was performed before continuing - otherwise I might get inconsistent data again. 
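One way to avoid depending on whether System.gc() is honored is to request the GC and then watch the collectors' counters. The sketch below is an illustration only, with hypothetical names (GcProbe, totalCollections, requestGcAndConfirm); it assumes that an increase in GarbageCollectorMXBean.getCollectionCount() is a good enough signal that some collection ran. It does not prove that a particular pool was collected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch for illustration; not taken from the webrev.
class GcProbe {

    // Sum of collection counts over all collectors. A collector may
    // report -1 when the count is undefined; such values are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Requests a GC and polls until some collector reports a new
    // collection or the timeout elapses. Returns true if a collection
    // was observed, false otherwise (for example when explicit GC is
    // disabled on the command line).
    static boolean requestGcAndConfirm(long timeoutMillis) {
        long before = totalCollections();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        System.gc();
        while (totalCollections() == before) {
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            Thread.yield();
        }
        return true;
    }
}
```

A test could treat a false return as "cannot verify on this configuration" and skip the strict comparison rather than fail.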
-JB-

From staffan.larsen at oracle.com Wed Oct 23 01:18:49 2013
From: staffan.larsen at oracle.com (Staffan Larsen)
Date: Wed, 23 Oct 2013 10:18:49 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52678509.2020002@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com>
Message-ID: <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>

On 23 Oct 2013, at 10:12, Jaroslav Bachorik wrote:

> On 23.10.2013 10:08, Staffan Larsen wrote:
>> I think you can simplify the logic for forcing a GC to just a simple
>> call to "System.gc();". AFAIK System.gc() will cause a full collection
>> to happen for all collectors.
>
> Hm, will it now? I had the impression that it was just hinting the GC
> system to perform GC but it might decide to ignore it. I need to be sure
> that the GC was performed before continuing - otherwise I might get
> inconsistent data again.

According to the spec it's just a hint, but I think the implementation happens to be a force. But better safe than sorry. :)

/Staffan

From bengt.rutisson at oracle.com Wed Oct 23 05:40:13 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 14:40:13 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com>
Message-ID: <5267C3AD.5050306@oracle.com>

Hi Jaroslav,

A couple of questions.

I don't understand why this is a CMS only problem? Why don't the other collectors have the same issue?
I guess it is less likely that the other collectors start (or complete) a GC without a lot of allocation going on. But at least G1 should have the same problem.

Also, from the problem description in the CR I would have guessed that you want the GC to happen between these two statements:

    p.setUsageThreshold(1);
    MemoryUsage u = p.getUsage();

Now you have added the GC just after these statements. I thought that was what caused the problem. That you read the usage data at one point, then a GC happens and you compare the cached usage data to the new data that p.isUsageThresholdExceeded() will fetch.

Looking at the promoteToOldGen() method I assume that the intent is that the code should be using the return value. So my guess is that this code:

    if (p.getName().equals("CMS Old Gen")) {
        promoteToOldGen(p, u);
    }

Should be:

    if (p.getName().equals("CMS Old Gen")) {
        u = promoteToOldGen(p, u);
    }

With that, I think it might work. But I still don't understand why this is only a CMS problem.

One more question about the promoteToOldGen() and forceGC() methods. I don't really know much about how the different beans work, but are we sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the same pool names? That is, are you sure that forceGC() actually will do anything?

As for just doing a System.gc() to force a GC I think you can rely on that System.gc() does a full GC in Hotspot unless someone sets -XX:+DisableExplicitGC on the command line. Considering that you are relying on Hotspot specific names for pools I don't think it is a limitation to the test to rely on the Hotspot implementation of System.gc().

Thanks,
Bengt

From jaroslav.bachorik at oracle.com Wed Oct 23 05:55:43 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 14:55:43 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267C3AD.5050306@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com>
Message-ID: <5267C74F.2010302@oracle.com>

Hi Bengt,

On 23.10.2013 14:40, Bengt Rutisson wrote:
>
> Hi Jaroslav,
>
> A couple of questions.
>
> I don't understand why this is a CMS only problem? Why don't the other
> collectors have the same issue? I guess it is less likely that the other
> collectors start (or complete) a GC without a lot of allocation going
> on. But at least G1 should have the same problem.

I don't really know. If there are other pools that can have the "used" value 0 before a GC happens then yes, they are susceptible to the same problem.

> Also, from the problem description in the CR I would have guessed that
> you want the GC to happen between these two statements:
>
> p.setUsageThreshold(1);
> MemoryUsage u = p.getUsage();

This is all but a heuristic here.
The problem lies in the fact that it is not possible to retrieve the pool usage and the "threshold exceeded" flag consistently in one atomic operation. I might get usable data from the first call and then I don't need to force GC.

> Now you have added the GC just after these statements. I thought that
> was what caused the problem. That you read the usage data at one point,
> then a GC happens and you compare the cached usage
> data to the new data that p.isUsageThresholdExceeded() will fetch.
>
> Looking at the promoteToOldGen() method I assume that the intent is that
> the code should be using the return value. So my guess is that this code:
>
>     if (p.getName().equals("CMS Old Gen")) {
>         promoteToOldGen(p, u);
>     }
>
> Should be:
>
>     if (p.getName().equals("CMS Old Gen")) {
>         u = promoteToOldGen(p, u);
>     }

Indeed. It was meant to re-fetch the usage after GC.

> With that, I think it might work. But I still don't understand why this
> is only a CMS problem.
>
> One more question about the promoteToOldGen() and forceGC() methods. I
> don't really know much about how the different beans work, but are we
> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the same
> pool names? That is, are you sure that forceGC() actually will do anything?

They use the pool names as reported by the GC infrastructure so they should be the same.

> As for just doing a System.gc() to force a GC I think you can rely on
> that System.gc() does a full GC in Hotspot unless someone sets
> -XX:+DisableExplicitGC on the command line. Considering that you are
> relying on Hotspot specific names for pools I don't think it is a
> limitation to the test to rely on the Hotspot implementation of
> System.gc().

Good to know. I guess I could simplify the change and just call System.gc(), after all.
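
The ordering discussed above can be sketched roughly as follows. This is only an illustration and not the test's source: the class ThresholdOrderingSketch and its helper methods are hypothetical names, and it assumes a HotSpot-like JVM on which System.gc() really triggers a collection (that is, -XX:+DisableExplicitGC is not set). It collects first and then reads the usage and the "threshold exceeded" flag back to back. The two reads are still not atomic, so the result is a heuristic, not a guarantee.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical illustration; none of these names come from the actual test.
final class ThresholdOrderingSketch {

    // The flag agrees with the usage snapshot when "used >= threshold" matches
    // "exceeded". The threshold is expected to be positive (0 disables checking).
    static boolean consistent(long used, long threshold, boolean exceeded) {
        return (used >= threshold) == exceeded;
    }

    // Collect first, then read usage and flag back to back (still not atomic).
    static boolean checkPool(MemoryPoolMXBean p, long threshold) {
        long oldThreshold = p.getUsageThreshold();
        p.setUsageThreshold(threshold);
        try {
            System.gc();                       // only a hint according to the spec
            MemoryUsage u = p.getUsage();      // null if the pool is no longer valid
            boolean exceeded = p.isUsageThresholdExceeded();
            return u == null || consistent(u.getUsed(), threshold, exceeded);
        } finally {
            p.setUsageThreshold(oldThreshold); // restore the previous setting
        }
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!p.isUsageThresholdSupported()) {
                continue; // the threshold methods would throw for these pools
            }
            System.out.println(p.getName() + ": consistent=" + checkPool(p, 1));
        }
    }
}
```

Running main() prints one line per pool that supports a usage threshold. A "consistent=false" line would only show that a GC slipped in between the two reads, which is exactly the window the discussion is about.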

Thanks,

-JB-

From bengt.rutisson at oracle.com Wed Oct 23 06:15:44 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 15:15:44 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267C74F.2010302@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com>
Message-ID: <5267CC00.7080509@oracle.com>

On 2013-10-23 14:55, Jaroslav Bachorik wrote:
> Hi Bengt,
>
> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>
>> Hi Jaroslav,
>>
>> A couple of questions.
>>
>> I don't understand why this is a CMS only problem? Why don't the other
>> collectors have the same issue? I guess it is less likely that the other
>> collectors start (or complete) a GC without a lot of allocation going
>> on. But at least G1 should have the same problem.
>
> I don't really know.
> If there are other pools that can have the "used"
> value 0 before a GC happens then yes, they are susceptible to the same
> problem.

I think all the "old" pools can have 0 used before a GC happens. But except for CMS and G1 it is less likely that a GC happens unless you do allocations. As long as they keep the 0 used the test will pass. So, my guess is that to be on the safe side all "old" pools should make sure to do a full GC first.

>> Also, from the problem description in the CR I would have guessed that
>> you want the GC to happen between these two statements:
>>
>> p.setUsageThreshold(1);
>> MemoryUsage u = p.getUsage();
>
> This is all but a heuristic here. The problem lies in the fact that it
> is not possible to retrieve the pool usage and the "threshold
> exceeded" flag consistently in one atomic operation. I might get
> usable data from the first call and then I don't need to force GC.

Right. This is why I think you want to avoid a GC after you have fetched getUsage() but before you do isUsageThresholdExceeded(). With your suggested patch you are explicitly inserting a GC at that point. To me this sounds like the opposite of what you want to do.

>> Now you have added the GC just after these statements. I thought that
>> was what caused the problem. That you read the usage data at one point,
>> then a GC happens and you compare the cached usage
>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>
>> Looking at the promoteToOldGen() method I assume that the intent is that
>> the code should be using the return value. So my guess is that this
>> code:
>>
>>     if (p.getName().equals("CMS Old Gen")) {
>>         promoteToOldGen(p, u);
>>     }
>>
>> Should be:
>>
>>     if (p.getName().equals("CMS Old Gen")) {
>>         u = promoteToOldGen(p, u);
>>     }
>
> Indeed. It was meant to re-fetch the usage after GC.

OK. Good. With this code I think it should work. Now you make sure to get the GC before you do getUsage().

>> With that, I think it might work. But I still don't understand why this
>> is only a CMS problem.
>>
>> One more question about the promoteToOldGen() and forceGC() methods. I
>> don't really know much about how the different beans work, but are we
>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the same
>> pool names? That is, are you sure that forceGC() actually will do
>> anything?
>
> They use the pool names as reported by the GC infrastructure so they
> should be the same.

Ok.

>> As for just doing a System.gc() to force a GC I think you can rely on
>> that System.gc() does a full GC in Hotspot unless someone sets
>> -XX:+DisableExplicitGC on the command line. Considering that you are
>> relying on Hotspot specific names for pools I don't think it is a
>> limitation to the test to rely on the Hotspot implementation of
>> System.gc().
>
> Good to know. I guess I could simplify the change and just call
> System.gc(), after all.

Yes, I think that's simpler.

Thanks,
Bengt

From jaroslav.bachorik at oracle.com Wed Oct 23 07:32:28 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 16:32:28 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267CC00.7080509@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com>
Message-ID: <5267DDFC.4060607@oracle.com>

On 23.10.2013 15:15, Bengt Rutisson wrote:
>
> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>> Hi Bengt,
>>
>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>
>>> Hi Jaroslav,
>>>
>>> A couple of questions.
>>>
>>> I don't understand why this is a CMS only problem? Why don't the other
>>> collectors have the same issue? I guess it is less likely that the other
>>> collectors start (or complete) a GC without a lot of allocation going
>>> on. But at least G1 should have the same problem.
>>
>> I don't really know. If there are other pools that can have the "used"
>> value 0 before a GC happens then yes, they are susceptible to the same
>> problem.
>
> I think all the "old" pools can have 0 used before a GC happens. But
> except for CMS and G1 it is less likely that a GC happens unless you do
> allocations. As long as they keep the 0 used the test will pass. So, my
> guess is that to be on the safe side all "old" pools should make sure to
> do a full GC first.
>
>>> Also, from the problem description in the CR I would have guessed that
>>> you want the GC to happen between these two statements:
>>>
>>> p.setUsageThreshold(1);
>>> MemoryUsage u = p.getUsage();
>>
>> This is all but a heuristic here. The problem lies in the fact that it
>> is not possible to retrieve the pool usage and the "threshold
>> exceeded" flag consistently in one atomic operation. I might get
>> usable data from the first call and then I don't need to force GC.
>
> Right. This is why I think you want to avoid a GC after you have fetched
> getUsage() but before you do isUsageThresholdExceeded(). With your
> suggested patch you are explicitly inserting a GC at that point. To me
> this sounds like the opposite of what you want to do.

I've updated the patch. The GC is called even before the first attempt to get the pool memory usage and System.gc() is used to perform GC (no MXBean checks). This should simplify the change a bit.

http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02

-JB-

From bengt.rutisson at oracle.com Wed Oct 23 07:43:30 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 16:43:30 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267DDFC.4060607@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
Message-ID: <5267E092.6090006@oracle.com>

Hi Jaroslav,

On 2013-10-23 16:32, Jaroslav Bachorik wrote:
> On 23.10.2013 15:15, Bengt Rutisson wrote:
>>
>> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>>> Hi Bengt,
>>>
>>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>>
>>>> Hi Jaroslav,
>>>>
>>>> A couple of questions.
>>>>
>>>> I don't understand why this is a CMS only problem? Why don't the other
>>>> collectors have the same issue? I guess it is less likely that the
>>>> other
>>>> collectors start (or complete) a GC without a lot of allocation going
>>>> on. But at least G1 should have the same problem.
>>>
>>> I don't really know. If there are other pools that can have the "used"
>>> value 0 before a GC happens then yes, they are susceptible to the same
>>> problem.
>>
>> I think all the "old" pools can have 0 used before a GC happens. But
>> except for CMS and G1 it is less likely that a GC happens unless you do
>> allocations. As long as they keep the 0 used the test will pass.
>> So, my guess is that to be on the safe side all "old" pools should make
>> sure to do a full GC first.
>>
>>>> Also, from the problem description in the CR I would have guessed that
>>>> you want the GC to happen between these two statements:
>>>>
>>>> p.setUsageThreshold(1);
>>>> MemoryUsage u = p.getUsage();
>>>
>>> This is all but a heuristic here. The problem lies in the fact that it
>>> is not possible to retrieve the pool usage and the "threshold
>>> exceeded" flag consistently in one atomic operation. I might get
>>> usable data from the first call and then I don't need to force GC.
>>
>> Right. This is why I think you want to avoid a GC after you have fetched
>> getUsage() but before you do isUsageThresholdExceeded(). With your
>> suggested patch you are explicitly inserting a GC at that point. To me
>> this sounds like the opposite of what you want to do.
>
> I've updated the patch. The GC is called even before the first attempt
> to get the pool memory usage and System.gc() is used to perform GC (no
> MXBean checks). This should simplify the change a bit.
>
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02

Thanks for doing this update so quickly!

Have you been able to verify that this change still fixes the issue? I think it should, but it would be good if we could verify it.

This code worries me a little bit:

    private static MemoryUsage getUsage(MemoryPoolMXBean p) {
        MemoryUsage u = null;
        do {
            System.gc();
            u = p.getUsage();
        } while (u.getUsed() == 0);
        return u;
    }

I think one call to System.gc() should be enough. And if it is not, if we still get 0 as used, I think that another System.gc() call will just render the same result. Thus, I'm a bit worried that this will be an endless loop.

Since the test actually handles the case where used is 0, I think it is enough to just do a single call to System.gc() and then get the usage data.
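
One way to read that suggestion is sketched below: a single System.gc() followed by one getUsage() call, plus a bounded variant for anyone who still wants retries. The class name UsageAfterGc and the maxAttempts parameter are made up for this illustration and are not part of the webrev. The sketch also assumes that explicit GC has not been disabled on the command line.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper, not taken from the webrev.
final class UsageAfterGc {

    // Single GC, single read: cannot loop forever, and 0 used is an allowed result.
    static MemoryUsage usageAfterGc(MemoryPoolMXBean p) {
        System.gc();
        return p.getUsage(); // null if the pool is no longer valid
    }

    // Bounded variant: retries while the pool reports 0 used, but performs
    // at most maxAttempts GC-and-read rounds in total.
    static MemoryUsage usageAfterGc(MemoryPoolMXBean p, int maxAttempts) {
        MemoryUsage u = usageAfterGc(p);
        for (int attempt = 1;
             attempt < maxAttempts && u != null && u.getUsed() == 0;
             attempt++) {
            u = usageAfterGc(p);
        }
        return u;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = usageAfterGc(p, 3);
            System.out.println(p.getName() + ": "
                    + (u == null ? "invalid pool" : u.getUsed() + " bytes used"));
        }
    }
}
```

Both variants terminate even when a pool stays empty, which is the property the unbounded do/while loop lacks.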
Thanks,
Bengt
From jaroslav.bachorik at oracle.com Wed Oct 23 08:07:13 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Wed, 23 Oct 2013 17:07:13 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267E092.6090006@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <5267E092.6090006@oracle.com>
Message-ID: <5267E621.2040601@oracle.com>

On 23.10.2013 16:43, Bengt Rutisson wrote:
>
> Hi Jaroslav,
>
> On 2013-10-23 16:32, Jaroslav Bachorik wrote:
>> I've updated the patch. The GC is called even before the first attempt
>> to get the pool memory usage and System.gc() is used to perform GC (no
>> MXBean checks). This should simplify the change a bit.
>>
>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>
> Thanks for doing this update so quickly!
>
> Have you been able to verify that this change still fixes the issue? I
> think it should, but it would be good if we could verify it.

Yep, it still fixes the problem. Unfortunately, the only way to reproduce the problem locally is to run the test under a debugger and invoke GC explicitly between getting the pool memory usage and the threshold flag.

>
> This code worries me a little bit:
>
>     private static MemoryUsage getUsage(MemoryPoolMXBean p) {
>         MemoryUsage u = null;
>         do {
>             System.gc();
>             u = p.getUsage();
>         } while (u.getUsed() == 0);
>         return u;
>     }
>
> I think one call to System.gc() should be enough. And if it is not, if
> we still get 0 as used, I think that another System.gc() call will just
> render the same result. Thus, I'm a bit worried that this will be an
> endless loop.

Sounds reasonable. My motivation was to try to make sure some objects are promoted to old gen, but it seems redundant and, in the case of non-oldgen pools, might not even work :(

>
> Since the test actually handles the case where used is 0, I think it is
> enough to just do a single call to System.gc() and then get the usage data.

Hm, this makes the patch even simpler ...
http://cr.openjdk.java.net/~jbachorik/8020467/webrev.03

-JB-
From bengt.rutisson at oracle.com Wed Oct 23 08:31:18 2013
From: bengt.rutisson at oracle.com (Bengt Rutisson)
Date: Wed, 23 Oct 2013 17:31:18 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267E621.2040601@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <5267E092.6090006@oracle.com> <5267E621.2040601@oracle.com>
Message-ID: <5267EBC6.8010609@oracle.com>

Hi again Jaroslav,

On 2013-10-23 17:07, Jaroslav Bachorik wrote:
> Hm, this makes the patch even simpler ...
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.03

Yes, I think this looks simple and good. :-)

Thanks,
Bengt
From mandy.chung at oracle.com Wed Oct 23 16:02:09 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Wed, 23 Oct 2013 16:02:09 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <5267DDFC.4060607@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com>
Message-ID: <52685571.1090407@oracle.com>

On 10/23/2013 7:32 AM, Jaroslav Bachorik wrote:
> I've updated the patch. The GC is called even before the first attempt
> to get the pool memory usage and System.gc() is used to perform GC (no
> MXBean checks). This should simplify the change a bit.
>
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02

This change is okay. It will force a GC once for each memory pool that supports a usage threshold (I think 3 memory pools), which is not a huge issue. Perhaps a more reliable option is to make it an othervm test, allocate a large object, and call GC once before the verification.
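A rough sketch of that alternative, assuming a jtreg test that runs in its own VM. The class name, the 16 MB size and the static ballast field are assumptions made for this example and are not part of any webrev; whether a large array actually ends up in an old-generation pool depends on the collector in use.

```java
/*
 * @test
 * @summary Sketch only: allocate a large object and GC once before verifying
 * @run main/othervm LargeAllocThenGc
 */
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical test class, used only for this sketch.
public class LargeAllocThenGc {

    // A static reference keeps the array reachable, so it survives the collection.
    static byte[] ballast;

    // Allocate once and request a single collection up front,
    // instead of one collection per memory pool.
    public static void prepare(int bytes) {
        ballast = new byte[bytes];
        System.gc(); // only a hint per the spec; assumed to be a full GC on HotSpot
    }

    public static void main(String[] args) {
        prepare(16 * 1024 * 1024);
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!p.isUsageThresholdSupported()) {
                continue; // only pools with a usage threshold are of interest
            }
            MemoryUsage u = p.getUsage(); // may be null if the pool is no longer valid
            System.out.println(p.getName() + ": used="
                    + (u == null ? "n/a" : String.valueOf(u.getUsed())));
        }
    }
}
```

The per-pool verification itself would follow the loop shown in main; it is left out here because the existing test already contains it.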
Mandy

From jaroslav.bachorik at oracle.com Thu Oct 24 07:01:43 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 24 Oct 2013 16:01:43 +0200
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52685571.1090407@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com>
Message-ID: <52692847.1030806@oracle.com>

Hi Mandy,

On 24.10.2013 01:02, Mandy Chung wrote:
>
> On 10/23/2013 7:32 AM, Jaroslav Bachorik wrote:
>> I've updated the patch. The GC is called even before the first attempt
>> to get the pool memory usage and System.gc() is used to perform GC (no
>> MXBean checks). This should simplify the change a bit.
>>
>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>
> This change is okay. It will force GC once per each memory pool that
> supports usage threshold (I think 3 memory pools) which is not a huge
> issue. Perhaps a more reliable option is to make it an othervm test and
> allocating large object and calling GC once before the verification.

Running it as othervm might improve repeatability, but I don't quite follow the trick with the large object. That would be effective for the oldgen pools only, I suppose? There were concerns raised during the review that other pools might be susceptible to the same timing-related problems (theoretically).

So, if you don't feel strongly about it, I would leave the rest of the test as it is, that is, calling System.gc() before checking the pool thresholds.
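To make "calling System.gc() before checking the pool thresholds" concrete, here is a minimal sketch of such a check. It is not the actual test source; the class and method names are invented for this example. Note that the usage and the "exceeded" flag are still read in two separate calls, so the up-front GC only makes a mismatch less likely; it does not make the pair atomic.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical class name, used only for this sketch.
public class ThresholdCheckSketch {

    // Returns null when the reported usage and the "threshold exceeded"
    // flag agree, otherwise a short description of the mismatch.
    public static String check(MemoryPoolMXBean p) {
        if (!p.isUsageThresholdSupported()) {
            return null; // nothing to verify for this pool
        }
        long saved = p.getUsageThreshold();
        try {
            p.setUsageThreshold(1);
            System.gc(); // let the pool settle before the first read
            MemoryUsage u = p.getUsage();                    // read 1
            boolean exceeded = p.isUsageThresholdExceeded(); // read 2, not atomic with read 1
            if (u == null) {
                return null; // pool is no longer valid
            }
            boolean above = u.getUsed() >= 1;
            return above == exceeded
                    ? null
                    : p.getName() + ": used=" + u.getUsed() + " but exceeded=" + exceeded;
        } finally {
            p.setUsageThreshold(saved); // restore the previous setting
        }
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean p : ManagementFactory.getMemoryPoolMXBeans()) {
            String problem = check(p);
            System.out.println(p.getName() + ": " + (problem == null ? "ok" : problem));
        }
    }
}
```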
-JB-

From jaroslav.bachorik at oracle.com Thu Oct 24 07:10:12 2013
From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik)
Date: Thu, 24 Oct 2013 16:10:12 +0200
Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out
In-Reply-To: <52562FF5.5060304@oracle.com>
References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com> <52552EBA.4060308@oracle.com> <52553EAD.4040506@oracle.com> <52562FF5.5060304@oracle.com>
Message-ID: <52692A44.9050004@oracle.com>

Hi David,

On 10.10.2013 06:41, David Holmes wrote:
> On 9/10/2013 9:31 PM, Jaroslav Bachorik wrote:
>> On 9.10.2013 12:23, David Holmes wrote:
>>> Jaroslav,
>>>
>>> Thanks for the detailed description of changes - much appreciated.
>>>
>>> There is a lot to digest in there. :)
>>
>> Yep, it started as a simple fix :/
>>
>>>
>>> It isn't obvious to me why these tests require a full JDK?
>>
>> IDK, LocalManagementTest.sh was listed as one requiring full JDK. Its
>> requirements are the same as the ones of CustomLauncherTest.sh (now
>> *.java) so it seemed logical to list it there too.
>
> Ah! Now I see it - it uses tools.jar which implies a full JDK.
>
>>>
>>> I don't quite follow the libjvm lookup logic - I would expect that you
>>> would always want to test the libjvm that is currently running - though
>>> it is hard to determine that.
>>
>> I'm afraid I can't be of much assistance here - I just took what was in
>> the *.sh version and converted it to *.java.
>
> Okay. I expect this will need revisiting at some point.

So, does this mean "ok, go"?
Thanks,

-JB-

From mandy.chung at oracle.com Thu Oct 24 12:33:08 2013
From: mandy.chung at oracle.com (Mandy Chung)
Date: Thu, 24 Oct 2013 12:33:08 -0700
Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool
In-Reply-To: <52692847.1030806@oracle.com>
References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com> <52692847.1030806@oracle.com>
Message-ID: <526975F4.8060707@oracle.com>

On 10/24/2013 7:01 AM, Jaroslav Bachorik wrote:
> Hi Mandy,
>
> On 24.10.2013 01:02, Mandy Chung wrote:
>>
>> On 10/23/2013 7:32 AM, Jaroslav Bachorik wrote:
>>> I've updated the patch. The GC is called even before the first attempt
>>> to get the pool memory usage and System.gc() is used to perform GC (no
>>> MXBean checks).
This should simplify the change a bit. >>> >>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02 >> >> This change is okay. It will force GC once per each memory pool that >> supports usage threshold (I think 3 memory pools) which is not a huge >> issue. Perhaps a more reliable option is to make it an othervm test and >> allocating large object and calling GC once before the verification. > > Running it as othervm might improve repeatbility but I don't quite > follow the trick with large object. That would be effective for the > oldgen pools only, I suppose? There were concerns raised during the > review that other pools might be susceptible to the same timing > related problems (theoretically). This test was written before the samevm/agentvm support. In general we want the tests to be reliable. You want the System.gc() call to reduce the probability of the race such that the initially empty pool is being filled with objects between getUsage() and isUsageThresholdExceeded() methods are called but this has the assumption that there is some large object allocated and get promoted to the old gen (not done in this test though). The other possibility is that the old gen is cleared although it might be rare in practice? Holding on a large object will ensure that the old gen is always filled with something to make it more reliable. > So, if you don't feel strongy about it I would leave the rest of the > test as it is - that is calling System.gc() before checking the pool > thresholds. I just worry that this test will fail some day intermittently again. Since in practice the runtime has space allocated, I think running it in othervm would be adequate. 
Mandy From david.holmes at oracle.com Thu Oct 24 15:54:49 2013 From: david.holmes at oracle.com (David Holmes) Date: Fri, 25 Oct 2013 08:54:49 +1000 Subject: jmx-dev [ping][ping] Re: RFR: 8004926 sun/management/jmxremote/bootstrap/CustomLauncherTest.sh oftenly times out In-Reply-To: <52692A44.9050004@oracle.com> References: <52308ECC.1050304@oracle.com> <523138CB.9040401@oracle.com> <52317782.1060300@oracle.com> <523179C8.50606@oracle.com> <5231CCE4.7060902@oracle.com> <5231DA1B.4070706@oracle.com> <5231DE69.7090309@oracle.com> <523B0B30.4020003@oracle.com> <5252C1B6.2060904@oracle.com> <52537F36.8020001@oracle.com> <5253ED95.20706@oracle.com> <52552EBA.4060308@oracle.com> <52553EAD.4040506@oracle.com> <52562FF5.5060304@oracle.com> <52692A44.9050004@oracle.com> Message-ID: <5269A539.8020401@oracle.com> Good to go. Thanks, David On 25/10/2013 12:10 AM, Jaroslav Bachorik wrote: > Hi David, > > On 10.10.2013 06:41, David Holmes wrote: >> On 9/10/2013 9:31 PM, Jaroslav Bachorik wrote: >>> On 9.10.2013 12:23, David Holmes wrote: >>>> Jaroslav, >>>> >>>> Thanks for the details description of changes - much appreciated. >>>> >>>> There is a lot to digest in there. :) >>> >>> Yep, it started as a simple fix :/ >>> >>>> >>>> It isn't obvious to me why these tests require a full JDK? >>> >>> IDK, LocalManagementTest.sh was listed as one requiring full JDK. Its >>> requirements are the same as the ones of CustomLauncherTest.sh (now >>> *.java) so it seemed logical to list it there too. >> >> Ah! Now I see it - it uses tools.jar which implies a full JDK. >> >>>> >>>> I don't quite follow the libjvm lookup logic - I would expect that you >>>> would always want to test the libjvm that is currently running - though >>>> it is hard to determine that. >>> >>> I'm afraid I can't be of much assistance here - I just took what was in >>> the *.sh version and converted it to *.java. >> >> Okay. I expect this will need revisiting at some point. > > So, does this mean "ok, go"? 
> > Thanks, > > -JB- > >> >> Thanks, >> David >> ----- >> >> >>> -JB- >>> >>>> >>>> Thanks, >>>> David >>>> >>>> On 8/10/2013 9:33 PM, Jaroslav Bachorik wrote: >>>>> On 8.10.2013 05:42, David Holmes wrote: >>>>>> Jaroslav, >>>>>> >>>>>> Can you summarise the changes please? With the conversion to Java and >>>>>> the infrastructure additions I can't tell what is actually fixing the >>>>>> original timeout issue :) >>>>> >>>>> The timeout was most caused by using the same file for communication >>>>> between java processes in more test cases. When those test cases were >>>>> run in parallel the file got rewritten silently and some of the tests >>>>> could end up trying to connect to incorrect port in the target >>>>> application. I was able to reproduce the timeout by interleaving the >>>>> test runs for CustomLauncherTest.sh and LocalManagementTest.sh and >>>>> adding an artificial delay to CusteomLauncherTest.sh to allow >>>>> LocalManagementTest.sh to change the port in the file. >>>>> >>>>> While it could be fixed by using a different file for each test case I >>>>> took the liberty of converting the shell tests to java tests. This >>>>> allows me to remove the communication file and, in the end, make the >>>>> tests more robust. >>>>> >>>>> CustomLauncherTest.java and LocalManagementTest.java are the tests >>>>> converted from shell to java. I decided to convert >>>>> LocalManagementTest.sh as well because it has the same problems as the >>>>> CustomLauncherTest.sh. >>>>> >>>>> The changes in the testlibrary are about introducing new methods >>>>> allowing the tests easily start a process and wait for a certain text >>>>> appearing in its stdout/stderr. Using these methods the caller can >>>>> wait >>>>> till the callee is fully initialized and eg. ready to accept >>>>> connections. 
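A minimal sketch of the "wait for a certain text in stdout/stderr" idea described above. This is not the testlibrary code from the webrev; the names OutputWaiter and waitForLine are hypothetical. With a real child process, the Reader would presumably wrap Process.getInputStream().

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Predicate;

// Hypothetical helper for illustration; not the actual jdk.testlibrary API.
final class OutputWaiter {
    /**
     * Reads lines until one satisfies the condition and returns that line.
     * Returns null if the stream ends first (for example, the process died).
     */
    static String waitForLine(Reader output, Predicate<String> condition) throws IOException {
        BufferedReader reader = new BufferedReader(output);
        String line;
        while ((line = reader.readLine()) != null) {
            if (condition.test(line)) {
                return line;
            }
        }
        return null;
    }

    /** Convenience overload for output that is already a string. */
    static String waitForLine(String output, Predicate<String> condition) {
        try {
            return waitForLine(new StringReader(output), condition);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The read blocks until a line arrives, so a caller that needs a deadline would have to add its own timeout (or rely on the test harness timeout).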
>>>>> >>>>> The changes in launchers make the launchers actually executable + I am >>>>> adding a linux-amd64 launcher (I needed that one to work on the >>>>> changes >>>>> locally and thought it might be nice to have one more platform covered >>>>> by the test). >>>>> >>>>> I've update the webrev to include changes to LocalManagementTest and >>>>> TEST.groups (both of those tests require JDK) - >>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.05 >>>>> >>>>> -JB- >>>>> >>>>>> >>>>>> Thanks, >>>>>> David >>>>>> >>>>>> On 8/10/2013 12:14 AM, Jaroslav Bachorik wrote: >>>>>>> On 19.9.2013 16:33, Jaroslav Bachorik wrote: >>>>>>>> The updated webrev: >>>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.03 >>>>>>>> >>>>>>>> I've moved some of the functionality to the testlibrary. >>>>>>>> >>>>>>>> -JB - >>>>>>>> >>>>>>>> On 12.9.2013 17:31, Jaroslav Bachorik wrote: >>>>>>>>> On 09/12/2013 05:13 PM, Dmitry Samersoff wrote: >>>>>>>>>> Jaroslav, >>>>>>>>>> >>>>>>>>>> CustomLauncherTest.java: >>>>>>>>>> >>>>>>>>>> 102: this check could be moved to switch at ll. 108 >>>>>>>>>> otherwise test fails on "sunos" and "linux" because PLATFORM >>>>>>>>>> remains >>>>>>>>>> unset. >>>>>>>>> Good idea. Thanks. >>>>>>>>> >>>>>>>>>> 129: I would prefer don't have pattern like this one ever in >>>>>>>>>> shell >>>>>>>>>> script. Could you prepare a list of VM's to check and just loop >>>>>>>>>> over >>>>>>>>>> it? >>>>>>>>>> It makes test better readable. Also I think nowdays we can always >>>>>>>>>> use >>>>>>>>>> server VM. >>>>>>>>> I tried to mirror the original shell test as closely as >>>>>>>>> possible. It >>>>>>>>> would be nice if we could rely on the "server" vm only. Definitely >>>>>>>>> more >>>>>>>>> readable. 
>>>>>>>>> >>>>>>>>> -JB- >>>>>>>>> >>>>>>>>>> -Dmitry >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On 2013-09-12 18:17, Jaroslav Bachorik wrote: >>>>>>>>>>> On 09/12/2013 10:22 AM, Jaroslav Bachorik wrote: >>>>>>>>>>>> On 09/12/2013 10:12 AM, Chris Hegarty wrote: >>>>>>>>>>>>> On 09/12/2013 04:45 AM, David Holmes wrote: >>>>>>>>>>>>>> Hi Jaroslav, >>>>>>>>>>>>>> >>>>>>>>>>>>>> You need a copyright notice in the new file. >>>>>>>>>>>>>> >>>>>>>>>>>>>> As written this test can only run on a full JDK - so please >>>>>>>>>>>>>> add >>>>>>>>>>>>>> it to >>>>>>>>>>>>>> the :needs_jdk group in TEST.groups. (Does jcmd really >>>>>>>>>>>>>> needs to >>>>>>>>>>>>>> come >>>>>>>>>>>>>> from the test-jdk? And use the VMOPTS passed to the test?) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Is there a reason this test can't run on OSX? I know it would >>>>>>>>>>>>>> need >>>>>>>>>>>>>> further modification but was wondering if there is something >>>>>>>>>>>>>> inherent in >>>>>>>>>>>>>> the test that makes it inapplicable to OSX. >>>>>>>>>>>>>> >>>>>>>>>>>>>> I think the test would be a lot simpler if the jdk tests had >>>>>>>>>>>>>> the >>>>>>>>>>>>>> hotspot >>>>>>>>>>>>>> test library's process tools available. :( >>>>>>>>>>>>> We have some, is there an obvious gap? >>>>>>>>>>>>> >>>>>>>>>>>>> http://hg.openjdk.java.net/jdk8/tl/jdk/file/e407df8093dc/test/lib/testlibrary/jdk/testlibrary/ >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> Hm, thanks for the info. I should have used this library >>>>>>>>>>>> instead. >>>>>>>>>>>> >>>>>>>>>>>> Please, stand by for the updated webrev. >>>>>>>>>>> I was able to get rid off the JCMD. Using the testlibrary the >>>>>>>>>>> target >>>>>>>>>>> application can recognize its own PID and print it to its >>>>>>>>>>> stdout. >>>>>>>>>>> The >>>>>>>>>>> main application then just reads the stdout to parse the PID. No >>>>>>>>>>> need >>>>>>>>>>> for JCMD any more. 
>>>>>>>>>>> >>>>>>>>>>> I could not find a way to remove the dependency on "test.jdk" >>>>>>>>>>> system >>>>>>>>>>> property. According to the jtreg web documentation >>>>>>>>>>> (http://openjdk.java.net/jtreg/vmoptions.html#cmdLineOpts) a >>>>>>>>>>> "test.java" >>>>>>>>>>> system property should be available but in fact is not. But it >>>>>>>>>>> seems >>>>>>>>>>> that the testlibrary uses "test.jdk" system property too. >>>>>>>>>>> >>>>>>>>>>> The test does not run on OSX because nobody built the launcher >>>>>>>>>>> binary :) >>>>>>>>>>> I think it is a kind of DIY so I took the liberty of adding a >>>>>>>>>>> linux-amd64 launcher while working on the test. >>>>>>>>>>> >>>>>>>>>>> While working with the test library I realized I was missing a >>>>>>>>>>> crucial >>>>>>>>>>> feature (at least for my purposes) - waiting for a certain >>>>>>>>>>> message to >>>>>>>>>>> appear in the stdout/stderr of the launched process. Very >>>>>>>>>>> often I >>>>>>>>>>> need >>>>>>>>>>> to wait for the target process to get to certain point before >>>>>>>>>>> the >>>>>>>>>>> test >>>>>>>>>>> can be allowed to continue - and the point is indicated by a >>>>>>>>>>> message in >>>>>>>>>>> stdout/stderr. Currently all the proc tools are designed to >>>>>>>>>>> work in >>>>>>>>>>> "batch" mode - the whole stdout/stderr is captured in strings >>>>>>>>>>> and >>>>>>>>>>> analyzed after the target process died - and are not suitable >>>>>>>>>>> for >>>>>>>>>>> this >>>>>>>>>>> kind of usage. >>>>>>>>>>> >>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8004926/webrev.01 >>>>>>>>>>> >>>>>>>>>>>> -JB- >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Chris. >>>>>>>>>>>>> >>>>>>>>>>>>>> David >>>>>>>>>>>>>> ----- >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 12/09/2013 1:39 AM, Jaroslav Bachorik wrote: >>>>>>>>>>>>>>> Please, review the patch for an intermittently failing test. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The test is a shell test, using files for the interprocess >>>>>>>>>>>>>>> synchronization. 
This leads to intermittent failures. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> In order to fix this the test is rewritten in Java - the >>>>>>>>>>>>>>> original >>>>>>>>>>>>>>> functionality and outputs should be 100% preserved. The >>>>>>>>>>>>>>> patch is >>>>>>>>>>>>>>> unfortunately a bit difficult to follow since there is no >>>>>>>>>>>>>>> similarity >>>>>>>>>>>>>>> between the *.sh and *.java file so one needs to go through >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>> new >>>>>>>>>>>>>>> source in whole. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The changes in "launcher" files are all about adding >>>>>>>>>>>>>>> permissions to >>>>>>>>>>>>>>> execute (0755) and as such the webrev shows no differences. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Issue : JDK-8004926 >>>>>>>>>>>>>>> Webrev : >>>>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8004926/webrev.00 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> -JB- >>>>>>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> > From jaroslav.bachorik at oracle.com Tue Oct 29 10:28:37 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Tue, 29 Oct 2013 18:28:37 +0100 Subject: jmx-dev RFR 8027358: sun/management/jmxremote/bootstrap/LocalManagementTest.java failing since JDK-8004926 Message-ID: <526FF045.9060803@oracle.com> Please, review this test fix. In agentvm mode the test can not rely on the co-location of the test class and the auxiliary classes the test class wants to start. It is necessary to explicitly provide the test class path when starting an external java process. 
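A rough sketch of what "explicitly provide the test class path" can look like when building the command line for the external process. It assumes the jtreg-provided "test.class.path" system property and falls back to "java.class.path" outside jtreg; the class name ExternalJava, the use of "java.home" to locate the launcher, and the main class name are illustrative assumptions, not taken from the webrev.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class ExternalJava {
    /** Builds a "java -cp <test class path> <mainClass>" command line. */
    static List<String> command(String mainClass) {
        // Assumption: jtreg sets test.class.path; otherwise reuse our own class path.
        String classPath = System.getProperty("test.class.path",
                System.getProperty("java.class.path"));
        // Assumption: launch the same runtime that is running this code.
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";

        List<String> cmd = new ArrayList<>();
        cmd.add(java);
        cmd.add("-cp");
        cmd.add(classPath);
        cmd.add(mainClass);
        return cmd;
    }
}
```

The resulting list can be handed to new ProcessBuilder(cmd), so the child no longer depends on being co-located with the test class.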
Issue : https://bugs.openjdk.java.net/browse/JDK-8027358 Webrev: http://cr.openjdk.java.net/~jbachorik/8027358/webrev.00/ Thanks, -JB- From Alan.Bateman at oracle.com Tue Oct 29 13:35:07 2013 From: Alan.Bateman at oracle.com (Alan Bateman) Date: Tue, 29 Oct 2013 20:35:07 +0000 Subject: jmx-dev RFR 8027358: sun/management/jmxremote/bootstrap/LocalManagementTest.java failing since JDK-8004926 In-Reply-To: <526FF045.9060803@oracle.com> References: <526FF045.9060803@oracle.com> Message-ID: <52701BFB.1030107@oracle.com> On 29/10/2013 17:28, Jaroslav Bachorik wrote: > Please, review this test fix. > > In agentvm mode the test can not rely on the co-location of the test > class and the auxiliary classes the test class wants to start. It is > necessary to explicitly provide the test class path when starting an > external java process. > > Issue : https://bugs.openjdk.java.net/browse/JDK-8027358 > Webrev: http://cr.openjdk.java.net/~jbachorik/8027358/webrev.00/ This looks okay to me. -Alan. From jaroslav.bachorik at oracle.com Wed Oct 30 04:23:55 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 30 Oct 2013 12:23:55 +0100 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <526975F4.8060707@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com> <52692847.1030806@oracle.com> <526975F4.8060707@oracle.com> Message-ID: <5270EC4B.9080205@oracle.com> On 24.10.2013 21:33, Mandy Chung wrote: > > On 10/24/2013 7:01 AM, Jaroslav Bachorik wrote: >> Hi Mandy, >> >> On 24.10.2013 01:02, Mandy Chung wrote: >>> >>> On 10/23/2013 7:32 AM, Jaroslav 
Bachorik wrote: >>>> I've updated the patch. The GC is called even before the first attempt >>>> to get the pool memory usage and System.gc() is used to perform GC (no >>>> MXBean checks). This should simplify the change a bit. >>>> >>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02 >>> >>> This change is okay. It will force GC once per each memory pool that >>> supports usage threshold (I think 3 memory pools) which is not a huge >>> issue. Perhaps a more reliable option is to make it an othervm test and >>> allocating large object and calling GC once before the verification. >> >> Running it as othervm might improve repeatbility but I don't quite >> follow the trick with large object. That would be effective for the >> oldgen pools only, I suppose? There were concerns raised during the >> review that other pools might be susceptible to the same timing >> related problems (theoretically). > > This test was written before the samevm/agentvm support. In general we > want the tests to be reliable. You want the System.gc() call to reduce > the probability of the race such that the initially empty pool is being > filled with objects between getUsage() and isUsageThresholdExceeded() > methods are called but this has the assumption that there is some large > object allocated and get promoted to the old gen (not done in this test > though). The other possibility is that the old gen is cleared although > it might be rare in practice? Holding on a large object will ensure > that the old gen is always filled with something to make it more reliable. > >> So, if you don't feel strongy about it I would leave the rest of the >> test as it is - that is calling System.gc() before checking the pool >> thresholds. > > I just worry that this test will fail some day intermittently again. > Since in practice the runtime has space allocated, I think running it in > othervm would be adequate. Ok. I've added a big object and an initial call to System.gc(). 
But I'm leaving the calls to System.gc() right before checking the pools as well - just to be sure. http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04 -JB- > > Mandy From mandy.chung at oracle.com Wed Oct 30 09:30:17 2013 From: mandy.chung at oracle.com (Mandy Chung) Date: Wed, 30 Oct 2013 09:30:17 -0700 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <5270EC4B.9080205@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com> <52692847.1030806@oracle.com> <526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com> Message-ID: <52713419.5040809@oracle.com> On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote: > Ok. I've added a big object and an initial call to System.gc(). But > I'm leaving the calls to System.gc() right before checking the pools > as well - just to be sure. > > http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04 > The update looks okay and I think System.gc() at line 90 is no longer needed as the failure was due to the empty old gen. thanks for the update. 
Mandy From jaroslav.bachorik at oracle.com Wed Oct 30 09:58:13 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Wed, 30 Oct 2013 17:58:13 +0100 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <52713419.5040809@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com> <52692847.1030806@oracle.com> <526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com> <52713419.5040809@oracle.com> Message-ID: <52713AA5.3050102@oracle.com> On 30.10.2013 17:30, Mandy Chung wrote: > > On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote: >> Ok. I've added a big object and an initial call to System.gc(). But >> I'm leaving the calls to System.gc() right before checking the pools >> as well - just to be sure. >> >> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04 >> > > The update looks okay and I think System.gc() at line 90 is no longer > needed as the failure was due to the empty old gen. > > thanks for the update. Thanks for the review. I've left the System.gc() at line 90 intact - when discussing this with Bengt during the review he was concerned that other pools might be susceptible to this kind of problem and having a full GC right before the check could lessen the probability of running into the data races described in this issue. 
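For readers following along, a minimal sketch of the shape discussed here: hold a large object, force a GC up front, and force another GC right before reading each pool. It is not the code from webrev.04; the class name, the ballast size, and the report format are assumptions made for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual test.
final class PoolThresholdSketch {
    // Kept reachable with the intent that the old gen is never empty
    // when the pools are inspected (8 MB is an arbitrary assumption).
    static byte[] ballast;

    static List<String> report() {
        ballast = new byte[8 * 1024 * 1024];
        System.gc(); // initial GC

        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (!pool.isValid() || !pool.isUsageThresholdSupported()) {
                continue;
            }
            System.gc(); // per-pool GC, kept "just to be sure"
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool became invalid in the meantime
            }
            boolean exceeded = pool.isUsageThresholdExceeded();
            lines.add(pool.getName()
                    + " used=" + usage.getUsed()
                    + " threshold=" + pool.getUsageThreshold()
                    + " exceeded=" + exceeded);
        }
        return lines;
    }
}
```

Note that the two reads (getUsage() and isUsageThresholdExceeded()) are still separate calls, so the sketch only narrows the window for the race; it does not remove it.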
-JB- > Mandy From staffan.larsen at oracle.com Wed Oct 30 23:32:28 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Thu, 31 Oct 2013 07:32:28 +0100 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com> <52692847.1030806@oracle.com> <526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com> <52713419.5040809@oracle.com> <52713AA5.3050102@oracle.com> Message-ID: <9C102C02-3375-4F93-9969-432431DCBE7B@oracle.com> Quoting Bengt from earlier in this conversation: "As for just doing a System.gc() to force a GC I think you can rely on that System.gc() does a full GC in Hotspot unless someone sets -XX:+DisableExplicitGC on the command line. Considering that you are relying on Hotspot specific names for pools I don't think it is a limitation to the test to rely on the Hotspot implementation of System.gc()." The spec for System.gc() doesn't promise anything, but all the collectors in Hotspot are implemented to do a full GC when System.gc() is called. Thanks, /Staffan On 30 okt 2013, at 21:02, Martin Buchholz wrote: > Technically, System.gc() doesn't promise anything. I believe it may merely initiate a gc if the gc implementation is concurrent.
Check out awaitFullGc in my beloved GcFinalization > > https://code.google.com/p/guava-libraries/source/browse/guava-testlib/src/com/google/common/testing/GcFinalization.java?spec=svn196edb139d49d373abbce013008da0206b83f0ca&r=ae6bc9be431d7601b1f4713679efea126673378e > > > On Wed, Oct 30, 2013 at 9:58 AM, Jaroslav Bachorik wrote: > On 30.10.2013 17:30, Mandy Chung wrote: > > On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote: > Ok. I've added a big object and an initial call to System.gc(). But > I'm leaving the calls to System.gc() right before checking the pools > as well - just to be sure. > > http://cr.openjdk.java.net/~jbachorik/8020467/webrev.04 > > > The update looks okay and I think System.gc() at line 90 is no longer > needed as the failure was due to the empty old gen. > > thanks for the update. > > Thanks for the review. I've left the System.gc() at line 90 intact - when discussing this with Bengt during the review he was concerned that other pools might be susceptible to this kind of problem and having a full GC right before the check could lessen the probability of running into the data races described in this issue. > > -JB- > > Mandy > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mail.openjdk.java.net/pipermail/jmx-dev/attachments/20131031/13793423/attachment.html From jaroslav.bachorik at oracle.com Thu Oct 31 03:27:04 2013 From: jaroslav.bachorik at oracle.com (Jaroslav Bachorik) Date: Thu, 31 Oct 2013 11:27:04 +0100 Subject: jmx-dev RFR 7144200: java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR enabled In-Reply-To: References: <5252BE3B.5020607@oracle.com> Message-ID: <52723078.2010507@oracle.com> On 7.10.2013 16:35, Staffan Larsen wrote: > This will make it less likely for the test to fail, but does not guarantee it since there is nothing that says classloading will be done in 300 ms. Any failures will unfortunately be harder to reproduce. (And the test is now 300 ms slower to run.) 
> > A different solution is to only allow the number of classes to increase, but not be strict about the increase being exactly 4. That would of course make the test less stringent, but very stable. I've implemented the check for non-decrementing class count. I talked to SQE about not running this test with JFR but it seems that it is not currently possible to exclude single tests from parametrized runs. Also, the test is marked as /othervm http://cr.openjdk.java.net/~jbachorik/7144200/webrev.02 Cheers, -JB- > > In any case, I think the test has to be marked as /othervm since running other tests simultaneously will impact this test. > > S/taffan > > On 7 okt 2013, at 15:59, Jaroslav Bachorik wrote: > >> The test captures the number of loaded classes right at the start and then checks the diffs when it's finished. However, it seems that there might by some async class loading still going on, initiated by JFR. >> >> The patch simply adds a loop to wait for the number of loaded classes to settle before continuing. This should prevent the test failing with JFR intermittently. >> >> Issue: https://bugs.openjdk.java.net/browse/JDK-7144200 >> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/ >> >> Cheers, >> >> -JB- > From staffan.larsen at oracle.com Thu Oct 31 03:43:29 2013 From: staffan.larsen at oracle.com (Staffan Larsen) Date: Thu, 31 Oct 2013 11:43:29 +0100 Subject: jmx-dev RFR 7144200: java/lang/management/ClassLoadingMXBean/LoadCounts.java failed with JFR enabled In-Reply-To: <52723078.2010507@oracle.com> References: <5252BE3B.5020607@oracle.com> <52723078.2010507@oracle.com> Message-ID: <7B3294CA-2BA9-458E-82D8-6491306B8392@oracle.com> Looks good! Thanks, /Staffan On 31 okt 2013, at 11:27, Jaroslav Bachorik wrote: > On 7.10.2013 16:35, Staffan Larsen wrote: >> This will make it less likely for the test to fail, but does not guarantee it since there is nothing that says classloading will be done in 300 ms. 
Any failures will unfortunately be harder to reproduce. (And the test is now 300 ms slower to run.) >> >> A different solution is to only allow the number of classes to increase, but not be strict about the increase being exactly 4. That would of course make the test less stringent, but very stable. > > I've implemented the check for non-decrementing class count. > > I talked to SQE about not running this test with JFR but it seems that it is not currently possible to exclude single tests from parametrized runs. > > Also, the test is marked as /othervm > > http://cr.openjdk.java.net/~jbachorik/7144200/webrev.02 > > Cheers, > > -JB- > >> >> In any case, I think the test has to be marked as /othervm since running other tests simultaneously will impact this test. >> >> S/taffan >> >> On 7 okt 2013, at 15:59, Jaroslav Bachorik wrote: >> >>> The test captures the number of loaded classes right at the start and then checks the diffs when it's finished. However, it seems that there might by some async class loading still going on, initiated by JFR. >>> >>> The patch simply adds a loop to wait for the number of loaded classes to settle before continuing. This should prevent the test failing with JFR intermittently. 
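The relaxed check mentioned in this thread (only require that the class count does not go down, instead of requiring an exact increase) could be sketched roughly as follows. This is not the code from webrev.02; LoadCountCheck is a hypothetical name, and using the total loaded class count as the counter is an assumption.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical sketch of a non-decreasing class count check.
final class LoadCountCheck {
    /** Snapshot of the total number of classes loaded since VM start. */
    static long totalLoaded() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getTotalLoadedClassCount();
    }

    /** Passes as long as the counter did not go backwards. */
    static boolean nonDecreasing(long before, long after) {
        return after >= before;
    }
}
```

A check like this tolerates extra classes being loaded asynchronously (for example by JFR) between the two snapshots, at the cost of being less strict.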
>>> >>> Issue: https://bugs.openjdk.java.net/browse/JDK-7144200 >>> Webrev: http://cr.openjdk.java.net/~jbachorik/7144200/webrev.00/ >>> >>> Cheers, >>> >>> -JB- >> > From martinrb at google.com Wed Oct 30 13:02:23 2013 From: martinrb at google.com (Martin Buchholz) Date: Wed, 30 Oct 2013 13:02:23 -0700 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <52713AA5.3050102@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com> <52692847.1030806@oracle.com> <526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com> <52713419.5040809@oracle.com> <52713AA5.3050102@oracle.com> Message-ID: Technically, System.gc() doesn't promise anything. I believe it may merely initiate a gc if the gc implementation is concurrent. Check out awaitFullGc in my beloved GcFinalization https://code.google.com/p/guava-libraries/source/browse/guava-testlib/src/com/google/common/testing/GcFinalization.java?spec=svn196edb139d49d373abbce013008da0206b83f0ca&r=ae6bc9be431d7601b1f4713679efea126673378e On Wed, Oct 30, 2013 at 9:58 AM, Jaroslav Bachorik < jaroslav.bachorik at oracle.com> wrote: > On 30.10.2013 17:30, Mandy Chung wrote: > >> >> On 10/30/2013 4:23 AM, Jaroslav Bachorik wrote: >> >>> Ok. I've added a big object and an initial call to System.gc(). But >>> I'm leaving the calls to System.gc() right before checking the pools >>> as well - just to be sure. >>> >>> http://cr.openjdk.java.net/~**jbachorik/8020467/webrev.04 >>> >>> >> The update looks okay and I think System.gc() at line 90 is no longer >> needed as the failure was due to the empty old gen. 
>> >> thanks for the update. >> > > Thanks for the review. I've left the System.gc() at line 90 intact - when > discussing this with Bengt during the review he was concerned that other > pools might be susceptible to this kind of problem and having a full GC > right before the check could lessen the probability of running into the > data races described in this issue. > > -JB- > > Mandy >> > > From martinrb at google.com Thu Oct 31 00:08:31 2013 From: martinrb at google.com (Martin Buchholz) Date: Thu, 31 Oct 2013 00:08:31 -0700 Subject: jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool In-Reply-To: <9C102C02-3375-4F93-9969-432431DCBE7B@oracle.com> References: <526681FD.90604@oracle.com> <5266DA56.6050609@oracle.com> <52678280.1070004@oracle.com> <4972770C-A8BB-40A0-9BB8-885AA6B707BC@oracle.com> <52678509.2020002@oracle.com> <68F92D47-E916-426B-8D6A-2EA48302053D@oracle.com> <5267C3AD.5050306@oracle.com> <5267C74F.2010302@oracle.com> <5267CC00.7080509@oracle.com> <5267DDFC.4060607@oracle.com> <52685571.1090407@oracle.com> <52692847.1030806@oracle.com> <526975F4.8060707@oracle.com> <5270EC4B.9080205@oracle.com> <52713419.5040809@oracle.com> <52713AA5.3050102@oracle.com> <9C102C02-3375-4F93-9969-432431DCBE7B@oracle.com> Message-ID: On Wed, Oct 30, 2013 at 11:32 PM, Staffan Larsen wrote: > Quoting Bengt from earlier in this conversation: > > "As for just doing a System.gc() to force a GC I think you can rely on > that System.gc() does a full GC in Hotspot unless someone sets > -XX:+DisableExplicitGC on the command line. Considering that you are > relying on Hotspot specific names for pools I don't think it is a limitation > to the test to rely on the Hotspot implementation of System.gc()."
> > A full synchronous gc is a stronger condition than a full gc. > The spec for System.gc() doesn't promise anything, but all the > collectors in Hotspot are implemented to do a full GC when System.gc() is > called. > > I'm not a GC expert and I have no proof, but that is not my understanding. I believe that a concurrent gc (CMS) remains concurrent even if initiated by System.gc(). Hmmm.... checking hotspot flags I see: java -XX:+PrintFlagsFinal bool ExplicitGCInvokesConcurrent = false {product} bool ExplicitGCInvokesConcurrentAndUnloadsClasses = false {product} which suggests you are right for default gc operation.
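For readers who want to check the same flags from inside a test instead of via -XX:+PrintFlagsFinal, the following is a minimal sketch and not part of the reviewed webrev. It assumes a HotSpot JVM that exposes com.sun.management.HotSpotDiagnosticMXBean; the class name ExplicitGcFlags and the helpers vmOption and explicitGcLooksSynchronous are hypothetical names made up for illustration. It only reads flag values and does not prove what System.gc() actually does.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, for illustration only.
public class ExplicitGcFlags {

    // Returns the current value of a HotSpot flag as a string,
    // or null if the bean or the flag is not available in this JVM.
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (bean == null) {
            return null;
        }
        try {
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // getVMOption throws this when the flag does not exist in this build.
            return null;
        }
    }

    // Assumption: System.gc() can be treated as a stop-the-world full GC only
    // when explicit GC is not disabled and is not redirected to a concurrent cycle.
    static boolean explicitGcLooksSynchronous() {
        boolean disabled = "true".equals(vmOption("DisableExplicitGC"));
        boolean concurrent = "true".equals(vmOption("ExplicitGCInvokesConcurrent"));
        return !disabled && !concurrent;
    }

    public static void main(String[] args) {
        System.out.println("DisableExplicitGC = " + vmOption("DisableExplicitGC"));
        System.out.println("ExplicitGCInvokesConcurrent = "
                + vmOption("ExplicitGCInvokesConcurrent"));
        System.out.println("explicit GC looks synchronous: "
                + explicitGcLooksSynchronous());
    }
}
```

A test that depends on System.gc() being a full collection could consult such a check first and skip itself when the flags say otherwise.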