RFR 8207367: 10 vmTestbase/nsk/jdi tests timed out when running with jtreg

Daniil Titov daniil.x.titov at oracle.com
Wed Feb 27 17:03:47 UTC 2019


Hi Serguei and Chris,

Thank you for reviewing this change. Please review a new version of the fix that addresses these findings.

Bug: https://bugs.openjdk.java.net/browse/JDK-8207367  
Webrev: http://cr.openjdk.java.net/~dtitov/8207367/webrev.03  

Best regards,
Daniil

From: <serguei.spitsyn at oracle.com>
Organization: Oracle Corporation
Date: Tuesday, February 26, 2019 at 6:50 PM
To: Daniil Titov <daniil.x.titov at oracle.com>, Chris Plummer <chris.plummer at oracle.com>, OpenJDK Serviceability <serviceability-dev at openjdk.java.net>
Subject: Re: RFR 8207367: 10 vmTestbase/nsk/jdi tests timed out when running with jtreg

Hi Daniil,

It looks good to me.
I have some minor comments though.
http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/ClassUnloadRequest/addClassFilter/filter001.java.frames.html

 163                 if(eventSet != null) {
 Space is missed after if


http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/EventQueue/remove/remove004.java.frames.html
  70  * In first one, second thread waits for any incoming event from the    <BR>
  71  * debugger which is sleeping for some time; hence, <BR>
  72  * no events are expected to be received at the debugger end.           <BR>
 <BR> is not aligned at 71


 While you are at this code, could you, please,
 also fix unneeded spaces at the lines? :
 516             } catch ( InterruptedException e1) {
 ...
 526             } catch ( Exception e ) {

http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/EventQueue/remove/remove004a.java.frames.html
 111                                         Utils.adjustTimeout(argHandler.getWaitTime() * 10000),
 112                                        100,
 113                                         remove004a::log1);
  Line 112 is not properly aligned


http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/EventQueue/remove_l/remove_l004.java.frames.html
  65  * In the first one the first assertion is checked up on as follows:    <BR>
  66  * the debugger sleeps for some time;                                  <BR>
  67  * hence, no event is expected in the debugger within WAITTIME, and     <BR>
  <BR> at 66 is not aligned
  Thank you for catching and fixing the typo at 67!

http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose002.java.frames.html

 295                         }, Utils.adjustTimeout(waitTime * 60000), 1000, dispose002::log3);
 A separate line is needed for next wait() argument


http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose002a.java.frames.html
 128                          while(true) {
 ...
 130                              if(instruction.equals("check_done")){
 131                                  if (test_thread.isAlive()) {
 132                                      exitCode = FAILED;
 133                                  }

  Space is missed after while.
  Some logErr("...") for failed case would be useful.
 

http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003.java.frames.html
 260                         }, Utils.adjustTimeout(waitTime * 60000), 1000, dispose003::log3);
 New line is needed for next wait() argument


http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003a.java.frames.html
 129                          while(true) {
 130                              instruction = pipe.readln();
 131                              if(instruction.equals("check_done")) {
 132                                  if (test_thread.isAlive()) {
 133                                      exitCode = FAILED;
 134                                  }
  Space is missed after while and if.
  Some logErr("...") for failed case would be useful. 

http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose004.java.frames.html
 284                         }, Utils.adjustTimeout(waitTime * 60000), 1000, dispose004::log3);
 New line is needed for next wait() argument


http://cr.openjdk.java.net/~dtitov/8207367/webrev.02/test/hotspot/jtreg/vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose004a.java.frames.html
 130                          while(true) {
 131                              instruction = pipe.readln();
 132                              if(instruction.equals("check_done")) {
 133                                  if (test_thread.isAlive()) {
 134                                      exitCode = FAILED;
 135                                  }
  Space is missed after while and if.
  Some logErr("...") for failed case would be useful. 


Thanks,
Serguei

On 2/26/19 6:06 PM, Daniil Titov wrote:
Hi Chris,

Please review a new version of the webrev that slightly changes method receiveEvents(long,pattern)  in the following tests:
 -vmTestbase/nsk/jdi/ClassUnloadRequest/addClassExclusionFilter/exclfilter001.java
 -vmTestbase/nsk/jdi/ClassUnloadRequest/addClassFilter/filter001.java)


The new changes ensure that receiveEvents(long,pattern)  method keeps receiving events in a while loop even if eventSet returned by  debugee.VM().eventQueue().remove(waitTime) is a null due to timeout.


Bug: https://bugs.openjdk.java.net/browse/JDK-8207367 
Webrev: http://cr.openjdk.java.net/~dtitov/8207367/webrev.02

Thanks!
-Daniil

On 2/26/19, 11:41 AM, "Chris Plummer" mailto:chris.plummer at oracle.com wrote:

    Ok. I think you mentioned below that default wait time will be 5 
    seconds. That seems sufficient to avoid a timeout if there are some 
    minor network issues and some packets are lost.
    
    Changes look good.
    
    thanks,
    
    Chris
    
    On 2/26/19 10:01 AM, Daniil Titov wrote:
    > Hi Chris,
    >
    > Yes , it is correct. For example in this particular test the timeout is expected (line 283 expects that breakpoint() returns returnCode3 that is set on line 460 when eventSet on line 456 is null due to a timeout in eventQueue.remove()) and shortening it makes the whole test fit in jtreg time limits.
    >
    >    281	                log2("       checking up that the thread2 is not at breakpoint1 because of suspension");
    >     282	                expresult = breakpoint();
    >     283	                if (expresult != returnCode3) {
    >     284	                    log3("ERROR: no timeout for waiting for BreakpointEvent when the thread2 is suspended");
    >     285	                    expresult = returnCode1;
    >     286	                    break label1;
    >     287	                } else
    >
    >
    >    445	    private int breakpoint () {
    >     446	
    >     447	        int returnCode = returnCode0;
    >     448	
    >     449	        log2("       waiting for BreakpointEvent");
    >     450	
    >     451	        labelBP:
    >     452	            for (;;) {
    >     453	
    >     454	                log2("       new:  eventSet = eventQueue.remove();");
    >     455	                try {
    >     456	                    eventSet = eventQueue.remove (Utils.adjustTimeout(waitTime*1000));
    >     457	                    if (eventSet == null) {
    >     458	                        log2("::::::  timeout when waiting for a BreakpintEvent");
    >     459	//                        log3("ERROR:  timeout for waiting for a BreakpintEvent");
    >     460	                        returnCode = returnCode3;
    >     461	                        break labelBP;
    >     462	                    }
    >
    >>     And I just noticed the space right after "remove". Can you remove it?
    > Sure. Will do.
    >
    > Thanks!
    > --Daniil
    >
    > On 2/25/19, 7:26 PM, "Chris Plummer" mailto:chris.plummer at oracle.com wrote:
    >
    >      Hi Daniil,
    >      
    >      For suspend001, are you saying the following is expected to timeout
    >      sometimes, so you need a shorter waittime to avoid making the whole test
    >      time out?
    >      
    >        456                     eventSet = eventQueue.remove
    >      (Utils.adjustTimeout(waitTime*1000));
    >      
    >      And I just noticed the space right after "remove". Can you remove it?
    >      
    >      thanks,
    >      
    >      Chris
    >      
    >      On 2/25/19 6:57 PM, Daniil Titov wrote:
    >      > Hi Chris,
    >      >
    >      > The timeout issue mentioned in the bug is about jtreg aborting the tests since they are running longer than the maximum allowed time. That happens since these tests  use extreme long internal delays,  e.g. a sleep for 5 minutes  or a wait for 5 minutes for a case when no events ( and a notify()) are expected. Reducing these internal delays makes the test passing within the default jtreg timeout ( 2 minutes).
    >      >
    >      > Best regards,
    >      > Daniil
    >      >
    >      > On 2/25/19, 6:15 PM, "Chris Plummer" mailto:chris.plummer at oracle.com wrote:
    >      >
    >      >      Ok. So how is the timeout issue mentioned in the bug addressed when
    >      >      there is now a shorter wait time?
    >      >
    >      >      Chris
    >      >
    >      >      On 2/25/19 5:04 PM, Daniil Titov wrote:
    >      >      > Hi Chris,
    >      >      >
    >      >      > Forgot to answer to your another question:
    >      >      >   >      > For these 3 tests the event wait timeout was reduced and adjusted for test.timeout.factor:
    >      >      >      >      >    -vmTestbase/nsk/jdi/Event/_itself_/event001.java
    >      >      >      >      >    -vmTestbase/nsk/jdi/VirtualMachine/suspend/suspend001/TestDescription.java
    >      >      >      >      >    -vmTestbase/nsk/jdi/ThreadReference/suspend/suspend001.java
    >      >      >      >      So overall is this a shorter or longer waittime now?
    >      >      >
    >      >      > Overall this is a shorter waitime now.  Instead of 300 seconds it is now 20 seconds for Mach5 jobs (they are run with test.timeout.factor set to 4.0) and 5 seconds for regular jtreg runs.
    >      >      >
    >      >      > Best regards,
    >      >      > Daniil
    >      >      >
    >      >      >
    >      >      > On 2/25/19, 4:38 PM, "Chris Plummer" mailto:chris.plummer at oracle.com wrote:
    >      >      >
    >      >      >      Hi Daniil,
    >      >      >
    >      >      >      Yes, my point was that the max time you wait for a single event is much
    >      >      >      smaller now. I can see a possibility that with a little bit of network
    >      >      >      instability  a packet gets lost and resend does not happen fast enough.
    >      >      >
    >      >      >      thanks,
    >      >      >
    >      >      >      Chris
    >      >      >
    >      >      >      On 2/25/19 4:32 PM, Daniil Titov wrote:
    >      >      >      > Hi Chris,
    >      >      >      >
    >      >      >      > The code still waits for the whole total wait time. There is a while loop at lines 163-186 that keeps receiving new events (line 183) till elapsed time is less than the waittime (line 178) or a timeout happens (so eventSet is null).
    >      >      >      >
    >      >      >      > 159	            begin = System.currentTimeMillis();
    >      >      >      >     160	            eventSet = debugee.VM().eventQueue().remove(waitTime);
    >      >      >      >     161	            delta = System.currentTimeMillis() - begin;
    >      >      >      >     162	            totalWaitTime -= delta;
    >      >      >      >     163	            while (eventSet != null) {
    >      >      >      >     164	                EventIterator eventIterator = eventSet.eventIterator();
    >      >      >      >
    >      >      >      >     178	                if (totalWaitTime <= 0 || exit) {
    >      >      >      >     179	                    break;
    >      >      >      >     180	                }
    >      >      >      >     181	                debugee.resume();
    >      >      >      >     182	                begin = System.currentTimeMillis();
    >      >      >      >     183	                eventSet = debugee.VM().eventQueue().remove(waitTime);
    >      >      >      >     184	                delta = System.currentTimeMillis() - begin;
    >      >      >      >     185	                totalWaitTime -= delta;
    >      >      >      >     186	            }
    >      >      >      >
    >      >      >      >
    >      >      >      > However, as I see now in case if a timeout happens on line 160  (eventSet is null) the loop is not executed at all.  I haven't observed it in test runs but I think it makes sense to adjust this test to take this potential case into account. I will send an updated version of the patch soon.
    >      >      >      >
    >      >      >      > Thanks!
    >      >      >      >
    >      >      >      > Best regards,
    >      >      >      > Daniil
    >      >      >      >
    >      >      >      > On 2/25/19, 12:21 PM, "Chris Plummer" mailto:chris.plummer at oracle.com wrote:
    >      >      >      >
    >      >      >      >      Hi Daniil,
    >      >      >      >
    >      >      >      >      On 2/23/19 1:02 PM, Daniil Titov wrote:
    >      >      >      >      > Please review the change that fixes timeout issues for the following 10 tests when running with jtreg and default timeout factor (1.0).
    >      >      >      >      In Utils.java, I think wait() should be moved right after
    >      >      >      >      waitForCondition() and maybe given a more descriptive name. It seems to
    >      >      >      >      basically the same as waitForCondition(), except you added a "log"
    >      >      >      >      parameter and slightly changed the behavior. Are these behavior
    >      >      >      >      differences necessary? Could you share code with the existing
    >      >      >      >      waitForCondition()?
    >      >      >      >      >
    >      >      >      >      > For the following 2 tests the event wait timeout was reduced and adjusted for test.timeout.factor.  Method receiveEvents(long,pattern) was fixed to ensure that it gracefully exits after the specified wait period elapsed:
    >      >      >      >      >    -vmTestbase/nsk/jdi/ClassUnloadRequest/addClassExclusionFilter/exclfilter001.java
    >      >      >      >      >    -vmTestbase/nsk/jdi/ClassUnloadRequest/addClassFilter/filter001.java
    >      >      >      >        183                 eventSet = debugee.VM().eventQueue().remove(waitTime);
    >      >      >      >
    >      >      >      >      This code used to wait for the total remaining waittime. Now it waits a
    >      >      >      >      fixed amount based on:
    >      >      >      >
    >      >      >      >        153         long waitTime = Utils.adjustTimeout(1000);
    >      >      >      >
    >      >      >      >      How did you come up with this wait amount, and is it long enough to deal
    >      >      >      >      with occasional hiccups?
    >      >      >      >      > For these 3 tests the event wait timeout was reduced and adjusted for test.timeout.factor:
    >      >      >      >      >    -vmTestbase/nsk/jdi/Event/_itself_/event001.java
    >      >      >      >      >    -vmTestbase/nsk/jdi/VirtualMachine/suspend/suspend001/TestDescription.java
    >      >      >      >      >    -vmTestbase/nsk/jdi/ThreadReference/suspend/suspend001.java
    >      >      >      >      So overall is this a shorter or longer waittime now?
    >      >      >      >      >
    >      >      >      >      > For next 2 tests the event wait timeout and the thread sleep time were reduced and adjusted for test.timeout.factor. Additional synchronization between the debugger and the debuggee was added to ensure the debugee process continues as soon as the test finishes the timeout related checks and advances to the next steps:
    >      >      >      >      >    - vmTestbase/nsk/jdi/EventQueue/remove_l/remove_l004/TestDescription.java
    >      >      >      >      >    - vmTestbase/nsk/jdi/EventQueue/remove/remove004/TestDescription.java
    >      >      >      >      Ok.
    >      >      >      >      >
    >      >      >      >      > Instead of just sleeping for 5 minutes while waiting for the debuggee test thread to complete  the tests now check whether the debuggee thread is alive in the loop. The total waiting timeout was adjusted for test.timeout.factor:
    >      >      >      >      >    -vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose004/TestDescription.java
    >      >      >      >      >    -vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003/TestDescription.java
    >      >      >      >      >    -vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose002/TestDescription.java
    >      >      >      >      Ok.
    >      >      >      >
    >      >      >      >      thanks,
    >      >      >      >
    >      >      >      >      Chris
    >      >      >      >      >
    >      >      >      >      > Testing.
    >      >      >      >      > The following VM options were used  in Mach5 jobs to  verify these changes:
    >      >      >      >      > 1. No VM args
    >      >      >      >      > 2. -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=grail
    >      >      >      >      > 3. -Xcomp
    >      >      >      >      >
    >      >      >      >      > Also tier1, tier2 and tier3 Mach5 jobs succeeded.
    >      >      >      >      >
    >      >      >      >      > To verify that tests succeed with test.timeout.factor set to 1.0 the following patch was used before running Mach5 jobs.
    >      >      >      >      >
    >      >      >      >      > --- a/make/RunTests.gmk Thu Feb 21 15:17:42 2019 -0800
    >      >      >      >      > +++ b/make/RunTests.gmk Thu Feb 21 15:42:15 2019 -0800
    >      >      >      >      > @@ -826,6 +826,7 @@
    >      >      >      >      >     else
    >      >      >      >      >       JTREG_TIMEOUT_FACTOR ?= 4
    >      >      >      >      >     endif
    >      >      >      >      > +  JTREG_TIMEOUT_FACTOR = 1
    >      >      >      >      >     JTREG_VERBOSE ?= fail,error,summary
    >      >      >      >      >     JTREG_RETAIN ?= fail,error
    >      >      >      >      >
    >      >      >      >      > Bug: https://bugs.openjdk.java.net/browse/JDK-8207367
    >      >      >      >      > Webrev: http://cr.openjdk.java.net/~dtitov/8207367/webrev.01
    >      >      >      >      >
    >      >      >      >      > Thanks!
    >      >      >      >      > --Daniil
    >      >      >      >      >
    >      >      >      >      >
    >      >      >      >
    >      >      >      >
    >      >      >      >
    >      >      >      >
    >      >      >      >
    >      >      >
    >      >      >
    >      >      >
    >      >      >
    >      >      >
    >      >
    >      >
    >      >
    >      >
    >      >
    >      
    >      
    >      
    >
    >
    
    
    








More information about the serviceability-dev mailing list