RFR 8207367: 10 vmTestbase/nsk/jdi tests timed out when running with jtreg

Chris Plummer chris.plummer at oracle.com
Tue Feb 26 00:38:50 UTC 2019


Hi Daniil,

Yes, my point was that the max time you wait for a single event is much 
smaller now. I can see a possibility that with a little bit of network 
instability  a packet gets lost and resend does not happen fast enough.

thanks,

Chris

On 2/25/19 4:32 PM, Daniil Titov wrote:
> Hi Chris,
>   
> The code still waits for the whole total wait time. There is a while loop at lines 163-186 that keeps receiving new events (line 183) till elapsed time is less than the waittime (line 178) or a timeout happens (so eventSet is null).
>
> 159	            begin = System.currentTimeMillis();
>     160	            eventSet = debugee.VM().eventQueue().remove(waitTime);
>     161	            delta = System.currentTimeMillis() - begin;
>     162	            totalWaitTime -= delta;
>     163	            while (eventSet != null) {
>     164	                EventIterator eventIterator = eventSet.eventIterator();
>
>     178	                if (totalWaitTime <= 0 || exit) {
>     179	                    break;
>     180	                }
>     181	                debugee.resume();
>     182	                begin = System.currentTimeMillis();
>     183	                eventSet = debugee.VM().eventQueue().remove(waitTime);
>     184	                delta = System.currentTimeMillis() - begin;
>     185	                totalWaitTime -= delta;
>     186	            }
>
>
> However, as I see now in case if a timeout happens on line 160  (eventSet is null) the loop is not executed at all.  I haven't observed it in test runs but I think it makes sense to adjust this test to take this potential case into account. I will send an updated version of the patch soon.
>
> Thanks!
>
> Best regards,
> Daniil
>
> On 2/25/19, 12:21 PM, "Chris Plummer" <chris.plummer at oracle.com> wrote:
>
>      Hi Daniil,
>      
>      On 2/23/19 1:02 PM, Daniil Titov wrote:
>      > Please review the change that fixes timeout issues for the following 10 tests when running with jtreg and default timeout factor (1.0).
>      In Utils.java, I think wait() should be moved right after
>      waitForCondition() and maybe given a more descriptive name. It seems to
>      basically the same as waitForCondition(), except you added a "log"
>      parameter and slightly changed the behavior. Are these behavior
>      differences necessary? Could you share code with the existing
>      waitForCondition()?
>      >
>      > For the following 2 tests the event wait timeout was reduced and adjusted for test.timeout.factor.  Method receiveEvents(long,pattern) was fixed to ensure that it gracefully exits after the specified wait period elapsed:
>      >    -vmTestbase/nsk/jdi/ClassUnloadRequest/addClassExclusionFilter/exclfilter001.java
>      >    -vmTestbase/nsk/jdi/ClassUnloadRequest/addClassFilter/filter001.java
>        183                 eventSet = debugee.VM().eventQueue().remove(waitTime);
>      
>      This code used to wait for the total remaining waittime. Now it waits a
>      fixed amount based on:
>      
>        153         long waitTime = Utils.adjustTimeout(1000);
>      
>      How did you come up with this wait amount, and is it long enough to deal
>      with occasional hiccups?
>      > For these 3 tests the event wait timeout was reduced and adjusted for test.timeout.factor:
>      >    -vmTestbase/nsk/jdi/Event/_itself_/event001.java
>      >    -vmTestbase/nsk/jdi/VirtualMachine/suspend/suspend001/TestDescription.java
>      >    -vmTestbase/nsk/jdi/ThreadReference/suspend/suspend001.java
>      So overall is this a shorter or longer waittime now?
>      >
>      > For next 2 tests the event wait timeout and the thread sleep time were reduced and adjusted for test.timeout.factor. Additional synchronization between the debugger and the debuggee was added to ensure the debugee process continues as soon as the test finishes the timeout related checks and advances to the next steps:
>      >    - vmTestbase/nsk/jdi/EventQueue/remove_l/remove_l004/TestDescription.java
>      >    - vmTestbase/nsk/jdi/EventQueue/remove/remove004/TestDescription.java
>      Ok.
>      >
>      > Instead of just sleeping for 5 minutes while waiting for the debuggee test thread to complete  the tests now check whether the debuggee thread is alive in the loop. The total waiting timeout was adjusted for test.timeout.factor:
>      >    -vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose004/TestDescription.java
>      >    -vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose003/TestDescription.java
>      >    -vmTestbase/nsk/jdi/VirtualMachine/dispose/dispose002/TestDescription.java
>      Ok.
>      
>      thanks,
>      
>      Chris
>      >
>      > Testing.
>      > The following VM options were used  in Mach5 jobs to  verify these changes:
>      > 1. No VM args
>      > 2. -XX:+UnlockExperimentalVMOptions -XX:+EnableJVMCI -XX:+TieredCompilation -XX:+UseJVMCICompiler -Djvmci.Compiler=grail
>      > 3. -Xcomp
>      >
>      > Also tier1, tier2 and tier3 Mach5 jobs succeeded.
>      >
>      > To verify that tests succeed with test.timeout.factor set to 1.0 the following patch was used before running Mach5 jobs.
>      >
>      > --- a/make/RunTests.gmk Thu Feb 21 15:17:42 2019 -0800
>      > +++ b/make/RunTests.gmk Thu Feb 21 15:42:15 2019 -0800
>      > @@ -826,6 +826,7 @@
>      >     else
>      >       JTREG_TIMEOUT_FACTOR ?= 4
>      >     endif
>      > +  JTREG_TIMEOUT_FACTOR = 1
>      >     JTREG_VERBOSE ?= fail,error,summary
>      >     JTREG_RETAIN ?= fail,error
>      >
>      > Bug: https://bugs.openjdk.java.net/browse/JDK-8207367
>      > Webrev: http://cr.openjdk.java.net/~dtitov/8207367/webrev.01
>      >
>      > Thanks!
>      > --Daniil
>      >
>      >
>      
>      
>      
>
>




More information about the serviceability-dev mailing list