JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events

David Holmes david.holmes at oracle.com
Sun Aug 12 23:01:38 UTC 2018


Hi Gary,

Good analysis. I agree with your reasoning.

I think these tests were written in a way that assumed that we have to 
locally timeout to detect things not happening "quickly enough" but as 
we well know "quickly enough" is highly dependent on a number of 
factors. And now we rely on the test framework to manage timeouts.

I think removing the timeout from the first wait() and simply allowing 
the framework timeouts to take effect is a good solution.

Repeating the wait in the second case with twice the timeout is less 
clear. One second wasn't long enough, but perhaps 3 seconds will be. And 
it probably will but is not guaranteed. The real guarantee is to wait 
until you see the expected outcome (either by polling using the 
timed-wait, or else by inserting a notify elsewhere - which itself may 
be tricky). And once you start pushing on the wait/notify logic the 
whole structure of the test potentially changes as you either wait until 
you get the expected result, or else get timed out by the framework.

I think waiting a bit longer will probably suffice and cause minimum 
disruption to the test logic, but I'd be included to put the wait into a 
for-loop and use 1-second timeouts per iteration ie.

// wait for up to 3 seconds
for (int i = 0 ; i < 3 ; i++) {
   err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
   if (endsCount == endsExpected)
     break;
}

Thanks,
David
On 11/08/2018 4:15 AM, Gary Adams wrote:
> There's a fairly old bug concerning jvmti thread start and end events.
> I believe there are 2 possible race conditions in the threadstart003
> test based on some arbitrary timeout values used in the test itself.
> 
> Here are a few snippets from the test to illustrate the problem area.
> 
> In the initial sequence RunAgentThread is used to start a thread
> with the agentThreadProc entry point.
> 
>     188        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>     189        if (err != JVMTI_ERROR_NONE) {
>     190            printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
>     191                   TranslateError(err), err);
>     192            result = FAILED;
>     193        }
> 
> *   194        err = (*jvmti)->RunAgentThread(jvmti, thr, agentThreadProc,*
>     195            NULL, JVMTI_THREAD_MAX_PRIORITY);
>     196        if (err != JVMTI_ERROR_NONE) {
>     197            printf("(RunAgentThread) unexpected error: %s (%d)\n",
>     198                   TranslateError(err), err);
>     199            result = FAILED;
>     200        }
> 
> *   201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
> (jlong)WAIT_TIME);*
>     202        if (err != JVMTI_ERROR_NONE) {
>     203            printf("(RawMonitorWait) unexpected error: %s (%d)\n",
>     204                   TranslateError(err), err);
>     205            result = FAILED;
>     206        }
>     207        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>     208        if (err != JVMTI_ERROR_NONE) {
>     209            printf("(RawMonitorExit) unexpected error: %s (%d)\n",
>     210                   TranslateError(err), err);
>     211            result = FAILED;
>     212        }
> 
> I believe the test expects to wake up based on the code in agentThreadProc
> notification of the wait_lock.
> 
>     121        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>     122        if (err != JVMTI_ERROR_NONE) {
>     123            printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
>     124                   TranslateError(err), err);
>     125            result = FAILED;
>     126        }
> *   127        err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>     128        if (err != JVMTI_ERROR_NONE) {
>     129            printf("(RawMonitorNotify) unexpected error: %s (%d)\n",
>     130                   TranslateError(err), err);
>     131            result = FAILED;
>     132        }
>     133        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>     134        if (err != JVMTI_ERROR_NONE) {
>     135            printf("(RawMonitorExit) unexpected error: %s (%d)\n",
>     136                   TranslateError(err), err);
>     137            result = FAILED;
>     138        }
> 
> A second block for the wait_lock does not have a corresponding
> notify call. I'm guessing the monitor is expected to time out,
> so it allows for one second to complete the delivery of the
> thread end event.
> 
>    213
>     214        err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>     215        if (err != JVMTI_ERROR_NONE) {
>     216            printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
>     217                   TranslateError(err), err);
>     218            result = FAILED;
>     219        }
> *   220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
> (jlong)WAIT_TIME);*
>     221        if (err != JVMTI_ERROR_NONE) {
>     222            printf("(RawMonitorWait) unexpected error: %s (%d)\n",
>     223                   TranslateError(err), err);
>     224            result = FAILED;
>     225        }
>     226        err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>     227        if (err != JVMTI_ERROR_NONE) {
>     228            printf("(RawMonitorExit) unexpected error: %s (%d)\n",
>     229                   TranslateError(err), err);
>     230            result = FAILED;
>     231        }
> 
> I see 2 possible race conditions here.
> 
>    - if the thread is slow to start, then the first RawMonitorWait
>      could timeout. The code would continue to the second RawMonitorWait
>      which would wake up from the agentThreadProc call to RawMonitorNotify.
> 
>    - if the thread ending is slow to complete then the second RawMonitorWait
>      could timeout before the thread end event is processed.
> 
> After the second RawMonitorWait completes, the event notification is 
> disabled
> for both thread start and end events. If we disable the end thread event 
> before
> the thread end event is delivered the counts will not match as expected.
> 
>     233        err = (*jvmti)->SetEventNotificationMode(jvmti, 
> JVMTI_DISABLE,
>     234                JVMTI_EVENT_THREAD_START, NULL);
>     235        if (err != JVMTI_ERROR_NONE) {
>     236            printf("Failed to disable JVMTI_EVENT_THREAD_START: 
> %s (%d)\n",
>     237                   TranslateError(err), err);
>     238            result = FAILED;
>     239        }
>     240 *
>     241        err = (*jvmti)->SetEventNotificationMode(jvmti, 
> JVMTI_DISABLE,
>     242                JVMTI_EVENT_THREAD_END, NULL);*
>     243        if (err != JVMTI_ERROR_NONE) {
>     244            printf("Failed to disable JVMTI_EVENT_THREAD_END: %s 
> (%d)\n",
>     245                   TranslateError(err), err);
>     246            result = FAILED;
>     247        }
> 
> My recommendation is to change the first RawMonitorWait to use 0
> rather than WAIT_TIME. This means it will wait until notified.
> For the second RawMonitorWait, we should retry the wait if
> endsCount does not equal endsExpected.
> 
>     201        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
> 
> 
>     220        err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
> (jlong)WAIT_TIME);
>                if (endsCount != endsExpected) {
>                    err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 
> (jlong)WAIT_TIME * 2);
>                }
>     221        if (err != JVMTI_ERROR_NONE) {
> 
> So we remove the first race condition by removing the timeout on the 
> first wait call.
> And we accommodate a possible second race condition by retrying the 
> second wait with
> an extended delay.
> 
> An alternative to the retry could be to add a RawMonitorNotify call in 
> the thread
> end event callback ThreadEnd() after the endsCount has been incremented.
> 
> This test fails very infrequently, so it may be difficult to prove that 
> the specific
> problem has been resolved. I'll try some testing with the proposed changes
> and see if we can work towards getting this test off the ProblemList for 
> jdk12.
> 
> 


More information about the serviceability-dev mailing list