JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
David Holmes
david.holmes at oracle.com
Sun Aug 12 23:01:38 UTC 2018
Hi Gary,
Good analysis. I agree with your reasoning.
I think these tests were written in a way that assumed that we have to
locally timeout to detect things not happening "quickly enough" but as
we well know "quickly enough" is highly dependent on a number of
factors. And now we rely on the test framework to manage timeouts.
I think removing the timeout from the first wait() and simply allowing
the framework timeouts to take effect is a good solution.
Repeating the wait in the second case with twice the timeout is less
clear. One second wasn't long enough, but perhaps 3 seconds will be. And
it probably will but is not guaranteed. The real guarantee is to wait
until you see the expected outcome (either by polling using the
timed-wait, or else by inserting a notify elsewhere - which itself may
be tricky). And once you start pushing on the wait/notify logic the
whole structure of the test potentially changes as you either wait until
you get the expected result, or else get timed out by the framework.
I think waiting a bit longer will probably suffice and cause minimum
disruption to the test logic, but I'd be included to put the wait into a
for-loop and use 1-second timeouts per iteration ie.
// wait for up to 3 seconds
for (int i = 0 ; i < 3 ; i++) {
err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
if (endsCount == endsExpected)
break;
}
Thanks,
David
On 11/08/2018 4:15 AM, Gary Adams wrote:
> There's a fairly old bug concerning jvmti thread start and end events.
> I believe there are 2 possible race conditions in the threadstart003
> test based on some arbitrary timeout values used in the test itself.
>
> Here are a few snippets from the test to illustrate the problem area.
>
> In the initial sequence RunAgentThread is used to start a thread
> with the agentThreadProc entry point.
>
> 188 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
> 189 if (err != JVMTI_ERROR_NONE) {
> 190 printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
> 191 TranslateError(err), err);
> 192 result = FAILED;
> 193 }
>
> * 194 err = (*jvmti)->RunAgentThread(jvmti, thr, agentThreadProc,*
> 195 NULL, JVMTI_THREAD_MAX_PRIORITY);
> 196 if (err != JVMTI_ERROR_NONE) {
> 197 printf("(RunAgentThread) unexpected error: %s (%d)\n",
> 198 TranslateError(err), err);
> 199 result = FAILED;
> 200 }
>
> * 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
> (jlong)WAIT_TIME);*
> 202 if (err != JVMTI_ERROR_NONE) {
> 203 printf("(RawMonitorWait) unexpected error: %s (%d)\n",
> 204 TranslateError(err), err);
> 205 result = FAILED;
> 206 }
> 207 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
> 208 if (err != JVMTI_ERROR_NONE) {
> 209 printf("(RawMonitorExit) unexpected error: %s (%d)\n",
> 210 TranslateError(err), err);
> 211 result = FAILED;
> 212 }
>
> I believe the test expects to wake up based on the code in agentThreadProc
> notification of the wait_lock.
>
> 121 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
> 122 if (err != JVMTI_ERROR_NONE) {
> 123 printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
> 124 TranslateError(err), err);
> 125 result = FAILED;
> 126 }
> * 127 err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
> 128 if (err != JVMTI_ERROR_NONE) {
> 129 printf("(RawMonitorNotify) unexpected error: %s (%d)\n",
> 130 TranslateError(err), err);
> 131 result = FAILED;
> 132 }
> 133 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
> 134 if (err != JVMTI_ERROR_NONE) {
> 135 printf("(RawMonitorExit) unexpected error: %s (%d)\n",
> 136 TranslateError(err), err);
> 137 result = FAILED;
> 138 }
>
> A second block for the wait_lock does not have a corresponding
> notify call. I'm guessing the monitor is expected to time out,
> so it allows for one second to complete the delivery of the
> thread end event.
>
> 213
> 214 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
> 215 if (err != JVMTI_ERROR_NONE) {
> 216 printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
> 217 TranslateError(err), err);
> 218 result = FAILED;
> 219 }
> * 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
> (jlong)WAIT_TIME);*
> 221 if (err != JVMTI_ERROR_NONE) {
> 222 printf("(RawMonitorWait) unexpected error: %s (%d)\n",
> 223 TranslateError(err), err);
> 224 result = FAILED;
> 225 }
> 226 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
> 227 if (err != JVMTI_ERROR_NONE) {
> 228 printf("(RawMonitorExit) unexpected error: %s (%d)\n",
> 229 TranslateError(err), err);
> 230 result = FAILED;
> 231 }
>
> I see 2 possible race conditions here.
>
> - if the thread is slow to start, then the first RawMonitorWait
> could timeout. The code would continue to the second RawMonitorWait
> which would wake up from the agentThreadProc call to RawMonitorNotify.
>
> - if the thread ending is slow to complete then the second RawMonitorWait
> could timeout before the thread end event is processed.
>
> After the second RawMonitorWait completes, the event notification is
> disabled
> for both thread start and end events. If we disable the end thread event
> before
> the thread end event is delivered the counts will not match as expected.
>
> 233 err = (*jvmti)->SetEventNotificationMode(jvmti,
> JVMTI_DISABLE,
> 234 JVMTI_EVENT_THREAD_START, NULL);
> 235 if (err != JVMTI_ERROR_NONE) {
> 236 printf("Failed to disable JVMTI_EVENT_THREAD_START:
> %s (%d)\n",
> 237 TranslateError(err), err);
> 238 result = FAILED;
> 239 }
> 240 *
> 241 err = (*jvmti)->SetEventNotificationMode(jvmti,
> JVMTI_DISABLE,
> 242 JVMTI_EVENT_THREAD_END, NULL);*
> 243 if (err != JVMTI_ERROR_NONE) {
> 244 printf("Failed to disable JVMTI_EVENT_THREAD_END: %s
> (%d)\n",
> 245 TranslateError(err), err);
> 246 result = FAILED;
> 247 }
>
> My recommendation is to change the first RawMonitorWait to use 0
> rather than WAIT_TIME. This means it will wait until notified.
> For the second RawMonitorWait, we should retry the wait if
> endsCount does not equal endsExpected.
>
> 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>
>
> 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
> (jlong)WAIT_TIME);
> if (endsCount != endsExpected) {
> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
> (jlong)WAIT_TIME * 2);
> }
> 221 if (err != JVMTI_ERROR_NONE) {
>
> So we remove the first race condition by removing the timeout on the
> first wait call.
> And we accommodate a possible second race condition by retrying the
> second wait with
> an extended delay.
>
> An alternative to the retry could be to add a RawMonitorNotify call in
> the thread
> end event callback ThreadEnd() after the endsCount has been incremented.
>
> This test fails very infrequently, so it may be difficult to prove that
> the specific
> problem has been resolved. I'll try some testing with the proposed changes
> and see if we can work towards getting this test off the ProblemList for
> jdk12.
>
>
More information about the serviceability-dev
mailing list