JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
David Holmes
david.holmes at oracle.com
Thu Aug 16 00:35:31 UTC 2018
+1
Thanks,
David
On 15/08/2018 8:41 AM, Alex Menkov wrote:
> Hi Gary,
>
> I think check for RawMonitorWait error should be moved into the loop
> or exit loop condition can be updated to
>
> 243 if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
> 244 break;
> 245 }
>
> --alex
>
> On 08/14/2018 10:03, Gary Adams wrote:
>> I did 1000 testruns with these changes. There were 3 failures due to
>> "executor timeout"
>> but those are not related to these changes (infrastructure).
>>
>> Webrev:
>> http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
>> Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
>>
>> On 8/12/18, 7:01 PM, David Holmes wrote:
>>> Hi Gary,
>>>
>>> Good analysis. I agree with your reasoning.
>>>
>>> I think these tests were written in a way that assumed that we have
>>> to locally timeout to detect things not happening "quickly enough"
>>> but as we well know "quickly enough" is highly dependent on a number
>>> of factors. And now we rely on the test framework to manage timeouts.
>>>
>>> I think removing the timeout from the first wait() and simply
>>> allowing the framework timeouts to take effect is a good solution.
>>>
>>> Repeating the wait in the second case with twice the timeout is less
>>> clear. One second wasn't long enough, but perhaps 3 seconds will be.
>>> And it probably will but is not guaranteed. The real guarantee is to
>>> wait until you see the expected outcome (either by polling using the
>>> timed-wait, or else by inserting a notify elsewhere - which itself
>>> may be tricky). And once you start pushing on the wait/notify logic
>>> the whole structure of the test potentially changes as you either
>>> wait until you get the expected result, or else get timed out by the
>>> framework.
>>>
>>> I think waiting a bit longer will probably suffice and cause minimum
>>> disruption to the test logic, but I'd be included to put the wait
>>> into a for-loop and use 1-second timeouts per iteration ie.
>>>
>>> // wait for up to 3 seconds
>>> for (int i = 0 ; i < 3 ; i++) {
>>> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>>> if (endsCount == endsExpected)
>>> break;
>>> }
>>>
>>> Thanks,
>>> David
>>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>>> There's a fairly old bug concerning jvmti thread start and end events.
>>>> I believe there are 2 possible race conditions in the threadstart003
>>>> test based on some arbitrary timeout values used in the test itself.
>>>>
>>>> Here are a few snippets from the test to illustrate the problem area.
>>>>
>>>> In the initial sequence RunAgentThread is used to start a thread
>>>> with the agentThreadProc entry point.
>>>>
>>>> 188 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>> 189 if (err != JVMTI_ERROR_NONE) {
>>>> 190 printf("(RawMonitorEnter) unexpected error: %s
>>>> (%d)\n",
>>>> 191 TranslateError(err), err);
>>>> 192 result = FAILED;
>>>> 193 }
>>>>
>>>> * 194 err = (*jvmti)->RunAgentThread(jvmti, thr,
>>>> agentThreadProc,*
>>>> 195 NULL, JVMTI_THREAD_MAX_PRIORITY);
>>>> 196 if (err != JVMTI_ERROR_NONE) {
>>>> 197 printf("(RunAgentThread) unexpected error: %s
>>>> (%d)\n",
>>>> 198 TranslateError(err), err);
>>>> 199 result = FAILED;
>>>> 200 }
>>>>
>>>> * 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>> (jlong)WAIT_TIME);*
>>>> 202 if (err != JVMTI_ERROR_NONE) {
>>>> 203 printf("(RawMonitorWait) unexpected error: %s
>>>> (%d)\n",
>>>> 204 TranslateError(err), err);
>>>> 205 result = FAILED;
>>>> 206 }
>>>> 207 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>> 208 if (err != JVMTI_ERROR_NONE) {
>>>> 209 printf("(RawMonitorExit) unexpected error: %s
>>>> (%d)\n",
>>>> 210 TranslateError(err), err);
>>>> 211 result = FAILED;
>>>> 212 }
>>>>
>>>> I believe the test expects to wake up based on the code in
>>>> agentThreadProc
>>>> notification of the wait_lock.
>>>>
>>>> 121 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>> 122 if (err != JVMTI_ERROR_NONE) {
>>>> 123 printf("(RawMonitorEnter) unexpected error: %s
>>>> (%d)\n",
>>>> 124 TranslateError(err), err);
>>>> 125 result = FAILED;
>>>> 126 }
>>>> * 127 err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>>> 128 if (err != JVMTI_ERROR_NONE) {
>>>> 129 printf("(RawMonitorNotify) unexpected error: %s
>>>> (%d)\n",
>>>> 130 TranslateError(err), err);
>>>> 131 result = FAILED;
>>>> 132 }
>>>> 133 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>> 134 if (err != JVMTI_ERROR_NONE) {
>>>> 135 printf("(RawMonitorExit) unexpected error: %s
>>>> (%d)\n",
>>>> 136 TranslateError(err), err);
>>>> 137 result = FAILED;
>>>> 138 }
>>>>
>>>> A second block for the wait_lock does not have a corresponding
>>>> notify call. I'm guessing the monitor is expected to time out,
>>>> so it allows for one second to complete the delivery of the
>>>> thread end event.
>>>>
>>>> 213
>>>> 214 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>> 215 if (err != JVMTI_ERROR_NONE) {
>>>> 216 printf("(RawMonitorEnter) unexpected error: %s
>>>> (%d)\n",
>>>> 217 TranslateError(err), err);
>>>> 218 result = FAILED;
>>>> 219 }
>>>> * 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>> (jlong)WAIT_TIME);*
>>>> 221 if (err != JVMTI_ERROR_NONE) {
>>>> 222 printf("(RawMonitorWait) unexpected error: %s
>>>> (%d)\n",
>>>> 223 TranslateError(err), err);
>>>> 224 result = FAILED;
>>>> 225 }
>>>> 226 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>> 227 if (err != JVMTI_ERROR_NONE) {
>>>> 228 printf("(RawMonitorExit) unexpected error: %s
>>>> (%d)\n",
>>>> 229 TranslateError(err), err);
>>>> 230 result = FAILED;
>>>> 231 }
>>>>
>>>> I see 2 possible race conditions here.
>>>>
>>>> - if the thread is slow to start, then the first RawMonitorWait
>>>> could timeout. The code would continue to the second
>>>> RawMonitorWait
>>>> which would wake up from the agentThreadProc call to
>>>> RawMonitorNotify.
>>>>
>>>> - if the thread ending is slow to complete then the second
>>>> RawMonitorWait
>>>> could timeout before the thread end event is processed.
>>>>
>>>> After the second RawMonitorWait completes, the event notification is
>>>> disabled
>>>> for both thread start and end events. If we disable the end thread
>>>> event before
>>>> the thread end event is delivered the counts will not match as
>>>> expected.
>>>>
>>>> 233 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>>> JVMTI_DISABLE,
>>>> 234 JVMTI_EVENT_THREAD_START, NULL);
>>>> 235 if (err != JVMTI_ERROR_NONE) {
>>>> 236 printf("Failed to disable
>>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>>> 237 TranslateError(err), err);
>>>> 238 result = FAILED;
>>>> 239 }
>>>> 240 *
>>>> 241 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>>> JVMTI_DISABLE,
>>>> 242 JVMTI_EVENT_THREAD_END, NULL);*
>>>> 243 if (err != JVMTI_ERROR_NONE) {
>>>> 244 printf("Failed to disable JVMTI_EVENT_THREAD_END:
>>>> %s (%d)\n",
>>>> 245 TranslateError(err), err);
>>>> 246 result = FAILED;
>>>> 247 }
>>>>
>>>> My recommendation is to change the first RawMonitorWait to use 0
>>>> rather than WAIT_TIME. This means it will wait until notified.
>>>> For the second RawMonitorWait, we should retry the wait if
>>>> endsCount does not equal endsExpected.
>>>>
>>>> 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>>
>>>>
>>>> 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>> (jlong)WAIT_TIME);
>>>> if (endsCount != endsExpected) {
>>>> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>> (jlong)WAIT_TIME * 2);
>>>> }
>>>> 221 if (err != JVMTI_ERROR_NONE) {
>>>>
>>>> So we remove the first race condition by removing the timeout on the
>>>> first wait call.
>>>> And we accommodate a possible second race condition by retrying the
>>>> second wait with
>>>> an extended delay.
>>>>
>>>> An alternative to the retry could be to add a RawMonitorNotify call
>>>> in the thread
>>>> end event callback ThreadEnd() after the endsCount has been
>>>> incremented.
>>>>
>>>> This test fails very infrequently, so it may be difficult to prove
>>>> that the specific
>>>> problem has been resolved. I'll try some testing with the proposed
>>>> changes
>>>> and see if we can work towards getting this test off the ProblemList
>>>> for jdk12.
>>>>
>>>>
>>
More information about the serviceability-dev
mailing list