JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
Alex Menkov
alexey.menkov at oracle.com
Wed Aug 22 17:34:04 UTC 2018
looks good.
--alex
On 08/22/2018 09:54, Gary Adams wrote:
> Updated webrev:
> http://cr.openjdk.java.net/~gadams/8034084/webrev.01/index.html
>
> On 8/15/18, 8:35 PM, David Holmes wrote:
>> +1
>>
>> Thanks,
>> David
>>
>> On 15/08/2018 8:41 AM, Alex Menkov wrote:
>>> Hi Gary,
>>>
>>> I think check for RawMonitorWait error should be moved into the loop
>>> or exit loop condition can be updated to
>>>
>>> 243 if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
>>> 244 break;
>>> 245 }
>>>
>>> --alex
>>>
>>> On 08/14/2018 10:03, Gary Adams wrote:
>>>> I did 1000 testruns with these changes. There were 3 failures due to
>>>> "executor timeout"
>>>> but those are not related to these changes (infrastructure).
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
>>>>
>>>> On 8/12/18, 7:01 PM, David Holmes wrote:
>>>>> Hi Gary,
>>>>>
>>>>> Good analysis. I agree with your reasoning.
>>>>>
>>>>> I think these tests were written in a way that assumed that we have
>>>>> to locally timeout to detect things not happening "quickly enough"
>>>>> but as we well know "quickly enough" is highly dependent on a
>>>>> number of factors. And now we rely on the test framework to manage
>>>>> timeouts.
>>>>>
>>>>> I think removing the timeout from the first wait() and simply
>>>>> allowing the framework timeouts to take effect is a good solution.
>>>>>
>>>>> Repeating the wait in the second case with twice the timeout is
>>>>> less clear. One second wasn't long enough, but perhaps 3 seconds
>>>>> will be. And it probably will but is not guaranteed. The real
>>>>> guarantee is to wait until you see the expected outcome (either by
>>>>> polling using the timed-wait, or else by inserting a notify
>>>>> elsewhere - which itself may be tricky). And once you start pushing
>>>>> on the wait/notify logic the whole structure of the test
>>>>> potentially changes as you either wait until you get the expected
>>>>> result, or else get timed out by the framework.
>>>>>
>>>>> I think waiting a bit longer will probably suffice and cause
>>>>> minimum disruption to the test logic, but I'd be included to put
>>>>> the wait into a for-loop and use 1-second timeouts per iteration ie.
>>>>>
>>>>> // wait for up to 3 seconds
>>>>> for (int i = 0 ; i < 3 ; i++) {
>>>>> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>>>>> if (endsCount == endsExpected)
>>>>> break;
>>>>> }
>>>>>
>>>>> Thanks,
>>>>> David
>>>>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>>>>> There's a fairly old bug concerning jvmti thread start and end
>>>>>> events.
>>>>>> I believe there are 2 possible race conditions in the threadstart003
>>>>>> test based on some arbitrary timeout values used in the test itself.
>>>>>>
>>>>>> Here are a few snippets from the test to illustrate the problem area.
>>>>>>
>>>>>> In the initial sequence RunAgentThread is used to start a thread
>>>>>> with the agentThreadProc entry point.
>>>>>>
>>>>>> 188 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>> 189 if (err != JVMTI_ERROR_NONE) {
>>>>>> 190 printf("(RawMonitorEnter) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 191 TranslateError(err), err);
>>>>>> 192 result = FAILED;
>>>>>> 193 }
>>>>>>
>>>>>> * 194 err = (*jvmti)->RunAgentThread(jvmti, thr,
>>>>>> agentThreadProc,*
>>>>>> 195 NULL, JVMTI_THREAD_MAX_PRIORITY);
>>>>>> 196 if (err != JVMTI_ERROR_NONE) {
>>>>>> 197 printf("(RunAgentThread) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 198 TranslateError(err), err);
>>>>>> 199 result = FAILED;
>>>>>> 200 }
>>>>>>
>>>>>> * 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>>>> (jlong)WAIT_TIME);*
>>>>>> 202 if (err != JVMTI_ERROR_NONE) {
>>>>>> 203 printf("(RawMonitorWait) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 204 TranslateError(err), err);
>>>>>> 205 result = FAILED;
>>>>>> 206 }
>>>>>> 207 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>> 208 if (err != JVMTI_ERROR_NONE) {
>>>>>> 209 printf("(RawMonitorExit) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 210 TranslateError(err), err);
>>>>>> 211 result = FAILED;
>>>>>> 212 }
>>>>>>
>>>>>> I believe the test expects to wake up based on the code in
>>>>>> agentThreadProc
>>>>>> notification of the wait_lock.
>>>>>>
>>>>>> 121 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>> 122 if (err != JVMTI_ERROR_NONE) {
>>>>>> 123 printf("(RawMonitorEnter) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 124 TranslateError(err), err);
>>>>>> 125 result = FAILED;
>>>>>> 126 }
>>>>>> * 127 err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>>>>> 128 if (err != JVMTI_ERROR_NONE) {
>>>>>> 129 printf("(RawMonitorNotify) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 130 TranslateError(err), err);
>>>>>> 131 result = FAILED;
>>>>>> 132 }
>>>>>> 133 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>> 134 if (err != JVMTI_ERROR_NONE) {
>>>>>> 135 printf("(RawMonitorExit) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 136 TranslateError(err), err);
>>>>>> 137 result = FAILED;
>>>>>> 138 }
>>>>>>
>>>>>> A second block for the wait_lock does not have a corresponding
>>>>>> notify call. I'm guessing the monitor is expected to time out,
>>>>>> so it allows for one second to complete the delivery of the
>>>>>> thread end event.
>>>>>>
>>>>>> 213
>>>>>> 214 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>>> 215 if (err != JVMTI_ERROR_NONE) {
>>>>>> 216 printf("(RawMonitorEnter) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 217 TranslateError(err), err);
>>>>>> 218 result = FAILED;
>>>>>> 219 }
>>>>>> * 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>>>> (jlong)WAIT_TIME);*
>>>>>> 221 if (err != JVMTI_ERROR_NONE) {
>>>>>> 222 printf("(RawMonitorWait) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 223 TranslateError(err), err);
>>>>>> 224 result = FAILED;
>>>>>> 225 }
>>>>>> 226 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>>> 227 if (err != JVMTI_ERROR_NONE) {
>>>>>> 228 printf("(RawMonitorExit) unexpected error: %s
>>>>>> (%d)\n",
>>>>>> 229 TranslateError(err), err);
>>>>>> 230 result = FAILED;
>>>>>> 231 }
>>>>>>
>>>>>> I see 2 possible race conditions here.
>>>>>>
>>>>>> - if the thread is slow to start, then the first RawMonitorWait
>>>>>> could timeout. The code would continue to the second
>>>>>> RawMonitorWait
>>>>>> which would wake up from the agentThreadProc call to
>>>>>> RawMonitorNotify.
>>>>>>
>>>>>> - if the thread ending is slow to complete then the second
>>>>>> RawMonitorWait
>>>>>> could timeout before the thread end event is processed.
>>>>>>
>>>>>> After the second RawMonitorWait completes, the event notification
>>>>>> is disabled
>>>>>> for both thread start and end events. If we disable the end thread
>>>>>> event before
>>>>>> the thread end event is delivered the counts will not match as
>>>>>> expected.
>>>>>>
>>>>>> 233 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>>>>> JVMTI_DISABLE,
>>>>>> 234 JVMTI_EVENT_THREAD_START, NULL);
>>>>>> 235 if (err != JVMTI_ERROR_NONE) {
>>>>>> 236 printf("Failed to disable
>>>>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>>>>> 237 TranslateError(err), err);
>>>>>> 238 result = FAILED;
>>>>>> 239 }
>>>>>> 240 *
>>>>>> 241 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>>>>> JVMTI_DISABLE,
>>>>>> 242 JVMTI_EVENT_THREAD_END, NULL);*
>>>>>> 243 if (err != JVMTI_ERROR_NONE) {
>>>>>> 244 printf("Failed to disable
>>>>>> JVMTI_EVENT_THREAD_END: %s (%d)\n",
>>>>>> 245 TranslateError(err), err);
>>>>>> 246 result = FAILED;
>>>>>> 247 }
>>>>>>
>>>>>> My recommendation is to change the first RawMonitorWait to use 0
>>>>>> rather than WAIT_TIME. This means it will wait until notified.
>>>>>> For the second RawMonitorWait, we should retry the wait if
>>>>>> endsCount does not equal endsExpected.
>>>>>>
>>>>>> 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>>>>
>>>>>>
>>>>>> 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>>>> (jlong)WAIT_TIME);
>>>>>> if (endsCount != endsExpected) {
>>>>>> err = (*jvmti)->RawMonitorWait(jvmti,
>>>>>> wait_lock, (jlong)WAIT_TIME * 2);
>>>>>> }
>>>>>> 221 if (err != JVMTI_ERROR_NONE) {
>>>>>>
>>>>>> So we remove the first race condition by removing the timeout on
>>>>>> the first wait call.
>>>>>> And we accommodate a possible second race condition by retrying
>>>>>> the second wait with
>>>>>> an extended delay.
>>>>>>
>>>>>> An alternative to the retry could be to add a RawMonitorNotify
>>>>>> call in the thread
>>>>>> end event callback ThreadEnd() after the endsCount has been
>>>>>> incremented.
>>>>>>
>>>>>> This test fails very infrequently, so it may be difficult to prove
>>>>>> that the specific
>>>>>> problem has been resolved. I'll try some testing with the proposed
>>>>>> changes
>>>>>> and see if we can work towards getting this test off the
>>>>>> ProblemList for jdk12.
>>>>>>
>>>>>>
>>>>
>
More information about the serviceability-dev
mailing list