JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
Gary Adams
gary.adams at oracle.com
Wed Aug 22 16:54:01 UTC 2018
Updated webrev:
http://cr.openjdk.java.net/~gadams/8034084/webrev.01/index.html
On 8/15/18, 8:35 PM, David Holmes wrote:
> +1
>
> Thanks,
> David
>
> On 15/08/2018 8:41 AM, Alex Menkov wrote:
>> Hi Gary,
>>
>> I think check for RawMonitorWait error should be moved into the loop
>> or exit loop condition can be updated to
>>
>> 243 if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
>> 244 break;
>> 245 }
>>
>> --alex
>>
>> On 08/14/2018 10:03, Gary Adams wrote:
>>> I did 1000 testruns with these changes. There were 3 failures due to
>>> "executor timeout"
>>> but those are not related to these changes (infrastructure).
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
>>>
>>> On 8/12/18, 7:01 PM, David Holmes wrote:
>>>> Hi Gary,
>>>>
>>>> Good analysis. I agree with your reasoning.
>>>>
>>>> I think these tests were written in a way that assumed that we have
>>>> to locally timeout to detect things not happening "quickly enough"
>>>> but as we well know "quickly enough" is highly dependent on a
>>>> number of factors. And now we rely on the test framework to manage
>>>> timeouts.
>>>>
>>>> I think removing the timeout from the first wait() and simply
>>>> allowing the framework timeouts to take effect is a good solution.
>>>>
>>>> Repeating the wait in the second case with twice the timeout is
>>>> less clear. One second wasn't long enough, but perhaps 3 seconds
>>>> will be. And it probably will but is not guaranteed. The real
>>>> guarantee is to wait until you see the expected outcome (either by
>>>> polling using the timed-wait, or else by inserting a notify
>>>> elsewhere - which itself may be tricky). And once you start pushing
>>>> on the wait/notify logic the whole structure of the test
>>>> potentially changes as you either wait until you get the expected
>>>> result, or else get timed out by the framework.
>>>>
>>>> I think waiting a bit longer will probably suffice and cause
>>>> minimum disruption to the test logic, but I'd be included to put
>>>> the wait into a for-loop and use 1-second timeouts per iteration ie.
>>>>
>>>> // wait for up to 3 seconds
>>>> for (int i = 0 ; i < 3 ; i++) {
>>>> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>>>> if (endsCount == endsExpected)
>>>> break;
>>>> }
>>>>
>>>> Thanks,
>>>> David
>>>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>>>> There's a fairly old bug concerning jvmti thread start and end
>>>>> events.
>>>>> I believe there are 2 possible race conditions in the threadstart003
>>>>> test based on some arbitrary timeout values used in the test itself.
>>>>>
>>>>> Here are a few snippets from the test to illustrate the problem area.
>>>>>
>>>>> In the initial sequence RunAgentThread is used to start a thread
>>>>> with the agentThreadProc entry point.
>>>>>
>>>>> 188 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>> 189 if (err != JVMTI_ERROR_NONE) {
>>>>> 190 printf("(RawMonitorEnter) unexpected error: %s
>>>>> (%d)\n",
>>>>> 191 TranslateError(err), err);
>>>>> 192 result = FAILED;
>>>>> 193 }
>>>>>
>>>>> * 194 err = (*jvmti)->RunAgentThread(jvmti, thr,
>>>>> agentThreadProc,*
>>>>> 195 NULL, JVMTI_THREAD_MAX_PRIORITY);
>>>>> 196 if (err != JVMTI_ERROR_NONE) {
>>>>> 197 printf("(RunAgentThread) unexpected error: %s
>>>>> (%d)\n",
>>>>> 198 TranslateError(err), err);
>>>>> 199 result = FAILED;
>>>>> 200 }
>>>>>
>>>>> * 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>>> (jlong)WAIT_TIME);*
>>>>> 202 if (err != JVMTI_ERROR_NONE) {
>>>>> 203 printf("(RawMonitorWait) unexpected error: %s
>>>>> (%d)\n",
>>>>> 204 TranslateError(err), err);
>>>>> 205 result = FAILED;
>>>>> 206 }
>>>>> 207 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>> 208 if (err != JVMTI_ERROR_NONE) {
>>>>> 209 printf("(RawMonitorExit) unexpected error: %s
>>>>> (%d)\n",
>>>>> 210 TranslateError(err), err);
>>>>> 211 result = FAILED;
>>>>> 212 }
>>>>>
>>>>> I believe the test expects to wake up based on the code in
>>>>> agentThreadProc
>>>>> notification of the wait_lock.
>>>>>
>>>>> 121 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>> 122 if (err != JVMTI_ERROR_NONE) {
>>>>> 123 printf("(RawMonitorEnter) unexpected error: %s
>>>>> (%d)\n",
>>>>> 124 TranslateError(err), err);
>>>>> 125 result = FAILED;
>>>>> 126 }
>>>>> * 127 err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>>>> 128 if (err != JVMTI_ERROR_NONE) {
>>>>> 129 printf("(RawMonitorNotify) unexpected error: %s
>>>>> (%d)\n",
>>>>> 130 TranslateError(err), err);
>>>>> 131 result = FAILED;
>>>>> 132 }
>>>>> 133 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>> 134 if (err != JVMTI_ERROR_NONE) {
>>>>> 135 printf("(RawMonitorExit) unexpected error: %s
>>>>> (%d)\n",
>>>>> 136 TranslateError(err), err);
>>>>> 137 result = FAILED;
>>>>> 138 }
>>>>>
>>>>> A second block for the wait_lock does not have a corresponding
>>>>> notify call. I'm guessing the monitor is expected to time out,
>>>>> so it allows for one second to complete the delivery of the
>>>>> thread end event.
>>>>>
>>>>> 213
>>>>> 214 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>>>> 215 if (err != JVMTI_ERROR_NONE) {
>>>>> 216 printf("(RawMonitorEnter) unexpected error: %s
>>>>> (%d)\n",
>>>>> 217 TranslateError(err), err);
>>>>> 218 result = FAILED;
>>>>> 219 }
>>>>> * 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>>> (jlong)WAIT_TIME);*
>>>>> 221 if (err != JVMTI_ERROR_NONE) {
>>>>> 222 printf("(RawMonitorWait) unexpected error: %s
>>>>> (%d)\n",
>>>>> 223 TranslateError(err), err);
>>>>> 224 result = FAILED;
>>>>> 225 }
>>>>> 226 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>>>> 227 if (err != JVMTI_ERROR_NONE) {
>>>>> 228 printf("(RawMonitorExit) unexpected error: %s
>>>>> (%d)\n",
>>>>> 229 TranslateError(err), err);
>>>>> 230 result = FAILED;
>>>>> 231 }
>>>>>
>>>>> I see 2 possible race conditions here.
>>>>>
>>>>> - if the thread is slow to start, then the first RawMonitorWait
>>>>> could timeout. The code would continue to the second
>>>>> RawMonitorWait
>>>>> which would wake up from the agentThreadProc call to
>>>>> RawMonitorNotify.
>>>>>
>>>>> - if the thread ending is slow to complete then the second
>>>>> RawMonitorWait
>>>>> could timeout before the thread end event is processed.
>>>>>
>>>>> After the second RawMonitorWait completes, the event notification
>>>>> is disabled
>>>>> for both thread start and end events. If we disable the end thread
>>>>> event before
>>>>> the thread end event is delivered the counts will not match as
>>>>> expected.
>>>>>
>>>>> 233 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>>>> JVMTI_DISABLE,
>>>>> 234 JVMTI_EVENT_THREAD_START, NULL);
>>>>> 235 if (err != JVMTI_ERROR_NONE) {
>>>>> 236 printf("Failed to disable
>>>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>>>> 237 TranslateError(err), err);
>>>>> 238 result = FAILED;
>>>>> 239 }
>>>>> 240 *
>>>>> 241 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>>>> JVMTI_DISABLE,
>>>>> 242 JVMTI_EVENT_THREAD_END, NULL);*
>>>>> 243 if (err != JVMTI_ERROR_NONE) {
>>>>> 244 printf("Failed to disable
>>>>> JVMTI_EVENT_THREAD_END: %s (%d)\n",
>>>>> 245 TranslateError(err), err);
>>>>> 246 result = FAILED;
>>>>> 247 }
>>>>>
>>>>> My recommendation is to change the first RawMonitorWait to use 0
>>>>> rather than WAIT_TIME. This means it will wait until notified.
>>>>> For the second RawMonitorWait, we should retry the wait if
>>>>> endsCount does not equal endsExpected.
>>>>>
>>>>> 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>>>
>>>>>
>>>>> 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>>>> (jlong)WAIT_TIME);
>>>>> if (endsCount != endsExpected) {
>>>>> err = (*jvmti)->RawMonitorWait(jvmti,
>>>>> wait_lock, (jlong)WAIT_TIME * 2);
>>>>> }
>>>>> 221 if (err != JVMTI_ERROR_NONE) {
>>>>>
>>>>> So we remove the first race condition by removing the timeout on
>>>>> the first wait call.
>>>>> And we accommodate a possible second race condition by retrying
>>>>> the second wait with
>>>>> an extended delay.
>>>>>
>>>>> An alternative to the retry could be to add a RawMonitorNotify
>>>>> call in the thread
>>>>> end event callback ThreadEnd() after the endsCount has been
>>>>> incremented.
>>>>>
>>>>> This test fails very infrequently, so it may be difficult to prove
>>>>> that the specific
>>>>> problem has been resolved. I'll try some testing with the proposed
>>>>> changes
>>>>> and see if we can work towards getting this test off the
>>>>> ProblemList for jdk12.
>>>>>
>>>>>
>>>
More information about the serviceability-dev
mailing list