JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
Alex Menkov
alexey.menkov at oracle.com
Tue Aug 14 22:41:34 UTC 2018
Hi Gary,
I think check for RawMonitorWait error should be moved into the loop
or exit loop condition can be updated to
243 if (endsCount == endsExpected || err != JVMTI_ERROR_NONE) {
244 break;
245 }
--alex
On 08/14/2018 10:03, Gary Adams wrote:
> I did 1000 testruns with these changes. There were 3 failures due to
> "executor timeout"
> but those are not related to these changes (infrastructure).
>
> Webrev: http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
> Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
>
> On 8/12/18, 7:01 PM, David Holmes wrote:
>> Hi Gary,
>>
>> Good analysis. I agree with your reasoning.
>>
>> I think these tests were written in a way that assumed that we have to
>> locally timeout to detect things not happening "quickly enough" but as
>> we well know "quickly enough" is highly dependent on a number of
>> factors. And now we rely on the test framework to manage timeouts.
>>
>> I think removing the timeout from the first wait() and simply allowing
>> the framework timeouts to take effect is a good solution.
>>
>> Repeating the wait in the second case with twice the timeout is less
>> clear. One second wasn't long enough, but perhaps 3 seconds will be.
>> And it probably will but is not guaranteed. The real guarantee is to
>> wait until you see the expected outcome (either by polling using the
>> timed-wait, or else by inserting a notify elsewhere - which itself may
>> be tricky). And once you start pushing on the wait/notify logic the
>> whole structure of the test potentially changes as you either wait
>> until you get the expected result, or else get timed out by the
>> framework.
>>
>> I think waiting a bit longer will probably suffice and cause minimum
>> disruption to the test logic, but I'd be included to put the wait into
>> a for-loop and use 1-second timeouts per iteration ie.
>>
>> // wait for up to 3 seconds
>> for (int i = 0 ; i < 3 ; i++) {
>> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
>> if (endsCount == endsExpected)
>> break;
>> }
>>
>> Thanks,
>> David
>> On 11/08/2018 4:15 AM, Gary Adams wrote:
>>> There's a fairly old bug concerning jvmti thread start and end events.
>>> I believe there are 2 possible race conditions in the threadstart003
>>> test based on some arbitrary timeout values used in the test itself.
>>>
>>> Here are a few snippets from the test to illustrate the problem area.
>>>
>>> In the initial sequence RunAgentThread is used to start a thread
>>> with the agentThreadProc entry point.
>>>
>>> 188 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>> 189 if (err != JVMTI_ERROR_NONE) {
>>> 190 printf("(RawMonitorEnter) unexpected error: %s
>>> (%d)\n",
>>> 191 TranslateError(err), err);
>>> 192 result = FAILED;
>>> 193 }
>>>
>>> * 194 err = (*jvmti)->RunAgentThread(jvmti, thr,
>>> agentThreadProc,*
>>> 195 NULL, JVMTI_THREAD_MAX_PRIORITY);
>>> 196 if (err != JVMTI_ERROR_NONE) {
>>> 197 printf("(RunAgentThread) unexpected error: %s
>>> (%d)\n",
>>> 198 TranslateError(err), err);
>>> 199 result = FAILED;
>>> 200 }
>>>
>>> * 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>> (jlong)WAIT_TIME);*
>>> 202 if (err != JVMTI_ERROR_NONE) {
>>> 203 printf("(RawMonitorWait) unexpected error: %s
>>> (%d)\n",
>>> 204 TranslateError(err), err);
>>> 205 result = FAILED;
>>> 206 }
>>> 207 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>> 208 if (err != JVMTI_ERROR_NONE) {
>>> 209 printf("(RawMonitorExit) unexpected error: %s
>>> (%d)\n",
>>> 210 TranslateError(err), err);
>>> 211 result = FAILED;
>>> 212 }
>>>
>>> I believe the test expects to wake up based on the code in
>>> agentThreadProc
>>> notification of the wait_lock.
>>>
>>> 121 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>> 122 if (err != JVMTI_ERROR_NONE) {
>>> 123 printf("(RawMonitorEnter) unexpected error: %s
>>> (%d)\n",
>>> 124 TranslateError(err), err);
>>> 125 result = FAILED;
>>> 126 }
>>> * 127 err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>>> 128 if (err != JVMTI_ERROR_NONE) {
>>> 129 printf("(RawMonitorNotify) unexpected error: %s
>>> (%d)\n",
>>> 130 TranslateError(err), err);
>>> 131 result = FAILED;
>>> 132 }
>>> 133 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>> 134 if (err != JVMTI_ERROR_NONE) {
>>> 135 printf("(RawMonitorExit) unexpected error: %s
>>> (%d)\n",
>>> 136 TranslateError(err), err);
>>> 137 result = FAILED;
>>> 138 }
>>>
>>> A second block for the wait_lock does not have a corresponding
>>> notify call. I'm guessing the monitor is expected to time out,
>>> so it allows for one second to complete the delivery of the
>>> thread end event.
>>>
>>> 213
>>> 214 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>>> 215 if (err != JVMTI_ERROR_NONE) {
>>> 216 printf("(RawMonitorEnter) unexpected error: %s
>>> (%d)\n",
>>> 217 TranslateError(err), err);
>>> 218 result = FAILED;
>>> 219 }
>>> * 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>> (jlong)WAIT_TIME);*
>>> 221 if (err != JVMTI_ERROR_NONE) {
>>> 222 printf("(RawMonitorWait) unexpected error: %s
>>> (%d)\n",
>>> 223 TranslateError(err), err);
>>> 224 result = FAILED;
>>> 225 }
>>> 226 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>>> 227 if (err != JVMTI_ERROR_NONE) {
>>> 228 printf("(RawMonitorExit) unexpected error: %s
>>> (%d)\n",
>>> 229 TranslateError(err), err);
>>> 230 result = FAILED;
>>> 231 }
>>>
>>> I see 2 possible race conditions here.
>>>
>>> - if the thread is slow to start, then the first RawMonitorWait
>>> could timeout. The code would continue to the second RawMonitorWait
>>> which would wake up from the agentThreadProc call to
>>> RawMonitorNotify.
>>>
>>> - if the thread ending is slow to complete then the second
>>> RawMonitorWait
>>> could timeout before the thread end event is processed.
>>>
>>> After the second RawMonitorWait completes, the event notification is
>>> disabled
>>> for both thread start and end events. If we disable the end thread
>>> event before
>>> the thread end event is delivered the counts will not match as expected.
>>>
>>> 233 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>> JVMTI_DISABLE,
>>> 234 JVMTI_EVENT_THREAD_START, NULL);
>>> 235 if (err != JVMTI_ERROR_NONE) {
>>> 236 printf("Failed to disable
>>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>>> 237 TranslateError(err), err);
>>> 238 result = FAILED;
>>> 239 }
>>> 240 *
>>> 241 err = (*jvmti)->SetEventNotificationMode(jvmti,
>>> JVMTI_DISABLE,
>>> 242 JVMTI_EVENT_THREAD_END, NULL);*
>>> 243 if (err != JVMTI_ERROR_NONE) {
>>> 244 printf("Failed to disable JVMTI_EVENT_THREAD_END:
>>> %s (%d)\n",
>>> 245 TranslateError(err), err);
>>> 246 result = FAILED;
>>> 247 }
>>>
>>> My recommendation is to change the first RawMonitorWait to use 0
>>> rather than WAIT_TIME. This means it will wait until notified.
>>> For the second RawMonitorWait, we should retry the wait if
>>> endsCount does not equal endsExpected.
>>>
>>> 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>>
>>>
>>> 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>> (jlong)WAIT_TIME);
>>> if (endsCount != endsExpected) {
>>> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>>> (jlong)WAIT_TIME * 2);
>>> }
>>> 221 if (err != JVMTI_ERROR_NONE) {
>>>
>>> So we remove the first race condition by removing the timeout on the
>>> first wait call.
>>> And we accommodate a possible second race condition by retrying the
>>> second wait with
>>> an extended delay.
>>>
>>> An alternative to the retry could be to add a RawMonitorNotify call
>>> in the thread
>>> end event callback ThreadEnd() after the endsCount has been incremented.
>>>
>>> This test fails very infrequently, so it may be difficult to prove
>>> that the specific
>>> problem has been resolved. I'll try some testing with the proposed
>>> changes
>>> and see if we can work towards getting this test off the ProblemList
>>> for jdk12.
>>>
>>>
>
More information about the serviceability-dev
mailing list