JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
Gary Adams
gary.adams at oracle.com
Tue Aug 14 17:03:19 UTC 2018
I did 1000 testruns with these changes. There were 3 failures due to
"executor timeout"
but those are not related to these changes (infrastructure).
Webrev: http://cr.openjdk.java.net/~gadams/8034084/webrev.00/index.html
Issue: https://bugs.openjdk.java.net/browse/JDK-8034084
On 8/12/18, 7:01 PM, David Holmes wrote:
> Hi Gary,
>
> Good analysis. I agree with your reasoning.
>
> I think these tests were written in a way that assumed that we have to
> locally timeout to detect things not happening "quickly enough" but as
> we well know "quickly enough" is highly dependent on a number of
> factors. And now we rely on the test framework to manage timeouts.
>
> I think removing the timeout from the first wait() and simply allowing
> the framework timeouts to take effect is a good solution.
>
> Repeating the wait in the second case with twice the timeout is less
> clear. One second wasn't long enough, but perhaps 3 seconds will be.
> And it probably will but is not guaranteed. The real guarantee is to
> wait until you see the expected outcome (either by polling using the
> timed-wait, or else by inserting a notify elsewhere - which itself may
> be tricky). And once you start pushing on the wait/notify logic the
> whole structure of the test potentially changes as you either wait
> until you get the expected result, or else get timed out by the
> framework.
>
> I think waiting a bit longer will probably suffice and cause minimum
> disruption to the test logic, but I'd be included to put the wait into
> a for-loop and use 1-second timeouts per iteration ie.
>
> // wait for up to 3 seconds
> for (int i = 0 ; i < 3 ; i++) {
> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, (jlong)WAIT_TIME);
> if (endsCount == endsExpected)
> break;
> }
>
> Thanks,
> David
> On 11/08/2018 4:15 AM, Gary Adams wrote:
>> There's a fairly old bug concerning jvmti thread start and end events.
>> I believe there are 2 possible race conditions in the threadstart003
>> test based on some arbitrary timeout values used in the test itself.
>>
>> Here are a few snippets from the test to illustrate the problem area.
>>
>> In the initial sequence RunAgentThread is used to start a thread
>> with the agentThreadProc entry point.
>>
>> 188 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>> 189 if (err != JVMTI_ERROR_NONE) {
>> 190 printf("(RawMonitorEnter) unexpected error: %s
>> (%d)\n",
>> 191 TranslateError(err), err);
>> 192 result = FAILED;
>> 193 }
>>
>> * 194 err = (*jvmti)->RunAgentThread(jvmti, thr,
>> agentThreadProc,*
>> 195 NULL, JVMTI_THREAD_MAX_PRIORITY);
>> 196 if (err != JVMTI_ERROR_NONE) {
>> 197 printf("(RunAgentThread) unexpected error: %s
>> (%d)\n",
>> 198 TranslateError(err), err);
>> 199 result = FAILED;
>> 200 }
>>
>> * 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>> (jlong)WAIT_TIME);*
>> 202 if (err != JVMTI_ERROR_NONE) {
>> 203 printf("(RawMonitorWait) unexpected error: %s
>> (%d)\n",
>> 204 TranslateError(err), err);
>> 205 result = FAILED;
>> 206 }
>> 207 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>> 208 if (err != JVMTI_ERROR_NONE) {
>> 209 printf("(RawMonitorExit) unexpected error: %s
>> (%d)\n",
>> 210 TranslateError(err), err);
>> 211 result = FAILED;
>> 212 }
>>
>> I believe the test expects to wake up based on the code in
>> agentThreadProc
>> notification of the wait_lock.
>>
>> 121 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>> 122 if (err != JVMTI_ERROR_NONE) {
>> 123 printf("(RawMonitorEnter) unexpected error: %s
>> (%d)\n",
>> 124 TranslateError(err), err);
>> 125 result = FAILED;
>> 126 }
>> * 127 err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
>> 128 if (err != JVMTI_ERROR_NONE) {
>> 129 printf("(RawMonitorNotify) unexpected error: %s
>> (%d)\n",
>> 130 TranslateError(err), err);
>> 131 result = FAILED;
>> 132 }
>> 133 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>> 134 if (err != JVMTI_ERROR_NONE) {
>> 135 printf("(RawMonitorExit) unexpected error: %s
>> (%d)\n",
>> 136 TranslateError(err), err);
>> 137 result = FAILED;
>> 138 }
>>
>> A second block for the wait_lock does not have a corresponding
>> notify call. I'm guessing the monitor is expected to time out,
>> so it allows for one second to complete the delivery of the
>> thread end event.
>>
>> 213
>> 214 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
>> 215 if (err != JVMTI_ERROR_NONE) {
>> 216 printf("(RawMonitorEnter) unexpected error: %s
>> (%d)\n",
>> 217 TranslateError(err), err);
>> 218 result = FAILED;
>> 219 }
>> * 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>> (jlong)WAIT_TIME);*
>> 221 if (err != JVMTI_ERROR_NONE) {
>> 222 printf("(RawMonitorWait) unexpected error: %s
>> (%d)\n",
>> 223 TranslateError(err), err);
>> 224 result = FAILED;
>> 225 }
>> 226 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
>> 227 if (err != JVMTI_ERROR_NONE) {
>> 228 printf("(RawMonitorExit) unexpected error: %s
>> (%d)\n",
>> 229 TranslateError(err), err);
>> 230 result = FAILED;
>> 231 }
>>
>> I see 2 possible race conditions here.
>>
>> - if the thread is slow to start, then the first RawMonitorWait
>> could timeout. The code would continue to the second RawMonitorWait
>> which would wake up from the agentThreadProc call to
>> RawMonitorNotify.
>>
>> - if the thread ending is slow to complete then the second
>> RawMonitorWait
>> could timeout before the thread end event is processed.
>>
>> After the second RawMonitorWait completes, the event notification is
>> disabled
>> for both thread start and end events. If we disable the end thread
>> event before
>> the thread end event is delivered the counts will not match as expected.
>>
>> 233 err = (*jvmti)->SetEventNotificationMode(jvmti,
>> JVMTI_DISABLE,
>> 234 JVMTI_EVENT_THREAD_START, NULL);
>> 235 if (err != JVMTI_ERROR_NONE) {
>> 236 printf("Failed to disable
>> JVMTI_EVENT_THREAD_START: %s (%d)\n",
>> 237 TranslateError(err), err);
>> 238 result = FAILED;
>> 239 }
>> 240 *
>> 241 err = (*jvmti)->SetEventNotificationMode(jvmti,
>> JVMTI_DISABLE,
>> 242 JVMTI_EVENT_THREAD_END, NULL);*
>> 243 if (err != JVMTI_ERROR_NONE) {
>> 244 printf("Failed to disable JVMTI_EVENT_THREAD_END:
>> %s (%d)\n",
>> 245 TranslateError(err), err);
>> 246 result = FAILED;
>> 247 }
>>
>> My recommendation is to change the first RawMonitorWait to use 0
>> rather than WAIT_TIME. This means it will wait until notified.
>> For the second RawMonitorWait, we should retry the wait if
>> endsCount does not equal endsExpected.
>>
>> 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
>>
>>
>> 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>> (jlong)WAIT_TIME);
>> if (endsCount != endsExpected) {
>> err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
>> (jlong)WAIT_TIME * 2);
>> }
>> 221 if (err != JVMTI_ERROR_NONE) {
>>
>> So we remove the first race condition by removing the timeout on the
>> first wait call.
>> And we accommodate a possible second race condition by retrying the
>> second wait with
>> an extended delay.
>>
>> An alternative to the retry could be to add a RawMonitorNotify call
>> in the thread
>> end event callback ThreadEnd() after the endsCount has been incremented.
>>
>> This test fails very infrequently, so it may be difficult to prove
>> that the specific
>> problem has been resolved. I'll try some testing with the proposed
>> changes
>> and see if we can work towards getting this test off the ProblemList
>> for jdk12.
>>
>>
More information about the serviceability-dev
mailing list