JDK-8034084: nsk.nsk/jvmti/ThreadStart/threadstart003 Wrong number of thread end events
Gary Adams
gary.adams at oracle.com
Fri Aug 10 18:15:37 UTC 2018
There's a fairly old bug concerning jvmti thread start and end events.
I believe there are 2 possible race conditions in the threadstart003
test based on some arbitrary timeout values used in the test itself.
Here are a few snippets from the test to illustrate the problem area.
In the initial sequence RunAgentThread is used to start a thread
with the agentThreadProc entry point.
188 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
189 if (err != JVMTI_ERROR_NONE) {
190 printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
191 TranslateError(err), err);
192 result = FAILED;
193 }
* 194 err = (*jvmti)->RunAgentThread(jvmti, thr, agentThreadProc,*
195 NULL, JVMTI_THREAD_MAX_PRIORITY);
196 if (err != JVMTI_ERROR_NONE) {
197 printf("(RunAgentThread) unexpected error: %s (%d)\n",
198 TranslateError(err), err);
199 result = FAILED;
200 }
* 201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
(jlong)WAIT_TIME);*
202 if (err != JVMTI_ERROR_NONE) {
203 printf("(RawMonitorWait) unexpected error: %s (%d)\n",
204 TranslateError(err), err);
205 result = FAILED;
206 }
207 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
208 if (err != JVMTI_ERROR_NONE) {
209 printf("(RawMonitorExit) unexpected error: %s (%d)\n",
210 TranslateError(err), err);
211 result = FAILED;
212 }
I believe the test expects to wake up based on the code in agentThreadProc
notification of the wait_lock.
121 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
122 if (err != JVMTI_ERROR_NONE) {
123 printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
124 TranslateError(err), err);
125 result = FAILED;
126 }
* 127 err = (*jvmti)->RawMonitorNotify(jvmti, wait_lock);*
128 if (err != JVMTI_ERROR_NONE) {
129 printf("(RawMonitorNotify) unexpected error: %s (%d)\n",
130 TranslateError(err), err);
131 result = FAILED;
132 }
133 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
134 if (err != JVMTI_ERROR_NONE) {
135 printf("(RawMonitorExit) unexpected error: %s (%d)\n",
136 TranslateError(err), err);
137 result = FAILED;
138 }
A second block for the wait_lock does not have a corresponding
notify call. I'm guessing the monitor is expected to time out,
so it allows for one second to complete the delivery of the
thread end event.
213
214 err = (*jvmti)->RawMonitorEnter(jvmti, wait_lock);
215 if (err != JVMTI_ERROR_NONE) {
216 printf("(RawMonitorEnter) unexpected error: %s (%d)\n",
217 TranslateError(err), err);
218 result = FAILED;
219 }
* 220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
(jlong)WAIT_TIME);*
221 if (err != JVMTI_ERROR_NONE) {
222 printf("(RawMonitorWait) unexpected error: %s (%d)\n",
223 TranslateError(err), err);
224 result = FAILED;
225 }
226 err = (*jvmti)->RawMonitorExit(jvmti, wait_lock);
227 if (err != JVMTI_ERROR_NONE) {
228 printf("(RawMonitorExit) unexpected error: %s (%d)\n",
229 TranslateError(err), err);
230 result = FAILED;
231 }
I see 2 possible race conditions here.
- if the thread is slow to start, then the first RawMonitorWait
could timeout. The code would continue to the second RawMonitorWait
which would wake up from the agentThreadProc call to RawMonitorNotify.
- if the thread ending is slow to complete then the second RawMonitorWait
could timeout before the thread end event is processed.
After the second RawMonitorWait completes, the event notification is
disabled
for both thread start and end events. If we disable the end thread event
before
the thread end event is delivered the counts will not match as expected.
233 err = (*jvmti)->SetEventNotificationMode(jvmti,
JVMTI_DISABLE,
234 JVMTI_EVENT_THREAD_START, NULL);
235 if (err != JVMTI_ERROR_NONE) {
236 printf("Failed to disable JVMTI_EVENT_THREAD_START:
%s (%d)\n",
237 TranslateError(err), err);
238 result = FAILED;
239 }
240 *
241 err = (*jvmti)->SetEventNotificationMode(jvmti,
JVMTI_DISABLE,
242 JVMTI_EVENT_THREAD_END, NULL);*
243 if (err != JVMTI_ERROR_NONE) {
244 printf("Failed to disable JVMTI_EVENT_THREAD_END: %s
(%d)\n",
245 TranslateError(err), err);
246 result = FAILED;
247 }
My recommendation is to change the first RawMonitorWait to use 0
rather than WAIT_TIME. This means it will wait until notified.
For the second RawMonitorWait, we should retry the wait if
endsCount does not equal endsExpected.
201 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock, 0);
220 err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
(jlong)WAIT_TIME);
if (endsCount != endsExpected) {
err = (*jvmti)->RawMonitorWait(jvmti, wait_lock,
(jlong)WAIT_TIME * 2);
}
221 if (err != JVMTI_ERROR_NONE) {
So we remove the first race condition by removing the timeout on the
first wait call.
And we accommodate a possible second race condition by retrying the
second wait with
an extended delay.
An alternative to the retry could be to add a RawMonitorNotify call in
the thread
end event callback ThreadEnd() after the endsCount has been incremented.
This test fails very infrequently, so it may be difficult to prove that
the specific
problem has been resolved. I'll try some testing with the proposed changes
and see if we can work towards getting this test off the ProblemList for
jdk12.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20180810/35ed77c4/attachment-0001.html>
More information about the serviceability-dev
mailing list