RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
David Holmes
david.holmes at oracle.com
Tue Aug 18 05:20:07 UTC 2020
Hi Richard,
The test seems a lot clearer to me now. I'll leave it to you are Serguei
to iron out any last wrinkles as I am disappearing on vacation for a
week after today. But you have my Review.
Thanks,
David
On 15/08/2020 12:06 am, Reingruber, Richard wrote:
> Hi Serguei,
>
> thanks for the feedback. I have implemented your suggestions and created a new
> webrev:
>
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4/
> Delta: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/
>
> Please find my replies to your comments below.
>
> Best regards,
> Richard.
>
>> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html
>
>> 33 * the stack walk. The target thread's stack is walkable while in native. After sending the notification it
>> ...
>
>> 54 * @param depth Depth of target frame for GetLocalObject() call. Should be large value to prolong the unsafe stack walk.
>> 55 * @param waitTimeInNativeAfterNotify Time to wait after notify with walkable stack before returning an becoming unsafe again.
>> ...
>
>> 71 * Wait time in native, i.e. with walkable stack, after notifying agent thread to do GetLocalObject() call.
>> ...
>
>> 89 msg((now -start) + " ms Iteration : " + iterations + " waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify);
>
>> Could you, please, re-balance the lines above to make them shorter?
>
> Ok, done.
>
>
>> 90 int newTargetDepth = recursiveMethod(0, targetDepth);
>> 91 if (newTargetDepth < targetDepth) {
>> 92 msg("StackOverflowError during test.");
>> 93 msg("Old target depth: " + targetDepth);
>> 94 msg("Retry with new target depth: " + newTargetDepth);
>> 95 targetDepth = newTargetDepth;
>> 96 }
>> A comment is needed to explain why a StackOverflowError is not desired.
>> At least, it is not obvious initially.
>> 73 public int waitTimeInNativeAfterNotify;
>
>> This name is unreasonably long which makes the code less readable.
>> I'd suggest to reduce it to waitTime.
>
> Ok, done.
>
>> 103 notifyAgentToGetLocalAndWaitShortly(-1, 1);
>> ...
>> 119 notifyAgentToGetLocalAndWaitShortly(depth - 100, waitTimeInNativeAfterNotify);
>
>> It is better to provide a short comment before each call explaining what it is doing.
>> For instance, it is not clear why the call at the line 103 is needed.
>> Why do we need to notify the agent to GetLocal for the second time?
>
> The test is repeated TEST_ITERATIONS times. In each iteration the agent calls
> GetLocal racing the target thread returning from the native call. The last call
> in line 103 ist the shutdown signal.
>
>> Can it be refactored into a separate native method?
>
> I've made the shutdown process more explicit with the new native method
> shutDown() which sets thest_state to ShutDown.
>
>> Then the the function name can be reduced to 'notifyAgentToGetLocal'.
>> This long name does not give enough context anyway.
>
> Ok, done.
>
>> 85 long iterations = 0;
>> 87 do {
>> ...
>> 97 iterations++;
>> ...
>> 102 } while (iterations < TEST_ITERATIONS);
>
>> Why a more explicit 'for' or 'while' loop is not used here? :
>> for (long iter = 0; iter < TEST_ITERATIONS; iter++) {
>
> I have converted the loop into a for loop.
>
>> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html
>
>> The indent in this file varies. It is better to keep it the same: 4 or 2.
>
> Yes, I noticed this. I have not corrected it yet, because I didn't want to
> pullute the incremental webrev with that change. Would you like me to fix the
> indentation now to 2 spaces or do it as a last step?
>
>> 60 AgentCallingGetLocalObject // The target thread waits for the agent to call
>> I'd suggest to rename the constant to 'AgentInGetLocal'.
>
> Ok, done.
>
>> 150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, void * arg) {
>
>> It is better rename the function to TestThreadLoop.
>
> Would AgentThreadLoop be ok too?
>
>> You can add a comment before to explain some basic about what it is doing.
>
> Ok, done.
>
>> 167 printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread started. Polling thread '%s' for local variables\n",
>> It is better to get rid of leading stars in all messages.
>
> Ok, done.
>
>> 176 // the native method Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly
>> The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function name.
>
> Ok, done.
>
> ---
> From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
> Sent: Freitag, 14. August 2020 10:11
> To: Reingruber, Richard <richard.reingruber at sap.com>; David Holmes <david.holmes at oracle.com>; serviceability-dev at openjdk.java.net
> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
>
> Hi Richard,
>
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html
>
> 33 * the stack walk. The target thread's stack is walkable while in native. After sending the notification it
> ...
>
> 54 * @param depth Depth of target frame for GetLocalObject() call. Should be large value to prolong the unsafe stack walk.
> 55 * @param waitTimeInNativeAfterNotify Time to wait after notify with walkable stack before returning an becoming unsafe again.
> ...
>
> 71 * Wait time in native, i.e. with walkable stack, after notifying agent thread to do GetLocalObject() call.
> ...
>
> 89 msg((now -start) + " ms Iteration : " + iterations + " waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify);
>
> Could you, please, re-balance the lines above to make them shorter?
>
>
> 90 int newTargetDepth = recursiveMethod(0, targetDepth);
> 91 if (newTargetDepth < targetDepth) {
> 92 msg("StackOverflowError during test.");
> 93 msg("Old target depth: " + targetDepth);
> 94 msg("Retry with new target depth: " + newTargetDepth);
> 95 targetDepth = newTargetDepth;
> 96 }
> A comment is needed to explain why a StackOverflowError is not desired.
> At least, it is not obvious initially.
> 73 public int waitTimeInNativeAfterNotify;
>
> This name is unreasonably long which makes the code less readable.
> I'd suggest to reduce it to waitTime.
>
> 103 notifyAgentToGetLocalAndWaitShortly(-1, 1);
> ...
> 119 notifyAgentToGetLocalAndWaitShortly(depth - 100, waitTimeInNativeAfterNotify);
>
> It is better to provide a short comment before each call explaining what it is doing.
> For instance, it is not clear why the call at the line 103 is needed.
> Why do we need to notify the agent to GetLocal for the second time?
> Can it be refactored into a separate native method?
> Then the the function name can be reduced to 'notifyAgentToGetLocal'.
> This long name does not give enough context anyway.
> 85 long iterations = 0;
> 87 do {
> ...
> 97 iterations++;
> ...
> 102 } while (iterations < TEST_ITERATIONS);
>
> Why a more explicit 'for' or 'while' loop is not used here? :
> for (long iter = 0; iter < TEST_ITERATIONS; iter++) {
>
>
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html
>
> The indent in this file varies. It is better to keep it the same: 4 or 2.
>
> 60 AgentCallingGetLocalObject // The target thread waits for the agent to call
> I'd suggest to rename the constant to 'AgentInGetLocal'.
> 150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, void * arg) {
>
> It is better rename the function to TestThreadLoop.
> You can add a comment before to explain some basic about what it is doing.
> 167 printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread started. Polling thread '%s' for local variables\n",
> It is better to get rid of leading stars in all messages.
> 176 // the native method Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly
> The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function name.
>
>
> I'm still reviewing the test native agent code.
>
>
> Thanks,
> Serguei
>
>
> On 8/11/20 03:02, Reingruber, Richard wrote:
> Hi David and Serguei,
>
> On 11/08/2020 3:21 am, mailto:serguei.spitsyn at oracle.com wrote:
> Hi Richard and David,
>
> The implementation looks good to me.
>
> But I do not understand what the test is doing with all this counters
> and recursions.
>
> For instance, these fragments:
>
> 86 recursions = 0;
> 87 try {
> 88 recursiveMethod(1<<20);
> 89 } catch (StackOverflowError e) {
> 90 msg("Caught StackOverflowError as expected");
> 91 }
> 92 int target_depth = recursions-100; // spaces are missed around the '-' sigh
>
> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
> recursiveMethod(M);
> int target_depth = M - 100;
>
> Then the variable 'recursions' can be removed or become local.
>
> The recursiveMethod takes in the maximum recursions to try and updates
> the recursions variable to record how many recursions were possible - so:
>
> target_depth = <actual recursions> - 100;
>
> Possibly recursiveMethod could return the actual recursions instead of
> using the global variables?
>
> I've eliminated the static 'recursions' variable. recursiveMethod() now returns
> the depth at which the recursion was ended. I hesitated doing this, because I
> had to handle the StackOverflowError with all those frames still on stack. But
> the handler is empty, so it should not cause problems.
>
> This is the new webrev (as posted previously):
>
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3/
> Delta: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/
>
> Thanks, Richard.
>
> -----Original Message-----
> From: David Holmes mailto:david.holmes at oracle.com
> Sent: Dienstag, 11. August 2020 04:00
> To: mailto:serguei.spitsyn at oracle.com; Reingruber, Richard mailto:richard.reingruber at sap.com; mailto:serviceability-dev at openjdk.java.net
> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
>
> Hi Serguei,
>
> On 11/08/2020 3:21 am, mailto:serguei.spitsyn at oracle.com wrote:
> Hi Richard and David,
>
> The implementation looks good to me.
>
> But I do not understand what the test is doing with all this counters
> and recursions.
>
> For instance, these fragments:
>
> 86 recursions = 0;
> 87 try {
> 88 recursiveMethod(1<<20);
> 89 } catch (StackOverflowError e) {
> 90 msg("Caught StackOverflowError as expected");
> 91 }
> 92 int target_depth = recursions-100; // spaces are missed around the '-' sigh
>
> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
> recursiveMethod(M);
> int target_depth = M - 100;
>
> Then the variable 'recursions' can be removed or become local.
>
> The recursiveMethod takes in the maximum recursions to try and updates
> the recursions variable to record how many recursions were possible - so:
>
> target_depth = <actual recursions> - 100;
>
> Possibly recursiveMethod could return the actual recursions instead of
> using the global variables?
>
> David
> -----
>
> This method will be:
>
> 47 private static final int M = 1 << 20;
> ...
> 121 public long recursiveMethod(int depth) {
> 123 if (depth == 0) {
> 124 notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
> 126 } else {
> 127 recursiveMethod(--depth);
> 128 }
> 129 }
>
>
> At least, he test is missing the comments explaining all these.
>
> Thanks,
> Serguei
>
>
>
> On 8/9/20 22:35, David Holmes wrote:
> Hi Richard,
>
> On 31/07/2020 5:28 pm, Reingruber, Richard wrote:
> Hi,
>
> I rebase the fix after JDK-8250042.
>
> New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/
>
> The general fix for this seems good. A minor nit:
>
> 588 if (!is_assignable(signature, ob_k, Thread::current())) {
>
> You know that the current thread is the VMThread so can use
> VMThread::vm_thread().
>
> Similarly for this existing code:
>
> 694 Thread* current_thread = Thread::current();
>
> ---
>
> Looking at the test code ... I'm less clear on exactly what is
> happening and the use of spin-waits raises some red-flags for me in
> terms of test reliability on different platforms. The "while
> (--waitCycles > 0)" loop in particular offers no certainty that the
> agent thread is executing anything in particular. And the use of the
> spin_count as a guide to future waiting time seems somewhat arbitrary.
> In all seriousness I got a headache trying to work out how the test
> was expecting to operate. Some parts could be simplified using raw
> monitors, I think. But there's no sure way to know the agent thread is
> in the midst of the stackwalk when the target thread wants to leave
> the native code. So I understand what you are trying to achieve here,
> I'm just not sure how reliably it will actually achieve it.
>
> test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp
>
>
> 32 static volatile jlong spinn_count = 0;
>
> Using a 64-bit counter seems like it will be a problem on 32-bit systems.
>
> Should be spin_count not spinn_count.
>
> 36 // Agent thread waits for value != 0, then performas the JVMTI
> call to get local variable.
>
> typo: performas
>
> Thanks,
> David
> -----
>
> Thanks, Richard.
>
> -----Original Message-----
> From: serviceability-dev mailto:serviceability-dev-retn at openjdk.java.net
> On Behalf Of Reingruber, Richard
> Sent: Montag, 27. Juli 2020 09:45
> To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net
> Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in
> VM_GetOrSetLocal::doit_prologue()
>
> Hi Serguei,
>
> > I tested it on Linux and Windows but not yet on MacOS.
>
> The test succeeded now on all platforms.
>
> Thanks, Richard.
>
> -----Original Message-----
> From: Reingruber, Richard
> Sent: Freitag, 24. Juli 2020 15:04
> To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net
> Subject: RE: RFR(S) 8249293: Unsafe stackwalk in
> VM_GetOrSetLocal::doit_prologue()
>
> Hi Serguei,
>
> The fix itself looks good to me.
>
> thanks for looking at the fix.
>
> I still need another look at new test.
> Could you, please, convert the agent of new test to C++?
> It will make it a little bit simpler.
>
> Sure, here is the new webrev.1 with a C++ version of the test agent:
>
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/
>
> I tested it on Linux and Windows but not yet on MacOS.
>
> Thanks,
> Richard.
>
> -----Original Message-----
> From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com
> Sent: Freitag, 24. Juli 2020 00:00
> To: Reingruber, Richard mailto:richard.reingruber at sap.com;
> mailto:serviceability-dev at openjdk.java.net
> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in
> VM_GetOrSetLocal::doit_prologue()
>
> Hi Richard,
>
> Thank you for filing the CR and taking care about it!
> The fix itself looks good to me.
> I still need another look at new test.
> Could you, please, convert the agent of new test to C++?
> It will make it a little bit simpler.
>
> Thanks,
> Serguei
>
>
> On 7/20/20 01:15, Reingruber, Richard wrote:
> Hi,
>
> please help review this fix for VM_GetOrSetLocal. It moves the
> unsafe stackwalk from the vm
> operation prologue before the safepoint into the doit() method
> executed at the safepoint.
>
> Webrev:
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html
> Bug: https://bugs.openjdk.java.net/browse/JDK-8249293
>
> According to the JVMTI spec on local variable access it is not
> required to suspend the target thread
> T [1]. The operation will simply fail with
> JVMTI_ERROR_NO_MORE_FRAMES if T is executing
> bytecodes. It will succeed though if T is blocked because of
> synchronization or executing some native
> code.
>
> The issue is that in the latter case the stack walk in
> VM_GetOrSetLocal::doit_prologue() to prepare
> the access to the local variable is unsafe, because it is done
> before the safepoint and it races
> with T returning to execute bytecodes making its stack not walkable.
> The included test shows that
> this can crash the VM if T wins the race.
>
> Manual testing:
>
> - new test
> test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
> - test/hotspot/jtreg/vmTestbase/nsk/jvmti
> - test/hotspot/jtreg/serviceability/jvmti
>
> Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode,
> SPECjvm2008, SPECjbb2015,
> Renaissance Suite, SAP specific tests with fastdebug and release
> builds on all platforms
>
> Thanks, Richard.
>
> [1]
> https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local
>
>
>
More information about the serviceability-dev
mailing list