RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

David Holmes david.holmes at oracle.com
Tue Aug 18 05:20:07 UTC 2020


Hi Richard,

The test seems a lot clearer to me now. I'll leave it to you are Serguei 
to iron out any last wrinkles as I am disappearing on vacation for a 
week after today. But you have my Review.

Thanks,
David

On 15/08/2020 12:06 am, Reingruber, Richard wrote:
> Hi Serguei,
> 
> thanks for the feedback. I have implemented your suggestions and created a new
> webrev:
> 
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4/
> Delta:  http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/
> 
> Please find my replies to your comments below.
> 
> Best regards,
> Richard.
> 
>> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html
> 
>>    33  *          the stack walk. The target thread's stack is walkable while in native. After sending the notification it
>>    ...
> 
>>    54      * @param depth Depth of target frame for GetLocalObject() call. Should be large value to prolong the unsafe stack walk.
>>    55      * @param waitTimeInNativeAfterNotify Time to wait after notify with walkable stack before returning an becoming unsafe again.
>>    ...
> 
>>    71      * Wait time in native, i.e. with walkable stack, after notifying agent thread to do GetLocalObject() call.
>>    ...
> 
>>    89             msg((now -start) + " ms  Iteration : " + iterations + "  waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify);
> 
>> Could you, please, re-balance the lines above to make them shorter?
> 
> Ok, done.
> 
> 
>>    90             int newTargetDepth = recursiveMethod(0, targetDepth);
>>    91             if (newTargetDepth < targetDepth) {
>>    92                 msg("StackOverflowError during test.");
>>    93                 msg("Old target depth: " + targetDepth);
>>    94                 msg("Retry with new target depth: " + newTargetDepth);
>>    95                 targetDepth = newTargetDepth;
>>    96             }
>> A comment is needed to explain why a StackOverflowError is not desired.
>> At least, it is not obvious initially.
>>    73     public int waitTimeInNativeAfterNotify;
> 
>> This name is unreasonably long which makes the code less readable.
>> I'd suggest to reduce it to waitTime.
> 
> Ok, done.
> 
>>   103         notifyAgentToGetLocalAndWaitShortly(-1, 1);
>>   ...
>>   119                 notifyAgentToGetLocalAndWaitShortly(depth - 100, waitTimeInNativeAfterNotify);
> 
>> It is better to provide a short comment before each call explaining what it is doing.
>> For instance, it is not clear why the call at the line 103 is needed.
>> Why do we need to notify the agent to GetLocal for the second time?
> 
> The test is repeated TEST_ITERATIONS times. In each iteration the agent calls
> GetLocal racing the target thread returning from the native call. The last call
> in line 103 ist the shutdown signal.
> 
>> Can it be refactored into a separate native method?
> 
> I've made the shutdown process more explicit with the new native method
> shutDown() which sets thest_state to ShutDown.
> 
>> Then the the function name can be reduced to 'notifyAgentToGetLocal'.
>> This long name does not give enough context anyway.
> 
> Ok, done.
> 
>>    85         long iterations = 0;
>>    87         do {
>>    ...
>>    97             iterations++;
>>    ...
>>   102         } while (iterations < TEST_ITERATIONS);
> 
>> Why a more explicit 'for' or 'while' loop is not used here? :
>> for (long iter = 0; iter < TEST_ITERATIONS; iter++) {
>   
> I have converted the loop into a for loop.
> 
>> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html
> 
>> The indent in this file varies. It is better to keep it the same: 4 or 2.
> 
> Yes, I noticed this. I have not corrected it yet, because I didn't want to
> pullute the incremental webrev with that change. Would you like me to fix the
> indentation now to 2 spaces or do it as a last step?
> 
>>    60   AgentCallingGetLocalObject // The target thread waits for the agent to call
>> I'd suggest to rename the constant to 'AgentInGetLocal'.
> 
> Ok, done.
> 
>>   150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, void * arg) {
> 
>> It is better rename the function to TestThreadLoop.
> 
> Would AgentThreadLoop be ok too?
> 
>> You can add a comment before to explain some basic about what it is doing.
> 
> Ok, done.
> 
>>   167     printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread started. Polling thread '%s' for local variables\n",
>> It is better to get rid of leading stars in all messages.
> 
> Ok, done.
> 
>>   176         // the native method Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly
>> The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function name.
> 
> Ok, done.
> 
> ---
> From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
> Sent: Freitag, 14. August 2020 10:11
> To: Reingruber, Richard <richard.reingruber at sap.com>; David Holmes <david.holmes at oracle.com>; serviceability-dev at openjdk.java.net
> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
> 
> Hi Richard,
> 
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html
> 
>    33  *          the stack walk. The target thread's stack is walkable while in native. After sending the notification it
>    ...
> 
>    54      * @param depth Depth of target frame for GetLocalObject() call. Should be large value to prolong the unsafe stack walk.
>    55      * @param waitTimeInNativeAfterNotify Time to wait after notify with walkable stack before returning an becoming unsafe again.
>    ...
> 
>    71      * Wait time in native, i.e. with walkable stack, after notifying agent thread to do GetLocalObject() call.
>    ...
> 
>    89             msg((now -start) + " ms  Iteration : " + iterations + "  waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify);
> 
> Could you, please, re-balance the lines above to make them shorter?
> 
> 
>    90             int newTargetDepth = recursiveMethod(0, targetDepth);
>    91             if (newTargetDepth < targetDepth) {
>    92                 msg("StackOverflowError during test.");
>    93                 msg("Old target depth: " + targetDepth);
>    94                 msg("Retry with new target depth: " + newTargetDepth);
>    95                 targetDepth = newTargetDepth;
>    96             }
> A comment is needed to explain why a StackOverflowError is not desired.
> At least, it is not obvious initially.
>    73     public int waitTimeInNativeAfterNotify;
> 
> This name is unreasonably long which makes the code less readable.
> I'd suggest to reduce it to waitTime.
> 
>   103         notifyAgentToGetLocalAndWaitShortly(-1, 1);
>   ...
>   119                 notifyAgentToGetLocalAndWaitShortly(depth - 100, waitTimeInNativeAfterNotify);
> 
> It is better to provide a short comment before each call explaining what it is doing.
> For instance, it is not clear why the call at the line 103 is needed.
> Why do we need to notify the agent to GetLocal for the second time?
> Can it be refactored into a separate native method?
> Then the the function name can be reduced to 'notifyAgentToGetLocal'.
> This long name does not give enough context anyway.
>    85         long iterations = 0;
>    87         do {
>    ...
>    97             iterations++;
>    ...
>   102         } while (iterations < TEST_ITERATIONS);
> 
> Why a more explicit 'for' or 'while' loop is not used here? :
> for (long iter = 0; iter < TEST_ITERATIONS; iter++) {
>   
> 
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html
> 
> The indent in this file varies. It is better to keep it the same: 4 or 2.
> 
>    60   AgentCallingGetLocalObject // The target thread waits for the agent to call
> I'd suggest to rename the constant to 'AgentInGetLocal'.
>   150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, void * arg) {
> 
> It is better rename the function to TestThreadLoop.
> You can add a comment before to explain some basic about what it is doing.
>   167     printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread started. Polling thread '%s' for local variables\n",
> It is better to get rid of leading stars in all messages.
>   176         // the native method Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly
> The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function name.
> 
> 
> I'm still reviewing the test native agent code.
> 
> 
> Thanks,
> Serguei
> 
> 
> On 8/11/20 03:02, Reingruber, Richard wrote:
> Hi David and Serguei,
> 
> On 11/08/2020 3:21 am, mailto:serguei.spitsyn at oracle.com wrote:
> Hi Richard and David,
> 
> The implementation looks good to me.
> 
> But I do not understand what the test is doing with all this counters
> and recursions.
> 
> For instance, these fragments:
> 
>     86         recursions = 0;
>     87         try {
>     88             recursiveMethod(1<<20);
>     89         } catch (StackOverflowError e) {
>     90             msg("Caught StackOverflowError as expected");
>     91         }
>     92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh
> 
> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
>      recursiveMethod(M);
>      int target_depth = M - 100;
> 
> Then the variable 'recursions' can be removed or become local.
> 
> The recursiveMethod takes in the maximum recursions to try and updates
> the recursions variable to record how many recursions were possible - so:
> 
> target_depth = <actual recursions> - 100;
> 
> Possibly recursiveMethod could return the actual recursions instead of
> using the global variables?
> 
> I've eliminated the static 'recursions' variable. recursiveMethod() now returns
> the depth at which the recursion was ended. I hesitated doing this, because I
> had to handle the StackOverflowError with all those frames still on stack. But
> the handler is empty, so it should not cause problems.
> 
> This is the new webrev (as posted previously):
> 
> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3/
> Delta:  http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/
> 
> Thanks, Richard.
> 
> -----Original Message-----
> From: David Holmes mailto:david.holmes at oracle.com
> Sent: Dienstag, 11. August 2020 04:00
> To: mailto:serguei.spitsyn at oracle.com; Reingruber, Richard mailto:richard.reingruber at sap.com; mailto:serviceability-dev at openjdk.java.net
> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
> 
> Hi Serguei,
> 
> On 11/08/2020 3:21 am, mailto:serguei.spitsyn at oracle.com wrote:
> Hi Richard and David,
> 
> The implementation looks good to me.
> 
> But I do not understand what the test is doing with all this counters
> and recursions.
> 
> For instance, these fragments:
> 
>     86         recursions = 0;
>     87         try {
>     88             recursiveMethod(1<<20);
>     89         } catch (StackOverflowError e) {
>     90             msg("Caught StackOverflowError as expected");
>     91         }
>     92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh
> 
> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
>      recursiveMethod(M);
>      int target_depth = M - 100;
> 
> Then the variable 'recursions' can be removed or become local.
> 
> The recursiveMethod takes in the maximum recursions to try and updates
> the recursions variable to record how many recursions were possible - so:
> 
> target_depth = <actual recursions> - 100;
> 
> Possibly recursiveMethod could return the actual recursions instead of
> using the global variables?
> 
> David
> -----
> 
> This method will be:
> 
>     47     private static final int M = 1 << 20;
>    ...
>    121     public long recursiveMethod(int depth) {
>    123         if (depth == 0) {
>    124             notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
>    126         } else {
>    127             recursiveMethod(--depth);
>    128         }
>    129     }
> 
> 
> At least, he test is missing the comments explaining all these.
> 
> Thanks,
> Serguei
> 
> 
> 
> On 8/9/20 22:35, David Holmes wrote:
> Hi Richard,
> 
> On 31/07/2020 5:28 pm, Reingruber, Richard wrote:
> Hi,
> 
> I rebase the fix after JDK-8250042.
> 
> New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/
> 
> The general fix for this seems good. A minor nit:
> 
>   588     if (!is_assignable(signature, ob_k, Thread::current())) {
> 
> You know that the current thread is the VMThread so can use
> VMThread::vm_thread().
> 
> Similarly for this existing code:
> 
>   694     Thread* current_thread = Thread::current();
> 
> ---
> 
> Looking at the test code ... I'm less clear on exactly what is
> happening and the use of spin-waits raises some red-flags for me in
> terms of test reliability on different platforms. The "while
> (--waitCycles > 0)" loop in particular offers no certainty that the
> agent thread is executing anything in particular. And the use of the
> spin_count as a guide to future waiting time seems somewhat arbitrary.
> In all seriousness I got a headache trying to work out how the test
> was expecting to operate. Some parts could be simplified using raw
> monitors, I think. But there's no sure way to know the agent thread is
> in the midst of the stackwalk when the target thread wants to leave
> the native code. So I understand what you are trying to achieve here,
> I'm just not sure how reliably it will actually achieve it.
> 
> test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp
> 
> 
>   32 static volatile jlong spinn_count     = 0;
> 
> Using a 64-bit counter seems like it will be a problem on 32-bit systems.
> 
> Should be spin_count not spinn_count.
> 
>   36 // Agent thread waits for value != 0, then performas the JVMTI
> call to get local variable.
> 
> typo: performas
> 
> Thanks,
> David
> -----
> 
> Thanks, Richard.
> 
> -----Original Message-----
> From: serviceability-dev mailto:serviceability-dev-retn at openjdk.java.net
> On Behalf Of Reingruber, Richard
> Sent: Montag, 27. Juli 2020 09:45
> To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net
> Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in
> VM_GetOrSetLocal::doit_prologue()
> 
> Hi Serguei,
> 
>     > I tested it on Linux and Windows but not yet on MacOS.
> 
> The test succeeded now on all platforms.
> 
> Thanks, Richard.
> 
> -----Original Message-----
> From: Reingruber, Richard
> Sent: Freitag, 24. Juli 2020 15:04
> To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net
> Subject: RE: RFR(S) 8249293: Unsafe stackwalk in
> VM_GetOrSetLocal::doit_prologue()
> 
> Hi Serguei,
> 
> The fix itself looks good to me.
> 
> thanks for looking at the fix.
> 
> I still need another look at new test.
> Could you, please, convert the agent of new test to C++?
> It will make it a little bit simpler.
> 
> Sure, here is the new webrev.1 with a C++ version of the test agent:
> 
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/
> 
> I tested it on Linux and Windows but not yet on MacOS.
> 
> Thanks,
> Richard.
> 
> -----Original Message-----
> From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com
> Sent: Freitag, 24. Juli 2020 00:00
> To: Reingruber, Richard mailto:richard.reingruber at sap.com;
> mailto:serviceability-dev at openjdk.java.net
> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in
> VM_GetOrSetLocal::doit_prologue()
> 
> Hi Richard,
> 
> Thank you for filing the CR and taking care about it!
> The fix itself looks good to me.
> I still need another look at new test.
> Could you, please, convert the agent of new test to C++?
> It will make it a little bit simpler.
> 
> Thanks,
> Serguei
> 
> 
> On 7/20/20 01:15, Reingruber, Richard wrote:
> Hi,
> 
> please help review this fix for VM_GetOrSetLocal. It moves the
> unsafe stackwalk from the vm
> operation prologue before the safepoint into the doit() method
> executed at the safepoint.
> 
> Webrev:
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html
> Bug: https://bugs.openjdk.java.net/browse/JDK-8249293
> 
> According to the JVMTI spec on local variable access it is not
> required to suspend the target thread
> T [1]. The operation will simply fail with
> JVMTI_ERROR_NO_MORE_FRAMES if T is executing
> bytecodes. It will succeed though if T is blocked because of
> synchronization or executing some native
> code.
> 
> The issue is that in the latter case the stack walk in
> VM_GetOrSetLocal::doit_prologue() to prepare
> the access to the local variable is unsafe, because it is done
> before the safepoint and it races
> with T returning to execute bytecodes making its stack not walkable.
> The included test shows that
> this can crash the VM if T wins the race.
> 
> Manual testing:
> 
>      - new test
> test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
>      - test/hotspot/jtreg/vmTestbase/nsk/jvmti
>      - test/hotspot/jtreg/serviceability/jvmti
> 
> Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode,
> SPECjvm2008, SPECjbb2015,
> Renaissance Suite, SAP specific tests with fastdebug and release
> builds on all platforms
> 
> Thanks, Richard.
> 
> [1]
> https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local
> 
> 
> 


More information about the serviceability-dev mailing list