RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Reingruber, Richard richard.reingruber at sap.com
Tue Aug 11 09:52:20 UTC 2020


Hi Serguei,

> The implementation looks good to me.

Thanks.

> But I do not understand what the test is doing with all this counters and recursions.

The @comment gives an explanation: the target thread builds a stack as large as
possible to prolong the unsafe stackwalk. This is done by means of recursion.

> For instance, these fragments:
>   86         recursions = 0;
>   87         try {
>   88             recursiveMethod(1<<20);
>   89         } catch (StackOverflowError e) {
>   90             msg("Caught StackOverflowError as expected");
>   91         }
>   92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh

Here we calibrate the test for maximum stack depth. We measure how large the
stack can grow by calling recursiveMethod() until a StackOverflowError is
raised. We use recursions - 100 as target_depth to avoid the StackOverflowError
during the actual test.

> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
>    recursiveMethod(M);
>    int target_depth = M - 100;

> Then the variable 'recursions' can be removed or become local.

> This method will be:
>   47     private static final int M = 1 << 20;
>  ...
>  121     public long recursiveMethod(int depth) {
>  123         if (depth == 0) {
>  124             notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
>  126         } else {
>  127             recursiveMethod(--depth);
>  128         }
>  129     }

I don't think this would work. A StackOverflowError would be raised
before notifyAgentToGetLocalAndWaitShortly() is called.

> At least, he test is missing the comments explaining all these.

The arguments to the msg() method serve as documentation too. I would not want
to repeat the msg strings in comments.

Thanks, Richard.


-------------------------------------------------------------
From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com> 
Sent: Montag, 10. August 2020 19:22
To: David Holmes <david.holmes at oracle.com>; Reingruber, Richard <richard.reingruber at sap.com>; serviceability-dev at openjdk.java.net
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Hi Richard and David,

The implementation looks good to me.

But I do not understand what the test is doing with all this counters and recursions.

For instance, these fragments:
  86         recursions = 0;
  87         try {
  88             recursiveMethod(1<<20);
  89         } catch (StackOverflowError e) {
  90             msg("Caught StackOverflowError as expected");
  91         }
  92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh

It is not obvious that the 'recursion' is updated in the recursiveMethod.
I would suggestto make it more explicit:
   recursiveMethod(M);
   int target_depth = M - 100;

Then the variable 'recursions' can be removed or become local.

This method will be:
  47     private static final int M = 1 << 20;
 ...
 121     public long recursiveMethod(int depth) {
 123         if (depth == 0) {
 124             notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
 126         } else {
 127             recursiveMethod(--depth);
 128         }
 129     }


At least, he test is missing the comments explaining all these.

Thanks,
Serguei



On 8/9/20 22:35, David Holmes wrote:
Hi Richard, 

On 31/07/2020 5:28 pm, Reingruber, Richard wrote: 

Hi, 

I rebase the fix after JDK-8250042. 

New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/ 

The general fix for this seems good. A minor nit: 

 588     if (!is_assignable(signature, ob_k, Thread::current())) { 

You know that the current thread is the VMThread so can use VMThread::vm_thread(). 

Similarly for this existing code: 

 694     Thread* current_thread = Thread::current(); 

--- 

Looking at the test code ... I'm less clear on exactly what is happening and the use of spin-waits raises some red-flags for me in terms of test reliability on different platforms. The "while (--waitCycles > 0)" loop in particular offers no certainty that the agent thread is executing anything in particular. And the use of the spin_count as a guide to future waiting time seems somewhat arbitrary. In all seriousness I got a headache trying to work out how the test was expecting to operate. Some parts could be simplified using raw monitors, I think. But there's no sure way to know the agent thread is in the midst of the stackwalk when the target thread wants to leave the native code. So I understand what you are trying to achieve here, I'm just not sure how reliably it will actually achieve it. 

test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp 

 32 static volatile jlong spinn_count     = 0; 

Using a 64-bit counter seems like it will be a problem on 32-bit systems. 

Should be spin_count not spinn_count. 

 36 // Agent thread waits for value != 0, then performas the JVMTI call to get local variable. 

typo: performas 

Thanks, 
David 
----- 


Thanks, Richard. 

-----Original Message----- 
From: serviceability-dev mailto:serviceability-dev-retn at openjdk.java.net On Behalf Of Reingruber, Richard 
Sent: Montag, 27. Juli 2020 09:45 
To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net 
Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue() 

Hi Serguei, 

   > I tested it on Linux and Windows but not yet on MacOS. 

The test succeeded now on all platforms. 

Thanks, Richard. 

-----Original Message----- 
From: Reingruber, Richard 
Sent: Freitag, 24. Juli 2020 15:04 
To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net 
Subject: RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue() 

Hi Serguei, 


The fix itself looks good to me. 

thanks for looking at the fix. 


I still need another look at new test. 
Could you, please, convert the agent of new test to C++? 
It will make it a little bit simpler. 

Sure, here is the new webrev.1 with a C++ version of the test agent: 

http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/ 

I tested it on Linux and Windows but not yet on MacOS. 

Thanks, 
Richard. 

-----Original Message----- 
From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com 
Sent: Freitag, 24. Juli 2020 00:00 
To: Reingruber, Richard mailto:richard.reingruber at sap.com; mailto:serviceability-dev at openjdk.java.net 
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue() 

Hi Richard, 

Thank you for filing the CR and taking care about it! 
The fix itself looks good to me. 
I still need another look at new test. 
Could you, please, convert the agent of new test to C++? 
It will make it a little bit simpler. 

Thanks, 
Serguei 


On 7/20/20 01:15, Reingruber, Richard wrote: 

Hi, 

please help review this fix for VM_GetOrSetLocal. It moves the unsafe stackwalk from the vm 
operation prologue before the safepoint into the doit() method executed at the safepoint. 

Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html 
Bug:    https://bugs.openjdk.java.net/browse/JDK-8249293 

According to the JVMTI spec on local variable access it is not required to suspend the target thread 
T [1]. The operation will simply fail with JVMTI_ERROR_NO_MORE_FRAMES if T is executing 
bytecodes. It will succeed though if T is blocked because of synchronization or executing some native 
code. 

The issue is that in the latter case the stack walk in VM_GetOrSetLocal::doit_prologue() to prepare 
the access to the local variable is unsafe, because it is done before the safepoint and it races 
with T returning to execute bytecodes making its stack not walkable. The included test shows that 
this can crash the VM if T wins the race. 

Manual testing: 

    - new test test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
    - test/hotspot/jtreg/vmTestbase/nsk/jvmti 
    - test/hotspot/jtreg/serviceability/jvmti 

Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015, 
Renaissance Suite, SAP specific tests with fastdebug and release builds on all platforms 

Thanks, Richard. 

[1] https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local 




More information about the serviceability-dev mailing list