RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Reingruber, Richard richard.reingruber at sap.com
Tue Aug 11 10:02:43 UTC 2020


Hi David and Serguei,

> On 11/08/2020 3:21 am, serguei.spitsyn at oracle.com wrote:
> > Hi Richard and David,
> > 
> > The implementation looks good to me.
> > 
> > But I do not understand what the test is doing with all this counters 
> > and recursions.
> > 
> > For instance, these fragments:
> > 
> >    86         recursions = 0;
> >    87         try {
> >    88             recursiveMethod(1<<20);
> >    89         } catch (StackOverflowError e) {
> >    90             msg("Caught StackOverflowError as expected");
> >    91         }
> >    92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh
> > 
> > It is not obvious that the 'recursion' is updated in the recursiveMethod.
> > I would suggestto make it more explicit:
> >     recursiveMethod(M);
> >     int target_depth = M - 100;
> > 
> > Then the variable 'recursions' can be removed or become local.

> The recursiveMethod takes in the maximum recursions to try and updates 
> the recursions variable to record how many recursions were possible - so:

> target_depth = <actual recursions> - 100;

> Possibly recursiveMethod could return the actual recursions instead of 
> using the global variables?

I've eliminated the static 'recursions' variable. recursiveMethod() now returns
the depth at which the recursion was ended. I hesitated doing this, because I
had to handle the StackOverflowError with all those frames still on stack. But
the handler is empty, so it should not cause problems.

This is the new webrev (as posted previously):

Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3/
Delta:  http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/

Thanks, Richard.

-----Original Message-----
From: David Holmes <david.holmes at oracle.com> 
Sent: Dienstag, 11. August 2020 04:00
To: serguei.spitsyn at oracle.com; Reingruber, Richard <richard.reingruber at sap.com>; serviceability-dev at openjdk.java.net
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Hi Serguei,

On 11/08/2020 3:21 am, serguei.spitsyn at oracle.com wrote:
> Hi Richard and David,
> 
> The implementation looks good to me.
> 
> But I do not understand what the test is doing with all this counters 
> and recursions.
> 
> For instance, these fragments:
> 
>    86         recursions = 0;
>    87         try {
>    88             recursiveMethod(1<<20);
>    89         } catch (StackOverflowError e) {
>    90             msg("Caught StackOverflowError as expected");
>    91         }
>    92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh
> 
> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
>     recursiveMethod(M);
>     int target_depth = M - 100;
> 
> Then the variable 'recursions' can be removed or become local.

The recursiveMethod takes in the maximum recursions to try and updates 
the recursions variable to record how many recursions were possible - so:

target_depth = <actual recursions> - 100;

Possibly recursiveMethod could return the actual recursions instead of 
using the global variables?

David
-----

> This method will be:
> 
>    47     private static final int M = 1 << 20;
>   ...
>   121     public long recursiveMethod(int depth) {
>   123         if (depth == 0) {
>   124             notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
>   126         } else {
>   127             recursiveMethod(--depth);
>   128         }
>   129     }
> 
> 
> At least, he test is missing the comments explaining all these.
> 
> Thanks,
> Serguei
> 
> 
> 
> On 8/9/20 22:35, David Holmes wrote:
>> Hi Richard,
>>
>> On 31/07/2020 5:28 pm, Reingruber, Richard wrote:
>>> Hi,
>>>
>>> I rebase the fix after JDK-8250042.
>>>
>>> New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/
>>
>> The general fix for this seems good. A minor nit:
>>
>>  588     if (!is_assignable(signature, ob_k, Thread::current())) {
>>
>> You know that the current thread is the VMThread so can use 
>> VMThread::vm_thread().
>>
>> Similarly for this existing code:
>>
>>  694     Thread* current_thread = Thread::current();
>>
>> ---
>>
>> Looking at the test code ... I'm less clear on exactly what is 
>> happening and the use of spin-waits raises some red-flags for me in 
>> terms of test reliability on different platforms. The "while 
>> (--waitCycles > 0)" loop in particular offers no certainty that the 
>> agent thread is executing anything in particular. And the use of the 
>> spin_count as a guide to future waiting time seems somewhat arbitrary. 
>> In all seriousness I got a headache trying to work out how the test 
>> was expecting to operate. Some parts could be simplified using raw 
>> monitors, I think. But there's no sure way to know the agent thread is 
>> in the midst of the stackwalk when the target thread wants to leave 
>> the native code. So I understand what you are trying to achieve here, 
>> I'm just not sure how reliably it will actually achieve it.
>>
>> test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp 
>>
>>
>>  32 static volatile jlong spinn_count     = 0;
>>
>> Using a 64-bit counter seems like it will be a problem on 32-bit systems.
>>
>> Should be spin_count not spinn_count.
>>
>>  36 // Agent thread waits for value != 0, then performas the JVMTI 
>> call to get local variable.
>>
>> typo: performas
>>
>> Thanks,
>> David
>> -----
>>
>>> Thanks, Richard.
>>>
>>> -----Original Message-----
>>> From: serviceability-dev <serviceability-dev-retn at openjdk.java.net> 
>>> On Behalf Of Reingruber, Richard
>>> Sent: Montag, 27. Juli 2020 09:45
>>> To: serguei.spitsyn at oracle.com; serviceability-dev at openjdk.java.net
>>> Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in 
>>> VM_GetOrSetLocal::doit_prologue()
>>>
>>> Hi Serguei,
>>>
>>>    > I tested it on Linux and Windows but not yet on MacOS.
>>>
>>> The test succeeded now on all platforms.
>>>
>>> Thanks, Richard.
>>>
>>> -----Original Message-----
>>> From: Reingruber, Richard
>>> Sent: Freitag, 24. Juli 2020 15:04
>>> To: serguei.spitsyn at oracle.com; serviceability-dev at openjdk.java.net
>>> Subject: RE: RFR(S) 8249293: Unsafe stackwalk in 
>>> VM_GetOrSetLocal::doit_prologue()
>>>
>>> Hi Serguei,
>>>
>>>> The fix itself looks good to me.
>>>
>>> thanks for looking at the fix.
>>>
>>>> I still need another look at new test.
>>>> Could you, please, convert the agent of new test to C++?
>>>> It will make it a little bit simpler.
>>>
>>> Sure, here is the new webrev.1 with a C++ version of the test agent:
>>>
>>> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/
>>>
>>> I tested it on Linux and Windows but not yet on MacOS.
>>>
>>> Thanks,
>>> Richard.
>>>
>>> -----Original Message-----
>>> From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
>>> Sent: Freitag, 24. Juli 2020 00:00
>>> To: Reingruber, Richard <richard.reingruber at sap.com>; 
>>> serviceability-dev at openjdk.java.net
>>> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in 
>>> VM_GetOrSetLocal::doit_prologue()
>>>
>>> Hi Richard,
>>>
>>> Thank you for filing the CR and taking care about it!
>>> The fix itself looks good to me.
>>> I still need another look at new test.
>>> Could you, please, convert the agent of new test to C++?
>>> It will make it a little bit simpler.
>>>
>>> Thanks,
>>> Serguei
>>>
>>>
>>> On 7/20/20 01:15, Reingruber, Richard wrote:
>>>> Hi,
>>>>
>>>> please help review this fix for VM_GetOrSetLocal. It moves the 
>>>> unsafe stackwalk from the vm
>>>> operation prologue before the safepoint into the doit() method 
>>>> executed at the safepoint.
>>>>
>>>> Webrev: 
>>>> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html
>>>> Bug: https://bugs.openjdk.java.net/browse/JDK-8249293
>>>>
>>>> According to the JVMTI spec on local variable access it is not 
>>>> required to suspend the target thread
>>>> T [1]. The operation will simply fail with 
>>>> JVMTI_ERROR_NO_MORE_FRAMES if T is executing
>>>> bytecodes. It will succeed though if T is blocked because of 
>>>> synchronization or executing some native
>>>> code.
>>>>
>>>> The issue is that in the latter case the stack walk in 
>>>> VM_GetOrSetLocal::doit_prologue() to prepare
>>>> the access to the local variable is unsafe, because it is done 
>>>> before the safepoint and it races
>>>> with T returning to execute bytecodes making its stack not walkable. 
>>>> The included test shows that
>>>> this can crash the VM if T wins the race.
>>>>
>>>> Manual testing:
>>>>
>>>>     - new test 
>>>> test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
>>>>     - test/hotspot/jtreg/vmTestbase/nsk/jvmti
>>>>     - test/hotspot/jtreg/serviceability/jvmti
>>>>
>>>> Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, 
>>>> SPECjvm2008, SPECjbb2015,
>>>> Renaissance Suite, SAP specific tests with fastdebug and release 
>>>> builds on all platforms
>>>>
>>>> Thanks, Richard.
>>>>
>>>> [1] 
>>>> https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local
>>>
> 


More information about the serviceability-dev mailing list