RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

David Holmes david.holmes at oracle.com
Mon Aug 10 05:35:15 UTC 2020


Hi Richard,

On 31/07/2020 5:28 pm, Reingruber, Richard wrote:
> Hi,
> 
> I rebase the fix after JDK-8250042.
> 
> New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/

The general fix for this seems good. A minor nit:

  588     if (!is_assignable(signature, ob_k, Thread::current())) {

You know that the current thread is the VMThread so can use 
VMThread::vm_thread().

Similarly for this existing code:

  694     Thread* current_thread = Thread::current();

---

Looking at the test code ... I'm less clear on exactly what is happening 
and the use of spin-waits raises some red-flags for me in terms of test 
reliability on different platforms. The "while (--waitCycles > 0)" loop 
in particular offers no certainty that the agent thread is executing 
anything in particular. And the use of the spin_count as a guide to 
future waiting time seems somewhat arbitrary. In all seriousness I got 
a headache trying to work out how the test was expecting to operate. 
Some parts could be simplified using raw monitors, I think. But there's 
no sure way to know the agent thread is in the midst of the stackwalk 
when the target thread wants to leave the native code. So I understand 
what you are trying to achieve here, I'm just not sure how reliably it 
will actually achieve it.

test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp

  32 static volatile jlong spinn_count     = 0;

Using a 64-bit counter seems like it will be a problem on 32-bit systems.

Should be spin_count not spinn_count.

  36 // Agent thread waits for value != 0, then performas the JVMTI call 
to get local variable.

typo: performas

Thanks,
David
-----

> Thanks, Richard.
> 
> -----Original Message-----
> From: serviceability-dev <serviceability-dev-retn at openjdk.java.net> On Behalf Of Reingruber, Richard
> Sent: Montag, 27. Juli 2020 09:45
> To: serguei.spitsyn at oracle.com; serviceability-dev at openjdk.java.net
> Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
> 
> Hi Serguei,
> 
>    > I tested it on Linux and Windows but not yet on MacOS.
> 
> The test succeeded now on all platforms.
> 
> Thanks, Richard.
> 
> -----Original Message-----
> From: Reingruber, Richard
> Sent: Freitag, 24. Juli 2020 15:04
> To: serguei.spitsyn at oracle.com; serviceability-dev at openjdk.java.net
> Subject: RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
> 
> Hi Serguei,
> 
>> The fix itself looks good to me.
> 
> thanks for looking at the fix.
> 
>> I still need another look at new test.
>> Could you, please, convert the agent of new test to C++?
>> It will make it a little bit simpler.
> 
> Sure, here is the new webrev.1 with a C++ version of the test agent:
> 
> http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/
> 
> I tested it on Linux and Windows but not yet on MacOS.
> 
> Thanks,
> Richard.
> 
> -----Original Message-----
> From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
> Sent: Freitag, 24. Juli 2020 00:00
> To: Reingruber, Richard <richard.reingruber at sap.com>; serviceability-dev at openjdk.java.net
> Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
> 
> Hi Richard,
> 
> Thank you for filing the CR and taking care about it!
> The fix itself looks good to me.
> I still need another look at new test.
> Could you, please, convert the agent of new test to C++?
> It will make it a little bit simpler.
> 
> Thanks,
> Serguei
> 
> 
> On 7/20/20 01:15, Reingruber, Richard wrote:
>> Hi,
>>
>> please help review this fix for VM_GetOrSetLocal. It moves the unsafe stackwalk from the vm
>> operation prologue before the safepoint into the doit() method executed at the safepoint.
>>
>> Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html
>> Bug:    https://bugs.openjdk.java.net/browse/JDK-8249293
>>
>> According to the JVMTI spec on local variable access it is not required to suspend the target thread
>> T [1]. The operation will simply fail with JVMTI_ERROR_NO_MORE_FRAMES if T is executing
>> bytecodes. It will succeed though if T is blocked because of synchronization or executing some native
>> code.
>>
>> The issue is that in the latter case the stack walk in VM_GetOrSetLocal::doit_prologue() to prepare
>> the access to the local variable is unsafe, because it is done before the safepoint and it races
>> with T returning to execute bytecodes making its stack not walkable. The included test shows that
>> this can crash the VM if T wins the race.
>>
>> Manual testing:
>>
>>     - new test test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
>>     - test/hotspot/jtreg/vmTestbase/nsk/jvmti
>>     - test/hotspot/jtreg/serviceability/jvmti
>>
>> Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015,
>> Renaissance Suite, SAP specific tests with fastdebug and release builds on all platforms
>>
>> Thanks, Richard.
>>
>> [1] https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local
> 


More information about the serviceability-dev mailing list