RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
Reingruber, Richard
richard.reingruber at sap.com
Tue Aug 11 09:52:20 UTC 2020
Hi Serguei,
> The implementation looks good to me.
Thanks.
> But I do not understand what the test is doing with all this counters and recursions.
The @comment gives an explanation: the target thread builds a stack as large as
possible to prolong the unsafe stackwalk. This is done by means of recursion.
> For instance, these fragments:
> 86 recursions = 0;
> 87 try {
> 88 recursiveMethod(1<<20);
> 89 } catch (StackOverflowError e) {
> 90 msg("Caught StackOverflowError as expected");
> 91 }
> 92 int target_depth = recursions-100; // spaces are missed around the '-' sigh
Here we calibrate the test for maximum stack depth. We measure how large the
stack can grow by calling recursiveMethod() until a StackOverflowError is
raised. We use recursions - 100 as target_depth to avoid the StackOverflowError
during the actual test.
> It is not obvious that the 'recursion' is updated in the recursiveMethod.
> I would suggestto make it more explicit:
> recursiveMethod(M);
> int target_depth = M - 100;
> Then the variable 'recursions' can be removed or become local.
> This method will be:
> 47 private static final int M = 1 << 20;
> ...
> 121 public long recursiveMethod(int depth) {
> 123 if (depth == 0) {
> 124 notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
> 126 } else {
> 127 recursiveMethod(--depth);
> 128 }
> 129 }
I don't think this would work. A StackOverflowError would be raised
before notifyAgentToGetLocalAndWaitShortly() is called.
> At least, he test is missing the comments explaining all these.
The arguments to the msg() method serve as documentation too. I would not want
to repeat the msg strings in comments.
Thanks, Richard.
-------------------------------------------------------------
From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
Sent: Montag, 10. August 2020 19:22
To: David Holmes <david.holmes at oracle.com>; Reingruber, Richard <richard.reingruber at sap.com>; serviceability-dev at openjdk.java.net
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
Hi Richard and David,
The implementation looks good to me.
But I do not understand what the test is doing with all this counters and recursions.
For instance, these fragments:
86 recursions = 0;
87 try {
88 recursiveMethod(1<<20);
89 } catch (StackOverflowError e) {
90 msg("Caught StackOverflowError as expected");
91 }
92 int target_depth = recursions-100; // spaces are missed around the '-' sigh
It is not obvious that the 'recursion' is updated in the recursiveMethod.
I would suggestto make it more explicit:
recursiveMethod(M);
int target_depth = M - 100;
Then the variable 'recursions' can be removed or become local.
This method will be:
47 private static final int M = 1 << 20;
...
121 public long recursiveMethod(int depth) {
123 if (depth == 0) {
124 notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);
126 } else {
127 recursiveMethod(--depth);
128 }
129 }
At least, he test is missing the comments explaining all these.
Thanks,
Serguei
On 8/9/20 22:35, David Holmes wrote:
Hi Richard,
On 31/07/2020 5:28 pm, Reingruber, Richard wrote:
Hi,
I rebase the fix after JDK-8250042.
New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/
The general fix for this seems good. A minor nit:
588 if (!is_assignable(signature, ob_k, Thread::current())) {
You know that the current thread is the VMThread so can use VMThread::vm_thread().
Similarly for this existing code:
694 Thread* current_thread = Thread::current();
---
Looking at the test code ... I'm less clear on exactly what is happening and the use of spin-waits raises some red-flags for me in terms of test reliability on different platforms. The "while (--waitCycles > 0)" loop in particular offers no certainty that the agent thread is executing anything in particular. And the use of the spin_count as a guide to future waiting time seems somewhat arbitrary. In all seriousness I got a headache trying to work out how the test was expecting to operate. Some parts could be simplified using raw monitors, I think. But there's no sure way to know the agent thread is in the midst of the stackwalk when the target thread wants to leave the native code. So I understand what you are trying to achieve here, I'm just not sure how reliably it will actually achieve it.
test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp
32 static volatile jlong spinn_count = 0;
Using a 64-bit counter seems like it will be a problem on 32-bit systems.
Should be spin_count not spinn_count.
36 // Agent thread waits for value != 0, then performas the JVMTI call to get local variable.
typo: performas
Thanks,
David
-----
Thanks, Richard.
-----Original Message-----
From: serviceability-dev mailto:serviceability-dev-retn at openjdk.java.net On Behalf Of Reingruber, Richard
Sent: Montag, 27. Juli 2020 09:45
To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net
Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
Hi Serguei,
> I tested it on Linux and Windows but not yet on MacOS.
The test succeeded now on all platforms.
Thanks, Richard.
-----Original Message-----
From: Reingruber, Richard
Sent: Freitag, 24. Juli 2020 15:04
To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net
Subject: RE: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
Hi Serguei,
The fix itself looks good to me.
thanks for looking at the fix.
I still need another look at new test.
Could you, please, convert the agent of new test to C++?
It will make it a little bit simpler.
Sure, here is the new webrev.1 with a C++ version of the test agent:
http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/
I tested it on Linux and Windows but not yet on MacOS.
Thanks,
Richard.
-----Original Message-----
From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com
Sent: Freitag, 24. Juli 2020 00:00
To: Reingruber, Richard mailto:richard.reingruber at sap.com; mailto:serviceability-dev at openjdk.java.net
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()
Hi Richard,
Thank you for filing the CR and taking care about it!
The fix itself looks good to me.
I still need another look at new test.
Could you, please, convert the agent of new test to C++?
It will make it a little bit simpler.
Thanks,
Serguei
On 7/20/20 01:15, Reingruber, Richard wrote:
Hi,
please help review this fix for VM_GetOrSetLocal. It moves the unsafe stackwalk from the vm
operation prologue before the safepoint into the doit() method executed at the safepoint.
Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html
Bug: https://bugs.openjdk.java.net/browse/JDK-8249293
According to the JVMTI spec on local variable access it is not required to suspend the target thread
T [1]. The operation will simply fail with JVMTI_ERROR_NO_MORE_FRAMES if T is executing
bytecodes. It will succeed though if T is blocked because of synchronization or executing some native
code.
The issue is that in the latter case the stack walk in VM_GetOrSetLocal::doit_prologue() to prepare
the access to the local variable is unsafe, because it is done before the safepoint and it races
with T returning to execute bytecodes making its stack not walkable. The included test shows that
this can crash the VM if T wins the race.
Manual testing:
- new test test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java
- test/hotspot/jtreg/vmTestbase/nsk/jvmti
- test/hotspot/jtreg/serviceability/jvmti
Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode, SPECjvm2008, SPECjbb2015,
Renaissance Suite, SAP specific tests with fastdebug and release builds on all platforms
Thanks, Richard.
[1] https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local
More information about the serviceability-dev
mailing list