RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Reingruber, Richard richard.reingruber at sap.com
Tue Aug 25 07:34:24 UTC 2020


Hi Serguei,

thanks for your review.

> These two lines can be removed:
>  391   jvmtiEnv* jvmti;
>  406   ::jvmti = jvmti;
> No need in another webrev if you fix it.

I've made this change and fixed the copyright year in
jvmtiImpl.hpp. Unfortunately I got no results from the submit repo [1].

I've pushed again to submit. Will push to the jdk master repo when this job
reports success.

Thanks,
Richard.

[1] https://mail.openjdk.java.net/pipermail/jdk-submit-changes/2020-August/012091.html

From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
Sent: Freitag, 21. August 2020 18:48
To: Reingruber, Richard <richard.reingruber at sap.com>; David Holmes <david.holmes at oracle.com>; serviceability-dev at openjdk.java.net
Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()

Hi Richard,

It looks good, thank you for the update!

These two lines can be removed:

 391   jvmtiEnv* jvmti;

 406   ::jvmti = jvmti;
No need in another webrev if you fix it.

Thanks,
Serguei


On 8/21/20 09:09, Reingruber, Richard wrote:

Hi Serguei,



I have prepared a new webrev based on your suggestions.



Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.6/

Delta:  http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.6.inc/



Thanks,

Richard.



______

From: serguei.spitsyn at oracle.com<mailto:serguei.spitsyn at oracle.com> <serguei.spitsyn at oracle.com><mailto:serguei.spitsyn at oracle.com>

Sent: Freitag, 21. August 2020 11:22

To: Reingruber, Richard <richard.reingruber at sap.com><mailto:richard.reingruber at sap.com>; David Holmes <david.holmes at oracle.com><mailto:david.holmes at oracle.com>; serviceability-dev at openjdk.java.net<mailto:serviceability-dev at openjdk.java.net>

Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()



Hi Richard,





Thank you for the update, it looks really nice.

Just several more minor comments though (I hope, the last ones).



Should I rename the variable to spinWaitCycles or something similar?



Yes, waitCycles would be better and more consistent.



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.5.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.udiff.html

  81      * The wait time is given in cycles.

  82      */  83     public int waitTime;

  ...

  93         waitTime = 1;

This line 82 can be removed if you rename waitTime to waitCycles.

It is better to initialize waitCycles at definition and remove the line 93.

 146     public static void msg(String m) {

 147         System.out.println("### Java-Test: " + m);

 148     }

One of the de-facto standard names for such methods is "log".

  80      * Wait time in native, i.e. with walkable stack, after notifying agent thread to do GetLocalObject() call.

  89         msg("Test how many frames fit on the stack by performing recursive calls until StackOverflowError is thrown");

Could you, please, reballance the two long lines above?





http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.5.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html



There are several spots that can be simplified a little bit:

  95     jvmtiError result;

  96

  97     result = jvmti->GetErrorName(errCode, &errMsg);

  ==>

      jvmtiError result = jvmti->GetErrorName(errCode, &errMsg);



The same is true for for the following cases:

 115   err = jvmti->RawMonitorEnter(glws_monitor);

 125   err = jvmti->RawMonitorExit(glws_monitor);

 135   err = jvmti->RawMonitorWait(glws_monitor, 0);

 145   err = jvmti->RawMonitorNotify(glws_monitor);

 155   err = jvmti->DestroyRawMonitor(glws_monitor);



 173   if (errMsg !=  NULL) {

An extra space before NULL.

  89 static jvmtiEnv* jvmti_global = NULL;

 276   jvmtiEnv* jvmti = jvmti_global;

 308   jvmtiEnv* jvmti = jvmti_global;

 330     jvmtiEnv* jvmti = jvmti_global;

 ...

 409     jvmtiEnv* jvmti;

 419     res = jvm->GetEnv((void **) &jvmti, JVMTI_VERSION_9);

 424     jvmti_global = jvmti;



Normal practice is to name the "global_jvmti" as "jvmti".

Then there is no need to set it at the start of each function.



Thanks,

Serguei





On 8/20/20 23:47, Reingruber, Richard wrote:

Hi Serguei,



Sorry for the delay in reply and thank you for the update.

I like it in general. There are some minor comments though.



Excellent, thanks :)



I've prepared webrev.5.



Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.5/

Delta:  http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.5.inc/



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html

 112             waitTime = (waitTime << 1); // double wait time

 113             if (waitTime >= M || waitTime < 0) {

 114                 waitTime = 1; // reset when too long

 115             }

The M is too big for time.



"waitTime" is roughly the number of cycles spent in a spin wait. M ~= 10^6

cycles does not seem too long. Should I rename the variable to spinWaitCycles or

something similar?



What about something like this:

  waitTime = (waitTime << 1) % 32;

or

  waitTime = (waitTime << 1) & 32;



I went for



            // Double wait time, but limit to roughly 10^6 cycles.

            waitTime = (waitTime << 1) & (M - 1);

            waitTime = waitTime == 0 ? 1 : waitTime;



Masking the waitTime with % 32 is too small. In my experiments with fastdebug

builds I got the crash often with a waitTime of 8K on a Linux server and 256K on

my Windows notebook.



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.udiff.html

// - Wait for the target thread to either start a new test iteration or to

+//   signal shutdown.

A suggestion to replace: "to either start" => "either to start".



Ok, done.



+//     Shutdown is signalled by setting test_state to ShutDown. The agent reacts

+//     to it by changing test_state to Terminated and then it exits.

The second "it" is not needed: "then it exits" => "then exits".



Ok, done.



+//     ... It sets the shared variable test_state

+//     to TargetInNative and then it uses the glws_monitor to send the



The second "it" is not needed.



Ok, done.





+  monitor_enter(jvmti, env, glws_monitor, AT_LINE);

+  monitor_notify(jvmti, env, glws_monitor, AT_LINE);

+    monitor_wait(jvmti, env, glws_monitor, AT_LINE);

+  monitor_exit(jvmti, env, glws_monitor, AT_LINE);

+  monitor_destroy(jvmti, env, glws_monitor, AT_LINE);



There is only one lock.

It'd be more simple to directly use it in the called functions and get rid of the parameter.

Just a suggestion, it is up to you to decide.



Ok, done.



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html



I'd suggest to refactor the lines 213 and 239-257 to a separate function test_GetLocalObject(jvmti, depth).

 240         jobject local_val;

Better to rename it to local_obj or just obj.



Ok, done.



There are still problems with the indent.



I reformatted the file using 2 space indentation like in other C++ sources. I didn't include the

indentation change in the delta webrev.



Thanks,

Richard.



______________________

From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com

Sent: Donnerstag, 20. August 2020 04:42

To: Reingruber, Richard mailto:richard.reingruber at sap.com; David Holmes mailto:david.holmes at oracle.com; mailto:serviceability-dev at openjdk.java.net

Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()



Hi Richard,



Sorry for the delay in reply and thank you for the update.

I like it in general. There are some minor comments though.





http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html

 112             waitTime = (waitTime << 1); // double wait time

 113             if (waitTime >= M || waitTime < 0) {

 114                 waitTime = 1; // reset when too long

 115             }

The M is too big for time.

What about something like this:

  waitTime = (waitTime << 1) % 32;

or

  waitTime = (waitTime << 1) & 32;



You can choose a better number instead of 32.





http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.udiff.html

// - Wait for the target thread to either start a new test iteration or to

+//   signal shutdown.

A suggestion to replace: "to either start" => "either to start".

+//     Shutdown is signalled by setting test_state to ShutDown. The agent reacts

+//     to it by changing test_state to Terminated and then it exits.

The second "it" is not needed: "then it exits" => "then exits".

+//     ... It sets the shared variable test_state

+//     to TargetInNative and then it uses the glws_monitor to send the



The second "it" is not needed.



+  monitor_enter(jvmti, env, glws_monitor, AT_LINE);

+  monitor_notify(jvmti, env, glws_monitor, AT_LINE);

+    monitor_wait(jvmti, env, glws_monitor, AT_LINE);

+  monitor_exit(jvmti, env, glws_monitor, AT_LINE);

+  monitor_destroy(jvmti, env, glws_monitor, AT_LINE);



There is only one lock.

It'd be more simple to directly use it in the called functions and get rid of the parameter.

Just a suggestion, it is up to you to decide.





http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html



I'd suggest to refactor the lines 213 and 239-257 to a separate function test_GetLocalObject(jvmti, depth).

 240         jobject local_val;

Better to rename it to local_obj or just obj.



There are still problems with the indent.

The indent 4 is mostly used.

However there are still fragments with the indent 2:

 112 static void monitor_enter(jvmtiEnv* jvmti, JNIEnv* env, jrawMonitorID mon, const char* loc) {

 113   jvmtiError err;

 114

 115   err = jvmti->RawMonitorEnter(mon);

 116   if (err != JVMTI_ERROR_NONE) {

 117     ShowErrorMessage(jvmti, err, loc);

 118     env->FatalError(loc);

 119   }

 120 }

 121

 122 static void monitor_exit(jvmtiEnv* jvmti, JNIEnv* env, jrawMonitorID mon, const char* loc) {

 123   jvmtiError err;

 124

 125   err = jvmti->RawMonitorExit(mon);

 126   if (err != JVMTI_ERROR_NONE) {

 127     ShowErrorMessage(jvmti, err, loc);

 128     env->FatalError(loc);

 129   }

 130 }

 131

 132 static void monitor_wait(jvmtiEnv* jvmti, JNIEnv* env, jrawMonitorID mon, const char* loc) {

 133   jvmtiError err;

 134

 135   err = jvmti->RawMonitorWait(mon, 0);

 136   if (err != JVMTI_ERROR_NONE) {

 137     ShowErrorMessage(jvmti, err, loc);

 138     env->FatalError(loc);

 139   }

 140 }

 141

 142 static void monitor_notify(jvmtiEnv* jvmti, JNIEnv* env, jrawMonitorID mon, const char* loc) {

 143   jvmtiError err;

 144

 145   err = jvmti->RawMonitorNotify(mon);

 146   if (err != JVMTI_ERROR_NONE) {

 147     ShowErrorMessage(jvmti, err, loc);

 148     env->FatalError(loc);

 149   }

 150 }

 151

 152 static void monitor_destroy(jvmtiEnv* jvmti, JNIEnv* env, jrawMonitorID mon, const char* loc) {

 153   jvmtiError err;

 154

 155   err = jvmti->DestroyRawMonitor(mon);

 156   if (err != JVMTI_ERROR_NONE) {

 157     ShowErrorMessage(jvmti, err, loc);

 158     env->FatalError(loc);

 159   }

 ...

 160 }

 196     while (target_thread == NULL) {

 197       monitor_wait(jvmti, env, glws_monitor, AT_LINE);

 198     }

 ...

 220         while (test_state != TargetInNative) {

 221           if (test_state == ShutDown) {

 222             test_state = Terminated;

 223             monitor_notify(jvmti, env, glws_monitor, AT_LINE);

 224             monitor_exit(jvmti, env, glws_monitor, AT_LINE);

 225             return;

 226           }

 227           monitor_wait(jvmti, env, glws_monitor, AT_LINE);

 228         }

 ...

 263 // Called by target thread after building a large stack.

 264 // By calling this native method, the thread's stack becomes walkable.

 265 // It notifies the agent to do the GetLocalObject() call and then races

 266 // it to make its stack not walkable by returning from the native call.

 267 JNIEXPORT void JNICALL

 268 Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocal(JNIEnv *env, jclass cls, jint depth, jlong waitCycles) {

 269   jvmtiEnv* jvmti = jvmti_global;

 270

 271   monitor_enter(jvmti, env, glws_monitor, AT_LINE);

 272

 273   // Set depth_for_get_local and notify agent that the target thread is ready for the GetLocalObject() call

 274   depth_for_get_local = depth;

 275   test_state = TargetInNative;

 276

 277   monitor_notify(jvmti, env, glws_monitor, AT_LINE);

 278

 279   // Wait for agent thread to read depth_for_get_local and do the GetLocalObject() call

 280   while (test_state != AgentInGetLocal) {

 281     monitor_wait(jvmti, env, glws_monitor, AT_LINE);

 282   }

 283

 284   // Reset state to Initial

 285   test_state = Initial;

 286

 287   monitor_exit(jvmti, env, glws_monitor, AT_LINE);

 288

 289   // Wait a little until agent thread is in unsafe stack walk.

 290   // This needs to be a spin wait or sleep because we cannot get a notification

 291   // from there.

 292   while (--waitCycles > 0) {

 293     dummy_counter++;

 294   }

 295 }

 ...

 299 JNIEXPORT void JNICALL

 300 Java_GetLocalWithoutSuspendTest_shutDown(JNIEnv *env, jclass cls) {

 301   jvmtiEnv* jvmti = jvmti_global;

 302

 303   monitor_enter(jvmti, env, glws_monitor, AT_LINE);

 304

 305   // Notify agent thread to shut down

 306   test_state = ShutDown;

 307   monitor_notify(jvmti, env, glws_monitor, AT_LINE);

 308

 309   // Wait for agent to terminate

 310   while (test_state != Terminated) {

 311     monitor_wait(jvmti, env, glws_monitor, AT_LINE);

 312   }

 313

 314   monitor_exit(jvmti, env, glws_monitor, AT_LINE);

 315

 316   // Destroy glws_monitor

 317   monitor_destroy(jvmti, env, glws_monitor, AT_LINE);

 318 }



Thanks,

Serguei





On 8/14/20 07:06, Reingruber, Richard wrote:

Hi Serguei,



thanks for the feedback. I have implemented your suggestions and created a new

webrev:



Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4/

Delta:  http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.4.inc/



Please find my replies to your comments below.



Best regards,

Richard.



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html



  33  *          the stack walk. The target thread's stack is walkable while in native. After sending the notification it

  ...



  54      * @param depth Depth of target frame for GetLocalObject() call. Should be large value to prolong the unsafe stack walk.

  55      * @param waitTimeInNativeAfterNotify Time to wait after notify with walkable stack before returning an becoming unsafe again.

  ...



  71      * Wait time in native, i.e. with walkable stack, after notifying agent thread to do GetLocalObject() call.

  ...



  89             msg((now -start) + " ms  Iteration : " + iterations + "  waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify);



Could you, please, re-balance the lines above to make them shorter?



Ok, done.





  90             int newTargetDepth = recursiveMethod(0, targetDepth);

  91             if (newTargetDepth < targetDepth) {

  92                 msg("StackOverflowError during test.");

  93                 msg("Old target depth: " + targetDepth);

  94                 msg("Retry with new target depth: " + newTargetDepth);

  95                 targetDepth = newTargetDepth;

  96             }

A comment is needed to explain why a StackOverflowError is not desired.

At least, it is not obvious initially.

  73     public int waitTimeInNativeAfterNotify;



This name is unreasonably long which makes the code less readable.

I'd suggest to reduce it to waitTime.



Ok, done.



 103         notifyAgentToGetLocalAndWaitShortly(-1, 1);

 ...

 119                 notifyAgentToGetLocalAndWaitShortly(depth - 100, waitTimeInNativeAfterNotify);



It is better to provide a short comment before each call explaining what it is doing.

For instance, it is not clear why the call at the line 103 is needed.

Why do we need to notify the agent to GetLocal for the second time?



The test is repeated TEST_ITERATIONS times. In each iteration the agent calls

GetLocal racing the target thread returning from the native call. The last call

in line 103 ist the shutdown signal.



Can it be refactored into a separate native method?



I've made the shutdown process more explicit with the new native method

shutDown() which sets thest_state to ShutDown.



Then the the function name can be reduced to 'notifyAgentToGetLocal'.

This long name does not give enough context anyway.



Ok, done.



  85         long iterations = 0;

  87         do {

  ...

  97             iterations++;

  ...

 102         } while (iterations < TEST_ITERATIONS);



Why a more explicit 'for' or 'while' loop is not used here? :

for (long iter = 0; iter < TEST_ITERATIONS; iter++) {



I have converted the loop into a for loop.



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html



The indent in this file varies. It is better to keep it the same: 4 or 2.



Yes, I noticed this. I have not corrected it yet, because I didn't want to

pullute the incremental webrev with that change. Would you like me to fix the

indentation now to 2 spaces or do it as a last step?



  60   AgentCallingGetLocalObject // The target thread waits for the agent to call

I'd suggest to rename the constant to 'AgentInGetLocal'.



Ok, done.



 150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, void * arg) {



It is better rename the function to TestThreadLoop.



Would AgentThreadLoop be ok too?



You can add a comment before to explain some basic about what it is doing.



Ok, done.



 167     printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread started. Polling thread '%s' for local variables\n",

It is better to get rid of leading stars in all messages.



Ok, done.



 176         // the native method Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly

The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function name.



Ok, done.



---

From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com

Sent: Freitag, 14. August 2020 10:11

To: Reingruber, Richard mailto:richard.reingruber at sap.com; David Holmes mailto:david.holmes at oracle.com; mailto:serviceability-dev at openjdk.java.net

Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()



Hi Richard,



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java.frames.html



  33  *          the stack walk. The target thread's stack is walkable while in native. After sending the notification it

  ...



  54      * @param depth Depth of target frame for GetLocalObject() call. Should be large value to prolong the unsafe stack walk.

  55      * @param waitTimeInNativeAfterNotify Time to wait after notify with walkable stack before returning an becoming unsafe again.

  ...



  71      * Wait time in native, i.e. with walkable stack, after notifying agent thread to do GetLocalObject() call.

  ...



  89             msg((now -start) + " ms  Iteration : " + iterations + "  waitTimeInNativeAfterNotify : " + waitTimeInNativeAfterNotify);



Could you, please, re-balance the lines above to make them shorter?





  90             int newTargetDepth = recursiveMethod(0, targetDepth);

  91             if (newTargetDepth < targetDepth) {

  92                 msg("StackOverflowError during test.");

  93                 msg("Old target depth: " + targetDepth);

  94                 msg("Retry with new target depth: " + newTargetDepth);

  95                 targetDepth = newTargetDepth;

  96             }

A comment is needed to explain why a StackOverflowError is not desired.

At least, it is not obvious initially.

  73     public int waitTimeInNativeAfterNotify;



This name is unreasonably long which makes the code less readable.

I'd suggest to reduce it to waitTime.



 103         notifyAgentToGetLocalAndWaitShortly(-1, 1);

 ...

 119                 notifyAgentToGetLocalAndWaitShortly(depth - 100, waitTimeInNativeAfterNotify);



It is better to provide a short comment before each call explaining what it is doing.

For instance, it is not clear why the call at the line 103 is needed.

Why do we need to notify the agent to GetLocal for the second time?

Can it be refactored into a separate native method?

Then the the function name can be reduced to 'notifyAgentToGetLocal'.

This long name does not give enough context anyway.

  85         long iterations = 0;

  87         do {

  ...

  97             iterations++;

  ...

 102         } while (iterations < TEST_ITERATIONS);



Why a more explicit 'for' or 'while' loop is not used here? :

for (long iter = 0; iter < TEST_ITERATIONS; iter++) {





http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp.frames.html



The indent in this file varies. It is better to keep it the same: 4 or 2.



  60   AgentCallingGetLocalObject // The target thread waits for the agent to call

I'd suggest to rename the constant to 'AgentInGetLocal'.

 150 GetLocalWithoutSuspendTestThreadLoop(jvmtiEnv * jvmti, JNIEnv* env, void * arg) {



It is better rename the function to TestThreadLoop.

You can add a comment before to explain some basic about what it is doing.

 167     printf("*** AGENT: GetLocalWithoutSuspendTestThreadLoop thread started. Polling thread '%s' for local variables\n",

It is better to get rid of leading stars in all messages.

 176         // the native method Java_GetLocalWithoutSuspendTest_notifyAgentToGetLocalAndWaitShortly

The part 'Java_GetLocalWithoutSuspendTest_' can be removed from the function name.





I'm still reviewing the test native agent code.





Thanks,

Serguei





On 8/11/20 03:02, Reingruber, Richard wrote:

Hi David and Serguei,



On 11/08/2020 3:21 am, mailto:serguei.spitsyn at oracle.com wrote:

Hi Richard and David,



The implementation looks good to me.



But I do not understand what the test is doing with all this counters

and recursions.



For instance, these fragments:



   86         recursions = 0;

   87         try {

   88             recursiveMethod(1<<20);

   89         } catch (StackOverflowError e) {

   90             msg("Caught StackOverflowError as expected");

   91         }

   92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh



It is not obvious that the 'recursion' is updated in the recursiveMethod.

I would suggestto make it more explicit:

    recursiveMethod(M);

    int target_depth = M - 100;



Then the variable 'recursions' can be removed or become local.



The recursiveMethod takes in the maximum recursions to try and updates

the recursions variable to record how many recursions were possible - so:



target_depth = <actual recursions> - 100;



Possibly recursiveMethod could return the actual recursions instead of

using the global variables?



I've eliminated the static 'recursions' variable. recursiveMethod() now returns

the depth at which the recursion was ended. I hesitated doing this, because I

had to handle the StackOverflowError with all those frames still on stack. But

the handler is empty, so it should not cause problems.



This is the new webrev (as posted previously):



Webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3/

Delta:  http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.3.inc/



Thanks, Richard.



-----Original Message-----

From: David Holmes mailto:david.holmes at oracle.com

Sent: Dienstag, 11. August 2020 04:00

To: mailto:serguei.spitsyn at oracle.com; Reingruber, Richard mailto:richard.reingruber at sap.com; mailto:serviceability-dev at openjdk.java.net

Subject: Re: RFR(S) 8249293: Unsafe stackwalk in VM_GetOrSetLocal::doit_prologue()



Hi Serguei,



On 11/08/2020 3:21 am, mailto:serguei.spitsyn at oracle.com wrote:

Hi Richard and David,



The implementation looks good to me.



But I do not understand what the test is doing with all this counters

and recursions.



For instance, these fragments:



   86         recursions = 0;

   87         try {

   88             recursiveMethod(1<<20);

   89         } catch (StackOverflowError e) {

   90             msg("Caught StackOverflowError as expected");

   91         }

   92         int target_depth = recursions-100;    // spaces are missed around the '-' sigh



It is not obvious that the 'recursion' is updated in the recursiveMethod.

I would suggestto make it more explicit:

    recursiveMethod(M);

    int target_depth = M - 100;



Then the variable 'recursions' can be removed or become local.



The recursiveMethod takes in the maximum recursions to try and updates

the recursions variable to record how many recursions were possible - so:



target_depth = <actual recursions> - 100;



Possibly recursiveMethod could return the actual recursions instead of

using the global variables?



David

-----



This method will be:



   47     private static final int M = 1 << 20;

  ...

  121     public long recursiveMethod(int depth) {

  123         if (depth == 0) {

  124             notifyAgentToGetLocalAndWaitShortly(M - 100, waitTimeInNativeAfterNotify);

  126         } else {

  127             recursiveMethod(--depth);

  128         }

  129     }





At least, he test is missing the comments explaining all these.



Thanks,

Serguei







On 8/9/20 22:35, David Holmes wrote:

Hi Richard,



On 31/07/2020 5:28 pm, Reingruber, Richard wrote:

Hi,



I rebase the fix after JDK-8250042.



New webrev: http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.2/



The general fix for this seems good. A minor nit:



 588     if (!is_assignable(signature, ob_k, Thread::current())) {



You know that the current thread is the VMThread so can use

VMThread::vm_thread().



Similarly for this existing code:



 694     Thread* current_thread = Thread::current();



---



Looking at the test code ... I'm less clear on exactly what is

happening and the use of spin-waits raises some red-flags for me in

terms of test reliability on different platforms. The "while

(--waitCycles > 0)" loop in particular offers no certainty that the

agent thread is executing anything in particular. And the use of the

spin_count as a guide to future waiting time seems somewhat arbitrary.

In all seriousness I got a headache trying to work out how the test

was expecting to operate. Some parts could be simplified using raw

monitors, I think. But there's no sure way to know the agent thread is

in the midst of the stackwalk when the target thread wants to leave

the native code. So I understand what you are trying to achieve here,

I'm just not sure how reliably it will actually achieve it.



test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/libGetLocalWithoutSuspendTest.cpp





 32 static volatile jlong spinn_count     = 0;



Using a 64-bit counter seems like it will be a problem on 32-bit systems.



Should be spin_count not spinn_count.



 36 // Agent thread waits for value != 0, then performas the JVMTI

call to get local variable.



typo: performas



Thanks,

David

-----



Thanks, Richard.



-----Original Message-----

From: serviceability-dev mailto:serviceability-dev-retn at openjdk.java.net

On Behalf Of Reingruber, Richard

Sent: Montag, 27. Juli 2020 09:45

To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net

Subject: [CAUTION] RE: RFR(S) 8249293: Unsafe stackwalk in

VM_GetOrSetLocal::doit_prologue()



Hi Serguei,



   > I tested it on Linux and Windows but not yet on MacOS.



The test succeeded now on all platforms.



Thanks, Richard.



-----Original Message-----

From: Reingruber, Richard

Sent: Freitag, 24. Juli 2020 15:04

To: mailto:serguei.spitsyn at oracle.com; mailto:serviceability-dev at openjdk.java.net

Subject: RE: RFR(S) 8249293: Unsafe stackwalk in

VM_GetOrSetLocal::doit_prologue()



Hi Serguei,



The fix itself looks good to me.



thanks for looking at the fix.



I still need another look at new test.

Could you, please, convert the agent of new test to C++?

It will make it a little bit simpler.



Sure, here is the new webrev.1 with a C++ version of the test agent:



http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.1/



I tested it on Linux and Windows but not yet on MacOS.



Thanks,

Richard.



-----Original Message-----

From: mailto:serguei.spitsyn at oracle.com mailto:serguei.spitsyn at oracle.com

Sent: Freitag, 24. Juli 2020 00:00

To: Reingruber, Richard mailto:richard.reingruber at sap.com;

mailto:serviceability-dev at openjdk.java.net

Subject: Re: RFR(S) 8249293: Unsafe stackwalk in

VM_GetOrSetLocal::doit_prologue()



Hi Richard,



Thank you for filing the CR and taking care about it!

The fix itself looks good to me.

I still need another look at new test.

Could you, please, convert the agent of new test to C++?

It will make it a little bit simpler.



Thanks,

Serguei





On 7/20/20 01:15, Reingruber, Richard wrote:

Hi,



please help review this fix for VM_GetOrSetLocal. It moves the

unsafe stackwalk from the vm

operation prologue before the safepoint into the doit() method

executed at the safepoint.



Webrev:

http://cr.openjdk.java.net/~rrich/webrevs/8249293/webrev.0/index.html

Bug: https://bugs.openjdk.java.net/browse/JDK-8249293



According to the JVMTI spec on local variable access it is not

required to suspend the target thread

T [1]. The operation will simply fail with

JVMTI_ERROR_NO_MORE_FRAMES if T is executing

bytecodes. It will succeed though if T is blocked because of

synchronization or executing some native

code.



The issue is that in the latter case the stack walk in

VM_GetOrSetLocal::doit_prologue() to prepare

the access to the local variable is unsafe, because it is done

before the safepoint and it races

with T returning to execute bytecodes making its stack not walkable.

The included test shows that

this can crash the VM if T wins the race.



Manual testing:



    - new test

test/hotspot/jtreg/serviceability/jvmti/GetLocalVariable/GetLocalWithoutSuspendTest.java

    - test/hotspot/jtreg/vmTestbase/nsk/jvmti

    - test/hotspot/jtreg/serviceability/jvmti



Nightly regression tests @SAP: JCK and JTREG, also in Xcomp mode,

SPECjvm2008, SPECjbb2015,

Renaissance Suite, SAP specific tests with fastdebug and release

builds on all platforms



Thanks, Richard.



[1]

https://docs.oracle.com/en/java/javase/14/docs/specs/jvmti.html#local











-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.java.net/pipermail/serviceability-dev/attachments/20200825/11f6976a/attachment-0001.htm>


More information about the serviceability-dev mailing list