shenandoah-dev Digest, Vol 100, Issue 40

Ramakrishna, Ramki ysr at amazon.com
Fri Jan 19 21:14:30 UTC 2024


Hi Kirill --

> I'm afraid that I can't make a portable reproducer because the issue happens
> during redeployment of a large cluster (>200 machines in total) and may affect a
> random machine. Or two.
> 
> 
> This happens only at startup; once the application has started and run for a few
> minutes, it works without this issue.
>
>
> Unfortunately not every redeployment triggers it, let's say one in four.
> 
> 
> Until I've figured out how to reproduce it, I have no idea how to trace it in a
> production environment without performance degradation, and it's clear that
> both -Xlog:safepoint=trace and -XX:+SafepointALot aren't an option here :(

Perhaps try `-XX:+SafepointTimeout` along with a suitably high value for the associated `-XX:SafepointTimeoutDelay=` option?

Here are their respective defaults:

  product(bool, SafepointTimeout, false,                                    \
          "Time out and warn or fail after SafepointTimeoutDelay "          \
          "milliseconds if failed to reach safepoint")                      \
                                                                            \

  product(double, SafepointTimeoutDelay, 10000,                             \
          "Delay in milliseconds for option SafepointTimeout; "             \
          "supports sub-millisecond resolution with fractional values.")    \
          range(0, max_jlongDouble LP64_ONLY(/MICROUNITS))                  \
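
Since the delay is given in milliseconds but the timeout check compares against a nanosecond clock, the value has to be scaled at some point. A minimal sketch of that arithmetic, assuming a plain multiply-and-truncate conversion (the helper name and the constant are mine for illustration, not HotSpot's):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not HotSpot code: convert a delay given in
// (possibly fractional) milliseconds into whole nanoseconds.
constexpr int64_t kNanosPerMilli = 1000000;

int64_t delay_ms_to_nanos(double delay_ms) {
  assert(delay_ms >= 0.0);  // mirrors the lower bound of the flag's range
  // Fractional milliseconds survive the multiply; anything finer than
  // a nanosecond is truncated by the cast.
  return static_cast<int64_t>(delay_ms * kNanosPerMilli);
}
```

With this sketch, the default of 10000 corresponds to 10 seconds, and a fractional setting such as 0.5 corresponds to 500 microseconds.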


This is supposed to provide more info if we took too long to reach a safepoint:

    // Check if this has taken too long:
    if (SafepointTimeout && safepoint_limit_time < os::javaTimeNanos()) {
      print_safepoint_timeout();
    }

Where print_safepoint_timeout() does:

      ls.print_cr("# SafepointSynchronize::begin: Timeout detected:");
      ls.print_cr("# SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.");
      ls.print_cr("# SafepointSynchronize::begin: Threads which did not reach the safepoint:");
      for (JavaThreadIteratorWithHandle jtiwh; JavaThread *cur_thread = jtiwh.next(); ) {
        if (cur_thread->safepoint_state()->is_running()) {
          ls.print("# ");
          cur_thread->print_on(&ls);
          ls.cr();
        }
      }
      ls.print_cr("# SafepointSynchronize::begin: (End of list)");
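
To make the control flow concrete, here is a small self-contained sketch of the same idea: nothing is reported unless the flag is on and the deadline has passed, and then only threads still running are listed. `FakeThread` and `timeout_report` are hypothetical names invented for this illustration; the message text is abbreviated and this is not the HotSpot implementation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a Java thread; only what the sketch needs.
struct FakeThread {
  std::string name;
  bool still_running;  // true if it has not yet reached the safepoint
};

// Returns the lines a timeout report would contain, or an empty vector if
// the timeout is disabled or the deadline has not passed yet.
std::vector<std::string> timeout_report(bool timeout_enabled,
                                        int64_t limit_nanos,
                                        int64_t now_nanos,
                                        const std::vector<FakeThread>& threads) {
  std::vector<std::string> lines;
  // Same shape as the check above: enabled AND limit < now.
  if (!timeout_enabled || limit_nanos >= now_nanos) {
    return lines;
  }
  lines.push_back("# Timeout detected; threads which did not reach the safepoint:");
  for (const FakeThread& t : threads) {
    if (t.still_running) {
      lines.push_back("# " + t.name);
    }
  }
  lines.push_back("# (End of list)");
  return lines;
}
```

The real code prints each offending thread via `cur_thread->print_on(&ls)`, so expect more detail per thread than just a name.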

best,
/ Ramki

