Deadlock on OpenJDK 17

Ramakrishna, Ramki ysr at amazon.com
Fri Jan 19 21:16:09 UTC 2024


Subject line corrected; my apologies!

On 1/19/24, 1:14 PM, "Ramakrishna, Ramki" <ysr at amazon.com> wrote:


Hi Kirill --


> I'm afraid I can't make a portable reproducer, because the issue happens
> during redeployment of a large cluster (>200 machines in total) and may affect
> a random machine. Or two.
> 
> 
> This happens only at startup; once the application has started and has been
> running for a few minutes, it keeps working without this issue.
>
>
> Unfortunately, not every redeployment triggers it; let's say one in four.
> 
> 
> Until I've figured out how to reproduce it, I have no idea how to trace it in
> the production environment without performance degradation, and it's clear that
> neither -Xlog:safepoint=trace nor -XX:+SafepointALot is an option here :(


Perhaps try `-XX:+SafepointTimeout` along with a suitably high value for the associated `-XX:SafepointTimeoutDelay=` option?


Here are their respective defaults:


  product(bool, SafepointTimeout, false,                                 \
          "Time out and warn or fail after SafepointTimeoutDelay "       \
          "milliseconds if failed to reach safepoint")                   \
                                                                         \
  product(double, SafepointTimeoutDelay, 10000,                          \
          "Delay in milliseconds for option SafepointTimeout; "          \
          "supports sub-millisecond resolution with fractional values.") \
          range(0, max_jlongDouble LP64_ONLY(/MICROUNITS))               \




This is supposed to provide more info if we took too long to reach a safepoint:


  // Check if this has taken too long:
  if (SafepointTimeout && safepoint_limit_time < os::javaTimeNanos()) {
    print_safepoint_timeout();
  }
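
To make the comparison above concrete, here is a minimal standalone sketch of the deadline arithmetic. It is not the HotSpot code: SafepointTimeoutConfig, safepoint_deadline_ns() and safepoint_timed_out() are hypothetical names, and I am assuming the deadline is simply the start time plus the delay converted from milliseconds to nanoseconds.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the two flags; these are not the real JVM globals.
struct SafepointTimeoutConfig {
    bool enabled;     // models -XX:+SafepointTimeout
    double delay_ms;  // models -XX:SafepointTimeoutDelay=<ms>; fractional values allowed
};

// Assumed deadline: start time plus the delay converted from ms to ns.
inline int64_t safepoint_deadline_ns(int64_t start_ns,
                                     const SafepointTimeoutConfig& cfg) {
    return start_ns + static_cast<int64_t>(cfg.delay_ms * 1000000.0);
}

// Mirrors the shape of the check quoted above: the flag must be on and the
// current time must be strictly past the deadline.
inline bool safepoint_timed_out(int64_t now_ns, int64_t deadline_ns,
                                const SafepointTimeoutConfig& cfg) {
    return cfg.enabled && deadline_ns < now_ns;
}
```

With the flag left at its default of false the check never fires, whatever the delay; that is why both options need to be given together.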


Where print_safepoint_timeout() does:


  ls.print_cr("# SafepointSynchronize::begin: Timeout detected:");
  ls.print_cr("# SafepointSynchronize::begin: Timed out while spinning to reach a safepoint.");
  ls.print_cr("# SafepointSynchronize::begin: Threads which did not reach the safepoint:");
  for (JavaThreadIteratorWithHandle jtiwh; JavaThread *cur_thread = jtiwh.next(); ) {
    if (cur_thread->safepoint_state()->is_running()) {
      ls.print("# ");
      cur_thread->print_on(&ls);
      ls.cr();
    }
  }
  ls.print_cr("# SafepointSynchronize::begin: (End of list)");
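
The loop above is a plain filter over the Java threads: only those still running get listed. As a rough standalone model (ThreadInfo and timeout_report() are hypothetical, and the real JavaThread::print_on() prints much more than a name), it could be sketched like this:

```cpp
#include <string>
#include <vector>

// Hypothetical, simplified thread record; HotSpot's JavaThread carries far more state.
struct ThreadInfo {
    std::string name;
    bool still_running;  // models safepoint_state()->is_running()
};

// Build the report: a header line, one "# <name>" line per thread that is
// still running, and a closing marker.
std::vector<std::string> timeout_report(const std::vector<ThreadInfo>& threads) {
    std::vector<std::string> lines;
    lines.push_back("# Threads which did not reach the safepoint:");
    for (const ThreadInfo& t : threads) {
        if (t.still_running) {
            lines.push_back("# " + t.name);
        }
    }
    lines.push_back("# (End of list)");
    return lines;
}
```

So in the log, the entries between the "Threads which did not reach the safepoint" header and "(End of list)" are the threads worth looking at first.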


best,
/ Ramki














