Jetty hangs early on Java 12 on OSv with Threadlocal Handshakes ON - shorter version

Waldek Kozaczuk jwkozaczuk at gmail.com
Thu Aug 8 01:48:31 UTC 2019


Hi,

I am one of the committers to the OSv project (http://osv.io/). OSv is is a
Linux-compatible unikernel. OSv has been able to run unmodified Linux
version of Java very well since early on including Java 7 all the way to
Java 11. However, I have discovered some problems with the latest version
of Java 12.

I essence when I try to start a demo version of Jetty app on Java 12
(specifically jdk-zulu12.3.11-ca-*jdk12.0.2*-linux_x64) on OSv it hangs
after printing a single log message (once I saw it print 3-4 lines and hang
after) which looks like this:

OSv v0.53.0-73-gd7054e0b

eth0: 192.168.122.15

Booted up in 135.69 ms

fpathconf() stubbed

2019-08-07 19:04:13.758:INFO::main: Logging initialized @797ms

Please note the last line from Jetty. When I run the exact same app on Java
11 (jdk-zulu11.33.15-ca*-jdk11.0.4*-linux_x64) on OSv, it starts just fine:

OSv v0.53.0-73-gd7054e0b

eth0: 192.168.122.15

Booted up in 138.25 ms

fpathconf() stubbed

2019-08-07 22:28:50.284:INFO::main: Logging initialized @726ms

2019-08-07 22:28:50.536:WARN::main: demo test-realm is deployed. DO NOT USE
IN PRODUCTION!

2019-08-07 22:28:50.539:INFO:oejs.Server:main: jetty-9.2.11.v20150529

2019-08-07 22:28:50.552:INFO:oejdp.ScanningAppProvider:main: Deployment
monitor [file:/jetty/demo-base/webapps/] at interval 1

2019-08-07 22:28:50.879:WARN::main: test webapp is deployed. DO NOT USE IN
PRODUCTION!

2019-08-07 22:28:51.279:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext at 1ffaf86
{/test,file:/tmp/jetty-0.0.0.0-8080-test.war-_test-any-5466198194536157666.dir/webapp/,AVAILABLE}{/test.war}

2019-08-07 22:28:51.392:WARN::main: test-spec webapp is deployed. DO NOT
USE IN PRODUCTION!

Calling TestListener.contextInitialized

2019-08-07 22:28:51.466:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext at 4bef0fe3{/test-spec,[file:/tmp/jetty-0.0.0.0-8080-test-spec.war-_test-spec-any-17979710061364402379.dir/webapp/,
jar:file:/tmp/jetty-0.0.0.0-8080-test-spec.war-_test-spec-any-17979710061364402379.dir/webapp/WEB-INF/lib/test-web-fragment-9.2.11.v20150529.jar!/META-INF/resources],AVAILABLE}{/test-spec.war}

2019-08-07 22:28:51.480:INFO:oejsh.ContextHandler:main: Started
o.e.j.s.h.MovedContextHandler at 18245eb0{/oldContextPath,null,AVAILABLE}

2019-08-07 22:28:51.517:WARN::main: test-jaas webapp is deployed. DO NOT
USE IN PRODUCTION!

2019-08-07 22:28:51.563:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext at 6d0b5baf
{/test-jaas,file:/tmp/jetty-0.0.0.0-8080-test-jaas.war-_test-jaas-any-4900956465487745831.dir/webapp/,AVAILABLE}{/test-jaas.war}

2019-08-07 22:28:51.618:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext at 36b0fcd5
{/,file:/jetty/demo-base/webapps/ROOT/,AVAILABLE}{/ROOT}

2019-08-07 22:28:51.678:WARN::main: test-jndi webapp is deployed. DO NOT
USE IN PRODUCTION!

2019-08-07 22:28:51.712:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext at 89ff02e
{/test-jndi,file:/tmp/jetty-0.0.0.0-8080-test-jndi.war-_test-jndi-any-5186649427965161663.dir/webapp/,AVAILABLE}{/test-jndi.war}

2019-08-07 22:28:51.853:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext at 12cd9150
{/proxy,file:/tmp/jetty-0.0.0.0-8080-xref-proxy.war-_xref-proxy-any-18424004387378155009.dir/webapp/,AVAILABLE}{/xref-proxy.war}

2019-08-07 22:28:51.953:WARN::main: async-rest webapp is deployed. DO NOT
USE IN PRODUCTION!

2019-08-07 22:28:51.970:INFO:oejsh.ContextHandler:main: Started
o.e.j.w.WebAppContext at 473b3b7a{/async-rest,[file:/tmp/jetty-0.0.0.0-8080-async-rest.war-_async-rest-any-18053166809886382856.dir/webapp/,
jar:file:/tmp/jetty-0.0.0.0-8080-async-rest.war-_async-rest-any-18053166809886382856.dir/webapp/WEB-INF/lib/example-async-rest-jar-9.2.11.v20150529.jar!/META-INF/resources],AVAILABLE}{/async-rest.war}

2019-08-07 22:28:51.984:INFO:oejs.ServerConnector:main: Started
ServerConnector at 2b1a1fd{HTTP/1.1}{0.0.0.0:8080}

2019-08-07 22:28:51.991:INFO:oejs.ServerConnector:main: Started
ServerConnector at ad47a4a{SSL-http/1.1}{0.0.0.0:8443}

2019-08-07 22:28:51.993:INFO:oejs.Server:main: Started @2434ms

After connecting with gdb I have discovered that one of the JVM threads
hangs in a loop while actively calling *sem_trywait*() - see full
thread-dump for all Java threads in the bottom of the email):

thread 0xffff800004691040 VM Thread


#0  0x0000100000c86370 in sem_trywait at plt ()

#1  0x0000100001702bf2 in PosixSemaphore::trywait() ()

#2  0x000010000121aab4 in VM_HandshakeAllThreads::doit() ()

#3  0x000010000198a8d7 in VM_Operation::evaluate() ()

#4  0x0000100001990706 in VMThread::evaluate_operation(VM_Operation*)
[clone .constprop.64] ()

#5  0x0000100001990e7a in VMThread::loop() ()

#6  0x0000100001991073 in VMThread::run() ()

#7  0x00001000019186ff in Thread::call_run() ()

#8  0x000010000162e1be in thread_native_entry(Thread*) ()

#9  0x000000004045eeb6 in pthread_private::pthread::<lambda()>::operator()
(__closure=0xffffa00003d6d400) at libc/pthread.cc:114

#10 std::_Function_handler<void(), pthread_private::pthread::pthread(void*
(*)(void*), void*, sigset_t, const
pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
std::_Any_data &) (__functor=...)

    at /usr/include/c++/8/bits/std_function.h:297

#11 0x0000000040401117 in sched::thread_main_c (t=0xffff800004691040) at
arch/x64/arch-switch.hh:271

#12 0x00000000403a71d3 in thread_main () at arch/x64/entry.S:113

(note that frames 9 and above are from OSv as the libc calls are handled as
local function calls on OSv).

After more research around VM_HandshakeAllThreads (in
https://github.com/openjdk/jdk/blob/0fe0312f416add1536a45ecfb292c887ef7e02bd/src/hotspot/share/runtime/handshake.cpp#L152-L208)
I have stumbled across this JEP https://openjdk.java.net/jeps/312 -
Thread-Local Handshakes) where I realized new cross-threads callback
mechanism has been introduced starting in Java 10 to replace
"stop-the-world" mechanism (as I understand). I have also found a switch to
disable it using the -XX:-ThreadLocalHandshakes switch.

So now when I run Jetty app on Java 12 with such command line and disbale
TLH:

/usr/lib/jvm/jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64/bin/java -Xrs
-XX:-ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar
/jetty/start.jar
*it works and does not hang any more:*

when this one hangs (TLH explicitly enabled):

/usr/lib/jvm/jdk-zulu12.3.11-ca-jdk12.0.2-linux_x64/bin/java -Xrs
-XX:+ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar
/jetty/start.jar

When I explicitly enable thread local handshakes on Java 11 it works as
well:

/usr/lib/jvm/jdk-zulu11.33.15-ca-jdk11.0.4-linux_x64/bin/java -Xrs
-XX:+ThreadLocalHandshakes -Djetty.base=/jetty/demo-base -jar
/jetty/start.jar

Note I am limiting signals usage by JVM in all cases just to narrow down my
research.

One more datapoint - I am able to run a simple Hello World java app on Java
12 on OSv with thread-local handshakes on so it looks like this problem
appears on more complicated apps.

*Lastly, the exact same app works fine on Java 12 on Linux.*

So given all that could you please point me to what the problem might be?
What has changed in the implementation of thread local handshakes between
Java 11 and 12 that may have caused this issue (I see many changes made to
https://github.com/openjdk/jdk/blob/0fe0312f416add1536a45ecfb292c887ef7e02bd/src/hotspot/share/runtime/handshake.cpp
but
I am not even sure which commit matches the version of Java 12 or Java 11 I
am using)? I am thinking this has to do with some subtle differences
between relevant ABI implementation between OSv and Linux. Signals related?

Thanks for your help in advance,

Waldek

Thread dump of all Java threads (in other email that was to big)


More information about the hotspot-dev mailing list