Interesting profile session

Robert Engels robaho at me.com
Thu Jun 27 21:44:01 UTC 2024


One issue I have run into: with the Loom EA build, I can't run async-profiler against it.

When trying to profile, the JVM crashes with:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007ff8021fd0b0, pid=80985, tid=36355
#
# JRE version: OpenJDK Runtime Environment (24.0+1) (build 24-loom+1-17)
# Java VM: OpenJDK 64-Bit Server VM (24-loom+1-17, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# Problematic frame:
# C  [libsystem_platform.dylib+0x10b0]  _platform_strcmp+0x50
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/robertengels/FrameworkBenchmarks/frameworks/Java/httpserver-robaho/hs_err_pid80985.log
[23.364s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
#   https://bugreport.java.com/bugreport/crash.jsp
#
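
For reference, "running async-profiler against it" here means attaching the profiler to the running server, along these lines (the options and pid are illustrative, not the exact invocation used):

  ./asprof -d 30 -e cpu -f profile.html <pid>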



> On Jun 25, 2024, at 8:34 PM, Robert Engels <robaho at me.com> wrote:
> 
> A couple more data points: I built a few more of the servers from https://github.com/TechEmpower/FrameworkBenchmarks/tree/master?tab=readme-ov-file, including several non-Java ones (Rust, Go), and all showed roughly the same latency (~12-13ms) and throughput, so I suspect it is some limiting factor in the OS.
> 
> Now, how the Java one with platform threads achieved such low latency, I don’t know…
> 
> 
>> On Jun 25, 2024, at 8:37 AM, Robert Engels <robaho at me.com> wrote:
>> 
>> I tested with the Loom EA build. (Interestingly, I needed to turn off the OSX firewall - there was no "Allow network access?" prompt and no errors, but the client could not connect, even after I added the java binary to the allowed list - something to be aware of.)
>> 
>> With the default pollerMode, it made no difference:
>> 
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency    13.74ms    3.67ms 140.26ms   98.12%
>>     Req/Sec     1.18k   101.17     2.44k    92.82%
>> 
>> With pollerMode=1 (note the huge increase in the maximum requests/sec per thread):
>> 
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency    13.81ms    5.72ms 442.92ms   98.91%
>>     Req/Sec     1.18k   125.92    11.17k    95.90%
>> 
>> With pollerMode=2:
>> 
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency    13.62ms    3.25ms 440.39ms   97.55%
>>     Req/Sec     1.18k   104.44     5.19k    91.89%
>> 
>> So no option approaches the platform thread latency.
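>> 
>> For reference, each configuration above was run by setting the poller mode as a system property when launching the server; the sketch below shows the general shape of a run (the jar name, wrk parameters and URL are illustrative, not the exact benchmark invocation):
>> 
>>   java -Djdk.pollerMode=1 -jar httpserver-robaho.jar
>>   wrk -t8 -c1024 -d30s --latency http://localhost:8080/plaintext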
>> 
>> Here is the vthread summary captured using the default pollerMode:
>> 
>> iMac:httpserver-robaho robertengels$ jcmd 99219 Thread.vthread_summary
>> 99219:
>> <root> [platform threads = 13, virtual threads = 0]
>>     +-- java.util.concurrent.ThreadPerTaskExecutor at 3b764bce [platform threads = 0, virtual threads = 1024]
>>     +-- java.util.concurrent.ScheduledThreadPoolExecutor at 80820e0 [platform threads = 1, virtual threads = 0]
>>     +-- java.util.concurrent.ScheduledThreadPoolExecutor at 4e40fef8 [platform threads = 0, virtual threads = 0]
>>     +-- java.util.concurrent.ScheduledThreadPoolExecutor at 72e23c5b [platform threads = 1, virtual threads = 0]
>>     +-- ForkJoinPool-1/jdk.internal.vm.SharedThreadContainer at 2b84658e [platform threads = 8, virtual threads = 0]
>>     +-- ForkJoinPool.commonPool/jdk.internal.vm.SharedThreadContainer at 368551f2 [platform threads = 0, virtual threads = 0]
>>     +-- java.util.concurrent.ThreadPoolExecutor at 2f2ae4f0 [platform threads = 0, virtual threads = 0]
>> 
>> Default virtual thread scheduler:
>> java.util.concurrent.ForkJoinPool at 7fb948b5[Running, parallelism = 8, size = 8, active = 4, running = 1, steals = 4360536, tasks = 0, submissions = 0]
>> 
>> Timeout schedulers:
>> [0] java.util.concurrent.ScheduledThreadPoolExecutor at 80820e0[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 2]
>> [1] java.util.concurrent.ScheduledThreadPoolExecutor at 72e23c5b[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 4]
>> 
>> Read I/O pollers:
>> [0] sun.nio.ch.KQueuePoller at 3ff9782b [registered = 1011]
>> 
>> Write I/O pollers:
>> [0] sun.nio.ch.KQueuePoller at 58e6b7f4 [registered = 0]
>> 
>> iMac:httpserver-robaho robertengels$ jcmd 99219 VM.version
>> 99219:
>> OpenJDK 64-Bit Server VM version 24-loom+1-17
>> JDK 24.0.0
>> 
>> 
>>> On Jun 25, 2024, at 6:02 AM, robert engels <robaho at me.com> wrote:
>>> 
>>> Will do and I’ll report back. 
>>> 
>>>> On Jun 25, 2024, at 3:14 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>>> 
>>>>  On 24/06/2024 23:03, Robert Engels wrote:
>>>>> 
>>>>> The VT latency is a couple of orders of magnitude higher than the platform latency.
>>>>> 
>>>>> I suspect that what is happening is that the virtual threads are parking to wait for work too soon - or that the reads and writes always cause a context switch, regardless of whether the call is very short (i.e. there is room in the kernel socket buffers).
>>>>> 
>>>>> Can you suggest any remedies? It feels like the scheduler needs some heuristic based on the average call time, and perhaps a spin loop, to avoid this sort of thing.
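>>>>> 
>>>>> A rough sketch of the kind of heuristic I'm suggesting (illustrative only - this is not JDK scheduler code, and the readiness check is hypothetical): spin briefly before parking, so that very short waits avoid a full context switch.
>>>>> 
>>>>>   import java.util.concurrent.locks.LockSupport;
>>>>>   import java.util.function.BooleanSupplier;
>>>>> 
>>>>>   final class SpinThenPark {
>>>>>       // Spin for a bounded number of iterations on a cheap readiness check
>>>>>       // (e.g. "is there data/room in the kernel socket buffer?") before parking.
>>>>>       static void awaitReady(BooleanSupplier ready, int spinLimit) {
>>>>>           for (int i = 0; i < spinLimit; i++) {
>>>>>               if (ready.getAsBoolean()) {
>>>>>                   return;                   // became ready while spinning; no park needed
>>>>>               }
>>>>>               Thread.onSpinWait();          // hint to the CPU that we are busy-waiting
>>>>>           }
>>>>>           while (!ready.getAsBoolean()) {
>>>>>               LockSupport.parkNanos(1_000); // fall back to parking - the expensive path
>>>>>           }
>>>>>       }
>>>>>   }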
>>>>> 
>>>> There may be a combination of things here. From the stack traces, this looks like JDK 21 + macOS. There have been significant changes to both the scheduler (ForkJoinPool) and the I/O poller mechanism since then. Would you have time to try out the latest EA build [1] to see if you get similar latency? It would be interesting to see data with -Djdk.pollerMode=1 and 2 (the default on macOS is "a"). It would also be useful to see `jcmd <pid> Thread.vthread_summary` output while the benchmark is running, as that will reveal more about the scheduler and how many virtual threads are blocked on socket I/O.
>>>> 
>>>> [1] https://jdk.java.net/loom/
>> 
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: hs_err_pid52472.log
Type: application/octet-stream
Size: 467286 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240627/604d3591/hs_err_pid52472-0001.log>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hs_err_pid80985.log
Type: application/octet-stream
Size: 89095 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240627/604d3591/hs_err_pid80985-0001.log>