Interesting profile session
Robert Engels
robaho at me.com
Thu Jun 27 21:44:01 UTC 2024
One issue I have run into: with the Loom EA build I can't run async-profiler against it.
When I try to profile, it fails with:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007ff8021fd0b0, pid=80985, tid=36355
#
# JRE version: OpenJDK Runtime Environment (24.0+1) (build 24-loom+1-17)
# Java VM: OpenJDK 64-Bit Server VM (24-loom+1-17, mixed mode, sharing, tiered, compressed oops, compressed class ptrs, g1 gc, bsd-amd64)
# Problematic frame:
# C [libsystem_platform.dylib+0x10b0] _platform_strcmp+0x50
#
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/robertengels/FrameworkBenchmarks/frameworks/Java/httpserver-robaho/hs_err_pid80985.log
[23.364s][warning][os] Loading hsdis library failed
#
# If you would like to submit a bug report, please visit:
# https://bugreport.java.com/bugreport/crash.jsp
#
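
For anyone trying to reproduce this, one in-process way to drive the profiler is via its Java API. A minimal sketch, assuming async-profiler's bundled one.profiler.AsyncProfiler helper class is on the classpath and libasyncProfiler is on java.library.path (the duration and output file name below are just placeholders):

    import one.profiler.AsyncProfiler;

    public class ProfileProbe {
        public static void main(String[] args) throws Exception {
            // Loads libasyncProfiler via java.library.path and starts CPU sampling.
            AsyncProfiler profiler = AsyncProfiler.getInstance();
            profiler.execute("start,event=cpu");

            Thread.sleep(30_000); // let the benchmark run for ~30s

            // Stop sampling and write a flame graph.
            profiler.execute("stop,file=profile.html");
        }
    }
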
> On Jun 25, 2024, at 8:34 PM, Robert Engels <robaho at me.com> wrote:
>
> A couple more data points: I built a few more of the servers from https://github.com/TechEmpower/FrameworkBenchmarks/tree/master?tab=readme-ov-file - including several non-Java ones (Rust, Go) - and all showed roughly the same latency (~12-13ms) and throughput, so I suspect there is some limiting factor in the OS.
>
> Now, how the Java one with platform threads achieved such low latency, I don’t know…
>
>
>> On Jun 25, 2024, at 8:37 AM, Robert Engels <robaho at me.com> wrote:
>>
>> I tested with the Loom EA build. (Interestingly, I needed to turn off the macOS firewall - there was no "Allow network access?" prompt and no errors, but the client could not connect, even after I added the java binary to the allowed list - something to be aware of.)
>>
>> With the default, it made no difference:
>>
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency    13.74ms    3.67ms 140.26ms   98.12%
>>     Req/Sec      1.18k   101.17     2.44k    92.82%
>>
>> With pollerMode=1 (note the huge increase in the maximum requests per second per thread):
>>
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency    13.81ms    5.72ms 442.92ms   98.91%
>>     Req/Sec      1.18k   125.92    11.17k    95.90%
>>
>> With pollerMode=2:
>>
>>   Thread Stats   Avg      Stdev     Max   +/- Stdev
>>     Latency    13.62ms    3.25ms 440.39ms   97.55%
>>     Req/Sec      1.18k   104.44     5.19k    91.89%
>>
>> So no option approaches the platform thread latency.
>>
>> Here is the vthread summary captured using the default pollerMode:
>>
>> iMac:httpserver-robaho robertengels$ jcmd 99219 Thread.vthread_summary
>> 99219:
>> <root> [platform threads = 13, virtual threads = 0]
>> +-- java.util.concurrent.ThreadPerTaskExecutor@3b764bce [platform threads = 0, virtual threads = 1024]
>> +-- java.util.concurrent.ScheduledThreadPoolExecutor@80820e0 [platform threads = 1, virtual threads = 0]
>> +-- java.util.concurrent.ScheduledThreadPoolExecutor@4e40fef8 [platform threads = 0, virtual threads = 0]
>> +-- java.util.concurrent.ScheduledThreadPoolExecutor@72e23c5b [platform threads = 1, virtual threads = 0]
>> +-- ForkJoinPool-1/jdk.internal.vm.SharedThreadContainer@2b84658e [platform threads = 8, virtual threads = 0]
>> +-- ForkJoinPool.commonPool/jdk.internal.vm.SharedThreadContainer@368551f2 [platform threads = 0, virtual threads = 0]
>> +-- java.util.concurrent.ThreadPoolExecutor@2f2ae4f0 [platform threads = 0, virtual threads = 0]
>>
>> Default virtual thread scheduler:
>> java.util.concurrent.ForkJoinPool@7fb948b5[Running, parallelism = 8, size = 8, active = 4, running = 1, steals = 4360536, tasks = 0, submissions = 0]
>>
>> Timeout schedulers:
>> [0] java.util.concurrent.ScheduledThreadPoolExecutor@80820e0[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 2]
>> [1] java.util.concurrent.ScheduledThreadPoolExecutor@72e23c5b[Running, pool size = 1, active threads = 0, queued tasks = 0, completed tasks = 4]
>>
>> Read I/O pollers:
>> [0] sun.nio.ch.KQueuePoller@3ff9782b [registered = 1011]
>>
>> Write I/O pollers:
>> [0] sun.nio.ch.KQueuePoller@58e6b7f4 [registered = 0]
>>
>> iMac:httpserver-robaho robertengels$ jcmd 99219 VM.version
>> 99219:
>> OpenJDK 64-Bit Server VM version 24-loom+1-17
>> JDK 24.0.0
>>
>>
>>> On Jun 25, 2024, at 6:02 AM, robert engels <robaho at me.com> wrote:
>>>
>>> Will do and I’ll report back.
>>>
>>>> On Jun 25, 2024, at 3:14 AM, Alan Bateman <Alan.Bateman at oracle.com> wrote:
>>>>
>>>> On 24/06/2024 23:03, Robert Engels wrote:
>>>>>
>>>>> The VT latency is a couple of orders of magnitude higher than the platform latency.
>>>>>
>>>>> I suspect that what is happening is that the virtual threads are parking to wait for work too soon, or that the reads and writes always cause a context switch, regardless of whether the call could complete immediately (i.e. there is room in the kernel socket buffers).
>>>>>
>>>>> Can you suggest any remedies? It feels like the scheduler needs some heuristics based on average call time, plus a spin loop, to avoid this sort of thing.
>>>>>
>>>> There may be a combination of things here. From the stack traces, this looks like JDK 21 + macOS. There have been significant changes to both the scheduler (ForkJoinPool) and the I/O poller mechanism since then. Would you have time to try out the latest EA build [1] to see if you get similar latency? It would be interesting to see data with -Djdk.pollerMode=1 and 2 (the default on macOS is "a"). It would also be useful to see `jcmd <pid> Thread.vthread_summary` output while the benchmark is running, as that will reveal more about the scheduler and how many virtual threads are blocked on socket I/O.
>>>>
>>>> [1] https://jdk.java.net/loom/
>>
>
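
A footnote on the earlier question about reads and writes causing a context switch even when the socket buffers have room: the server under test is essentially a virtual-thread-per-connection accept/read/write loop. The sketch below is a stand-in, not the actual httpserver-robaho code - a plain echo loop with placeholder port and buffer size - but it shows where a virtual thread would park. As I understand it, the blocking read only unmounts the carrier and registers with the read poller when no bytes are already buffered, and the write should normally complete in-line while the kernel send buffer has room.

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    public class VthreadEchoServer {
        public static void main(String[] args) throws Exception {
            // One virtual thread per accepted connection, matching the
            // ThreadPerTaskExecutor with ~1024 virtual threads in the summary above.
            ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor();
            try (ServerSocket listener = new ServerSocket(8080)) {
                while (true) {
                    Socket socket = listener.accept();
                    executor.submit(() -> handle(socket));
                }
            }
        }

        private static void handle(Socket socket) {
            byte[] buf = new byte[4096];
            try (socket;
                 InputStream in = socket.getInputStream();
                 OutputStream out = socket.getOutputStream()) {
                int n;
                // Blocking read: if no bytes are buffered, the virtual thread parks
                // and is rescheduled by the read poller when data arrives.
                while ((n = in.read(buf)) != -1) {
                    // Writes usually complete immediately while there is room in
                    // the kernel send buffer; only a full buffer should park.
                    out.write(buf, 0, n);
                }
            } catch (Exception e) {
                // Connection closed or I/O error; nothing more to do in this sketch.
            }
        }
    }
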
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hs_err_pid52472.log
Type: application/octet-stream
Size: 467286 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240627/604d3591/hs_err_pid52472-0001.log>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hs_err_pid80985.log
Type: application/octet-stream
Size: 89095 bytes
Desc: not available
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20240627/604d3591/hs_err_pid80985-0001.log>