Thread.dump_to_file time

Tue Sep 5 20:44:59 UTC 2023

On 2023/09/05 17:18, Alan Bateman wrote:
> On 05/09/2023 12:26, Dr Heinz M. Kabutz wrote:
>>
>> Hi,
>>
>> I was wondering about the cost of generating the thread dumps with 
>> jcmd Thread.dump_to_file and how this would affect the practicality 
>> of finding issues in thread dumps. I thus spun up more and more 
>> virtual threads to see how long it would take to call jcmd 
>> Thread.dump_to_file.
>>
>> No surprise that the time is linear. Also no surprise that json is 
>> slower to generate than plain.
>>
>> What did surprise me was how much slower it was to generate these on 
>> my MacBookPro vs my server. The machines are roughly the same speed, 
>> and my MBP has a very fast SSD. I'm not sure why it would be so slow, 
>> but the MBP is approximately 3x slower for generating the dumps:
>>
>>
> The Linux vs. macOS difference is surprising but maybe Andrew's guess 
> explains it.
>
> As regards the time then some of it maybe related to the formatter, 
> can you try the change below and see if it changes the times?
>
> We have prototyped APIs in this area that would allow 
> filtering/de-duplication/processing by in-process tools and agents, 
> probably something we will get back to at some point.
>
> -Alan
>
> --- a/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java
> +++ b/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java
> @@ -160,7 +160,7 @@ private static void dumpThread(Thread thread, 
> PrintStream ps) {
>          String suffix = thread.isVirtual() ? " virtual" : "";
>          ps.format("#%d \"%s\"%s%n", thread.threadId(), 
> thread.getName(), suffix);
>          for (StackTraceElement ste : thread.getStackTrace()) {
> -            ps.format("      %s%n", ste);
> +            ps.println("      " + ste);
>          }
>          ps.println();
>      }
> @@ -263,7 +263,7 @@ private static void dumpThreadToJson(Thread 
> thread, PrintStream out, boolean mor
>          int i = 0;
>          StackTraceElement[] stackTrace = thread.getStackTrace();
>          while (i < stackTrace.length) {
> -            out.format("              \"%s\"", 
> escape(stackTrace[i].toString()));
> +            out.println("              \"" + 
> escape(stackTrace[i].toString()) + "\"");
>              i++;
>              if (i < stackTrace.length) {
>                  out.println(",");

Hi Alan,

that makes a huge difference. I love format(), but when we call it 
millions of times, it is of course preferable to use the indified String 
concatenation.

For 128k virtual threads, I got the time down on Linux from 6s to 3.5s 
(plain) and from 8s to 5.7s (json).

We can get it even faster if we turn autoflushing off for the 
PrintStreams and convert a few more format()s to print()s (2.9s plain)

https://github.com/kabutz/jdk/blob/faster-thread-dump/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java

For 2m virtual threads, we reduced the plain time from 1m34s down to 43s.

For the Intel Mac, and 128k virtual threads, the time went down from 18s 
to 9s for plain and from 39s to 18s for json. However, I don't give much 
weight to the laptop results :-)

The additional cost of logging in JSON format is in proportion to the 
size of the files, so I wouldn't worry about that difference too much.

What I am more concerned about is that in my experiment, I had all of 
the threads wait whilst I took the thread dump. However, if we follow 
the model that each virtual thread is a single task, we might create 
them faster than we can dump them, thus getting a situation where the 
dump_to_file never ends.

Heinz