Thread.dump_to_file time
Dr Heinz M. Kabutz
heinz at javaspecialists.eu
Thu Sep 7 17:01:32 UTC 2023
Indeed. I tried five different variations of the ThreadDumper vs the
master. I did not change the HotSpotDiagnostic, since the buffering was
sufficient in the ThreadDumper. I also experimented with a larger buffer
size, but that did not seem to make a difference.
Experiment 1: Alan's original change, converting format() to println()
for just the stack traces
(https://github.com/kabutz/jdk/blob/faster-thread-dump-1/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)
Experiment 2: Converting some additional format() calls that were not
printing the stack traces.
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-2/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)
Experiment 3: Changing the PrintStreams to not autoflush
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-3/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)
Experiment 4: Adding a BufferedOutputStream into each of the
PrintStreams
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-4/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)
Experiment 5: Gzipping the files on the fly. This might be a handy
feature to add with a flag like jcmd pid Thread.dump_to_file -zip.
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-5/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)
Each experiment builds on the previous one.
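As a rough sketch of what experiments 3-5 change in the stream plumbing (the class name, in-memory sink and 8192-byte buffer here are illustrative, not the actual ThreadDumper code):

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DumpStreamSketch {

    // Experiments 3+4: autoFlush off, plus a BufferedOutputStream so the many
    // tiny print() calls are coalesced into a few large write() syscalls.
    static PrintStream buffered(OutputStream sink) {
        return new PrintStream(new BufferedOutputStream(sink, 8192), false);
    }

    // Experiment 5: the same chain, but compressing on the fly.
    static PrintStream gzipped(OutputStream sink) throws IOException {
        return new PrintStream(
                new BufferedOutputStream(new GZIPOutputStream(sink), 8192), false);
    }

    public static void main(String[] args) throws IOException {
        String frame = "ThreadDumpPerf.recurse(ThreadDumpPerf.java:20)";

        ByteArrayOutputStream plain = new ByteArrayOutputStream();
        try (PrintStream ps = buffered(plain)) {
            for (int i = 0; i < 1000; i++) ps.println(frame);
        }

        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        try (PrintStream ps = gzipped(zipped)) {
            for (int i = 0; i < 1000; i++) ps.println(frame);
        }

        // Highly repetitive stack frames compress extremely well.
        System.out.println("plain bytes:   " + plain.size());
        System.out.println("gzipped bytes: " + zipped.size());

        // Round trip: decompressing yields the original bytes.
        byte[] back = new GZIPInputStream(
                new ByteArrayInputStream(zipped.toByteArray())).readAllBytes();
        System.out.println("round-trip ok: " + Arrays.equals(back, plain.toByteArray()));
    }
}
```

Closing the PrintStream closes the whole chain, which is what finishes the gzip trailer in the experiment 5 variant.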
Times for writing the dump (lower is better), plain and json formats,
master vs experiments 1-5, by thread count:

                                plain                                           json
threads     master       1       2       3       4       5  master       1       2       3       4       5
1000            45      25      20      20      10      13      63      43      37      38      15      18
2000            89      51      41      41      19      26     125      85      75      75      29      37
4000           178     101      81      81      38      51     248     168     147     150      58      72
8000           357     204     163     163      77     103     498     338     295     298     116     145
16000          704     400     320     319     151     204     989     667     585     592     229     289
32000         1414     802     640     644     303     411    1972    1346    1174    1195     461     575
64000         2838    1619    1305    1290     605     825    4028    2699    2390    2380     926    1157
128000        5783    3350    2681    2698    1225    1650    8015    5411    4772    4814    1860    2328
256000       11699    6706    5460    5530    2616    3596   16052   10819    9585    9825    3714    4681
512000       23395   13850   10926   10699    5464    6885   32694   21861   19037   19297    7637    9429
1024000      46174   27279   21486   21543   10816   14062   64039   43459   37904   39553   15079   19086
2048000      93816   54181   43859   42602   21096   28308  128765   87921   76738   76786   30471   38264
In the 128000 plain threads case, there is a big 42% improvement
between master and experiment 1, then another 20% improvement between 1
and 2 (additional conversions of format() to print()).
We then see no improvement between 2 and 3 (making the PrintStreams not
autoflush, as also observed by Oli).
And another big 55% improvement between 3 and 4 - using
BufferedOutputStream.
If we GZip the files, we lose 35% in performance, but the files are just
1% of the size.
For the 128000 json threads case, the results are a bit different.
Initial master to experiment 1 is 32% better, then 12%, then nothing,
then 61% better. The GZip is only a 25% degradation.
To summarize, going from the current code to print() plus a
BufferedOutputStream gives us a 79% improvement for plain and a 77%
improvement for json. If we also GZip the files, we get a 71%
improvement in both cases.
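As a sanity check, these summary percentages can be reproduced from the 128000-thread numbers above:

```java
public class Improvements {
    // Percentage improvement going from `before` to `after`.
    static double improvement(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // plain, 128000 threads: master=5783, exp4=1225, exp5=1650
        System.out.printf("plain master->4: %.0f%%%n", improvement(5783, 1225)); // 79%
        System.out.printf("plain master->5: %.0f%%%n", improvement(5783, 1650)); // 71%
        // json, 128000 threads: master=8015, exp4=1860, exp5=2328
        System.out.printf("json  master->4: %.0f%%%n", improvement(8015, 1860)); // 77%
        System.out.printf("json  master->5: %.0f%%%n", improvement(8015, 2328)); // 71%
    }
}
```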
I would propose that we change this as soon as possible - I'm happy to
make the change and also submit the findings and the test program to the
JDK. Furthermore, I would propose, perhaps as a second project, that
we consider how to compress these files. One option is GZip (which would
require quite a few changes, including to jcmd); another would be to
change the format of the json file to deduplicate thread stacks. If we
follow the "virtual thread per task" model, we will have many, many
virtual threads with exactly the same stack, and it might be more
productive to have a better json model for such cases.
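To illustrate the deduplication idea (the thread names, frames and id scheme below are hypothetical, not a proposed json format): each distinct stack is stored once and assigned an id, and threads then reference the id instead of repeating the frames.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StackDedup {
    public static void main(String[] args) {
        // Hypothetical input: each virtual thread's stack as a list of frames.
        Map<String, List<String>> threads = new LinkedHashMap<>();
        threads.put("vt-1", List.of("A.run(A.java:10)", "B.call(B.java:5)"));
        threads.put("vt-2", List.of("A.run(A.java:10)", "B.call(B.java:5)"));
        threads.put("vt-3", List.of("C.main(C.java:1)"));

        // Assign each distinct stack an id; threads map to the id only.
        Map<List<String>, Integer> stackIds = new LinkedHashMap<>();
        Map<String, Integer> threadToStack = new LinkedHashMap<>();
        for (var e : threads.entrySet()) {
            threadToStack.put(e.getKey(),
                    stackIds.computeIfAbsent(e.getValue(), k -> stackIds.size()));
        }

        // vt-1 and vt-2 share stack 0; vt-3 gets stack 1.
        System.out.println(stackIds.size() + " distinct stacks for "
                + threads.size() + " threads");
        System.out.println(threadToStack); // {vt-1=0, vt-2=0, vt-3=1}
    }
}
```

With thousands of identical "virtual thread per task" stacks, the stacks section stays tiny while the per-thread entries shrink to a single reference each.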
Regards
Heinz
--
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java™ Specialists' Newsletter" - www.javaspecialists.eu
Java Champion - www.javachampions.org
JavaOne Rock Star Speaker
Tel: +30 69 75 595 262
Skype: kabutz
On 2023/09/06 12:58, Gillespie, Oli wrote:
> I don't think the BufferedWriter inside PrintStream (is that the one you meant?) is doing much buffering here. strace says:
>
> write(6, "ThreadDumpPerf.recurse(ThreadDumpPerf.java:20)", 46) = 46
> write(6, "\"", 1) = 1
> write(6, ",\n", 2) = 2
> write(6, " \"", 15) = 15
> write(6, "ThreadDumpPerf.recurse(ThreadDumpPerf.java:20)", 46) = 46
> write(6, "\"", 1) = 1
> write(6, ",\n", 2) = 2
> write(6, " \"", 15) = 15
>
> And definitely for me I get a big speedup and reduction in write calls from the 8192 byte buffer I showed.
>
> Oli