Thread.dump_to_file time

Dr Heinz M. Kabutz heinz at javaspecialists.eu
Thu Sep 7 17:01:32 UTC 2023


Indeed. I tried five different variations of the ThreadDumper vs the 
master. I did not change the HotSpotDiagnostic, since the buffering was 
sufficient in the ThreadDumper. I also experimented with a larger buffer 
size, but that did not seem to make a difference.

Experiment 1: Alan's original change with changing format() to println() 
for just the stack traces 
(https://github.com/kabutz/jdk/blob/faster-thread-dump-1/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)

Experiment 2: Some additional format()s that were not printing the stack 
traces. 
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-2/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)

Experiment 3: Changing the PrintStreams to not autoflush 
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-3/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)

Experiment 4: Adding a BufferedOutputStream into each of the 
PrintStreams 
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-4/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)

Experiment 5: Gzipping the files on the fly. This might be a handy 
feature to add with a flag like jcmd pid Thread.dump_to_file -zip. 
(https://github.com/kabutz/jdk/blob/faster-thread-dumps-5/src/java.base/share/classes/jdk/internal/vm/ThreadDumper.java)

Each experiment is based on the previous versions.


	

*plain*

	


	

*json*

*threads*

	

*master*

	

*1*

	

*2*

	

*3*

	

*4*

	

*5*

	


	

*master*

	

*1*

	

*2*

	

*3*

	

*4*

	

*5*

*1000*

	

45

	

25

	

20

	

20

	

10

	

13

	


	

63

	

43

	

37

	

38

	

15

	

18

*2000*

	

89

	

51

	

41

	

41

	

19

	

26

	


	

125

	

85

	

75

	

75

	

29

	

37

*4000*

	

178

	

101

	

81

	

81

	

38

	

51

	


	

248

	

168

	

147

	

150

	

58

	

72

*8000*

	

357

	

204

	

163

	

163

	

77

	

103

	


	

498

	

338

	

295

	

298

	

116

	

145

*16000*

	

704

	

400

	

320

	

319

	

151

	

204

	


	

989

	

667

	

585

	

592

	

229

	

289

*32000*

	

1414

	

802

	

640

	

644

	

303

	

411

	


	

1972

	

1346

	

1174

	

1195

	

461

	

575

*64000*

	

2838

	

1619

	

1305

	

1290

	

605

	

825

	


	

4028

	

2699

	

2390

	

2380

	

926

	

1157

*128000*

	

5783

	

3350

	

2681

	

2698

	

1225

	

1650

	


	

8015

	

5411

	

4772

	

4814

	

1860

	

2328

*256000*

	

11699

	

6706

	

5460

	

5530

	

2616

	

3596

	


	

16052

	

10819

	

9585

	

9825

	

3714

	

4681

*512000*

	

23395

	

13850

	

10926

	

10699

	

5464

	

6885

	


	

32694

	

21861

	

19037

	

19297

	

7637

	

9429

*1024000*

	

46174

	

27279

	

21486

	

21543

	

10816

	

14062

	


	

64039

	

43459

	

37904

	

39553

	

15079

	

19086

*2048000*

	

93816

	

54181

	

43859

	

42602

	

21096

	

28308

	


	

128765

	

87921

	

76738

	

76786

	

30471

	

38264

In the 128000 plain threads case, there is a big improvement of 42% 
between master and experiment 1, then another 20% improvement between 1 
and s (additional conversions of format() to print()).

We then see no improvement between 2 and 3 (making PrintStreams to not 
autoflush, as also observed by Oli).

And another big 55% improvement between 3 and 4 - using 
BufferedOutputStream.

If we GZip the files, we lose 35% in performance, but the files are just 
1% of the size.

For the 128000 json threads case, the results are a bit different. 
Initial master to experiment 1 is 32% better, then 12%, then nothing, 
then 61% better. The GZip is only a 25% degradation.


To summarize, to go from the current to using print() and 
BufferedOutputStream, gives us 79% improvement for plain and a 77% 
improvement for json. If we also GZip them, we get a 71% improvement in 
both cases.


I would propose that we change this as soon as possible - I'm happy to 
make the change and also submit the findings and the test program to the 
JDK. Furthermore, I would propose that perhaps as a second project that 
we consider how to compress these files. One option is with GZip (would 
require quite a few changes, including to jcmd) and another would be to 
change the format of the json file to deduplicate thread stacks. If we 
follow the "virtual thread per task" model, we will have many many 
virtual threads with the exact same stack and it might be more 
productive to have a better json model for such cases.


Regards

Heinz
-- 
Dr Heinz M. Kabutz (PhD CompSci)
Author of "The Java™ Specialists' Newsletter" -www.javaspecialists.eu
Java Champion -www.javachampions.org
JavaOne Rock Star Speaker
Tel: +30 69 75 595 262
Skype: kabutz

On 2023/09/06 12:58, Gillespie, Oli wrote:
> I don't think the BufferedWriter inside PrintStream (is that the one you meant?) is doing much buffering here. strace says:
>
> write(6, "ThreadDumpPerf.recurse(ThreadDumpPerf.java:20)", 46) = 46
> write(6, "\"", 1)           = 1
> write(6, ",\n", 2)          = 2
> write(6, "              \"", 15) = 15
> write(6, "ThreadDumpPerf.recurse(ThreadDumpPerf.java:20)", 46) = 46
> write(6, "\"", 1)           = 1
> write(6, ",\n", 2)          = 2
> write(6, "              \"", 15) = 15
>
> And definitely for me I get a big speedup and reduction in write calls from the 8192 byte buffer I showed.
>
> Oli
>
>
>
> Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/loom-dev/attachments/20230907/800301b4/attachment-0001.htm>


More information about the loom-dev mailing list