From peter.kessler at os.amperecomputing.com Sat Feb 26 00:36:32 2022
From: peter.kessler at os.amperecomputing.com (Peter Kessler OS)
Date: Sat, 26 Feb 2022 00:36:32 +0000
Subject: How should I use -XX:PreTouchParallelChunkSize=?
Message-ID:

How should I use -XX:PreTouchParallelChunkSize=?

I am using an Ampere Altra, running Oracle Linux Server release 8.5,
that has 64KB pages and 524288KB huge pages.  If I run a program with
enough logging, I see, for example,

    $ ${OpenJDKDir}/jdk-19-ea+8-441/bin/java -Xms112g -Xmx112g -Xmn12g -XX:SurvivorRatio=10 -XX:+AlwaysPreTouch -XX:+UseTransparentHugePages -XX:+UseParallelGC -XX:ParallelGCThreads=80 -Xlog:gc+init=info,gc+heap=debug,gc=info:stdout HelloWorld
    ...
    [0.035s][debug][gc,heap] Running ParallelGC PreTouch head with 20 workers for 20 work units pre-touching 10737418240B.
    [0.114s][debug][gc,heap] Running ParallelGC PreTouch head with 2 workers for 2 work units pre-touching 1073741824B.
    [0.130s][debug][gc,heap] Running ParallelGC PreTouch head with 2 workers for 2 work units pre-touching 1073741824B.
    [0.160s][debug][gc,heap] Running ParallelGC PreTouch head with 80 workers for 200 work units pre-touching 107374182400B.
    [0.937s][info ][gc,init] Version: 19-ea+8-441 (release)
    ...

showing the parallel pretouch at work, and I can infer from the time
stamps how long it takes to pretouch the 10GB of the eden, the 1GB of
each of the survivors, and the 100GB of the old generation.

If I do not specify a -XX:PreTouchParallelChunkSize=, or specify a size
smaller than the huge page size, logging says that the work units are
the huge page size.  The default size
(src/hotspot/os/linux/globals_linux.hpp#L93) is 4MB, which does not
apply if I have huge pages of 512MB.
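The work-unit counts in the log lines above can be checked with a little arithmetic. The following is only a sketch, not HotSpot code: it assumes the effective chunk is the larger of the requested chunk size and the page size (which is what the logging described above suggests), and that the count rounds up. The class and method names are made up for illustration.

```java
// Sketch only: reproduces the work-unit counts reported in the log above.
// Assumptions (taken from the observed logging, not from HotSpot source):
//   - the effective chunk is max(requested chunk, page size)
//   - the number of work units is the byte count divided by that chunk, rounded up
final class PreTouchWorkUnits {
    static long effectiveChunk(long requestedChunk, long pageSize) {
        return Math.max(requestedChunk, pageSize);
    }

    static long workUnits(long bytes, long requestedChunk, long pageSize) {
        long chunk = effectiveChunk(requestedChunk, pageSize);
        return (bytes + chunk - 1) / chunk; // round up
    }

    public static void main(String[] args) {
        long hugePage = 524288L * 1024;        // 524288KB huge pages
        long defaultChunk = 4L * 1024 * 1024;  // 4MB default chunk size

        System.out.println(workUnits(10737418240L, defaultChunk, hugePage));  // eden: 20
        System.out.println(workUnits(1073741824L, defaultChunk, hugePage));   // survivor: 2
        System.out.println(workUnits(107374182400L, defaultChunk, hugePage)); // old gen: 200
    }
}
```

Under these assumptions the 4MB default is always overridden by the 512MB huge page size, which matches the 20, 2, and 200 work units in the log.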

If I gather the times to the "Version" line for various releases of the
OpenJDK and various sizes of -XX:PreTouchParallelChunkSize=, I get

                           -XX:PreTouchParallelChunkSize=
    JDK                   4k     64k   2048k   4096k   524288k  1048576k
    -----------------  -----   -----   -----   -----   -------  --------
    jdk-15.0.2         3.434   3.469   3.469   3.508     3.467     3.434
    jdk-17.0.2         0.806   0.806   0.805   0.804     0.836     0.844
    jdk-18-ea+34-2083  0.803   0.802   0.802   0.833     0.802     0.868
    jdk-19-ea+8-441    0.806   0.849   0.803   0.803     0.803     0.846

showing seconds for the median of 7 runs.  It looks like jdk-15.0.2 was
not good at parallelizing pretouching of the heap.  Later JDKs look like
they parallelize the pretouching of the heap, but it does not matter
much what the setting, or non-setting, of -XX:PreTouchParallelChunkSize= is.

My question is: what difference does setting
-XX:PreTouchParallelChunkSize= make?  That is: is there some point to
setting it at all?

This is a somewhat academic question, since if I have specified
-XX:+AlwaysPreTouch I am not that interested in start-up time.  But
thank you for getting back the 2.5 seconds from jdk-15.0.2 start-up.

Thank you for your help in explaining -XX:PreTouchParallelChunkSize=.

                        ... peter

From thomas.schatzl at oracle.com Mon Feb 28 10:32:42 2022
From: thomas.schatzl at oracle.com (Thomas Schatzl)
Date: Mon, 28 Feb 2022 11:32:42 +0100
Subject: How should I use -XX:PreTouchParallelChunkSize=?
In-Reply-To:
References:
Message-ID:

Hi,

On 26.02.22 01:36, Peter Kessler OS wrote:
> How should I use -XX:PreTouchParallelChunkSize=?
>
> I am using an Ampere Altra, running Oracle Linux Server release 8.5,
> that has 64KB pages and 524288KB huge pages.  If I run a program with
> enough logging, I see, for example,
[...]
> If I gather the times to the "Version" line for various releases of the
> OpenJDK and various sizes of -XX:PreTouchParallelChunkSize=, I get
>
>                            -XX:PreTouchParallelChunkSize=
>     JDK                   4k     64k   2048k   4096k   524288k  1048576k
>     -----------------  -----   -----   -----   -----   -------  --------
>     jdk-15.0.2         3.434   3.469   3.469   3.508     3.467     3.434
>     jdk-17.0.2         0.806   0.806   0.805   0.804     0.836     0.844
>     jdk-18-ea+34-2083  0.803   0.802   0.802   0.833     0.802     0.868
>     jdk-19-ea+8-441    0.806   0.849   0.803   0.803     0.803     0.846
>
> showing seconds for the median of 7 runs.  It looks like jdk-15.0.2 was
> not good at parallelizing pretouching of the heap.  Later JDKs look like
> they parallelize the pretouching of the heap, but it does not matter
> much what the setting, or non-setting, of -XX:PreTouchParallelChunkSize= is.

It sets the size of memory a single thread gets to pretouch at once
before asking for more work.

With a sufficiently large value, all is fine, the overhead of grabbing
new chunks (and iterating over them) is negligible compared to actual
commit work.

As demonstration, with

    bin/java -Xmx40g -Xms40g -XX:+AlwaysPreTouch -XX:+UseG1GC -Xlog:gc+init=info,gc+heap=debug,gc=info:stdout -XX:PreTouchParallelChunkSize=2m -version

    [0.034s][debug][gc,heap] Running G1 PreTouch with 18 workers for 10240 work units pre-touching 42949672960B.
    [2.132s][debug][gc,heap] Running G1 PreTouch with 18 workers for 160 work units pre-touching 671088640B.

but with -XX:PreTouchParallelChunkSize=4k (which is obviously a bad
value) I get

    [0.030s][debug][gc,heap] Running G1 PreTouch with 18 workers for 10485760 work units pre-touching 42949672960B.
    [3.570s][debug][gc,heap] Running G1 PreTouch with 18 workers for 163840 work units pre-touching 671088640B.

> My question is: what difference does setting
> -XX:PreTouchParallelChunkSize= make?  That is: is there some point to
> setting it at all?

The default values seem to be okay, so probably not :)

Hth,
   Thomas
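
The chunk-claiming scheme described above ("a single thread gets to pretouch [a chunk] at once before asking for more work") can be sketched roughly as follows. This is a toy model and not the HotSpot implementation: the class name, the shared atomic cursor, and the use of a plain byte array in place of the heap are all assumptions made for illustration, and the sketch writes a 1 into each page where a real pretouch would leave the contents unchanged.

```java
import java.util.concurrent.atomic.AtomicLong;

// Toy model of chunked parallel pretouch (illustrative only, not HotSpot code).
// Workers repeatedly claim the next chunk from a shared cursor and touch one
// byte per page inside it. Smaller chunks mean more trips to the shared cursor.
final class ChunkedPreTouch {
    /** Touches one byte per page of the buffer; returns how many chunks were claimed. */
    static long preTouch(byte[] heap, int pageSize, int chunkSize, int workers) {
        AtomicLong next = new AtomicLong(0);   // start offset of the next unclaimed chunk
        AtomicLong claims = new AtomicLong(0); // number of chunks actually handed out

        Runnable worker = () -> {
            while (true) {
                long start = next.getAndAdd(chunkSize); // "ask for more work"
                if (start >= heap.length) {
                    return;                             // nothing left to claim
                }
                claims.incrementAndGet();
                long end = Math.min(start + chunkSize, heap.length);
                for (long p = start; p < end; p += pageSize) {
                    heap[(int) p] = 1;                  // touch one byte in each page
                }
            }
        };

        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(worker);
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while pretouching", e);
            }
        }
        return claims.get();
    }

    public static void main(String[] args) {
        int page = 4096;
        byte[] heap = new byte[1 << 20];                          // 1MB stand-in for the heap
        System.out.println(preTouch(heap, page, page, 4));        // 256 claims
        System.out.println(preTouch(heap, page, 1 << 18, 4));     // 4 claims
    }
}
```

In this model the amount of memory touched is the same for both chunk sizes; only the number of claims on the shared cursor changes, which is the overhead that becomes visible in the 4k run above.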