RFR: JDK-8327769: jcmd GC.heap_dump without options should write to location given by -XX:HeapDumpPath, if set [v10]

Fri Apr 5 09:01:58 UTC 2024

On Wed, 27 Mar 2024 13:44:42 GMT, Matthias Baesken <mbaesken at openjdk.org> wrote:

>> Currently jcmd command GC.heap_dump only works with an additionally provided file name.
>> Syntax : GC.heap_dump [options] <filename>
>> 
>> In case the JVM has the XX - flag HeapDumpPath set, we should support an additional mode where the <filename> is optional.
>> In case the filename is NOT set, we take the HeapDumpPath (file or directory);
>> 
>> new syntax :
>> GC.heap_dump [options] <filename> .. has precedence over second option
>> GC.heap_dump [options] …in case -XX: HeapDumpPath=p is set
>> 
>> This would be a simplification e.g. for support cases where a filename or directory is set at JVM startup with -XX: HeapDumpPath=p and writing to the path is intended/recommended for usage also in the jcmd case.
>
> Matthias Baesken has updated the pull request incrementally with one additional commit since the last revision:
> 
>   adjust java.1 man page

I disagree with the premise of this entire request that jcmd GC.heap_dump should follow -XX:HeapDumpPath.
Many customers may see such a change as a regression, as they rely on separation of location for heap dumps generated by the JVM at OOME and heap dumps manually pulled by various other processes attaching to the JVM.

I've been testing this on on JDK 8 and JDK 21 and confirmed in both versions that jcmd requires the filename argument that must be a file, not a directory. Supplying just a directory will fail with either "file exists" or "is a directory", and supplying just a filename without a preceding directory will create the dump in the current working directory (cwd).

Currently, the documentation isn't clear that filename is required.  This should be corrected and also suggest to add %p and %d to the filename, which is useful when you don't want them overwritten with the -overwrite option (JDK 21) or when running heap dumps on multiple processes at once (with a 0 pid  or with main-class specified instead of pid).

Regarding this justification:
> In a cloud environment using containers, the HeapDumpPath is automatically set to a file system service to persist the heapdump. However, if a support engineer or DevOps detects high or increasing memory usage and wants to trigger a heapdump via jcmd, they would need to specify the filename. This requires retrieving the set HeapDumpPath from the app environment and copying it to the jcmd to use the persistent file system. This change can help avoid the need for an additional copy and paste step, which is prone to errors.

It's too weak an argument to justify such a change.  Simple scripts can capture and propagate the HeapDumpPath value, if you really want to go that route.  

However, I recommend using separate directories for manual heap dumps.  Keeping them separate from JVM-initiated heap dumps, either through filename convention or separate directories,  is a best practice.  When you have a severity 1 down situation due to OOME crashes, you don't want to have a bunch of manually pulled heap dumps cluttering up your view of the JVM initiated ones.  You want to be able to get your hands on the exact heap dumps you need very quickly to have them analyzed for root cause.  

For this reason I also disagree with setting a default filename convention equivalent to the default HeapDumpPath filename.  Manual heap dumps require specifying a filename for a reason, and those filenames (and separate directories for those who define them) should be unique based on what triggered the heap dump.  Consider the numerous Support cases of multiple monitoring scripts kicking off jcmd processes to pull heap dumps at various resource usage thresholds, as well as multiple Java agents that may do so, e.g. for various performance reasons.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/18190#issuecomment-2039266399