RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump

Laurence Cable larry.cable at oracle.com
Sat Feb 22 16:20:40 UTC 2020



On 2/21/20 5:19 PM, Ioi Lam wrote:
> Ralf and Christoph,
>
> I agree that making it easy for the user is important, so dependency 
> on an external program like pgzip will be a hassle.
>
> How about implementing the compression in a Java program? Will 
> something like this be too much of a hassle?
>
>     jcmd $PID GC.dump -stdout | java -jar HeapDumpZipper.jar > heap.gz
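> A rough sketch of what such a filter could look like (HeapDumpZipper is 
> just a placeholder name here; a real version would probably want 
> parallel compression along the lines Ralf described):
>
>     import java.io.InputStream;
>     import java.io.OutputStream;
>     import java.util.zip.GZIPOutputStream;
>
>     // Minimal stdin-to-stdout gzip filter; reads the raw hprof bytes
>     // from the pipe and writes a gzipped stream to stdout.
>     public class HeapDumpZipper {
>         public static void main(String[] args) throws Exception {
>             try (InputStream in = System.in;
>                  OutputStream out = new GZIPOutputStream(System.out, 64 * 1024)) {
>                 in.transferTo(out); // copy everything into the gzip stream
>             }
>         }
>     }
>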
we could integrate the compression into jcmd itself?
>
> This way, we can implement the exact compression algorithm as Ralf 
> described, without making it part of the VM. Writing it in Java 
> probably would be easier to maintain.
>
> If it makes sense, we can include the Java code as part of the JDK, so 
> there's no need to ship a separate JAR file to the user.
>
>     jcmd $PID GC.dump -stdout | java jdk.internal.heapdump.Zipper > 
> heap.gz
>
> Thanks
> - Ioi
>
> On 2/21/20 8:35 AM, Langer, Christoph wrote:
>> Hi all,
>>
>> let me share my thoughts after going through this mail thread and 
>> interrogating Ralf quite a bit about the feature 🙂.
>>
>> First of all, I very much value the discussion and the points brought 
>> up here. When deciding about the introduction of an enhancement or a 
>> new feature, it's always wise to thoroughly discuss it and value 
>> benefits against maintenance cost incurred. However, in this case I'm 
>> at a point where I would really like to see this going in. Let me 
>> elaborate on this.
>>
>> In the mail cited below, I think Ralf enumerates all the benefits 
>> quite comprehensively. With the gzip feature built into the 
>> heapdumper, we'll get the option to easily have the VM dump its heap 
>> in a space-saving format in the same amount of time as (or even a bit 
>> quicker than) it currently takes to produce fully exploded hprof 
>> files. There's no need for 
>> additional configuration steps and arrangements, just a simple 
>> additional option in the existing jcmd. And with the slightly updated 
>> dump format, tool builders will get options to improve handling of 
>> compressed heap dumps.
>>
>> Speaking as somebody who has to do customer support once in a while, 
>> I can't tell you how valuable it is to be able to give the customer 
>> simple instructions that just work when it comes to directing them to 
>> provide diagnosis data. And that's clearly a point here. Also, given 
>> the loads of different deployment scenarios of JVM applications, e.g. 
>> cloud, containers, monolith servers... it's really good to have 
>> simple options.
>>
>> On the other hand, that's true, the change introduces a bit of 
>> additional complexity. But, without looking into the new code in all 
>> details, I think the amount is acceptable. Most of the code really 
>> only touches a distinct module for dumping the heap (heapdumper.cpp). 
>> It adds some 600 lines of code (the file already had 2000 before). 
>> But the code actually isn't messing too deeply with hotspot internals, 
>> so it should be quite maintainable. The rest of the code is a few 
>> lines enhancing the dcmd and some additional access points into 
>> zlib. Furthermore, it brings a bit of testing code, but that is a 
>> good thing. So, this should really be acceptable - given that Ralf is 
>> around to support this once it's checked in and there's also the rest 
>> of the SAP team which will be able to help out here.
>>
>> The ideas collected in this thread that go beyond this change, e.g. 
>> the possibility to dump the heap out to the network, the option to 
>> stream heap dumps out to jcmd, and also the potential enhancements to 
>> -XX:HeapDumpBeforeFullGC, -XX:HeapDumpAfterFullGC and 
>> -XX:HeapDumpOnOutOfMemoryError, are partly orthogonal and are probably 
>> worth pursuing on their own.
>>
>> So I really think we should allow this enhancement in and start 
>> focusing on a good code review 🙂.
>>
>> Best regards
>> Christoph
>>> -----Original Message-----
>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>> bounces at openjdk.java.net> On Behalf Of Schmelter, Ralf
>>> Sent: Donnerstag, 20. Februar 2020 14:21
>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Ioi Lam
>>> <ioi.lam at oracle.com>; serguei.spitsyn at oracle.com; hotspot-runtime-
>>> dev at openjdk.java.net runtime <hotspot-runtime-dev at openjdk.java.net>
>>> Cc: serviceability-dev at openjdk.java.net
>>> Subject: [CAUTION] RE: RFR(L) 8237354: Add option to jcmd to write a
>>> gzipped heap dump
>>>
>>> Hi Yasumasa,
>>>
>>> I think it would be great if we could redirect larger chunks of data 
>>> to jcmd.
>>>
>>> But you have to differentiate between binary data (for the heap 
>>> dump) and text data (for, e.g., the codelist).
>>>
>>> Currently jcmd assumes all bytes are UTF-8 encoded, converts them to 
>>> Unicode and then uses the platform encoding to write the characters. 
>>> This is not suitable for binary data.
>>>
>>> And of course you cannot use the bufferedStream to get the output to 
>>> jcmd.
>>> You would have to implement an outputStream which can directly write to
>>> the AttachListener connection.
>>>
>>>
>>> But even with this change, I would still like the gzip compression 
>>> to be done
>>> in the VM. Let me try to list all the advantages I see for doing this:
>>>
>>> 1. It is by far the easiest to use. You just have to specify -gz to 
>>> jcmd. While your command line (jcmd .... | gzip -c > file) is easy 
>>> enough, it assumes you have gzip (not available by default on 
>>> Windows) and it would be painfully slow (~10x or more), since it is 
>>> not parallel. You could use pigz, but it is not as ubiquitous as 
>>> gzip. I know it is sometimes hard to imagine this could be a problem 
>>> for anyone, but it is.
>>>
>>> It is easy to tell a customer to execute jcmd <pid> GC.heap_dump -gz 
>>> test.hprof.gz. Add additional requirements, especially external 
>>> programs, and your chance of success diminishes fast.
>>>
>>>
>>> 2. The -XX:HeapDumpOnOutOfMemoryError, -XX:HeapDumpBeforeFullGC 
>>> and -XX:HeapDumpAfterFullGC options can easily create gzipped heap 
>>> dumps directly when the compression is in the VM. And especially if 
>>> you create more than one dump (with the before/after GC flags), 
>>> compression is very useful. The same holds if you want to support 
>>> compressed heap dumps in the HotSpotDiagnosticMXBean: just add a flag 
>>> and/or a compression level.
>>>
>>>
>>> 3. The created gz file is not the simple gz file you would get from 
>>> plain gzip.
>>>
>>> It is created in a way that makes it possible to treat it like a 
>>> random access file without decompressing it.
>>>
>>> Currently, for example, the Eclipse Memory Analyzer (MAT) has the 
>>> option to directly open a gzipped hprof file and use it without 
>>> decompressing it. And for the initial parsing, it can just read the 
>>> file sequentially, so this is not too slow.
>>>
>>> But when accessing the values of objects or arrays, MAT has to seek 
>>> to specific positions in the gzipped hprof file. This is currently 
>>> implemented by having a Java implementation of an InflaterInputStream 
>>> which is capable of completely copying its state. This copy is then 
>>> used to start decompressing at the specific offset for which it was 
>>> created. As you can imagine, the state of the inflater is not small 
>>> (MAT assumes about 64 KB; at least 32 KB is needed for the dictionary 
>>> alone), so it limits the number of starting positions you can use for 
>>> large files. But it works for all kinds of gzip compressed streams.
>>>
>>> The gzip implementation used to write the heap dump in the VM creates
>>> many small gzip compressed chunks. At the start of each chunk you can
>>> create a fresh GZIPInputStream without having to store any internal 
>>> state.
>>> You only need to remember the physical offset and the logical offset 
>>> (so 2
>>> long values) for each chunk. If you then want to read data at a 
>>> specific logical offset, you binary-search for the nearest preceding 
>>> chunk and create a GZIPInputStream reading from the physical offset 
>>> of that chunk. So on average you have to decompress about half a 
>>> chunk to get to the data you need.
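>>>
>>> A rough sketch of that lookup, assuming the logical and physical 
>>> chunk start offsets have already been collected into sorted arrays 
>>> (the names here are just illustrative, not the webrev's code):
>>>
>>>     import java.io.IOException;
>>>     import java.io.InputStream;
>>>     import java.io.RandomAccessFile;
>>>     import java.nio.channels.Channels;
>>>     import java.util.Arrays;
>>>     import java.util.zip.GZIPInputStream;
>>>
>>>     // Random access into a chunked gzip hprof file. chunkLogical[i] and
>>>     // chunkPhysical[i] are the logical (uncompressed) and physical (file)
>>>     // start offsets of chunk i, sorted ascending, with chunkLogical[0] == 0.
>>>     class ChunkedGzipReader {
>>>         private final RandomAccessFile file;
>>>         private final long[] chunkLogical;
>>>         private final long[] chunkPhysical;
>>>
>>>         ChunkedGzipReader(RandomAccessFile file, long[] logical, long[] physical) {
>>>             this.file = file;
>>>             this.chunkLogical = logical;
>>>             this.chunkPhysical = physical;
>>>         }
>>>
>>>         // Returns a stream positioned at the given logical (uncompressed) offset.
>>>         InputStream streamAt(long logicalOffset) throws IOException {
>>>             int idx = Arrays.binarySearch(chunkLogical, logicalOffset);
>>>             if (idx < 0) idx = -idx - 2;           // nearest preceding chunk
>>>             file.seek(chunkPhysical[idx]);         // jump to that chunk's gzip header
>>>             InputStream in = new GZIPInputStream(
>>>                     Channels.newInputStream(file.getChannel()));
>>>             long toSkip = logicalOffset - chunkLogical[idx];
>>>             while (toSkip > 0) {                   // decompress up to the target
>>>                 long skipped = in.skip(toSkip);
>>>                 if (skipped <= 0) break;
>>>                 toSkip -= skipped;
>>>             }
>>>             return in;
>>>         }
>>>     }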
>>>
>>> If you look into the webrev, you can see 
>>> http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.0/test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java.html. 
>>> This implements the needed logic to treat the gzipped hprof file as a 
>>> random access file. I have used it to add support for gzipped files 
>>> in the jhat library (which is only used in tests). In jhat, for 
>>> example, the resolution of references is done via random access. And 
>>> the file also contains all the functionality MAT would need.
>>>
>>> You can generate a more or less equivalent file if you use pigz with 
>>> the --independent option. But to make it easier to detect that the 
>>> gzip file is chunked (without decompressing it first), I've added a 
>>> comment marking it as an hprof file with a given chunk size. This 
>>> would be missing from the pigz file, but pigz instead adds 9 bytes 
>>> when --independent is specified (00 00 ff ff 00 00 00 ff ff), so you 
>>> could detect that too.
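>>>
>>> For illustration, a tool could recognize the marker by reading the 
>>> optional comment field from the gzip header, roughly like this (the 
>>> exact comment text the VM writes is defined in the webrev and not 
>>> repeated here, so only the generic header parsing is shown):
>>>
>>>     import java.io.DataInputStream;
>>>     import java.io.IOException;
>>>     import java.io.InputStream;
>>>
>>>     // Extracts the optional comment from a gzip member header, which a
>>>     // tool could use to recognize a chunked hprof dump.
>>>     class GzipHeaderSniffer {
>>>         static String readComment(InputStream raw) throws IOException {
>>>             DataInputStream in = new DataInputStream(raw);
>>>             if (in.readUnsignedByte() != 0x1f || in.readUnsignedByte() != 0x8b) {
>>>                 return null;                         // not a gzip stream
>>>             }
>>>             in.readUnsignedByte();                   // compression method
>>>             int flags = in.readUnsignedByte();
>>>             in.readFully(new byte[6]);               // MTIME, XFL, OS
>>>             if ((flags & 0x04) != 0) {               // FEXTRA: 2-byte length + data
>>>                 int xlen = in.readUnsignedByte() | (in.readUnsignedByte() << 8);
>>>                 in.readFully(new byte[xlen]);
>>>             }
>>>             if ((flags & 0x08) != 0) {               // FNAME: zero-terminated
>>>                 while (in.readUnsignedByte() != 0) { }
>>>             }
>>>             if ((flags & 0x10) == 0) {
>>>                 return null;                         // no comment present
>>>             }
>>>             StringBuilder comment = new StringBuilder();
>>>             for (int b; (b = in.readUnsignedByte()) != 0; ) {
>>>                 comment.append((char) b);            // FCOMMENT is zero-terminated
>>>             }
>>>             return comment.toString();
>>>         }
>>>     }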
>>>
>>> To summarize, the gzipped hprof file created by the VM makes it much 
>>> easier for tools to access it efficiently at random positions. You 
>>> can do something equivalent with pigz, but not with gzip.
>>>
>>> And getting the heap dump tools to support this type of gzipped hprof 
>>> file will be much easier if it is the format OpenJDK produces, since 
>>> it will then be widespread.
>>>
>>> Best regards,
>>> Ralf
>>>
>>> -----Original Message-----
>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>> Sent: Donnerstag, 20. Februar 2020 00:59
>>> To: Ioi Lam <ioi.lam at oracle.com>; Schmelter, Ralf
>>> <ralf.schmelter at sap.com>; serguei.spitsyn at oracle.com; hotspot-runtime-
>>> dev at openjdk.java.net runtime <hotspot-runtime-dev at openjdk.java.net>
>>> Cc: serviceability-dev at openjdk.java.net
>>> Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap
>>> dump
>>>
>>> Hi,
>>>
>>> Generally I agree with Ioi, but I think this is not a problem only 
>>> for gzipped heap dumps.
>>>
>>> For example, Compiler.codelist and Compiler.CodeHeap_Analytics might 
>>> produce large text output. In addition, some users want to redirect 
>>> the output from jcmd to another command or a log collector.
>>>
>>> So I think it would be better if jcmd provided a stdout redirect 
>>> option for all subcommands. E.g.
>>>
>>>     $ jcmd <PID> GC.heap_dump -stdout | gzip -c - > heapdump.hprof.gz
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>


