RFR(L) 8237354: Add option to jcmd to write a gzipped heap dump
Laurence Cable
larry.cable at oracle.com
Sat Feb 22 16:20:40 UTC 2020
On 2/21/20 5:19 PM, Ioi Lam wrote:
> Ralf and Christoph,
>
> I agree that making it easy for the user is important, so dependency
> on an external program like pigz will be a hassle.
>
> How about implementing the compression in a Java program? Will
> something like this be too much of a hassle?
>
> jcmd $PID GC.dump -stdout | java -jar HeapDumpZipper.jar > heap.gz
Could we integrate the compression into jcmd itself?
>
> This way, we can implement the exact compression algorithm as Ralf
> described, without making it part of the VM. Writing it in Java
> probably would be easier to maintain.
>
> If it makes sense, we can include the Java code as part of the JDK, so
> there's no need to ship a separate JAR file to the user.
>
> jcmd $PID GC.dump -stdout | java jdk.internal.heapdump.Zipper >
> heap.gz
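
A minimal sketch of the kind of stdin-to-stdout filter proposed above. The
class name HeapDumpZipper and the -stdout option of jcmd are taken from the
proposal and are hypothetical; neither exists in the JDK. It writes one plain
gzip stream, not the chunked format Ralf describes.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical filter: the class name comes from the proposal in this thread.
public class HeapDumpZipper {

    // Copies everything from 'in' to 'out' as a single gzip stream.
    // Closing the GZIPOutputStream writes the gzip trailer and closes 'out'.
    static void compress(InputStream in, OutputStream out) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(out, 64 * 1024)) {
            in.transferTo(gz); // requires Java 9 or later
        }
    }

    public static void main(String[] args) throws IOException {
        compress(System.in, System.out);
    }
}
```

This compresses on a single thread, so it would not address the speed concern
raised for plain gzip further down in the thread.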
>
> Thanks
> - Ioi
>
> On 2/21/20 8:35 AM, Langer, Christoph wrote:
>> Hi all,
>>
>> let me share my thoughts after going through this mail thread and
>> interrogating Ralf quite a bit about the feature.
>>
>> First of all, I very much value the discussion and the points brought
>> up here. When deciding about the introduction of an enhancement or a
>> new feature, it's always wise to thoroughly discuss it and value
>> benefits against maintenance cost incurred. However, in this case I'm
>> at a point where I would really like to see this going in. Let me
>> elaborate on this.
>>
>> In the mail cited below, I think Ralf enumerates all the benefits
>> quite comprehensively. With the gzip feature built into the
>> heapdumper, we'll get the option to easily have the VM dump its heap
>> in a space saving format in the same time (or even a bit quicker)
>> than we currently can get fully exploded hprofs. There's no need for
>> additional configuration steps and arrangements, just a simple
>> additional option in the existing jcmd. And with the slightly updated
>> dump format, tool builders will get options to improve handling of
>> compressed heap dumps.
>>
>> Speaking as somebody who has to do customer support once in a while,
>> I can't tell you how valuable it is to be able to give the customer
>> simple instructions that just work when it comes to directing them to
>> provide diagnosis data. And that's clearly a point here. Also, given
>> the loads of different deployment scenarios of JVM applications, e.g.
>> cloud, containers, monolith servers... it's really good to have
>> simple options.
>>
>> On the other hand, it's true that the change introduces a bit of
>> additional complexity. But, without looking into the new code in all
>> details, I think the amount is acceptable. Most of the code really
>> only touches a distinct module for dumping the heap (heapdumper.cpp).
>> Some additional 600 lines of code (the file already had 2000 before).
>> But the code actually is not messing too deep with hotspot internals,
>> so it should be quite maintainable. The rest of the code is a few
>> lines about enhancing the dcmd and some additional access points into
>> zlib. Furthermore, it brings a bit of testing code, but that is a
>> good thing. So, this should really be acceptable - given that Ralf is
>> around to support this once it's checked in and there's also the rest
>> of the SAP team which will be able to help out here.
>>
>> The ideas collected in this thread that go beyond this change, e.g.
>> the possibility to dump the heap out to the network, the option to
>> get heapdumps out to the jcmd and also the potential enhancements to
>> the -XX:+HeapDumpBeforeFullGC, -XX:+HeapDumpAfterFullGC and
>> -XX:+HeapDumpOnOutOfMemoryError are partly orthogonal and are probably
>> worth pursuing on their own.
>>
>> So I really think we should allow this enhancement in and start
>> focusing on a good code review.
>>
>> Best regards
>> Christoph
>>> -----Original Message-----
>>> From: hotspot-runtime-dev <hotspot-runtime-dev-
>>> bounces at openjdk.java.net> On Behalf Of Schmelter, Ralf
>>> Sent: Thursday, 20 February 2020 14:21
>>> To: Yasumasa Suenaga <suenaga at oss.nttdata.com>; Ioi Lam
>>> <ioi.lam at oracle.com>; serguei.spitsyn at oracle.com; hotspot-runtime-
>>> dev at openjdk.java.net runtime <hotspot-runtime-dev at openjdk.java.net>
>>> Cc: serviceability-dev at openjdk.java.net
>>> Subject: [CAUTION] RE: RFR(L) 8237354: Add option to jcmd to write a
>>> gzipped heap dump
>>>
>>> Hi Yasumasa,
>>>
>>> I think it would be great if we could redirect larger chunks of data
>>> to jcmd.
>>>
>>> But you have to differentiate between binary data (for the heap dump)
>>> and text data (e.g. for the codelist).
>>>
>>> Currently jcmd assumes all bytes to be UTF-8 encoded, converts them to
>>> Unicode and then uses the platform encoding to write characters.
>>> This is not
>>> suitable for binary data.
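
A small illustration of why a text-oriented path damages binary data. This is
a sketch, not jcmd's actual code: it decodes and re-encodes with UTF-8 on both
sides, whereas jcmd writes using the platform encoding. The class and method
names are made up for the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryThroughText {

    // Decodes the bytes as UTF-8 and encodes them again, the way a
    // text-oriented output path would treat them.
    static byte[] roundTrip(byte[] raw) {
        String asText = new String(raw, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First four bytes of a typical gzip file: magic, method, flags.
        byte[] binary = {(byte) 0x1f, (byte) 0x8b, 0x08, 0x00};
        byte[] after = roundTrip(binary);
        System.out.println("unchanged: " + Arrays.equals(binary, after));
    }
}
```

The byte 0x8b is not valid UTF-8 on its own, so the decoder substitutes
U+FFFD, which is encoded as three bytes. The output is two bytes longer than
the input and the original value is lost.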
>>>
>>> And of course you cannot use the bufferedStream to get the output to
>>> jcmd.
>>> You would have to implement an outputStream which can directly write to
>>> the AttachListener connection.
>>>
>>>
>>> But even with this change, I would still like the gzip compression
>>> to be done
>>> in the VM. Let me try to list all the advantages I see for doing this:
>>>
>>> 1. It is by far the easiest to use. You just have to specify -gz for
>>> the jcmd.
>>> While your command line (jcmd .... | gzip -c > file) is easy enough,
>>> it assumes
>>> you have gzip (not by default on Windows) and it would be painfully
>>> slow (~10x or more), since it is not parallel. You could use pigz, but
>>> it is not as ubiquitous as gzip. I know it is sometimes hard to imagine
>>> this could be a problem for anyone, but it is.
>>>
>>> It is easy to tell a customer to execute jcmd <pid> GC.heap_dump -gz
>>> test.hprof.gz. Add additional requirements, especially external
>>> programs, and your chance of success diminishes fast.
>>>
>>>
>>> 2. The -XX:+HeapDumpOnOutOfMemoryError, -XX:+HeapDumpBeforeFullGC
>>> and -XX:+HeapDumpAfterFullGC options can easily create gzipped heap
>>> dumps directly when the compression is in the VM. And especially if you
>>> create more than one dump (with the before/after GC flags), compression
>>> is very useful. The same holds if you want to support compressed heap
>>> dumps in the HotSpotDiagnosticMXBean: just add a flag and/or
>>> compression level.
>>>
>>>
>>> 3. The created gz-file is not a simple gz-file you would get when
>>> simply using
>>> gzip.
>>>
>>> It is created in a way that makes it possible to treat it like a
>>> random access file
>>> without decompressing it.
>>>
>>> Currently for example the Eclipse Memory Analyzer (MAT) has the
>>> option to
>>> directly open a gzipped hprof file and use it without decompression.
>>> And for
>>> the initial parsing, they can just read the file sequentially, so
>>> this is not too
>>> slow.
>>>
>>> But when accessing the values of objects or arrays, they have to seek
>>> to specific positions in the gzipped hprof file. This is currently
>>> implemented by having a Java implementation of an InflaterInputStream
>>> which is capable of completely copying its state. This copy is then
>>> used to start decompressing at the specific offset for which it was
>>> created. As you can imagine, the state of the inflater is not small
>>> (MAT assumes about 64 kB; at least 32 kB is needed for the dictionary),
>>> so it limits the number of starting positions you can use for large
>>> files. But it works for all kinds of gzip compressed streams.
>>>
>>> The gzip implementation used to write the heap dump in the VM creates
>>> many small gzip compressed chunks. At the start of each chunk you can
>>> create a fresh GZIPInputStream without having to store any internal
>>> state.
>>> You only need to remember the physical offset and the logical offset
>>> (so 2
>>> long values) for each chunk. If you then want to read data at a
>>> specific logical
>>> offset, you binary search the nearest preceding chunk and create a
>>> GZIPInputStream reading from the physical offset of that chunk. So on
>>> average you have to decompress about half a chunk to get to the data
>>> you
>>> need.
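
A rough sketch of the chunk index and lookup described above. This is not the
code from the webrev: it uses the stock GZIPOutputStream to write independent
gzip members into a byte array that stands in for the file, and the class and
method names are invented for the example. It assumes non-empty input.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ChunkedGzipSketch {

    // One index entry per chunk: two long values, as described above.
    static final class Chunk {
        final long physical; // offset of the chunk in the compressed file
        final long logical;  // offset of the chunk in the uncompressed data
        Chunk(long physical, long logical) { this.physical = physical; this.logical = logical; }
    }

    final byte[] file;                           // stand-in for the .gz file
    final List<Chunk> index = new ArrayList<>(); // sorted by logical offset

    // Writes 'data' as independent gzip members of 'chunkSize' uncompressed bytes.
    ChunkedGzipSketch(byte[] data, int chunkSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int pos = 0; pos < data.length; pos += chunkSize) {
            index.add(new Chunk(out.size(), pos));
            // Closing a ByteArrayOutputStream has no effect, so 'out' stays usable.
            try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
                gz.write(data, pos, Math.min(chunkSize, data.length - pos));
            }
        }
        file = out.toByteArray();
    }

    // Binary search for the last chunk whose logical offset is <= the target.
    Chunk chunkFor(long logicalOffset) {
        int lo = 0, hi = index.size() - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (index.get(mid).logical <= logicalOffset) lo = mid; else hi = mid - 1;
        }
        return index.get(lo);
    }

    // Starts a fresh GZIPInputStream at the chunk and skips to the target byte.
    int readByteAt(long logicalOffset) throws IOException {
        Chunk c = chunkFor(logicalOffset);
        int start = (int) c.physical;
        ByteArrayInputStream raw = new ByteArrayInputStream(file, start, file.length - start);
        try (GZIPInputStream gz = new GZIPInputStream(raw)) {
            long toSkip = logicalOffset - c.logical;
            while (toSkip > 0) {
                long n = gz.skip(toSkip);
                if (n <= 0) throw new EOFException("offset beyond end of data");
                toSkip -= n;
            }
            return gz.read(); // unsigned byte value, or -1 at end of data
        }
    }
}
```

No inflater state is stored per starting position, only the two offsets. The
price is that each chunk is compressed without the preceding data as
dictionary, and a read decompresses up to one chunk of data before reaching
the target.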
>>>
>>> If you look in the webrev, you can see:
>>> http://cr.openjdk.java.net/~rschmelter/webrevs/8237354/webrev.0/test/lib/jdk/test/lib/hprof/parser/GzipRandomAccess.java.html
>>> This implements the needed logic to treat the gzipped hprof file as a
>>> random access file. I have used it to add support for gzipped files in
>>> the jhat library (which is only used in tests). In jhat, for example,
>>> the resolution of references is done via random access. And the file
>>> also contains all the functionality MAT would need.
>>>
>>> You can generate a more or less equivalent file if you use pigz with
>>> the --independent option. But to make it easier to detect that the
>>> gzip file is chunked (without decompressing it first), I've added a
>>> comment marking it as an hprof file with a given chunk size. This
>>> would be missing from the pigz file, but pigz instead adds 9 bytes
>>> when --independent is specified (00 00 ff ff 00 00 00 ff ff), so you
>>> could detect it too.
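
A sketch of how a tool might look for such a comment without decompressing
anything. It assumes the comment is stored in the optional comment field of
the gzip member header (FCOMMENT in RFC 1952), which the message above does
not spell out, and it makes no assumption about the comment's text. The class
and method names are invented for the example.

```java
import java.nio.charset.StandardCharsets;

public class GzipComment {

    // Returns the header comment of the first gzip member,
    // or null if the header is invalid or carries no comment.
    static String readComment(byte[] gz) {
        // Fixed part: magic 1f 8b, method 8 (deflate), flags, mtime(4), xfl, os.
        if (gz.length < 10 || (gz[0] & 0xff) != 0x1f || (gz[1] & 0xff) != 0x8b || gz[2] != 8) {
            return null;
        }
        int flags = gz[3] & 0xff;
        int pos = 10;
        if ((flags & 0x04) != 0) {              // FEXTRA: 2-byte little-endian length + data
            if (pos + 2 > gz.length) return null;
            int xlen = (gz[pos] & 0xff) | ((gz[pos + 1] & 0xff) << 8);
            pos += 2 + xlen;
        }
        if ((flags & 0x08) != 0) {              // FNAME: zero-terminated file name
            int end = indexOfZero(gz, pos);
            if (end < 0) return null;
            pos = end + 1;
        }
        if ((flags & 0x10) == 0) return null;   // FCOMMENT not set
        int end = indexOfZero(gz, pos);
        if (end < 0) return null;
        return new String(gz, pos, end - pos, StandardCharsets.ISO_8859_1);
    }

    // Index of the first zero byte at or after 'from', or -1 if there is none.
    static int indexOfZero(byte[] b, int from) {
        for (int i = from; i < b.length; i++) {
            if (b[i] == 0) return i;
        }
        return -1;
    }
}
```

A reader could call this on the first few hundred bytes of the file and fall
back to treating it as an ordinary gzip stream when the result is null.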
>>>
>>> To summarize, the gzipped hprof file created by the VM makes it much
>>> easier for tools to access them efficiently at random positions. You
>>> can do
>>> something equivalent with pigz, but not with gzip.
>>>
>>> And getting support for this type of gzipped hprof file by the heap
>>> dump
>>> tools will be much easier, if this is the format the openjdk
>>> produces, so it will
>>> be widespread.
>>>
>>> Best regards,
>>> Ralf
>>>
>>> -----Original Message-----
>>> From: Yasumasa Suenaga <suenaga at oss.nttdata.com>
>>> Sent: Thursday, 20 February 2020 00:59
>>> To: Ioi Lam <ioi.lam at oracle.com>; Schmelter, Ralf
>>> <ralf.schmelter at sap.com>; serguei.spitsyn at oracle.com; hotspot-runtime-
>>> dev at openjdk.java.net runtime <hotspot-runtime-dev at openjdk.java.net>
>>> Cc: serviceability-dev at openjdk.java.net
>>> Subject: Re: RFR(L) 8237354: Add option to jcmd to write a gzipped heap
>>> dump
>>>
>>> Hi,
>>>
>>> Generally I agree with Ioi, but I think it is not a problem only for
>>> gzipped heap
>>> dump.
>>>
>>> For example, Compiler.codelist and Compiler.CodeHeap_Analytics might be
>>> large text.
>>> In addition, some users want to redirect the result from jcmd to other
>>> command or log collector.
>>>
>>> So I think it would be better if jcmd provided a stdout redirect
>>> option for all subcommands. E.g.
>>>
>>> $ jcmd <PID> GC.heap_dump -stdout | gzip -c - > heapdump.hprof.gz
>>>
>>>
>>> Thanks,
>>>
>>> Yasumasa
>