[RFR]8215623: Add incremental dump for jmap histo

臧琳 zanglin5 at jd.com
Mon May 20 15:14:34 UTC 2019


Dear Serguei,
The main reason for using a separate file for the incremental dump is the planned parallel incremental dump implementation: each heap-iteration thread can dump its own data into a separate file, avoiding a file lock.
I originally designed the whole-file support, incremental dump, and parallel dump together, and then divided them into three sub-tasks to make each one easier to discuss and review. In that design, the file=<file> option is for the final result, and "incrementalHisto.dump.<thread_id>" holds each thread's "local" incrementally dumped data.
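To make the parallel idea concrete, here is a rough sketch (all class and method names are hypothetical, just for discussion, not the actual webrev):

    import java.io.IOException;
    import java.io.PrintWriter;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Sketch: each heap-iteration worker appends to its own
    // "incrementalHisto.dump.<thread_id>" file, so no file lock is needed;
    // a final pass would merge the per-thread files into the file=<file> result.
    class PerThreadIncrementalDump implements Runnable {
        private final Path outputDir;

        PerThreadIncrementalDump(Path outputDir) {
            this.outputDir = outputDir;
        }

        @Override
        public void run() {
            long tid = Thread.currentThread().getId();
            Path file = outputDir.resolve("incrementalHisto.dump." + tid);
            try (PrintWriter out = new PrintWriter(Files.newBufferedWriter(
                    file, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
                // ... iterate this worker's share of the heap and periodically
                //     write the accumulated class histogram here ...
                out.println("# incremental data written by thread " + tid);
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }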
For the incremental dump, I agree it would be better if we could get rid of the incremental-file options and reuse file=<file>, but I think we need to discuss how to make that work for the parallel dump. What do you think?

Thanks,
Lin


From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
Sent: Saturday, May 18, 2019 2:42 AM
To: 臧琳 <zanglin5 at jd.com>
Cc: Hohensee, Paul <hohensee at amazon.com>; JC Beyler <jcbeyler at google.com>; serviceability-dev at openjdk.java.net
Subject: Re: [RFR]8215623: Add incremental dump for jmap histo

Dear Lin,

Before I go into the details below, could you please explain why we need a separate file for the incremental dump?
Should we just record the full dump into file=<file> incrementally?
Recording into the full dump file simply has to always be incremental.
It can be done in chunks if necessary, so I'm open to considering a new chunksize option.

Do I miss anything here?

Thanks,
Serguei

On 5/17/19 03:18, 臧琳 wrote:
Dear Serguei,


On May 16, 2019, at 1:39 PM, serguei.spitsyn at oracle.com wrote:

On 5/13/19 23:46, 臧琳 wrote:

Dear Serguei,

     Thanks for your comments.



 > > - incremental[:<file_name>], enable the incremental dump of heap, dumped
 > >   data will be saved to, by default it is "IncrementalHisto.dump"
 >
 >  Q1: Should the <file_name> be full path or short name?
 >      Is there any default path? What is the path of the "IncrementalHisto.dump" file?



The original design doesn't have the <file_name> option, so the file is hardcoded as "IncrementalHisto.dump" and saved to the same path as the "file=" option specifies, or the data is printed to whatever the output stream is if "file=" is not set.

> The file option is described as: file=<file> dump data to <file>
> It does not say anything about the path.

Yes. Do you agree that we should add a note to the help info, like:
file=<file> dump data to <file>; the file can be specified with a full path.

With the new design, I suggest first parsing <file_name>: if the value contains a folder path, use the specified path; if not, use the same path as the "file=" value; and if "file=" is not set, use the output stream. (The reason I prefer to use the same path as "file=" is that I assume users prefer to save all data files under the same folder.)
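A minimal sketch of that resolution order (a hypothetical helper, only to illustrate the proposal; it is not code from the webrev):

    import java.nio.file.Path;
    import java.nio.file.Paths;

    class IncrementalPathResolver {
        // 1. if <file_name> already contains a folder path, use it as given;
        // 2. otherwise place it next to the file= target;
        // 3. if file= is not set either, fall back to the tool's output stream
        //    (represented here by returning null).
        static Path resolve(String incrementalName, String fileOption) {
            if (incrementalName == null) {
                incrementalName = "IncrementalHisto.dump";   // proposed default
            }
            Path p = Paths.get(incrementalName);
            if (p.getParent() != null) {
                return p;                                    // explicit folder given
            }
            if (fileOption != null) {
                Path dir = Paths.get(fileOption).toAbsolutePath().getParent();
                return dir != null ? dir.resolve(incrementalName) : p;
            }
            return null;                                     // no file=, use output stream
        }
    }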

> It needs to be clearly specified.
> What statements do you suggest?
>
> One idea of simplification is to get rid of the default <file_name>
> and to require it to be always specified (non-optional).
>
> Then we could replace this:
>
>   file=<file> dump data to <file>
>   incremental dump support:
>     incremental[:<file_name>]  enable incremental dump, data will be dumped
>                                to <file_name> (default is "IncrementalHisto.dump")
>
> with this:
>
>   file=<file> dump data to <file>
>   incremental=<inc_file>  dump incremental data to <inc_file>
I think having a default IncrementalHisto.dump file saved at the same path as <file> is a way to make the incremental dump easy to use.
IMHO, when a user runs jmap -histo with "file=<file>" and wants to enable the incremental histo, the easiest way is to just add the "-incremental" flag, and all data files will be saved under the same folder as <file>. They don't have to come up with a separate filename for the incremental data. That is the reason I set the default value of IncrementalHisto.dump.

But I also want the user to have the freedom to use a different filename and path for the incremental results, so I made the incremental file_name optional.

If we make it non-optional, does it mean that the user may end up with the following command:
       jmap -histo,file=<absolute_path/a/b/c/histo.dump>,incremental:<absolute_path/a/b/c/incrementalHisto.dump> pid
It seems a little bit complicated to me; what do you think?





 > > - chunksize=<N>, size of objects (in KB) will be dumped in one chunk.
 >
 >  Q2: Should it be chunk of dump, not chunk of objects?



The purpose of "chunksize" is to decide how many objects' info are dumped at once. for example use "chunksize=1" on a "Xmx1m", there will be at max 1MB/1KB = 1000 chunks, which indicates that there will be 1000 times of file writing when do "jmap -histo".

> I hardly see the point of knowing the maximum number of objects that can be dumped at once.
> It is more important to know how much memory in the file it is going to take.
> How much dump memory will one object take?
> Does it vary (does it depend on object types)?

Yes, the dump memory for one object varies with its size and type.
The "chunksize" option lets the user control the proportion of the heap that the incremental dump processes at a time. IMO the use scenario is as follows:

   When the JVM has a 180GB max heap and jmap histo is used with chunksize=1g, an incremental dump happens every time 1GB of heap has been scanned, so there are not too many incremental dumps; the incremental dump takes time and may make jmap -histo slower.

PS: I think we should support "g", "m" and "k" suffixes instead of using "KB"; do you agree?
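For example, the option parsing could accept the suffixes like this (a hypothetical helper, just to show the idea):

    class SizeOption {
        // Accepts values like "1g", "512m", "64k"; a plain number is taken as bytes.
        static long parse(String value) {
            String v = value.trim().toLowerCase();
            long multiplier = 1L;
            switch (v.charAt(v.length() - 1)) {
                case 'k': multiplier = 1024L; break;
                case 'm': multiplier = 1024L * 1024L; break;
                case 'g': multiplier = 1024L * 1024L * 1024L; break;
            }
            if (multiplier != 1L) {
                v = v.substring(0, v.length() - 1);
            }
            return Long.parseLong(v) * multiplier;
        }
    }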



 > > - maxfilesize=<N>, size of the incremental data dump file (in KB), when data size
 > >   is larger than maxfilesize, the file is erased and latest data will be written.
 >
 >  Q3: What is the relation and what are the limitations between chunksize and maxfilesize?
 >      Should the maxfilesize be a multiple of the chunksize?

> The question Q3 above was not answered.
> But never mind. Please, see the suggestion below.

If the chunksize is large and there are too many objects of different classes in the heap, the actual file size can be larger than maxfilesize.
But I believe this rarely happens, because the histo info for one class only takes several bytes in the final result, and from the implementation of jmap
it can find all loaded classes before doing the heap iteration, so the difference between results only comes from the object quantities.

 > Q4: The sentence "the file is erased and latest data will be written" is not clear enough.
 >     Why does the whole file need to be erased?
 >     Should the incremental file behave like a cyclic buffer?
 >     If so, then only the next chunk needs to be erased.
 >     Then the chunks need to be numbered in order, so the earliest one can be found.



The "maxfilesize" controls the file size not to be too large, so when the dumped data is larger than "maxfilesize", the file is erased and latest data are written.The reason I erase whole file is that chunk data is accumulative, so the latest data includes the previous statistical ones. And this way may make the file easy to read.



I agree that we can add an ordered number to the chunks; I think it more or less helps the user understand how objects are distributed in the heap.



I think it may be reasonable to have the incremental file behave like the GC log: when maxfilesize is reached, the file is renamed with a numbered suffix and a new file is created. So for a large heap there can be IncrementalHisto.dump.0, IncrementalHisto.dump.1, and so on.
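Roughly like the following sketch, only to illustrate the gclog-style rotation; the class name and details are made up for discussion:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.StandardCopyOption;
    import java.nio.file.StandardOpenOption;

    class RotatingIncrementalFile {
        private final Path baseFile;            // e.g. IncrementalHisto.dump
        private final long maxFileSizeBytes;
        private int nextSuffix = 0;

        RotatingIncrementalFile(Path baseFile, long maxFileSizeBytes) {
            this.baseFile = baseFile;
            this.maxFileSizeBytes = maxFileSizeBytes;
        }

        void append(byte[] chunkData) throws IOException {
            if (Files.exists(baseFile)
                    && Files.size(baseFile) + chunkData.length > maxFileSizeBytes) {
                // maxfilesize reached: keep the full file under a numbered suffix
                // (IncrementalHisto.dump.0, .1, ...) and start a new one
                Files.move(baseFile,
                           baseFile.resolveSibling(baseFile.getFileName() + "." + nextSuffix++),
                           StandardCopyOption.REPLACE_EXISTING);
            }
            Files.write(baseFile, chunkData,
                        StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        }
    }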



what do you think?

> I think it is not a bad idea.
> In general, the new incremental feature design does not look simple and clear enough.
> It feels like another step of simplification is needed.

> What about getting rid of the maxfilesize option?
> Then each chunk can be recorded to a separate file IncrementalHisto.dump.<chunk_number>.
> A couple of questions to clarify:
      > - Do we want all chunks or just the latest chunk to be saved?

I think it is usually not required to save all chunks. The chunks are cumulative, so the newest one contains all the info that the older ones have.
        But having old chunks may help the user see how objects are distributed, because each chunk covers a fixed proportion of the heap, so the difference between a chunk and its predecessor tells the object distribution of the newly scanned portion of the heap.
        The question is: do you think this info is necessary? If not, I agree we can get rid of maxfilesize.

      > - If we save all chunks, then what is the point of having the full dump recorded as well?

         IMO, the incremental histo solves two problems: <1> jmap -histo may get stuck if the heap is large, so it is useful to get an intermediate result; <2> the incremental info may help the user understand the object distribution of some portion of the heap.
         And I agree that if the full dump is obtained successfully, the chunks become less useful.



> The advantage of this approach is that there is no need to describe:
      > - the relationship between chunksize and maxfilesize
      > - the recording behavior for multiple chunks in the incremental file
      > - which chunks have been recorded into the incremental file

I agree that maxfilesize may not be useful, because the histo data for the chunks is usually not large. And it sounds good to me to save chunk data in separate files named IncrementalHisto.dump.<chunk_number>.
So the question is how many chunks do you think we need to save? I think the latest chunk is a must, and maybe the previous 3-5 ones?
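Something like this sketch, where the keep count and file names are just assumptions for discussion:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class ChunkFileWriter {
        private final Path dir;
        private final int keep;          // e.g. keep the latest 5 chunk files
        private int chunkNumber = 0;

        ChunkFileWriter(Path dir, int keep) {
            this.dir = dir;
            this.keep = keep;
        }

        void writeChunk(byte[] histogramData) throws IOException {
            Files.write(dir.resolve("IncrementalHisto.dump." + chunkNumber), histogramData);
            int obsolete = chunkNumber - keep;   // drop chunks that fell out of the window
            if (obsolete >= 0) {
                Files.deleteIfExists(dir.resolve("IncrementalHisto.dump." + obsolete));
            }
            chunkNumber++;
        }
    }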


> But again, this still needs to be clearly specified.
> It would be nice to reach a consensus on a design first.

Totally agree :)

> Thanks,
> Serguei

Again, Thanks for your comments

BRs,
Lin



Thanks,

Lin

________________________________________

From: serguei.spitsyn at oracle.com <serguei.spitsyn at oracle.com>
Sent: Saturday, May 11, 2019 2:17:41 AM
To: 臧琳; Hohensee, Paul; JC Beyler
Cc: serviceability-dev at openjdk.java.net
Subject: Re: [RFR]8215623: Add incremental dump for jmap histo



Dear Lin,



Sorry for the late reply.

I've edited the CSR a little bit to fix some incorrect spots.

Now, a couple of spots are not clear to me.





 > - incremental[:<file_name>], enable the incremental dump of heap, dumped
 >   data will be saved to, by default it is "IncrementalHisto.dump"

  Q1: Should the <file_name> be full path or short name?
      Is there any default path? What is the path of the "IncrementalHisto.dump" file?

 > - chunksize=<N>, size of objects (in KB) will be dumped in one chunk.

  Q2: Should it be chunk of dump, not chunk of objects?

 > - maxfilesize=<N>, size of the incremental data dump file (in KB), when data size
 >   is larger than maxfilesize, the file is erased and latest data will be written.

  Q3: What is the relation and what are the limitations between chunksize and maxfilesize?
      Should the maxfilesize be a multiple of the chunksize?

  Q4: The sentence "the file is erased and latest data will be written" is not clear enough.
      Why does the whole file need to be erased?
      Should the incremental file behave like a cyclic buffer?
      If so, then only the next chunk needs to be erased.
      Then the chunks need to be numbered in order, so the earliest one can be found.
      (I do not want you to accept my suggestions right away. It is just a discussion point.
       You need to prove that your approach is good and clean enough.)



If we resolve the questions (or get into agreement) then I'll update the CSR as needed.



Thanks,

Serguei





On 5/5/19 00:34, 臧琳 wrote:

Dear All,

      I have updated the CSR at https://bugs.openjdk.java.net/browse/JDK-8222319

      May I ask your help to review it?

      When it is finalized, I will refine the webrev.



BRs,

Lin



Dear Serguei,

          Thanks a lot for your review.







   System.err.println("      incremental dump support:");
+        System.err.println("        chunkcount=<N>    object number counted (in Kilo) to trigger incremental dump");
+        System.err.println("        maxfilesize=<N>   size limit of incremental dump file (in KB)");





From this description it is not clear at all what chunkcount means.
Is it to define how many heap objects are dumped in one chunk?
If so, would it be better to name it chunksize instead, where chunksize is measured in heap objects?
Then would it be better to use the same units to define the maxfilesize as well?
(I'm not insisting on this, just asking.)

The original meaning of "chunkcount" is how many objects are dumped in one chunk, and "maxfilesize" is the size limit of the dump file.
For example, "chunkcount=1, maxfilesize=10" means that intermediate data will be written to the dump file for every 1000 objects, and
when the dump file grows larger than 10 KB, the file is erased and rewritten with the latest dumped data.

The reason I didn't use an object count to control the dump file size is that there can be humongous objects, which may make the file too large.
Do you think using object size instead of chunkcount is a good option? Then the two options could use the same units.
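Just to make the original semantics concrete, a small sketch (hypothetical names, not the webrev code):

    // chunkcount is in kilo-objects, maxfilesize in KB: dump the cumulative
    // histogram after every chunkcount*1000 visited objects; if the dump file
    // has grown past maxfilesize, truncate it and rewrite only the latest data.
    class ChunkCountTrigger {
        private final long objectsPerDump;
        private final long maxFileSizeBytes;
        private long visited = 0;

        ChunkCountTrigger(long chunkCount, long maxFileSizeKB) {
            this.objectsPerDump = chunkCount * 1000L;
            this.maxFileSizeBytes = maxFileSizeKB * 1024L;
        }

        // called once per visited object
        boolean shouldDump() {
            return ++visited % objectsPerDump == 0;
        }

        boolean shouldTruncate(long currentFileSizeBytes) {
            return currentFileSizeBytes > maxFileSizeBytes;
        }
    }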

BRs,

Lin




