Low-Overhead Heap Profiling
JC Beyler
jcbeyler at google.com
Mon Oct 23 15:27:50 UTC 2017
Dear all,
Small update this week with this new webrev:
- http://cr.openjdk.java.net/~rasbold/8171119/webrev.13/
- Incremental is here:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/
I applied the code changes shown by Robbin last week and refactored
collectedHeap.cpp:
http://cr.openjdk.java.net/~rasbold/8171119/webrev.12_13/src/hotspot/share/gc/shared/collectedHeap.cpp.patch
The original code had become a bit too complex in my opinion, with
handle_heap_sampling doing too many things. So I subdivided the logic
into two smaller methods and moved some of it out to make it clearer.
Hopefully it is :)
Let me know if you have any questions/comments :)
Jc
On Mon, Oct 16, 2017 at 9:34 AM, JC Beyler <jcbeyler at google.com> wrote:
> Hi Robbin,
>
> That is because version 11 to 12 was only a test change. I was going to
> write about it and say here are the webrev links:
> Incremental:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/
>
> Full webrev:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.12/
>
> This change focused only on refactoring the tests to be more manageable,
> readable, and maintainable. As all the tests look at allocations, I moved
> common code to a java class:
> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11_12/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitor.java.patch
>
> And then most tests call into that class to turn sampling on/off,
> allocate, etc. This has removed almost 500 lines of test code, so I'm
> happy about that.
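As a rough illustration of the shared-helper idea (the class and method names here are hypothetical, not the actual HeapMonitor.java from the webrev, which drives sampling through native JVMTI calls):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a shared test helper. In the real webrev the
// sampling toggle is a native JVMTI call; a volatile flag stands in here.
public class HeapMonitorSketch {
  private static volatile boolean samplingEnabled;

  public static void enableSampling()  { samplingEnabled = true;  }
  public static void disableSampling() { samplingEnabled = false; }
  public static boolean isSamplingEnabled() { return samplingEnabled; }

  // Allocate `count` int arrays of `size` elements and keep them live
  // so the allocations cannot be optimized away.
  public static List<int[]> allocate(int count, int size) {
    List<int[]> keep = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      keep.add(new int[size]);
    }
    return keep;
  }
}
```

Each test then reduces to a few calls: enable sampling, allocate, check the samples, disable sampling.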
>
> Thanks for your changes; they removed a few relics of previous versions :).
> I've already integrated them into my code and will make a new webrev at the
> end of this week with a bit of refactoring of the code handling the TLAB
> slow path. I find it could be easier to follow, so I'm going to take a stab
> at it this week.
>
> Any other issues/comments?
>
> Thanks!
> Jc
>
>
> On Mon, Oct 16, 2017 at 8:46 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>
>> Hi JC,
>>
>> I saw a webrev.12 in the directory with only test changes (11->12), so I
>> took that version.
>> I had a look and tested the tests; they worked fine!
>>
>> First glance at the code (looking at full v12) some minor things below,
>> mostly unused stuff.
>>
>> Thanks, Robbin
>>
>> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.cpp
>> --- a/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16
>> 16:54:06 2017 +0200
>> +++ b/src/hotspot/share/runtime/heapMonitoring.cpp Mon Oct 16
>> 17:42:42 2017 +0200
>> @@ -211,2 +211,3 @@
>> void initialize(int max_storage) {
>> + // validate max_storage to sane value ? What would 0 mean ?
>> MutexLocker mu(HeapMonitor_lock);
>> @@ -227,8 +228,4 @@
>> bool initialized() { return _initialized; }
>> - volatile bool *initialized_address() { return &_initialized; }
>>
>> private:
>> - // Protects the traces currently sampled (below).
>> - volatile intptr_t _stack_storage_lock[1];
>> -
>> // The traces currently sampled.
>> @@ -313,3 +310,2 @@
>> _initialized(false) {
>> - _stack_storage_lock[0] = 0;
>> }
>> @@ -532,13 +528,2 @@
>>
>> -// Delegate the initialization question to the underlying storage system.
>> -bool HeapMonitoring::initialized() {
>> - return StackTraceStorage::storage()->initialized();
>> -}
>> -
>> -// Delegate the initialization question to the underlying storage system.
>> -bool *HeapMonitoring::initialized_address() {
>> -    return const_cast<bool*>(StackTraceStorage::storage()->initialized_address());
>> -}
>> -
>> void HeapMonitoring::get_live_traces(jvmtiStackTraces *traces) {
>> diff -r 9047e0d726d6 src/hotspot/share/runtime/heapMonitoring.hpp
>> --- a/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16
>> 16:54:06 2017 +0200
>> +++ b/src/hotspot/share/runtime/heapMonitoring.hpp Mon Oct 16
>> 17:42:42 2017 +0200
>> @@ -35,3 +35,2 @@
>> static uint64_t _rnd;
>> - static bool _initialized;
>> static jint _monitoring_rate;
>> @@ -92,7 +91,2 @@
>>
>> -   // Is the profiler initialized and where is the address to the initialized
>> -   // boolean.
>> - static bool initialized();
>> - static bool *initialized_address();
>> -
>> // Called when o is to be sampled from a given thread and a given size.
>>
>>
>>
>> On 10/10/2017 12:57 AM, JC Beyler wrote:
>>
>>> Dear all,
>>>
>>> Thread-safety is back!! Here is the update webrev:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/
>>>
>>> Full webrev is here:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.11/
>>>
>>> In order to really test the thread safety, I needed to add it, so I
>>> thought now was a good time. It required a few changes to the creation
>>> path to ensure correctness and safety. We now keep the static pointer but
>>> clear the data internally, so re-initialization will be a bit more costly
>>> than before. I don't think this is a common use case, so I did not
>>> consider it a problem. I used the internal MutexLocker; I think I used it
>>> correctly, let me know.
>>>
>>> I also added three tests:
>>>
>>> 1) Stack depth test:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStackDepthTest.java.patch
>>>
>>> This test shows that the maximum stack depth system is working.
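The idea behind such a test can be sketched roughly as follows (a hypothetical illustration, not the actual HeapMonitorStackDepthTest.java): recurse to a known depth, allocate there, and then check that the recorded trace never exceeds the configured maximum depth.

```java
// Hypothetical sketch: build a call stack of a known depth before
// allocating, so the sampled allocation sits `depth` frames deep.
// The real test would then verify the recorded frame count is capped
// at the configured maximum stack depth.
public class StackDepthSketch {
  public static int[] allocateAtDepth(int depth, int arraySize) {
    if (depth <= 1) {
      return new int[arraySize]; // sampled allocation happens here
    }
    return allocateAtDepth(depth - 1, arraySize);
  }
}
```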
>>>
>>> 2) Thread safety:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadTest.java.patch
>>>
>>> The test creates 24 threads that all allocate at the same time. The
>>> test then checks that it finds samples from all the threads.
>>>
>>> 3) Thread on/off safety
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10_11/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorThreadOnOffTest.java.patch
>>>
>>> The test creates 24 threads that all allocate a bunch of memory. Then
>>> another thread turns the sampling on/off.
>>>
>>> Btw, both tests 2 & 3 failed without the locks.
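The on/off stress pattern can be sketched roughly like this (a hypothetical model, not the actual HeapMonitorThreadOnOffTest.java; in the real test the flag flip is a native JVMTI sampling enable/disable call, while a plain AtomicBoolean stands in for it here):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: allocating threads race against one thread that
// flips a sampling flag, mimicking the on/off safety test's shape.
public class ThreadOnOffSketch {
  public static long run(int nThreads, int iterations) throws InterruptedException {
    AtomicBoolean sampling = new AtomicBoolean(true);
    AtomicLong allocations = new AtomicLong();
    List<Thread> threads = new ArrayList<>();
    for (int t = 0; t < nThreads; t++) {
      threads.add(new Thread(() -> {
        List<int[]> keep = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
          keep.add(new int[64]);          // allocation that may be sampled
          allocations.incrementAndGet();
        }
      }));
    }
    Thread toggler = new Thread(() -> {
      for (int i = 0; i < iterations; i++) {
        sampling.set(!sampling.get());    // stands in for enable/disable sampling
      }
    });
    threads.forEach(Thread::start);
    toggler.start();
    for (Thread t : threads) t.join();
    toggler.join();
    return allocations.get();
  }
}
```

The point of the real test is simply that this pattern must complete without crashing the VM, which is exactly what failed before the locks went in.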
>>>
>>> As I worked on this, I saw a lot of places where the tests do very
>>> similar things, so I'm going to clean up the code a bit and make a
>>> HeapAllocator class that all tests can call directly. This will greatly
>>> simplify the code.
>>>
>>> Thanks for any comments/criticisms!
>>> Jc
>>>
>>>
>>> On Mon, Oct 2, 2017 at 8:52 PM, JC Beyler <jcbeyler at google.com> wrote:
>>>
>>> Dear all,
>>>
>>> Small update to the webrev:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.09_10/
>>>
>>> Full webrev is here:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/
>>>
>>> I updated a bit of the naming, removed a TODO comment, and added a
>>> test for the sampling rate. I also updated the maximum stack depth to
>>> 1024; there is no reason to keep it so small. I did a micro benchmark
>>> that tests the overhead and it seems relatively the same.
>>>
>>> I compared allocations from a stack depth of 10 and allocations from
>>> a stack depth of 1024 (allocations are from the same helper method in
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.10/raw_files/new/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatRateTest.java):
>>> - For an array of 1 integer allocated in a loop; stack
>>> depth 1024 vs stack depth 10: 1% slower
>>> - For an array of 200k integers allocated in a loop;
>>> stack depth 1024 vs stack depth 10: 3% slower
>>>
>>> So basically we now move the maximum stack depth to 1024, but we only
>>> copy over the stack frames actually used.
>>>
>>> For the next webrev, I will add a stack depth test to show that it
>>> works, and probably put back the mutex locking so that we can see how
>>> difficult it is to keep thread safe.
>>>
>>> Let me know what you think!
>>> Jc
>>>
>>>
>>>
>>> On Mon, Sep 25, 2017 at 3:02 PM, JC Beyler <jcbeyler at google.com> wrote:
>>>
>>> Forgot to say that for my numbers:
>>> - Not in the test are the actual numbers I got for the various
>>> array sizes. I ran the program 30 times and parsed the output; here are
>>> the averages and standard deviations:
>>> 1000: 1.28% average; 1.13% standard deviation
>>> 10000: 1.59% average; 1.25% standard deviation
>>> 100000: 1.26% average; 1.26% standard deviation
>>>
>>> The 1000/10000/100000 are the sizes of the arrays being allocated.
>>> They are allocated 100k times and the sampling rate is 111 times the
>>> size of the array.
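For a rough sanity check of such numbers: with a byte-based sampling rate, the expected sample count is about totalBytes / rate. A hypothetical back-of-the-envelope model (the byte figures in the test below are illustrative assumptions that ignore object headers, not measurements from the webrev):

```java
// Hypothetical back-of-the-envelope check of sampler statistics:
// expected samples ~ total bytes allocated / sampling rate in bytes.
public class ExpectedSamples {
  public static double expected(long allocations, long arrayBytes, long rateBytes) {
    return (double) (allocations * arrayBytes) / rateBytes;
  }

  // Relative error in percent between an observed and an expected count.
  public static double relativeErrorPercent(double observed, double expected) {
    return Math.abs(observed - expected) / expected * 100.0;
  }
}
```

With a rate of 111 times the array size, the expected count is simply allocations / 111, independent of the array size, which makes the observed 1-2% errors easy to interpret.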
>>>
>>> Thanks!
>>> Jc
>>>
>>>
>>> On Mon, Sep 25, 2017 at 3:01 PM, JC Beyler <jcbeyler at google.com> wrote:
>>>
>>> Hi all,
>>>
>>> After a bit of a break, I am back working on this :). As
>>> before, here are two webrevs:
>>>
>>> - Full change set: http://cr.openjdk.java.net/~rasbold/8171119/webrev.09/
>>> - Compared to version 8: http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/
>>> (This version is compared to the version 8 I last showed,
>>> but ported to the new folder hierarchy)
>>>
>>> In this version I have:
>>> - Handled Thomas' comments from his email of 07/03:
>>> - Merged the logging to be standard
>>> - Fixed up the code a bit where asked
>>> - Added some notes about the code not being
>>> thread-safe yet
>>> - Removed additional dead code from the version that
>>> modifies interpreter/c1/c2
>>> - Fixed compiler issues so that it compiles with
>>> --disable-precompiled-header
>>> - Tested with ./configure --with-boot-jdk=<jdk8>
>>> --with-debug-level=slowdebug --disable-precompiled-headers
>>>
>>> Additionally, I added a test to check the sanity of the
>>> sampler: HeapMonitorStatCorrectnessTest
>>> (http://cr.openjdk.java.net/~rasbold/8171119/webrev.08_09/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatCorrectnessTest.java.patch)
>>> - This allocates a number of arrays and checks that we obtain the
>>> number of samples we want, with an accepted error of 5%. I tested it
>>> 100 times and it passed every time; I can test more if wanted.
>>> - Not in the test are the actual numbers I got for the
>>> various array sizes, I ran the program 30 times and parsed the output; here
>>> are the averages and
>>> standard deviation:
>>> 1000: 1.28% average; 1.13% standard deviation
>>> 10000: 1.59% average; 1.25% standard deviation
>>> 100000: 1.26% average; 1.26% standard deviation
>>>
>>> What this means is that we were always at about 1~2% of the
>>> number of samples the test expected.
>>>
>>> Let me know what you think,
>>> Jc
>>>
>>> On Wed, Jul 5, 2017 at 9:31 PM, JC Beyler <jcbeyler at google.com> wrote:
>>>
>>> Hi all,
>>>
>>> I apologize, I have not yet handled your remarks, but I thought this
>>> new webrev would also be useful to see and comment on.
>>>
>>> Here is the latest webrev; it is generated slightly differently than
>>> the others since I'm now using webrev.ksh without the -N option:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.08/
>>>
>>> And the webrev.07 to webrev.08 diff is here:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07_08/
>>>
>>> (Let me know if it works well)
>>>
>>> It's a small change between versions but it:
>>> - provides a fix that makes the average sample rate correct (more
>>> on that below)
>>> - fixes the code to actually play nicely with the fast TLAB refill
>>> - cleans up the JVMTI text a bit and now uses jvmtiFrameInfo
>>> - moves the capability to be onload solo
>>>
>>> With this webrev, I've done a small study of the random
>>> number generator we use here for the sampling rate. I took a small program
>>> and it can be simplified to:
>>>
>>> for (outer loop)
>>> for (inner loop)
>>> int[] tmp = new int[arraySize];
>>>
>>> - I've fixed the outer and inner loops to 800 for this experiment,
>>> meaning we allocate an array of a given size 640,000 times.
>>>
>>> - Each program provides the average sample size used for
>>> the whole execution
>>>
>>> - Then, I ran each variation 30 times and then
>>> calculated the average of the average sample size used for various array
>>> sizes. I selected the array size to
>>> be one of the following: 1, 10, 100, 1000.
>>>
>>> - When compared to the 512kb target, the average sample size over
>>> 30 runs was off by:
>>> 1: 4.62% of error
>>> 10: 3.09% of error
>>> 100: 0.36% of error
>>> 1000: 0.1% of error
>>> 10000: 0.03% of error
>>>
>>> What it shows is that, depending on the number of samples, the
>>> average does get better. This is because with an allocation of 1
>>> element per array, it will take longer to hit one of the thresholds.
>>> This is seen by looking at the sample count statistic I put in. For
>>> the same number of iterations (800 * 800), the different array sizes
>>> produce:
>>> 1: 62 samples
>>> 10: 125 samples
>>> 100: 788 samples
>>> 1000: 6166 samples
>>> 10000: 57721 samples
>>>
>>> And of course, the more samples you have, the more sample intervals
>>> you draw, which means that your average gets closer to the target.
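That convergence is what you would expect if the sampler draws each next sample interval from an exponential distribution around the chosen mean, a common way to implement Poisson-style heap sampling. A minimal sketch of such a draw (the use of java.util.Random and the class shape are illustrative assumptions, not the VM's actual PRNG):

```java
import java.util.Random;

// Hypothetical sketch of picking the next byte interval at which to
// sample, drawn from an exponential distribution with the configured
// mean. Small arrays accumulate bytes slowly, so they hit the threshold
// rarely, which is why runs with tiny arrays see far fewer samples.
public class SampleIntervalSketch {
  private final Random rnd = new Random();
  private final long meanBytes;

  public SampleIntervalSketch(long meanBytes) { this.meanBytes = meanBytes; }

  public long nextInterval() {
    double u = rnd.nextDouble();
    // -ln(1 - U) has mean 1 for U uniform in [0, 1), so the result
    // has mean ~meanBytes; (1 - u) avoids log(0).
    return (long) Math.ceil(-Math.log(1.0 - u) * meanBytes);
  }
}
```

Averaging many such draws converges to the mean, mirroring the error figures above shrinking as the sample count grows.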
>>>
>>> Thanks,
>>> Jc
>>>
>>> On Thu, Jun 29, 2017 at 10:01 PM, JC Beyler <jcbeyler at google.com> wrote:
>>>
>>> Thanks Robbin,
>>>
>>> This seems to have worked. When I have the next
>>> webrev ready, we will find out but I'm fairly confident it will work!
>>>
>>> Thanks again!
>>> Jc
>>>
>>> On Wed, Jun 28, 2017 at 11:46 PM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>>
>>> Hi JC,
>>>
>>> On 06/29/2017 12:15 AM, JC Beyler wrote:
>>>
>>> B) Incremental changes
>>>
>>>
>>> I guess the most common work flow here is using
>>> mq :
>>> hg qnew fix_v1
>>> edit files
>>> hg qrefresh
>>> hg qnew fix_v2
>>> edit files
>>> hg qrefresh
>>>
>>> if you do hg log you will see 2 commits
>>>
>>> webrev.ksh -r -2 -o my_inc_v1_v2
>>> webrev.ksh -o my_full_v2
>>>
>>>
>>> In your .hgrc you might need:
>>> [extensions]
>>> mq =
>>>
>>> /Robbin
>>>
>>>
>>> Again, another newbie question here...
>>>
>>> For showing the incremental changes, is
>>> there a link that explains how to do that? I apologize for my newbie
>>> questions all the time :)
>>>
>>> Right now, I do:
>>>
>>> ksh ../webrev.ksh -m -N
>>>
>>> That generates a webrev.zip and send it to
>>> Chuck Rasbold. He then uploads it to a new webrev.
>>>
>>> I tried committing my change and adding a
>>> small change. Then if I just do ksh ../webrev.ksh without any options, it
>>> seems to produce a similar
>>> page but now with only the changes I had (so
>>> the 06-07 comparison you were talking about) and a changeset that has it
>>> all. I imagine that is
>>> what you meant.
>>>
>>> Which means that my workflow would become:
>>>
>>> 1) Make changes
>>> 2) Make a webrev without any options to show
>>> just the differences with the tip
>>> 3) Amend my changes to my local commit so
>>> that I have it done with
>>> 4) Go to 1
>>>
>>> Does that seem correct to you?
>>>
>>> Note that when I do this, I only see the
>>> full change of a file in the full change set (Side note here: now the page
>>> says change set and not
>>> patch, which is maybe why Serguei was having
>>> issues?).
>>>
>>> Thanks!
>>> Jc
>>>
>>>
>>>
>>> On Wed, Jun 28, 2017 at 1:12 AM, Robbin Ehn <robbin.ehn at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> On 06/28/2017 12:04 AM, JC Beyler wrote:
>>>
>>> Dear Thomas et al,
>>>
>>> Here is the newest webrev:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/
>>>
>>>
>>>
>>> You have some more bits in there, but generally this looks good and
>>> really nice with more tests.
>>> I'll do a deep dive and re-test this when I get back from my long
>>> vacation, with whatever patch version you have then.
>>>
>>> Also I think it's time you provide
>>> incremental (v06->07 changes) as well as complete change-sets.
>>>
>>> Thanks, Robbin
>>>
>>>
>>>
>>>
>>> Thomas, I "think" I have answered
>>> all your remarks. The summary is:
>>>
>>> - The statistic system is up and
>>> provides insight on what the heap sampler is doing
>>> - I've noticed that, though the sampling rate is at the right
>>> mean, we are missing some samples; I have not yet tracked down why
>>> (details below)
>>>
>>> - I've run a tiny benchmark that is the worst case: it is a very
>>> tight loop allocating a small array
>>> - In this case, I see no overhead when the system is off, so
>>> that is a good start :)
>>> - I see right now a high overhead in this case when sampling
>>> is on. This is not really too surprising, but I'm going to see if it
>>> is consistent with our internal implementation. The benchmark is
>>> really allocation stressful so I'm not too surprised, but I want to
>>> do the due diligence.
>>>
>>> - The statistic system is up and I have a new test:
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorStatTest.java.patch
>>> - I did a bit of a study about the random generator here;
>>> more details are below, but basically it seems to work well
>>>
>>> - I added a capability but since
>>> this is the first time doing this, I was not sure I did it right
>>> - I did add a test though for
>>> it and the test seems to do what I expect (all methods are failing with the
>>> JVMTI_ERROR_MUST_POSSESS_CAPABILITY error).
>>> - http://cr.openjdk.java.net/~rasbold/8171119/webrev.07/test/serviceability/jvmti/HeapMonitor/MyPackage/HeapMonitorNoCapabilityTest.java.patch
>>>
>>> - I still need to figure out
>>> what to do about the multi-agent vs single-agent issue
>>>
>>> - As far as measurements, it
>>> seems I still need to look at:
>>> - Why we do the 20 random
>>> calls first, are they necessary?
>>> - Look at the mean of the
>>> sampling rate that the random generator does and also what is actually
>>> sampled
>>> - What is the overhead in
>>> terms of memory/performance when on?
>>>
>>> I have inlined my answers, I think
>>> I got them all in the new webrev, let me know your thoughts.
>>>
>>> Thanks again!
>>> Jc
>>>
>>>
>>> On Fri, Jun 23, 2017 at 3:52 AM, Thomas Schatzl
>>> <thomas.schatzl at oracle.com> wrote:
>>>
>>> Hi,
>>>
>>> On Wed, 2017-06-21 at 13:45
>>> -0700, JC Beyler wrote:
>>> > Hi all,
>>> >
>>> > First off: Thanks again to
>>> Robbin and Thomas for their reviews :)
>>> >
>>> > Next, I've uploaded a new
>>> webrev:
>>> >
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.06/
>>>
>>> >
>>> > Here is an update:
>>> >
>>> > - @Robbin, I forgot to say
>>> that yes I need to look at implementing
>>> > this for the other
>>> architectures and testing it before it is all
>>> > ready to go. Is it common to
>>> have it working on all possible
>>> > combinations or is there a
>>> subset that I should be doing first and we
>>> > can do the others later?
>>> > - I've tested slowdebug,
>>> built and ran the JTreg tests I wrote with
>>> > slowdebug and fixed a few
>>> more issues
>>> > - I've refactored a bit of
>>> the code following Thomas' comments
>>> > - I think I've handled
>>> all the comments from Thomas (I put
>>> > comments inline below for
>>> the specifics)
>>>
>>> Thanks for handling all those.
>>>
>>> > - Following Thomas' comments
>>> on statistics, I want to add some
>>> > quality assurance tests and
>>> find that the easiest way would be to
>>> > have a few counters of what
>>> is happening in the sampler and expose
>>> > that to the user.
>>> > - I'll be adding that in
>>> the next version if no one sees any
>>> > objections to that.
>>> > - This will allow me to
>>> add a sanity test in JTreg about number of
>>> > samples and average of
>>> sampling rate
>>> >
>>> > @Thomas: I had a few
>>> questions that I inlined below but I will
>>> > summarize the "bigger ones"
>>> here:
>>> > - You mentioned constants
>>> are not using the right conventions, I
>>> > looked around and didn't see
>>> any convention except normal naming then
>>> > for static constants. Is
>>> that right?
>>>
>>> I looked through
>>> https://wiki.openjdk.java.net/display/HotSpot/StyleGuide
>>> and the rule is to "follow an existing pattern and must have a
>>> distinct appearance from other names". Which does not help a lot I
>>> guess :/ The GC team started using upper camel case, e.g.
>>> SomeOtherConstant, but very likely this is probably not applied
>>> consistently throughout. So I am fine with not adding another style
>>> (like kMaxStackDepth with the "k" in front with some unknown meaning).
>>>
>>> (Chances are you will find that style somewhere used anyway too,
>>> apologies if so :/)
>>>
>>>
>>> Thanks for that link, now I know
>>> where to look. I used the upper camel case in my code as well then :) I
>>> should have gotten them all.
>>>
>>>
>>> > PS: I've also inlined my
>>> answers to Thomas below:
>>> >
>>> > On Tue, Jun 13, 2017 at 8:03 AM, Thomas Schatzl
>>> > <thomas.schatzl at oracle.com> wrote:
>>> > > Hi all,
>>> > >
>>> > > On Mon, 2017-06-12 at
>>> 11:11 -0700, JC Beyler wrote:
>>> > > > Dear all,
>>> > > >
>>> > > > I've continued working
>>> on this and have done the following
>>> > > webrev:
>>> > > >
>>> http://cr.openjdk.java.net/~rasbold/8171119/webrev.05/
>>>
>>> > >
>>> > > [...]
>>> > > > Things I still need to
>>> do:
>>> > > > - Have to fix that
>>> TLAB case for the FastTLABRefill
>>> > > > - Have to start
>>> looking at the data to see that it is
>>> > > consistent and does
>>> gather the right samples, right frequency, etc.
>>> > > > - Have to check the
>>> GC elements and what that produces
>>> > > > - Run a slowdebug
>>> run and ensure I fixed all those issues you
>>> > > saw > Robbin
>>> > > >
>>> > > > Thanks for looking at
>>> the webrev and have a great week!
>>> > >
>>> > > scratching a bit on the
>>> surface of this change, so apologies for
>>> > > rather shallow comments:
>>> > >
>>> > > -
>>> macroAssembler_x86.cpp:5604: while this is compiler code, and I
>>> > > am not sure this is
>>> final, please avoid littering the code with
>>> > > TODO remarks :) They tend
>>> to be candidates for later wtf moments
>>> > > only.
>>> > >
>>> > > Just file a CR for that.
>>> > >
>>> > Newcomer question: what is
>>> a CR and not sure I have the rights to do
>>> > that yet ? :)
>>>
>>> Apologies. CR is a change request; this suggests filing a bug in the
>>> bug tracker. And you are right, you can't just create a new account in
>>> the OpenJDK JIRA yourself. :(
>>>
>>>
>>> Ok good to know, I'll continue with
>>> my own todo list but I'll work hard on not letting it slip in the webrevs
>>> anymore :)
>>>
>>>
>>> I was mostly referring to the
>>> "... but it is a TODO" part of that
>>> comment in
>>> macroassembler_x86.cpp. Comments about the why of the code
>>> are appreciated.
>>>
>>> [Note that I now understand that this is to some degree still work in
>>> progress. As long as the final changeset does not contain TODOs I am
>>> fine (and it's not a hard objection; rather, their use in "final" code
>>> is typically limited in my experience)]
>>>
>>> 5603 // Currently, if this
>>> happens, just set back the actual end to
>>> where it was.
>>> 5604 // We miss a chance to
>>> sample here.
>>>
>>> Would be okay, if explaining
>>> "this" and the "why" of missing a chance
>>> to sample here would be best.
>>>
>>> Like maybe:
>>>
>>> // If we needed to refill
>>> TLABs, just set the actual end point to
>>> // the end of the TLAB again.
>>> We do not sample here although we could.
>>>
>>> Done with your comment, it works
>>> well in my mind.
>>>
>>> I am not sure whether "miss a chance to sample" meant "we could, but
>>> consciously don't because it's not that useful" or "it would be
>>> necessary but we don't because it's too complicated to do".
>>>
>>> Looking at the original comment once more, I am also not sure if that
>>> comment shouldn't be referring to the "end" variable (not actual_end),
>>> because that's the variable that is responsible for taking the sampling
>>> path? (Going from the member description of ThreadLocalAllocBuffer.)
>>>
>>>
>>> I've moved this code and it no
>>> longer shows up here but the rationale and answer was:
>>>
>>> So... yes, end is the variable provoking the sampling; actual_end is
>>> the actual end of the TLAB.
>>>
>>> What was happening here is that the code resets _end to point to the
>>> end of the new TLAB. Because we now have _end for sampling and
>>> _actual_end for the actual end, we need to update _actual_end as well.
>>>
>>> Normally, were we to do the real work here, we would calculate the
>>> (end - start) offset, then:
>>> - Set the new end to: start + (old_end - old_start)
>>> - Set the actual end like we do here now, because it is the
>>> actual end.
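A hypothetical model of the bookkeeping being described (plain arithmetic, not the HotSpot C++; the field names just mirror the discussion): `end` is lowered below `actual_end` so the bump-pointer fast path bails out to the slow path when the next sample is due, even though TLAB space remains.

```java
// Hypothetical model of TLAB sampling bookkeeping. `end` is the value
// the fast path compares against; `actualEnd` is the real TLAB limit.
public class TlabModel {
  public long start, top, end, actualEnd;

  public void fill(long newStart, long size, long bytesUntilSample) {
    start = newStart;
    top = newStart;
    actualEnd = newStart + size;                       // real end of the TLAB
    end = Math.min(actualEnd, top + bytesUntilSample); // sampling trigger point
  }

  // Returns true if the bump-pointer fast path fits; false means the
  // slow path is taken, where the VM would record a sample and move
  // `end` back toward `actualEnd`.
  public boolean fastAllocate(long bytes) {
    if (top + bytes <= end) {
      top += bytes;
      return true;
    }
    return false;
  }
}
```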
>>>
>>> Why is this no longer done here?
>>> - I was still debating which
>>> path to take:
>>> - Do it in the fast refill
>>> code, it has its perks:
>>> - In a world where fast
>>> refills are happening all the time or a lot, we can augment there the code
>>> to do the sampling
>>> - Remember what we had as an
>>> end before leaving the slowpath and check on return
>>> - This is what I'm doing
>>> now, it removes the need to go fix up all fast refill paths but if you
>>> remain in fast refill paths,
>>> you won't get sampling. I
>>> have to think of the consequences
>>> of that, maybe a future change later on?
>>> - I have the
>>> statistics now so I'm going to study that
>>> -> By the way, though my statistics show I'm missing some
>>> samples, if I turn off FastTlabRefill the loss is the same, so for
>>> now it seems the issue does not come from there in my simple test.
>>>
>>>
>>>
>>> But maybe I am only confused and it's best to just leave the comment
>>> out. :)
>>>
>>> Thinking about it some more,
>>> doesn't this not-sampling in this case
>>> mean that sampling does not
>>> work in any collector that does inline TLAB
>>> allocation at the moment? (Or
>>> is inline TLAB alloc automatically
>>> disabled with sampling
>>> somehow?)
>>>
>>> That would indeed be a bigger
>>> TODO then :)
>>>
>>>
>>> Agreed, this remark made me think that perhaps, as a first step, the
>>> new way of doing it is better, but I did have to:
>>> - Remove the const from the ThreadLocalAllocBuffer remaining and
>>> hard_end methods
>>> - Move hard_end out of the header file to have a bit more logic there
>>>
>>> Please let me know what you think of it and whether you prefer it this
>>> way or changing the fast refills. (I prefer this way now because it is
>>> more incremental.)
>>>
>>>
>>> > > - calling
>>> HeapMonitoring::do_weak_oops() (which should probably be
>>> > > called weak_oops_do() like
>>> other similar methods) only if string
>>> > > deduplication is enabled
>>> (in g1CollectedHeap.cpp:4511) seems wrong.
>>> >
>>> > The call should be at least
>>> around 6 lines up outside the if.
>>> >
>>> > Preferentially in a method
>>> like process_weak_jni_handles(), including
>>> > additional logging. (No new
>>> (G1) gc phase without minimal logging
>>> > :)).
>>> > Done but really not sure
>>> because:
>>> >
>>> > I put for logging:
>>> > log_develop_trace(gc,
>>> freelist)("G1ConcRegionFreeing [other] : heap
>>> > monitoring");
>>>
>>> I would think that "gc, ref" would be more appropriate log tags for
>>> this, similar to jni handles.
>>> (I am also not sure what weak reference handling has to do with
>>> G1ConcRegionFreeing, so I am a bit puzzled)
>>>
>>>
>>> I was not sure what to put for the
>>> tags or really as the message. I cleaned it up a bit now to:
>>> log_develop_trace(gc,
>>> ref)("HeapSampling [other] : heap monitoring processing");
>>>
>>>
>>>
>>> > Since weak_jni_handles
>>> didn't have logging for me to be inspired
>>> > from, I did that, but I'm unconvinced this is what should be done.
>>>
>>> The JNI handle processing does have logging, but only in
>>> ReferenceProcessor::process_discovered_references().
>>> In process_weak_jni_handles() only overall time is measured (in a G1
>>> specific way, since only G1 supports disabling reference processing) :/
>>>
>>> The code in ReferenceProcessor
>>> prints both time taken
>>> referenceProcessor.cpp:254, as
>>> well as the count, but strangely only in
>>> debug VMs.
>>>
>>> I have no idea why this
>>> logging is that unimportant to only print that
>>> in a debug VM. However there
>>> are reviews out for changing this area a
>>> bit, so it might be useful to
>>> wait for that (JDK-8173335).
>>>
>>>
>>> I cleaned it up a bit anyway and
>>> now it returns the count of objects that are in the system.
>>>
>>>
>>> > > - the change doubles the
>>> size of
>>> > >
>>> CollectedHeap::allocate_from_tlab_slow() above the "small and nice"
>>> > > threshold. Maybe it could
>>> be refactored a bit.
>>> > Done I think, it looks
>>> better to me :).
>>>
>>> In
>>> ThreadLocalAllocBuffer::handle_sample() I think the
>>> set_back_actual_end()/pick_next_sample()
>>> calls could be hoisted out of
>>> the "if" :)
>>>
>>>
>>> Done!
>>>
>>>
>>> > > -
>>> referenceProcessor.cpp:261: the change should add logging about
>>> > > the number of references
>>> encountered, maybe after the corresponding
>>> > > "JNI weak reference count"
>>> log message.
>>> > Just to double check, are you saying that you'd like the heap
>>> > sampler to keep track of how many sampled objects were encountered
>>> > in HeapMonitoring::weak_oops_do?
>>> > - Would a return of the
>>> method with the number of handled
>>> > references and logging that
>>> work?
>>>
>>> Yes, it's fine if
>>> HeapMonitoring::weak_oops_do() only returned the
>>> number of processed weak oops.
>>>
>>>
>>> Done also (but I admit I have not
>>> tested the output yet) :)
>>>
>>>
>>> > - Additionally, would you
>>> prefer it in a separate block with its
>>> > GCTraceTime?
>>>
>>> Yes. Both kinds of information are interesting: while the time taken
>>> is typically more important, the next question would be why, and the
>>> number of references typically goes a long way there.
>>>
>>> See above though, it is
>>> probably best to wait a bit.
>>>
>>>
>>> Agreed that I "could" wait but, if it's ok, I'll just refactor/remove
>>> this when we get closer to something final. Either JDK-8173335 will
>>> have gone in by then and I will notice it, or it will go in soon after
>>> and I can change it then.
>>>
>>>
>>> > > -
>>> threadLocalAllocBuffer.cpp:331: one more "TODO"
>>> > Removed it and added it to
>>> my personal todos to look at.
>>> > > >
>>> > > -
>>> threadLocalAllocBuffer.hpp: ThreadLocalAllocBuffer class
>>> > > documentation should be
>>> updated about the sampling additions. I
>>> > > would have no clue what
>>> the difference between "actual_end" and
>>> > > "end" would be from the
>>> given information.
>>> > If you are talking about the
>>> comments in this file, I made them more
>>> > clear I hope in the new
>>> webrev. If it was somewhere else, let me know
>>> > where to change.
>>
>>
>