ZGC logs

Thu Apr 23 17:31:32 UTC 2020

On 4/22/20 12:46 PM, Per Liden wrote:
> Hi Kirk,
> 
> On 4/21/20 11:23 PM, Kirk Pepperdine wrote:
>> Hi,
>>
>> I’ve been looking at GC log data trying to reconcile some of the numbers.
>>
>> [4.719s][info ][gc,heap                     ] 
>> GC(3)                       Mark Start          Mark End               
>> Relocate Start         Relocate End           High               Low
>> [4.719s][info ][gc,heap                     ] GC(3)  Capacity:     
>> 1032M (25%)        1176M (29%)         1176M (29%)        1176M 
>> (29%)        1176M (29%)        1032M (25%)
>> [4.719s][info ][gc,heap                     ] GC(3)   Reserve:         
>> 42M (1%)              42M (1%)              42M (1%)              42M 
>> (1%)           42M (1%)           42M (1%)
>> [4.719s][info ][gc,heap                     ] GC(3)      Free:        
>> 3064M (75%)        2954M (72%)        3818M (93%)        3704M 
>> (90%)        3842M (94%)        2920M (71%)
>> [4.719s][info ][gc,heap                     ] GC(3)      Used:         
>> 990M (24%)        1100M (27%)          236M (6%)            350M 
>> (9%)         1134M (28%)         212M (5%)
>> [4.719s][info ][gc,heap                     ] GC(3)      
>> Live:               -                          10M (0%)              
>> 10M (0%)              10M (0%)             -                  -
>> [4.719s][info ][gc,heap                     ] GC(3) 
>> Allocated:           -                        208M (5%)             
>> 240M (6%)            764M (19%)            -                  -
>> [4.719s][info ][gc,heap                     ] GC(3)   
>> Garbage:          -                        979M (24%)             83M 
>> (2%)                5M (0%)             -                  -
>> [4.719s][info ][gc,heap                     ] GC(3) Reclaimed:         
>> -                              -                       896M 
>> (22%)           974M (24%)            -                  -
>>
>>
>> If I understand what this log is telling me, there is initially 990M 
>> of data of which 10M is marked Live leaving 979M of garbage (-1 
>> rounding error). During the concurrent mark an additional 208M of data 
>> was allocated requiring an additional 144M from committed memory. By 
>> Mark End, used increase by 110M, not 208M. That leaves 98M unaccounted 
>> for. I have some ideas but I’d feel more comfortable if someone could 
>> offer a comment on how to reconcile the live, allocated, garbage and 
>> reclaimed numbers with used. For example, I would assume that of the 
>> 350M remaining at Relocate End is a mix of live and floating garbage. 
>> That said, I cannot seem to reconcile the Used, allocated, and  
>> reclaimed in a way that yields 350M. The calculation that seems to 
>> work is used at Mark End + allocated - recovered = used @ Relocate 
>> Start. That is 1100 + (240-208)-896 = 336. However applying that logic 
>> shifted over fails. 236 + (764-240) - (974-896) = 681 != 350.
>>
>> So I’m seem to be missing bits that my starting at zStat hasn’t 
>> cleared up.
>>
>> All comments appreciated.
> 
> What's confusing, as it doesn't show in the log, is that the "allocated" 
> number can be inflated because some page allocations where "undone". 
> This can happen in situations where, for example, two Java threads 
> competed to allocate a new shared medium page, both threads allocate one 
> medium page each, but only one thread will win the race to install that 
> page. The thread who lost the race will then immediately free its newly 
> allocated page (undo the allocation). However, we only increase the 
> "allocated" number when pages are allocated, but we don't decrease it 
> when we undo an allocation.
> 
> You can confirm that it's cause by "undo" by looking at the statistics 
> table and look for the "Memory: Undo Page Allocation" line, which should 
> show non-zero numbers. The need to undo allocations is usually a rare 
> event, at least on most workloads.
> 
> One could argue that the "allocated" number is correct (in the sense 
> that these pages where in fact allocated), but I think it would be more 
> helpful to log the "allocated - undone" number, as the undo part is more 
> of an artifact of how things work internally, and not what a user 
> expects to see.
> 
> Here's a patch to adjust "allocated" so that it takes undone allocations 
> into account. Feel free to try it out and see if the numbers make more 
> sense.
> 
> diff --git a/src/hotspot/share/gc/z/zPageAllocator.cpp 
> b/src/hotspot/share/gc/z/zPageAllocator.cpp
> --- a/src/hotspot/share/gc/z/zPageAllocator.cpp
> +++ b/src/hotspot/share/gc/z/zPageAllocator.cpp
> @@ -287,12 +287,14 @@
>   }
> 
>   void ZPageAllocator::decrease_used(size_t size, bool reclaimed) {
> +  // Only pages explicitly released with the reclaimed flag set
> +  // counts as reclaimed bytes. This flag is true when a worker
> +  // releases a page after relocation, and is false when we
> +  // release a page to undo an allocation.
>     if (reclaimed) {
> -    // Only pages explicitly released with the reclaimed flag set
> -    // counts as reclaimed bytes. This flag is typically true when
> -    // a worker releases a page after relocation, and is typically
> -    // false when we release a page to undo an allocation.
>       _reclaimed += size;
> +  } else {
> +    _allocated -= size;
>     }
>     _used -= size;
>     if (_used < _used_low) {
> 

Just a follow up, to say that he above fix has now been integrated.

http://hg.openjdk.java.net/jdk/jdk/rev/5736a55d1389

cheers,
Per