ZGC logs
Per Liden
per.liden at oracle.com
Thu Apr 23 17:31:32 UTC 2020
On 4/22/20 12:46 PM, Per Liden wrote:
> Hi Kirk,
>
> On 4/21/20 11:23 PM, Kirk Pepperdine wrote:
>> Hi,
>>
>> I’ve been looking at GC log data trying to reconcile some of the numbers.
>>
>> [4.719s][info ][gc,heap ]
>> GC(3) Mark Start Mark End
>> Relocate Start Relocate End High Low
>> [4.719s][info ][gc,heap ] GC(3) Capacity:
>> 1032M (25%) 1176M (29%) 1176M (29%) 1176M
>> (29%) 1176M (29%) 1032M (25%)
>> [4.719s][info ][gc,heap ] GC(3) Reserve:
>> 42M (1%) 42M (1%) 42M (1%) 42M
>> (1%) 42M (1%) 42M (1%)
>> [4.719s][info ][gc,heap ] GC(3) Free:
>> 3064M (75%) 2954M (72%) 3818M (93%) 3704M
>> (90%) 3842M (94%) 2920M (71%)
>> [4.719s][info ][gc,heap ] GC(3) Used:
>> 990M (24%) 1100M (27%) 236M (6%) 350M
>> (9%) 1134M (28%) 212M (5%)
>> [4.719s][info ][gc,heap ] GC(3)
>> Live: - 10M (0%)
>> 10M (0%) 10M (0%) - -
>> [4.719s][info ][gc,heap ] GC(3)
>> Allocated: - 208M (5%)
>> 240M (6%) 764M (19%) - -
>> [4.719s][info ][gc,heap ] GC(3)
>> Garbage: - 979M (24%) 83M
>> (2%) 5M (0%) - -
>> [4.719s][info ][gc,heap ] GC(3) Reclaimed:
>> - - 896M
>> (22%) 974M (24%) - -
>>
>>
>> If I understand what this log is telling me, there is initially 990M
>> of data of which 10M is marked Live leaving 979M of garbage (-1
>> rounding error). During the concurrent mark an additional 208M of data
>> was allocated requiring an additional 144M from committed memory. By
>> Mark End, used increase by 110M, not 208M. That leaves 98M unaccounted
>> for. I have some ideas but I’d feel more comfortable if someone could
>> offer a comment on how to reconcile the live, allocated, garbage and
>> reclaimed numbers with used. For example, I would assume that of the
>> 350M remaining at Relocate End is a mix of live and floating garbage.
>> That said, I cannot seem to reconcile the Used, allocated, and
>> reclaimed in a way that yields 350M. The calculation that seems to
>> work is used at Mark End + allocated - recovered = used @ Relocate
>> Start. That is 1100 + (240-208)-896 = 336. However applying that logic
>> shifted over fails. 236 + (764-240) - (974-896) = 681 != 350.
>>
>> So I’m seem to be missing bits that my starting at zStat hasn’t
>> cleared up.
>>
>> All comments appreciated.
>
> What's confusing, as it doesn't show in the log, is that the "allocated"
> number can be inflated because some page allocations where "undone".
> This can happen in situations where, for example, two Java threads
> competed to allocate a new shared medium page, both threads allocate one
> medium page each, but only one thread will win the race to install that
> page. The thread who lost the race will then immediately free its newly
> allocated page (undo the allocation). However, we only increase the
> "allocated" number when pages are allocated, but we don't decrease it
> when we undo an allocation.
>
> You can confirm that it's cause by "undo" by looking at the statistics
> table and look for the "Memory: Undo Page Allocation" line, which should
> show non-zero numbers. The need to undo allocations is usually a rare
> event, at least on most workloads.
>
> One could argue that the "allocated" number is correct (in the sense
> that these pages where in fact allocated), but I think it would be more
> helpful to log the "allocated - undone" number, as the undo part is more
> of an artifact of how things work internally, and not what a user
> expects to see.
>
> Here's a patch to adjust "allocated" so that it takes undone allocations
> into account. Feel free to try it out and see if the numbers make more
> sense.
>
> diff --git a/src/hotspot/share/gc/z/zPageAllocator.cpp
> b/src/hotspot/share/gc/z/zPageAllocator.cpp
> --- a/src/hotspot/share/gc/z/zPageAllocator.cpp
> +++ b/src/hotspot/share/gc/z/zPageAllocator.cpp
> @@ -287,12 +287,14 @@
> }
>
> void ZPageAllocator::decrease_used(size_t size, bool reclaimed) {
> + // Only pages explicitly released with the reclaimed flag set
> + // counts as reclaimed bytes. This flag is true when a worker
> + // releases a page after relocation, and is false when we
> + // release a page to undo an allocation.
> if (reclaimed) {
> - // Only pages explicitly released with the reclaimed flag set
> - // counts as reclaimed bytes. This flag is typically true when
> - // a worker releases a page after relocation, and is typically
> - // false when we release a page to undo an allocation.
> _reclaimed += size;
> + } else {
> + _allocated -= size;
> }
> _used -= size;
> if (_used < _used_low) {
>
Just a follow up, to say that he above fix has now been integrated.
http://hg.openjdk.java.net/jdk/jdk/rev/5736a55d1389
cheers,
Per
More information about the zgc-dev
mailing list