RFR (XXS): 7110173: GCNotifier::pushNotification publishes stale data.
John Cuthbertson
john.cuthbertson at oracle.com
Fri Nov 18 18:31:54 UTC 2011
Hi Mandy,
Yes, thanks, and thanks to everyone. This clarifies what needs to be
done. I'll make the change and a modified test case against all the
collectors.
JohnC
On 11/17/11 08:05, Mandy Chung wrote:
>
>
> On 11/17/2011 5:43 AM, Frederic Parain wrote:
>> Hi Mandy,
>>
>> It would have been nice to have notifications providing
>> detailed information about CMS different phases, but it
>> doesn't seem easy to do right now. I did the experiments
>> with CMS using JConsole, and the result is not really
>> satisfactory:
>> - several notifications are sent for each CMS cycle
>> but the notification lacks the information to identify
>> the CMS phase associated with the notification (worse
>> than that, they all claim they notify the "end of a major
>> GC cycle" which is certainly not true for CMS.
>> - according to the TraceCMSMemoryManagerStats constructor
>> code, some information reported by the notification
>> are inaccurate because the stats recorded for the current
>> phase are still in _current_gc_stat while the notification
>> code is using _last_gc_stat (so basically some notifications
>> contain data from the previous GC cycle instead of data
>> from the current cycle)
>
> I suggest to create a CR for this problem and fix it later.
>>
>> So I agree that moving the pushNotification to inside the "if
>> (countCollection)" statement would be a correct fix for CMS.
>>
>> +1 for the renaming of the 'countCollection' variable.
>>
>
> John - hope this clarifies it. Sending notification only if
> countCollection is true would be the right fix.
>
> Thanks
> Mandy
>
>> Fred
>>
>> On 11/16/11 7:23 PM, Mandy Chung wrote:
>>> John, Frederic,
>>>
>>> I looked at the implementation and I still think pushNotification
>>> should only be called *once* at the end of the GC cycle but not
>>> every MemoryManager::gc_end call (might be better to rename
>>> gc_end to something else).
>>>
>>> MemoryManager::gc_end is called at the destructor of
>>> TraceMemoryManagerStats and also TraceCMSMemoryManagerStats (
>>> a subclass of TraceMemoryManagerStats). For the stop-the-world
>>> collectors, there is a clear begin and end of GC cycle. But
>>> for CMS, it has 3 stop-the-world phases. That's the reason why
>>> it has a list of flags to specify what to record since at different
>>> CMS phase, it will record different stats. At the end of entire
>>> CMS cycle, it will increment the collector count and gc end time
>>> etc. So the GC notification should be pushed only at the end of
>>> one collection cycle (for CMS case - would be SWEEPING phase).
>>>
>>> Below is the code of the TraceCMSMemoryManagerStats constructor
>>> that would make it clear:
>>>
>>> switch (phase) {
>>> case CMSCollector::InitialMarking:
>>> initialize(true /* fullGC */ ,
>>> cause /* cause of the GC */,
>>> true /* recordGCBeginTime */,
>>> true /* recordPreGCUsage */,
>>> false /* recordPeakUsage */,
>>> false /* recordPostGCusage */,
>>> true /* recordAccumulatedGCTime */,
>>> false /* recordGCEndTime */,
>>> false /* countCollection */ );
>>> break;
>>>
>>> case CMSCollector::FinalMarking:
>>> initialize(true /* fullGC */ ,
>>> cause /* cause of the GC */,
>>> false /* recordGCBeginTime */,
>>> false /* recordPreGCUsage */,
>>> false /* recordPeakUsage */,
>>> false /* recordPostGCusage */,
>>> true /* recordAccumulatedGCTime */,
>>> false /* recordGCEndTime */,
>>> false /* countCollection */ );
>>> break;
>>>
>>> case CMSCollector::Sweeping:
>>> initialize(true /* fullGC */ ,
>>> cause /* cause of the GC */,
>>> false /* recordGCBeginTime */,
>>> false /* recordPreGCUsage */,
>>> true /* recordPeakUsage */,
>>> true /* recordPostGCusage */,
>>> false /* recordAccumulatedGCTime */,
>>> true /* recordGCEndTime */,
>>> true /* countCollection */ );
>>> break;
>>>
>>> default:
>>> ShouldNotReachHere();
>>> }
>>>
>>> I believe moving the pushNotification to inside the "if
>>> (countCollection)"
>>> statement would be a correct fix. It might be better to rename
>>> "countCollection" variable to something like "gcCompleted".
>>>
>>> I suggest to run
>>> jdk/test/com/sun/management/GarbageCollectorMXBean/GarbageCollectionNotificationContentTest.java
>>>
>>>
>>>
>>> on CMS and see how many notifications are sent and what data it
>>> includes. You may need to modify it to print any other necessary
>>> information. I can help further next week once I finish a few
>>> things that I committed to complete this week.
>>>
>>> Mandy
>>>
>>>
>>> On 11/16/11 2:00 AM, Frederic Parain wrote:
>>>> Hi,
>>>>
>>>> The reason why the memory usage data are bundled with the notification
>>>> is to prevent data lost. If the notification doesn't provide the
>>>> memory usage data, the client has to perform a getLastGcInfo() call
>>>> to retrieve the data each time it receives a notification. But several
>>>> GC cycles might occur between the time the notification is received
>>>> and
>>>> the time getLastGcInfo() is invoked. And there's no API to get memory
>>>> usage data older than the one from the last GC cycle. With the
>>>> notification including the memory usage data, the client is ensured
>>>> to get data for all GC cycles.
>>>>
>>>> That said, the notification contains two memory usage datasets, one
>>>> created at the beginning of the GC cycle and one created at the end
>>>> of the cycle. There's no reason to condition the notification to
>>>> the recording of the post GC cycle data recording. It's reasonable
>>>> to send a notification for a GC which only collects the pre GC data
>>>> for instance. So I think it's ok to move the notification code out
>>>> of the recordPostGCUsage==true condition as long as the values in
>>>> the non recorded memory usage data set can clearly be identified
>>>> as being invalid. Right now they are initialized to zero, so it
>>>> looks ok to me.
>>>>
>>>> Moving the notification code after the "if(countCollection)" block
>>>> looks good too.
>>>>
>>>> However, I'm not an official reviewer, so you'll need more approvals
>>>> for this changeset.
>>>>
>>>> Fred
>>>>
>>>> On 11/16/11 03:26 AM, Mandy Chung wrote:
>>>>> I'm including Frederic and serviceability-dev since this is part
>>>>> of the
>>>>> instrumentation done for serviceability.
>>>>>
>>>>> On 11/15/2011 4:59 PM, David Holmes wrote:
>>>>>> Hi John,
>>>>>>
>>>>>> On 16/11/2011 3:33 AM, John Cuthbertson wrote:
>>>>>>> Thanks for the review. I did and I did that consciously. As you
>>>>>>> say -
>>>>>>> with the current code the listener would only be notified if the
>>>>>>> _recordPostGCUsage field of the TraceMemoryManagerStats object
>>>>>>> is true
>>>>>>> (which it is by default) and none of the TraceMemoryManagerStats
>>>>>>> instances created by the collectors change this. But there's
>>>>>>> nothing to
>>>>>>> stop a collector creating a TraceMemoryManagerStats object with
>>>>>>> _recordPostGCUsage false and_recordGCEndTime true. In this case
>>>>>>> wouldn't
>>>>>>> we still want to notify the listener? I may be wrong (in which case
>>>>>>> I'll
>>>>>>> add the extra guard) but I believe that recording (some) data and
>>>>>>> notification should be two independent operations.
>>>>>>
>>>>>> It depends on what data the listener is expecting to receive. If you
>>>>>> push the notification when there hasn't been an update does that
>>>>>> make
>>>>>> sense? I don't know what the spec is for this.
>>>>>>
>>>>>
>>>>> The GC notifier and pushNotification call was added as this
>>>>> changeset:
>>>>> http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/78542e2b5e35
>>>>>
>>>>> The spec for this is:
>>>>> http://download.oracle.com/javase/7/docs/jre/api/management/extension/com/sun/management/GarbageCollectionNotificationInfo.html
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>> Anyway I just wanted to flag the change in potential behaviour. If
>>>>>> no-one from GC has any issue with this then it's fine by me.
>>>>>>
>>>>>
>>>>> I think David is right that it should check
>>>>> recordPostGCUsage==true to
>>>>> push a notification. The notification is sent with the last GC
>>>>> statistics. If my memory serves correctly, if recordPostGCUsage is
>>>>> false, it doesn't update the last GC usage but sending a notification
>>>>> when recordPostGCUsage is false seems to be incorrect. Also, the
>>>>> recording was specifically added for CMS since CMS has separate
>>>>> phases
>>>>> that it needs to record different things.
>>>>>
>>>>> Frederic would be the best person to comment on this. Frederic - it
>>>>> seems to me that the pushNotification can be moved to the end of
>>>>> gc_end
>>>>> function but within the if (countCollection) statement. Sorry I don't
>>>>> have time to check the details out. It'd be good if you can help.
>>>>>
>>>>> Thanks
>>>>> Mandy
>>>>>
>>>>>> David
>>>>>> -----
>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> JohnC
>>>>>>>
>>>>>>> On 11/14/11 22:35, David Holmes wrote:
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> On 15/11/2011 4:46 AM, John Cuthbertson wrote:
>>>>>>>>> Can I have a couple of volnteers to review the fix for this
>>>>>>>>> CR? The
>>>>>>>>> webrev can be found at:
>>>>>>>>> http://cr.openjdk.java.net/~johnc/7110173/webrev.0/.
>>>>>>>>
>>>>>>>> In the original code the pushNotification is conditional on
>>>>>>>> recordPostGCUsage, but you've removed that guard by moving the
>>>>>>>> code.
>>>>>>>>
>>>>>>>> David
>>>>>>>> -----
>>>>>>>>
>>>>>>>>> The issue here was that the routine
>>>>>>>>> GCNotifier::pushNotification(),
>>>>>>>>> which uses GC data held in GCMemoryManager::_last_gc_stat, was
>>>>>>>>> being
>>>>>>>>> called before the values in GCMemoryManager::_last_gc_stat were
>>>>>>>>> being
>>>>>>>>> populated for the current GC. As a result the JVM could pass
>>>>>>>>> uninitialized or stale data to a listener. The fix is to move the
>>>>>>>>> call
>>>>>>>>> to GCNotifier::pushNotification() after the code that populates
>>>>>>>>> GCMemoryManager::_last_gc_stat. I also modified the GCStatInfo
>>>>>>>>> constructor to fully initialize instances of that class.
>>>>>>>>>
>>>>>>>>> Testing: The supplied test case on Windows, a crafted test
>>>>>>>>> case on
>>>>>>>>> Solaris, and the nsk GC tests.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> JohnC
>>>>>>>
>>>>
>>>
>>
More information about the hotspot-gc-dev
mailing list