jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool

Bengt Rutisson bengt.rutisson at oracle.com
Wed Oct 23 08:31:18 PDT 2013


Hi again Jaroslav,

On 2013-10-23 17:07, Jaroslav Bachorik wrote:
> On 23.10.2013 16:43, Bengt Rutisson wrote:
>>
>> Hi Jaroslav,
>>
>> On 2013-10-23 16:32, Jaroslav Bachorik wrote:
>>> On 23.10.2013 15:15, Bengt Rutisson wrote:
>>>>
>>>> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>>>>> Hi Bengt,
>>>>>
>>>>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>>>>
>>>>>> Hi Jaroslav,
>>>>>>
>>>>>> A couple of questions.
>>>>>>
>>>>>> I don't understand why this is a CMS only problem? Why don't the 
>>>>>> other
>>>>>> collectors have the same issue? I guess it is less likely that the
>>>>>> other
>>>>>> collectors start (or complete) a GC without a lot of allocation 
>>>>>> going
>>>>>> on. But at least G1 should have the same problem.
>>>>>
>>>>> I don't really know. If there are other pools that can have the 
>>>>> "used"
>>>>> value 0 before a GC happens then yes, they are susceptible to the 
>>>>> same
>>>>> problem.
>>>>
>>>> I think all the "old" pools can have 0 used before a GC happens. But
>>>> except for CMS and G1 it is less likely that a GC happens unless 
>>>> you do
>>>> allocations. As long as they keep the 0 used the test will pass. 
>>>> So, my
>>>> guess is that to be on the safe side all "old" pools should make 
>>>> sure to
>>>> do a full GC first.
>>>>
>>>>>
>>>>>>
>>>>>> Also, from the problem description in the CR I would have guessed 
>>>>>> that
>>>>>> you want the GC to happen between these two statements:
>>>>>>
>>>>>> p.setUsageThreshold(1);
>>>>>> MemoryUsage u = p.getUsage();
>>>>>
>>>>> This is all but a heuristic here. The problem lies in the fact 
>>>>> that it
>>>>> is not possible to retrieve the pool usage and the "threshold
>>>>> exceeded" flag consistently in one atomic operation. I might get
>>>>> usable data from the first call and then I don't need to force GC.
>>>>
>>>> Right. This is why I think you want to avoid a GC after you have 
>>>> fetched
>>>> getUsage() but before you do isUsageThresholdExceeded(). With your
>>>> suggested patch you are explicitly inserting a GC at that point. To me
>>>> this sounds like the opposite of what you want to do.
>>>
>>> I've updated the patch. The GC is called even before the first attempt
>>> to get the pool memory usage and System.gc() is used to perform GC (no
>>> MXBean checks). This should simplify the change a bit.
>>>
>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>>
>> Thanks for doing this update so quickly!
>>
>> Have you been able to verify that this change still fixes the issue? I
>> think it should, but it would be good if we could verify it.
>
> Yep, it still fixes the problem. Unfortunatelly, the only way to 
> reproduce the problem locally is to run the test under debugger and 
> invoke GC explicitly between getting the pool memory usage and 
> threshold flag.
>
>>
>> This code worries me a little bit:
>>
>>   114     private static MemoryUsage getUsage(MemoryPoolMXBean p) {
>>   115         MemoryUsage u = null;
>>   116         do  {
>>   117             System.gc();
>>   118             u = p.getUsage();
>>   119         } while (u.getUsed() == 0);
>>   120         return u;
>>   121     }
>>
>> I think one call to System.gc() should be enough. And if it is not, if
>> we still get 0 as used, I think that another System.gc() call will just
>> render the same result. Thus, I'm a bit worried that this will be an
>> endless loop.
>
> Sounds reasonable. My motivation was to try to make sure some objects 
> are promoted to old gen but it seems redundant and in case of 
> non-oldgen pools might not even work :(
>
>>
>> Since the test actually handles the case where used is 0, I think it is
>> enough to just do a single call to System.gc() and then get the usage 
>> data.
>
> Hm, this makes the patch even simpler ...
> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.03

Yes, I think this looks simple and good. :-)

Thanks,
Bengt


>
> -JB-
>
>
>>
>> Thanks,
>> Bengt
>>
>>
>>>
>>> -JB-
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Now you have added the GC just after these statements. I thought 
>>>>>> that
>>>>>> was what caused the problem. That you read the usage data at one
>>>>>> point,
>>>>>> then a GC happens and you compare the cached usage
>>>>>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>>>>>
>>>>>> Looking at the promoteToOldGen() method I assume that the intent is
>>>>>> that
>>>>>> the code should be using the return value. So my guess is that this
>>>>>> code:
>>>>>>
>>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>>    95             promoteToOldGen(p, u);
>>>>>>    96         }
>>>>>>
>>>>>> Should be:
>>>>>>
>>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>>    95             u = promoteToOldGen(p, u);
>>>>>>    96         }
>>>>>
>>>>> Indeed. It was meant to re-fetch the usage after GC.
>>>>
>>>> OK. Good. With this code I think it should work. Now you make sure to
>>>> get the GC before you do getUsage().
>>>>
>>>>>
>>>>>>
>>>>>> With that, I think it might work. But I still don't understand why
>>>>>> this
>>>>>> is only a CMS problem.
>>>>>>
>>>>>> One more question about the promoteToOldGen() and forceGC() 
>>>>>> methods. I
>>>>>> don't really know much about how the different beans work, but 
>>>>>> are we
>>>>>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the
>>>>>> same
>>>>>> pool names? That is, are you sure that forceGC() actually will do
>>>>>> anything?
>>>>>
>>>>> They use the pool names as reported by the GC infrastracture so they
>>>>> should be the same.
>>>>
>>>> Ok.
>>>>
>>>>>
>>>>>>
>>>>>> As for just doing a System.gc() to force a GC I think you can 
>>>>>> rely on
>>>>>> that System.gc() does a full GC in Hotspot unless someone sets
>>>>>> -XX:+DisableExplicitGC on the command line. Considering that you are
>>>>>> relying on Hotspot specifc names for pools I don't think it is a
>>>>>> limitation to the test to rely on the Hotspot implementatoin of
>>>>>> System.gc().
>>>>>
>>>>> Good to know. I guess I could simplify the change and just call
>>>>> System.gc(), after all.
>>>>
>>>> Yes, I think that' simpler.
>>>>
>>>> Thanks,
>>>> Bengt
>>>>
>>>>>
>>>>> Thanks,
>>>>>
>>>>> -JB-
>>>>>
>>>>>>
>>>>>> Thanks,
>>>>>> Bengt
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 2013-10-23 10:18, Staffan Larsen wrote:
>>>>>>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>
>>>>>>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>>>>>>> I think you can simplify the logic for forcing a GC to just a
>>>>>>>>> simple
>>>>>>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>>>>>>> collection to happen for all collectors.
>>>>>>>> Hm, will it now? I had the impression that it was just hinting
>>>>>>>> the GC
>>>>>>>> system to perform GC but it might decide to ignore it. I need 
>>>>>>>> to be
>>>>>>>> sure that the GC was performed before continuing - otherwise I 
>>>>>>>> might
>>>>>>>> get inconsistent data again.
>>>>>>> According to the spec it's just a hint, but I think the
>>>>>>> implementation
>>>>>>> happens to be a force. But better safe than sorry. :)
>>>>>>>
>>>>>>> /Staffan
>>>>>>>
>>>>>>>> -JB-
>>>>>>>>
>>>>>>>>> /Staffan
>>>>>>>>>
>>>>>>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>>
>>>>>>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>>
>>>>>>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>>> Please, review the following test fix:
>>>>>>>>>>>>
>>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>>>>>>> Webrev: 
>>>>>>>>>>>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>>>>>>
>>>>>>>>>>> Have you considered to force GC when getUsed() == 0 
>>>>>>>>>>> regardless of
>>>>>>>>>>> which
>>>>>>>>>>> memory pool it is?  This will avoid special casing for CMS old
>>>>>>>>>>> gen in
>>>>>>>>>>> the test and will handle similar issue in the future for a
>>>>>>>>>>> different
>>>>>>>>>>> collector implementation.  To make the test reliable, the test
>>>>>>>>>>> should
>>>>>>>>>>> still pass if the memory pool has no object in it (G1 survivor
>>>>>>>>>>> space
>>>>>>>>>>> case?).
>>>>>>>>>> Hi Mandy,
>>>>>>>>>>
>>>>>>>>>> I don't know whether GC will help for other pools - but I can
>>>>>>>>>> enable it for all pools - it should not hurt.
>>>>>>>>>>
>>>>>>>>>> The test should pass even with on object in the monitored pool
>>>>>>>>>> since the pool should not report an exceeded threshold.
>>>>>>>>>>
>>>>>>>>>> -JB-
>>>>>>>>>>
>>>>>>>>>>> Mandy
>>>>>>>>>>>
>>>>>>>>>>>> The test tries to make sure that the "pool usage threshold"
>>>>>>>>>>>> trigger
>>>>>>>>>>>> and the reported pool memory usage are not contradicting each
>>>>>>>>>>>> other.
>>>>>>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>>>>>>> threshold exceeded" flag and the pool memory usage atomicly in
>>>>>>>>>>>> regard
>>>>>>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined
>>>>>>>>>>>> and the
>>>>>>>>>>>> usage is retrieved before a GC promotes some objects to the 
>>>>>>>>>>>> old
>>>>>>>>>>>> gen
>>>>>>>>>>>> but the usage threshold is checked after the GC has promoted
>>>>>>>>>>>> some
>>>>>>>>>>>> instance into the old gen the test will fail.
>>>>>>>>>>>>
>>>>>>>>>>>> The patch makes sure that there are some instances promoted in
>>>>>>>>>>>> "CMS
>>>>>>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>>>>>>> semi-consistent view.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> -JB-
>>>>>>
>>>>>
>>>>
>>>
>>
>



More information about the serviceability-dev mailing list