jmx-dev RFR 8020467: Inconsistency between usage.getUsed() and isUsageThresholdExceeded() with CMS Old Gen pool

Jaroslav Bachorik jaroslav.bachorik at oracle.com
Wed Oct 23 08:07:13 PDT 2013


On 23.10.2013 16:43, Bengt Rutisson wrote:
>
> Hi Jaroslav,
>
> On 2013-10-23 16:32, Jaroslav Bachorik wrote:
>> On 23.10.2013 15:15, Bengt Rutisson wrote:
>>>
>>> On 2013-10-23 14:55, Jaroslav Bachorik wrote:
>>>> Hi Bengt,
>>>>
>>>> On 23.10.2013 14:40, Bengt Rutisson wrote:
>>>>>
>>>>> Hi Jaroslav,
>>>>>
>>>>> A couple of questions.
>>>>>
>>>>> I don't understand why this is a CMS only problem? Why don't the other
>>>>> collectors have the same issue? I guess it is less likely that the
>>>>> other
>>>>> collectors start (or complete) a GC without a lot of allocation going
>>>>> on. But at least G1 should have the same problem.
>>>>
>>>> I don't really know. If there are other pools that can have the "used"
>>>> value 0 before a GC happens then yes, they are susceptible to the same
>>>> problem.
>>>
>>> I think all the "old" pools can have 0 used before a GC happens. But
>>> except for CMS and G1 it is less likely that a GC happens unless you do
>>> allocations. As long as they keep the 0 used the test will pass. So, my
>>> guess is that to be on the safe side all "old" pools should make sure to
>>> do a full GC first.
>>>
>>>>
>>>>>
>>>>> Also, from the problem description in the CR I would have guessed that
>>>>> you want the GC to happen between these two statements:
>>>>>
>>>>> p.setUsageThreshold(1);
>>>>> MemoryUsage u = p.getUsage();
>>>>
>>>> This is all but a heuristic here. The problem lies in the fact that it
>>>> is not possible to retrieve the pool usage and the "threshold
>>>> exceeded" flag consistently in one atomic operation. I might get
>>>> usable data from the first call and then I don't need to force GC.
>>>
>>> Right. This is why I think you want to avoid a GC after you have fetched
>>> getUsage() but before you do isUsageThresholdExceeded(). With your
>>> suggested patch you are explicitly inserting a GC at that point. To me
>>> this sounds like the opposite of what you want to do.
>>
>> I've updated the patch. The GC is called even before the first attempt
>> to get the pool memory usage and System.gc() is used to perform GC (no
>> MXBean checks). This should simplify the change a bit.
>>
>> http://cr.openjdk.java.net/~jbachorik/8020467/webrev.02
>
> Thanks for doing this update so quickly!
>
> Have you been able to verify that this change still fixes the issue? I
> think it should, but it would be good if we could verify it.

Yep, it still fixes the problem. Unfortunatelly, the only way to 
reproduce the problem locally is to run the test under debugger and 
invoke GC explicitly between getting the pool memory usage and threshold 
flag.

>
> This code worries me a little bit:
>
>   114     private static MemoryUsage getUsage(MemoryPoolMXBean p) {
>   115         MemoryUsage u = null;
>   116         do  {
>   117             System.gc();
>   118             u = p.getUsage();
>   119         } while (u.getUsed() == 0);
>   120         return u;
>   121     }
>
> I think one call to System.gc() should be enough. And if it is not, if
> we still get 0 as used, I think that another System.gc() call will just
> render the same result. Thus, I'm a bit worried that this will be an
> endless loop.

Sounds reasonable. My motivation was to try to make sure some objects 
are promoted to old gen but it seems redundant and in case of non-oldgen 
pools might not even work :(

>
> Since the test actually handles the case where used is 0, I think it is
> enough to just do a single call to System.gc() and then get the usage data.

Hm, this makes the patch even simpler ...
http://cr.openjdk.java.net/~jbachorik/8020467/webrev.03

-JB-


>
> Thanks,
> Bengt
>
>
>>
>> -JB-
>>
>>>
>>>>
>>>>>
>>>>> Now you have added the GC just after these statements. I thought that
>>>>> was what caused the problem. That you read the usage data at one
>>>>> point,
>>>>> then a GC happens and you compare the cached usage
>>>>> data to the new data that p.isUsageThresholdExceeded() will fetch.
>>>>>
>>>>> Looking at the promoteToOldGen() method I assume that the intent is
>>>>> that
>>>>> the code should be using the return value. So my guess is that this
>>>>> code:
>>>>>
>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>    95             promoteToOldGen(p, u);
>>>>>    96         }
>>>>>
>>>>> Should be:
>>>>>
>>>>>    94         if (p.getName().equals("CMS Old Gen")) {
>>>>>    95             u = promoteToOldGen(p, u);
>>>>>    96         }
>>>>
>>>> Indeed. It was meant to re-fetch the usage after GC.
>>>
>>> OK. Good. With this code I think it should work. Now you make sure to
>>> get the GC before you do getUsage().
>>>
>>>>
>>>>>
>>>>> With that, I think it might work. But I still don't understand why
>>>>> this
>>>>> is only a CMS problem.
>>>>>
>>>>> One more question about the promoteToOldGen() and forceGC() methods. I
>>>>> don't really know much about how the different beans work, but are we
>>>>> sure that the MemoryPoolMXBeans and GarbageCollectorMXBeans use the
>>>>> same
>>>>> pool names? That is, are you sure that forceGC() actually will do
>>>>> anything?
>>>>
>>>> They use the pool names as reported by the GC infrastracture so they
>>>> should be the same.
>>>
>>> Ok.
>>>
>>>>
>>>>>
>>>>> As for just doing a System.gc() to force a GC I think you can rely on
>>>>> that System.gc() does a full GC in Hotspot unless someone sets
>>>>> -XX:+DisableExplicitGC on the command line. Considering that you are
>>>>> relying on Hotspot specifc names for pools I don't think it is a
>>>>> limitation to the test to rely on the Hotspot implementatoin of
>>>>> System.gc().
>>>>
>>>> Good to know. I guess I could simplify the change and just call
>>>> System.gc(), after all.
>>>
>>> Yes, I think that' simpler.
>>>
>>> Thanks,
>>> Bengt
>>>
>>>>
>>>> Thanks,
>>>>
>>>> -JB-
>>>>
>>>>>
>>>>> Thanks,
>>>>> Bengt
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 2013-10-23 10:18, Staffan Larsen wrote:
>>>>>> On 23 okt 2013, at 10:12, Jaroslav Bachorik
>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>
>>>>>>> On 23.10.2013 10:08, Staffan Larsen wrote:
>>>>>>>> I think you can simplify the logic for forcing a GC to just a
>>>>>>>> simple
>>>>>>>> call to "System.gc();". AFAIK System.gc() will cause a full
>>>>>>>> collection to happen for all collectors.
>>>>>>> Hm, will it now? I had the impression that it was just hinting
>>>>>>> the GC
>>>>>>> system to perform GC but it might decide to ignore it. I need to be
>>>>>>> sure that the GC was performed before continuing - otherwise I might
>>>>>>> get inconsistent data again.
>>>>>> According to the spec it's just a hint, but I think the
>>>>>> implementation
>>>>>> happens to be a force. But better safe than sorry. :)
>>>>>>
>>>>>> /Staffan
>>>>>>
>>>>>>> -JB-
>>>>>>>
>>>>>>>> /Staffan
>>>>>>>>
>>>>>>>> On 23 okt 2013, at 10:02, Jaroslav Bachorik
>>>>>>>> <jaroslav.bachorik at oracle.com> wrote:
>>>>>>>>
>>>>>>>>> On 22.10.2013 22:04, Mandy Chung wrote:
>>>>>>>>>> Hi Jaroslav,
>>>>>>>>>>
>>>>>>>>>> On 10/22/13 6:47 AM, Jaroslav Bachorik wrote:
>>>>>>>>>>> Please, review the following test fix:
>>>>>>>>>>>
>>>>>>>>>>> Issue: https://bugs.openjdk.java.net/browse/JDK-8020467
>>>>>>>>>>> Webrev: http://cr.openjdk.java.net/~jbachorik/8020467/webrev.01
>>>>>>>>>>>
>>>>>>>>>> Have you considered to force GC when getUsed() == 0 regardless of
>>>>>>>>>> which
>>>>>>>>>> memory pool it is?  This will avoid special casing for CMS old
>>>>>>>>>> gen in
>>>>>>>>>> the test and will handle similar issue in the future for a
>>>>>>>>>> different
>>>>>>>>>> collector implementation.  To make the test reliable, the test
>>>>>>>>>> should
>>>>>>>>>> still pass if the memory pool has no object in it (G1 survivor
>>>>>>>>>> space
>>>>>>>>>> case?).
>>>>>>>>> Hi Mandy,
>>>>>>>>>
>>>>>>>>> I don't know whether GC will help for other pools - but I can
>>>>>>>>> enable it for all pools - it should not hurt.
>>>>>>>>>
>>>>>>>>> The test should pass even with on object in the monitored pool
>>>>>>>>> since the pool should not report an exceeded threshold.
>>>>>>>>>
>>>>>>>>> -JB-
>>>>>>>>>
>>>>>>>>>> Mandy
>>>>>>>>>>
>>>>>>>>>>> The test tries to make sure that the "pool usage threshold"
>>>>>>>>>>> trigger
>>>>>>>>>>> and the reported pool memory usage are not contradicting each
>>>>>>>>>>> other.
>>>>>>>>>>> The problem is that it is not possible to get the "pool usage
>>>>>>>>>>> threshold exceeded" flag and the pool memory usage atomicly in
>>>>>>>>>>> regard
>>>>>>>>>>> to the GC. Specifically, when "CMS Old Gen" pool is examined
>>>>>>>>>>> and the
>>>>>>>>>>> usage is retrieved before a GC promotes some objects to the old
>>>>>>>>>>> gen
>>>>>>>>>>> but the usage threshold is checked after the GC has promoted
>>>>>>>>>>> some
>>>>>>>>>>> instance into the old gen the test will fail.
>>>>>>>>>>>
>>>>>>>>>>> The patch makes sure that there are some instances promoted in
>>>>>>>>>>> "CMS
>>>>>>>>>>> Old Gen" before checking the "pool usage threshold" to get
>>>>>>>>>>> semi-consistent view.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> -JB-
>>>>>
>>>>
>>>
>>
>



More information about the serviceability-dev mailing list