RFR: 8008917 CMS: Concurrent mode failure tracing event

Kevin Walls kevin.walls at oracle.com
Fri Mar 8 02:08:26 PST 2013


Hi Erik - aha yes, so not that rare then for the right kind of program!

I'll talk to you about when/how to do this and the new event.

Thanks
Kevin


On 08/03/13 09:50, Erik Helin wrote:
> Kevin,
>
> On 03/07/2013 05:52 PM, Kevin Walls wrote:
>> Hi Erik -
>>
>> Yes, now you mention it I can see the route to printing the old warning
>> or logging the event twice...
>>
>> I don't think it's reported as a problem, or maybe it's very rare and
>> nobody has spotted it.
>
> It is possible to reproduce the problem with a Java program that 
> simply allocates until OOM:
>
>   import java.util.ArrayList;
>
>   public class Reproducer {
>       public static ArrayList<byte[]> garbage = new ArrayList<byte[]>();
>       public static void main(String[] args) {
>           while (true) {
>               garbage.add(new byte[1024]);
>           }
>       }
>   }
>
> Compile the above Java program and then run:
>
> java -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails 
> -XX:-UseCMSCompactAtFullCollection -Xmx100m Reproducer
>
> Depending on how fast machine you have, the Java program will run for 
> about 2 seconds and the GC logs will show (near the start of the output):
>
> [GC (Allocation Failure) [ParNew: 30720K->30720K(30720K), 0.0001940 
> secs][CMS[CMS-concurrent-mark: 0.035/0.036 secs] [Times: user=0.03 
> sys=0.00, real=0.04 secs]
>  (concurrent mode failure) (concurrent mode failure)[YG occupancy: 
> 30720 K (30720 K)][checkpointRootsFinalWork[Rescan (parallel) , 
> 0.0314620 secs][refProcessingWork[weak refs processing, 0.0000320 
> secs][class unloading, 0.0006770 secs][scrub symbol table, 0.0009850 
> secs][scrub string table, 0.0000690 secs], 0.0021430 secs], 0.0339750 
> secs]: 50655K->50654K(68288K), 0.0972030 secs] 81375K->81374K(99008K), 
> [Metaspace: 2662K->2662K(4486K/110592K)], 0.0980400 secs] [Times: 
> user=0.11 sys=0.00, real=0.10 secs]
>
> Please notice the two "(concurrent mode failure)" above which are 
> printed for the reason I explained in my previous email.
>
> On 03/07/2013 05:52 PM, Kevin Walls wrote:
>> But assuming it's not a "user-requested" collection, to get that false
>> "should_compact" in acquire_control_and_collect, we need to call
>> decide_foreground_collection_type(), and when it calls
>> incremental_collection_will_fail(), that returns false.
>
> This is assuming that the flag UseCMSCompactAtFullCollection is true.
>
> On 03/07/2013 05:52 PM, Kevin Walls wrote:
>> Possibly that is why we don't see the event reported twice in
>> practice***: if we've got to this point, and state>Idling, we are
>> usually here because that inc. collection would fail/is failing...
>
> This is most likely the reason, since UseCMSCompactAtFullCollection is 
> true by default and if users aren't changing it, then your reasoning 
> is correct.
>
> I suggest that we file a separate bug for CMS printing out 
> "(concurrent mode failure)" twice, fix that and then we base your 
> trace code on that.
>
> What do you think?
>
> Thanks,
> Erik
>
>> Thanks
>> Kevin
>>
>> *** if anybody really does hit this, or think it's likely, we can look
>> again...
>>
>>
>>
>> On 06/03/13 18:19, Erik Helin wrote:
>>> Hi Kevin,
>>>
>>> I think that there _might_ be a bug in CMS which was present even
>>> before you added the event tracing.
>>>
>>> If you look in aquire_control_and_collect, you will see that
>>> "should_compact" can be set to false by
>>> decide_foreground_collection_type. If this is the case, then we will
>>> end up in do_mark_sweep_work.
>>>
>>> The problem is that you have already reported, and CMS has already
>>> printed, that a concurrent mode failure has occurred in
>>> acquire_control_and_collect. Then, when you enter do_mark_sweep_work,
>>> you will once again report, and CMS will again print, that concurrent
>>> mode failure has happened.
>>>
>>> I am not 100% sure that I am right, by I believe that this can happen.
>>>
>>> What do you think?
>>>
>>> Thanks,
>>> Erik
>>>
>>> On 03/01/2013 06:34 PM, Kevin Walls wrote:
>>>> Hi,
>>>>
>>>> I'd like some reviews on this CMS Concurrent Mode Failure event:
>>>>
>>>> http://cr.openjdk.java.net/~kevinw/8008917/hotspot/
>>>>
>>>> The event doesn't actually carry any new information, but it is a
>>>> warning we need to capture.
>>>>
>>>> This is against hsx24, I'll prepare the same, or reviewed, changes
>>>> against very latest hotspot also.
>>>>
>>>> Thanks
>>>> Kevin
>>>>
>>>
>>
>



More information about the serviceability-dev mailing list