CRR: 7034139: G1: assert(Thread::current()->is_ConcurrentGC_thread()) failed: only a conc GC thread can call this (S)

Fri Apr 29 14:38:47 UTC 2011

Bengt,

I clearly misread your e-mail. Apologies for the confusion!

Tony

On 4/29/2011 1:45 AM, Bengt Rutisson wrote:
>
> Tony,
>
>>> I like that you have a way of testing this, but if it is possible I 
>>> would like to see if it can be made more obvious in the code what 
>>> parts are "real" code and what parts are there just for testing.
>>
>> Basically, the fix is just the conditional calling of stsLeave() and 
>> stsJoin(). The rest is instrumentation to artificially cause N 
>> overflows per concurrent cycle and remark. We could try to write a 
>> test to cause that condition but, IMHO, it'd be quite hard (in 
>> particular, it'd be hard to cause the overflow during remark; which 
>> is why we've only seen this failure only once). So, with the 
>> instrumentation enabled, the overflow condition will be caused every 
>> time there's a marking cycle.
>
> Sorry for being unclear. I agree with you here. I think it would be 
> difficult to write a test case that will trigger the overflows in the 
> desired way. This is not what I meant when I talked about "real code" 
> and "testing code". What I meat was what you are pointing out further 
> down. Whether there is an acceptable way to signal already at the call 
> site that a call has no effect in product builds.
>
> I'll comment on that in a response to Ramki's email shortly.
>
> Bengt
>
>
>>
>>> For example, I think I would prefer that concurrentMark.cpp had 
>>> calls like:
>>>
>>> DEBUG_ONLY(force_overflow()->update());
>>>
>>> rather than having this in concurrentMark.hpp:
>>>
>>> void update() PRODUCT_RETURN;
>>>
>>> That way someone browsing the code in concurrentMark.cpp can quickly 
>>> see the difference between test code and product code. I realize 
>>> that this might be a matter of taste, so I'm ok with the way it is 
>>> now as well.
>>
>> I do see the point you and Jesper (he made in his subsequent e-mail) 
>> and, of course :-), we're generally not consistent in taking one 
>> approach over the other in HotSpot. Having said this, this is the 
>> sort of situation for which the PRODUCT_RETURN macro was introduced 
>> in the first place. I personally like using PRODUCT_RETURN since it 
>> cuts down the clutter. Compare:
>>
>> void foo() PRODUCT_RETURN;
>>
>> #ifndef PRODUCT
>> void foo() {
>>  ...
>> }
>> #endif
>>
>> foo();
>>
>> with:
>>
>> NOT_PRODUCT(void foo();)
>>
>> #ifndef PRODUCT
>> void foo() {
>>  ...
>> }
>> #endif
>>
>> NOT_PRODUCT(foo();)
>>
>>
>> But I'd be OK with the latter for the reasons you and Jesper brought 
>> up. Anyone else in group have a strong opinion one way or another?
>>
>> Tony
>> Tony
>>
>>
>>> Bengt
>>>
>>> On 2011-04-23 00:58, Tony Printezis wrote:
>>>> Thanks to John Cuthbertson for looking at this. I took his advice 
>>>> and I'm going to disable the forced overflow by default (by setting 
>>>> the default parameter to 0), but leave the code in as it's helpful. 
>>>> Latest version here:
>>>>
>>>> http://cr.openjdk.java.net/~tonyp/7034139/webrev.2/
>>>>
>>>> Tony
>>>>
>>>> Tony Printezis wrote:
>>>>> Hi all,
>>>>>
>>>>> I'd still like a couple of code reviews for this. Here's the 
>>>>> latest version (I only rephrased a couple of comments, so if 
>>>>> you're looking at the earlier version already you can ignore this 
>>>>> one):
>>>>>
>>>>> http://cr.openjdk.java.net/~tonyp/7034139/webrev.1/
>>>>>
>>>>> Tony
>>>>>
>>>>> Tony Printezis wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Could I get a couple of people to look at this? (I'd like to push 
>>>>>> this this week if possible)
>>>>>>
>>>>>> http://cr.openjdk.java.net/~tonyp/7034139/webrev.0/
>>>>>>
>>>>>> The actual fix is reasonably small (leave / join the 
>>>>>> SuspendibleThreadSet only if we are in concurrent mode). Most of 
>>>>>> the changes are new infrastructure to cause a fixed number of 
>>>>>> overflows during marking (in non-product builds of course) to 
>>>>>> stress the overflow code. This was the only way I could reliably 
>>>>>> reproduce the failure. This did uncover a couple of extra issues 
>>>>>> which I also fixed:
>>>>>>
>>>>>> - If we overflow during remark we should not actually deal with 
>>>>>> it during remark but we should abort the remark pause and restart 
>>>>>> a concurrent mark phase. For some reason we were not doing that. 
>>>>>> I fixed that (for this I had to ensure that the overflow flag is 
>>>>>> not cleared when we exit the do_marking_step() method).
>>>>>> - Because we were clearing the overflow, it was also possible 
>>>>>> that the workers would deadlock (for that to happen a worker had 
>>>>>> to finish handling one overflow and immediately raise another 
>>>>>> one, so it was highly unlikely to occur in prcatice; good to find 
>>>>>> it and eliminate it though).
>>>>>>
>>>>>> I've already tested it, I'll run more tests overnight.
>>>>>>
>>>>>> Tony
>>>>>
>>>
>