CRR: 7034139: G1: assert(Thread::current()->is_ConcurrentGC_thread()) failed: only a conc GC thread can call this (S)

Fri Apr 29 05:45:48 UTC 2011

Tony,

>> I like that you have a way of testing this, but if it is possible I 
>> would like to see if it can be made more obvious in the code what 
>> parts are "real" code and what parts are there just for testing.
>
> Basically, the fix is just the conditional calling of stsLeave() and 
> stsJoin(). The rest is instrumentation to artificially cause N 
> overflows per concurrent cycle and remark. We could try to write a 
> test to cause that condition but, IMHO, it'd be quite hard (in 
> particular, it'd be hard to cause the overflow during remark; which is 
> why we've only seen this failure only once). So, with the 
> instrumentation enabled, the overflow condition will be caused every 
> time there's a marking cycle.

Sorry for being unclear. I agree with you here. I think it would be 
difficult to write a test case that will trigger the overflows in the 
desired way. This is not what I meant when I talked about "real code" 
and "testing code". What I meat was what you are pointing out further 
down. Whether there is an acceptable way to signal already at the call 
site that a call has no effect in product builds.

I'll comment on that in a response to Ramki's email shortly.

Bengt

>
>> For example, I think I would prefer that concurrentMark.cpp had calls 
>> like:
>>
>> DEBUG_ONLY(force_overflow()->update());
>>
>> rather than having this in concurrentMark.hpp:
>>
>> void update() PRODUCT_RETURN;
>>
>> That way someone browsing the code in concurrentMark.cpp can quickly 
>> see the difference between test code and product code. I realize that 
>> this might be a matter of taste, so I'm ok with the way it is now as 
>> well.
>
> I do see the point you and Jesper (he made in his subsequent e-mail) 
> and, of course :-), we're generally not consistent in taking one 
> approach over the other in HotSpot. Having said this, this is the sort 
> of situation for which the PRODUCT_RETURN macro was introduced in the 
> first place. I personally like using PRODUCT_RETURN since it cuts down 
> the clutter. Compare:
>
> void foo() PRODUCT_RETURN;
>
> #ifndef PRODUCT
> void foo() {
>  ...
> }
> #endif
>
> foo();
>
> with:
>
> NOT_PRODUCT(void foo();)
>
> #ifndef PRODUCT
> void foo() {
>  ...
> }
> #endif
>
> NOT_PRODUCT(foo();)
>
>
> But I'd be OK with the latter for the reasons you and Jesper brought 
> up. Anyone else in group have a strong opinion one way or another?
>
> Tony
> Tony
>
>
>> Bengt
>>
>> On 2011-04-23 00:58, Tony Printezis wrote:
>>> Thanks to John Cuthbertson for looking at this. I took his advice 
>>> and I'm going to disable the forced overflow by default (by setting 
>>> the default parameter to 0), but leave the code in as it's helpful. 
>>> Latest version here:
>>>
>>> http://cr.openjdk.java.net/~tonyp/7034139/webrev.2/
>>>
>>> Tony
>>>
>>> Tony Printezis wrote:
>>>> Hi all,
>>>>
>>>> I'd still like a couple of code reviews for this. Here's the latest 
>>>> version (I only rephrased a couple of comments, so if you're 
>>>> looking at the earlier version already you can ignore this one):
>>>>
>>>> http://cr.openjdk.java.net/~tonyp/7034139/webrev.1/
>>>>
>>>> Tony
>>>>
>>>> Tony Printezis wrote:
>>>>> Hi,
>>>>>
>>>>> Could I get a couple of people to look at this? (I'd like to push 
>>>>> this this week if possible)
>>>>>
>>>>> http://cr.openjdk.java.net/~tonyp/7034139/webrev.0/
>>>>>
>>>>> The actual fix is reasonably small (leave / join the 
>>>>> SuspendibleThreadSet only if we are in concurrent mode). Most of 
>>>>> the changes are new infrastructure to cause a fixed number of 
>>>>> overflows during marking (in non-product builds of course) to 
>>>>> stress the overflow code. This was the only way I could reliably 
>>>>> reproduce the failure. This did uncover a couple of extra issues 
>>>>> which I also fixed:
>>>>>
>>>>> - If we overflow during remark we should not actually deal with it 
>>>>> during remark but we should abort the remark pause and restart a 
>>>>> concurrent mark phase. For some reason we were not doing that. I 
>>>>> fixed that (for this I had to ensure that the overflow flag is not 
>>>>> cleared when we exit the do_marking_step() method).
>>>>> - Because we were clearing the overflow, it was also possible that 
>>>>> the workers would deadlock (for that to happen a worker had to 
>>>>> finish handling one overflow and immediately raise another one, so 
>>>>> it was highly unlikely to occur in prcatice; good to find it and 
>>>>> eliminate it though).
>>>>>
>>>>> I've already tested it, I'll run more tests overnight.
>>>>>
>>>>> Tony
>>>>
>>