RFR: JDK-8061259: ParNew promotion failed is serialized on a lock

Sat Nov 8 01:41:49 UTC 2014

http://cr.openjdk.java.net/~rasbold/8061259/webrev.05/
I'll be OOO next week, so I created the up-to-date webrev.
Let me know if there is anything else that needs to be done.

On Fri, Nov 7, 2014 at 3:32 AM, Bengt Rutisson <bengt.rutisson at oracle.com>
wrote:

>
> Hi Kim and Jungwoo,
>
>
> On 2014-11-07 02:36, Jungwoo Ha wrote:
>
>
>>
>> ------------------------------------------------------------------------------
>>
>>
>> src/share/vm/gc_implementation/concurrentMarkSweep/concurrentMarkSweepGeneration.cpp
>> 271   // Support for CMSFastPromotionFailure
>> 272   reset_promotion_failed();
>>
>> Why this and not just initialize the data member in the constructor
>> initializer list?
>>
>>
>  Okay. I will fix that.
>
>
>>
>> ------------------------------------------------------------------------------
>>
>> [All of code discussed here is in par_promote().]
>>
>> The tests of CMSFastPromotionFailure && has_promotion_failed() that
>> are outside the lock have a sort of DCLP (double checked locking
>> pattern) feel to them.  The goal here is to avoid the locked region
>> and go do something else, instead of waiting for the current lock
>> holder to complete the work the checker would be doing if only they
>> could get the lock.
>>
>> I think I would like it better if the outside the lock tests were
>> moved closer to the lock entries, to emphasize that relationship, and
>> to reduce the code affected by the introduction of this set of
>> changes. That might result in the thread that gives up doing a little
>> more work before giving up.  I'm particularly thinking of
>>
>> 1358   if (CMSFastPromotionFailure && has_promotion_failed()) {
>> 1359     return NULL;
>> 1360   }
>>
>> removal of which would let execution proceed to
>>
>> 1373     if (!expand_and_ensure_spooling_space(promoInfo)) {
>> 1374       return NULL;
>>
>> and give up because the moved test caused expand_xxx to return false.
>> I don't know if the placement of the test at line 1358 rather than
>> inside expand_and_ensure_spooling_space() would be significant.  I
>> suspect not, but without measurements I don't have a lot of confidence
>> in that supposition.  But I think that change would improve the
>> understandability of the code.
>>
>> In the other case,
>>
>> 1381      if (CMSFastPromotionFailure && has_promotion_failed()) {
>> 1382        return NULL;
>> 1383      }
>> 1384      obj_ptr = expand_and_par_lab_allocate(ps, alloc_sz);
>>
>> I would definitely prefer that test be moved into
>> expand_and_par_lab_allocate().  This is the only call to that
>> function, and the first thing it does is lock the mutex and re-perform
>> the test.
>>
>
>  These are really important for performance. If we remove line 1358, the
> pause time with the given example goes up by 2x.
> I'll move 1381 to inside expand_and_par_lab_allocate().
>
>
>>
>>
>> ------------------------------------------------------------------------------
>>
>> I'm suspicious of the during-review change to make
>> _has_promotion_failed non-atomic.
>>
>> I see the argument for the interaction between set and test; the set
>> occurs in lock context so gets released on the way out of the lock,
>> while the first time each core gets a stale false value it will
>> acquire the lock and acquire the updated set value, so that future
>> tests on the same core will avoid the code in question, until the
>> reset finally happens.
>>
>> However, I think the present interaction between reset and test is not
>> safe. A test seeing a stale true value leads to an inappropriate skip
>> of the protected code, with nothing obvious to prevent that from
>> occurring "forever".
>>
>>
>> ------------------------------------------------------------------------------
>>
>
>  I am sure the code is correct under x86 as is, but please let me know
> how to fix it for other architecture.
>
>  I'll post the update after the discussion is settled.
>
>
> I don't think a thread can ever see a stale true value. We only reset
> after a GC and there is a lot of things happening, including several locks
> being taken and released, before we start a new GC. So, as far as I can
> tell this does not require any more barriers. I may be missing something
> though. If I am, can you give a more specific example, Kim?
>
> Thanks,
> Bengt
>
>
>  --Jungwoo
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://mail.openjdk.org/pipermail/hotspot-gc-dev/attachments/20141107/bd52c8a4/attachment.htm>