CMS parallel initial mark; vm thread sneaking in Thread::try_lock()

Mon Jun 17 23:59:37 UTC 2013

> I also agree that adding the assert(!
>>> SafepointSynchronize::is_at_safepoint()) (or assert(!
>>> Thread->current()->is_VM_thread())) may be good enough and if sneaking
>>> can be turned off for a case like this, it'd be safer.
>>>
>> I think we should go for this solution, i.e. adding the assert (because
>> it would be wrong anyway to call this code during safepoint) and using
>> try_lock()/unlock().
>>
>> Try_lock() itself is the same as the current code, and unlock() too
>> (except for some additional checks that should fail fast). The code that
>> is guarded handles occasional skipping already and actually exits fast
>> if the frequency is too high, so I do not see a big advantage doing
>> custom code.
>
>
I tried a version with try_lock()/unlock() as in:

void CMSCollector::sample_eden_chunk() {
  assert(!SafepointSynchronize::is_at_safepoint(),
         "Should not be at a safepoint.");
  if (CMSEdenChunksRecordAlways && _eden_chunk_array != NULL) {
    if (_eden_chunk_lock->try_lock()) {
      // Record a sample. This is the critical section. The contents
      // of the _eden_chunk_array have to be non-decreasing in the
      // address order.
      _eden_chunk_array[_eden_chunk_index] = *_top_addr;
      assert(_eden_chunk_array[_eden_chunk_index] <= *_end_addr,
             "Unexpected state of Eden");
      if (_eden_chunk_index == 0 ||
          ((_eden_chunk_array[_eden_chunk_index] >
            _eden_chunk_array[_eden_chunk_index-1]) &&
           (pointer_delta(_eden_chunk_array[_eden_chunk_index],
                          _eden_chunk_array[_eden_chunk_index-1]) >=
            CMSSamplingGrain))) {
        _eden_chunk_index++;  // commit sample
      }
      _eden_chunk_lock->unlock();
    }
  }
}

Unfortunately, it appears that the VM thread can call this at a safepoint
(during a young gen collection), so the !is_at_safepoint() assert fails.
Here's a snippet of the stack trace:

#7  0xf64e0be4 in CMSCollector::sample_eden_chunk (this=0x709366b0)
#8  0xf652ef14 in DefNewGeneration::allocate (this=0xf5c42900, word_size=18, is_tlab=false)
#9  0xf6654c62 in GenCollectedHeap::attempt_allocation (this=0xf5c2c8a8, size=18, is_tlab=false, first_only=false)
#10 0xf646ec02 in GenCollectorPolicy::satisfy_failed_allocation (this=0xf5c2c7a8, size=18, is_tlab=false)
#11 0xf6cf4880 in VM_GenCollectForAllocation::doit (this=0x6b0fe718)
#12 0xf6d1d324 in VM_Operation::evaluate (this=0x6b0fe718)
#13 0xf6d1aa3c in VMThread::evaluate_operation (this=0x6e02c800, op=0x6b0fe718)
#14 0xf6d1b349 in VMThread::loop (this=0x6e02c800)
#15 0xf6d1b560 in VMThread::run (this=0x6e02c800)

This seems to me like a valid allocation code path.

Now, if I comment out the assert, it seems to work (though I haven't tested
it for very long). This may be good enough if sneaking in fact cannot happen
when only try_lock()/unlock() are used.

I haven't tried this, but another potential approach might be to give up
sampling (just return) when it's called by the VM thread at a safepoint.
The downside is that the VM thread might allocate a large object without a
sample being taken, so the evenness of the sample distribution could suffer
to some extent.
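
To make that alternative concrete, here is a minimal standalone sketch of the
"skip when called by the VM thread at a safepoint" idea. It is not HotSpot
code: EdenSampler, its fields, the vm_thread_at_safepoint flag and the
capacity check are all hypothetical stand-ins, and std::mutex replaces
_eden_chunk_lock. In the real code the condition would presumably be built
from SafepointSynchronize::is_at_safepoint() and the current thread's
is_VM_thread().

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical model of the sampling state; names are illustrative only.
struct EdenSampler {
  std::mutex lock;                  // stands in for _eden_chunk_lock
  std::vector<std::size_t> chunks;  // stands in for _eden_chunk_array
  std::size_t capacity;             // assumed bound on the number of samples
  std::size_t grain;                // stands in for CMSSamplingGrain

  EdenSampler(std::size_t cap, std::size_t g) : capacity(cap), grain(g) {}

  // Tries to record 'top' as a sample. Returns true only if it was committed.
  bool sample(std::size_t top, bool vm_thread_at_safepoint) {
    // Proposed alternative: give up instead of asserting.
    if (vm_thread_at_safepoint) {
      return false;
    }
    // Non-blocking acquire: a contended sample is simply skipped.
    std::unique_lock<std::mutex> guard(lock, std::try_to_lock);
    if (!guard.owns_lock()) {
      return false;
    }
    if (chunks.size() >= capacity) {
      return false;
    }
    // Keep the array strictly increasing and at least 'grain' apart.
    if (chunks.empty() ||
        (top > chunks.back() && top - chunks.back() >= grain)) {
      chunks.push_back(top);  // commit sample
      return true;
    }
    return false;
  }
};
```

With a grain of 8, a call such as sample(200, true) records nothing, so a
later sample(200, false) is the first chance to cover that allocation; this
is the kind of gap in the sample distribution mentioned above.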