RFR (M): 8159422: Very high Concurrent Mark mark stack contention

Kim Barrett kim.barrett at oracle.com
Wed Sep 7 21:20:56 UTC 2016


> On Sep 5, 2016, at 6:20 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> Hi Erik, Kim,
> 
>   sorry for the late reply...
> 
> [snip]

>  due to time constraints I moved to the suggested use of a global lock
> to fix the ABA problem.
> 
> I did some re-runs on the applications this change is targeted for, and
> did not see a noticable regression in conjunction with JDK-8057003.
> 
> Please have a look at http://cr.openjdk.java.net/~tschatzl/8159422/webr
> ev.1/ (full) and http://cr.openjdk.java.net/~tschatzl/8159422/webrev.1/
>  (diff) which also address Mikael's concerns.
> 
> Thanks,
>   Thomas

------------------------------------------------------------------------------  

FYI, there's a 8159422/webrev.02 directory that is something else
entirely. 

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1ConcurrentMark.inline.hpp
G1CMMarkStack::iterate no longer has assert_at_safepoint?

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1ConcurrentMark.inline.hpp 
 105       fn((oop)cur->data[i]);

Unnecessary cast.

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1ConcurrentMark.inline.hpp 
  94 inline void G1CMMarkStack::iterate(Fn fn) {

Seems like this function could be const.

------------------------------------------------------------------------------   
src/share/vm/gc/g1/g1ConcurrentMark.hpp 
 155 // add an remove entries on that basis.

Probably "add *and* remove" was intended.
But I think "add *or* remove" is better.

------------------------------------------------------------------------------ 
src/share/vm/gc/g1/g1ConcurrentMark.hpp 
 202   OopChunk* remove_chunk_from_list(OopChunk* volatile* list);  bool  _should_expand;

Need new line for _should_expand declaration.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp 
 234 G1CMMarkStack::OopChunk* G1CMMarkStack::allocate_new_chunk() {
 235   // This dirty read is okay because we only ever increase the _hwm in parallel code.
 236   if (_hwm >= _chunk_capacity) {
 237     return NULL;
 238   }
 239 
 240   size_t cur_idx = Atomic::add(1, &_hwm) - 1;
 241   if (cur_idx >= _chunk_capacity) {
 242     return NULL;
 243   }

The test of _hwm on line 236 seems like a waste of time.  It doesn't
prevent _hwm from exceeding _chunk_capacity, since there could be
multiple threads (notionally) at line 239.

------------------------------------------------------------------------------  
src/share/vm/gc/g1/g1ConcurrentMark.cpp 
 268   add_chunk_to_list(&_chunk_list, new_chunk);
 269   Atomic::inc(&_chunks_in_chunk_list);

 273   OopChunk* cur = remove_chunk_from_list(&_chunk_list);
...
 279   Atomic::dec(&_chunks_in_chunk_list);

It looks okay to have _chunks_in_chunk_list not precisely track the
actual length, but it might be helpful to make that more explicit
somewhere. 

Also unfortunate that separate atomic operations are needed, rather
than doing the inc/dec inside the lock.

A bit of refactoring could eliminate these concerns.  It might also
remove the rationale for padding between _chunk_list and
_chunks_in_chunk_list.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp 

The list manipulation (_chunk_list and _free_list) both use the
ParGCRareEvent_lock. Might it be worthwhile using different locks for
those two lists?

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp 
 228   bool is_empty() const { return _chunk_list == NULL && _chunks_in_chunk_list == 0; }

Given this is inherently racy, it's not clear that checking both
values is actually useful.  Is it ok to be racy?  Hopefully so, since
it's not clear how it could be avoided, other than by only calling in
certain limited contexts.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp 
 250 void G1CMMarkStack::par_push_chunk(oop* ptr_arr) {
...
 264   for (size_t i = 0; i < OopsPerChunk; i++) {
 265     new_chunk->data[i] = ptr_arr[i];
 266 }

 272 bool G1CMMarkStack::par_pop_chunk(oop* ptr_arr) {
...
 281   for (size_t i = 0; i < OopsPerChunk; i++) {
 282     ptr_arr[i] = (oop)cur->data[i];
 283   }

Either these loops should be checking for early termination because of
a NULL entry, or they should be replaced by Copy:: operations.  I
assume the rationale for not checking for early termination is that it
should be rare, so not worth the extra test.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.cpp    
1727     // The do_oop work routines of the keep_alive and drain_marking_stack
1728     // oop closures will set the has_overflown flag if we overflow the
1729     // global marking stack.
1730 
1731     assert(_global_mark_stack.is_out_of_memory() || _global_mark_stack.is_empty(),
1732             "mark stack should be empty (unless it overflowed)");

The comment seems rather dated, with it's reference to the
"has_overflown" flag.  But see the next two comments.

------------------------------------------------------------------------------
src/share/vm/gc/g1/g1ConcurrentMark.hpp  
 196   bool  _out_of_memory;

Shouldn't this be volatile?  It is both read and written by multiple
threads.  Those reads and writes are racy, but look to be so in an
okay way.

------------------------------------------------------------------------------

If we run out of mark stack, it appears that we set _out_of_memory to
true and continue with marking until it appears to be all "done", and
only then notice the failure, clean up, and start over.  That seems
odd, rather than bailing out early, though changing it should perhaps
be a different CR.

OTOH, I'm not sure, but it looks like if par_push_chunk were changed
to return a bool success flag, then we could eliminate _out_of_memory
and just use the has_overflown state.  Eliminating a perhaps redundant
racy state flag would be a good thing.

------------------------------------------------------------------------------





More information about the hotspot-gc-dev mailing list