RFR (L): 8230706: Waiting on completion of strong nmethod processing causes long pause times with G1

Tue Oct 22 01:20:33 UTC 2019

> On Oct 19, 2019, at 9:06 AM, Thomas Schatzl <thomas.schatzl at oracle.com> wrote:
> 
> Hi all,
> 
>  there is a new webrev at 
> 
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2/ (full only,
> there is no point in providing a diff)
> 
> since I like this solution a lot as it removes a lot of additional
> post-processing.
> 
> Testing has been a bit of a headache: interference between strong and
> weak processing is extremely rare, so I had to make it pretty common by
> 
> 1) only a single thread doing strong processing
> 2) the weak processing stage has to be moved right after the root
> processing so they overlap with a lot higher probability
> 
> hs-tier 1-5 passes with and without these changes, with a noticable
> amount of overlap according to additional log messages. That change can
> be looked at at 
> http://cr.openjdk.java.net/~tschatzl/8230706/webrev.2.testing/ .
> Obviously I am not going to push this.
> 
> Surprisingly there had to be no changes to Shenandoah as it does not
> use the claim mechanism changed here, implementing something else.
> Shenandoah also passed vmTestbase/gc with these changes with no
> problem.
> 
> Below this email is a copy of Kim's suggestion about the state machine
> again for reference. I also added documentation about why and how the
> code is supposed to work.

I'm glad the new state machine worked out, and allowed the extra task
to be eliminated. Thanks for going the extra mile with the testing.
And thanks for turning my pseudo-code into something more readable. My
comments here mostly suggestions for more of that; I don't think I'd
want to have to decipher this in 6 months without some helpful
commentary. :)

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
Removed:
1829 #define NMETHOD_SENTINEL ((nmethod*)badAddress)

Yay!

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.hpp
 118   // SR -> SD: the nmethod has been processed strongly from the beginning.

I think this is just the tail of
 114   // WR -> SR -> SD: during weak processing another thread found that the nmethod
and is not what is needed here. I think what you are really looking
for here is the unclaimed -> SD case. I think the state progressions
are

unclaimed -> WR -> WD
unclaimed -> WR -> SR -> SD
unclaimed -> WR -> WD -> SD
unclaimed -> SD

The first is terminal (at WD) if the nmethod doesn't need strong
processing.

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.hpp
  95   // We store state and claim information in the _oops_do_mark_link member, using
  96   // the two LSBs for the state and the rest for linking together nmethods that
  97   // were visited.

There's no description of the upper bits in this comment.  In
particular, the self-loop to indicate end of list isn't mentioned.
Also, the specific values for the upper bits in the transitions turned
out to be important, as discussed in the pseudo-code.

So if N is the nmethod and X is the "next" value (which is N at end of
list), then the state progressions might be described as

unclaimed -> WR(N) -> WD(X)
unclaimed -> WR(N) -> SR(N) -> SD(X)
unclaimed -> WR(N) -> WD(X) -> SD(X)
unclaimed -> SD(N) -> SD(X)

(The text descriptions of the progressions seem okay.)

It also might help to indicate which thread performs each step. If C
is the claiming thread, and O is some other thread, then something
like

unclaimed (C)-> WR(N) (C)-> WD(X)
unclaimed (C)-> WR(N) (O)-> SR(N) (C)-> SD(X)
unclaimed (C)-> WR(N) (C)-> WD(X) (O)-> SD(X)
unclaimed (C)-> SD(N) (C)-> SD(X)

(Admittedly, that's pretty dense notation.)

I think the comments describing the various transition functions might
be better if they explicitly state which of the above transitions they
(attempt to) perform, e.g.

  // Attempt unclaimed -> WR(N) transition, returning true if successful.
  bool oops_do_try_claim_weak_request();

I found the existing text descriptions hard to map onto the specific
steps, even though they are (mostly?) one-to-one.  I was finding it
easier to ignore the descriptions and just use the names, though that
isn't trivial either.

------------------------------------------------------------------------------  
src/hotspot/share/code/nmethod.hpp
 160   oops_do_mark_link* oops_do_try_claim_weak_request_as_strong_request(oops_do_mark_link* next);

I think this function is misnamed; it doesn't really claim anything.
Instead it attempts to add a strong request (SR) to a weak request
(WR), and should be called oops_do_try_add_strong_request.

------------------------------------------------------------------------------  
src/hotspot/share/code/nmethod.cpp
1848 bool nmethod::oops_do_try_claim_weak_request() {
1849   assert(SafepointSynchronize::is_at_safepoint(), "only at safepoint");
1850 
1851   if (_oops_do_mark_link != NULL) {
1852     return false;
1853   }
1854   if (!Atomic::replace_if_null(mark_link(this, claim_weak_request_tag), &_oops_do_mark_link)) {
1855     return false;
1856   }
1857   oops_do_log_change("oops_do, mark weak request");
1858   return true;
1859 }

I found the various "!"s and early returns in the above made it hard
to read. I think simpler is the following. YMMV.

bool nmethod::oops_do_try_claim_weak_request() {
  assert(SafepointSynchronize::is_at_safepoint(), "only at safepoint");

  if ((_oops_do_mark_link == NULL) &&
      Atomic::replace_if_null(mark_link(this, claim_weak_request_tag), &_oops_do_mark_link)) {
    oops_do_log_change("oops_do, mark weak request");
    return true;
  }
  return false;
}

That's also more similar to the style of the other functions.

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.hpp
 130     assert(((uintptr_t)nm & 0x3) == 0, "nmethod pointer must have zero lower two LSB");

assert(is_aligned(nm, 2), ...);

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
1906   if (old_head == NULL) {
1907     old_head = this;
1908   }
...
1922   if (old_head == NULL) {
1923     old_head = this;
1924   }
...
2013   } while (cur != next);

In none of these places nor in the header comments is there any
mention of the use of self-loop to indicate the end of the list (nor
why that's being done).

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
2014   _oops_do_mark_nmethods = NULL;

Maybe move this up to immediately following
1997   nmethod* next = _oops_do_mark_nmethods;

to make it more immediately obvious that we're taking and processing
the whole list.

------------------------------------------------------------------------------
src/hotspot/share/code/nmethod.cpp
1987 void nmethod::oops_do_marking_prologue() {
...
1991   _oops_do_mark_nmethods = NULL;

That assignment ought to be a nop, and could instead be an assert.

------------------------------------------------------------------------------