RFR (S): Buffered TQ buffer breaks LIFO

Tue Jan 24 21:33:01 UTC 2017

Hi,

When doing the single-entry buffer in TQ, I missed an obvious thing: if we
bypass buffer on queue push, then the queue stops being LIFO. In other words,
current code does:

template <class E, MEMFLAGS F, unsigned int N>
inline bool BufferedOverflowTaskQueue<E, F, N>::push(E t)
{
  if (_buf_empty) {
    _elem = t;
    _buf_empty = false;
    return true;
  } else {
    return taskqueue_t::push(t); // oops, jumping over the buf
  }
}

...which means that if we push and pop (1, 2, 3); we will get (1, 3, 2), not (3,
2, 1), as expected. Among other things, this has implications for work stealing.
In case of divide-and-conquer array handling, we keep the largest task in the
buffer, while we should have pushed it out into the tail.

Fix:
 http://cr.openjdk.java.net/~shade/shenandoah/taskqueue-fix-lifo/webrev.01/

This, of course, makes array marking much faster:

100M array:
  before: 121.96 s (a =   916964 us) (n =   133)
            (lvls, us =   707031,   853516,   923828,   982422,  1154833)

   after:  79.54 s (a =   580586 us) (n =   137)
            (lvls, us =   544922,   566406,   574219,   582031,   648559)

...while keeping Tree and HashMap traversals roughly the same:

20M Tree:
  before:  59.56 s (a =   437911 us) (n =   136)
            (lvls, us =   408203,   425781,   433594,   435547,   488014)

   after:  62.18 s (a =   457238 us) (n =   136)
            (lvls, us =   425781,   445312,   451172,   460938,   509219)

20M HashMap:
  before: 139.84 s (a =  1282913 us) (n =   109)
            (lvls, us =  1152344,  1250000,  1289062,  1308594,  1363561)

   after: 137.04 s (a =  1268875 us) (n =   108)
            (lvls, us =  1171875,  1230469,  1269531,  1289062,  1327203)

Testing: hotspot_gc_shenandoah, targeted benchmarks

Thanks,
-Aleksey