RFR(L): 8203197: C2: consider all paths in loop body for loop predication

Roland Westrelin rwestrel at redhat.com
Mon Jun 4 14:05:26 UTC 2018


Hi Vladimir,

> SpeculativeTrapData has only one cell for Method* which should be only 
> one cell, even in 32-bit VM. Change it to 2 cells is wrong I think.
>
> I think old assumption is invalid with your changes:
>
> // This code assumes an entry for a SpeculativeTrapData is 2 cells
>
> because you changed header size for 32 bit - it is now 2 cells.

The extra data area where the speculative traps are stored is allocated
with:

  // Add some extra DataLayout cells (at least one) to track stray traps.
  int extra_data_count = compute_extra_data_count(data_size, empty_bc_count, needs_speculative_traps);
  object_size += extra_data_count * DataLayout::compute_size_in_bytes(0);

  // Add a cell to record information about modified arguments.
  int arg_size = method->size_of_parameters();
  object_size += DataLayout::compute_size_in_bytes(arg_size+1);

So

1) extra_data_count * DataLayout::compute_size_in_bytes(0) guarantees
that the size of the space for traps is a multiple of the header size,
i.e. a multiple of 2 cells on 32 bits

2) space is reserved after the traps for entries with tag
DataLayout::arg_info_data_tag. Those entries are still inside the extra
data area.

When MethodData::bci_to_extra_data() allocates a new trap in the extra
data area, it must make sure there's still space available. It checks
that there's room by finding the last trap in the area, moving to the
next entry and checking that:

- it's still within the extra data area
- it's not tagged arg_info_data_tag, i.e. we're not overflowing into
the argument area
- if the trap being allocated is a speculative trap, there's room not
only for the header but also for the Method*

Now if traps are either 2 cells (regular traps) or 3 cells (speculative
traps), that code cannot be left as is. Let's say we have 6 cells for
extra data. We allocate a regular trap and a speculative trap. The next
entry is at cell number 5 (assuming cells are numbered from 0). Now
let's say we want to allocate another regular trap. We're still within
the extra data area (but don't have enough room): the new trap would
extend halfway onto the first arg info entry, and reading the tag at
that point can't reliably tell us that.

Adding the padding allows us to keep the existing logic. 
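To make the arithmetic concrete, here is a small standalone toy model
(a sketch only, not HotSpot code; the tags, sizes and padding amount
are my assumptions, in particular that the padding rounds a speculative
trap up to twice the header size):

public class ExtraDataToy {
  // Cells are ints; an entry starts with a 2-cell header whose first
  // cell holds a tag. A regular trap is just the header (2 cells); a
  // speculative trap adds one cell for the Method* (3 cells, or 4 with
  // the padding). The arg info entry is packed right after the traps.
  static final int NO_TAG = 0, TRAP = 1, SPEC_TRAP = 2, ARG_INFO = 3;
  static final int HEADER_CELLS = 2;     // header is 2 cells on 32-bit

  static int entryCells(int tag, boolean padded) {
    if (tag == SPEC_TRAP) return HEADER_CELLS + (padded ? 2 : 1);
    return HEADER_CELLS;                 // regular trap
  }

  // Walk the allocated traps and return the index of the first free cell.
  static int nextFree(int[] cells, boolean padded) {
    int i = 0;
    while (cells[i] == TRAP || cells[i] == SPEC_TRAP) {
      i += entryCells(cells[i], padded);
    }
    return i;
  }

  static void tryAllocate(boolean padded) {
    int[] cells = new int[6 + HEADER_CELLS]; // 6 trap cells + arg info header
    cells[6] = ARG_INFO;
    cells[0] = TRAP;                     // regular trap: cells 0-1
    cells[2] = SPEC_TRAP;                // speculative trap: cells 2-4 (2-5 padded)

    int next = nextFree(cells, padded);
    // The existing check: a slot is usable if it's before the arg info entry.
    boolean looksFree = next < 6 && cells[next] != ARG_INFO;
    // What we actually need: a whole header must fit in the trap slots.
    boolean reallyFits = next + HEADER_CELLS <= 6;
    System.out.println((padded ? "padded:   " : "unpadded: ")
        + "next=" + next + " looksFree=" + looksFree + " fits=" + reallyFits);
  }

  public static void main(String[] args) {
    tryAllocate(false); // next=5: looks free, yet a 2-cell trap would hit the arg info
    tryAllocate(true);  // next=6: lands on the arg info tag, correctly reports no room
  }
}

Without the padding the walk stops at cell 5, which passes the existing
checks even though a new header would spill onto the arg info entry;
with the padding every entry spans a whole number of header-sized
units, so the walk lands exactly on the arg info tag and the existing
checks keep working.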

> It may actually cause issues (with 32-bit VM) in other places with 
> similar assumption.
>
> An other issue is that header can't be allocated/set/load atomically in 
> 32-bit VM as MethodData::bci_to_extra_data_helper() expecting.

The header contains:

bci/tag, flags and traps

bci/tag are constant. They are set once and for all when the MethodData
is built. The trap entries in the extra data area are an exception, as
discussed above, but their allocation happens under a lock and once an
entry is allocated its bci/tag is constant.

So, the only things being updated are flags and traps. It's possible we
lose an update because of concurrency, but that was true before and the
change doesn't make it worse. I don't see what other problem the lack of
atomicity could cause.

The method data is laid out statically in a way that, AFAICT, is quite
robust to a 2 cell header. Once it is built, updates mostly happen in
fields of the individual profile data. So I don't really see what else
could break.

> Then I don't understand your changes. I thought you move out checks (as 
> new predicates) which have profiling information which means you have an 
> other valid path branch from dominating check. Such check can't be moved 
> from loop as I remember. I don't think you can move out from loop loop 
> exit checks.
>
> Please, explain which checks you moved from loop as profiling predicates 
> without moving dominating checks.

Say we have:

while() {
  if (some_condition) { // both branches profiled taken
    if (!null_check) unc(null_check);
    if (!range_check) unc(range_check);
    ...
  }
}

The existing loop predication wouldn't consider those checks. My change
does, when the profiling suggests there's a payoff.

if (!null_check) unc(profile_predicate);
if (!range_check) unc(profile_predicate);
while() {
  if (some_condition) { // both branches profiled taken
    ...
  }
}

It's otherwise strictly identical to the existing loop predication:
predicates are moved only if all inputs are loop independent.
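For concreteness, Java source of roughly this shape produces that
pattern (an illustration only, not a case from the patch): the null
check and range check for the array load only exist on the path where
the condition is true, so they are control dependent on it.

static long sum(int[] a, int n, int mask) {
  long s = 0;
  for (int i = 0; i < n; i++) {
    if ((i & mask) == 0) {   // some_condition: profiled taken both ways
      s += a[i];             // implicit null check + range check on 'a'
    }
  }
  return s;
}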

Now:

while() {
  if (!null_check1) unc(null_check);
  if (!range_check1) unc(range_check);
  ...
  if (some_condition) { // both branches profiled taken
    if (!null_check2) unc(null_check);
    if (!range_check2) unc(range_check);
    ...
  }
}

becomes:

if (!null_check1) unc(predicate);
if (!range_check1) unc(predicate);

if (!null_check2) unc(profile_predicate);
if (!range_check2) unc(profile_predicate);

while() {
  ...
  if (some_condition) { // both branches profiled taken
    ...
  }
}

The two kinds of predicates exist because profile-based predication is
more speculative, so it's more likely to fail, and I don't want a
failure caused by a profile predicate to disable predication entirely
and cause a regression.
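Again as an illustration of mine (not from the patch), source of this
shape gives both kinds: the checks for 'a' execute on every iteration
and are candidates for regular predicates, while the checks for 'b' sit
under the profiled branch and can only become profile predicates.

static long sum2(int[] a, int[] b, int n, int mask) {
  long s = 0;
  for (int i = 0; i < n; i++) {
    s += a[i];                 // null/range check on 'a': every iteration
    if ((i & mask) == 0) {     // some_condition: profiled taken both ways
      s += b[i];               // null/range check on 'b': only on this path
    }
  }
  return s;
}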

First, a pass over the loop body moves regular predicates out of the
loop, and then a second pass moves the other candidates out as profile
predicates.

There can be dependencies between predicates. Dependencies are preserved
because the predicates stay above the loop in the order in which they
were moved out of the loop.

With this:

while() {
  if (!null_check1) unc(null_check);
  if (!range_check1) unc(range_check);
  ...
  if (some_condition) { // both branches profiled taken
    if (!null_check2) unc(null_check);
    if (!range_check2) unc(range_check);
    ...
  }
  if (!null_check3) unc(null_check);
  if (!range_check3) unc(range_check);
}

Let's say that the first time loop predication is applied, it finds
null_check1/range_check1 and null_check2/range_check2 to hoist, but not
null_check3/range_check3, because they are not loop independent at that
point.

if (!null_check1) unc(predicate);
if (!range_check1) unc(predicate);

if (!null_check2) unc(profile_predicate);
if (!range_check2) unc(profile_predicate);

while() {
  ...
  if (some_condition) { // both branches profiled taken
    ...
  }
  if (!null_check3) unc(null_check);
  if (!range_check3) unc(range_check);
}

Now let's say a later loop predication pass finds it can hoist
null_check3/range_check3. To guarantee correct ordering of predicates,
they will be moved out as profile predicates even though they are not
under a branch:

if (!null_check1) unc(predicate);
if (!range_check1) unc(predicate);

if (!null_check2) unc(profile_predicate);
if (!range_check2) unc(profile_predicate);
if (!null_check3) unc(profile_predicate);
if (!range_check3) unc(profile_predicate);

while() {
  ...
  if (some_condition) { // both branches profiled taken
    ...
  }
}

When you say loop exit checks, are you talking about something like this?

while() {
  if (some_condition) { // both branches profiled taken
    break;
  }
  if (!null_check) unc(null_check);
  if (!range_check) unc(range_check);
  ...
}

It would be optimized as:

if (!null_check) unc(profile_predicate);
if (!range_check) unc(profile_predicate);
while() {
  if (some_condition) { // both branches profiled taken
    break;
  }
  ...
}

As long as the inputs of the tests are loop independent, I don't see
why that would be illegal. Worst case, they trigger a deopt and the
next compilation won't apply profile predication.
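As a sketch of that shape (my own example, not from the patch): the
checks for 'a' only run once the exit test has been passed, and 'a'
itself is loop invariant. Here the null check is the clear-cut
candidate; hoisting the range check additionally needs the counted-loop
machinery, since 'i' has no other bound.

static int count(int[] a, long budget) {
  long cost = 0;
  int i = 0;
  while (true) {
    if (cost > budget) {     // some_condition: profiled taken both ways
      break;
    }
    cost += a[i];            // implicit null check + range check on 'a'
    i++;
  }
  return i;
}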

Roland.

