Re: RFC - Improving C2 Escape Analysis

Mon Oct 4 22:20:02 UTC 2021

On Oct 4, 2021, at 2:32 PM, Cesar Soares Lucas <Divino.Cesar at microsoft.com> wrote:
> 
> ...
> I think it's safe to say that the Inlining horizon and the presence of
> control/data-flow merge points in the compilation unit are the major hurdles to
> the effectiveness of the current EA and dependent optimizations. I understand
> that Valhalla (Inline Types) will help tremendously with the inlining horizon
> aspect of the problem. How much of a problem is control-flow/data-flow merge
> points in a world where Inline Types are in place?

There’s an easy thought experiment or two here:  Suppose
a class V is known to be identity-insensitive.  This means
a physical copy of V can be recopied into another location
(on the heap or even registers or stack) with no violation
of semantics.  If there is some use u(v) of a V-value v,
we can pass the physical representation of v through
a copy operation c(v) which preserves the value but
physically rebuffers it somewhere else, so that the
new graph is u(c(v)).  We call this “splitting the use”
u(v).
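
To make the c-operator concrete, here is a minimal
Java sketch.  The names Point, c, and u are
hypothetical, and Point is only a stand-in for an
identity-insensitive class V:  in today's Java it
still has identity, which is the only reason the
final check can observe the rebuffering at all.

```java
// Hypothetical sketch; Point, c, and u are illustrative names only.
final class SplitUseSketch {
    // Stand-in for an identity-insensitive class V.
    // Assumption: no code relies on the identity of a Point.
    static final class Point {
        final int x;
        final int y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // The copy operation c(v): same field values, new buffer.
    static Point c(Point v) {
        return new Point(v.x, v.y);
    }

    // Some use u(v) that depends only on the value of v.
    static int u(Point v) {
        return v.x * 31 + v.y;
    }

    public static void main(String[] args) {
        Point v = new Point(3, 4);
        Point w = c(v);
        // Splitting the use: u(c(v)) computes the same result as u(v).
        System.out.println(u(w) == u(v));
        // The value was physically rebuffered somewhere else.
        System.out.println(w != v);
    }
}
```

Because u never asks which buffer it was handed,
replacing u(v) by u(c(v)) changes nothing that the
program can legitimately observe.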

Experiment 1:  At the horizon of the current task,
split all incoming and outgoing uses of type V.
Check how this transform affects your favorite
EA.  I think you suddenly don’t care about identity
escapes of those values, beyond the horizon,
because they take up new identities there.

Experiment 2: In the current task, split
all uses of type V that touch phis.  (Where
there are already copies, of the form c(v),
use the identity c(c(v)) = c(v).)  Again, check
how this affects your favorite flow-sensitive
EA.  Now I think you have decoupled questions
of identity from the control flow graph.

The limiting case is to treat values of type
V as always splitting.  And you probably don’t
ever need explicit c-nodes.  The c notation
helps to explain how V-type references
behave differently from normal references,
in classic EA algorithms.
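
As a rough illustration of Experiment 2, the sketch
below splits the inputs of a phi in a toy IR.  The
node classes are hypothetical and are not C2's; the
sketch handles only the inputs of the phi (not the
uses of the phi itself), and it assumes every input
has type V.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy IR; none of these names come from C2.
final class PhiSplitSketch {
    static class Node {
        final String name;
        final List<Node> inputs = new ArrayList<>();
        Node(String name, Node... ins) {
            this.name = name;
            for (Node in : ins) inputs.add(in);
        }
    }

    // An explicit c-node: rebuffers its single input.
    static final class Copy extends Node {
        Copy(Node in) { super("c", in); }
    }

    // A control-flow merge of several V-typed values.
    static final class Phi extends Node {
        Phi(Node... ins) { super("phi", ins); }
    }

    // Wrap v in a copy, using c(c(v)) = c(v) so copies never stack.
    static Node split(Node v) {
        return (v instanceof Copy) ? v : new Copy(v);
    }

    // After this, the phi merges only fresh copies, never an
    // original allocation.
    static void splitPhiInputs(Phi phi) {
        for (int i = 0; i < phi.inputs.size(); i++) {
            phi.inputs.set(i, split(phi.inputs.get(i)));
        }
    }

    public static void main(String[] args) {
        Node a = new Node("allocA");
        Node b = new Copy(new Node("allocB"));
        Phi phi = new Phi(a, b);
        splitPhiInputs(phi);
        for (Node in : phi.inputs) {
            System.out.println(in.name + "(" + in.inputs.get(0).name + ")");
        }
    }
}
```

A flow-sensitive EA run over such a graph no longer
sees the identity of allocA flowing into the merge,
only a copy of its value.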

> From my (external) point of
> view, I can  imagine that this will still be a problem.

What’s an example of such a problem, that
remains after uses are split with the c-operator?

(There’s an issue of removing needless splits,
but if the c-operator is never materialized,
then the problem reduces, I think, to tracking
“souvenirs” of last-known bufferings of given
value bundles.)

> If so, it looks to me
> that this is an issue that  once solved would benefit not only the current EA
> implementation but also Valhalla.  What do you [all] think?
>  
> You mention about the "remaining [EA] hard cases once Inline Types are in
> place". Can you please expand a little bit on that?

Oops, I left a sentence incomplete there.  It should
have been simply:

Not to make EA unnecessary, by any means,
but rather to focus EA better on some remaining
“hard cases”.

The hard cases continue to be behavior of objects
outside of the inlining horizon that might disrupt
optimistic optimizations inside the horizon.

A couple of examples:

A. We might wish to use thread-local allocation for
“confined” objects which never leave the thread,
but which need to be accessed as if they were on
the heap.  (Stack allocation is a *special case* of
thread-local allocation:  It requires an extra
assurance that the object lives only during the
lifetime of a stack frame.  BTW I think concurrent
cross-thread access to thread-local objects is
likely to be very buggy, which is why I draw
the line at confinement, not the different line
of “during the *global* lifetime of a stack frame”.)

B. We might wish to vectorize over data sources
and sinks assuming (after a check perhaps) that
the memory streams are non-overlapping with
writes we are vectorizing (a “noalias” condition).

A hard case of A. is when code might let a
mostly-confined reference escape to somewhere
another thread can see it.  There are many
ways to build a fence around this, such as
static inter-task analysis beyond the horizon
(usually called “interprocedural” analysis,
though here the JIT task, not the procedure,
is the key unit).
Or GC barriers that detect escapes dynamically.

A hard case of B. is similar, but you might test
for different conditions, trying to prove that
distinct alias categories stay distinct beyond
the horizon.  Actually, I don’t have any good
suggestions for that, other than the usual
technique of predicating on a non-overlap
condition inside the horizon, before a loop
that needs the condition.
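
The usual predication technique can be sketched in
Java source form as follows.  This is only an
illustration under stated assumptions:  the names
are hypothetical, the index arithmetic assumes
small non-negative values (no overflow), and a real
JIT would apply the guard to its IR rather than to
source code.  The two loops are textually the same;
the point is that only the guarded one may be
reordered or batched.

```java
// Hypothetical sketch of predicating a loop on a "noalias" check.
final class NoAliasGuardSketch {
    // True if the ranges [srcPos, srcPos+len) of src and
    // [dstPos, dstPos+len) of dst cannot overlap. Distinct Java
    // arrays never share storage, so src != dst is sufficient.
    static boolean disjoint(int[] src, int srcPos,
                            int[] dst, int dstPos, int len) {
        return src != dst
            || srcPos + len <= dstPos
            || dstPos + len <= srcPos;
    }

    static void addOne(int[] src, int srcPos,
                       int[] dst, int dstPos, int len) {
        if (disjoint(src, srcPos, dst, dstPos, len)) {
            // Fast path: no write can feed a later read, so the
            // iterations are independent and could be vectorized.
            for (int i = 0; i < len; i++) {
                dst[dstPos + i] = src[srcPos + i] + 1;
            }
        } else {
            // Slow path: possible overlap, so keep strict
            // element-by-element order.
            for (int i = 0; i < len; i++) {
                dst[dstPos + i] = src[srcPos + i] + 1;
            }
        }
    }

    public static void main(String[] args) {
        int[] a = {0, 0, 0, 0};
        addOne(a, 0, a, 1, 3);   // overlapping ranges take the slow path
        System.out.println(java.util.Arrays.toString(a));
    }
}
```

The check sits inside the horizon, before the loop,
so nothing needs to be proved about what happens to
the arrays beyond it.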

Even if we use only static inter-task techniques
to analyze and summarize the global access
to references (passed to and from method
calls and data structures), there is probably
always a dynamic component.  At least, if
you add a new subclass you probably have
to adjoin the summaries of its overriding
methods, if you are relying on summaries
of related methods; in the worst case you
might have to retract optimizations in
running code (as we do when we de-opt
when a devirtualization is no longer
correct).  Turning that around, there might
be cases where a reference *might* escape
but it dynamically *does not*.  (Example:
A so-far-never-taken path makes a reference
escape, where the other paths keep it confined.)
We might impose dynamic checks on methods
whose behavior we have summarized (for
the benefit of compile tasks which do not
inline them), so that the summary can be
sharper, at the cost of doing de-opt if the
sharper summary is no longer valid.
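
Here is a minimal sketch of that bookkeeping, under
loud assumptions:  the class, its methods, and the
three-level escape lattice are all hypothetical
illustrations, not an existing HotSpot interface.
A summary for a virtual method is the join over its
known overriders; adjoining a new overrider that
weakens the summary runs the de-opt callbacks of
the compile tasks that relied on it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not an existing HotSpot data structure.
final class SummarySketch {
    // Ordered from most to least optimistic.
    enum Escape {
        NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE;

        // Lattice join: the more pessimistic of the two.
        Escape join(Escape other) {
            return ordinal() >= other.ordinal() ? this : other;
        }
    }

    private Escape summary = Escape.NO_ESCAPE;
    private final List<Runnable> dependents = new ArrayList<>();

    Escape summary() { return summary; }

    // A compile task that relied on the current summary registers
    // a callback to run if the summary is later weakened.
    void dependOn(Runnable deopt) { dependents.add(deopt); }

    // Called when a new overriding method becomes known, for
    // example because a subclass was loaded.
    void adjoin(Escape overrider) {
        Escape joined = summary.join(overrider);
        if (joined != summary) {
            summary = joined;
            // Retract optimizations that assumed the old summary.
            for (Runnable deopt : dependents) deopt.run();
            dependents.clear();
        }
    }

    public static void main(String[] args) {
        SummarySketch s = new SummarySketch();
        s.dependOn(() -> System.out.println("deopt"));
        s.adjoin(Escape.NO_ESCAPE);      // summary unchanged
        s.adjoin(Escape.GLOBAL_ESCAPE);  // summary weakened
        System.out.println(s.summary());
    }
}
```

A dynamic check on a summarized method would feed
the same adjoin step:  the so-far-never-taken
escaping path, once taken, reports its weaker
summary and triggers the retraction.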

There’s a frontier here, of more and more
clever and aggressive and speculative
optimizations we can do.  I think it helps
to have them operate on fewer (larger?)
objects.

So I guess what I’m hoping for here is that if
many small objects are made identity-free,
then we can better use limited resources of
time, space, and engineering cleverness
to track the remaining objects whose access
we wish to optimize.

> Thank you again for taking the time to read the report and provide your
> perspective!

My pleasure!