stack manipulation APIs

Wed Mar 26 22:34:25 PDT 2008

On Mar 26, 2008, at 4:17 PM, Lukas Stadler wrote:

> I am currently working on APIs for the multi-language VM that would
> allow Java code to access and manipulate its own stacks. The overall
> scope is pretty general, ranging from call-with-current-continuation
> (call/cc) for languages like scheme and coroutine implementations to
> dynamic recompilation for compilers (like what the JIT does, but
> for ?-to-bytecode compilers). In the end it should be possible to
> change, replace, remove, etc. stack frames or even synthesize a new
> stack from scratch. *But*: It seems almost impossible to make this
> secure - some useful higher-level APIs will be needed. How these could
> be implemented under the covers - that's what I'm thinking about  
> right now.
>
> I'm starting with the most simple use case for now - continuations.  
> Not
> really "stack manipulation", just saving/restoring stacks.
>
> Some questions/remarks that crossed my mind: (most of this is "...  
> am I
> correct assuming that:")
>
>     *  From what I've understood a call/cc can be invoked even  
> after the
>       method in which it was created has returned.

Yes.  From a low-level point of view, a copyStack operation can
return more than once.  The first time it returns, it produces a new
snapshot of (at least part of) the thread stack.  If that snapshot is
then passed to restoreStack, the thread makes a discontinuous
jump back to the state of affairs as of the corresponding copyStack
operation, and the copyStack operation returns a second time.

(By discontinuous, I mean that the control stack as of the
restoreStack call is at least partially irrelevant to the future
of the computation.  In that sense it is like a throw.)

On this second return from the same call to copyStack,
the control stack is once again in the state it was when
the snapshot was made.  For generality and convenience,
the call to returnStack should be able to specify either
a return value or a throwable with which to continue
(normally or with a throw) from the copyStack call.

(BTW, the method names and details are from some
POC code, for which I owe a blog and code review.)

So, yes, at least part of the call/cc computation can occur more
than once, because of those discontinuous restoreStack calls.

>     This could lead to all sorts of harmful behavior, like exiting a
>     monitor twice, etc. Should this be possible, or will only a
>     restricted case be implemented? There would have to be a big red
>     sign (and possibly some kind of verifier) with all the things that
>     aren't allowed in such code.

Yes.  This is probably the single biggest problem with
the low-level copyStack/restoreStack idea.

The basic idea is that if method has bracket pairs that must be
properly matched, any copyStack that copies that method's
stack frame while one or more bracket is open must ensure
that the brackets continue to be matched properly.

(By bracket pairs I mean especially monitorenter vs. monitorexit
and the entry and exit of security states in doPrivileged, etc.
They can also include anything that try/finally is used to
clean up, such as open/close of a file, or some sort of
push/pop on a thread local variable.)

Java programmers are used to putting try/finally in their
code to make sure a closing bracket gets executed.
But there's no corresponding convention to make sure
an opening bracket gets re-executed.  With continuations
the brackets are more symmetric.  Scheme's version
of try/finally has a sort of 'initially' clause which is
reliably executed before the main body is executed.
This is not the Scheme syntax, but Java-ified it might be:
	initially { x.monitorenter(); }
	try { doSomethingWithXLocked(); }
	finally { x.monitorexit(); }

Both the initially and finally clauses may be executed
more than once, but they are always properly matched.
(If they were to print  'I' and 'F', then the output would always
match the regular expression (IF)+.)

It's almost (but perhaps not quite) possible to recover the
block structure of 'synchronized' statements from bytecodes.
I think that's like the verifier thing you are talking about.
However, that does not guarantee that the code is likely
to work properly if it is re-entered even with the intended
monitorenter instructions.

If there is doubt, I think it is better to require that methods
positively declare that they are re-entry safe, and provide
the right hooks for 'initially' actions, before restoreStack
is allowed to re-enter them.

The interesting question is whether there is some large
class of methods for which a positive declaration of safety
is not needed, because there isn't doubt.  I don't know the
answer to this; I think the answer will come by looking
carefully at actual code and from experience.

>     I recently looked at the apache commons Javaflow library - they
>     implement storing the current stack state using only bytecode
>     instrumentation and a small and unintrusive runtime framework. (I
>     can write a short summary of how they're doing this if anyone is
>     interested.) I think that, as their implementation is inherently
>     safe, we could partly adopt its behaviour.

Yes, that is the sort of experience I'm hoping we can use.

>     * I think that there are two variants to consider: one-shot
>       continuations that really only qualify as nonlocal returns and
>       full-fledged continuations that are invoked many times from
>       everywhere.

There's another degree of freedom:  How deep is the stack
captured by copyStack?  (Relatedly, how many frames does
the restoreStack operation change?  There's a Hamming
distance between stack traces.)

In the use case of a coroutine-like generator, each restoreStack
will not pop any frames, and will just push a frame or two of
suspended generator state.  (I'm not saying that copyStack/restoreStack
is the best way to implement generators, but I am suggesting that
it is a good way to experiment with them.)

In the use case of an application reoptimizing itself, restoreStack
operations will be infrequent but will replace most or all stack frames.

> I'm just now starting to explore the OpenJDK code, so any
> remarks/pointers are very welcome - especially where to look for
> examples on how to deal with stack frames (I thought about the
> deoptimization code...)

Look a vframes and vframe arrays.  A vframe is a virtualized view
onto a stack frame.

> Some more use cases for the stack-manipulation that came to my mind:
>
>     * sophisticated error logs (stack traces with local variables  
> etc.)
>     * checkpointing in servers
>     * transferring threads between servers / nomadic lightweight  
> threads
>       (Second Life's agent execution model)
>     * script interpreters/compilers that can switch between  
> interpreted
>       and compiled mode, like the JVM

The Hotspot JVM switches back and forth between interpreted and compiled
mode, as it optimizes and deoptimizes code in response to dynamically  
changing
properties of the application (as measured by changes to things like  
type profiles
and the class hierarchy).  I believe future VM-based systems will  
provide
something like vframes to pluggable library-specific and application- 
specific
optimization frameworks, which will do similar tricks, based on their  
own
profiles and metrics.  Think on-the-fly profile-directed  
parallelization.
Continuations may be part of the optimization arsenal required to run
many-core systems efficiently.

It's great that you are looking at this stuff!

Best wishes,
-- John