Loom implementation design update

Tue Mar 20 12:01:10 UTC 2018

Hi Ron

On 13/03/18 07:26 AM, Ron Pressler wrote:
>  
> A major change in the implementation approach of Project Loom was made recently,
> and I want to update the list about it.
> 
> We've discussed various options regarding the management of continuation stacks.
> There were two major considerations: 1. where to store the stacks -- whether on
> the Java heap, as Java objects, or on the C heap, using some specialized memory
> management mechanism, and 2. whether the continuation stack layout should mirror
> the ordinary layout used by frames on thread stacks, or a different one (for
> example, using separate objects to store primitives and references). The two
> main requirements I outlined in a previous email have directed our discussion,
> namely, a. mounting/dismounting continuations must be very fast and b. we expect
> hundreds of thousands or even millions of continuations.
> 
> As far as question 1 is concerned, we reached the conclusion that, as
> continuation frames may hold references to the Java heap that would need to
> somehow be traversed by the GC, and as the number of continuations is high and
> so treating all continuation stacks as roots (as we do for thread stacks) is
> unviable, managing continuation stacks in a separate memory region would require
> what amounts to creating a new and non-trivial garbage collection mechanism, and
> so wouldn't simplify matters; relying on the existing memory management
> facilities of the Java heap would be easier.
> 
> Question 2, that of layout, turned out to be more troublesome, as all current
> and upcoming HotSpot GCs cannot easily support objects that store a reference in
> some memory slot and then, at some later time, contain a primitive at the same
> slot. The restriction on dynamic layout imposed by the GCs meant that the
> continuation stack layout would need to be substantially different from that of
> the ordinary stack layout [1].
> 
> Regardless of our preference concerning both questions, we believed that the
> requirement for fast task-switching meant we must execute continuation code
> "inside" the continuation stack, meaning, by pointing the stack pointer (which
> may have needed to be split into more than one pointer, depending on our chosen
> layout) to the continuation stack. The need for a different layout combined with
> the need to run "inside" the continuation stack would require drastically
> different machine code to be generated (by the compilers and the interpreter)
> for Java code running in a continuation. In addition, in order to avoid
> performing various GC barriers whenever such code was executing (on the Java
> heap), a rather complex handshake between the continuation and the GC would need
> to be performed on each mount or dismount.
> 
> So executing code directly on the continuation stack would prove to be quite a
> challenge, but executing code only on the thread stack and copying the
> continuation stack back and forth between the heap and the stack on each
> mount/dismount would likely be too costly, and it incurs a cost that is linear in
> the entire depth of the continuation stack, regardless of how much work is done
> while the continuation is mounted (even if no methods are pushed or popped).
> 
> As a result, we came up with a compromise solution, which we find appealing. We
> call it "lazy copy", and the idea is as follows: Execution proceeds only on
> ordinary thread stacks, and continuation frames are copied to and from the heap
> when dis/mounting, but instead of copying the entire continuation stack when
> mounting, we copy only the topmost frame (or some small batch of frames), and
> install a "return barrier" that, when the bottom-most of the copied frames is
> popped, copies over another frame from the heap. Upon dismount, those
> continuation frames that are on the thread stack (and form the top portion of
> the continuation stack) are copied over to the heap. While still based on
> copying stack frames, we only need to copy however many frames we've actually
> used.
> 
> With this solution, the question of the precise continuation stack layout
> becomes secondary, as it only affects the mount/dismount code but not any
> machine-code generation. It may also allow us to compress the stack frames and
> store them on the heap in a less wasteful way. Finally, it may be possible to
> reduce the task switching cost even further by storing some small number of
> recently-used continuations in a cache of thread stacks (that would be treated
> as ordinary thread stacks), to only be copied to the heap when evicted. The
> effectiveness of caching depends, of course, on the cache-friendliness of the
> manner in which continuations are used -- which would need to be studied.
> 
> Ron
> 
> [1]: Unless we were to use a data structure of linked frame objects, where a new
> frame object would be allocated whenever the primitive/reference slots changed.
> This would happen very often, and would entail GC pressure even when no
> application objects are explicitly allocated.

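For anyone following along, the "lazy copy" scheme quoted above can be
illustrated with a toy model. This is only a sketch under stated assumptions,
not the Loom implementation: frames are plain strings, the thread stack is a
Deque, one frame is copied per mount, and every name here (LazyCopySketch,
thawOne, call, ret) is hypothetical. The "return barrier" is modelled as a
check made when the bottom-most copied frame is popped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of "lazy copy". Frames are strings, not real stack frames. */
public class LazyCopySketch {
    // Continuation frames stored on the heap, oldest (bottom-most) first.
    private final List<String> heapFrames = new ArrayList<>();
    // Simulated thread stack; the head of the deque is the top of the stack.
    private final Deque<String> threadStack = new ArrayDeque<>();
    // Number of frame copies performed in either direction.
    private int framesCopied = 0;

    public LazyCopySketch(List<String> frames) {
        heapFrames.addAll(frames);
    }

    /** Mount: copy only the topmost heap frame, not the whole stack. */
    public void mount() {
        thawOne();
    }

    /** Copy the topmost remaining heap frame onto the thread stack. */
    private void thawOne() {
        if (heapFrames.isEmpty()) {
            return; // nothing left to copy: the continuation is finishing
        }
        threadStack.push(heapFrames.remove(heapFrames.size() - 1));
        framesCopied++;
    }

    /** A method call made while mounted pushes a new frame; no copying. */
    public void call(String frame) {
        threadStack.push(frame);
    }

    /** A method return pops a frame; the "return barrier" fires when the
     *  bottom-most copied frame is popped and fetches one more heap frame. */
    public String ret() {
        String popped = threadStack.pop();
        if (threadStack.isEmpty()) {
            thawOne(); // return barrier
        }
        return popped;
    }

    /** Dismount: copy only the frames that are on the thread stack back. */
    public void dismount() {
        while (!threadStack.isEmpty()) {
            heapFrames.add(threadStack.removeLast()); // bottom-most first
            framesCopied++;
        }
    }

    public int framesCopied() { return framesCopied; }
    public List<String> heapFrames() { return new ArrayList<>(heapFrames); }
    public int threadStackDepth() { return threadStack.size(); }

    public static void main(String[] args) {
        LazyCopySketch k = new LazyCopySketch(List.of("a", "b", "c", "d"));
        k.mount();      // copies "d" only
        k.call("e");    // pushed directly on the thread stack
        k.ret();        // pops "e"
        k.ret();        // pops "d"; the barrier copies "c"
        k.dismount();   // copies "c" back
        System.out.println(k.framesCopied() + " " + k.heapFrames());
        // prints "3 [a, b, c]"
    }
}
```

In this run the continuation is four frames deep but only three frame copies
happen, and frames "a" and "b" never leave the heap. That is the point of the
scheme: the cost follows the frames actually used, not the full stack depth.
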
There is considerable interest in this project at Red Hat. Is there
a place where we can monitor the design considerations and discussion?
Also, any idea of the schedule would be useful.
(A public read-only email archive would be OK.)

Chris