Aggressive unboxing of values: status update
Brian Goetz
brian.goetz at oracle.com
Tue Nov 4 14:57:14 UTC 2014

To put this in context: the question Albert is exploring here is "what
if we didn't have to pessimistically preserve the identity of boxed
primitives/values". Right now, when the VM sees an invocation of
Integer.valueOf(int), it (a) can't tell whether the call was generated
by the compiler or came from the source code, and (b) can't assume that
the caller won't make use of the identity of the resulting box. The
latter is especially frustrating as it cripples many useful
optimizations, but in reality the identity is only used rarely.

The rule for "lightweight boxes" (hboxes) that we're exploring here is:
what if it were legal to box/unbox values/hboxes at will at any time,
without affecting program semantics? Then, the identity of an hbox
would be a pure implementation detail (note that we defined the identity
of lambdas in this way too, what a coincidence!)
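
To make "identity-sensitive" concrete, here is a minimal sketch in
today's Java (the class and method names are made up for illustration;
only Integer.valueOf and equals are real API). Comparing boxes by value
is always well-defined; comparing them by reference depends on whether
two calls happened to hand back the same box, which is exactly the
detail the rule above would stop promising anything about.

```java
public class BoxIdentityDemo {
    // Value comparison: well-defined for every int, however boxes are allocated.
    static boolean sameValue(int a, int b) {
        Integer x = Integer.valueOf(a);
        Integer y = Integer.valueOf(b);
        return x.equals(y);
    }

    // Identity comparison: asks whether both references denote the same box object.
    static boolean sameBox(int a, int b) {
        Integer x = Integer.valueOf(a);
        Integer y = Integer.valueOf(b);
        return x == y;
    }

    public static void main(String[] args) {
        System.out.println(sameValue(1000, 1000)); // true for any equal ints
        System.out.println(sameBox(100, 100));     // true: valueOf caches -128..127
        System.out.println(sameBox(1000, 1000));   // depends on the cache configuration
    }
}
```

Code that sticks to the first form would not notice if the VM boxed and
unboxed at will; only code of the second form (or code that synchronizes
on a box) can observe the difference.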

Another way to think of this is normalizing boxing (rather than the box
for 'int' being the hand-written class 'Integer', it is something
mechanically derived from int and all other value types in the same
way), formally declaring boxes unsuitable for identity-sensitive
operations (which likely no one will miss), and pushing said normalized
boxing out of the front-end compiler and into the VM (cue "selective
sedimentation" discussion) where the VM has more control over
representation.

Where this may lead (we don't know yet, it's an experiment), among other
things, is it may reduce the invasiveness of specialization and expose
more opportunities for the VM to dynamically make decisions about
specialized classes. Currently specialization has to eagerly and
unconditionally rewrite nearly all signatures and data-movement
bytecodes before the class enters the VM; if boxing/unboxing can be made
suitably cheap, a less invasive rewriting becomes possible, one that
defers key decisions until later. The VM is then in control of the
tradeoff between footprint and type specificity, which is the sort of
tradeoff VMs are good at.

On 11/4/2014 5:11 AM, Albert Noll wrote:
> Hi all,
>
> I've been working on aggressive unboxing of values over the past couple
> of weeks. The project is in a very early state. The current prototype is
> designed around two principles: (1) implement unboxing as described
> below, (2) keep the required changes reasonably small. As a result,
> there is lots of potential for optimization. These optimizations can be
> added in later stages of the project. Aggressive unboxing aims at
> providing an efficient way to:
>
> (1) pass values as parameters to functions
> (2) return values from functions
> (3) load and store flattened data in the heap
> (4) provide insight into the implementation of value types
>
> Let's consider this simple example:
>
> value Complex {
>     int x, y;
>     private Complex(int x, int y) {
>         this.x = x;
>         this.y = y;
>     }
>     public static Complex make(int x, int y) {
>         return new Complex(x, y);
>     }
> }
>
> class Test {
>     void non_inlined_callee(Complex c) {
>         int sum = c.x + c.y;
>         System.out.println(sum);
>     }
>     void caller() {
>         Complex c = Complex.make(1, 2);
>         non_inlined_callee(c);
>     }
> }
>
> Let's assume that Complex.make() and the constructor in Complex.make()
> are inlined into caller(). The compiler decides to not inline
> non_inlined_callee(). The current prototype recognizes that 'Complex' is
> a value and uses the unboxed components (x and y) to pass them to
> non_inlined_callee. I.e., the JIT compiler transforms the original call
> 'non_inlined_callee(Complex c)' to 'non_inlined_callee(Complex c, int x,
> int y)'. Why do we keep the parameter 'Complex c'? The reason is that we
> want to keep - for now - the interpreter unmodified. I.e., by keeping
> 'Complex c' in the signature, non_inlined_callee() can call native
> methods, the interpreter, or a C1 compiled method.
>
> This is not optimal, but keeps things simple for now. In a later stage
> of the project, we plan to implement a 'boxing operation'. Boxing an
> unboxed value would be done only on demand. One thing that we need to
> think about is how we deal with OOMEs when implementing 'lazy boxing'.
>
> If non_inlined_callee() is compiled with C2, the compiled code expects
> the unboxed parameters and uses the unboxed arguments instead of the
> original reference to 'Complex c'. One benefit of using the unboxed
> parameters instead of the original object is that the JIT needs to be
> less conservative and therefore the compiled code quality is potentially
> better.
>
> Providing this functionality (the prototype is not yet stable) requires
> changing ~2k lines in Hotspot. Many of the affected changes are in
> 'critical' places. That's why I want to make sure that the prototype is
> reasonably stable before pushing. Current support for unboxing is
> implemented only in C2. The interpreter and C1 can remain unchanged for
> now. I have a bachelor student who is looking into a corresponding C1
> implementation.
>
> Development Plan:
> - Finish passing values as unboxed parameters for static methods
> - Return values in registers for static methods
> - Implement boxing operation
> - Pass 'this' unboxed
> - Make function calls without passing the allocated value object
>
> Best,
> Albert
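
For readers following the quoted example, the calling-convention change
Albert describes can be pictured at the source level roughly as below.
This is only a sketch under stated assumptions: the real transformation
happens inside C2 on compiled code, not in Java source; Complex is
modelled as an ordinary final class because the 'value' keyword does not
exist; the names calleeBoxed, calleeUnboxed and UnboxedCallSketch are
invented; and the callee returns the sum instead of printing it so the
result can be checked.

```java
public class UnboxedCallSketch {
    // Stand-in for 'value Complex'; a plain final class with final fields.
    static final class Complex {
        final int x, y;
        private Complex(int x, int y) { this.x = x; this.y = y; }
        static Complex make(int x, int y) { return new Complex(x, y); }
    }

    // Original signature: the callee loads the fields through the reference.
    static int calleeBoxed(Complex c) {
        return c.x + c.y;
    }

    // Hypothetical compiled signature: the reference 'c' is still passed so
    // that the interpreter, C1 code or native methods could use it, but the
    // compiled body reads only the unboxed components.
    static int calleeUnboxed(Complex c, int x, int y) {
        return x + y;
    }

    // Caller after the rewrite: passes the box plus its components.
    static int caller() {
        Complex c = Complex.make(1, 2);
        return calleeUnboxed(c, c.x, c.y);
    }

    public static void main(String[] args) {
        System.out.println(caller()); // prints 3
    }
}
```

The last item on the development plan corresponds to dropping the
'Complex c' argument from calleeUnboxed altogether once a box can be
created on demand.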