Boxing, still a limit of invokedynamic?

Charles Oliver Nutter headius at headius.com
Sun May 13 13:58:09 PDT 2012


Inline...

On May 13, 2012 3:15 PM, "Jochen Theodorou" <blackdrag at gmx.org> wrote:
>
> On 13.05.2012 19:21, Charles Oliver Nutter wrote:
> [...]
> > You could also encode "a+b-c" as a single invokedynamic operation, but
> > I guess you're looking for a general solution...
>
> yes, I am looking for a general solution. I was thinking of turning the
> whole expression into a MethodHandle combination, which then has a, b, c as
> input arguments... but that's a pretty big step. I don't want to
> spend months changing the compiler just to find it doesn't give me
> the performance I am looking for. Plus this approach has its own
> problems with evaluation order and such.

Yeah, that might be a good point, though this particular case of int + int
- int has no side effects if you retrieve the values all at once before
calculating...
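
For what it's worth, here is a minimal sketch of what "encode a+b-c as a
single invokedynamic operation" could look like (all names here are made up
for illustration, not anybody's actual runtime entry points):

    import java.lang.invoke.*;

    public class AddSub {
        // the whole expression lives behind one call site, so no boxed
        // intermediate value ever has to cross an indy boundary
        public static int addSub(int a, int b, int c) {
            return a + b - c;
        }

        // bootstrap method the compiler would point the indy instruction at
        public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                         String name, MethodType type)
                throws ReflectiveOperationException {
            MethodHandle mh = lookup.findStatic(AddSub.class, "addSub",
                    MethodType.methodType(int.class, int.class, int.class, int.class));
            return new ConstantCallSite(mh.asType(type));
        }
    }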

> I don't know what part it does, but I assume EA is right.

Yeah, if you implemented this statically you would very likely see EA
eliminate all boxing, since it is a pretty simple case. Indy is getting in
the way here.

> well.. in my example the result of tmp-c is returned, so it escapes. But
> even if I only store it in a bytecode slot... I mean I wouldn't expect
> EA to even optimize these cases.... on further thought though it
> might be possible.

Even if tmp-c produces an Integer, it could still EA away if *that*
object doesn't escape from this compile unit. Basically, if the object's
construction and all possible code paths that would see the object inline
together, EA can potentially eliminate the object (except if the inlined
logic involves Indy or MH call paths...for now).
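
To make that concrete, roughly the shape I have in mind (plain static Java,
no indy involved, just to show where EA gets its chance):

    static Integer plus(Integer a, Integer b)  { return a + b; }
    static Integer minus(Integer a, Integer b) { return a - b; }

    static int addSub(int a, int b, int c) {
        Integer tmp = plus(a, b);   // boxed intermediate
        return minus(tmp, c);       // tmp never leaves this method
    }

    // If plus() and minus() inline into addSub(), the intermediate Integer is
    // visible to EA within one compile unit and can be scalar-replaced. Put an
    // indy or MH call in the middle and, today, that stops happening.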

> Indeed, I was kind of assuming that. You telling me it does not makes
> some results much clearer to me. The question then is... should I wait
> for EA to work across indy boundaries? And when would that be available?

Well, I suppose that is up to you. In JRuby we have worked around the JVM
in places, but so far I have not felt it is worth the compiler
complexity to try to optimize math down to primitive speed right now. My
goal for JRuby 1.7 is to utilize Indy as much as possible and be ready for
upcoming optimization work, rather than trying to be tricky now. The ideal
case for me is that we get Indy fully integrated into JRuby, and then sit
back and wait for (and help) the JVM to catch up.

> I may not have written that part clearly enough. We don't know that +
> and - return int. You may vaguely remember my JVM talk 2 years ago in
> which I explained how I plan to make a primitive optimization path. In
> this path the compiler will indeed assume that a+b returns an int
> and will then emit iadd instead of using static method calls or any
> other helpers. This optimized path has basically the same performance as
> Java, but it is guarded, and the guards reduce the performance
> to half of Java speed in the best case. The problem is that prim opts
> cannot handle more complex cases and it is really easy to turn them
> off... That plus the problem of almost doubling the method bytecode makes
> them a suboptimal solution. But it is the one indy has to compete with.

My position is that the JVM should be doing that for us, so I am wiring up
Indy in the logical way and working with JVM guys to make that happen. I am
less concerned about short-term math perf than I am about making best
possible use of Indy.
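
For anyone following along, the guarded path Jochen describes lowers to
something roughly like this (hypothetical sketch; the real guard also has to
check that + and - have not been overridden, which I am waving away here):

    static Object addSubGuarded(Object a, Object b, Object c) {
        if (a instanceof Integer && b instanceof Integer && c instanceof Integer) {
            // fast path: plain int math, compiles down to iadd/isub
            return (Integer) a + (Integer) b - (Integer) c;
        }
        // slow path: the generic dispatch the compiler would otherwise emit
        return genericAddSub(a, b, c);
    }

    // stand-in for the generic runtime path (call site helpers, boxing, etc.)
    static Object genericAddSub(Object a, Object b, Object c) {
        throw new UnsupportedOperationException("generic path elided");
    }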

>
> >> I am asking because I was experimenting with method signatures for
> >> such plus and minus methods and got mixed results. I was expecting the
> >> primitive versions would achieve the best results, but that was not the
> >> case. A plus(int, int) had worse performance than a plus(int,Integer) or
> >> plus(Integer,int) in some cases, and sometimes it looked like
> >> plus(Integer,Integer) is worse, in other cases not. Well, this is
> >> causing me some problems. Why do I get such strange results? I would
> >> assume it depends on the JIT and which boxing logic it is able to
> >> recognize and which not.
> >
> > What does the assembly look like?
>
> you mean the compiled code? I will try to give examples of this later.

Well, I mean the assembly :-) If you really want to see why two pieces of
code perform differently, the assembly output from HotSpot will show you the
answer.

> But if
>
> > And again remember...I don't think the JIT in u4- does anything with
> > the boxing coming out of these calls. It might do something on the
> > other side, but not across the invokedynamic call.
>
> is right, then it is no wonder that one time this and another time that
> is faster. But I suspect it is worse. It is not only across indy calls
> that the JIT does nothing with boxing; I assume it is even across
> MethodHandles in the same indy call. To be more exact with my suspicion,
> I expect a constant int boxed by a MethodHandle and then unboxed by
> another one in the same indy call to be slower than just returning the
> int itself.

I would not at all be surprised if EA does nothing across MH boundaries
either. That is (among other things) what the new LambdaForm is supposed to
fix by translating method handle graphs into a form the JVM can optimize
along with surrounding code.
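
That round trip is easy to poke at, by the way: adapt an int-returning handle
to Object and back with asType() and you have exactly that box/unbox pair
sitting between two handles. Whether the JIT manages to elide the Integer is
the open question:

    import java.lang.invoke.*;

    public class BoxRoundTrip {
        static int fortyTwo() { return 42; }

        public static void main(String[] args) throws Throwable {
            MethodHandle raw = MethodHandles.lookup().findStatic(
                    BoxRoundTrip.class, "fortyTwo", MethodType.methodType(int.class));

            // adapt to ()Object (boxes the int), then back to ()int (unboxes it)
            MethodHandle boxed   = raw.asType(MethodType.methodType(Object.class));
            MethodHandle unboxed = boxed.asType(MethodType.methodType(int.class));

            System.out.println((int) unboxed.invokeExact());
        }
    }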

> If I have a+1, then the ideal plus is one that takes int,int and returns
> Integer, because that way everything can happen inside the
> invokedynamic part. If I have a=a+1 (a being an int) then
> plus(int,int):int is probably better, but using the one from before and
> unboxing the Integer to int is not. And depending on where your
> results are coming from, you get better performance by using
> plus(Integer,int), plus(int,Integer), or plus(Integer,Integer)... with
> different return types probably as well.

I'm sure there are many reasons for variability here. Seek out LogCompilation
and PrintAssembly, my son! :-)
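
For the archives, the usual incantation is something like the following
(PrintAssembly additionally needs the hsdis disassembler plugin on the JVM's
library path):

    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+LogCompilation YourBench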

- Charlie