Boxing, still a limit of invokedynamic?

Sun May 13 13:14:40 PDT 2012

On 13.05.2012 19:21, Charles Oliver Nutter wrote:
[...]
> You could also encode "a+b-c" as a single invokedynamic operation, but
> I guess you're looking for a general solution...

Yes, I am looking for a general solution. I was thinking of expressing the 
whole expression as a MethodHandle combination that takes a, b and c as 
input arguments... but that's a pretty big step. I don't want to 
spend months changing the compiler just to find it doesn't give me 
the performance I am looking for. Plus, this approach has its own 
problems with evaluation order and such.
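
A minimal sketch of that idea, assuming hypothetical static helpers plus(int,int) and minus(int,int) in place of the real dynamic dispatch: MethodHandles.collectArguments feeds a and b into plus and passes its result as the first argument of minus, giving a single handle of type (int,int,int)int for (a+b)-c. It deliberately ignores the dynamic typing and evaluation-order problems mentioned above.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class ExpressionHandle {
    // Hypothetical primitive helpers standing in for the language's + and -.
    static int plus(int a, int b) { return a + b; }
    static int minus(int a, int b) { return a - b; }

    static final MethodHandle EXPR; // (int,int,int)int computing (a + b) - c

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType binary = MethodType.methodType(int.class, int.class, int.class);
            MethodHandle plus = lookup.findStatic(ExpressionHandle.class, "plus", binary);
            MethodHandle minus = lookup.findStatic(ExpressionHandle.class, "minus", binary);
            // The first two incoming arguments (a, b) go to plus; its result
            // becomes minus's first argument, and c stays as the second one.
            EXPR = MethodHandles.collectArguments(minus, 0, plus);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int eval(int a, int b, int c) {
        try {
            // invokeExact requires the call-site type (int,int,int)int exactly.
            return (int) EXPR.invokeExact(a, b, c);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(eval(5, 3, 2));
    }
}
```

With everything in one handle, the intermediate a+b never leaves the handle chain; whether that actually pays off is what would need measuring.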

[...]
> First of all...how are you expecting that JIT will see through the
> first boxing? If the return result is going to be Object, it's going
> to go into an Integer. Perhaps you are hoping for escape analysis to
> get rid of it?

I don't know exactly which part does what, but I assume you are right about EA.

> If that's the case, why wouldn't the same expectation apply to the
> second call? If (a+b) returns an Integer that's immediately passed
> into (tmp-c) and both calls inline, in theory EA should have enough to
> eliminate the intermediate. If the result of (tmp-c) is never used as
> an object and never escapes, then EA should be able to get rid of that
> too.

Well, in my example the result of tmp-c is returned, so it escapes. But 
even if I only store it in a local variable slot... I mean, I wouldn't have 
expected EA to optimize even these cases... on further thought, though, it 
might be possible.
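
To make the EA argument concrete, here is a plain-Java sketch of the shape being discussed, assuming hypothetical Object-typed plus and minus helpers. It only marks which allocations are candidates; it does not show that HotSpot actually removes any of them.

```java
class BoxingChain {
    // Hypothetical dynamically typed helpers: they take and return Object,
    // so every int result is boxed into an Integer.
    static Object plus(Object a, Object b) {
        return (Integer) a + (Integer) b;   // unbox, add, box the result
    }

    static Object minus(Object a, Object b) {
        return (Integer) a - (Integer) b;
    }

    static Object compute(int a, int b, int c) {
        // a, b and c are autoboxed (Integer.valueOf) when passed as Object.
        Object tmp = plus(a, b);   // intermediate Integer, used only by minus
        // If plus and minus are both inlined, tmp never escapes: that is the
        // allocation escape analysis could in principle eliminate.
        return minus(tmp, c);      // returned to the caller, so this one escapes
    }

    public static void main(String[] args) {
        System.out.println(compute(5, 3, 2));
    }
}
```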

> Of course this is all assuming that EA will be working across indy
> boundaries in the near future. Currently, it does not.

Indeed, I was kind of assuming that. Your telling me it does not makes 
some results much clearer to me. The question then is... should I wait 
for EA to work across indy boundaries? And when would that be available?

[...]
> A confusing point for me: in your case, where you know they're all
> ints, how do you not know that + and - also return int? Can't you
> determine statically that this whole expression will return a
> primitive int?

I may not have written that part clearly enough. We don't know that + 
and - return int. You may vaguely remember my JVM talk 2 years ago, in 
which I explained how I plan to build a primitive optimization path. In 
this path the compiler will indeed assume that a+b returns an int 
and will then emit iadd instead of using static method calls or any 
other helpers. In the best case this optimized path would have basically 
the same performance as Java, but it is guarded, which reduces its 
best-case performance to half of Java's speed. The problem is that the 
primitive optimizations cannot handle more complex cases, and it is really 
easy to turn them off... That, plus the fact that they almost double the 
method's bytecode, makes them a suboptimal solution. But it is the one 
indy has to compete with.
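
A rough sketch of the shape such a guarded primitive path could take for a+b-c with int locals. The guard flag and the dynamicPlus/dynamicMinus helpers are hypothetical and are not Groovy's actual generated code; the point is only that the expression appears twice, once as plain iadd/isub behind a guard and once as the generic fallback, which is where the extra bytecode comes from.

```java
class GuardedPrimitivePath {
    // Hypothetical guard: "integer + and - still have their default meaning"
    // (for example, nobody has replaced them through metaprogramming).
    static volatile boolean intMathUnmodified = true;

    // Hypothetical generic helpers standing in for the dynamic (slow) path.
    static Object dynamicPlus(Object a, Object b)  { return (Integer) a + (Integer) b; }
    static Object dynamicMinus(Object a, Object b) { return (Integer) a - (Integer) b; }

    static int compute(int a, int b, int c) {
        if (intMathUnmodified) {
            return a + b - c;                                 // fast path: iadd, isub
        }
        // Fallback: box the operands, call the generic helpers, unbox the result.
        return (Integer) dynamicMinus(dynamicPlus(a, b), c);
    }

    public static void main(String[] args) {
        System.out.println(compute(5, 3, 2));
    }
}
```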

>> I am asking because I was experimenting with method signatures for
>> such plus and minus methods and got mixed results. I was expecting the
>> primitive versions to achieve the best results, but that was not the
>> case. A plus(int,int) had worse performance than a plus(int,Integer) or
>> plus(Integer,int) in some cases, and sometimes it looked like
>> plus(Integer,Integer) was worse, in other cases not. Well, this is
>> causing me some problems. Why do I get such strange results? I would
>> assume it depends on the JIT and which boxing patterns it is able to
>> recognize.
>
> What does the assembly look like?

You mean the compiled code? I will try to give examples of this later. 
But if

> And again remember...I don't think the JIT in u4- does anything with
> the boxing coming out of these calls. It might do something on the
> other side, but not across the invokedynamic call.

is right, then it is no wonder that sometimes one variant and sometimes 
another is faster. But I suspect it is worse: it is not only across indy 
calls that the JIT does nothing with boxing; I assume it does nothing even 
across MethodHandles within the same indy call. To be more exact about my 
suspicion, I expect a constant int that is boxed by one MethodHandle and 
then unboxed by another in the same indy call to be slower than just 
returning the int itself.
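
The pattern in that suspicion can be written down directly with asType. In the sketch below (the plus helper is hypothetical), one handle is the direct (int,int)int call, and the other is the same call adapted to return Object, which boxes, and then adapted back to int, which casts and unboxes. It only shows that both chains compute the same result; whether the JIT removes the intermediate Integer is exactly the open question and would need a benchmark.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class BoxUnboxChain {
    // Hypothetical primitive helper.
    static int plus(int a, int b) { return a + b; }

    static final MethodHandle DIRECT;      // (int,int)int, no conversions
    static final MethodHandle ROUND_TRIP;  // (int,int)int, adapted to Object and back

    static {
        try {
            MethodType intBinary = MethodType.methodType(int.class, int.class, int.class);
            MethodHandle plus = MethodHandles.lookup()
                    .findStatic(BoxUnboxChain.class, "plus", intBinary);
            DIRECT = plus;
            // First adaptation boxes the int result into an Integer, typed as Object.
            MethodHandle boxed = plus.asType(intBinary.changeReturnType(Object.class));
            // Second adaptation casts the Object back to Integer and unboxes it.
            ROUND_TRIP = boxed.asType(intBinary);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static int direct(int a, int b) {
        try {
            return (int) DIRECT.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    static int roundTrip(int a, int b) {
        try {
            return (int) ROUND_TRIP.invokeExact(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```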

If I have a+1, then the ideal plus is one that takes (int,int) and returns 
Integer, because that way everything can happen inside the 
invokedynamic part. If I have a=a+1 (a being an int), then 
plus(int,int):int is probably better, but using the one from before and 
unboxing the Integer to int is not. And depending on where your 
values are coming from, you get better performance by using 
plus(Integer,int), plus(int,Integer) or plus(Integer,Integer)... probably 
with different return types as well.
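
One way to act on this at link time, sketched under assumptions: the overloads below are hypothetical, and plusInt is a separate hypothetical name because Java cannot overload on return type alone. plusFor returns a handle of exactly the call-site type, preferring an overload that already has that signature (so no boxing adapters are needed) and otherwise adapting the fully boxed (Integer,Integer)Integer variant with asType, which inserts the conversions.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class PlusVariants {
    // Hypothetical overloads for the signature combinations discussed above.
    static Integer plus(int a, int b)         { return a + b; }
    static Integer plus(Integer a, int b)     { return a + b; }
    static Integer plus(int a, Integer b)     { return a + b; }
    static Integer plus(Integer a, Integer b) { return a + b; }
    // The (int,int)int variant needs its own (also hypothetical) name.
    static int plusInt(int a, int b)          { return a + b; }

    private static final MethodHandles.Lookup LOOKUP = MethodHandles.lookup();

    // Returns a handle whose type is exactly the call-site type.
    static MethodHandle plusFor(MethodType site) throws ReflectiveOperationException {
        for (String name : new String[] {"plus", "plusInt"}) {
            try {
                return LOOKUP.findStatic(PlusVariants.class, name, site);
            } catch (NoSuchMethodException e) {
                // no method with exactly this signature under this name
            }
        }
        // Fallback: adapt the fully boxed variant; asType boxes/unboxes as needed.
        MethodHandle generic = LOOKUP.findStatic(PlusVariants.class, "plus",
                MethodType.methodType(Integer.class, Integer.class, Integer.class));
        return generic.asType(site);
    }

    // Convenience wrapper for trying a call-site type with boxed arguments.
    static Object apply(MethodType site, Object a, Object b) {
        try {
            return plusFor(site).invokeWithArguments(a, b);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    public static void main(String[] args) {
        // "a + 1" used as an object: (int,int)Integer has an exact overload.
        System.out.println(apply(MethodType.methodType(Integer.class, int.class, int.class), 1, 2));
        // "a = a + 1" with an int local: (int,int)int maps to plusInt.
        System.out.println(apply(MethodType.methodType(int.class, int.class, int.class), 1, 2));
        // (int,int)Object has no exact overload, so it goes through asType.
        System.out.println(apply(MethodType.methodType(Object.class, int.class, int.class), 1, 2));
    }
}
```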

>> One more thing I noticed is that if I have a = b+c, with all of them
>> being int and b+c returning Object, then letting the MethodHandle do the
>> conversion from Object to int has much worse performance
>> than a cast to Integer and calling valueOf. Shouldn't that be at least
>> as fast, considering that the result of b+c was first boxed
>> and then is unboxed?
>
> Perhaps doing it in the handles makes the code more opaque? Do the
> non-handle way and the handle way have exactly the same logic?

The non-handle way means calculating b+c using a handle and then unboxing 
the result with a library function from Groovy... AFAIK. The handle way 
uses the ability of MethodHandles to convert the Integer into an int. 
I don't know for sure what that part does in the end, but normally 
it shouldn't be slower.
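
The two variants side by side, as a minimal sketch: plus is a hypothetical stand-in for the dynamic b+c that returns Object, and ((Integer) result).intValue() is a plain-Java stand-in for the Groovy library function, not Groovy's actual code. Both compute the same value; the sketch says nothing about which one the JIT compiles better.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class UnboxComparison {
    // Hypothetical stand-in for the dynamic b+c: the result comes back as Object.
    static Object plus(int a, int b) { return a + b; }

    static final MethodHandle PLUS_OBJECT;  // (int,int)Object
    static final MethodHandle PLUS_AS_INT;  // (int,int)int, conversion done by asType

    static {
        try {
            MethodType objectResult = MethodType.methodType(Object.class, int.class, int.class);
            PLUS_OBJECT = MethodHandles.lookup()
                    .findStatic(UnboxComparison.class, "plus", objectResult);
            // asType casts the Object result to Integer and unboxes it to int.
            PLUS_AS_INT = PLUS_OBJECT.asType(objectResult.changeReturnType(int.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // "Non-handle way": the handle returns Object and ordinary Java code
    // casts and unboxes it afterwards.
    static int viaManualUnbox(int b, int c) {
        try {
            Object result = PLUS_OBJECT.invokeExact(b, c);
            return ((Integer) result).intValue();
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    // "Handle way": the Object-to-int conversion is part of the handle chain.
    static int viaHandleUnbox(int b, int c) {
        try {
            return (int) PLUS_AS_INT.invokeExact(b, c);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }
}
```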

bye Jochen