Boxing function types
Reinier Zwitserloot
reinier at zwitserloot.com
Fri Nov 27 03:44:59 PST 2009
Crazy idea, perhaps, but can't the HotSpot compiler rather easily eliminate
a sequential boxing/unboxing pair, even across a method boundary?
Let's say that methodA does this:
long x = someLong();
callB(x);
and B looks like:
public void callB(Long y) {
long x = y;
// ... never touch y again
}
after inlining, the bytecode should read:
//long on stack
invokestatic java/lang/Long valueOf
//there was a method call here, but it's been inlined.
invokevirtual java/lang/Long longValue
//long on stack again.
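In source terms, the surviving pair looks like this (a hypothetical standalone demo, not the actual inlined code):

```java
public class RoundTrip {
    public static void main(String[] args) {
        long x = 42L;
        // invokestatic Long.valueOf followed by invokevirtual Long.longValue:
        // each call is the other's inverse, and the box escapes nowhere.
        long y = Long.valueOf(x).longValue();
        System.out.println(y == x); // true
    }
}
```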
when seeing this exact pattern with any primitive, the JIT compiler can just
nix both invokes; they are each other's opposite. Now, a method that takes a
primitive wrapper, immediately unboxes it, and then never touches the
boxed version again is not _that_ common, but if whatever closure proposal
Java ends up with does try to 'fit' closures into existing SAM types AND
considers boxing/unboxing of types (so a #(long)int could get
auto-converted into a SAM whose method signature is
(Long)Integer), then this would happen rather a lot. Regardless of speed
reasons, this would be quite convenient. After all, if the closure proposal
does NOT support this, then this:
Collections.sort(integerList, #(int a, int b) a-b;);
wouldn't actually work; you'd have to write #(Integer a, Integer b), and I
can imagine that is somewhat confusing.
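For comparison, here is what the status quo forces you to write with an anonymous class; the SAM's signature is (Integer, Integer), so the parameters arrive boxed (a sketch, not the proposal's syntax):

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class SortDemo {
    public static void main(String[] args) {
        List<Integer> integerList = Arrays.asList(3, 1, 2);
        Collections.sort(integerList, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return a - b; // auto-unboxes both arguments before subtracting
            }
        });
        System.out.println(integerList); // [1, 2, 3]
    }
}
```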
That's not exactly an easy conversion, due to all the concerns stated in
this thread, but if it can be done, then something like PA's Ops file would
lose 90% of its interfaces; you could then use an Op<Long, Long>, and it
would be exactly as fast as a LongOp after JIT.
I did some testing on this topic. Right now, this optimization is quite
clearly not happening. I took a part of the ParallelArray code and rewrote
ParallelLongArray to call into an Op<Long, Long> instead of into a LongOp,
and the implementation of this Op<Long, Long> unboxed as its first action,
then never touched the boxed version again. So, ParallelArray boxes at the
latest possible moment, and the 'closure' unboxes at the earliest possible
moment. Yet, in a PA operation with very little to do per loop, this takes
more than twice the time compared to the real PA with its customized
'LongOp' interface.
(tested jvm 1.6.0_15-b03-226 with hotspot 14.1-b02-92 mixed mode)
Once I changed the code to calculate a Base64 encoding of the long on each
iteration, the boxing/unboxing became insignificant (less than a 2%
performance drop), but that's not surprising.
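A standalone sketch of that comparison (the interface shapes mirror jsr166y's Ops.Op and Ops.LongOp; the workload and element count are made up, and a real measurement would need proper JIT warmup):

```java
public class BoxedVsPrimitive {
    interface Op<A, R> { R op(A a); }      // stand-in for jsr166y's Ops.Op
    interface LongOp { long op(long a); }  // stand-in for jsr166y's Ops.LongOp

    public static void main(String[] args) {
        long[] data = new long[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i;

        // The 'closure' that unboxes at the earliest possible moment:
        Op<Long, Long> boxed = new Op<Long, Long>() {
            public Long op(Long y) {
                long x = y;    // unbox once, never touch y again
                return x + 1;  // ...but the result is boxed again on return
            }
        };
        LongOp primitive = new LongOp() {
            public long op(long x) { return x + 1; }
        };

        long sumBoxed = 0, sumPrimitive = 0;
        long t0 = System.nanoTime();
        for (long v : data) sumBoxed += boxed.op(v);         // Long.valueOf per call
        long t1 = System.nanoTime();
        for (long v : data) sumPrimitive += primitive.op(v); // no boxing at all
        long t2 = System.nanoTime();

        System.out.println(sumBoxed == sumPrimitive); // true: identical results
        System.out.println("boxed:     " + (t1 - t0) + " ns");
        System.out.println("primitive: " + (t2 - t1) + " ns");
    }
}
```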
--Reinier Zwitserloot
On Thu, Nov 26, 2009 at 1:43 PM, Doug Lea <dl at cs.oswego.edu> wrote:
> Neal Gafter wrote:
>
>> The value of supporting the assignment conversions in the function subtype
>> relation is that it enables one to generalize generics over primitive types.
>> That can be optimized automatically by code specialization, which
>> eliminates the boxing and unboxing at runtime.
>>
>
> It would be great if people worked on turning
> that "can be" into the combination of source-level
> compiler support, bytecode tools, class-loading
> facilities and JIT support that could perform
> specialization well enough for people to rely on.
> The Scala folks have done some of this for scalars
> using @specialized (upcoming for Scala 2.8), and
> X10 generics etc have been defined to
> support it for scalars and structs (see
> http://dist.codehaus.org/x10/documentation/languagespec/x10-200.pdf).
> And similarly but less so in C#.
>
> My main point is that there is a cluster
> of language features and underlying support
> for multicore-friendly parallel programming that
> you need to consider as a group, even if they are
> not all introduced as a group in a particular
> Java release. Including lambda-like closures,
> function types, embeddable structs/values/tuples,
> improved arrays (especially 2d dense), revising
> JLS exception rules to better deal with async
> failures, and possibly further extensions for
> non-shared-memory parallelism. All of these
> correspond to likely directions for evolving
> improved platform-level support, leveraging the
> natural advantages of JVMs over other platforms
> (dynamic compilation, high-performance GC, etc)
> when it comes to supporting parallelism.
>
> Digressing further: The situation for fine-grained
> parallelism right now is not too different than
> it was for coarser-grained concurrency in 1995:
> There was a big gap between language features
> (threads+synchronized+wait/notify) and what most
> people building concurrent middleware etc wanted to
> do. We reduced that gap in JDK5 java.util.concurrent,
> but that was a bit easier since it didn't interact
> much with language features. Some days I think the
> Java language evolution story is just too hard,
> and that it would be more profitable to focus
> on developing other JVM-hosted languages like X10 that
> have worked out a more coherent story about it.
> But every now and again people surprise me by
> suggesting that we give a serious shot at
> incremental Java language changes that might
> get us closer to these goals. I'm all for
> trying this. But...
>
>
>
>> But as I said, we may have already designed ourselves out of this option.
>>
>
> And as I said, let's not do this again by ignoring
> the nearly-inevitable follow-ons.
>
> -Doug
>