Spread problem with > 10 arguments
Fredrik Öhrström
fredrik.ohrstrom at oracle.com
Fri Jul 10 02:24:35 PDT 2009
John Rose wrote:
> In the implementation code, you'll see places where bytecode generation is
> stubbed out, and that is some cause for concern.
Yes, this is an important discussion point, since bytecode generation
is exactly what we were trying to avoid by creating invokedynamic et al.
> you can implement the varargs mechanisms on top of it, but
> the code complexity is lower (I think) than putting varargs processing
> directly into the runtime.
Ah, but generic invocation does not create Object
arrays! I could have proposed that, since obviously
JRockit can optimize such arrays away. But I didn't. The
reason is that varargs imposes a permanent extra
cost, but generic invocation does not.
Generic invocation requires the number of
arguments to be the same because it will
reuse the storage locations for the arguments!
Suppose the JVM has detected, using the single
compare that you already have in place, that an
exact invocation call is not possible.
Then, for an interpreter, the extra cost of a
generic invocation call is:
1) where all args are references and the destination
method takes all object references: almost zero!
No boxing, no casting.
2) where all args are references and the destination method takes
non-object references: a cast per non-object reference. This cast
can be fast-pathed into a single compare per reference.
If the destination requires a primitive, the unboxing is simply a
mov instruction.
3) where some args are primitives: you have to box each primitive
before the call. Yes, this requires an alloc of an object.
However, if the destination method actually does something more
complex than working with primitives (most likely), then you
will have object allocations there as well. Therefore the
box alloc will only be a fraction of the total
number of allocations needed to execute the call.
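The three cases above can be made concrete with the java.lang.invoke
API as it eventually shipped in Java 7 (which postdates this
discussion); this is a minimal sketch, not the implementation
under debate. The exact call matches the handle's type directly,
while the generic call passes references and lets the runtime
cast, unbox, and box as needed:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class GenericInvokeDemo {
    static int add(int a, int b) { return a + b; }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                GenericInvokeDemo.class, "add",
                MethodType.methodType(int.class, int.class, int.class));

        // Exact invocation: the call-site descriptor must match
        // (int,int)int exactly; this is the single-compare fast path.
        int exact = (int) mh.invokeExact(1, 2);

        // Generic invocation: the call site passes Objects. The runtime
        // casts and unboxes each argument (cases 2 and 3 above) and
        // boxes the int return value on the way out.
        Object generic = mh.invoke((Object) Integer.valueOf(1),
                                   (Object) Integer.valueOf(2));

        System.out.println(exact + " " + generic); // prints 3 3
    }
}
```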
>
> Here's an alternate and more likely cause for that silence: No one
> complains about this with closures because the people arguing about
> closures are language and applications people, and they rarely think
> in great detail about JVM implementation. This conversation right
> here and now is likely to be a cutting edge conversation about JVM
> implications of function pointers and the like.
>
True. :-)
>
> Good point. Counter-points: It is risky to declare a minimum size
> for a JVM. Also, cycles on a small device drain power, which is the
> real limited resource. So those 100's of MHz are only a peak rate.
> Running more frequent GCs or a complex JIT could warm up your handset
> more than the system designers will permit. As for deeply optimizing
> background compilers (like HotSpot's), only on big power-guzzling
> multi-core machines is there a concept of pre-paid cycles, which a JIT
> can soak up almost "for free".
First, every JVM will compile/optimize a hotspot, even though
it will temporarily slow down the execution of the application.
This is a typical case of cost/benefit analysis.
But as I have been trying to say, the cost of generic invocation in an
interpreter is not that high. For a chain of transforms the cost
will only be incurred at entry into the chain and at exit.
If the dynamic language uses objects for its primitive values
(like RubyFixnum, for example) there will be no boxing at
entry at all! And if we add real fixnums to Java, there will
be no boxing at all either!
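To illustrate the chain-of-transforms point, here is a sketch using
the final java.lang.invoke API (the stage methods `twice` and `inc`
are invented stand-ins for a runtime's transform stages). Between
the stages the value travels as a plain Object reference, so the
invocation mechanism adds conversion cost only at entry and exit:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class TransformChain {
    // Two transform stages that take and return references, the way a
    // dynamic-language runtime working on number-objects would.
    static Object twice(Object x) { return ((Integer) x) * 2; }
    static Object inc(Object x)   { return ((Integer) x) + 1; }

    public static void main(String[] args) throws Throwable {
        MethodType t = MethodType.methodType(Object.class, Object.class);
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle twice = lookup.findStatic(TransformChain.class, "twice", t);
        MethodHandle inc   = lookup.findStatic(TransformChain.class, "inc", t);

        // Compose inc(twice(x)). Handing the value from one stage to
        // the next is pure reference passing; the only boxing the
        // invocation mechanism itself adds is at entry, where the int
        // literal is converted to an Object.
        MethodHandle chain = MethodHandles.filterReturnValue(twice, inc);
        System.out.println(chain.invoke(20)); // prints 41
    }
}
```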
I would like to throw out an open question:
For dynamic languages like Ruby, which do not have
typed primitives and can work with arbitrarily large
numbers, when does the transform happen that translates
these numbers from number-objects into ints when they
are used as method arguments?
If this can happen at static compile time, when the
Ruby program is compiled into bytecodes, then you have
a weird situation where you are able to deduce that
neither the caller nor the destination will ever exceed
the int capacity, yet you do not know the actual
destination! Because if you did know it at static compile
time, you would be able to use a normal call instead.
On the other hand, if this can only happen at
runtime, then for a small, interpreting-only JVM,
you will have to take into account that the Ruby
optimizer will be interpreted! What will the
cost/benefit analysis be for this? It seems
unlikely to me that it would make sense to
over-optimize in such an environment. Better
to work with number-objects and get almost
zero cost at the transforms using generic
invocation.
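To make the number-object alternative concrete, here is a sketch with
a hypothetical value class (the names `Fixnum` and `plus` are invented
for illustration, standing in for something like JRuby's RubyFixnum).
Because the arguments are already references, a generic invocation
needs only a checkcast per argument (case 2 above) and allocates
nothing:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class NumberObjectDemo {
    // Hypothetical immutable number-object, a stand-in for a real
    // runtime's class such as JRuby's RubyFixnum.
    static final class Fixnum {
        final long value;
        Fixnum(long value) { this.value = value; }
    }

    static Fixnum plus(Fixnum a, Fixnum b) {
        return new Fixnum(a.value + b.value);
    }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                NumberObjectDemo.class, "plus",
                MethodType.methodType(Fixnum.class, Fixnum.class, Fixnum.class));

        // Generic invocation with reference arguments: the runtime only
        // needs a checkcast from Object to Fixnum per argument; nothing
        // is boxed, because the values were never primitives at all.
        Object sum = mh.invoke((Object) new Fixnum(40), (Object) new Fixnum(2));
        System.out.println(((Fixnum) sum).value); // prints 42
    }
}
```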
What do you think?
//Fredrik