Spread problem with > 10 arguments

Fredrik Öhrström fredrik.ohrstrom at oracle.com
Fri Jul 10 02:24:35 PDT 2009


John Rose wrote:

> Implementation code, you'll see places where bytecode generation is  
> stubbed out, and that is some cause for concern.
Yes, this is an important discussion point, since bytecode generation
is exactly what we are trying to avoid by creating invokedyn et al.

> you can implement the varargs mechanisms on top of it, but  
> the code complexity is lower (I think) than putting varargs processing  
> directly into the runtime.
Ah, but generic invocation does not create Object
arrays! I could have proposed that, since obviously
JRockit can optimize such arrays away. But I didn't. The
reason is that varargs imposes a permanent added
cost, but generic invocation does not.

Generic invocation requires the number of
arguments to be the same because it reuses
the storage locations for the arguments!
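To make the contrast concrete, here is a sketch using the java.lang.invoke API that JSR 292 eventually shipped (this thread predates it, so the names MethodHandle.invoke and invokeWithArguments are illustrative, not something from the mail above): a generic invocation must match the target's arity and converts each argument slot in place, whereas the varargs-style call collects the arguments into an Object[] first.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class GenericInvocation {
    static int add(int a, int b) { return a + b; }

    public static void main(String[] args) throws Throwable {
        MethodHandle mh = MethodHandles.lookup().findStatic(
                GenericInvocation.class, "add",
                MethodType.methodType(int.class, int.class, int.class));

        // Generic invocation: the arity must match the target exactly;
        // each argument is converted in its own slot (unbox, cast),
        // and no Object[] is materialized for the argument list.
        Object r1 = mh.invoke((Object) 1, (Object) 2);

        // Varargs-style invocation: the arguments are first collected
        // into an Object[] -- the permanent cost described above.
        Object r2 = mh.invokeWithArguments(new Object[] { 1, 2 });

        System.out.println(r1 + " " + r2); // 3 3
    }
}
```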

Suppose the JVM has detected, using the single compare you
already have in place, that an exact invocation call is not
possible. Then for an interpreter, the extra cost of a generic
invocation call is:

1) Where all args are references and the destination
method takes only object references: almost zero!
No boxing, no casting.

2) Where all args are references and the destination method takes
more specific, non-Object types: a cast per such argument. The cast
can be fast-pathed into a single compare per argument.
If the destination requires a primitive, the unboxing is simply a
mov instruction.

3) Where some args are primitives: you have to box each primitive
before the call. Yes, this requires allocating an object.
However, if the destination method actually does something more
complex than working with primitives (most likely), then it
will allocate objects as well, so the box allocation will
be only a fraction of the total number of allocations
needed to execute the call.
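The three cases above can be sketched with MethodHandle.asType conversions (again using the later java.lang.invoke API as a stand-in; the helper methods identity, greet, and square are made up for illustration):

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class ConversionCosts {
    static Object identity(Object o) { return o; }            // case 1: no conversion needed
    static String greet(String name) { return "hi " + name; } // case 2: checkcast per argument
    static int square(int x) { return x * x; }                // unboxing: essentially a mov

    public static void main(String[] args) throws Throwable {
        MethodHandles.Lookup l = MethodHandles.lookup();

        // Case 1: destination takes Object -- the generic call does no work.
        MethodHandle id = l.findStatic(ConversionCosts.class, "identity",
                MethodType.methodType(Object.class, Object.class));
        Object a = id.invoke((Object) "x");

        // Case 2: destination takes String -- one cast per argument,
        // fast-pathable into a single compare.
        MethodHandle g = l.findStatic(ConversionCosts.class, "greet",
                MethodType.methodType(String.class, String.class));
        Object b = g.invoke((Object) "Bob");

        // Primitive destination: unbox the Integer, then call.
        MethodHandle s = l.findStatic(ConversionCosts.class, "square",
                MethodType.methodType(int.class, int.class));
        Object c = s.invoke((Object) Integer.valueOf(5));

        System.out.println(a + " " + b + " " + c); // x hi Bob 25
    }
}
```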

>
> Here's an alternate and more likely cause for that silence:  No one  
> complains about this with closures because the people arguing about  
> closures are language and applications people, and they rarely think  
> in great detail about JVM implementation.  This conversation right  
> here and now is likely to be a cutting edge conversation about JVM  
> implications of function pointers and the like.
>
True. :-)

>
> Good point.  Counter-points:  It is risky to declare a minimum size  
> for a JVM.  Also, cycles on a small device drain power, which is the  
> real limited resource.  So those 100's of MHz are only a peak rate.   
> Running more frequent GCs or a complex JIT could warm up your handset  
> more than the system designers will permit.  As for deeply optimizing  
> background compilers (like HotSpot's), only on big power-guzzling  
> multi-core machines is there a concept of pre-paid cycles, which a JIT  
> can soak up almost "for free".
First, every JVM will compile/optimize a hot spot, even though
doing so temporarily slows down the execution of the application.
This is a typical case of cost/benefit analysis.

But as I have been trying to say, the cost of generic invocation in an
interpreter is not that high. For a chain of transforms the cost
will only be incurred at entry into the chain and at exit.
If the dynamic language uses objects for its primitive values
(RubyFixnum, for example) there will be no boxing at
entry at all! And if we add real fixnums to Java, there will
be no boxing at all either!
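A small sketch of that chain idea, with a hypothetical NumObj standing in for a RubyFixnum-style number object (the class and stage names are invented for illustration): every stage takes and returns the object form, so all-reference generic calls between stages need no boxing at all; wrapping and unwrapping happen only at the ends of the chain.

```java
// Hypothetical number-object, in the style of JRuby's RubyFixnum.
final class NumObj {
    final long value;
    NumObj(long v) { value = v; }
    NumObj plus(NumObj o) { return new NumObj(value + o.value); }
    NumObj times(NumObj o) { return new NumObj(value * o.value); }
}

public class TransformChain {
    // Each stage takes and returns the object form, so a generic
    // (all-reference) call between stages involves no boxing.
    static NumObj stage1(NumObj n) { return n.plus(new NumObj(1)); }
    static NumObj stage2(NumObj n) { return n.times(new NumObj(3)); }

    public static void main(String[] args) {
        // Wrapping happens once at entry into the chain (long -> NumObj) ...
        NumObj in = new NumObj(4);
        // ... the chain itself only passes references ...
        NumObj out = stage2(stage1(in));
        // ... and unwrapping happens once at exit.
        System.out.println(out.value); // 15
    }
}
```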

I would like to throw out an open question:

For dynamic languages like Ruby, which do not have
typed primitives and can work with arbitrarily large
numbers, when will the transform happen that translates
these numbers from number-objects into ints when they
are used as method arguments?

If this can happen when the Ruby program is statically
compiled into bytecodes, then you have a weird situation:
you are able to deduce that neither the caller
nor the destination will ever exceed the int capacity,
but you do not know the actual destination! Because
if you did know it at static compile time, you would
be able to use a normal call instead.

On the other hand, if this can only happen at
runtime, then for a small interpreting-only JVM,
you will have to take into account that the Ruby
optimizer itself will be interpreted! What will the
cost/benefit analysis be for that? It seems
unlikely to me that it would make sense to
over-optimize in such an environment. Better
to work with number-objects and get almost
zero cost at the transforms using generic
invocation.
 
What do you think?

//Fredrik
