Serialization opt-in syntax (again)

Sun Sep 30 06:04:10 PDT 2012

On 09/29/2012 11:14 PM, Brian Goetz wrote:
>> >Brian, do you have data about the supplementary cost of creating Serializable lambda ?
> Sadly I do not yet, and may not for a while.  But I know that there will be a nonzero footprint and capture cost, and it imposes some additional hurdles on some VM optimizations we want to do.  I think that's all the information we will have for a while, but I think we have to go on the assumption that the cost will be enough that "make all lambdas serializable" is not a very good choice.
>
>

I've done some tests asking the VM to dump the assembly code to see the
impact of making lambdas serializable or not.

Serialization is implemented by storing the parameters of the lambda
metafactory so they can be written to the stream and replayed when
deserializing. These parameters are the same for all lambdas created
from the same call site (that's why they are sent as bootstrap
constants, by the way), so you can create an object containing all the
parameters once per call site.
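
As a rough illustration of that scheme (a sketch only, not the actual
prototype code): the names CallSiteInfo, CapturingProxy and their fields
below are hypothetical. It shows one info object built once per call
site and shared by every proxy created there, so each proxy only pays
for one extra reference field.

```java
import java.util.Arrays;
import java.util.List;

public class SharedInfoSketch {
    // Hypothetical holder for the metafactory parameters of one call site.
    static final class CallSiteInfo {
        final String interfaceName;
        final String implMethodName;
        final List<String> capturedTypes;

        CallSiteInfo(String interfaceName, String implMethodName, String... capturedTypes) {
            this.interfaceName = interfaceName;
            this.implMethodName = implMethodName;
            this.capturedTypes = Arrays.asList(capturedTypes);
        }
    }

    // Hypothetical hand-written stand-in for a capturing lambda proxy.
    static final class CapturingProxy implements Runnable {
        final int[] array;
        final int index;
        final CallSiteInfo info;  // the supplementary field

        CapturingProxy(int[] array, int index, CallSiteInfo info) {
            this.array = array;
            this.index = index;
            this.info = info;
        }

        @Override
        public void run() {
            array[index] = index;
        }
    }

    // Built once for the call site, in the spirit of a bootstrap constant.
    static final CallSiteInfo INFO =
        new CallSiteInfo("java.lang.Runnable", "lambda$0", "int[]", "int");

    // Every capture at this "call site" reuses the same info object.
    static Runnable capture(int[] array, int index) {
        return new CapturingProxy(array, index, INFO);
    }

    public static void main(String[] args) {
        int[] array = new int[4];
        CapturingProxy a = (CapturingProxy) capture(array, 1);
        CapturingProxy b = (CapturingProxy) capture(array, 2);
        a.run();
        b.run();
        System.out.println(a.info == b.info);        // true
        System.out.println(Arrays.toString(array));  // [0, 1, 2, 0]
    }
}
```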

First, if the lambda is constant, i.e. it doesn't capture any state,
then the lambda is created once and reused. In that case the cost of a
serializable lambda is only a memory cost: the object containing the
serialization info is allocated once and the lambda has a supplementary
field storing a reference to that object.
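
To make the constant/capturing distinction concrete, here is a small
sketch (class and method names are made up for the example). Reusing a
single instance for a non-capturing lambda is what HotSpot typically
does, but it is an implementation behavior rather than something the
language guarantees, so the program only prints the identity
comparisons instead of relying on them.

```java
public class ConstantLambdaSketch {
    // Non-capturing: the body uses nothing from the enclosing scope.
    static Runnable constant() {
        return () -> { };
    }

    // Capturing: the body uses the parameters 'sink' and 'value'.
    static Runnable capturing(int[] sink, int value) {
        return () -> { sink[0] = value; };
    }

    public static void main(String[] args) {
        // Typically the same object on HotSpot, but not guaranteed.
        System.out.println(constant() == constant());

        // Each evaluation captures its own values.
        int[] sink = new int[1];
        System.out.println(capturing(sink, 1) == capturing(sink, 2));

        capturing(sink, 7).run();
        System.out.println(sink[0]);
    }
}
```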

If the lambda captures some values from the scope, there are two cases
because the VM does escape analysis: either the escape analysis works or
it doesn't. The escape analysis as currently implemented in HotSpot
only works if the creation of the lambda and the call are in the same
inlining blob.
For example, in the following code, the escape analysis finds that it's
not necessary to allocate the lambda proxy object.

   private static void foo()  {
     int[] array = new int[10_000_000];
     for(int i=0; i<array.length; i++) {
       int index = i;  // a lambda can only capture effectively final locals
       Runnable r = () -> { array[index] = index; };
       r.run();
     }
   }

In that case, the serialization info object is allocated once and never
used, so there is no supplementary cost.

If the escape analysis fails, as in the following example where the
lambda object is stored in a static field and thus escapes, the lambda
object must be allocated.

   private static Runnable r1;
   private static void foo() {
     int[] array = new int[10_000_000];
     for(int i=0; i<array.length; i++) {
       int index = i;  // a lambda can only capture effectively final locals
       r1 = () -> { array[index] = index; };
     }
   }

In that case, the serialization info object needs to be allocated only
once. For each iteration of the loop, the proxy object is created with a
supplementary field, which is assigned with a single movl because the
serialization info object is treated as a constant.

with serialization:
0x00007f649906dbc8: mov    $0xc7cc47e8,%r10d  ; {oop('serialization/Test2$ProxySer')}
0x00007f649906dbce: mov    0xb0(%r10),%r10
0x00007f649906dbd5: mov    %r10,(%rax)
0x00007f649906dbd8: movl   $0xc7cc47e8,0x8(%rax)  ; {oop('serialization/Test2$ProxySer')}
0x00007f649906dbdf: mov    %ebx,0xc(%rax)     ;*putfield index
0x00007f649906dbe2: movl   $0xeef41bc0,0x10(%rax)  ;*putfield info
0x00007f649906dbe9: mov    %rbp,%r10
0x00007f649906dbec: mov    %r10d,0x14(%rax)
...

without:
0x00007f78cce556cc: mov    $0xc7cc15b8,%r10d  ; {oop('serialization/Test2$Proxy')}
0x00007f78cce556d2: mov    0xb0(%r10),%r10
0x00007f78cce556d9: mov    %r10,(%rax)
0x00007f78cce556dc: movl   $0xc7cc15b8,0x8(%rax)  ; {oop('serialization/Test2$Proxy')}
0x00007f78cce5565f: mov    %r12d,0x14(%rax)
0x00007f78cce55663: mov    %ebx,0xc(%rax)     ;*putfield index
0x00007f78cce55666: mov    %rbp,%r10
0x00007f78cce55669: mov    %r10d,0x10(%rax)
...

You can see the supplementary movl at 0x00007f649906dbe2.

I think that the cost of serialization is negligible: if a lambda
escapes, it means it will be used in a collection stream or in Doug's
parallel stuff, which usually allocates a small number of objects before
doing the computation, so the supplementary cost of creating a
serializable lambda will be hidden by the cost of creating these
objects.

So I don't see why we have to burden users with a special syntax for
something that they can get for free.
The only concern is if someone wants a lambda that is not serializable
for security reasons, but I think this case is rare enough that we can
ask them to write an inner class instead.
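
For that rare case, a minimal sketch of the inner-class route (the class
and method names are made up for illustration): an anonymous class that
implements only Runnable is not Serializable, so trying to write it to
an ObjectOutputStream fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

public class InnerClassSketch {
    // An anonymous inner class that implements Runnable only,
    // so instances are not Serializable.
    static Runnable nonSerializableTask(int[] counter) {
        return new Runnable() {
            @Override
            public void run() {
                counter[0]++;
            }
        };
    }

    // Returns false if Java serialization rejects the object.
    static boolean canSerialize(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counter = new int[1];
        Runnable task = nonSerializableTask(counter);
        task.run();
        System.out.println(counter[0]);          // 1
        System.out.println(canSerialize(task));  // false
    }
}
```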

Rémi
* By the way, the last two instructions of the snippets show that the
array variable is spilled onto the stack, which is weird: the array is a
hot variable of the loop, so it should never be spilled.