Serialization opt-in syntax (again)
Remi Forax
forax at univ-mlv.fr
Sun Sep 30 06:04:10 PDT 2012
On 09/29/2012 11:14 PM, Brian Goetz wrote:
>> Brian, do you have data about the supplementary cost of creating a Serializable lambda?
> Sadly I do not yet, and may not for a while. But I know that there will be a nonzero footprint and capture cost, and it imposes some additional hurdles on some VM optimizations we want to do. I think that's all the information we will have for a while, but I think we have to go on the assumption that the cost will be enough that "make all lambdas serializable" is not a very good choice.
>
>
I've done some tests, asking the VM to dump the assembly code to see the
impact of making lambdas serializable or not.
Serialization is implemented by storing the parameters of the lambda
metafactory so they can be written to the stream and replayed when
deserializing. These parameters are the same for all lambdas created
from the same call site
(that's why they are sent as bootstrap constants, by the way), so you can
create an object containing all the parameters once per call site.
First, if the lambda is constant, i.e. it doesn't capture any state, then the
lambda is created once and reused.
In that case the cost of a serializable lambda is only a memory cost:
the object containing the serialization info is allocated once and the
lambda has a supplementary field storing a reference to that object.
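A rough Java sketch of the constant case, for illustration only. SerInfo, ConstantProxy, CALL_SITE_INFO and the strings stored in the info object are hypothetical names, not the actual metafactory code; the sketch only assumes that the info object and the non-capturing proxy are each created once per call site.

```java
import java.io.Serializable;

public class ConstantLambdaSketch {
    // Hypothetical holder for the metafactory parameters of one call site.
    static final class SerInfo implements Serializable {
        final String functionalInterface;
        final String implMethod;

        SerInfo(String functionalInterface, String implMethod) {
            this.functionalInterface = functionalInterface;
            this.implMethod = implMethod;
        }
    }

    // Hypothetical proxy for a non-capturing lambda:
    // the only extra state is the reference to the info object.
    static final class ConstantProxy implements Runnable, Serializable {
        final SerInfo info;

        ConstantProxy(SerInfo info) {
            this.info = info;
        }

        @Override
        public void run() {
            // the lambda body would go here
        }
    }

    // Both objects are allocated once for the call site and then reused.
    static final SerInfo CALL_SITE_INFO = new SerInfo("java.lang.Runnable", "lambda$0");
    static final Runnable CONSTANT_LAMBDA = new ConstantProxy(CALL_SITE_INFO);

    // Stands in for evaluating the lambda expression at the call site.
    static Runnable evaluateLambda() {
        return CONSTANT_LAMBDA;
    }

    public static void main(String[] args) {
        // Same instance on every evaluation, so the extra cost is one field.
        System.out.println(evaluateLambda() == evaluateLambda());
    }
}
```

Under these assumptions the per-evaluation cost is zero; only the one-time footprint of the info object and the extra field remain.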
If the lambda captures some values from the scope, because the VM does
escape analysis, there are two cases: either the escape analysis works or it
doesn't. Escape analysis as currently implemented in HotSpot
only works if the creation of the lambda and the call are in the same
inlining blob.
For example, in the following code, the escape analysis finds that it's
not necessary to allocate the lambda proxy object.
private static void foo() {
    int[] array = new int[10_000_000];
    for (int i = 0; i < array.length; i++) {
        int index = i;  // the loop variable itself cannot be captured
        Runnable r = () -> { array[index] = index; };
        r.run();
    }
}
In that case, the serializable info object is allocated once and never
used so there is no supplementary cost.
If the escape analysis fails, as in the following example, where the lambda
object is stored in a static field and thus escapes,
it must be allocated.
private static Runnable r1;
private static void foo() {
    int[] array = new int[10_000_000];
    for (int i = 0; i < array.length; i++) {
        int index = i;  // the loop variable itself cannot be captured
        r1 = () -> { array[index] = index; };
    }
}
In that case, the serialization object needs to be allocated once and,
for each iteration of the loop, the proxy object is created with a
supplementary field, which is assigned in one movl because the
serializable info object is considered constant.
with serialization:
0x00007f649906dbc8: mov $0xc7cc47e8,%r10d ; {oop('serialization/Test2$ProxySer')}
0x00007f649906dbce: mov 0xb0(%r10),%r10
0x00007f649906dbd5: mov %r10,(%rax)
0x00007f649906dbd8: movl $0xc7cc47e8,0x8(%rax) ; {oop('serialization/Test2$ProxySer')}
0x00007f649906dbdf: mov %ebx,0xc(%rax) ;*putfield index
0x00007f649906dbe2: movl $0xeef41bc0,0x10(%rax) ;*putfield info
0x00007f649906dbe9: mov %rbp,%r10
0x00007f649906dbec: mov %r10d,0x14(%rax)
...
without:
0x00007f78cce556cc: mov $0xc7cc15b8,%r10d ; {oop('serialization/Test2$Proxy')}
0x00007f78cce556d2: mov 0xb0(%r10),%r10
0x00007f78cce556d9: mov %r10,(%rax)
0x00007f78cce556dc: movl $0xc7cc15b8,0x8(%rax) ; {oop('serialization/Test2$Proxy')}
0x00007f78cce5565f: mov %r12d,0x14(%rax)
0x00007f78cce55663: mov %ebx,0xc(%rax) ;*putfield index
0x00007f78cce55666: mov %rbp,%r10
0x00007f78cce55669: mov %r10d,0x10(%rax)
...
You can see the supplementary movl at 0x00007f649906dbe2.
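In Java terms, the two proxy shapes behind these listings might look roughly like the sketch below. The class names Proxy and ProxySer and the fields index and info appear in the disassembly; the array field name, the SerInfo type, the INFO constant and the constructor shapes are assumptions made for illustration.

```java
import java.io.Serializable;

public class ProxyShapes {
    // Hypothetical per-call-site serialization info, allocated once.
    static final class SerInfo implements Serializable {
        final String implMethod;

        SerInfo(String implMethod) {
            this.implMethod = implMethod;
        }
    }

    // Non-serializable proxy: only the captured values are stored.
    static final class Proxy implements Runnable {
        final int index;
        final int[] array;

        Proxy(int index, int[] array) {
            this.index = index;
            this.array = array;
        }

        @Override
        public void run() {
            array[index] = index;
        }
    }

    // Serializable proxy: one extra field pointing at the shared info object.
    static final class ProxySer implements Runnable, Serializable {
        final int index;
        final SerInfo info;   // the supplementary store seen in the listing
        final int[] array;

        ProxySer(int index, SerInfo info, int[] array) {
            this.index = index;
            this.info = info;
            this.array = array;
        }

        @Override
        public void run() {
            array[index] = index;
        }
    }

    static final SerInfo INFO = new SerInfo("lambda$0");
    static Runnable r1;

    // Each iteration allocates a proxy that escapes through the static field.
    static void foo(int[] array) {
        for (int i = 0; i < array.length; i++) {
            r1 = new ProxySer(i, INFO, array);
        }
    }

    public static void main(String[] args) {
        int[] array = new int[4];
        foo(array);
        r1.run();
        System.out.println(array[3]);
    }
}
```

Because INFO is a constant, the only per-allocation difference between the two shapes is the single store into the info field.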
I think that the cost of serialization is negligible because if a lambda
escapes, it means it will be used in a collection stream
or in Doug's parallel stuff, which usually allocates a small number of
objects before doing the computation,
so the supplementary cost incurred by creating a serializable lambda
will be hidden by the cost of creating
these objects.
So I don't see why we have to burden users with a special syntax for
something that they can get for free.
The only concern is if someone wants a lambda that is not serializable
for security reasons, but I think this case
is rare enough that we can ask them to write an inner class instead.
Rémi
* By the way, the last two instructions of the snippets show that the
array variable is spilled to the stack,
which is weird: the array is a hot variable of the loop, so it should never
be spilled.