Towards better serialization

Sun Jun 30 21:14:08 UTC 2019

The idea is to introduce a factory interface that the serializable
classes themselves would not implement, so I don't think the interface
inheritance concerns apply in this case.
The SerializedForm record I mentioned is a single general-purpose
record used by the factory interface, so there wouldn't be per-class
records like `FooSerializedForm`.
Assuming the JDK provides a default implementation of
SerializationFactory, I envision something like:

```
module foo {
     provides SerializationFactory with FooSerializationFactory;
}

public class FooSerializationFactory extends DefaultSerializationFactory {}
```

with DefaultSerializationFactory something like:

```
public class DefaultSerializationFactory implements SerializationFactory {
     public <T> Function<? super T, SerializedForm> serializer(Class<T> type) {
         var method = // find the @Serializer-annotated method
         var version = // extract the version from it
         return t -> new SerializedForm(version, (Object[]) method.invoke(t));
     }

     public <T> Function<SerializedForm, ? extends T> deserializer(Class<T> type) {
         Map<Integer, Method> methods = // find all @Deserializer-annotated methods, keyed by version
         return sf -> methods.get(sf.version()).invoke(null, sf.components());
     }

}
```
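
To make the outline above concrete, here is a minimal runnable sketch of such a reflective factory. It is only an illustration under assumptions: the `version` element on the `@Serializer`/`@Deserializer` annotations, the class name `ReflectiveSerializationFactory`, and the example class `Point` with its method names are all hypothetical, and nothing here is an actual or proposed JDK API. It also suggests that `Class.cast` may be enough to satisfy the generic signatures.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical annotations: the names follow the thread, the `version` element is an assumption.
@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Serializer { int version(); }

@Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
@interface Deserializer { int version(); }

record SerializedForm(int version, Object... components) {}

interface SerializationFactory {
    <T> Function<? super T, SerializedForm> serializer(Class<T> type);
    <T> Function<SerializedForm, ? extends T> deserializer(Class<T> type);
}

// Hypothetical stand-in for the DefaultSerializationFactory discussed above.
class ReflectiveSerializationFactory implements SerializationFactory {
    @Override
    public <T> Function<? super T, SerializedForm> serializer(Class<T> type) {
        // Use the first @Serializer-annotated instance method found on the class.
        for (Method m : type.getDeclaredMethods()) {
            Serializer s = m.getAnnotation(Serializer.class);
            if (s == null) continue;
            m.setAccessible(true);
            int version = s.version();
            return t -> {
                try {
                    return new SerializedForm(version, (Object[]) m.invoke(t));
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            };
        }
        throw new IllegalArgumentException("no @Serializer method in " + type);
    }

    @Override
    public <T> Function<SerializedForm, ? extends T> deserializer(Class<T> type) {
        // Collect all @Deserializer-annotated static methods, keyed by version.
        Map<Integer, Method> methods = new HashMap<>();
        for (Method m : type.getDeclaredMethods()) {
            Deserializer d = m.getAnnotation(Deserializer.class);
            if (d != null) {
                m.setAccessible(true);
                methods.put(d.version(), m);
            }
        }
        return sf -> {
            Method m = methods.get(sf.version());
            if (m == null) {
                throw new IllegalArgumentException("unknown version " + sf.version());
            }
            try {
                // Class.cast turns the Object returned by invoke into a T.
                return type.cast(m.invoke(null, sf.components()));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        };
    }
}

// Hypothetical serializable class; its serialization members stay private.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }

    @Serializer(version = 1)
    private Object[] toComponents() { return new Object[] { x, y }; }

    @Deserializer(version = 1)
    private static Point fromComponents(int x, int y) { return new Point(x, y); }
}

class SerializationDemo {
    public static void main(String[] args) {
        SerializationFactory factory = new ReflectiveSerializationFactory();
        SerializedForm form = factory.serializer(Point.class).apply(new Point(1, 2));
        Point copy = factory.deserializer(Point.class).apply(form);
        System.out.println(form.version() + " " + copy.x + " " + copy.y);
    }
}
```

In this sketch everything lives in one unnamed module, so `setAccessible(true)` succeeds without further configuration. In a modular setup the factory would be supplied by the application module itself, which is what would let it reach the private members without additional `opens` directives.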

Kind regards,
Anthony

On 25/06/2019 18:07, Brian Goetz wrote:
> There are a few layers here.  The high-order bit of the approach outlined in the paper is: make the back-door API explicit, so that it can be seen and therefore understood.  Once we accept this model, where decomposition and reconstruction are part of the class’s API, we have a number of choices for how to expose these members in the source code.  The approach you outline is basically saying “we have (or will soon have) records, which are basically nominal tuples, so why not use those?”  And that’s an entirely valid direction to explore.
>
> Using an interface has the problem that interfaces are inherited; that means that a serializable superclass imposes serialization requirements on its subclasses.  That’s not necessarily a good thing, but let’s set that aside.
>
> Using records as our carrier for serialization state has some pros and cons.  On the one hand, we can centralize all the serialization-related code in the carrier class:
>
>      class Foo {
>          record FooSerializedForm(int a, int b) {
>              // serialization members
>          }
>
>          FooSerializedForm serialize() { … }
>      }
>
> This is nice from a code organization perspective, as we can put all the serialization-related members together (and we can eventually use inline records to eliminate the extra allocation). On the other hand, it means we give up the ability to share members between the front-door and back-door APIs: we can’t just take an existing constructor and tag it as “also use this for deserialization.”
>
> So, I think this goes in the “details to be fleshed out” bucket, because this is mostly about how we expose the serialization behaviors in the source code.
>
>> On Jun 22, 2019, at 9:46 AM, Anthony Vanelverdinghe <anthonyv.be at outlook.com> wrote:
>>
>> How about introducing an explicit interface and using the module
>> system's services mechanism?
>> This is an obvious "Why not just ..."-kind of question (i.e. I'm sure it
>> has been considered already), so I assume either the idea was dismissed,
>> or it was put on the list of "details to be fleshed out". However, I
>> can't readily see any flaws in it, and since it allows us to do without
>> additional encapsulation relaxation, I don't consider it a detail either.
>>
>> For example, java.base could contain something like:
>> ```
>> public record SerializedForm(int version, Object... components) {}
>>
>> public interface SerializationFactory {
>>      <T> Function<? super T, SerializedForm> serializer(Class<T> type);
>>      <T> Function<SerializedForm, ? extends T> deserializer(Class<T> type);
>> }
>> ```
>>
>> And application modules would `provides SerializationFactory with ...`,
>> while serialization frameworks would `uses SerializationFactory`.
>> Application modules would be free to implement the interface as they see
>> fit, but `java.base` could provide a default implementation which would
>> work as detailed in the draft (i.e. using reflection and the
>> @Serializer/@Deserializer annotations).
>> Serialization frameworks would then solely rely on `ServiceLoader` to do
>> their work.
>>
>> One issue is that I was unable to implement `SerializationFactory` as
>> given above, due to the generics. But I assume pattern matching would
>> make this feasible, wouldn't it?
>>
>> Thanks in advance for any insights,
>> Anthony
>>
>> On 17/06/2019 01:14, Brian Goetz wrote:
>>>> While I agree that the proposed usage of `open` would be a natural
>>>> extension of the existing concept, I was always under the impression
>>>> that `open` was intended as a "temporary" migration aid. That
>>>> frameworks were supposed to move away from reflection and adopt
>>>> solutions based on `MethodHandles.Lookup` instead. So I'm surprised
>>>> to see the use of reflection promoted now.
>>> Two reactions:
>>>
>>>   - First, don't take the paper as something written in stone; it is a
>>> first draft.  (A first draft that reflects hundreds of hours of
>>> analysis (if not more) by multiple people over multiple years, but a
>>> first draft nonetheless.  This is just the first version that reached
>>> a level of "doesn't suck" sufficient that it was good enough to share
>>> publicly, but I have no illusions that this is in any sense "done" --
>>> it's more that we have finally arrived at the starting line.)
>>> Accordingly, the specifics of how fine-grained the relaxation
>>> mechanism is are a stake in the ground -- the high order bit here is
>>> "there should be some way to identify individual methods as having
>>> different dynamic accessibility than static accessibility, allowing
>>> private methods in private classes in non-exported packages to still
>>> somehow be callable dynamically -- based on an explicit indication in
>>> the source."  The exact details are to be determined. Similarly,
>>> whether exposed via classic reflection vs Lookup is a detail to be
>>> determined.
>>>
>>>   - I think you may have over-rotated towards the "reflection is dead"
>>> meme.  Yes, Lookup is "better" because it is explicit, and allows the
>>> access checks to be done at lookup time rather than on each
>>> invocation.  But, reflection does things that Lookup does not (or at
>>> least, not yet); you can't iterate over the methods of a class via a
>>> Lookup, let alone interrogate them for their annotations, or query
>>> their Signature attributes, or any number of other things frameworks
>>> like to do.  So it is likely that frameworks will be using reflection
>>> for quite a while, and that's OK.
>>>
>>>> An advantage of open packages, is that they are able to specify whom
>>>> they're exporting to. So I can say: `opens foo.ser to
>>>> some.ser.framework` and `opens foo.cdi to some.cdi.framework`.
>>> There's a pretty broad spectrum of granularity possible here.  On the
>>> one extreme, you could just say "we don't need open methods, we have
>>> open packages -- if you want to serialize, open the package." On the
>>> other extreme, you could say that open methods are way too coarse
>>> grained; they tar serialization frameworks and dependency injection
>>> frameworks and mocking frameworks with the same brush. And there's a
>>> lot in the middle.
>>>
>>> There's also a danger that the search for more accurate permission
>>> granularity becomes a rathole; for example, the security manager
>>> permissions model is quite fine-grained, but in reality people rarely
>>> use that mechanism to tailor just the right security policy -- it's
>>> too hard, too fussy, too much work, too hard to keep it in sync with
>>> what the code actually needs.  So while we might over time attempt to
>>> put more structure on "back door APIs", this is probably a good
>>> starting position.
>>>
>>> Further, putting "opens X to Y" in the source code may actually
>>> require us to name Y before we actually know it; when you're writing a
>>> library class, do you really know which serialization frameworks the
>>> application into which your library is incorporated will be using?
>>> This seems more an issue for application assembly time -- Y is often
>>> only known when the entire application is put together -- than of
>>> component development time.  But if the `opens` clause is in the
>>> source file, we only know what is known at component development time.
>>>
>>> So, yes, there are likely to be more mechanisms to model accessibility
>>> on the backdoor API (including, probably, the ability to say things
>>> like "I know there are methods in module X that are open, but I still
>>> want them encapsulated in MY application") -- but I think it's
>>> premature to try to design them now.
>>>
>>>
>>> Cheers,
>>> -Brian
>>>
>>>
>>>