RFR: 8313961: Enhance identification of special serialization methods

Raffaello Giulietti raffaello.giulietti at oracle.com
Wed Aug 23 10:23:43 UTC 2023


Here's some more context about the issues that the PR attempts to fix.

Currently, serialization identifies special "magic" methods by querying 
classical reflection. It does so by invoking `getDeclaredMethod(name, 
parameterTypes)`. This works correctly under the assumption that a 
serializable class contains at most one method with that name and those 
parameter types. While the assumption is valid in the Java language, it 
is not necessarily valid for JVM classes, because a class file may 
contain several methods that differ only in their return type.
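
To make the lookup described above concrete, here is a minimal sketch of 
that style of lookup. It is only an illustration, not the actual JDK 
code: the names `MagicMethodLookup`, `Point` and `findByDeclaredMethod` 
are made up for this example.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

public class MagicMethodLookup {

    /** Hypothetical serializable class with a conventional writeObject method. */
    static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        int x, y;

        private void writeObject(ObjectOutputStream oos) throws IOException {
            oos.defaultWriteObject();
        }
    }

    /**
     * Looks up a "magic" method by name and parameter types only, then checks
     * the return type and modifiers of whatever method reflection handed back.
     * If reflection picked a homonymous method with another return type,
     * the result is null even though a suitable method may exist.
     */
    static Method findByDeclaredMethod(Class<?> cl, String name,
                                       Class<?> returnType, Class<?>... paramTypes) {
        try {
            Method m = cl.getDeclaredMethod(name, paramTypes);
            int mods = m.getModifiers();
            boolean relevant = m.getReturnType() == returnType
                    && Modifier.isPrivate(mods)
                    && !Modifier.isStatic(mods);
            return relevant ? m : null;
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Method m = findByDeclaredMethod(Point.class, "writeObject",
                void.class, ObjectOutputStream.class);
        System.out.println(m != null ? "found: " + m.getName() : "not found");
    }
}
```

For a class compiled from Java source, such as `Point`, there is only 
one candidate, so the lookup is unambiguous.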

The specification of `Class::getDeclaredMethod` makes it clear that in 
case there are multiple methods with the same name and parameter types, 
the one with the most specific return type is chosen, or an arbitrary 
one is returned if there's no most specific one. This non-determinism 
means that serialization might fail to identify the relevant method. 
Worse, it might identify the relevant method on some implementations 
of the JDK, or during some runs, and fail to find it on other 
implementations or in other runs.

For example, suppose a serializable class (in the JVMS sense) contains 
the following methods (pseudo-Java):
```
     private int writeObject(ObjectOutputStream oos) {...}
     private void writeObject(ObjectOutputStream oos) {...}
```
Neither method has a most specific return type. Depending on how the 
methods appear in the class, the current implementation of 
`Class::getDeclaredMethod` returns the first or the second method. Only 
the second is relevant for serialization. When the first is returned, 
the relevant method is ignored by serialization, despite being present 
in the class.

The solution proposed by the PR identifies the relevant method even in 
such cases. However, this might break behavioral compatibility for 
classes which, for example, have multiple 
`writeObject(ObjectOutputStream)` methods.
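
One conceivable way to make the lookup deterministic is to scan all 
declared methods and match on the return type as well, since a class 
file cannot contain two methods with the same name and descriptor. The 
sketch below illustrates that idea only; it is not necessarily what the 
PR does, and `ExactMagicMethodLookup`, `Demo` and `findExact` are 
hypothetical names.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

public class ExactMagicMethodLookup {

    /** Hypothetical class; the String overload is only there as a decoy. */
    static class Demo implements Serializable {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream oos) throws IOException {
            oos.defaultWriteObject();
        }

        private int writeObject(String ignored) {
            return 0;
        }
    }

    /**
     * Scans every declared method and accepts only an exact match on name,
     * parameter types and return type that is also private and non-static.
     * Name plus parameter types plus return type is unique within a class
     * file, so the result does not depend on the iteration order.
     */
    static Method findExact(Class<?> cl, String name,
                            Class<?> returnType, Class<?>... paramTypes) {
        for (Method m : cl.getDeclaredMethods()) {
            int mods = m.getModifiers();
            if (m.getName().equals(name)
                    && m.getReturnType() == returnType
                    && Arrays.equals(m.getParameterTypes(), paramTypes)
                    && Modifier.isPrivate(mods)
                    && !Modifier.isStatic(mods)) {
                return m;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Method m = findExact(Demo.class, "writeObject",
                void.class, ObjectOutputStream.class);
        System.out.println(m != null ? "found: " + m : "not found");
    }
}
```

With a lookup of this kind, a class containing both the `int` and the 
`void` variant would always have its `void` variant recognized, which 
is exactly the kind of behavioral change discussed above.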



A similar problem exists for special "magic" fields, which are currently 
identified by invoking `getDeclaredField(name)`. Here, however, there 
are a couple of additional issues.

Firstly, the specification of `Class::getDeclaredField` completely 
ignores the possible presence of multiple fields with the same name. It 
says nothing about this case, and the implementation returns an 
arbitrary one of the homonymous fields if there is more than one.

Secondly, even if the choice is made deterministic, it is unclear which 
field to choose when there are several.
Consider a class with 4 `serialPersistentFields` fields (pseudo-Java):
```
     private static final Object              serialPersistentFields = new ObjectStreamField[0];
     private static final Cloneable           serialPersistentFields = new ObjectStreamField[0];
     private static final Serializable        serialPersistentFields = new ObjectStreamField[0];
     private static final ObjectStreamField[] serialPersistentFields = new SubclassOfObjectStreamField[0];
```
Which one to choose? Which one is preferable?
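
The sketch below does not answer that question. It only shows how the 
ambiguity could be made observable, by collecting every declared field 
with a given name instead of relying on `getDeclaredField`. The names 
`HomonymousFields`, `Holder` and `declaredFieldsNamed` are hypothetical.

```java
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

public class HomonymousFields {

    /** Hypothetical class declaring the conventional field exactly once. */
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final ObjectStreamField[] serialPersistentFields =
                new ObjectStreamField[0];
    }

    /**
     * Returns all declared fields of the class that have the given name.
     * For a class compiled from Java source the list has at most one
     * element; for other class files it may have more. The order of the
     * elements is not specified.
     */
    static List<Field> declaredFieldsNamed(Class<?> cl, String name) {
        List<Field> result = new ArrayList<>();
        for (Field f : cl.getDeclaredFields()) {
            if (f.getName().equals(name)) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Field> candidates =
                declaredFieldsNamed(Holder.class, "serialPersistentFields");
        System.out.println(candidates.size() + " candidate(s)");
        for (Field f : candidates) {
            System.out.println("  declared type: " + f.getType().getTypeName());
        }
    }
}
```

Any policy for choosing among several candidates would still have to be 
decided separately.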



We thus face a dilemma: the current behavior is potentially 
non-deterministic, which is uncomfortable, although it does not seem to 
be a problem in practice.
On the other hand, any solution to the non-determinism can potentially 
break existing classes at run-time.

Which one is the lesser evil?
Before progressing with the PR, I'd like to hear more opinions.


Greetings
Raffaello

