RFR: 8313961: Enhance identification of special serialization methods
Raffaello Giulietti
raffaello.giulietti at oracle.com
Wed Aug 23 10:23:43 UTC 2023
Here's some more context about the issues that the PR attempts to fix.
Currently, serialization identifies special "magic" methods by querying
classical reflection. It does so by invoking `getDeclaredMethod(name,
parameterTypes)`. This works correctly under the assumption that a
serializable class contains at most one method with that name and those
parameter types. While the assumption is valid in the Java language, it
is not necessarily valid for JVM classes.
The specification of `Class::getDeclaredMethod` makes it clear that in
case there are multiple methods with the same name and parameter types,
the one with the most specific return type is chosen, or an arbitrary
one is returned if there's no most specific one. This non-determinism
means that serialization might fail to identify the relevant method.
Moreover, it might identify the relevant method on some implementations
of the JDK, or during some runs, and fail to find it in other
implementations or in other runs.
For example, suppose a serializable class (in the JVMS sense) contains
the following methods (pseudo-Java):
```
private int writeObject(ObjectOutputStream oos) {...}
private void writeObject(ObjectOutputStream oos) {...}
```
Neither method has a most specific return type. Depending on how the
methods appear in the class, the current implementation of
`Class::getDeclaredMethod` returns the first or the second method. Only
the second is relevant for serialization. When the first is returned,
the relevant method is ignored by serialization, despite being present
in the class.
The solution proposed by the PR identifies the relevant method even in
such cases. However, this might break behavioral compatibility for
classes which, for example, have multiple
`writeObject(ObjectOutputStream)` methods.
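As a rough illustration of how such a lookup can be made deterministic, here is a minimal sketch. It is not the PR's code. It scans `getDeclaredMethods()` and also matches on the return type, relying on the fact that a class file cannot contain two methods with the same name and descriptor. The names `MagicMethodLookup`, `findPrivateMethod` and `Sample` are hypothetical and exist only for this example.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Arrays;

final class MagicMethodLookup {

    // Hypothetical helper (not the PR's code): returns the private, non-static
    // method with the given name, return type and parameter types, or null.
    static Method findPrivateMethod(Class<?> cl, String name,
                                    Class<?> returnType, Class<?>... paramTypes) {
        // getDeclaredMethods() reports every declared method, including
        // several that differ only in their return type.
        for (Method m : cl.getDeclaredMethods()) {
            int mods = m.getModifiers();
            if (m.getName().equals(name)
                    && m.getReturnType() == returnType
                    && Arrays.equals(m.getParameterTypes(), paramTypes)
                    && Modifier.isPrivate(mods)
                    && !Modifier.isStatic(mods)) {
                // Name plus descriptor is unique within a class file,
                // so the first match is the only possible match.
                return m;
            }
        }
        return null;
    }

    // Ordinary Java class used only to exercise the helper.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;

        private void writeObject(ObjectOutputStream oos) throws IOException {
            oos.defaultWriteObject();
        }
    }
}
```

Calling `findPrivateMethod(Sample.class, "writeObject", void.class, ObjectOutputStream.class)` yields the `void` method regardless of how many homonymous methods with other return types the class file contains.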
A similar problem exists for special "magic" fields, which are currently
identified by invoking `getDeclaredField(name)`. Here, however, there
are a couple of additional issues.
Firstly, the specification of `Class::getDeclaredField` completely
ignores the possible presence of multiple fields with the same name. It
says nothing about this case, and the method returns an arbitrary one of
the homonymous fields if there are several.
Secondly, even if the choice is made deterministic, it is unclear which
field to choose when there are multiple ones.
Consider a class with four `serialPersistentFields` fields (pseudo-Java):
```
private static final Object serialPersistentFields =
new ObjectStreamField[0];
private static final Cloneable serialPersistentFields =
new ObjectStreamField[0];
private static final Serializable serialPersistentFields =
new ObjectStreamField[0];
private static final ObjectStreamField[] serialPersistentFields =
new SubclassOfObjectStreamField[0];
```
Which one to choose? Which one is preferable?
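Whatever the answer, the ambiguity itself can be detected by enumerating the candidates instead of calling `getDeclaredField(name)`. The following is a minimal sketch under that assumption. It only collects the homonymous fields and deliberately does not decide which one to prefer. The names `MagicFieldCandidates`, `declaredFieldsNamed` and `Sample` are hypothetical.

```java
import java.io.ObjectStreamField;
import java.io.Serializable;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class MagicFieldCandidates {

    // Hypothetical helper: collects every field declared by cl that has the
    // given name. More than one element means the class file is ambiguous.
    static List<Field> declaredFieldsNamed(Class<?> cl, String name) {
        List<Field> result = new ArrayList<>();
        for (Field f : cl.getDeclaredFields()) {
            if (f.getName().equals(name)) {
                result.add(f);
            }
        }
        return result;
    }

    // Ordinary Java class used only to exercise the helper; javac allows
    // at most one field per name, so there is a single candidate here.
    static class Sample implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final ObjectStreamField[] serialPersistentFields =
                new ObjectStreamField[0];
    }
}
```

For a class compiled from Java source the list has exactly one element. For a class file like the one above it would have four, and a policy would still be needed to pick among them.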
We thus face a dilemma: the current behavior is potentially
non-deterministic, which is uncomfortable, although it does not seem to be a
problem in practice.
On the other hand, any solution to the non-determinism can potentially break
existing classes at run-time.
Which is the lesser evil?
Before progressing with the PR, I'd like to hear more opinions.
Greetings
Raffaello
More information about the core-libs-dev
mailing list