Towards better serialization

Sun Jun 16 09:33:44 UTC 2019

Hi Stuart,

Thank you very much for such a throughout response.
Please, find my comments inline.

On 15/06/2019 18:52:11, "Stuart Marks" <stuart.marks at oracle.com> wrote:

>On 6/13/19 9:24 AM, Kłeczek, Michał wrote:
>>The whole premise of the proposal we are discussing is that convenience is the root of all evil.
>
>Hi Michał,
>
>This is an inaccurate characterization of the proposal.
>
>Also, your earlier statements,
>
>>What's more - it does not really address security concerns! ...
>>The issue here is that we try to fix security problems in the wrong place. Almost all security issues with serialization are not really caused by serialization itself but by:
>>- huge classpath with all libraries accessible to each other (ie. deserialization gadgets availability in classpath)
>>- running applications with no SecurityManager (starting a JVM with no SecurityManager by default was the single biggest mistake Java designers made in the past IMHO)
>
>Those are indeed security concerns, but you've overlooked an important and fundamental class of security issues that are directly attributable to the way serialization was designed and implemented in Java.
>
>The line of reasoning about convenience in the proposal is not that convenience itself is evil, but that in pursuit of convenience, the original design adopted extralinguistic mechanisms to achieve it. This weakens some of the fundamentals of the Java platform, and it has led directly to several bugs and security holes, several of which I've fixed personally.  Let me illustrate this with a couple examples.

Here I would like to share some more general remarks describing maybe 
the difference in sentiment between the Java implementor (you) and a 
user from the field (me).
Java from the very beginning has been built on a _pragmatic_ approach to 
language and runtime design - providing quite a lot of "magic" (ie. 
extralinguistic mechanisms), garbage collector being the most prominent 
example (and this is not supposed to be demagogy - see below why).
As a user I don't care if a certain feature is a library feature or 
requires language/VM runtime support. I care about the utility and ease 
of use of the feature.

I understand some features might be difficult to implement. On the other 
hand I don't think the right solution is to (silently? - sorry for 
demagogy :) ) push the responsibility of implementing it to the users. 
The point is that transparent/declarative serialization is _very_ useful 
- not unlike the GC. (Maybe the definition of what actually constitutes 
"serialization" - ie. the requirements - is indeed something that has to 
be stated clearly).

It is actually _this_ sentiment that has driven me to comment on the 
proposal.

>
>
>First, consider the bug JDK-6896297 [1] which I fixed several years ago. Briefly, the problem is that a test failed intermittently, throwing ConcurrentModificationException. The class in question is thread-safe, and locking is applied within all method calls. How could the CME occur?
>
>The exception occurred when another thread took a snapshot of this object periodically; the snapshot was performed using serialization. This object didn't provide a readObject() method, so the serialization mechanism "magically" provided one that serialized the object using direct field access. This direct access bypassed the locking protocol established by the rest of the class, causing a race condition.
>
>The code was in place for ten years before I fixed it. During that time applications, were potentially exposed to corrupted snapshots. In a sense, we were lucky that a CME was thrown. If it weren't thrown, we might never have noticed the problem.

And this example - similar to the example with ServerConnection from the 
paper - actually illustrates my points really well:

1. The bug is clearly caused by *abusing* serialization to create 
snapshots of object state in the situation when the target object did 
not provide facilities to do it. But serialization does not provide any 
guarantees of providing atomic snapshots of object state. The number of 
bugs similar to https://bugs.openjdk.java.net/browse/JDK-4261109 is the 
evidence.
This is just yet another concurrency bug related to mutable state - it 
would be the same if the snapshot was taken using Object.clone() or 
@Deserializer marked method because...

2. What's more important - the proposal does not address it at all. The 
reason being: it can't!
Taking atomic snapshot of the state of object graph is not possible 
without global coordination - and to make it transparent (which I 
personally would find *very* useful) even more "magic" extralinguistic 
mechanisms would be needed - I guess something similar to what any 
concurrent GC is doing (this is why GC example above is not 
demagogical).

>
>
>The second issue concerns a whole class of security vulnerabilities that arise because serialization bypasses some fundamental mechanisms of the language. I won't describe the vulnerabilities in detail, but I'll show this by describing to an old and well-known Java security bug.

I think without discussing security vulnerabilities case by case, 
finding the root cause of each and synthesizing the knowledge - we are 
destined to base the solution on some general but not-necessary-true 
assumptions.

>
>
>As you know, String is immutable, and its methods have well-defined behavior. Therefore, it's possible to write secure code that relies on these characteristics, e.g. a String reference can be stored in a data structure without making a defensive copy, because Strings are immutable.
>
>It turns out that in early versions of Java [2] it was possible to load a "spoof" version of java.lang.String and hand instances of the spoofed String to sensitive code. It's likely that this code is relying on well-known, safe characteristics of the "real" java.lang.String. However, the spoofed String class could supply different behavior for its methods or mutate itself.
>
>This is impossible to see by inspecting the secure code. The security bug existed because the fundamental assumptions the secure code was making about the type-safety of the platform were violated by the spoof class.

I am not sure how this relates to serialization and bypassing 
constructors and - what's worse - I am afraid without SecurityManager 
and proper security policy it is still possible to spoof any part of the 
runtime by simply overwriting files in the filesystem.

>
>
>What does this have to do with serialization? Brian's proposal states that serialization bypasses the constructors of serializable classes. Big deal, just use readObject(), right?
>
>No. If you look carefully at the Java specifications, you'll see that constructors have a bunch of special characateristics. The sequence of steps that occurs when an object is created are precise and well-defined. [3] There are other characteristics of constructors as well (which one can find by digging through the JLS) such as: an object isn't finalizable until after the Object() constructor returns; writes to final fields in constructors happen-before reads that occur outside the constructor; the compiler ensures that all final fields of an object are definitely assigned through all paths through constructors and initializers; field and instance initializers are executed in a well-defined order; and so forth.
>
>Deserializing an object bypasses the constructors, thus none of this applies to objects created via deserialization.
>
>What are the consequences of this? Briefly, it means that it's possible to create objects that appear impossible to create. At least, they appear impossible, if you're trying to assess the security of the code by inspecting it. Such objects might have unknown and unexpected behaviors. Since the system's security and correctness depends on well-defined behaviors, it means that all bets are off. No matter how carefully you inspect code to try to ensure that it's secure, if it's handling objects that can violate Java's fundamentals, you can't guarantee anything.

There are two aspects of this:

1. There is a question of why deserialization needs its own rules of 
object instantiation and bypassing constructors. The answer is actually 
quite obvious I guess: it is the *requirement* to support deserializing 
cycles in object graphs.
So the proposal "solves" the problem of bypassing constructors by 
dropping one of the basic requirements for serialization. To be honest 
it looks a little like the tail waging the dog.

2. Maybe I have missed it but I haven't seen any real discussion about 
the available solution space and any proof that dropping this 
requirement is necessary.
I can imagine (forgive me my naiveness :) ) something along the lines 
of:

// Deserializer is actually very similar to current ObjectInputStream
// the difference is that it is _magic_ - can initialize final members
// and javac is aware of this
class java.lang.Deserializer {
void deserialize(Object instance) throws IOException;
}

class Other implements Serializable {

final MySerializable serializableFinalMember;

Other(Deserializer deserializer) throws IOException {
// the compiler does not complain about
// uninitialized final members here
deserializer.deserialize(this);
}

}

class MySerializable implements Serializable {

final Other serializableFinalMember;

// the below constructor is synthesized by compiler/runtime
// the deserializer is free to create an instance of Other and
// initialize its members with this object reference

MySerializable(Deserializer deserializer) throws IOException {
super(deserializer);
//or in case non-serializable superclass
super();

//here magic happens
deserializer.deserialize(this);
}

}

>
>
>**
>
>THIS is the point of the proposal. Bringing serialization into the realm of well-defined language constructs, instead of using extralinguistic "magic" mechanisms, is a huge step forward in improving quality and security of Java programs.

The reason why I have started this discussion is that I see it the other 
way around: the proposal instead of providing _solutions_ pushes the 
problems onto users (ie. developers).
If there is one thing I can be sure based on my experience it is: the 
ad-hoc solutions that field developers will come up with are going to be 
much worse than what you guys can implement :)

>
>s'marks

--
Michal