Towards better serialization

Sat Jun 15 16:52:11 UTC 2019

On 6/13/19 9:24 AM, Kłeczek, Michał wrote:
> The whole premise of the proposal we are discussing is that convenience is the 
> root of all evil.

Hi Michał,

This is an inaccurate characterization of the proposal.

Also, regarding your earlier statements:

> What's more - it does not really address security concerns! ...
> The issue here is that we try to fix security problems in the wrong place. Almost all security issues with serialization are not really caused by serialization itself but by:
> - huge classpath with all libraries accessible to each other (ie. deserialization gadgets availability in classpath)
> - running applications with no SecurityManager (starting a JVM with no SecurityManager by default was the single biggest mistake Java designers made in the past IMHO)

Those are indeed security concerns, but you've overlooked an important and 
fundamental class of security issues that are directly attributable to the way 
serialization was designed and implemented in Java.

The line of reasoning about convenience in the proposal is not that convenience 
itself is evil, but that in pursuit of convenience, the original design adopted 
extralinguistic mechanisms to achieve it. This weakens some of the fundamentals 
of the Java platform, and it has led directly to several bugs and security 
holes, some of which I've fixed personally. Let me illustrate this with a 
couple of examples.

First, consider the bug JDK-6896297 [1] which I fixed several years ago. 
Briefly, the problem is that a test failed intermittently, throwing 
ConcurrentModificationException. The class in question is thread-safe, and 
locking is applied within all method calls. How could the CME occur?

The exception occurred when another thread took a snapshot of this object 
periodically; the snapshot was performed using serialization. This object didn't 
provide a writeObject() method, so the serialization mechanism "magically" 
provided one that serialized the object using direct field access. This direct 
access bypassed the locking protocol established by the rest of the class, 
causing a race condition.
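
To make the pattern concrete, here is a minimal sketch. It is not the actual 
code from JDK-6896297; the class name EventLog and its methods are hypothetical. 
It assumes the usual remedy of declaring a writeObject() method that takes the 
same lock as the rest of the class, so that serialization goes through the 
locking protocol instead of around it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class for illustration; not the code from the bug report.
final class EventLog implements Serializable {
    private static final long serialVersionUID = 1L;

    // Guarded by "this": every method that touches the list is synchronized.
    private final List<String> events = new ArrayList<>();

    public synchronized void add(String event) {
        events.add(event);
    }

    public synchronized int size() {
        return events.size();
    }

    // Without this method, default serialization reads "events" directly,
    // without holding the lock, and can race with a concurrent add().
    // Declaring it synchronized holds the lock for the whole write.
    private synchronized void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
    }

    // Takes a snapshot by serializing the log to a byte array.
    public byte[] snapshot() {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(this);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing in the source of add() or size() hints that a second, unlocked access 
path exists when writeObject() is absent. That path is supplied entirely by the 
serialization mechanism.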

The code was in place for ten years before I fixed it. During that time, 
applications were potentially exposed to corrupted snapshots. In a sense, we 
were lucky that a CME was thrown. Had it not been thrown, we might never have 
noticed the problem.

The second issue concerns a whole class of security vulnerabilities that arise 
because serialization bypasses some fundamental mechanisms of the language. I 
won't describe the vulnerabilities in detail, but I'll illustrate them by 
analogy with an old and well-known Java security bug.

As you know, String is immutable, and its methods have well-defined behavior. 
Therefore, it's possible to write secure code that relies on these 
characteristics, e.g. a String reference can be stored in a data structure 
without making a defensive copy, because Strings are immutable.

It turns out that in early versions of Java [2] it was possible to load a 
"spoof" version of java.lang.String and hand instances of the spoofed String to 
sensitive code. Such code very likely relies on the well-known, safe 
characteristics of the "real" java.lang.String. However, the spoofed String 
class could supply different behavior for its methods or mutate itself.

This is impossible to see by inspecting the secure code. The security bug 
existed because the fundamental assumptions the secure code was making about the 
type-safety of the platform were violated by the spoof class.

What does this have to do with serialization? Brian's proposal states that 
serialization bypasses the constructors of serializable classes. Big deal, just 
use readObject(), right?

No. If you look carefully at the Java specifications, you'll see that 
constructors have a bunch of special characteristics. The sequence of steps 
that occurs when an object is created is precise and well-defined. [3] There 
are other characteristics of constructors as well (which one can find by digging 
through the JLS) such as: an object isn't finalizable until after the Object() 
constructor returns; writes to final fields in constructors happen-before reads 
that occur outside the constructor; the compiler ensures that all final fields 
of an object are definitely assigned through all paths through constructors and 
initializers; field and instance initializers are executed in a well-defined 
order; and so forth.

Deserializing an object bypasses the constructors, thus none of this applies to 
objects created via deserialization.
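
A small sketch shows the effect. The class names Account and RoundTrip are 
hypothetical and exist only for this illustration. The sketch counts 
constructor calls and gives a final field an initializer, then round-trips an 
instance through serialization. The copy comes back without the constructor or 
the field initializer having run, so the final field that "can't" be null is 
null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class for illustration.
final class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    // Counts how many times the constructor has actually run.
    static int constructorCalls = 0;

    private final String owner;

    // Final, with an initializer: from the source alone, this is never null.
    private final transient Object lock = new Object();

    public Account(String owner) {
        constructorCalls++;
        this.owner = owner;
    }

    public String owner() {
        return owner;
    }

    public boolean hasLock() {
        return lock != null;
    }
}

// Hypothetical helper: serializes an object and deserializes the result.
final class RoundTrip {
    public static <T extends Serializable> T copy(T original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T result = (T) in.readObject();
                return result;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After `Account copy = RoundTrip.copy(new Account("alice"))`, the counter 
Account.constructorCalls has increased by one, not two, and copy.hasLock() 
returns false even though the original's hasLock() returns true.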

What are the consequences of this? Briefly, it means that it's possible to 
create objects that appear impossible to create. At least, they appear 
impossible, if you're trying to assess the security of the code by inspecting 
it. Such objects might have unknown and unexpected behaviors. Since the system's 
security and correctness depend on well-defined behaviors, it means that all 
bets are off. No matter how carefully you inspect code to try to ensure that 
it's secure, if it's handling objects that can violate Java's fundamentals, you 
can't guarantee anything.

**

THIS is the point of the proposal. Bringing serialization into the realm of 
well-defined language constructs, instead of using extralinguistic "magic" 
mechanisms, is a huge step forward in improving quality and security of Java 
programs.

s'marks

[1] https://bugs.openjdk.java.net/browse/JDK-6896297

[2] Vijay Saraswat. Java is not type-safe. 1997. Copy available at 
https://www.cis.upenn.edu/~bcpierce/courses/629/papers/Saraswat-javabug.html

[3] https://docs.oracle.com/javase/specs/jls/se12/html/jls-12.html#jls-12.5