Data Oriented Programming, Beyond Records

Brian Goetz brian.goetz at oracle.com
Tue Jan 13 21:52:47 UTC 2026


Here's a snapshot of where my head is at with respect to extending the 
record goodies (including pattern matching) to a broader range of 
classes, deconstructors for classes and interfaces, and compatible 
evolution of records. Hopefully this will unblock quite a few things.

As usual, let's discuss concepts and directions rather than syntax.



# Data-oriented Programming for Java: Beyond records

Everyone loves records; they allow us to create shallowly immutable data holder
classes -- which we can think of as "nominal tuples" -- derived from a concise
state description, and to destructure records through pattern matching.  But
records have strict constraints, and not all data holder classes fit into the
restrictions of records.  Maybe they have some mutable state, or derived or
cached state that is not part of the state description, or their representation
and their API do not match up exactly, or they need to break up their state
across a hierarchy.  In these classes, even though they may also be “data
holders”, the user experience is like falling off a cliff.  Even a small
deviation from the record ideal means one has to go back to a blank slate and
write explicit constructor declarations, accessor method declarations, and
Object method implementations -- and give up on destructuring through pattern
matching.

Since the start of the design process for records, we’ve kept in mind 
the goal
of enabling a broader range of classes to gain access to the "record 
goodies":
reduced declaration burden, participating in destructuring, and soon,
[reconstruction](https://openjdk.org/jeps/468). During the design of 
records, we
also explored a number of weaker semantic models that would allow for 
greater
flexibility. While at the time they all failed to live up to the goals _for
records_, there is a weaker set of semantic constraints we can impose that
allows for more flexibility and still enables the features we want, 
along with
some degree of syntactic concision that is commensurate with the 
distance from
the record-ideal, without fall-off-the-cliff behaviors.

Records, sealed classes, and destructuring with record patterns 
constitute the
first feature arc of "data-oriented programming" for Java.  After 
considering
numerous design ideas, we're now ready to move forward with the next "data
oriented programming" feature arc: _carrier classes_ (and interfaces.)

## Beyond record patterns

Record patterns allow a record instance to be destructured into its components.
Record patterns can be used in `instanceof` and `switch`, and when a record
pattern is also exhaustive, will be usable in the upcoming
[_pattern assignment statement_](https://mail.openjdk.org/pipermail/amber-spec-experts/2026-January/004306.html)
feature.

In exploring the question "how will classes be able to participate in 
the same
sort of destructuring as records", we had initially focused on a new form of
declaration in a class -- a "deconstructor" -- that operated as a 
constructor in
reverse. Just as a constructor takes component values and produces an 
aggregate
instance, a deconstructor would take an aggregate instance and recover its
component values.

But as this exploration played out, the more interesting question turned 
out to
be: which classes are suitable for destructuring in the first place? And the
answer to that question led us to a different approach for expressing
deconstruction.  The classes that are suitable for destructuring are 
those that,
like records, are little more than carriers for a specific tuple of 
data. This
is not just a thing that a class _has_, like a constructor or method, but
something a class _is_.  And as such, it makes more sense to describe
deconstruction as a top-level property of a class.  This, in turn, leads 
to a
number of simplifications.

## The power of the state description

Records are a semantic feature; they are only incidentally concise.  But 
they
_are_ concise; when we declare a record

     record Point(int x, int y) { ... }

we automatically get a sensible API (canonical constructor, deconstruction
pattern, accessor methods for each component) and implementation (fields,
constructor, accessor methods, Object methods.)  We can explicitly 
specify most
of these (except the fields) if we like, but most of the time we don't 
have to,
because the default is exactly what we want.

A record is a shallowly-immutable, final class whose API and 
representation are
_completely defined_ by its _state description_.  (The slogan for records is
"the state, the whole state, and nothing but the state.")  The state 
description
is the ordered list of _record components_ declared in the record's 
header.  A
component is more than a mere field or accessor method; it is an API 
element on
its own, describing a state element that instances of the class have.

The state description of a record has several desirable properties:

  - The components, in the order specified, are the _canonical_ description of
    the record's state.
  - The components are the _complete_ description of the record’s state.
  - The components are _nominal_; their names are a committed part of the
    record's API.

Records derive their benefits from making two commitments:

  - The _external_ commitment that the data-access API of a record
    (constructor, deconstruction pattern, and component accessor methods) is
    defined by the state description.
  - The _internal_ commitment that the _representation_ of the record (its
    fields) is also completely defined by the state description.

These semantic properties are what enable us to derive almost everything about
records.  We can derive the API of the canonical constructor because the state
description is canonical.  We can derive the API for the component accessor
methods because the state description is nominal.  And because the state
description is complete, we can derive a deconstruction pattern from the
accessor methods, along with sensible implementations of the state-related
`Object` methods.

The internal commitment that the state description is also the 
representation
allows us to completely derive the rest of the implementation. Records get a
(private, final) field for each component, but more importantly, there is a
clear mapping between these fields and their corresponding components, 
which is
what allows us to derive the canonical constructor and accessor method
implementations.
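
As a concrete illustration in today's Java (records plus record patterns, so
Java 21 or later is assumed), the following sketch exercises each member that
is derived from the state description `(int x, int y)`.  The names
`RecordDerivationDemo` and `describe` are hypothetical and exist only for this
example.

```java
// Sketch: everything used below is derived from the header (int x, int y).
record Point(int x, int y) { }

class RecordDerivationDemo {
    // Uses the derived deconstruction pattern to recover the components.
    static String describe(Object o) {
        if (o instanceof Point(int x, int y)) {
            return "point at " + x + "," + y;
        }
        return "not a point";
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);                       // derived canonical constructor
        System.out.println(p.x() + " " + p.y());         // derived accessor methods
        System.out.println(p.equals(new Point(1, 2)));   // derived equals
        System.out.println(describe(p));                 // derived deconstruction pattern
    }
}
```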

Records can additionally declare a _compact constructor_ that allows us 
to elide
the boilerplate aspects of record constructors -- the argument list and 
field
assignments -- and just specify the code that is _not_ mechanically 
derivable.
This is more concise, less error-prone, and easier to read:

     record Rational(int num, int denom) {
         Rational {
             if (denom == 0)
                 throw new IllegalArgumentException("denominator cannot be zero");
         }
     }

is shorthand for the more explicit

     record Rational(int num, int denom) {
         Rational(int num, int denom) {
             if (denom == 0)
                 throw new IllegalArgumentException("denominator cannot be zero");
             this.num = num;
             this.denom = denom;
         }
     }

While compact constructors are pleasantly concise, the more important 
benefit is
that by eliminating the mechanically derivable code, the "more 
interesting" code
comes to the fore.
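
A compact constructor can also reassign its parameters before the implicit
field assignments run, which is what makes it a natural place for
normalization as well as validation.  The sketch below extends the `Rational`
example to reduce to lowest terms; the `gcd` helper and the class
`RationalDemo` are hypothetical additions for illustration, and extreme values
such as `Integer.MIN_VALUE` are deliberately not handled.

```java
record Rational(int num, int denom) {
    Rational {
        if (denom == 0)
            throw new IllegalArgumentException("denominator cannot be zero");
        // Reassign the parameters; the fields are assigned from them
        // automatically after this block completes.
        int g = gcd(Math.abs(num), Math.abs(denom));
        if (denom < 0) g = -g;   // keep the stored denominator positive
        num /= g;
        denom /= g;
    }

    // Hypothetical helper: Euclid's algorithm on non-negative inputs.
    private static int gcd(int a, int b) {
        return b == 0 ? a : gcd(b, a % b);
    }
}

class RationalDemo {
    public static void main(String[] args) {
        Rational r = new Rational(2, -4);
        System.out.println(r.num() + "/" + r.denom());   // prints -1/2
    }
}
```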

Looking ahead, the state description is a gift that keeps on giving.  These
semantic commitments are enablers for a number of potential future 
language and
library features for managing object lifecycle, such as:

  - [Reconstruction](https://openjdk.org/jeps/468) of record instances, 
allowing
    the appearance of controlled mutation of record state.
  - Automatic marshalling and unmarshalling of record instances.
  - Instantiating or destructuring record instances identifying components
    nominally rather than positionally.

### Reconstruction

JEP 468 proposes a mechanism by which a new record instance can be 
derived from
an existing one using syntax that is evocative of direct mutation, via a 
`with`
expression:

     record Complex(double re, double im) { }
     Complex c = ...
     Complex cConjugate = c with { im = -im; };

The block on the right side of `with` can contain any Java statements, not just
assignments.  It is enhanced with mutable variables (_component variables_) for
each component of the record, each initialized to the value of that component
in the record instance on the left.  The block is executed, and a new record
instance is created whose component values are the ending values of the
component variables.

A reconstruction expression implicitly destructures the record instance 
using
the canonical deconstruction pattern, executes the block in a scope enhanced
with the component variables, and then creates a new record using the 
canonical
constructor.  Invariant checking is centralized in the canonical 
constructor, so
if the new state is not valid, the reconstruction will fail.  JEP 468 
has been
"on hold" for a while, primarily because we were waiting for sufficient
confidence that there was a path to extending it to suitable classes before
committing to it for records.  The ideal path would be for those classes 
to also
support a notion of canonical constructor and deconstruction pattern.

Careful readers will note a similarity between the transformation block of a
`with` expression and the body of a compact constructor.  In both cases, the
block is "preloaded" with a set of component variables, initialized to 
suitable
starting values, the block can mutate those variables as desired, and upon
normal completion of the block, those variables are passed to a canonical
constructor to produce the final result.  The main difference is where the
starting values come from; for a compact constructor, it is from the 
constructor
parameters, and for a reconstruction expression, it is from the canonical
deconstruction pattern of the source record to the left of `with`.
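
To make the destructure / run block / reconstruct sequence concrete, here is a
hand-written approximation in today's Java of what `c with { im = -im; }` does
for the `Complex` record.  This is a sketch of the idea rather than the
translation JEP 468 specifies; the class `WithSketch`, the method `conjugate`,
and the null handling are assumptions made for this example.

```java
record Complex(double re, double im) { }

class WithSketch {
    // Hand-written stand-in for: c with { im = -im; }
    static Complex conjugate(Complex c) {
        // 1. Destructure with the canonical deconstruction pattern.
        //    (Rejecting null this way is an assumption of this sketch.)
        if (!(c instanceof Complex(double re, double im))) {
            throw new NullPointerException("cannot reconstruct from null");
        }
        // 2. Run the block against the mutable component variables.
        im = -im;
        // 3. Rebuild through the canonical constructor, which re-checks
        //    any invariants the record enforces.
        return new Complex(re, im);
    }

    public static void main(String[] args) {
        System.out.println(conjugate(new Complex(1.0, 2.0)));
    }
}
```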

### Breaking down the cliff

Records make a strong semantic commitment to derive both their API and
representation from the state description, and in return get a lot of 
help from
the language.  We can now turn our attention to smoothing out "the cliff" --
identifying weaker semantic commitments that classes can make that would 
still
allow classes to get _some_ help from the language.  And ideally, the 
amount of
help you give up would be proportional to the degree of deviation from the
record ideal.

With records, we got a lot of mileage out of having a complete, canonical,
nominal state description.  Where the record contract is sometimes too
constraining is the _implementation_ contract that the representation aligns
exactly with the state description, that the class is final, that the 
fields are
final, and that the class may not extend anything but `Record`.

Our path here takes one step back and one step forward: keeping the external
commitment to the state description, but dropping the internal 
commitment that
the state description _is_ the representation -- and then _adding back_ 
a simple
mechanism for mapping fields representing components back to their 
corresponding
components, where practical.  (With records, because we derive the
representation from the state description, this mapping can be safely 
inferred.)

As a thought experiment, imagine a class that makes the external 
commitment to a
state description -- that the state description is a complete, canonical,
nominal description of its state -- but is on its own to provide its
representation.  What can we do for such a class?  Quite a bit, 
actually.  For
all the same reasons we can for records, we can derive the API 
requirement for a
canonical constructor and component accessor methods.  From there, we 
can derive
both the requirement for a canonical deconstruction pattern, and also the
implementation of the deconstruction pattern (as it is implemented in 
terms of
the accessor methods). And since the state description is complete, we can
further derive sensible default implementations of the Object methods 
`equals`,
`hashCode`, and `toString` in terms of the accessor methods as well. And 
given
that there is a canonical constructor and deconstruction pattern, it can 
also
participate in reconstruction.  The author would just have to provide the
fields, accessor methods, and canonical constructor.  This is good 
progress, but
we'd like to do better.

What enables us to derive the rest of the implementation for records 
(fields,
constructor, accessor methods, and Object methods) is the knowledge of 
how the
representation maps to the state description.  Records commit to their state
description _being_ the representation, so it is a short leap from there to a
complete implementation.

To make this more concrete, let's look at a typical "almost record" class, a
carrier for the state description `(int x, int y, Optional<String> s)` 
but which
has made the representation choice to internally store `s` as a nullable
`String`.

```
class AlmostRecord {
     private final int x;
     private final int y;
     private final String s;                                 // *

     public AlmostRecord(int x, int y, Optional<String> s) {
         this.x = x;
         this.y = y;
         this.s = s.orElse(null);                            // *
     }

     public int x() { return x; }
     public int y() { return y; }
     public Optional<String> s() {
         return Optional.ofNullable(s);                      // *
     }

     public boolean equals(Object other) { ... }     // derived from x(), y(), s()
     public int hashCode() { ... }                   //    "
     public String toString() { ... }                //    "
}
```

The main differences between this class and the expansion of its record 
analogue
are the lines marked with a `*`; these are the ones that deal with the 
disparity
between the state description and the actual representation.  It would 
be nice
if the author of this class _only_ had to write the code that was 
different from
what we could derive for a record; not only would this be pleasantly 
concise,
but it would mean that all the code that _is_ there exists to capture the
differences between its representation and its API.
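
For reference, here is one plausible way the elided `Object` methods could be
written purely in terms of the accessors, so that the nullable-`String`
representation never leaks into them.  The exact rules a derived
implementation would follow are not specified in this note; the `instanceof`
based `equals`, the `toString` format, and marking the class `final` are
assumptions of this sketch.

```java
import java.util.Objects;
import java.util.Optional;

// `final` only to keep the equals sketch simple; the original is not final.
final class AlmostRecord {
    private final int x;
    private final int y;
    private final String s;   // nullable representation of Optional<String> s

    public AlmostRecord(int x, int y, Optional<String> s) {
        this.x = x;
        this.y = y;
        this.s = s.orElse(null);
    }

    public int x() { return x; }
    public int y() { return y; }
    public Optional<String> s() { return Optional.ofNullable(s); }

    // The methods below read state only through the accessors.
    @Override
    public boolean equals(Object other) {
        return other instanceof AlmostRecord that
                && x() == that.x()
                && y() == that.y()
                && s().equals(that.s());
    }

    @Override
    public int hashCode() {
        return Objects.hash(x(), y(), s());
    }

    @Override
    public String toString() {
        return "AlmostRecord[x=" + x() + ", y=" + y() + ", s=" + s() + "]";
    }
}
```

Extracting the components through the accessors and passing them back to the
constructor yields an instance that is equal to the original.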

## Carrier classes

A _carrier class_ is a normal class declared with a state description.  
As with
a record, the state description is a complete, canonical, nominal 
description of
the class's state.  In return, the language derives the same API 
constraints as
it does for records: canonical constructor, canonical deconstruction 
pattern,
and component accessor methods.

    class Point(int x, int y) {                // class, not record!
        // explicitly declared representation

        ...

        // must have a constructor taking (int x, int y)
        // must have accessors for x and y
        // supports a deconstruction pattern yielding (int x, int y)
    }

Unlike a record, the language makes no assumptions about the object's
representation; the class author has to declare that just as with any other
class.

Saying the state description is "complete" means that it carries all the
“important” state of the class -- if we were to extract this state and 
recreate
the object, that should yield an “equivalent” instance.  As with 
records, this
can be captured by tying together the behavior of construction, 
accessors, and
equality:

```
Point p = ...
Point q = new Point(p.x(), p.y());
assert p.equals(q);
```

We can also derive _some_ implementation from the information we have so 
far; we
can derive sensible implementations of the `Object` methods (implemented 
in terms
of component accessor methods) and we can derive the canonical 
deconstruction
pattern (again in terms of the component accessor methods).  And from 
there, we
can derive support for reconstruction (`with` expressions.) 
Unfortunately, we
cannot (yet) derive the bulk of the state-related implementation: the 
canonical
constructor and component accessor methods.

### Component fields and accessor methods

One of the most tedious aspects of data-holder classes is the accessor 
methods;
there are often many of them, and they are almost always pure 
boilerplate.  Even
though IDEs can reduce the writing burden by generating these for us, 
readers
still have to slog through a lot of low-information code -- just to 
learn that
they didn't actually need to slog through that code after all.  We can 
derive
the implementation of accessor methods for records because records make the
internal commitment that the components are all backed with individual 
fields
whose name and type align with the state description.

For a carrier class, we don't know whether _any_ of the components are directly
backed by a single field that aligns to the name and type of the component.  But
it is a pretty good bet that many carrier classes will do exactly this for at
least _some_ of their components.  If we can tell the language that this
correspondence is not merely accidental, the language can do more for us.

We do so by allowing suitable fields of a carrier class to be declared as
`component` fields.  (As usual at this stage, syntax is provisional, but 
not
currently a topic for discussion.)  A component field must have the same 
name
and type as a component of the current class (though it need not be 
`private` or
`final`, as record fields are.)  This signals that this field _is_ the
representation for the corresponding component, and hence we can derive the
accessor method for this component as well.

```
class Point(int x, int y) {
     private /* mutable */ component int x;
     private /* mutable */ component int y;

     // must have a canonical constructor, but (so far) must be explicit
     public Point(int x, int y) {
         this.x = x;
         this.y = y;
     }

     // derived implementations of accessors for x and y
     // derived implementations of equals, hashCode, toString
}
```

This is getting better; the class author only had to supply the representation,
the mapping from representation to components (in the form of the `component`
modifier), and the canonical constructor.

### Compact constructors

Just as we are able to derive the accessor method implementation if we are
given an explicit correspondence between a field and a component, we can 
do the
same for constructors.  For this, we build on the notion of _compact
constructors_ that was introduced for records.

As with a record, a compact constructor in a carrier class is a 
shorthand for a
canonical constructor, which has the same shape as the state 
description, but
which is freed of the responsibility of actually committing the ending 
value of
the component parameters to the fields.  The main difference is that for a
record, _all_ of the components are backed by a component field, whereas 
for a
carrier class, only some of them might be.  But we can generalize compact
constructors by freeing the author of the responsibility to initialize the
_component_ fields, while leaving them responsible for initializing the 
rest of
the fields.  In the limiting case where all components are backed by 
component
fields, and there is no other logic desired in the constructor, the compact
constructor may be elided.

For our mutable `Point` class, this means we can elide nearly 
everything, except
the field declarations themselves:

```
class Point(int x, int y) {
     private /* mutable */ component int x;
     private /* mutable */ component int y;

     // derived compact constructor
     // derived accessors for x, y
     // derived implementations of equals, hashCode, toString
}
```

We can think of this class as having an implicit empty compact constructor,
which in turn means that the component fields `x` and `y` are 
initialized from
their corresponding constructor parameters.  There are also implicitly 
derived
accessor methods for each component, and implementations of `Object` methods
based on the state description.

This is great for a class where all the components are backed by fields, but
what about our `AlmostRecord` class?  The story here is good as well; we can
derive the accessor methods for the components backed by component 
fields, and
we can elide the initialization of the component fields from the compact
constructor, meaning that we _only_ have to specify the code for the 
parts that
deviate from the "record ideal":

```
class AlmostRecord(int x,
                    int y,
                    Optional<String> s) {

     private final component int x;
     private final component int y;
     private final String s;

     public AlmostRecord {
         this.s = s.orElse(null);
         // x and y fields implicitly initialized
     }

     public Optional<String> s() {
         return Optional.ofNullable(s);
     }

     // derived implementation of x and y accessors
     // derived implementation of equals, hashCode, toString
}
```

Because so many real-world almost-records differ from their record ideal in
minor ways, we expect to get a significant concision benefit for most 
carrier
classes, as we did for `AlmostRecord`.  As with records, if we want to
explicitly implement the constructor, accessor methods, or `Object` 
methods, we
are still free to do so.

### Derived state

One of the most frequent complaints about records is the inability to derive
state from the components and cache it for fast retrieval.  With carrier
classes, this is simple: declare a non-component field for the derived 
quantity,
initialize it in the constructor, and provide an accessor:

```
class Point(int x, int y) {
     private final component int x;
     private final component int y;
     private final double norm;

     Point {
         norm = Math.hypot(x, y);
     }

     public double norm() { return norm; }

     // derived implementation of x and y accessors
     // derived implementation of equals, hashCode, toString
}
```
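
For contrast, this is the situation with a record today: records cannot
declare additional instance fields, so a derived quantity has nowhere to be
cached and is recomputed on every call.  A minimal sketch:

```java
record Point(int x, int y) {
    // No place to cache this in a record: an extra instance field such as
    // `private final double norm;` is not permitted in a record body.
    public double norm() {
        return Math.hypot(x, y);   // recomputed on each invocation
    }
}

class NormDemo {
    public static void main(String[] args) {
        System.out.println(new Point(3, 4).norm());
    }
}
```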

### Deconstruction and reconstruction

Like records, carrier classes automatically acquire deconstruction 
patterns that
match the canonical constructor, so we can destructure our `Point` class 
as if
it were a record:

     case Point(var x, var y):

Because reconstruction (`with`) derives from a canonical constructor and
corresponding deconstruction pattern, when we support reconstruction of 
records,
we will also be able to do so for carrier classes:

     point = point with { x = 3; }

## Carrier interfaces

A state description makes sense on interfaces as well.  It makes the 
statement
that the state description is a complete, canonical, nominal description 
of the
interface's state (subclasses are allowed to add additional state), and
accordingly, implementations must provide accessor methods for the 
components.
This enables such interfaces to participate in pattern matching:

```
interface Pair<T,U>(T first, U second) {
     // implicit abstract accessors for first() and second()
}

...

if (o instanceof Pair(var a, var b)) { ... }
```

Along with the upcoming feature for pattern assignment in foreach-loop 
headers,
if `Map.Entry` became a carrier interface (which it will), we would be 
able to
iterate a `Map` like:

     for (Map.Entry(var key, var val) : map.entrySet()) { ... }
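
For comparison, the equivalent loop in today's Java binds the whole entry and
then calls its accessors by hand.  The class `EntryLoopToday` and the method
`render` are hypothetical names for this sketch; how the proposed pattern
would relate to the existing `getKey()` and `getValue()` methods is not
specified in this note.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class EntryLoopToday {
    static String render(Map<String, Integer> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            String key = e.getKey();      // would be bound by the pattern
            Integer val = e.getValue();   // would be bound by the pattern
            sb.append(key).append('=').append(val).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> m = new LinkedHashMap<>();   // preserves insertion order
        m.put("a", 1);
        m.put("b", 2);
        System.out.println(render(m));   // prints a=1;b=2;
    }
}
```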

It is a common pattern in libraries to export an interface that is 
sealed to a
single private implementation.  In this pattern, the interface and
implementation can share a common state description:

```
public sealed interface Pair<T,U>(T first, U second) { }

private record PairImpl<T, U>(T first, U second) implements Pair<T, U> { }
```

Compared to the old way of doing this, we get enhanced semantics, better 
type
checking, and more concision.
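
As a point of reference, the "old way" might look like the following sketch in
today's Java.  The interface has to declare its accessors explicitly, and
clients cannot write `Pair(var a, var b)` because the interface has no state
description; they bind the whole object and call the accessors.  The factory
`of`, the class `PairToday`, and the package-private visibility of `PairImpl`
are assumptions of this example.

```java
sealed interface Pair<T, U> permits PairImpl {
    T first();    // must be declared by hand today
    U second();

    // Hypothetical factory that hides the implementation class.
    static <T, U> Pair<T, U> of(T first, U second) {
        return new PairImpl<>(first, second);
    }
}

// Package-private stand-in for the single hidden implementation.
record PairImpl<T, U>(T first, U second) implements Pair<T, U> { }

class PairToday {
    static String show(Object o) {
        // No deconstruction pattern on the interface: use a type pattern
        // and then the accessors.
        if (o instanceof Pair<?, ?> p) {
            return p.first() + " / " + p.second();
        }
        return "not a pair";
    }

    public static void main(String[] args) {
        System.out.println(show(Pair.of("a", 1)));
    }
}
```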

### Extension

The main obligation of a carrier class author is to ensure that the 
fundamental
claim -- that the state description is a complete, canonical, nominal
description of the object's state -- is actually true.  This does not 
rule out
having the representation of a carrier class spread out over a hierarchy, so
unlike records, carrier classes are not required to be final or 
concrete, nor
are they restricted in their extension.

There are several cases that arise when carrier classes can participate in
extension:

  - A carrier class extends a non-carrier class;
  - A non-carrier class extends a carrier class;
  - A carrier class extends another carrier class, where all of the 
superclass
    components are subsumed by the subclass state description;
  - A carrier class extends another carrier class, but there are one or more
    superclass components that are not subsumed by the subclass state
    description.

Extending a non-carrier class with a carrier class will usually be motivated by
the desire to "wrap" a state description around an existing hierarchy which we
cannot or do not want to modify directly, but for which we wish to gain the
benefits of deconstruction and reconstruction.  Such an implementation would
have to ensure that the class actually conforms to the state description, and
that the canonical constructor and component accessors are implemented.

When one carrier class extends another, the more straightforward case is 
that it
simply adds new components to the state description of the superclass.  For
example, given our `Point` class:

```
class Point(int x, int y) {
     component int x;
     component int y;

     // everything else for free!
}
```

we can use this as the base class for a 3d point class:

```
class Point3d(int x, int y, int z) extends Point {
     component int z;

     Point3d {
         super(x, y);
     }
}
```

In this case -- because the superclass components are all part of the 
subclass
state description -- we can actually omit the constructor as well, 
because we
can derive the association between subclass components and superclass
components, and thereby derive the needed super-constructor invocation.  
So we
could actually write:

```
class Point3d(int x, int y, int z) extends Point {
     component int z;

     // everything else for free!
}
```

One might think that we would need some marking on the `x` and `y` 
components of
`Point3d` to indicate that they map to the corresponding components of 
`Point`,
as we did for associating component fields with their corresponding 
components.
But in this case, we need no such marking, because there is no way that an
`int x` component of `Point` and an `int x` component of its subclass could
possibly refer to different things -- since they are both tied to the same
`int x()` accessor method.  So we can safely infer which subclass components are
managed
by superclasses, just by matching up their names and types.
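
Written out by hand in today's Java, the constructor chaining and accessor
inheritance that would be derived for `Point` and `Point3d` might look roughly
like this.  It is a sketch only: the `private final` modifiers are a choice
made here, and the `Object` methods are omitted because this note does not
spell out how they would be derived across a hierarchy.

```java
class Point {
    private final int x;
    private final int y;

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    public int x() { return x; }
    public int y() { return y; }
}

class Point3d extends Point {
    private final int z;

    public Point3d(int x, int y, int z) {
        super(x, y);   // x and y matched to Point's components by name and type
        this.z = z;    // only z is represented in the subclass
    }

    public int z() { return z; }
    // x() and y() are inherited, so both classes share the same int x() accessor.
}

class Point3dDemo {
    public static void main(String[] args) {
        Point3d p = new Point3d(1, 2, 3);
        System.out.println(p.x() + "," + p.y() + "," + p.z());   // prints 1,2,3
    }
}
```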

In the other carrier-to-carrier extension case, where one or more superclass
components are _not_ subsumed by the subclass state description, it is 
necessary
to provide an explicit `super` constructor call in the subclass 
constructor.

A carrier class may also be declared abstract; the main effect of this is that
we will not derive `Object` method implementations, instead leaving that for the
subclass to do.

### Abstract records

This framework also gives us an opportunity to relax one of the 
restrictions on
records: that records can't extend anything other than 
`java.lang.Record`.  We
can also allow records to be declared `abstract`, and for records to extend
abstract records.

Just as with carrier classes that extend other carrier classes, there 
are two
cases: when the component list of the superclass is entirely contained 
within
that of the subclass, and when one or more superclass components are derived
from subclass components (or are constant), but are not components of the
subclass itself.  And just as with carrier classes, the main difference is
whether an explicit `super` call is required in the subclass constructor.

When a record extends an abstract record, any components of the subclass 
that
are also components of the superclass do not implicitly get component 
fields in
the subclass (because they are already in the superclass), and they 
inherit the
accessor methods from the superclass.

### Records are carriers too

With this framework in place, records can now be seen to be "just" carrier
classes that are implicitly final, extend `java.lang.Record`, that 
implicitly
have private final component fields for each component, and can have no 
other
fields.

## Migration compatibility

There will surely be some existing classes that would like to become carrier
classes.  This is a compatible migration as long as none of the mandated 
members
conflict with existing members of the class, and the class adheres to the
requirement that the state description is a complete, canonical, and nominal
description of the object state.

### Compatible evolution of records and carrier classes

To date, libraries have been reluctant to use records in public APIs because
of the difficulty of evolving them compatibly.  For a record:

```
record R(A a, B b) { }
```

that wants to evolve by adding new components:

```
record R(A a, B b, C c, D d) { }
```

we have several compatibility challenges to manage.  As long as we are only
adding and not removing/renaming, accessor method invocations will continue to
work. And existing constructor invocations can be allowed to continue to work by
explicitly adding back a constructor that has the old shape:

```
record R(A a, B b, C c, D d) {

     // Explicit constructor for old shape required
     public R(A a, B b) {
         this(a, b, DEFAULT_C, DEFAULT_D);
     }

}
```
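
A compilable instance of the same technique, using concrete types in place of
`A`, `B`, `C`, and `D`.  The record `Endpoint`, its components, and the default
values are hypothetical and chosen only for illustration.

```java
// Originally: record Endpoint(String host, int port) { }
record Endpoint(String host, int port, boolean tls, int timeoutMillis) {
    static final boolean DEFAULT_TLS = false;
    static final int DEFAULT_TIMEOUT_MILLIS = 30_000;

    // Old shape, kept so existing `new Endpoint(host, port)` call sites
    // continue to compile and link.
    public Endpoint(String host, int port) {
        this(host, port, DEFAULT_TLS, DEFAULT_TIMEOUT_MILLIS);
    }
}

class EndpointDemo {
    public static void main(String[] args) {
        System.out.println(new Endpoint("example.org", 80));
    }
}
```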

But what can we do about existing uses of record _patterns_? While the
translation of record patterns would make adding components binary-compatible,
it would not be source-compatible, and there is no way to explicitly add a
deconstruction pattern for the old shape as we did with the constructor.

We can take advantage of the simplification offered by there being 
_only_ the
canonical deconstruction pattern, and allow uses of deconstruction 
patterns to
supply nested patterns for any _prefix_ of the component list.  So for the
evolved record R:

     case R(P1, P2)

would be interpreted as:

     case R(P1, P2, _, _)

where `_` is the match-all pattern.  This means that one can compatibly 
evolve a
record by only adding new components at the end, and adding a suitable
constructor for compatibility with existing constructor invocations.
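
To see why this matters, consider what a client has to write today (Java 21
syntax assumed).  Every component must appear in the pattern, so a client
written against a two-component record stops compiling once components are
added, even if it only cares about the first two.  The record `Endpoint` and
the class `PrefixPatternToday` are hypothetical names for this sketch.

```java
record Endpoint(String host, int port, boolean tls, int timeoutMillis) { }

class PrefixPatternToday {
    static String hostAndPort(Object o) {
        return switch (o) {
            // Under the proposal, `case Endpoint(var host, var port)` would
            // be accepted as a prefix.  Today all four must be listed.
            case Endpoint(var host, var port, var tls, var timeoutMillis) ->
                    host + ":" + port;
            default -> "unknown";
        };
    }

    public static void main(String[] args) {
        System.out.println(hostAndPort(new Endpoint("example.org", 443, true, 1_000)));
    }
}
```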