Data Oriented Programming, Beyond Records
Anthony Vanelverdinghe
anthonyv.be at outlook.com
Sun Jan 18 10:59:46 UTC 2026
On 1/18/2026 9:54 AM, Olexandr Rotan wrote:
>
> Correcting myself: what about "state-based equals with immutable
> fields-based hashCode", where "immutable fields" would be defined
> as "final fields of types for which it is known that their
> hashCode is immutable" (primitive types, enums, records and
> carrier classes with the default-generated hashCode method, value
> classes)?
>
> I would like to point out once more that none of the listed cases
> except the first two actually guarantees hashCode immutability,
> because records and value types are only shallowly immutable. So this
> whole discussion seems to tackle a much more global topic than just
> carrier classes' equals/hashCode: challenging the way hashCode is
> generated with regard to mutability in carriers would mean
> challenging the same behaviour in records as well, because using only
> final fields to compute the hash merely defers the problem one layer
> of indirection deeper, to any possibly-mutable value of a final
> field/component.
Thanks, you're right, of course. And I should've said "... for which it
is known that their hashCode is constant", not immutable.
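To make the shallow-immutability point concrete, here is a minimal sketch. The `Names` record and the class and method names are hypothetical, chosen only for illustration. The record specification does not pin down the exact hash algorithm, so the observation that the hash changes reflects how current OpenJDK builds derive a record's hashCode from its components.

```java
import java.util.ArrayList;
import java.util.List;

class ShallowImmutabilityDemo {
    // Hypothetical record: the component is final, but the list it refers to is mutable.
    record Names(List<String> value) {}

    // Returns true if mutating the referenced list changed the record's hashCode.
    static boolean hashChangesAfterMutation() {
        List<String> backing = new ArrayList<>(List.of("Ada"));
        Names names = new Names(backing);
        int before = names.hashCode();
        backing.add("Grace"); // mutates state reachable through a final component
        return before != names.hashCode();
    }

    // Returns true if the record stops being equal to a snapshot of its old state.
    static boolean equalityChangesAfterMutation() {
        List<String> backing = new ArrayList<>(List.of("Ada"));
        Names names = new Names(backing);
        Names snapshot = new Names(List.of("Ada"));
        boolean equalBefore = names.equals(snapshot);
        backing.add("Grace");
        return equalBefore && !names.equals(snapshot);
    }

    public static void main(String[] args) {
        System.out.println("hash changed: " + hashChangesAfterMutation());
        System.out.println("equality changed: " + equalityChangesAfterMutation());
    }
}
```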
So I propose that records, carrier classes, and value classes have a
default (1) state-based equals and (2) hashCode whose result is constant
for any given instance. This would require changing the default hashCode
for records, but I believe this would be a backward compatible change.
And hashCode would be able to use at least components/final fields of
primitive types, enums, and records/carrier classes/value classes with a
default hashCode. Moreover, javac could recognize patterns like `record
Names(List<String> value) { public Names { value = List.copyOf(value); }
}` to use the `value` component as well.
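
A sketch of why that pattern is safe to hash, assuming the element type itself has a constant hashCode (as `String` does). The demo class and method names are hypothetical; whether javac could actually recognize the pattern is part of the proposal, not something this code shows.

```java
import java.util.ArrayList;
import java.util.List;

class DefensiveCopyDemo {
    // The record from the message above: the compact constructor stores an unmodifiable copy.
    record Names(List<String> value) {
        public Names {
            value = List.copyOf(value);
        }
    }

    // Returns true if mutating the caller's list leaves the record's hashCode unchanged.
    static boolean hashStableAfterSourceMutation() {
        List<String> source = new ArrayList<>(List.of("Ada"));
        Names names = new Names(source);
        int before = names.hashCode();
        source.add("Grace"); // affects only the caller's list, not the stored copy
        return before == names.hashCode();
    }

    // Returns true if the stored component rejects mutation.
    static boolean componentRejectsMutation() {
        Names names = new Names(new ArrayList<>(List.of("Ada")));
        try {
            names.value().add("Grace");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("hash stable: " + hashStableAfterSourceMutation());
        System.out.println("component unmodifiable: " + componentRejectsMutation());
    }
}
```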
Anthony
>
> On Sun, Jan 18, 2026 at 8:01 AM Anthony Vanelverdinghe
> <anthonyv.be at outlook.com> wrote:
>
> On 1/17/2026 6:58 PM, Anthony Vanelverdinghe wrote:
>> On 1/17/2026 5:46 PM, Brian Goetz wrote:
>>>> With a mutable class with equals/hashCode/toString generated,
>>>> it's too easy to store an object in a collection, mutate it,
>>>> and then never be able to find it again.
>>>
>>> Yes, but also: everyone here knows about this risk. You don't
>>> need to belabor the example :)
>>>
>>> This is a reflection of a problem we already have: equals is a
>>> semantic part of the type's definition, about when two instances
>>> represent the "same" value, and mutability is part of the type's
>>> definition, and "whether you put it in a hash-based collection
>>> and then mutate it" is about _how the instances are used by
>>> clients_.
>>>
>>> While immutability is a good default, it's not always _wrong_ to
>>> use mutability; it's just riskier. And for a mutable class,
>>> state-based equality is _still_ a sensible possible
>>> implementation of equality; it's just riskier. And putting
>>> mutable objects in hash-based collections is also not wrong; it's
>>> just riskier. For the bad thing to happen, all of these have to
>>> happen _and then it has to be mutated_. But if we have to
>>> assign primary blame here, it is not the guy who didn't write
>>> `final` on the fields, and not the guy who said that equality
>>> was state-based, but the guy who put it in the collection and
>>> mutated it.
>>>
>>> If we decided that avoiding this risk were the primary design
>>> goal, then we would have to either disallow mutable fields, or
>>> change the way we define the default equals/hashCode behavior.
>>> Potential ways to do the latter include:
>>>
>>> - never provide a default implementation, inherit the object
>>> default
>>> - don't provide a default implementation if there are any
>>> mutable fields
>>> - leave mutable fields out of the default implementation, but
>>> use the other fields
>>>
>>> While "disallow mutable fields" is a potentially principled
>>> answer, it is pretty restrictive. Of the others, I claim that
>>> the proposed behavior is better than any of them.
>>>
>>> Carrier classes are about data, and come with a semantic claim:
>>> that the state description is a complete, canonical description
>>> of the state. It seems pretty questionable then to use identity
>>> equality for such a class. But the other two alternatives
>>> listed are both some form of "action at a distance", harder to
>>> keep track of, and are still only guesses at what the user actually
>>> wants. The two principled options are "don't provide
>>> equals/hashCode", and "state-based equals/hashCode", and of the
>>> two, the latter makes much more sense.
> Correcting myself: what about "state-based equals with immutable
> fields-based hashCode", where "immutable fields" would be defined
> as "final fields of types for which it is known that their
> hashCode is immutable" (primitive types, enums, records and
> carrier classes with the default-generated hashCode method, value
> classes)?
>>
>> What about "state-based equals with final fields-based hashCode"?
>> (Maybe this is actually what you meant with `leave mutable fields
>> out of the default implementation, but use the other fields`, but
>> then I don't understand how that's "action at a distance" and
>> "harder to keep track of".) That would solve the HashSet issue
>> and be a safe, intuitive default. There might be performance
>> issues for carrier classes without final fields that are used in
>> large HashSets, but in that case it's easy enough to provide
>> one's own implementation of `hashCode`. And by doing so, one
>> would implicitly consent to the implications of doing so (I could
>> imagine javac issuing a lint warning for this and/or javadoc
>> adding a warning to the Javadoc that the carrier class suffers
>> from the HashSet issue).
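
A hand-written sketch of what "state-based equals with final fields-based hashCode" could look like. The `Counter` class and all names are hypothetical, and carrier classes are not used because they are only a proposal. It addresses only the lookup of the same instance; it does not restore the set's uniqueness guarantee after a mutation.

```java
import java.util.HashSet;
import java.util.Set;

class FinalFieldHashDemo {
    // Hypothetical mutable class with one final and one mutable field.
    static final class Counter {
        private final String name; // final: used by equals and hashCode
        private int count;         // mutable: used by equals only

        Counter(String name, int count) {
            this.name = name;
            this.count = count;
        }

        void increment() {
            count++;
        }

        @Override
        public boolean equals(Object o) {
            // State-based: compares every field.
            return o instanceof Counter c && name.equals(c.name) && count == c.count;
        }

        @Override
        public int hashCode() {
            // Final fields only, so the result is constant for a given instance.
            return name.hashCode();
        }
    }

    // Returns true if the instance is still found after being mutated inside the set.
    static boolean stillFoundAfterMutation() {
        Set<Counter> set = new HashSet<>();
        Counter hits = new Counter("hits", 0);
        set.add(hits);
        hits.increment();
        return set.contains(hits);
    }

    public static void main(String[] args) {
        System.out.println("still found: " + stillFoundAfterMutation());
    }
}
```

The equals/hashCode contract still holds, since equal instances share the same `name` and therefore the same hash. The cost is the one mentioned above: instances that differ only in mutable fields collide.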
>>
>> Kind regards, Anthony
>>
>>> It is not a bug to put a mutable object in a HashSet; it is a
>>> bug to do that _and_ to later mutate it. So detuning the
>>> semantics of carriers, from something simple and principled to
>>> something complicated that is just a guess about what the
>>> user really wants, just because someone might do two things that
>>> are each individually OK but together not OK, seems like an
>>> over-rotation.
>>>
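
For reference, a minimal sketch of the "two things that are each individually OK but together not OK": a class with state-based equals and hashCode is stored in a HashSet and then mutated. The `Counter` class and method names are hypothetical, and the lookup behaviour described is that of OpenJDK's `HashSet`.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class HashSetMutationDemo {
    // Hypothetical mutable class whose equals and hashCode both use all fields.
    static final class Counter {
        private final String name;
        private int count;

        Counter(String name, int count) {
            this.name = name;
            this.count = count;
        }

        void increment() {
            count++;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Counter c && name.equals(c.name) && count == c.count;
        }

        @Override
        public int hashCode() {
            return Objects.hash(name, count); // changes whenever count changes
        }
    }

    // Returns true if the element can no longer be found after it was mutated in place.
    static boolean lostAfterMutation() {
        Set<Counter> set = new HashSet<>();
        Counter hits = new Counter("hits", 0);
        set.add(hits);        // fine on its own
        hits.increment();     // fine on its own, but the stored hash is now stale
        return !set.contains(hits) && set.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println("lost after mutation: " + lostAfterMutation());
    }
}
```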