RFR: 8221836: Avoid recalculating String.hash when zero

Aleksey Shipilev shade at redhat.com
Mon Apr 8 11:40:43 UTC 2019


On 4/8/19 1:28 PM, Peter Levart wrote:
> On 4/8/19 12:28 PM, Aleksey Shipilev wrote:
>>> However, you also said in your opening criticism
>>>
>>>   "I had hard time convincing myself that code is concurrency-safe"
>>>
>>> I think that is a more telling complaint. Can you elaborate on why you
>>> found it hard to convince yourself of this? (I know what I think is the
>> Because the whole thing in the current code is a "benign data race" on the hash field. Pulling
>> another field into the race needs careful consideration of whether it breaks the benignity. It
>> apparently does not, but the cognitive complexity involved in reading that code makes the
>> minuscule benefit much more questionable.
> 
> The reasoning is very similar to the one-field case. With one field (hash) the thread sees either
> the default value (0) or a non-zero value calculated either by this thread sometime before or by a
> concurrent thread that has already stored it. Regardless of ordering, the thread either uses the
> non-zero value or (re)calculates it (again). The value calculation is deterministic and uses
> immutable published state (the array), so it always calculates the same value for the same object.
> Idempotence is guaranteed.
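
The single-field reasoning quoted above can be illustrated with a minimal sketch. This is not the JDK's String source; the class name CachedHashText and its layout are hypothetical, chosen only to show the idiom: read the racy field once into a local, recompute if it is still the default, and store a value that every thread would compute identically.

```java
// Hypothetical illustration of a "benign data race" hash cache.
// Not the JDK implementation; names are assumptions made for this sketch.
final class CachedHashText {
    private final char[] value; // immutable state, safely published via the final field
    private int hash;           // racy cache; 0 means "not computed yet"

    CachedHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;               // exactly one racy read, kept in a local
        if (h == 0) {
            for (char c : value) {
                h = 31 * h + c;     // deterministic function of immutable state
            }
            hash = h;               // racy write; all threads would store the same value
        }
        return h;                   // return the local, never re-read the field
    }
}
```

Note that a value whose hash really is zero (the empty string, for instance) is recomputed on every call in this sketch, which is the cost the change under review tries to avoid.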
> 
> The same reasoning can be extended to a general case where there are many fields used for caching of
> a calculated state from some immutable published state. The constraint is that the calculation must
> be deterministic and must also deterministically choose which of the many fields used for caching is
> to be modified. Only one field may be modified, never more than one. The thread therefore sees
> either the default values of all fields or the default values of all but one field which has been
> set by either this thread sometime before or by a concurrent thread. Regardless of ordering, the
> thread either uses the state combined from the default values of all fields but one and a
> non-default value of a single field or (re)calculates the non-default value of the single field. The
> value calculation is deterministic, uses immutable published state and deterministically chooses the
> field to modify, so it always calculates the same "next" state for the object. Idempotence is
> guaranteed.
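
The multi-field constraint quoted above ("only one field may be modified, never more than one") might look like the following sketch with two cache fields. The class name TwoFieldHashText and the flag name hashIsZero are assumptions made for illustration, not a quote of the patch under review.

```java
// Hypothetical two-field variant: a flag records that the computed hash is zero.
// For any given object, exactly one of the two cache fields is ever written.
final class TwoFieldHashText {
    private final char[] value;  // immutable state, safely published via the final field
    private int hash;            // written only when the computed hash is non-zero
    private boolean hashIsZero;  // written only when the computed hash is zero

    TwoFieldHashText(String s) {
        this.value = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;                    // one racy read of the first field
        if (h == 0 && !hashIsZero) {     // one racy read of the second field
            for (char c : value) {
                h = 31 * h + c;          // deterministic function of immutable state
            }
            if (h == 0) {
                hashIsZero = true;       // deterministic choice: the flag for a zero hash
            } else {
                hash = h;                // the hash field otherwise
            }
        }
        return h;
    }
}
```

Because the choice of field depends only on the computed value, a thread that observes hashIsZero == true can rely on hash still holding its default of 0, whatever the ordering of the two reads.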

Thank you, the mere existence of this wall of text solidifies my argument: the need to invoke an
argument like that is exactly the cognitive complexity I've been talking about, and it speaks to the
maintainability/risk cost, while the benefits are still around the machine epsilon.

Let's just draw the line on micro-optimizations, okay? This one is an interesting experiment in
itself, and it certainly passes the "hold my beer" curiosity threshold, but it does not pass the
"should we actually do this" bar for me.

-Aleksey



More information about the core-libs-dev mailing list