RFR: 8221836: Avoid recalculating String.hash when zero
Peter Levart
peter.levart at gmail.com
Mon Apr 8 12:44:31 UTC 2019
On 4/8/19 1:40 PM, Aleksey Shipilev wrote:
> On 4/8/19 1:28 PM, Peter Levart wrote:
>> The reasoning is very similar as with just one field. With one field (hash) the thread sees either
>> the default value (0) or a non-zero value calculated either by this thread sometime before or by a
>> concurrent thread that has already stored it. Regardless of ordering, the thread either uses the
>> non-zero value or (re)calculates it (again). The value calculation is deterministic and uses
>> immutable published state (the array), so it always calculates the same value for the same object.
>> Idempotence is guaranteed.
>>
>> The same reasoning can be extended to a general case where there are many fields used for caching of
>> a calculated state from some immutable published state. The constraint is that the calculation must
>> be deterministic and must also deterministically choose which of the many fields used for caching is
>> to be modified. Only one field may be modified, never more than one. The thread therefore sees
>> either the default values of all fields or the default values of all but one field which has been
>> set by either this thread sometime before or by a concurrent thread. Regardless of ordering, the
>> thread either uses the state combined from the default values of all fields but one and a
>> non-default value of a single field or (re)calculates the non-default value of the single field. The
>> value calculation is deterministic, uses immutable published state and deterministically chooses the
>> field to modify, so it always calculates the same "next" state for the object. Idempotence is
>> guaranteed.
> Thank you, the mere existence of this wall of text solidifies my argument: the need to invoke the
> argument like that is exactly the cognitive complexity I've been talking about, and it speaks about
> maintainability/risk cost, while benefits are still around the machine epsilon.
I tried to write the two descriptions side by side to show that the 2nd
is not more complex than the 1st. It's just using longer "nouns". The
sentences are otherwise equivalent and there's additional text that
describes the "nouns". I could have done a better job though...
So here's 2nd try:
The String hash code caching (as it is written today) is an example of a
benign data race that can be described as caching of lazily calculated
state from immutable published state, both modeled in the same object.
Data race is benign if:
- the published state which is used as input of the calculation is immutable
- the calculation is deterministic
- threads observe the cached calculated state of the object to be
updated just once atomically. Meaning that there are only two different
observable states of object: "initial" state where the calculated cached
data is not set and "updated" state where the the calculated cached data
is set.
Java fields up to 32 bits wide (+ reference fields regardless of width)
exhibit atomic updates.
So if the update of the object state (transition from "initial" to
"updated" state) is performed by a write of a deterministically
calculated value to a single deterministically chosen field of no more
than 32 bits (or a reference field), the whole object state is observed
to change atomically and the data race is benign.
Current and proposed caching differ only in the number of fields used
for caching the calculated state, but both adhere to the above rules.
So the reasoning stays the same as with current code. It only takes a
little to realize that it's all about a single field that is updated
while the presence of other fields (zero or more) don't change the
picture since they are constant for the whole lifetime of object.
If you're afraid that a future maintainer of that code would not realize
that, then a simple comment put into String.hashCode method and
java_lang_String::set_hash C++ metohd that would say something like the
following:
// only a single field may be modified so that the Object state is
updated atomically
...is surely going to help him/her keep the String free from bugs...
Regards, Peter
More information about the hotspot-gc-dev
mailing list