RFR: 8221836: Avoid recalculating String.hash when zero

Peter Levart peter.levart at gmail.com
Mon Apr 8 12:44:31 UTC 2019



On 4/8/19 1:40 PM, Aleksey Shipilev wrote:
> On 4/8/19 1:28 PM, Peter Levart wrote:
>> The reasoning is very similar as with just one field. With one field (hash) the thread sees either
>> the default value (0) or a non-zero value calculated either by this thread sometime before or by a
>> concurrent thread that has already stored it. Regardless of ordering, the thread either uses the
>> non-zero value or (re)calculates it (again). The value calculation is deterministic and uses
>> immutable published state (the array), so it always calculates the same value for the same object.
>> Idempotence is guaranteed.
>>
>> The same reasoning can be extended to a general case where there are many fields used for caching of
>> a calculated state from some immutable published state. The constraint is that the calculation must
>> be deterministic and must also deterministically choose which of the many fields used for caching is
>> to be modified. Only one field may be modified, never more than one. The thread therefore sees
>> either the default values of all fields or the default values of all but one field which has been
>> set by either this thread sometime before or by a concurrent thread. Regardless of ordering, the
>> thread either uses the state combined from the default values of all fields but one and a
>> non-default value of a single field or (re)calculates the non-default value of the single field. The
>> value calculation is deterministic, uses immutable published state and deterministically chooses the
>> field to modify, so it always calculates the same "next" state for the object. Idempotence is
>> guaranteed.
> Thank you, the mere existence of this wall of text solidifies my argument: the need to invoke the
> argument like that is exactly the cognitive complexity I've been talking about, and it speaks about
> maintainability/risk cost, while benefits are still around the machine epsilon.

I tried to write the two descriptions side by side to show that the 2nd 
is not more complex than the 1st. It's just using longer "nouns". The 
sentences are otherwise equivalent and there's additional text that 
describes the "nouns". I could have done a better job though...

So here's a 2nd try:

The String hash code caching (as it is written today) is an example of a 
benign data race that can be described as caching of lazily calculated 
state from immutable published state, both modeled in the same object. 
The data race is benign if:

- the published state which is used as input of the calculation is immutable
- the calculation is deterministic
- threads observe the cached calculated state of the object to be
updated just once, atomically, meaning that there are only two different
observable states of the object: the "initial" state where the calculated
cached data is not set and the "updated" state where the calculated
cached data is set.

Java fields up to 32 bits wide (+ reference fields regardless of width) 
exhibit atomic updates.

So if the update of the object state (transition from "initial" to 
"updated" state) is performed by a write of a deterministically 
calculated value to a single deterministically chosen field of no more 
than 32 bits (or a reference field), the whole object state is observed 
to change atomically and the data race is benign.
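To make the single-field case concrete, here is a minimal sketch of the pattern as I understand it. The class and its names are hypothetical and only for illustration; this is not the actual String source. The cache field is a plain int (no volatile, no locking), it is read once into a local, and it is written at most once per call with a deterministically calculated value:

```java
// Hypothetical illustration class; names are assumptions, not JDK code.
final class CachedHashText {
    // Immutable published input state (never modified after construction).
    private final char[] chars;
    // Racy cache: default 0 means "not calculated yet". Plain 32-bit field,
    // so every write to it is atomic.
    private int hash;

    CachedHashText(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;              // single racy read into a local
        if (h == 0) {
            // Deterministic calculation over immutable state: every thread
            // that gets here calculates the same value.
            for (char c : chars) {
                h = 31 * h + c;
            }
            if (h != 0) {
                hash = h;          // the only write: one 32-bit field
            }
        }
        return h;
    }
}
```

Whichever thread wins the race, every reader sees either 0 (and recalculates) or the one possible non-zero value. Note that a string whose hash really is 0 is recalculated on every call in this sketch.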

Current and proposed caching differ only in the number of fields used 
for caching the calculated state, but both adhere to the above rules.

So the reasoning stays the same as with the current code. It only takes a
little to realize that it's all about a single field that is updated,
while the presence of other fields (zero or more) doesn't change the
picture since they are constant for the whole lifetime of the object.
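As a rough sketch of the variant with more than one caching field, the following uses a second boolean field to remember that the calculated hash is zero. The class and the field name hashIsZero are my assumptions for illustration only. The point is that the calculated value deterministically selects which one of the two fields gets written, so a given object only ever has one of them modified:

```java
// Hypothetical illustration class; names are assumptions, not JDK code.
final class TwoFieldCachedHashText {
    // Immutable published input state.
    private final char[] chars;
    // Cache field 1: non-zero once a non-zero hash has been calculated.
    private int hash;
    // Cache field 2 (assumed name): true once the hash was calculated
    // and turned out to be zero.
    private boolean hashIsZero;

    TwoFieldCachedHashText(String s) {
        this.chars = s.toCharArray();
    }

    @Override
    public int hashCode() {
        int h = hash;                  // racy read of field 1
        if (h == 0 && !hashIsZero) {   // racy read of field 2
            for (char c : chars) {
                h = 31 * h + c;        // deterministic calculation
            }
            // Exactly one field is written, and the choice depends only on
            // the calculated value, so it is the same in every thread.
            if (h == 0) {
                hashIsZero = true;
            } else {
                hash = h;
            }
        }
        return h;
    }
}
```

For any given object the other field keeps its default value forever, so the observable states are still just "initial" and "updated".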

If you're afraid that a future maintainer of that code would not realize
that, then a simple comment put into the String.hashCode method and the
java_lang_String::set_hash C++ method that would say something like the
following:

// only a single field may be modified so that the Object state is
// updated atomically

...is surely going to help him/her keep the String free from bugs...

Regards, Peter
