RFR: 8354300: Fields in String are not trusted
ExE Boss
duke at openjdk.org
Sat Apr 19 06:30:51 UTC 2025
On Mon, 14 Apr 2025 18:09:35 GMT, Chen Liang <liach at openjdk.org> wrote:
>> Hello Per, I'm not too familiar with runtime compiler optimizations. So consider this as a basic question.
>>
>>> This means the VM can trust these fields to never change which enables constant folding optimizations.
>>
>> If I'm not wrong, then it is the `hash` field value that we want to be considered as a constant (once computed) so that calls to `String.hashCode()` would get replaced with the constant computed value.
>>
>> Looking at the current implementation of `String.hashCode()`:
>>
>>
>>     public int hashCode() {
>>         // The hash or hashIsZero fields are subject to a benign data race,
>>         // making it crucial to ensure that any observable result of the
>>         // calculation in this method stays correct under any possible read of
>>         // these fields. Necessary restrictions to allow this to be correct
>>         // without explicit memory fences or similar concurrency primitives is
>>         // that we can ever only write to one of these two fields for a given
>>         // String instance, and that the computation is idempotent and derived
>>         // from immutable state
>>         int h = hash;
>>         if (h == 0 && !hashIsZero) {
>>             h = isLatin1() ? StringLatin1.hashCode(value)
>>                            : StringUTF16.hashCode(value);
>>             if (h == 0) {
>>                 hashIsZero = true;
>>             } else {
>>                 hash = h;
>>             }
>>         }
>>         return h;
>>     }
>>
>>
>> If I'm reading that correctly, and keeping aside concurrent calls from this discussion, then only one of `hash` or the `hashIsZero` fields will have its value changed to a non-default value. i.e. if `hashCode()` implementation computes a non-zero value then the `hash` field will be assigned a (non-default) value and if that method computes a hash of 0, then `hashIsZero` will get assigned a (non-default) value. It then means that the other field will never move out of its initial value and thus will never be considered "stable".
>>
>> Am I right? If yes, then would the runtime (hotspot) compiler still replace the call to `String.hashCode()` with a constant value?
>
> Also re @jaikiran: yes, you are right that the current code cannot constant-fold the scenario where the hash is 0; so `"".hashCode()` is not constant as a result. The solution I shared above can address this scenario, but it cannot completely bring performance to parity with other constant-folded cases in Remi's shared benchmark (see https://github.com/liachmodded/jdk/commit/247e8bd92e6dbad6df2dd50ad83caa49983a81b4)
@liach
> ```java
> var isZero = hashIsZero;
> if (isZero == 1) return 0;
> if (isZero == 2) {
>     int h = hash;
>     if (h != 0) return h;
> }
> return computeHash(); // and set hash, hashIsZero fields
> ```
Instead of `hashIsZero`, the field should be called `hashState` in that case.
And maybe use `-1` for the `hashCode() == 0` state and `+1` for the `hashCode() != 0` state.
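
To make the suggestion concrete, here is a rough sketch of what I mean, written as a standalone class rather than as a patch to `String`. Everything in it is hypothetical: the class name `HashStateSketch`, the constants, the `state()` accessor, and the `char[]` storage are stand-ins for illustration. The JDK-internal `@Stable` annotation is deliberately left out, so this only shows the state encoding, not the constant folding itself.

```java
// Hypothetical stand-in for java.lang.String; NOT the JDK implementation.
final class HashStateSketch {
    static final byte ZERO = -1;    // hash has been computed and is 0
    static final byte UNKNOWN = 0;  // default value: hash not computed yet
    static final byte NONZERO = 1;  // hash has been computed and is stored in `hash`

    private final char[] value;
    private int hash;        // stays at its default 0 unless the hash is non-zero
    private byte hashState;  // starts as UNKNOWN, later ZERO or NONZERO

    HashStateSketch(String s) {
        this.value = s.toCharArray();
    }

    int hash() {
        byte state = hashState;
        if (state == ZERO) {
            return 0;
        }
        if (state == NONZERO) {
            int h = hash;
            // Under a data race this read may still observe 0;
            // in that case fall through and recompute.
            if (h != 0) {
                return h;
            }
        }
        return computeHash();
    }

    // Same polynomial as String.hashCode(); idempotent and derived from immutable state.
    private int computeHash() {
        int h = 0;
        for (char c : value) {
            h = 31 * h + c;
        }
        if (h == 0) {
            hashState = ZERO;
        } else {
            hash = h;
            hashState = NONZERO;
        }
        return h;
    }

    // Test-only accessor so the state transitions can be observed.
    byte state() {
        return hashState;
    }
}
```

With this encoding the default value `0` always means "not computed yet", and `hashState` leaves its default value in both outcomes, which is the case the current `hash`/`hashIsZero` pair cannot express for a zero hash. Whether the compiler would then fold both cases as well as the benchmark expects is not something this sketch demonstrates.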
-------------
PR Comment: https://git.openjdk.org/jdk/pull/24625#issuecomment-2816567930