RFR: 8297235: ZGC: assert(regs[i] != regs[j]) failed: Multiple uses of register: rax [v3]

Roberto Castañeda Lozano rcastanedalo at openjdk.org
Tue Dec 13 09:03:43 UTC 2022


On Thu, 8 Dec 2022 09:01:08 GMT, Axel Boldt-Christmas <aboldtch at openjdk.org> wrote:

>> Tests java/util/stream/test/org/openjdk/tests/java/util/* with -XX:+UseZGC -Xcomp -XX:-TieredCompilation crashes with `assert(regs[i] != regs[j]) failed: Multiple uses of register: rax`. More specifically compilation of java.util.concurrent.ForkJoinTask::awaitDone.
>> 
>> The reason seems to be that the compare value and the memory input ends up sharing a register. (Uses Unsafe CAS which CAS an object reference into a field of that object, `oldval: rax` and `mem: [rax+offset]`). The Z load barrier stub dispatch implementation require that the reference and reference address occupy distinct registers. In the loadP nodes this is established by marking all but the memory TEMP which results in no sharing.
>> 
>> This is not possible for the CompareAndSwapP / CompareAndExchangeP nodes as the compare value is an input node. 
>> 
>> The solution proposed here is less than ideal as it makes the CAS nodes require one extra TEMP register, which in the common case is unused. This puts unnecessary extra strain on the register allocation. The problem is that there is no way currently (that I can find) to express in .ad that a memory input must not share registers with a specific other input. 
>> 
>> There is an alternative solution for this specific crash which does not use a second TEMP register (see commit: cfd5ced4e97e986fc10c5a8721b543cd3101c58a). It accomplish this by using the same trick that the aarch64 Z CAS node uses which is to specify the memory as indirect which results in the address being LEA into a register. However from what I can see this does not guarantee that the address and the reference does not share a register (`oldval: rax` and `mem: [rax]`). So it is theoretically broken, (and so is the aarch64 implementation). 
>> 
>> It is unclear to me if there is ever a way for C2 to generation a CAS which compares the address of the field with its content. 
>> 
>> I call on anyone with more knowledge about `adlc` and `C2` for feedback. And specifically I want to open up a discussion with these points:
>> * Is there some other way of expressing in the .ad file that a memory input should not share some register?
>>   * If not, is this a worthwhile RFE? As it seems to be a patterned used at least in other places in Z.
>> * Will the indirect input ever share a register with oldval and/or are the aarch64/riscv implementations broken because of this? How about ppc?
>> 
>> Testing: linux-x64 zgc tagged tests tier 1-7 and some specific crashing tests with `-XX:+UseZGC -Xcomp -XX:-TieredCompilation` (in: java/util/stream/, java/util/concurrent/)
>
> Axel Boldt-Christmas has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains five additional commits since the last revision:
> 
>  - Remove problem listed tests
>  - Merge remote-tracking branch 'upstream_jdk/master' into JDK-8297235
>  - indirect zXChgP as well
>  - indirect alternative
>  - JDK-8297235: ZGC: assert(regs[i] != regs[j]) failed: Multiple uses of register: rax

I agree with @fisk's analysis above: the proposed solution should be safe as long as the first CompareAndSwap operand (`mem`) has a non-zero offset (which should at least be the case in the failing Unsafe-based patterns): the register allocator will then treat `mem` and `oldval` as two distinct, interfering values and assign them different registers.

I attached a [minimal reproducer](https://bugs.openjdk.org/secure/attachment/102007/Reproducer.java) to the JBS issue, feel free to include it in this PR as a test case if you think it adds value.

   I do not think there is a general way to express the constraint you want in .ad files, but I am not an expert in this area, maybe someone at Intel could comment on this (@sviswa7, @jatin-bhateja?). I also do not have a feeling for what would be the benefit vs. cost of implementing such construct. An alternative approach could be to enforce the constraint at the C2 IR level, by adding some kind of pseudo-node redefining the input to the first `CompareAndSwap` operand so that it always interferes with `oldval`.

   Regarding the impact on other architectures, it seems they all follow the solution proposed here, so they should be as safe as in this case, that is, as long as C2 does not generate a CAS comparing the address of the field with its content. I cannot think how C2 could generate such pattern - which of course is not a guarantee that it will never do it ;).

-------------

Marked as reviewed by rcastanedalo (Reviewer).

PR: https://git.openjdk.org/jdk/pull/11410


More information about the hotspot-dev mailing list