[jdk16] RFR: 8258384: AArch64: SVE verify_ptrue fails on some tests [v2]

Andrew Dinn adinn at openjdk.java.net
Tue Jan 5 10:43:58 UTC 2021


On Tue, 5 Jan 2021 02:57:15 GMT, Ningsheng Jian <njian at openjdk.org> wrote:

>> After applying [1], some Vector API tests fail with SIGILL on an SVE
>> system. The SIGILL was triggered by verify_ptrue before the C2-compiled
>> function returns, which means that the preserved p7 register (as ptrue)
>> was clobbered before returning to C2-compiled code. (p7 is not
>> preserved across function calls or system calls [2].)
>> 
>> Currently we try to reinitialize ptrue at each entry point returning
>> from non-C2-compiled code, which indicates possible C or system calls.
>> However, one entry point is still missing: exception handling, as
>> we may jump to C2-compiled code for the exception handler. See
>> OptoRuntime::generate_exception_blob().
>> 
>> Adding reinitialize_ptrue before jumping back to C2-compiled code in
>> generate_exception_blob() could solve those Vector API test failures.
>> Actually I had that in my initial test patch [3]; I don't know why I
>> missed it in the final patch... I reran tests with the same approach as
>> [3] and found that something is still missing: the
>> nmethod_entry_barrier() in the C2 function prolog. The barrier may call
>> into runtime code (see generate_method_entry_barrier()). To reduce the
>> risk of missing such a reinitialize_ptrue in newly added code in the
>> future, I think it would be better to do the reinitialization in
>> pop_call_clobbered_registers().
>> 
>> P.S. The SIGILL message is also unclear; it should print the detailed
>> message passed to the MacroAssembler::stop() call. This is caused by
>> JDK-8255711 removing the message printing code. This patch also adds it
>> back, so that a detailed message is printed on abort.
>> 
>> Tested with tier1-3 on SVE hardware. Also verified using the same
>> approach as patch [3]: the jtreg tests hotspot_all_no_apps and
>> jdk:tier1-3 passed without any incorrect-ptrue-value assertion failure.
>> 
>> [1] https://github.com/openjdk/jdk/pull/1621
>> [2] https://github.com/torvalds/linux/blob/master/Documentation/arm64/sve.rst
>> [3] http://cr.openjdk.java.net/~njian/8231441/0001-RFC-Block-one-caller-save-register-for-C2.patch
>
> Ningsheng Jian has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update copyright year to 2021.
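
To make the argument in the quoted description concrete, here is a minimal toy model of why restoring ptrue in one shared helper is less error-prone than restoring it at every return path. This is only a sketch, not HotSpot code: `ToyCpu`, `runtime_call`, `call_with_manual_restore` and `call_with_central_restore` are hypothetical names invented for illustration, and the functions named `reinitialize_ptrue`, `verify_ptrue` and `pop_call_clobbered_registers` merely borrow the names used in the description; their bodies are assumptions and do not reflect the real MacroAssembler implementation.

```cpp
#include <bitset>
#include <cassert>

// Toy model only: none of this is HotSpot code.
struct ToyCpu {
    std::bitset<16> p7;  // stands in for the SVE predicate register p7
};

// Set every lane of p7, i.e. make it an all-true predicate again.
inline void reinitialize_ptrue(ToyCpu& cpu) { cpu.p7.set(); }

// True when p7 still holds the all-true value compiled code relies on.
inline bool verify_ptrue(const ToyCpu& cpu) { return cpu.p7.all(); }

// Hypothetical runtime or system call: the callee may clobber p7.
inline void runtime_call(ToyCpu& cpu) { cpu.p7.reset(); }

// Approach A: each return path has to remember to restore p7 itself.
inline void call_with_manual_restore(ToyCpu& cpu, bool remembered) {
    runtime_call(cpu);
    if (remembered) {
        reinitialize_ptrue(cpu);
    }
}

// Approach B: the shared "pop" helper always restores p7, so any code
// that goes through it cannot forget the reinitialization.
inline void pop_call_clobbered_registers(ToyCpu& cpu) {
    reinitialize_ptrue(cpu);
}

inline void call_with_central_restore(ToyCpu& cpu) {
    runtime_call(cpu);
    pop_call_clobbered_registers(cpu);
}

// Small self-check tying the pieces together.
inline void run_demo() {
    ToyCpu cpu;
    reinitialize_ptrue(cpu);
    call_with_manual_restore(cpu, /*remembered=*/false);
    assert(!verify_ptrue(cpu));  // a forgotten restore leaves p7 clobbered
    call_with_central_restore(cpu);
    assert(verify_ptrue(cpu));   // the central restore always recovers p7
}
```

In this model, a path that forgets the manual restore fails verification, while every path that goes through the shared helper passes. It says nothing about the separate question of how the failure should be reported.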

src/hotspot/os_cpu/linux_aarch64/os_linux_aarch64.cpp line 252:

> 250:                                 pc, info, NULL, NULL, 0, 0);
> 251:         va_end(detail_args);
> 252:       }

I'm not sure it is OK to revert this code. The change being reverted was part of the fix for JDK-8255711, which was provided explicitly to reorganize the flow of control for handling fatal errors. Reverting it appears to undermine the goal of that issue. Before accepting this specific change, I would like to get Thomas Stuefe's (@tstuefe) opinion on whether it is appropriate to abort the JVM here rather than return false.

-------------

PR: https://git.openjdk.java.net/jdk16/pull/50

