From yunyao.zxl at alibaba-inc.com  Mon Nov  7 04:17:12 2022
From: yunyao.zxl at alibaba-inc.com (Xiaolin Zheng)
Date: Mon, 07 Nov 2022 12:17:12 +0800
Subject: RVC by default (cont'd)
Message-ID:

Hi team,

As RVC's proposed patches have been merged into the mainline, in response to the former thread[1] I would like to turn it on by default before the December RDP 1 deadline, for currently the hardware feature C extension has been ratified and implemented by mainstream RISC-V hardware like boards produced by Hifive, meaning we can test and verify our implementation on physical boards.

Opening another thread to refresh the content.

We can turn RVC on for now by using `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC`. In addition we can examine the generated code by using options `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:PrintAssemblyOptions=no-aliases,numeric,show-bytes` combined.

I have pushed a simple proposed patch[2] to turn it as default true.

The only thing I shall mention here is as we know there is a known issue that may relate to the opensbi lib. Please see previous discussions[3][4]. The pattern of that issue is very easy to be distinguished, which is an uncommon case and which turns out to be bugs hidden in underlying libraries at last. I think it should be users' responsibility to update their outdated libs, and such issue shall not stop our pace.

I have opened a JBS issue[5] to record this, marking it as "Won't fix".

If there's any suggestion or objection, please let me know. If not, I will file a patch around Nov 15 (maybe next week since the deadline is looming) if everything looks okay.

Best Regards,
Xiaolin

[1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000609.html
[2] https://github.com/zhengxiaolinX/jdk/commit/b5b9c64529c27c40542f8cda720652fabf70682d
[3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000618.html
[4] https://github.com/riscv-collab/riscv-openjdk/issues/23
[5] https://bugs.openjdk.org/browse/JDK-8296350

From vladimir.kempik at gmail.com  Mon Nov  7 09:05:31 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 7 Nov 2022 12:05:31 +0300
Subject: RVC by default (cont'd)
In-Reply-To:
References:
Message-ID: <08951A31-0265-4780-8F61-44DF9B5A4BE7@gmail.com>

Hello

Recently commit [1] introduced support for cpu profiles, for example RVA20U64, RVA22U64.
And UseRVC is already a part of UseRVA20U64.

Maybe it would be good to go another way. For example make RVA20U64 non-experimental and enable it by default, that will enable RVC automatically.
Also make UseRVC non-experimental.

Regards, Vladimir

[1] https://github.com/openjdk/jdk/commit/e0c29307f7b35149aacae0bb935aa9fe524cff72

> On 7 Nov 2022, at 07:17, Xiaolin Zheng wrote:
>
> Hi team,
>
> As RVC's proposed patches have been merged into the mainline, in response to the former thread[1] I would like to turn it on by default before the December RDP 1 deadline, for currently the hardware feature C extension has been ratified and implemented by mainstream RISC-V hardware like boards produced by Hifive, meaning we can test and verify our implementation on physical boards.
>
> Opening another thread to refresh the content.
>
> We can turn RVC on for now by using `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC`. In addition we can examine the generated code by using options `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:PrintAssemblyOptions=no-aliases,numeric,show-bytes` combined.
>
> I have pushed a simple proposed patch[2] to turn it as default true.
>
> The only thing I shall mention here is as we know there is a known issue that may relate to the opensbi lib. Please see previous discussions[3][4]. The pattern of that issue is very easy to be distinguished, which is an uncommon case and which turns out to be bugs hidden in underlying libraries at last. I think it should be users' responsibility to update their outdated libs, and such issue shall not stop our pace.
>
> I have opened a JBS issue[5] to record this, marking it as "Won't fix".
>
> If there's any suggestion or objection, please let me know. If not, I will file a patch around Nov 15 (maybe next week since the deadline is looming) if everything looks okay.
>
> Best Regards,
> Xiaolin
>
> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000609.html
> [2] https://github.com/zhengxiaolinX/jdk/commit/b5b9c64529c27c40542f8cda720652fabf70682d
> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000618.html
> [4] https://github.com/riscv-collab/riscv-openjdk/issues/23
> [5] https://bugs.openjdk.org/browse/JDK-8296350

From yunyao.zxl at alibaba-inc.com  Mon Nov  7 09:43:28 2022
From: yunyao.zxl at alibaba-inc.com (Xiaolin Zheng)
Date: Mon, 07 Nov 2022 17:43:28 +0800
Subject: Re: RVC by default (cont'd)
In-Reply-To: <08951A31-0265-4780-8F61-44DF9B5A4BE7@gmail.com>
References: , <08951A31-0265-4780-8F61-44DF9B5A4BE7@gmail.com>
Message-ID: <33c4d5c5-590e-44b0-96da-8c585bd83901.yunyao.zxl@alibaba-inc.com>

Hi Vladimir,

Thank you for the suggestion and that sounds nice as well to me - more unified. Pushed a new commit to fit your pre-review comment.

Best,
Xiaolin

[1] https://github.com/zhengxiaolinX/jdk/commit/312462e83ea3dcbd884e121ca16b2209b7a6c5c4

------------------------------------------------------------------
From:Vladimir Kempik
Send Time:2022-11-07 (Monday) 17:05
To:???(??)
Cc:riscv-port-dev
Subject:Re: RVC by default (cont'd)

Hello

Recently commit [1] introduced support for cpu profiles, for example RVA20U64, RVA22U64.
And UseRVC is already a part of UseRVA20U64.

Maybe it would be good to go another way. For example make RVA20U64 non-experimental and enable it by default, that will enable RVC automatically.
Also make UseRVC non-experimental.

Regards, Vladimir

[1] https://github.com/openjdk/jdk/commit/e0c29307f7b35149aacae0bb935aa9fe524cff72

> On 7 Nov 2022, at 07:17, Xiaolin Zheng wrote:
>
> Hi team,
>
> As RVC's proposed patches have been merged into the mainline, in response to the former thread[1] I would like to turn it on by default before the December RDP 1 deadline, for currently the hardware feature C extension has been ratified and implemented by mainstream RISC-V hardware like boards produced by Hifive, meaning we can test and verify our implementation on physical boards.
>
> Opening another thread to refresh the content.
>
> We can turn RVC on for now by using `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC`. In addition we can examine the generated code by using options `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:PrintAssemblyOptions=no-aliases,numeric,show-bytes` combined.
>
> I have pushed a simple proposed patch[2] to turn it as default true.
>
> The only thing I shall mention here is as we know there is a known issue that may relate to the opensbi lib. Please see previous discussions[3][4]. The pattern of that issue is very easy to be distinguished, which is an uncommon case and which turns out to be bugs hidden in underlying libraries at last. I think it should be users' responsibility to update their outdated libs, and such issue shall not stop our pace.
>
> I have opened a JBS issue[5] to record this, marking it as "Won't fix".
>
> If there's any suggestion or objection, please let me know.
> If not, I will file a patch around Nov 15 (maybe next week since the deadline is looming) if everything looks okay.
>
> Best Regards,
> Xiaolin
>
> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000609.html
> [2] https://github.com/zhengxiaolinX/jdk/commit/b5b9c64529c27c40542f8cda720652fabf70682d
> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000618.html
> [4] https://github.com/riscv-collab/riscv-openjdk/issues/23
> [5] https://bugs.openjdk.org/browse/JDK-8296350

From yangfei at iscas.ac.cn  Mon Nov  7 13:02:01 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Mon, 7 Nov 2022 21:02:01 +0800 (GMT+08:00)
Subject: RVC by default (cont'd)
In-Reply-To: <33c4d5c5-590e-44b0-96da-8c585bd83901.yunyao.zxl@alibaba-inc.com>
References: , <08951A31-0265-4780-8F61-44DF9B5A4BE7@gmail.com> <33c4d5c5-590e-44b0-96da-8c585bd83901.yunyao.zxl@alibaba-inc.com>
Message-ID: <7cb5fec4.c3d8.184522efa0f.Coremail.yangfei@iscas.ac.cn>

Hi,

Making the RVA20U64 profile the default makes sense to me, provided that it works for the mainstream RV hardware currently accessible.
Please note that the RVA20U64 profile also means availability of the Zicclsm extension, which indicates support for unaligned loads/stores [1].

"Zicclsm Misaligned loads and stores to main memory regions with both the cacheability and coherence PMAs must be supported."

Thanks,
Fei

[1] https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20-profiles

-----Original Messages-----
From:"Xiaolin Zheng"
Sent Time:2022-11-07 17:43:28 (Monday)
To: "Vladimir Kempik"
Cc: riscv-port-dev
Subject: Re: RVC by default (cont'd)

Hi Vladimir,

Thank you for the suggestion and that sounds nice as well to me - more unified. Pushed a new commit to fit your pre-review comment.

Best,
Xiaolin

[1] https://github.com/zhengxiaolinX/jdk/commit/312462e83ea3dcbd884e121ca16b2209b7a6c5c4

------------------------------------------------------------------
From:Vladimir Kempik
Send Time:2022-11-07 (Monday) 17:05
To:???(??)
Cc:riscv-port-dev
Subject:Re: RVC by default (cont'd)

Hello

Recently commit [1] introduced support for cpu profiles, for example RVA20U64, RVA22U64.
And UseRVC is already a part of UseRVA20U64.

Maybe it would be good to go another way. For example make RVA20U64 non-experimental and enable it by default, that will enable RVC automatically.
Also make UseRVC non-experimental.

Regards, Vladimir

[1] https://github.com/openjdk/jdk/commit/e0c29307f7b35149aacae0bb935aa9fe524cff72

> On 7 Nov 2022, at 07:17, Xiaolin Zheng wrote:
>
> Hi team,
>
> As RVC's proposed patches have been merged into the mainline, in response to the former thread[1] I would like to turn it on by default before the December RDP 1 deadline, for currently the hardware feature C extension has been ratified and implemented by mainstream RISC-V hardware like boards produced by Hifive, meaning we can test and verify our implementation on physical boards.
>
> Opening another thread to refresh the content.
>
> We can turn RVC on for now by using `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC`. In addition we can examine the generated code by using options `-XX:+UnlockExperimentalVMOptions -XX:+UseRVC -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:PrintAssemblyOptions=no-aliases,numeric,show-bytes` combined.
>
> I have pushed a simple proposed patch[2] to turn it as default true.
>
> The only thing I shall mention here is as we know there is a known issue that may relate to the opensbi lib. Please see previous discussions[3][4]. The pattern of that issue is very easy to be distinguished, which is an uncommon case and which turns out to be bugs hidden in underlying libraries at last.
> I think it should be users' responsibility to update their outdated libs, and such issue shall not stop our pace.
>
> I have opened a JBS issue[5] to record this, marking it as "Won't fix".
>
> If there's any suggestion or objection, please let me know. If not, I will file a patch around Nov 15 (maybe next week since the deadline is looming) if everything looks okay.
>
> Best Regards,
> Xiaolin
>
> [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000609.html
> [2] https://github.com/zhengxiaolinX/jdk/commit/b5b9c64529c27c40542f8cda720652fabf70682d
> [3] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000618.html
> [4] https://github.com/riscv-collab/riscv-openjdk/issues/23
> [5] https://bugs.openjdk.org/browse/JDK-8296350

From vladimir.kempik at gmail.com  Tue Nov  8 12:58:12 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 8 Nov 2022 15:58:12 +0300
Subject: Speeding up copy_memory stub
Message-ID:

Hello.
Currently (if RVV is not used), we are not doing copy_memory very well.
At best we copy just 8 bytes per loop iteration (copy8 label, one ld, one sd).

I propose we use a faster version when possible:
4 ld in a row, then 4 sd, copying 32 bytes per loop iteration, similar to [1].

I have made a prototype [2], check the copy32 label there. It also has some comments on other parts of the copy_memory stub.
Here are the results of JMH testing on an rvb-ice thead c910 board:

Before ( copy8 only )
Benchmark                       (size)   Mode  Cnt     Score     Error   Units
ArrayCopyObject.conjoint_micro      31  thrpt   25  6653.095 ± 251.565  ops/ms
ArrayCopyObject.conjoint_micro      63  thrpt   25  4933.970 ±  77.559  ops/ms
ArrayCopyObject.conjoint_micro     127  thrpt   25  3627.454 ±  34.589  ops/ms
ArrayCopyObject.conjoint_micro    2047  thrpt   25   368.249 ±   0.453  ops/ms
ArrayCopyObject.conjoint_micro    4095  thrpt   25   187.776 ±   0.306  ops/ms
ArrayCopyObject.conjoint_micro    8191  thrpt   25    94.477 ±   0.340  ops/ms

after ( with copy32 )
ArrayCopyObject.conjoint_micro      31  thrpt   25  7620.546 ±  69.756  ops/ms
ArrayCopyObject.conjoint_micro      63  thrpt   25  6677.978 ±  33.112  ops/ms
ArrayCopyObject.conjoint_micro     127  thrpt   25  5206.973 ±  22.612  ops/ms
ArrayCopyObject.conjoint_micro    2047  thrpt   25   653.655 ±  31.494  ops/ms
ArrayCopyObject.conjoint_micro    4095  thrpt   25   352.905 ±   7.390  ops/ms
ArrayCopyObject.conjoint_micro    8191  thrpt   25   165.127 ±   0.832  ops/ms

However I still have some issues with the code: when the copy mode is (!is_aligned and !is_backward), I'm getting ClassNotFound exceptions from the classLoader while trying to run JMH tests.
I think it's related to my patch. I have made a simple workaround for this case [3] to be able to make some measurements.

Any help on catching these bugs is highly appreciated.

Best Regards, Vladimir.

[1] https://github.com/eblot/newlib/blob/master/newlib/libc/string/memcpy.c
[2] https://github.com/VladimirKempik/jdk/commit/e113d454dc2808889906eceaa1fb9cd560140fbc
[3] https://github.com/VladimirKempik/jdk/commit/e113d454dc2808889906eceaa1fb9cd560140fbc#r89241535

From vladimir.kempik at gmail.com  Tue Nov  8 16:10:19 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 8 Nov 2022 19:10:19 +0300
Subject: Speeding up copy_memory stub
In-Reply-To:
References:
Message-ID:

Hello

I have found the issue, it was about what the code does after the copy32 loop ends:

    __ beqz(cnt, done);          // if that's all - done
    __ addi(tmp4, cnt, -8);      // if not - copy the remainder
    __ bltz(tmp4, copy_small);   // cnt < 8, go to copy_small, else fall through to copy8

When beqz(cnt, done) was after bltz(tmp4, copy_small), it was never reached for cnt == 0 (tmp4 is negative then, so bltz branches first) and the stub copied more than required.
Moving beqz before bltz fixed the issue.

I have updated the commit for anyone to try this [1].
I'll do some tests, if no issues found - will submit PR.
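
To make the ordering problem described above concrete, here is a minimal Java model of the three-instruction tail dispatch. It is only a sketch under stated assumptions: the class and method names (`CopyTailDispatch`, `fixedOrder`, `buggyOrder`) are hypothetical, and the model only answers which label is reached for a given remaining byte count. It does not model what copy_small or copy8 do afterwards.

```java
// Hypothetical model of the branch order after the copy32 loop.
// Names are illustrative and do not appear in the patch.
final class CopyTailDispatch {
    enum Target { DONE, COPY_SMALL, COPY8 }

    // Order after the fix: beqz is tested before bltz.
    static Target fixedOrder(long cnt) {
        if (cnt == 0) {
            return Target.DONE;        // beqz(cnt, done)
        }
        long tmp4 = cnt - 8;           // addi(tmp4, cnt, -8)
        if (tmp4 < 0) {
            return Target.COPY_SMALL;  // bltz(tmp4, copy_small)
        }
        return Target.COPY8;           // fall through to copy8
    }

    // Order before the fix: bltz is tested first.
    static Target buggyOrder(long cnt) {
        long tmp4 = cnt - 8;           // addi(tmp4, cnt, -8)
        if (tmp4 < 0) {
            return Target.COPY_SMALL;  // also taken for cnt == 0, since tmp4 == -8
        }
        if (cnt == 0) {
            return Target.DONE;        // never taken: cnt == 0 already branched above
        }
        return Target.COPY8;
    }

    public static void main(String[] args) {
        for (long cnt : new long[] {0, 3, 8, 24}) {
            System.out.println("cnt=" + cnt
                    + " fixed=" + fixedOrder(cnt)
                    + " buggy=" + buggyOrder(cnt));
        }
    }
}
```

The two orders differ only for a remaining count of 0, which is the case where the old order sent an already finished copy into copy_small.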

I also want to get rid of the data dependency here:

    __ addi(cnt, cnt, -wordSize*4);
    __ addi(tmp4, cnt, -32);
    __ bgez(tmp4, copy32);       // cnt >= 32, do next loop

by making it this way:

    __ addi(tmp4, cnt, -(32+wordSize*4));
    __ addi(cnt, cnt, -wordSize*4);
    __ bgez(tmp4, copy32);       // cnt >= 32, do next loop

This way the second addi no longer needs the result of the first one, so the two addi instructions can be scheduled concurrently.
I have tested this change independently of the rest of this patch and found no perf difference on three different uarches (1 in-order, 2 OoO).

Any comments are welcome.

Regards, Vladimir

[1] https://github.com/VladimirKempik/jdk/commit/06d21c7f583b19009b5ac1f63462475d264257a4

> Hello.
> Currently ( if RVV is not used), we doing copy_memory not so great.
> At best doing just 8 bytes per loop ( copy8 label, one ld, one sd)
>
> I propose we use faster version when possible:
> using 4 ld in a row then 4 sd. Copying 32 bytes per loop, similar to [1]
>
> I have made a prototype [2], check the copy32 label there. It also have some comments on other parts of copy_memory stub
> Here are results of jmh testing on rvb-ice thead c910 board:
>
> Before ( copy8 only )
> Benchmark                       (size)   Mode  Cnt     Score     Error   Units
> ArrayCopyObject.conjoint_micro      31  thrpt   25  6653.095 ± 251.565  ops/ms
> ArrayCopyObject.conjoint_micro      63  thrpt   25  4933.970 ±  77.559  ops/ms
> ArrayCopyObject.conjoint_micro     127  thrpt   25  3627.454 ±  34.589  ops/ms
> ArrayCopyObject.conjoint_micro    2047  thrpt   25   368.249 ±   0.453  ops/ms
> ArrayCopyObject.conjoint_micro    4095  thrpt   25   187.776 ±   0.306  ops/ms
> ArrayCopyObject.conjoint_micro    8191  thrpt   25    94.477 ±   0.340  ops/ms
>
> after ( with copy32 )
>
> ArrayCopyObject.conjoint_micro      31  thrpt   25  7620.546 ±  69.756  ops/ms
> ArrayCopyObject.conjoint_micro      63  thrpt   25  6677.978 ±  33.112  ops/ms
> ArrayCopyObject.conjoint_micro     127  thrpt   25  5206.973 ±  22.612  ops/ms
> ArrayCopyObject.conjoint_micro    2047  thrpt   25   653.655 ±  31.494  ops/ms
> ArrayCopyObject.conjoint_micro    4095  thrpt   25   352.905 ±   7.390  ops/ms
> ArrayCopyObject.conjoint_micro    8191  thrpt   25   165.127 ±   0.832  ops/ms
>
> However I still have some issues with the code, when copy mode is (!is_aligned and !is_backward) - I'm getting ClassNotFound exceptions from classLoader, while trying to run JMH tests.
> I think it's related to my patch, I have made a simple workaround for this case [3] to be able to make some measurements.
>
> Any help on catching these bugs is highly appreciated.
>
> Best Regards, Vladimir.
> [1] https://github.com/eblot/newlib/blob/master/newlib/libc/string/memcpy.c
> [2] https://github.com/VladimirKempik/jdk/commit/e113d454dc2808889906eceaa1fb9cd560140fbc
> [3] https://github.com/VladimirKempik/jdk/commit/e113d454dc2808889906eceaa1fb9cd560140fbc#r89241535

From shade at redhat.com  Tue Nov  8 16:27:36 2022
From: shade at redhat.com (Aleksey Shipilev)
Date: Tue, 8 Nov 2022 17:27:36 +0100
Subject: Speeding up copy_memory stub
In-Reply-To:
References:
Message-ID:

On 11/8/22 17:10, Vladimir Kempik wrote:
>> Any help on catching these bugs is highly appreciated.

Try to pass this:

  $ make test TEST=hotspot_compiler_arraycopy

I added those to extensively cover the arraycopy improvements work for x86_64.

If RISC-V arraycopy code has any option flags, feel free to add them to test matrix here:
  https://github.com/openjdk/jdk/blob/dd5d4df5b68a40923987841a206fac5032d72f71/test/hotspot/jtreg/compiler/arraycopy/stress/TestStressArrayCopy.java#L152

--
Thanks,
-Aleksey

From vladimir.kempik at gmail.com  Sun Nov 13 21:14:18 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Mon, 14 Nov 2022 00:14:18 +0300
Subject: Speeding up copy_memory stub
In-Reply-To:
References:
Message-ID:

Hello

Thank you for the hint.
The test passes with the changes. I have rerun the whole bench.vm subset of micro benchmarks and also found improvements in o.o.b.v.compiler.ArrayClear.testArrayClear, which uses System.arraycopy as well.

I was also comparing risc-v versus aarch64 (on rbpi4, which has no sve, and neon was disabled). The thead c910 seems to be somewhat comparable (overall, not on jdk only) to the cortex-a72, after normalizing per MHz.

And here are the results: 1.0 is the relative result of the A72, the first column is before the patch, the second is after the patch, and the last one is the increase in percent.

Benchmark                                              before   after  increase, %
o.o.b.v.compiler.ArrayCopyObject.conjoint_micro.31      0.720   0.855    18.7
o.o.b.v.compiler.ArrayCopyObject.conjoint_micro.63      0.660   0.889    34.8
o.o.b.v.compiler.ArrayCopyObject.conjoint_micro.127     0.675   0.993    47.3
o.o.b.v.compiler.ArrayCopyObject.conjoint_micro.2047    0.545   1.021    87.3
o.o.b.v.compiler.ArrayCopyObject.conjoint_micro.4095    0.541   1.030    90.2
o.o.b.v.compiler.ArrayCopyObject.conjoint_micro.8191    0.580   0.904    55.9
o.o.b.v.compiler.ArrayCopyObject.disjoint_micro.31      0.731   0.860    17.7
o.o.b.v.compiler.ArrayCopyObject.disjoint_micro.63      0.734   0.958    30.5
o.o.b.v.compiler.ArrayCopyObject.disjoint_micro.127     0.663   0.962    45.1
o.o.b.v.compiler.ArrayCopyObject.disjoint_micro.2047    0.520   1.023    96.6
o.o.b.v.compiler.ArrayCopyObject.disjoint_micro.4095    0.525   1.052   100.5
o.o.b.v.compiler.ArrayCopyObject.disjoint_micro.8191    0.621   1.200    93.4
o.o.b.v.compiler.ArrayClear.testArrayClear              0.819   1.409    71.9

I have published the PR: https://github.com/openjdk/jdk/pull/11058

Regards, Vladimir.

> On 8 Nov 2022, at 19:27, Aleksey Shipilev wrote:
>
> On 11/8/22 17:10, Vladimir Kempik wrote:
>>> Any help on catching these bugs is highly appreciated.
>
> Try to pass this:
>
> $ make test TEST=hotspot_compiler_arraycopy
>
> I added those to extensively cover the arraycopy improvements work for x86_64.
>
> If RISC-V arraycopy code has any option flags, feel free to add them to test matrix here:
> https://github.com/openjdk/jdk/blob/dd5d4df5b68a40923987841a206fac5032d72f71/test/hotspot/jtreg/compiler/arraycopy/stress/TestStressArrayCopy.java#L152
>
> --
> Thanks,
> -Aleksey
>

From vladimir.kempik at gmail.com  Tue Nov 15 07:55:32 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 15 Nov 2022 10:55:32 +0300
Subject: Pre-Review: improving Math.min/max on floats
Message-ID: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com>

Hello

Currently, in C2, Math.min/max is implemented in c2_MacroAssembler_riscv.cpp using

    void C2_MacroAssembler::minmax_FD(FloatRegister dst, FloatRegister src1, FloatRegister src2, bool is_double, bool is_min)

The main issue there is that Min/Max is required to return NaN if any of its arguments is NaN. In risc-v, fmin/fmax returns NaN only if both src registers are NaN (quiet NaN).
That requires additional logic to handle the case where only one of src is NaN.

Currently it's done this way (I've dropped the is_double and is_min cases for readability):

    fmax_s(dst, src1, src2);
    // Checking NaNs
    flt_s(zr, src1, src2);
    frflags(t0);
    beqz(t0, Done);
    // In case of NaNs
    fadd_s(dst, src1, src2);
    bind(Done);

Here we always do two float comparisons (one in fmax, one in flt); perf shows they take equal time (checked on thead c910).

I think that's suboptimal and can be improved. First, move the check before fmin/fmax and, if the check fails, return NaN without doing fmax.

Second, I have prepared two versions:
The first one [1] sums src1 and src2; if the result is NaN, return it. The result is checked with fclass, looking for quiet NaN and signaling NaN. If the result of the sum is not NaN, do fmax and return the result.

The second version [2] checks both src1 and src2 for being NaN with fclass, without doing any FP arithmetic. If any of them is NaN, return NaN, otherwise do the fmax.
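
The semantics involved can be sketched in plain Java. This is a model under stated assumptions and not the code from [1] or [2]: the names `MinMaxNaNModel`, `riscvFmax`, `maxViaFadd` and `maxViaFclass` are hypothetical, `riscvFmax` models only the NaN rule described above (NaN only when both inputs are NaN), and the two variants follow the prose descriptions rather than the actual patches.

```java
// Hypothetical Java model of the NaN handling discussed above.
final class MinMaxNaNModel {
    // fmax.s as described: if exactly one input is NaN, the other input is returned.
    static float riscvFmax(float a, float b) {
        if (Float.isNaN(a)) {
            return b;               // if b is NaN as well, the result is NaN
        }
        if (Float.isNaN(b)) {
            return a;
        }
        return Math.max(a, b);      // no NaN involved on this path
    }

    // Variant 1 as described: add first, return the sum if it is NaN, else fmax.
    static float maxViaFadd(float a, float b) {
        float sum = a + b;
        return Float.isNaN(sum) ? sum : riscvFmax(a, b);
    }

    // Variant 2 as described: classify both inputs, NaN if either is NaN, else fmax.
    static float maxViaFclass(float a, float b) {
        return (Float.isNaN(a) || Float.isNaN(b)) ? Float.NaN : riscvFmax(a, b);
    }

    public static void main(String[] args) {
        float inf = Float.POSITIVE_INFINITY;
        System.out.println("riscvFmax(NaN, 1)       = " + riscvFmax(Float.NaN, 1.0f));
        System.out.println("Math.max(NaN, 1)        = " + Math.max(Float.NaN, 1.0f));
        System.out.println("maxViaFclass(inf, -inf) = " + maxViaFclass(inf, -inf));
        System.out.println("maxViaFadd(inf, -inf)   = " + maxViaFadd(inf, -inf));
    }
}
```

In this model the fclass-style variant agrees with Math.max on every input tried, while the add-first variant returns NaN for infinities of opposite sign, because their sum is NaN even though neither input is. Whether the actual patch [1] is affected cannot be told from this sketch, but that input pair seems worth adding to any conformance test.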

I have built both versions and compared the results to an unpatched JDK on hifive unmatched and thead c910.
While on hifive the perf win is moderate (~10%), on thead I'm getting up to 10x better results sometimes.

The microbenchmarks fAdd/fMul/dAdd/dMul don't show any difference. I think that happens because these

    private double dAddBench(double a, double b) {
        return Math.max(a, b) + Math.min(a, b);
    }

    private double dMulBench(double a, double b) {
        return Math.max(a, b) * Math.min(a, b);
    }

may get reduced to just a + b and a * b respectively.

Looking for opinions on which way is better.

The results, thead c910:

before

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  54023.827 ± 268.645  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  54309.850 ± 323.551  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  42192.140 ±  12.114  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  53797.657 ±  15.816  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  54135.710 ± 313.185  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  42196.156 ±  13.424  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    650.810 ± 169.998  us/op
MaxMinOptimizeTest.dMax        avgt   25   4561.967 ±  40.367  us/op
MaxMinOptimizeTest.dMin        avgt   25   4589.100 ±  75.854  us/op
MaxMinOptimizeTest.dMul        avgt   25    759.821 ± 240.092  us/op
MaxMinOptimizeTest.fAdd        avgt   25    300.137 ±  13.495  us/op
MaxMinOptimizeTest.fMax        avgt   25   4348.885 ±  20.061  us/op
MaxMinOptimizeTest.fMin        avgt   25   4372.799 ±  27.296  us/op
MaxMinOptimizeTest.fMul        avgt   25    304.024 ±  12.120  us/op

fadd+fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  10545.196 ± 140.137  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  10454.525 ±   9.972  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   3104.703 ±   0.892  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  10449.709 ±   7.284  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  10445.261 ±   7.206  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   3104.769 ±   0.951  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    487.769 ± 170.711  us/op
MaxMinOptimizeTest.dMax        avgt   25    929.394 ± 158.697  us/op
MaxMinOptimizeTest.dMin        avgt   25    864.230 ± 284.794  us/op
MaxMinOptimizeTest.dMul        avgt   25    894.116 ± 342.550  us/op
MaxMinOptimizeTest.fAdd        avgt   25    284.664 ±   1.446  us/op
MaxMinOptimizeTest.fMax        avgt   25    384.388 ±  15.004  us/op
MaxMinOptimizeTest.fMin        avgt   25    371.952 ±  15.295  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.226 ±  12.467  us/op

2fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  11415.817 ± 403.757  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  11835.521 ± 329.380  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   5188.436 ±   3.723  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  11667.456 ± 426.731  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  11646.682 ± 416.883  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   5190.395 ±   3.628  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    745.417 ± 209.376  us/op
MaxMinOptimizeTest.dMax        avgt   25    581.580 ±  38.046  us/op
MaxMinOptimizeTest.dMin        avgt   25    533.442 ±  41.184  us/op
MaxMinOptimizeTest.dMul        avgt   25    654.667 ± 267.537  us/op
MaxMinOptimizeTest.fAdd        avgt   25    294.606 ±  11.712  us/op
MaxMinOptimizeTest.fMax        avgt   25    433.842 ±   3.935  us/op
MaxMinOptimizeTest.fMin        avgt   25    434.727 ±   1.894  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.385 ±  12.980  us/op

hifive:

before

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  30219.666 ±  12.878  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  30242.249 ±  31.374  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  15394.622 ±   2.803  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  30150.114 ±  22.421  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  30149.752 ±  20.813  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  15396.402 ±   4.251  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1143.582 ±   4.444  us/op
MaxMinOptimizeTest.dMax        avgt   25   2556.317 ±   3.795  us/op
MaxMinOptimizeTest.dMin        avgt   25   2556.569 ±   2.274  us/op
MaxMinOptimizeTest.dMul        avgt   25   1142.769 ±   1.593  us/op
MaxMinOptimizeTest.fAdd        avgt   25    748.688 ±   7.342  us/op
MaxMinOptimizeTest.fMax        avgt   25   2280.381 ±   1.535  us/op
MaxMinOptimizeTest.fMin        avgt   25   2280.760 ±   1.532  us/op
MaxMinOptimizeTest.fMul        avgt   25    748.991 ±   7.261  us/op

fadd+fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  27723.791 ±  22.784  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  27760.799 ±  45.411  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  12875.949 ±   2.829  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  25992.753 ±  23.788  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  25994.554 ±  32.060  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  11200.737 ±   2.169  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1144.128 ±   4.371  us/op
MaxMinOptimizeTest.dMax        avgt   25   1968.145 ±   2.346  us/op
MaxMinOptimizeTest.dMin        avgt   25   1970.249 ±   4.712  us/op
MaxMinOptimizeTest.dMul        avgt   25   1143.356 ±   2.203  us/op
MaxMinOptimizeTest.fAdd        avgt   25    748.634 ±   7.229  us/op
MaxMinOptimizeTest.fMax        avgt   25   1523.719 ±   0.570  us/op
MaxMinOptimizeTest.fMin        avgt   25   1524.534 ±   1.109  us/op
MaxMinOptimizeTest.fMul        avgt   25    748.643 ±   7.291  us/op

2fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  26890.963 ±  13.928  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  26919.595 ±  23.140  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  11928.938 ±   1.985  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  26843.782 ±  27.956  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  26825.124 ±  24.104  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  11927.765 ±   1.238  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1144.860 ±   3.467  us/op
MaxMinOptimizeTest.dMax        avgt   25   1881.809 ±   1.986  us/op
MaxMinOptimizeTest.dMin        avgt   25   1882.623 ±   2.225  us/op
MaxMinOptimizeTest.dMul        avgt   25   1142.860 ±   1.755  us/op
MaxMinOptimizeTest.fAdd        avgt   25    752.557 ±   8.708  us/op
MaxMinOptimizeTest.fMax        avgt   25   1587.139 ±   0.903  us/op
MaxMinOptimizeTest.fMin        avgt   25   1587.140 ±   1.067  us/op
MaxMinOptimizeTest.fMul        avgt   25    748.653 ±   7.278  us/op

Regards, Vladimir

P.S. For some reason I can't use the mv opcode on two FloatRegisters (I think it was possible before) and had to use fmv_s/fmv_d, which might not be exactly what I want.
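
The guess above that dAddBench and dMulBench may reduce to a + b and a * b rests on an identity that can be checked in plain Java. The sketch below is only that check: the class name `MinMaxSumIdentity` and the helper `matchesPlainArithmetic` are hypothetical, and the code says nothing about whether C2 actually performs such a rewrite.

```java
// Hypothetical check that max(a, b) + min(a, b) equals a + b,
// and likewise for multiplication, on a set of sample values.
final class MinMaxSumIdentity {
    static double dAddBench(double a, double b) {
        return Math.max(a, b) + Math.min(a, b);
    }

    static double dMulBench(double a, double b) {
        return Math.max(a, b) * Math.min(a, b);
    }

    // Double.compare treats NaN as equal to NaN and distinguishes -0.0 from 0.0,
    // so this is a strict comparison of the two results.
    static boolean matchesPlainArithmetic(double a, double b) {
        return Double.compare(dAddBench(a, b), a + b) == 0
                && Double.compare(dMulBench(a, b), a * b) == 0;
    }

    public static void main(String[] args) {
        double[] samples = {0.0, -0.0, 1.5, -2.25, Double.MAX_VALUE,
                Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN};
        boolean all = true;
        for (double a : samples) {
            for (double b : samples) {
                all &= matchesPlainArithmetic(a, b);
            }
        }
        System.out.println("identity holds on all samples: " + all);
    }
}
```

For non-NaN inputs, max and min together return the two inputs in some order, and addition and multiplication are commutative; for NaN inputs both sides are NaN. That is consistent with those four benchmarks being insensitive to how min/max is implemented.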

[1] https://github.com/VladimirKempik/jdk/commit/b6752492f7efd82e248e49e136dc9f5929cc19a2
[2] https://github.com/VladimirKempik/jdk/commit/384efc3ca59c2e301ec43f8d716f142828d2ac6a

From yangfei at iscas.ac.cn  Fri Nov 18 08:06:10 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Fri, 18 Nov 2022 16:06:10 +0800 (GMT+08:00)
Subject: Pre-Review: improving Math.min/max on floats
In-Reply-To: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com>
References: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com>
Message-ID: <1a63a094.3bff7.18489c611a9.Coremail.yangfei@iscas.ac.cn>

Hi,

I went through both versions and it looks like the resulting performance gain will depend on the micro-architecture implementation.
Personally I prefer the first version in respect of instruction count (5 compared with 7 instructions when the inputs are not NaNs) and code readability.

PS: I would also suggest carrying out more conformance/compatibility tests as needed for these changes.

Thanks,
Fei

-----Original Messages-----
From:"Vladimir Kempik"
Sent Time:2022-11-15 15:55:32 (Tuesday)
To: riscv-port-dev
Cc:
Subject: Pre-Review: improving Math.min/max on floats

Hello

Currently, in C2, Math.min/max is implemented in c2_MacroAssembler_riscv.cpp using

    void C2_MacroAssembler::minmax_FD(FloatRegister dst, FloatRegister src1, FloatRegister src2, bool is_double, bool is_min)

The main issue there is Min/Max is required to return NaN if any of its arguments is NaN. In risc-v, fmin/fmax returns NaN only if both of src registers is NaN ( quiet NaN). That requires additional logic to handle the case where only one of src is NaN.
Currently it's done this way (I've reduced the is_double and is_min cases for readability):

    fmax_s(dst, src1, src2);
    // Checking NaNs
    flt_s(zr, src1, src2);

    frflags(t0);
    beqz(t0, Done);

    // In case of NaNs
    fadd_s(dst, src1, src2);

    bind(Done);

Here we always do two float comparisons (one in fmax, one in flt); perf shows they take equal time (checked on thead c910).

I think that's suboptimal and can be improved. First, move the check before fmin/fmax, and if the check fails, return NaN without doing fmax.
Second:

I have prepared two versions. The first one [1] sums src1 and src2; if the result is NaN, it is returned. The result is checked with fclass, checking for quiet NaN and signaling NaN.
If the result of the sum is not NaN, do fmax and return the result.

The second version [2] checks both src1 and src2 for being NaN with fclass, without doing any FP arithmetic. If any of them is NaN, return NaN; otherwise do the fmax.

I have built both versions and compared results to unpatched JDK on hifive unmatched and thead c910.
While on hifive the perf win is moderate (~10%), on thead I'm getting up to 10x better results sometimes.

MicroBenches fAdd/fMul/dAdd/dMul don't show any difference. I think that happens because these

    private double dAddBench(double a, double b) { return Math.max(a, b) + Math.min(a, b); }
    private double dMulBench(double a, double b) { return Math.max(a, b) * Math.min(a, b); }

may get reduced to just a + b and a * b respectively.

Looking for opinions on which way is better.

The results, thead c910:

before

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  54023.827 ± 268.645  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  54309.850 ± 323.551  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  42192.140 ±  12.114  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  53797.657 ±  15.816  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  54135.710 ± 313.185  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  42196.156 ±  13.424  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    650.810 ± 169.998  us/op
MaxMinOptimizeTest.dMax        avgt   25   4561.967 ±  40.367  us/op
MaxMinOptimizeTest.dMin        avgt   25   4589.100 ±  75.854  us/op
MaxMinOptimizeTest.dMul        avgt   25    759.821 ± 240.092  us/op
MaxMinOptimizeTest.fAdd        avgt   25    300.137 ±  13.495  us/op
MaxMinOptimizeTest.fMax        avgt   25   4348.885 ±  20.061  us/op
MaxMinOptimizeTest.fMin        avgt   25   4372.799 ±  27.296  us/op
MaxMinOptimizeTest.fMul        avgt   25    304.024 ±  12.120  us/op

fadd+fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  10545.196 ± 140.137  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  10454.525 ±   9.972  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   3104.703 ±   0.892  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  10449.709 ±   7.284  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  10445.261 ±   7.206  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   3104.769 ±   0.951  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    487.769 ± 170.711  us/op
MaxMinOptimizeTest.dMax        avgt   25    929.394 ± 158.697  us/op
MaxMinOptimizeTest.dMin        avgt   25    864.230 ± 284.794  us/op
MaxMinOptimizeTest.dMul        avgt   25    894.116 ± 342.550  us/op
MaxMinOptimizeTest.fAdd        avgt   25    284.664 ±   1.446  us/op
MaxMinOptimizeTest.fMax        avgt   25    384.388 ±  15.004  us/op
MaxMinOptimizeTest.fMin        avgt   25    371.952 ±  15.295  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.226 ±  12.467  us/op

2fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  11415.817 ± 403.757  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  11835.521 ± 329.380  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   5188.436 ±   3.723  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  11667.456 ± 426.731  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  11646.682 ± 416.883  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   5190.395 ±   3.628  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    745.417 ± 209.376  us/op
MaxMinOptimizeTest.dMax        avgt   25    581.580 ±  38.046  us/op
MaxMinOptimizeTest.dMin        avgt   25    533.442 ±  41.184  us/op
MaxMinOptimizeTest.dMul        avgt   25    654.667 ± 267.537  us/op
MaxMinOptimizeTest.fAdd        avgt   25    294.606 ±  11.712  us/op
MaxMinOptimizeTest.fMax        avgt   25    433.842 ±   3.935  us/op
MaxMinOptimizeTest.fMin        avgt   25    434.727 ±   1.894  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.385 ±  12.980  us/op

hifive:

before

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  30219.666 ±  12.878  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  30242.249 ±  31.374  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  15394.622 ±   2.803  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  30150.114 ±  22.421  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  30149.752 ±  20.813  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  15396.402 ±   4.251  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1143.582 ±   4.444  us/op
MaxMinOptimizeTest.dMax        avgt   25   2556.317 ±   3.795  us/op
MaxMinOptimizeTest.dMin        avgt   25   2556.569 ±   2.274  us/op
MaxMinOptimizeTest.dMul        avgt   25   1142.769 ±   1.593  us/op
MaxMinOptimizeTest.fAdd        avgt   25    748.688 ±   7.342  us/op
MaxMinOptimizeTest.fMax        avgt   25   2280.381 ±   1.535  us/op
MaxMinOptimizeTest.fMin        avgt   25   2280.760 ±   1.532  us/op
MaxMinOptimizeTest.fMul        avgt   25    748.991 ±   7.261  us/op

fadd+fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  27723.791 ±  22.784  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  27760.799 ±  45.411  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  12875.949 ±   2.829  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  25992.753 ±  23.788  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  25994.554 ±  32.060  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  11200.737 ±   2.169  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1144.128 ±   4.371  us/op
MaxMinOptimizeTest.dMax        avgt   25   1968.145 ±   2.346  us/op
MaxMinOptimizeTest.dMin        avgt   25   1970.249 ±   4.712  us/op
MaxMinOptimizeTest.dMul        avgt   25   1143.356 ±   2.203  us/op
MaxMinOptimizeTest.fAdd        avgt   25    748.634 ±   7.229  us/op
MaxMinOptimizeTest.fMax        avgt   25   1523.719 ±   0.570  us/op
MaxMinOptimizeTest.fMin        avgt   25   1524.534 ±   1.109  us/op
MaxMinOptimizeTest.fMul        avgt   25    748.643 ±   7.291  us/op

2fclass

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  26890.963 ±  13.928  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  26919.595 ±  23.140  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  11928.938 ±   1.985  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  26843.782 ±  27.956  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  26825.124 ±  24.104  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  11927.765 ±   1.238  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1144.860 ±   3.467  us/op
MaxMinOptimizeTest.dMax        avgt   25   1881.809 ±   1.986  us/op
MaxMinOptimizeTest.dMin        avgt   25   1882.623 ±   2.225  us/op
MaxMinOptimizeTest.dMul        avgt   25   1142.860 ±   1.755  us/op
MaxMinOptimizeTest.fAdd        avgt   25    752.557 ±   8.708  us/op
MaxMinOptimizeTest.fMax        avgt   25   1587.139 ±   0.903  us/op
MaxMinOptimizeTest.fMin        avgt   25   1587.140 ±   1.067  us/op
MaxMinOptimizeTest.fMul        avgt   25    748.653 ±   7.278  us/op

Regards, Vladimir

P.S. For some reason I can't use the mv opcode on two FloatRegisters (I think it was possible before) and had to use fmv_s/fmv_d, which might not be exactly what I want.

[1] https://github.com/VladimirKempik/jdk/commit/b6752492f7efd82e248e49e136dc9f5929cc19a2
[2] https://github.com/VladimirKempik/jdk/commit/384efc3ca59c2e301ec43f8d716f142828d2ac6a

From vladimir.kempik at gmail.com  Fri Nov 18 09:49:02 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Fri, 18 Nov 2022 12:49:02 +0300
Subject: Pre-Review: improving Math.min/max on floats
In-Reply-To: <1a63a094.3bff7.18489c611a9.Coremail.yangfei@iscas.ac.cn>
References: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com> <1a63a094.3bff7.18489c611a9.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Hello

Thanks for taking a look.
I think v1 is better too.

We have measured fadd latency with lmbench (can't remember which platform it was, hifive or thead) and it turned out to be just 4 cycles; that's OK, I think.

I have also benched it on a third platform (shallow OoO with dual-issue FPU, on FPGA), and it showed gains similar to thead's.

I'll run jtreg's tiers and submit a PR afterwards.

Regards, Vladimir

From vladimir.kempik at gmail.com  Tue Nov 22 08:28:12 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 22 Nov 2022 11:28:12 +0300
Subject: Pre-Review: improving Math.min/max on floats
In-Reply-To:
References: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com> <1a63a094.3bff7.18489c611a9.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Hello

Found an issue with the fadd+fclass version:

jdk/incubator/vector/FloatMaxVectorTests.java

test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
    at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
    at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
    at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
    at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
    at org.testng.Assert.fail(Assert.java:99)
--
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
    at org.testng.Assert.fail(Assert.java:99)

The 2fclass version (checking every src argument for NaN) doesn't have this issue.
So I think I'll have to go the v2 way.

Regards, Vladimir.
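One plausible explanation for the corner-case failures reported above, offered as an assumption since the thread does not state the cause: the sum of two non-NaN inputs can itself be NaN, for example Infinity + (-Infinity), so a NaN check on the sum reports a NaN that neither input contains. A minimal Java sketch (class and method names are hypothetical and only model the two ideas):

```java
public class SumNaNCheck {
    // Hypothetical model of the "fadd+fclass" idea: assume the inputs
    // contain a NaN whenever their sum is NaN.
    static boolean sumSaysNaN(float a, float b) {
        return Float.isNaN(a + b);
    }

    // Hypothetical model of the "2fclass" idea: inspect each input on its own.
    static boolean eitherIsNaN(float a, float b) {
        return Float.isNaN(a) || Float.isNaN(b);
    }

    public static void main(String[] args) {
        float posInf = Float.POSITIVE_INFINITY;
        float negInf = Float.NEGATIVE_INFINITY;

        // The required result has nothing to do with NaN.
        System.out.println(Math.max(posInf, negInf));     // prints Infinity

        // The sum-based check is fooled, because Infinity + (-Infinity) is NaN.
        System.out.println(sumSaysNaN(posInf, negInf));   // prints true

        // The per-operand check is not.
        System.out.println(eitherIsNaN(posInf, negInf));  // prints false
    }
}
```

This matches the shape of the assertion messages (expected Infinity or -Infinity, found NaN), but whether cornerCaseValue(i) actually places opposite infinities next to each other is not shown in the thread.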

From vladimir.kempik at gmail.com  Tue Nov 22 09:05:11 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Tue, 22 Nov 2022 12:05:11 +0300
Subject: Pre-Review: improving Math.min/max on floats
In-Reply-To:
References: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com> <1a63a094.3bff7.18489c611a9.Coremail.yangfei@iscas.ac.cn>
Message-ID: <4944EA48-58F5-47AC-A838-06BA2A8134CE@gmail.com>

Hello Fei

I think I can reduce the number of opcodes for the second version, but I need a second temp register for that (to OR the two fclass results and check the combined value just once for NaN).

Then it would look like:

    is_double ? fclass_d(t0, src1)
              : fclass_s(t0, src1);
    is_double ? fclass_d(t1, src2)
              : fclass_s(t1, src2);
    orr(t0, t0, t1);
    andi(t0, t0, 0b1100000000); // if any src is a quiet or signaling NaN, return their sum
    beqz(t0, Compare);
    is_double ? fadd_d(dst, src1, src2)
              : fadd_s(dst, src1, src2);
    j(Done);

    bind(Compare);

Any hints on how to get a second temp register?

Regards, Vladimir
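A note on combining the two fclass results, as a sketch under stated assumptions rather than the patch itself. fclass sets exactly one bit per value, with bit 8 for signaling NaN and bit 9 for quiet NaN, which is what the 0b1100000000 mask selects. Because each result is one-hot, a single "is any input NaN" test needs a bitwise OR of the two results; a bitwise AND is nonzero only when both inputs fall in the same class. The Java model below is hypothetical and simplified: it lumps every non-NaN class into one stand-in bit.

```java
public class FclassMaskSketch {
    static final int SNAN = 1 << 8;          // fclass bit 8: signaling NaN
    static final int QNAN = 1 << 9;          // fclass bit 9: quiet NaN
    static final int NAN_MASK = SNAN | QNAN; // 0b1100000000

    // Simplification: a single stand-in bit for all non-NaN classes.
    // The real instruction uses bits 0..7 to tell them apart.
    static final int NOT_NAN = 1 << 6;

    // Hypothetical, simplified model of fclass.s for a Java float.
    static int fclass(float f) {
        if (!Float.isNaN(f)) {
            return NOT_NAN;
        }
        int bits = Float.floatToRawIntBits(f);
        boolean quiet = (bits & 0x00400000) != 0; // top mantissa bit marks a quiet NaN
        return quiet ? QNAN : SNAN;
    }

    // One-hot results combined with OR: detects a NaN in either operand.
    static boolean anyNaNWithOr(float a, float b) {
        return ((fclass(a) | fclass(b)) & NAN_MASK) != 0;
    }

    // One-hot results combined with AND: only fires when both operands
    // are the same kind of NaN.
    static boolean anyNaNWithAnd(float a, float b) {
        return ((fclass(a) & fclass(b)) & NAN_MASK) != 0;
    }

    public static void main(String[] args) {
        System.out.println(anyNaNWithOr(1.0f, Float.NaN));       // prints true
        System.out.println(anyNaNWithAnd(1.0f, Float.NaN));      // prints false
        System.out.println(anyNaNWithAnd(Float.NaN, Float.NaN)); // prints true
    }
}
```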
11:28, Vladimir Kempik ???????(?): > > Hello > > Found an issue with fadd+fclass version: > > jdk/incubator/vector/FloatMaxVectorTests.java > > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure > java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN] > at org.testng.Assert.fail(Assert.java:99) > -- > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure > java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN] > at org.testng.Assert.fail(Assert.java:99) > -- > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success > test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure > java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN] > at org.testng.Assert.fail(Assert.java:99) > -- > test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success > test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success > test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure > java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN] > at org.testng.Assert.fail(Assert.java:99) > -- > test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success > test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success > test 
FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure > java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN] > at org.testng.Assert.fail(Assert.java:99) > -- > test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success > test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success > test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure > java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN] > at org.testng.Assert.fail(Assert.java:99) > > > And 2fclass version ( checking every src argument to be NaN) doesn?t have this issue. > So I think I?ll have to go v2 way. > > Regards, Vladimir. > >> 18 ????. 2022 ?., ? 12:49, Vladimir Kempik ???????(?): >> >> Hello >> >> Thanks for taking a look. >> I think v1 is better too. >> >> We have measured fadd latency with lmbench ( can?t remember which platform it was, hifive or thead) and it turned out to be just 4 cycles, that?s ok I think. >> >> I have also benched it on third platform - shallow OoO with dual-issue fpu, on fpga, and it showed gains similar to thead?s. >> >> I?ll run jtreg?s tiers and submit PR afterwards. >> >> Regards, Vladimir >> >>> 18 ????. 2022 ?., ? 11:06, yangfei at iscas.ac.cn ???????(?): >>> >>> Hi, >>> >>> >>> I went through both versions and looks like the resulting performance gain will depend on the micro-architecture implementations. >>> >>> Personally I prefer the first version in respect of instruction count (5 compared with 7 instructions when the inputs are not NaNs) and code readability. >>> >>> PS: I would suggest also carry out more conformance/compartibility test as needed for these changes. 
>>> >>> >>> >>> Thanks, >>> >>> Fei >>> >>> >>> -----Original Messages----- >>> From:"Vladimir Kempik" >>> Sent Time:2022-11-15 15:55:32 (Tuesday) >>> To: riscv-port-dev >>> Cc: >>> Subject: Pre-Review: improving Math.min/max on floats >>> >>> Hello >>> Currently, in C2, Math.min/max is implemented in c2_MacroAssembler_riscv.cpp using >>> >>> void C2_MacroAssembler::minmax_FD(FloatRegister dst, FloatRegister src1, FloatRegister src2, bool is_double, bool is_min) >>> >>> The main issue there is Min/Max is required to return NaN if any of its arguments is NaN. In risc-v, fmin/fmax returns NaN only if both of src registers is NaN ( quiet NaN). >>> That requires additional logic to handle the case where only of of src is NaN. >>> Currently it?s done this way ( i?ve reduced is_double and is_min case for readability) >>> >>> >>> fmax_s(dst, src1, src2); >>> // Checking NaNs >>> flt_s(zr, src1, src2); >>> >>> frflags(t0); >>> beqz(t0, Done); >>> >>> // In case of NaNs >>> fadd_s(dst, src1, src2); >>> >>> bind(Done); >>> >>> >>> here we always do two float comparisons ( one in fmax, one in flt), perf shows they are taking equal time ( checking on thead c910) >>> >>> I think that?s suboptimal and can be improved: first, move the check before fmin/fmax and if check fails return NaN without doing fmax >>> second thing: >>> >>> I have prepared two version, first one [1] sums src1 and src2, if result is NaN - return it, result is checked with fclass, checking for quiet NaN and signaling NaN. >>> if result of sum is not NaN - do fmax and return result. >>> >>> second version [2] checks both src1 and src2 for being NaN with fclass, without doing any FP arithmetics. if any of them is NaN - return NaN, otherwise do the fmax. >>> >>> I have built both versions and compared results to unpatched JDK on hifive unmatched and thead c910. >>> While on hifive the perf win is moderate ( ~10%), on thead I?m getting up to 10x better results sometimes. 
>>>
>>> The fAdd/fMul/dAdd/dMul microbenchmarks don't show any difference. I think that happens because these
>>>
>>>     private double dAddBench(double a, double b) {
>>>         return Math.max(a, b) + Math.min(a, b);
>>>     }
>>>
>>>     private double dMulBench(double a, double b) {
>>>         return Math.max(a, b) * Math.min(a, b);
>>>     }
>>>
>>> may get reduced to just a + b and a * b respectively.
>>>
>>> Looking for opinions on which way is better.
>>>
>>> The results, thead c910:
>>>
>>> before
>>>
>>> Benchmark                      Mode  Cnt      Score     Error  Units
>>> FpMinMaxIntrinsics.dMax        avgt   25  54023.827 ± 268.645  ns/op
>>> FpMinMaxIntrinsics.dMin        avgt   25  54309.850 ± 323.551  ns/op
>>> FpMinMaxIntrinsics.dMinReduce  avgt   25  42192.140 ±  12.114  ns/op
>>> FpMinMaxIntrinsics.fMax        avgt   25  53797.657 ±  15.816  ns/op
>>> FpMinMaxIntrinsics.fMin        avgt   25  54135.710 ± 313.185  ns/op
>>> FpMinMaxIntrinsics.fMinReduce  avgt   25  42196.156 ±  13.424  ns/op
>>> MaxMinOptimizeTest.dAdd        avgt   25    650.810 ± 169.998  us/op
>>> MaxMinOptimizeTest.dMax        avgt   25   4561.967 ±  40.367  us/op
>>> MaxMinOptimizeTest.dMin        avgt   25   4589.100 ±  75.854  us/op
>>> MaxMinOptimizeTest.dMul        avgt   25    759.821 ± 240.092  us/op
>>> MaxMinOptimizeTest.fAdd        avgt   25    300.137 ±  13.495  us/op
>>> MaxMinOptimizeTest.fMax        avgt   25   4348.885 ±  20.061  us/op
>>> MaxMinOptimizeTest.fMin        avgt   25   4372.799 ±  27.296  us/op
>>> MaxMinOptimizeTest.fMul        avgt   25    304.024 ±  12.120  us/op
>>>
>>> fadd+fclass
>>>
>>> Benchmark                      Mode  Cnt      Score     Error  Units
>>> FpMinMaxIntrinsics.dMax        avgt   25  10545.196 ± 140.137  ns/op
>>> FpMinMaxIntrinsics.dMin        avgt   25  10454.525 ±   9.972  ns/op
>>> FpMinMaxIntrinsics.dMinReduce  avgt   25   3104.703 ±   0.892  ns/op
>>> FpMinMaxIntrinsics.fMax        avgt   25  10449.709 ±   7.284  ns/op
>>> FpMinMaxIntrinsics.fMin        avgt   25  10445.261 ±   7.206  ns/op
>>> FpMinMaxIntrinsics.fMinReduce  avgt   25   3104.769 ±   0.951  ns/op
>>> MaxMinOptimizeTest.dAdd        avgt   25    487.769 ± 170.711  us/op
>>> MaxMinOptimizeTest.dMax        avgt   25    929.394 ± 158.697  us/op
>>> MaxMinOptimizeTest.dMin        avgt   25    864.230 ± 284.794  us/op
>>> MaxMinOptimizeTest.dMul        avgt   25    894.116 ± 342.550  us/op
>>> MaxMinOptimizeTest.fAdd        avgt   25    284.664 ±   1.446  us/op
>>> MaxMinOptimizeTest.fMax        avgt   25    384.388 ±  15.004  us/op
>>> MaxMinOptimizeTest.fMin        avgt   25    371.952 ±  15.295  us/op
>>> MaxMinOptimizeTest.fMul        avgt   25    305.226 ±  12.467  us/op
>>>
>>> 2fclass
>>>
>>> Benchmark                      Mode  Cnt      Score     Error  Units
>>> FpMinMaxIntrinsics.dMax        avgt   25  11415.817 ± 403.757  ns/op
>>> FpMinMaxIntrinsics.dMin        avgt   25  11835.521 ± 329.380  ns/op
>>> FpMinMaxIntrinsics.dMinReduce  avgt   25   5188.436 ±   3.723  ns/op
>>> FpMinMaxIntrinsics.fMax        avgt   25  11667.456 ± 426.731  ns/op
>>> FpMinMaxIntrinsics.fMin        avgt   25  11646.682 ± 416.883  ns/op
>>> FpMinMaxIntrinsics.fMinReduce  avgt   25   5190.395 ±   3.628  ns/op
>>> MaxMinOptimizeTest.dAdd        avgt   25    745.417 ± 209.376  us/op
>>> MaxMinOptimizeTest.dMax        avgt   25    581.580 ±  38.046  us/op
>>> MaxMinOptimizeTest.dMin        avgt   25    533.442 ±  41.184  us/op
>>> MaxMinOptimizeTest.dMul        avgt   25    654.667 ± 267.537  us/op
>>> MaxMinOptimizeTest.fAdd        avgt   25    294.606 ±  11.712  us/op
>>> MaxMinOptimizeTest.fMax        avgt   25    433.842 ±   3.935  us/op
>>> MaxMinOptimizeTest.fMin        avgt   25    434.727 ±   1.894  us/op
>>> MaxMinOptimizeTest.fMul        avgt   25    305.385 ±  12.980  us/op
>>>
>>> hifive:
>>>
>>> before
>>>
>>> Benchmark                      Mode  Cnt      Score     Error  Units
>>> FpMinMaxIntrinsics.dMax        avgt   25  30219.666 ±  12.878  ns/op
>>> FpMinMaxIntrinsics.dMin        avgt   25  30242.249 ±  31.374  ns/op
>>> FpMinMaxIntrinsics.dMinReduce  avgt   25  15394.622 ±   2.803  ns/op
>>> FpMinMaxIntrinsics.fMax        avgt   25  30150.114 ±  22.421  ns/op
>>> FpMinMaxIntrinsics.fMin        avgt   25  30149.752 ±  20.813  ns/op
>>> FpMinMaxIntrinsics.fMinReduce  avgt   25  15396.402 ±   4.251  ns/op
>>> MaxMinOptimizeTest.dAdd        avgt   25   1143.582 ±   4.444  us/op
>>> MaxMinOptimizeTest.dMax        avgt   25   2556.317 ±   3.795  us/op
>>> MaxMinOptimizeTest.dMin        avgt   25   2556.569 ±   2.274  us/op
>>> MaxMinOptimizeTest.dMul        avgt   25   1142.769 ±   1.593  us/op
>>> MaxMinOptimizeTest.fAdd        avgt   25    748.688 ±   7.342  us/op
>>> MaxMinOptimizeTest.fMax        avgt   25   2280.381 ±   1.535  us/op
>>> MaxMinOptimizeTest.fMin        avgt   25   2280.760 ±   1.532  us/op
>>> MaxMinOptimizeTest.fMul        avgt   25    748.991 ±   7.261  us/op
>>>
>>> fadd+fclass
>>>
>>> Benchmark                      Mode  Cnt      Score     Error  Units
>>> FpMinMaxIntrinsics.dMax        avgt   25  27723.791 ±  22.784  ns/op
>>> FpMinMaxIntrinsics.dMin        avgt   25  27760.799 ±  45.411  ns/op
>>> FpMinMaxIntrinsics.dMinReduce  avgt   25  12875.949 ±   2.829  ns/op
>>> FpMinMaxIntrinsics.fMax        avgt   25  25992.753 ±  23.788  ns/op
>>> FpMinMaxIntrinsics.fMin        avgt   25  25994.554 ±  32.060  ns/op
>>> FpMinMaxIntrinsics.fMinReduce  avgt   25  11200.737 ±   2.169  ns/op
>>> MaxMinOptimizeTest.dAdd        avgt   25   1144.128 ±   4.371  us/op
>>> MaxMinOptimizeTest.dMax        avgt   25   1968.145 ±   2.346  us/op
>>> MaxMinOptimizeTest.dMin        avgt   25   1970.249 ±   4.712  us/op
>>> MaxMinOptimizeTest.dMul        avgt   25   1143.356 ±   2.203  us/op
>>> MaxMinOptimizeTest.fAdd        avgt   25    748.634 ±   7.229  us/op
>>> MaxMinOptimizeTest.fMax        avgt   25   1523.719 ±   0.570  us/op
>>> MaxMinOptimizeTest.fMin        avgt   25   1524.534 ±   1.109  us/op
>>> MaxMinOptimizeTest.fMul        avgt   25    748.643 ±   7.291  us/op
>>>
>>> 2fclass
>>>
>>> Benchmark                      Mode  Cnt      Score     Error  Units
>>> FpMinMaxIntrinsics.dMax        avgt   25  26890.963 ±  13.928  ns/op
>>> FpMinMaxIntrinsics.dMin        avgt   25  26919.595 ±  23.140  ns/op
>>> FpMinMaxIntrinsics.dMinReduce  avgt   25  11928.938 ±   1.985  ns/op
>>> FpMinMaxIntrinsics.fMax        avgt   25  26843.782 ±  27.956  ns/op
>>> FpMinMaxIntrinsics.fMin        avgt   25  26825.124 ±  24.104  ns/op
>>> FpMinMaxIntrinsics.fMinReduce  avgt   25  11927.765 ±   1.238  ns/op
>>> MaxMinOptimizeTest.dAdd        avgt   25   1144.860 ±   3.467  us/op
>>> MaxMinOptimizeTest.dMax        avgt   25   1881.809 ±   1.986  us/op
>>> MaxMinOptimizeTest.dMin        avgt   25   1882.623 ±   2.225  us/op
>>> MaxMinOptimizeTest.dMul        avgt   25   1142.860 ±   1.755  us/op
>>> MaxMinOptimizeTest.fAdd        avgt   25    752.557 ±   8.708  us/op
>>> MaxMinOptimizeTest.fMax        avgt   25   1587.139 ±   0.903  us/op
>>> MaxMinOptimizeTest.fMin        avgt   25   1587.140 ±   1.067  us/op
>>> MaxMinOptimizeTest.fMul        avgt   25    748.653 ±   7.278  us/op
>>>
>>> Regards, Vladimir
>>>
>>> P.S. For some reason I can't use the mv opcode on two FloatRegisters (I think it was possible before) and had to use fmv_s/fmv_d, which might not be exactly what I want.
>>>
>>> [1] https://github.com/VladimirKempik/jdk/commit/b6752492f7efd82e248e49e136dc9f5929cc19a2
>>> [2] https://github.com/VladimirKempik/jdk/commit/384efc3ca59c2e301ec43f8d716f142828d2ac6a
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From zhangze.linux at gmail.com Wed Nov 23 00:56:57 2022
From: zhangze.linux at gmail.com (Ze Zhang)
Date: Wed, 23 Nov 2022 08:56:57 +0800
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
Message-ID:

hi,
OpenJDK 20 crashes on Linux kernel 5.19. Is it because it cannot support a huge VM?

But I think OpenJDK, as an application, should not have any limitation on virtual address length, even if it has a very close relationship with the hardware. Just as qemu should not depend on the hardware VM size, it is none of an application's business, because all other apps run very well.

When will it support the newest kernel and qemu?

https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/pILY0WGHhOs
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From zixian.cai at anu.edu.au Wed Nov 23 01:08:41 2022
From: zixian.cai at anu.edu.au (Zixian Cai)
Date: Wed, 23 Nov 2022 01:08:41 +0000
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References:
Message-ID:

This has been discussed in a previous thread. https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000636.html

I agree that it would be nice to support different modes. However, given the patches in QEMU/Kernel that can restrict the OS to run with sv39/sv48 only, and the fact that there are not enough real hardware boards supporting anything above sv48, I don't know whether there will be sufficient motivation to fix the problem in the short term.
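For orientation, the sv39/sv48/sv57 names encode the virtual address width: 12 page-offset bits plus 9 index bits per page-table level. The small Java sketch below works through that arithmetic. It assumes the common layout in which user space occupies the lower half of the virtual address range, and the class and method names (`SvModes`, `virtualAddressBits`, `userSpaceLimit`, `fitsIn`) are illustrative only, not taken from HotSpot, QEMU or the kernel.

```java
final class SvModes {
    static final int PAGE_OFFSET_BITS = 12; // 4 KiB base pages
    static final int BITS_PER_LEVEL = 9;    // 512 entries per page table

    // 3 levels -> sv39, 4 levels -> sv48, 5 levels -> sv57.
    static int virtualAddressBits(int pageTableLevels) {
        return PAGE_OFFSET_BITS + BITS_PER_LEVEL * pageTableLevels;
    }

    // Assumption: user space gets the lower half of the address range,
    // so the first address outside it is 2^(bits - 1).
    static long userSpaceLimit(int pageTableLevels) {
        return 1L << (virtualAddressBits(pageTableLevels) - 1);
    }

    static boolean fitsIn(long address, int pageTableLevels) {
        return Long.compareUnsigned(address, userSpaceLimit(pageTableLevels)) < 0;
    }

    public static void main(String[] args) {
        for (int levels = 3; levels <= 5; levels++) {
            System.out.printf("sv%d: user space below 0x%x%n",
                    virtualAddressBits(levels), userSpaceLimit(levels));
        }
        long high = 1L << 50; // a hypothetical address only a 5-level mode can map
        System.out.println(fitsIn(high, 4));
        System.out.println(fitsIn(high, 5));
    }
}
```

Under these assumptions, an address such as 2^50 is valid with five levels but out of range for sv48, which is the kind of mismatch software that assumes 48-bit addresses can run into.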
Also worth noting that until very recently (Intel implemented 5-level paging around the release of Ice Lake), x86_64 stayed at 48 bits for a long time.

Sincerely,
Zixian

On 23/11/2022, 11:58, "riscv-port-dev" wrote:

hi,
OpenJDK 20 crashes on Linux kernel 5.19. Is it because it cannot support a huge VM?

But I think OpenJDK, as an application, should not have any limitation on virtual address length, even if it has a very close relationship with the hardware. Just as qemu should not depend on the hardware VM size, it is none of an application's business, because all other apps run very well.

When will it support the newest kernel and qemu?

https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/pILY0WGHhOs
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From vladimir.kempik at gmail.com Wed Nov 23 08:33:00 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Wed, 23 Nov 2022 11:33:00 +0300
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References:
Message-ID:

Hello

A kind of workaround for this case:

disable sv57 in csr.c and rebuild qemu - https://github.com/qemu/qemu/blob/master/target/riscv/csr.c#L1027 (put 0 there)

Regards, Vladimir

> On 23 Nov 2022, at 04:08, Zixian Cai wrote:
>
> This has been discussed in a previous thread. https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000636.html
>
> I agree that it would be nice to support different modes. However, given the patches in QEMU/Kernel that can restrict the OS to run with sv39/sv48 only, and the fact that there are not enough real hardware boards supporting anything above sv48, I don't know whether there will be sufficient motivation to fix the problem in the short term.
>
> Also worth noting that until very recently (Intel implemented 5-level paging around the release of Ice Lake), x86_64 stayed at 48 bits for a long time.
>
> Sincerely,
> Zixian
>
> On 23/11/2022, 11:58, "riscv-port-dev" wrote:
>
> hi,
> OpenJDK 20 crashes on Linux kernel 5.19. Is it because it cannot support a huge VM?
>
> But I think OpenJDK, as an application, should not have any limitation on virtual address length, even if it has a very close relationship with the hardware. Just as qemu should not depend on the hardware VM size, it is none of an application's business, because all other apps run very well.
>
> When will it support the newest kernel and qemu?
>
> https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/pILY0WGHhOs
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From vladimir.kempik at gmail.com Wed Nov 23 08:49:22 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Wed, 23 Nov 2022 11:49:22 +0300
Subject: Pre-Review: improving Math.min/max on floats
In-Reply-To: <4944EA48-58F5-47AC-A838-06BA2A8134CE@gmail.com>
References: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com> <1a63a094.3bff7.18489c611a9.Coremail.yangfei@iscas.ac.cn> <4944EA48-58F5-47AC-A838-06BA2A8134CE@gmail.com>
Message-ID:

Hello

I got results for the new version [1]. It shows excellent perf improvements on thead and moderate ones on hifive (and it's better than both previous versions on hifive).

thead c910

before

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  53752.831 ±  97.198  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  53707.229 ± 177.559  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  42805.985 ±   9.901  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  53449.568 ± 215.294  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  53504.106 ± 180.833  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  42794.579 ±   7.013  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    381.138 ±   5.692  us/op
MaxMinOptimizeTest.dMax        avgt   25   4575.094 ±  17.065  us/op
MaxMinOptimizeTest.dMin        avgt   25   4584.648 ±  18.561  us/op
MaxMinOptimizeTest.dMul        avgt   25    384.615 ±   7.751  us/op
MaxMinOptimizeTest.fAdd        avgt   25    318.076 ±   3.308  us/op
MaxMinOptimizeTest.fMax        avgt   25   4405.724 ±  20.353  us/op
MaxMinOptimizeTest.fMin        avgt   25   4421.652 ±  18.029  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.462 ±  19.437  us/op

2fclass_new

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  10712.246 ±   5.607  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  10732.655 ±  41.894  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   3248.106 ±   2.143  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  10707.084 ±   3.276  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  10719.771 ±  14.864  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   3274.775 ±   0.996  ns/op
MaxMinOptimizeTest.dAdd        avgt   25    383.720 ±   8.849  us/op
MaxMinOptimizeTest.dMax        avgt   25    429.345 ±  11.160  us/op
MaxMinOptimizeTest.dMin        avgt   25    439.980 ±   3.757  us/op
MaxMinOptimizeTest.dMul        avgt   25    390.126 ±  10.258  us/op
MaxMinOptimizeTest.fAdd        avgt   25    300.005 ±  18.206  us/op
MaxMinOptimizeTest.fMax        avgt   25    370.467 ±   6.054  us/op
MaxMinOptimizeTest.fMin        avgt   25    375.134 ±   4.568  us/op
MaxMinOptimizeTest.fMul        avgt   25    305.344 ±  18.307  us/op

hifive

before

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  30234.224 ±  16.744  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  30227.686 ±  15.389  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25  15766.749 ±   3.724  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  30140.092 ±  10.243  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  30149.470 ±  34.041  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25  15760.770 ±   5.415  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1155.234 ±   4.603  us/op
MaxMinOptimizeTest.dMax        avgt   25   2597.897 ±   3.307  us/op
MaxMinOptimizeTest.dMin        avgt   25   2599.183 ±   3.806  us/op
MaxMinOptimizeTest.dMul        avgt   25   1155.281 ±   1.813  us/op
MaxMinOptimizeTest.fAdd        avgt   25    750.967 ±   7.254  us/op
MaxMinOptimizeTest.fMax        avgt   25   2305.085 ±   1.556  us/op
MaxMinOptimizeTest.fMin        avgt   25   2305.306 ±   1.478  us/op
MaxMinOptimizeTest.fMul        avgt   25    750.623 ±   7.357  us/op

2fclass_new

Benchmark                      Mode  Cnt      Score     Error  Units
FpMinMaxIntrinsics.dMax        avgt   25  23599.547 ±  29.571  ns/op
FpMinMaxIntrinsics.dMin        avgt   25  23593.236 ±  18.456  ns/op
FpMinMaxIntrinsics.dMinReduce  avgt   25   8630.201 ±   1.353  ns/op
FpMinMaxIntrinsics.fMax        avgt   25  23496.337 ±  18.340  ns/op
FpMinMaxIntrinsics.fMin        avgt   25  23477.881 ±   8.545  ns/op
FpMinMaxIntrinsics.fMinReduce  avgt   25   8629.135 ±   0.869  ns/op
MaxMinOptimizeTest.dAdd        avgt   25   1155.479 ±   4.938  us/op
MaxMinOptimizeTest.dMax        avgt   25   1560.323 ±   3.077  us/op
MaxMinOptimizeTest.dMin        avgt   25   1558.668 ±   2.421  us/op
MaxMinOptimizeTest.dMul        avgt   25   1154.919 ±   2.077  us/op
MaxMinOptimizeTest.fAdd        avgt   25    751.325 ±   7.169  us/op
MaxMinOptimizeTest.fMax        avgt   25   1306.131 ±   1.102  us/op
MaxMinOptimizeTest.fMin        avgt   25   1306.134 ±   0.957  us/op
MaxMinOptimizeTest.fMul        avgt   25    750.968 ±   7.334  us/op

Regards, Vladimir

[1] https://github.com/VladimirKempik/jdk/commit/fda44a8521f19b25d0fe155531d4bd1e3d7870a5

> On 22 Nov 2022, at 12:05, Vladimir Kempik wrote:
>
> Hello Fei
>
> I think I can reduce the number of opcodes for the second version, but I need a second temp register for that (to OR the two fclass results and check just once for NaN).
> Then it would look like:
>
>     is_double ? fclass_d(t0, src1)
>               : fclass_s(t0, src1);
>     is_double ? fclass_d(t1, src2)
>               : fclass_s(t1, src2);
>     orr(t0, t0, t1);
>     andi(t0, t0, 0b1100000000); // if either src is a quiet or signaling NaN then return their sum
>     beqz(t0, Compare);
>     is_double ? fadd_d(dst, src1, src2)
>               : fadd_s(dst, src1, src2);
>     j(Done);
>
>     bind(Compare);
>
> Any hints on how to get a second temp register?
>
> Regards, Vladimir
>
>> On 22 Nov 2022, at 11:28, Vladimir Kempik wrote:
>>
>> Hello
>>
>> Found an issue with the fadd+fclass version:
>>
>> jdk/incubator/vector/FloatMaxVectorTests.java
>>
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
>> java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
>> java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
>> java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>>
>> And the 2fclass version (checking every src argument for NaN) doesn't have this issue.
>> So I think I'll have to go the v2 way.
>>
>> Regards, Vladimir.
>>
>>> On 18 Nov 2022, at 12:49, Vladimir Kempik wrote:
>>>
>>> Hello
>>>
>>> Thanks for taking a look.
>>> I think v1 is better too.
>>>
>>> We have measured fadd latency with lmbench (I can't remember which platform it was, hifive or thead) and it turned out to be just 4 cycles, which is OK I think.
>>>
>>> I have also benchmarked it on a third platform, a shallow OoO core with a dual-issue FPU on an FPGA, and it showed gains similar to thead's.
>>>
>>> I'll run the jtreg tiers and submit a PR afterwards.
>>>
>>> Regards, Vladimir
>>>
>>>> On 18 Nov 2022, at 11:06, yangfei at iscas.ac.cn wrote:
>>>>
>>>> Hi,
>>>>
>>>> I went through both versions and it looks like the resulting performance gain will depend on the micro-architecture implementation.
>>>>
>>>> Personally I prefer the first version in respect of instruction count (5 compared with 7 instructions when the inputs are not NaNs) and code readability.
>>>>
>>>> PS: I would also suggest carrying out more conformance/compatibility testing as needed for these changes.
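One plausible reading of the failures reported for the fadd+fclass variant: the corner-case inputs include both infinities, and Infinity + (-Infinity) is NaN under IEEE 754, so a sum-based probe reports NaN although neither input is one. The Java sketch below only models the two strategies in plain Java to show that effect. `riscvFmax`, `maxViaSum` and `maxViaClassCheck` are hypothetical names for this illustration, not the code from the linked commits.

```java
final class MinMaxSketch {
    // Models RISC-V fmax: if exactly one operand is NaN, the other one is returned.
    static float riscvFmax(float a, float b) {
        if (Float.isNaN(a)) return b;
        if (Float.isNaN(b)) return a;
        return Math.max(a, b);
    }

    // v1 idea: use a + b as the NaN probe, fall back to fmax otherwise.
    static float maxViaSum(float a, float b) {
        float sum = a + b;
        return Float.isNaN(sum) ? sum : riscvFmax(a, b);
    }

    // v2 idea: classify each input separately, only then decide.
    static float maxViaClassCheck(float a, float b) {
        if (Float.isNaN(a) || Float.isNaN(b)) {
            return a + b; // NaN propagates through the addition
        }
        return riscvFmax(a, b);
    }

    public static void main(String[] args) {
        float inf = Float.POSITIVE_INFINITY;
        float negInf = Float.NEGATIVE_INFINITY;
        System.out.println(Math.max(inf, negInf));         // prints Infinity
        System.out.println(maxViaSum(inf, negInf));        // prints NaN
        System.out.println(maxViaClassCheck(inf, negInf)); // prints Infinity
        System.out.println(maxViaClassCheck(1.0f, Float.NaN)); // prints NaN
    }
}
```

In this model the per-operand check agrees with `Math.max` for the infinity pair, while the sum-based probe does not, which matches the "expected [Infinity] but found [NaN]" pattern in the test output.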
>>>> >>>> >>>> >>>> Thanks, >>>> >>>> Fei >>>> >>>> >>>> -----Original Messages----- >>>> From:"Vladimir Kempik" >>>> Sent Time:2022-11-15 15:55:32 (Tuesday) >>>> To: riscv-port-dev >>>> Cc: >>>> Subject: Pre-Review: improving Math.min/max on floats >>>> >>>> Hello >>>> Currently, in C2, Math.min/max is implemented in c2_MacroAssembler_riscv.cpp using >>>> >>>> void C2_MacroAssembler::minmax_FD(FloatRegister dst, FloatRegister src1, FloatRegister src2, bool is_double, bool is_min) >>>> >>>> The main issue there is Min/Max is required to return NaN if any of its arguments is NaN. In risc-v, fmin/fmax returns NaN only if both of src registers is NaN ( quiet NaN). >>>> That requires additional logic to handle the case where only of of src is NaN. >>>> Currently it?s done this way ( i?ve reduced is_double and is_min case for readability) >>>> >>>> >>>> fmax_s(dst, src1, src2); >>>> // Checking NaNs >>>> flt_s(zr, src1, src2); >>>> >>>> frflags(t0); >>>> beqz(t0, Done); >>>> >>>> // In case of NaNs >>>> fadd_s(dst, src1, src2); >>>> >>>> bind(Done); >>>> >>>> >>>> here we always do two float comparisons ( one in fmax, one in flt), perf shows they are taking equal time ( checking on thead c910) >>>> >>>> I think that?s suboptimal and can be improved: first, move the check before fmin/fmax and if check fails return NaN without doing fmax >>>> second thing: >>>> >>>> I have prepared two version, first one [1] sums src1 and src2, if result is NaN - return it, result is checked with fclass, checking for quiet NaN and signaling NaN. >>>> if result of sum is not NaN - do fmax and return result. >>>> >>>> second version [2] checks both src1 and src2 for being NaN with fclass, without doing any FP arithmetics. if any of them is NaN - return NaN, otherwise do the fmax. >>>> >>>> I have built both versions and compared results to unpatched JDK on hifive unmatched and thead c910. 
>>>> While on hifive the perf win is moderate ( ~10%), on thead I?m getting up to 10x better results sometimes. >>>> >>>> MicroBenches fAdd/fMul/dAdd/dMul doesn?t show any difference, I think that happens because these >>>> private double dAddBench(double a, double b) { return Math.max(a, b) + Math.min(a, b); } private double dMulBench(double a, double b) { return Math.max(a, b) * Math.min(a, b); } >>>> may get reduces to just a + b and a*b respectively. >>>> >>>> Looking for opinions, which way is better. >>>> >>>> The results, thead c910: >>>> >>>> before >>>> >>>> Benchmark Mode Cnt Score Error Units >>>> FpMinMaxIntrinsics.dMax avgt 25 54023.827 ? 268.645 ns/op >>>> FpMinMaxIntrinsics.dMin avgt 25 54309.850 ? 323.551 ns/op >>>> FpMinMaxIntrinsics.dMinReduce avgt 25 42192.140 ? 12.114 ns/op >>>> FpMinMaxIntrinsics.fMax avgt 25 53797.657 ? 15.816 ns/op >>>> FpMinMaxIntrinsics.fMin avgt 25 54135.710 ? 313.185 ns/op >>>> FpMinMaxIntrinsics.fMinReduce avgt 25 42196.156 ? 13.424 ns/op >>>> MaxMinOptimizeTest.dAdd avgt 25 650.810 ? 169.998 us/op >>>> MaxMinOptimizeTest.dMax avgt 25 4561.967 ? 40.367 us/op >>>> MaxMinOptimizeTest.dMin avgt 25 4589.100 ? 75.854 us/op >>>> MaxMinOptimizeTest.dMul avgt 25 759.821 ? 240.092 us/op >>>> MaxMinOptimizeTest.fAdd avgt 25 300.137 ? 13.495 us/op >>>> MaxMinOptimizeTest.fMax avgt 25 4348.885 ? 20.061 us/op >>>> MaxMinOptimizeTest.fMin avgt 25 4372.799 ? 27.296 us/op >>>> MaxMinOptimizeTest.fMul avgt 25 304.024 ? 12.120 us/op >>>> >>>> fadd+fclass >>>> >>>> Benchmark Mode Cnt Score Error Units >>>> FpMinMaxIntrinsics.dMax avgt 25 10545.196 ? 140.137 ns/op >>>> FpMinMaxIntrinsics.dMin avgt 25 10454.525 ? 9.972 ns/op >>>> FpMinMaxIntrinsics.dMinReduce avgt 25 3104.703 ? 0.892 ns/op >>>> FpMinMaxIntrinsics.fMax avgt 25 10449.709 ? 7.284 ns/op >>>> FpMinMaxIntrinsics.fMin avgt 25 10445.261 ? 7.206 ns/op >>>> FpMinMaxIntrinsics.fMinReduce avgt 25 3104.769 ? 0.951 ns/op >>>> MaxMinOptimizeTest.dAdd avgt 25 487.769 ? 
170.711 us/op >>>> MaxMinOptimizeTest.dMax avgt 25 929.394 ? 158.697 us/op >>>> MaxMinOptimizeTest.dMin avgt 25 864.230 ? 284.794 us/op >>>> MaxMinOptimizeTest.dMul avgt 25 894.116 ? 342.550 us/op >>>> MaxMinOptimizeTest.fAdd avgt 25 284.664 ? 1.446 us/op >>>> MaxMinOptimizeTest.fMax avgt 25 384.388 ? 15.004 us/op >>>> MaxMinOptimizeTest.fMin avgt 25 371.952 ? 15.295 us/op >>>> MaxMinOptimizeTest.fMul avgt 25 305.226 ? 12.467 us/op >>>> >>>> >>>> 2fclass >>>> Benchmark Mode Cnt Score Error Units >>>> FpMinMaxIntrinsics.dMax avgt 25 11415.817 ? 403.757 ns/op >>>> FpMinMaxIntrinsics.dMin avgt 25 11835.521 ? 329.380 ns/op >>>> FpMinMaxIntrinsics.dMinReduce avgt 25 5188.436 ? 3.723 ns/op >>>> FpMinMaxIntrinsics.fMax avgt 25 11667.456 ? 426.731 ns/op >>>> FpMinMaxIntrinsics.fMin avgt 25 11646.682 ? 416.883 ns/op >>>> FpMinMaxIntrinsics.fMinReduce avgt 25 5190.395 ? 3.628 ns/op >>>> MaxMinOptimizeTest.dAdd avgt 25 745.417 ? 209.376 us/op >>>> MaxMinOptimizeTest.dMax avgt 25 581.580 ? 38.046 us/op >>>> MaxMinOptimizeTest.dMin avgt 25 533.442 ? 41.184 us/op >>>> MaxMinOptimizeTest.dMul avgt 25 654.667 ? 267.537 us/op >>>> MaxMinOptimizeTest.fAdd avgt 25 294.606 ? 11.712 us/op >>>> MaxMinOptimizeTest.fMax avgt 25 433.842 ? 3.935 us/op >>>> MaxMinOptimizeTest.fMin avgt 25 434.727 ? 1.894 us/op >>>> MaxMinOptimizeTest.fMul avgt 25 305.385 ? 12.980 us/op >>>> >>>> >>>> hifive: >>>> >>>> before >>>> Benchmark Mode Cnt Score Error Units >>>> FpMinMaxIntrinsics.dMax avgt 25 30219.666 ? 12.878 ns/op >>>> FpMinMaxIntrinsics.dMin avgt 25 30242.249 ? 31.374 ns/op >>>> FpMinMaxIntrinsics.dMinReduce avgt 25 15394.622 ? 2.803 ns/op >>>> FpMinMaxIntrinsics.fMax avgt 25 30150.114 ? 22.421 ns/op >>>> FpMinMaxIntrinsics.fMin avgt 25 30149.752 ? 20.813 ns/op >>>> FpMinMaxIntrinsics.fMinReduce avgt 25 15396.402 ? 4.251 ns/op >>>> MaxMinOptimizeTest.dAdd avgt 25 1143.582 ? 4.444 us/op >>>> MaxMinOptimizeTest.dMax avgt 25 2556.317 ? 3.795 us/op >>>> MaxMinOptimizeTest.dMin avgt 25 2556.569 ? 
2.274 us/op >>>> MaxMinOptimizeTest.dMul avgt 25 1142.769 ? 1.593 us/op >>>> MaxMinOptimizeTest.fAdd avgt 25 748.688 ? 7.342 us/op >>>> MaxMinOptimizeTest.fMax avgt 25 2280.381 ? 1.535 us/op >>>> MaxMinOptimizeTest.fMin avgt 25 2280.760 ? 1.532 us/op >>>> MaxMinOptimizeTest.fMul avgt 25 748.991 ? 7.261 us/op >>>> >>>> fadd+fclass >>>> >>>> Benchmark Mode Cnt Score Error Units >>>> FpMinMaxIntrinsics.dMax avgt 25 27723.791 ? 22.784 ns/op >>>> FpMinMaxIntrinsics.dMin avgt 25 27760.799 ? 45.411 ns/op >>>> FpMinMaxIntrinsics.dMinReduce avgt 25 12875.949 ? 2.829 ns/op >>>> FpMinMaxIntrinsics.fMax avgt 25 25992.753 ? 23.788 ns/op >>>> FpMinMaxIntrinsics.fMin avgt 25 25994.554 ? 32.060 ns/op >>>> FpMinMaxIntrinsics.fMinReduce avgt 25 11200.737 ? 2.169 ns/op >>>> MaxMinOptimizeTest.dAdd avgt 25 1144.128 ? 4.371 us/op >>>> MaxMinOptimizeTest.dMax avgt 25 1968.145 ? 2.346 us/op >>>> MaxMinOptimizeTest.dMin avgt 25 1970.249 ? 4.712 us/op >>>> MaxMinOptimizeTest.dMul avgt 25 1143.356 ? 2.203 us/op >>>> MaxMinOptimizeTest.fAdd avgt 25 748.634 ? 7.229 us/op >>>> MaxMinOptimizeTest.fMax avgt 25 1523.719 ? 0.570 us/op >>>> MaxMinOptimizeTest.fMin avgt 25 1524.534 ? 1.109 us/op >>>> MaxMinOptimizeTest.fMul avgt 25 748.643 ? 7.291 us/op >>>> >>>> 2fclass >>>> >>>> Benchmark Mode Cnt Score Error Units >>>> FpMinMaxIntrinsics.dMax avgt 25 26890.963 ? 13.928 ns/op >>>> FpMinMaxIntrinsics.dMin avgt 25 26919.595 ? 23.140 ns/op >>>> FpMinMaxIntrinsics.dMinReduce avgt 25 11928.938 ? 1.985 ns/op >>>> FpMinMaxIntrinsics.fMax avgt 25 26843.782 ? 27.956 ns/op >>>> FpMinMaxIntrinsics.fMin avgt 25 26825.124 ? 24.104 ns/op >>>> FpMinMaxIntrinsics.fMinReduce avgt 25 11927.765 ? 1.238 ns/op >>>> MaxMinOptimizeTest.dAdd avgt 25 1144.860 ? 3.467 us/op >>>> MaxMinOptimizeTest.dMax avgt 25 1881.809 ? 1.986 us/op >>>> MaxMinOptimizeTest.dMin avgt 25 1882.623 ? 2.225 us/op >>>> MaxMinOptimizeTest.dMul avgt 25 1142.860 ? 1.755 us/op >>>> MaxMinOptimizeTest.fAdd avgt 25 752.557 ? 
8.708 us/op >>>> MaxMinOptimizeTest.fMax avgt 25 1587.139 ? 0.903 us/op >>>> MaxMinOptimizeTest.fMin avgt 25 1587.140 ? 1.067 us/op >>>> MaxMinOptimizeTest.fMul avgt 25 748.653 ? 7.278 us/op >>>> >>>> Regards, Vladimir >>>> >>>> P.S. for some reason I can?t use mv opcode on two FloatRegisters ( I think it was possible before) and had to use fmv_s/fmv_d which might be not exactly what I want. >>>> >>>> [1] https://github.com/VladimirKempik/jdk/commit/b6752492f7efd82e248e49e136dc9f5929cc19a2 >>>> [2] https://github.com/VladimirKempik/jdk/commit/384efc3ca59c2e301ec43f8d716f142828d2ac6a >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vladimir.kempik at gmail.com Wed Nov 23 09:10:37 2022 From: vladimir.kempik at gmail.com (Vladimir Kempik) Date: Wed, 23 Nov 2022 12:10:37 +0300 Subject: Pre-Review: improving Math.min/max on floats In-Reply-To: <4944EA48-58F5-47AC-A838-06BA2A8134CE@gmail.com> References: <8C202DCD-5E93-4C8D-B2F8-35E98207BB0B@gmail.com> <1a63a094.3bff7.18489c611a9.Coremail.yangfei@iscas.ac.cn> <4944EA48-58F5-47AC-A838-06BA2A8134CE@gmail.com> Message-ID: <45C8AAF1-8BFD-4CEF-8A75-8AA6C3169B1B@gmail.com> Hello Got a results for new [1] version it shows excelent perf improvements on thead and moderate on hifive ( and it?s better than both previous versions on hifive) thead c910 before Benchmark Mode Cnt Score Error Units FpMinMaxIntrinsics.dMax avgt 25 53752.831 ? 97.198 ns/op FpMinMaxIntrinsics.dMin avgt 25 53707.229 ? 177.559 ns/op FpMinMaxIntrinsics.dMinReduce avgt 25 42805.985 ? 9.901 ns/op FpMinMaxIntrinsics.fMax avgt 25 53449.568 ? 215.294 ns/op FpMinMaxIntrinsics.fMin avgt 25 53504.106 ? 180.833 ns/op FpMinMaxIntrinsics.fMinReduce avgt 25 42794.579 ? 7.013 ns/op MaxMinOptimizeTest.dAdd avgt 25 381.138 ? 5.692 us/op MaxMinOptimizeTest.dMax avgt 25 4575.094 ? 17.065 us/op MaxMinOptimizeTest.dMin avgt 25 4584.648 ? 18.561 us/op MaxMinOptimizeTest.dMul avgt 25 384.615 ? 
7.751 us/op MaxMinOptimizeTest.fAdd avgt 25 318.076 ? 3.308 us/op MaxMinOptimizeTest.fMax avgt 25 4405.724 ? 20.353 us/op MaxMinOptimizeTest.fMin avgt 25 4421.652 ? 18.029 us/op MaxMinOptimizeTest.fMul avgt 25 305.462 ? 19.437 us/op 2fclass_new Benchmark Mode Cnt Score Error Units FpMinMaxIntrinsics.dMax avgt 25 10712.246 ? 5.607 ns/op FpMinMaxIntrinsics.dMin avgt 25 10732.655 ? 41.894 ns/op FpMinMaxIntrinsics.dMinReduce avgt 25 3248.106 ? 2.143 ns/op FpMinMaxIntrinsics.fMax avgt 25 10707.084 ? 3.276 ns/op FpMinMaxIntrinsics.fMin avgt 25 10719.771 ? 14.864 ns/op FpMinMaxIntrinsics.fMinReduce avgt 25 3274.775 ? 0.996 ns/op MaxMinOptimizeTest.dAdd avgt 25 383.720 ? 8.849 us/op MaxMinOptimizeTest.dMax avgt 25 429.345 ? 11.160 us/op MaxMinOptimizeTest.dMin avgt 25 439.980 ? 3.757 us/op MaxMinOptimizeTest.dMul avgt 25 390.126 ? 10.258 us/op MaxMinOptimizeTest.fAdd avgt 25 300.005 ? 18.206 us/op MaxMinOptimizeTest.fMax avgt 25 370.467 ? 6.054 us/op MaxMinOptimizeTest.fMin avgt 25 375.134 ? 4.568 us/op MaxMinOptimizeTest.fMul avgt 25 305.344 ? 18.307 us/op hifive before Benchmark Mode Cnt Score Error Units FpMinMaxIntrinsics.dMax avgt 25 30234.224 ? 16.744 ns/op FpMinMaxIntrinsics.dMin avgt 25 30227.686 ? 15.389 ns/op FpMinMaxIntrinsics.dMinReduce avgt 25 15766.749 ? 3.724 ns/op FpMinMaxIntrinsics.fMax avgt 25 30140.092 ? 10.243 ns/op FpMinMaxIntrinsics.fMin avgt 25 30149.470 ? 34.041 ns/op FpMinMaxIntrinsics.fMinReduce avgt 25 15760.770 ? 5.415 ns/op MaxMinOptimizeTest.dAdd avgt 25 1155.234 ? 4.603 us/op MaxMinOptimizeTest.dMax avgt 25 2597.897 ? 3.307 us/op MaxMinOptimizeTest.dMin avgt 25 2599.183 ? 3.806 us/op MaxMinOptimizeTest.dMul avgt 25 1155.281 ? 1.813 us/op MaxMinOptimizeTest.fAdd avgt 25 750.967 ? 7.254 us/op MaxMinOptimizeTest.fMax avgt 25 2305.085 ? 1.556 us/op MaxMinOptimizeTest.fMin avgt 25 2305.306 ? 1.478 us/op MaxMinOptimizeTest.fMul avgt 25 750.623 ? 7.357 us/op 2fclass_new Benchmark Mode Cnt Score Error Units FpMinMaxIntrinsics.dMax avgt 25 23599.547 ? 
29.571 ns/op FpMinMaxIntrinsics.dMin avgt 25 23593.236 ? 18.456 ns/op FpMinMaxIntrinsics.dMinReduce avgt 25 8630.201 ? 1.353 ns/op FpMinMaxIntrinsics.fMax avgt 25 23496.337 ? 18.340 ns/op FpMinMaxIntrinsics.fMin avgt 25 23477.881 ? 8.545 ns/op FpMinMaxIntrinsics.fMinReduce avgt 25 8629.135 ? 0.869 ns/op MaxMinOptimizeTest.dAdd avgt 25 1155.479 ? 4.938 us/op MaxMinOptimizeTest.dMax avgt 25 1560.323 ? 3.077 us/op MaxMinOptimizeTest.dMin avgt 25 1558.668 ? 2.421 us/op MaxMinOptimizeTest.dMul avgt 25 1154.919 ? 2.077 us/op MaxMinOptimizeTest.fAdd avgt 25 751.325 ? 7.169 us/op MaxMinOptimizeTest.fMax avgt 25 1306.131 ? 1.102 us/op MaxMinOptimizeTest.fMin avgt 25 1306.134 ? 0.957 us/op MaxMinOptimizeTest.fMul avgt 25 750.968 ? 7.334 us/op Regards, Vladimir [1] https://github.com/VladimirKempik/jdk/commit/fda44a8521f19b25d0fe155531d4bd1e3d7870a5 > 22 ????. 2022 ?., ? 12:05, Vladimir Kempik ???????(?): > > Hello Fei > > I think I can reduce the amount of opcodes for second version, but I need a second temp register for that ( to AND two results of fclass and check it just once for NaN) > then it would look like: > > is_double ? fclass_d(t0, src1) > : fclass_s(t0, src1); > is_double ? fclass_d(t1, src2) > : fclass_s(t1, src2); > and(t0, t0, t1); > andi(t0, t0, 0b1100000000); //if any of src is quiet or signaling NaN then return their sum > beqz(t0, Compare); > is_double ? fadd_d(dst, src1, src2) > : fadd_s(dst, src1, src2); > j(Done); > > bind(Compare); > > Any Hints on how to get a second temp register ? > > Regards, Vladimir > >> 22 ????. 2022 ?., ? 
11:28, Vladimir Kempik wrote:
>>
>> Hello
>>
>> Found an issue with fadd+fclass version:
>>
>> jdk/incubator/vector/FloatMaxVectorTests.java
>>
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i * 5]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[i + 1]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
>> java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
>> java.lang.AssertionError: at index #10 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
>> test FloatMaxVectorTests.MAXReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
>> java.lang.AssertionError: at index #2 expected [Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i * 5]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[i + 1]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTests(float[cornerCaseValue(i)]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[i % 2]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[i % 2]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[i % 2]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>> --
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i * 5], mask[true]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[i + 1], mask[true]): success
>> test FloatMaxVectorTests.MINReduceFloatMaxVectorTestsMasked(float[cornerCaseValue(i)], mask[true]): failure
>> java.lang.AssertionError: at index #2 expected [-Infinity] but found [NaN]
>> at org.testng.Assert.fail(Assert.java:99)
>>

From ludovic at rivosinc.com Wed Nov 23 23:25:19 2022
From: ludovic at rivosinc.com (Ludovic Henry)
Date: Thu, 24 Nov 2022 00:25:19 +0100
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References:
Message-ID:

Hi,

We are currently working on contributing to Qemu a command-line option to disable/enable certain modes in Qemu. I'll keep you posted as soon as I have anything material to share.

The solution on the OpenJDK side should IMO be to probe at startup for the satp mode (sv39/sv48/sv57/sv64) and generate the appropriate and cheapest movptr according to this value. I wouldn't want to pay the full cost of sv57 or sv64 while no existing boards or hardware even support anything more than sv48. Especially given the current discussions in RISC-V on reducing the cost of auipc/jalr, movptr, and trampolines.

Thanks,
Ludovic

On Wed, Nov 23, 2022 at 9:33 AM Vladimir Kempik wrote:

> Hello
>
> A kind of workaround for this case
>
> disable sv57 csr and rebuild qemu -
> https://github.com/qemu/qemu/blob/master/target/riscv/csr.c#L1027 put 0
> here
>
> Regards, Vladimir
>
> On 23 Nov 2022, at 04:08, Zixian Cai wrote:
>
> This has been discussed in a previous thread.
> https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000636.html
>
> I agree that it would be nice to support different modes. Although with
> the patches in QEMU/Kernel that can restrict the OS to run with sv39/sv48
> only, and the fact that there are not enough real hardware boards supporting
> above sv48, I don't know whether there will be sufficient motivation to fix
> the problem in the short term.
>
> Also worth noting until very recently (Intel implemented 5-level paging
> around the release of Ice Lake), x86_64 has been staying at 48 bits for a long
> time.
>
> Sincerely,
> Zixian
>
>
> On 23/11/2022, 11:58, "riscv-port-dev" wrote:
>
> hi,
> openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
>
> but I think openjdk as an application, it should not have any limitation on
> virtual address length,
>
> even if it has very close relationship with hardware, just as qemu should
> not depend on hardware VM size, it's none of business for an application,
> because all other apps can run very well.
>
> when will it support the newest kernel and qemu?
>
> https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/pILY0WGHhOs

From yangfei at iscas.ac.cn Thu Nov 24 10:16:02 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Thu, 24 Nov 2022 18:16:02 +0800 (GMT+08:00)
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References:
Message-ID: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn>

I think at this stage we should do some detection for sv57+ at JVM startup time and explicitly issue a warning and stop early.
That would be better and more elegant than simply crashing the JVM afterwards.
I see this kind of information is available on linux-riscv64 in /proc/cpuinfo:

$ cat /proc/cpuinfo
processor : 0
hart : 1
isa : rv64imafdc
mmu : sv39
uarch : sifive,u74-mc

Regards,
Fei

From vladimir.kempik at gmail.com Thu Nov 24 10:21:19 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Thu, 24 Nov 2022 13:21:19 +0300
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn>
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn>
Message-ID: <90FA447E-F7F3-4513-8467-38B792F04BC3@gmail.com>

Hello

Sounds good. I have checked all three different RISC-V hardware devices I have; all of them have an "mmu : sv39" line in the cpuinfo file.

Regards, Vladimir

From jiangfeilong at huawei.com Thu Nov 24 13:16:17 2022
From: jiangfeilong at huawei.com (jiangfeilong)
Date: Thu, 24 Nov 2022 13:16:17 +0000
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn>
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn>
Message-ID: <41985ab2042f4cceb15554158bb8661d@huawei.com>

Hi,

I'm trying to get the satp mode on QEMU-USER, QEMU-SYSTEM and hardware (HiFive Unmatched) by reading /proc/cpuinfo.
It turns out we can only get the mmu info on QEMU-SYSTEM and hardware.
QEMU-USER returns an empty string when reading /proc/cpuinfo.

Any ideas about that?
Here are the outputs:

QEMU-USER:
$ /riscv-qemu/bin/qemu-riscv64 -L ~/riscv-sysroot/ release/images/jdk/bin/java -version
stap mode:
openjdk version "20" 2022-11-24
OpenJDK Runtime Environment OpenJDK (build 20)
OpenJDK 64-Bit Server VM OpenJDK (build 20, mixed mode)

QEMU-SYSTEM:
[root at fedora-riscv ~]# jdk-satp/bin/java -version
vm_mode: sv48
stap mode: sv48
openjdk version "20" 2022-11-24
OpenJDK Runtime Environment OpenJDK (build 20)
OpenJDK 64-Bit Server VM OpenJDK (build 20, mixed mode)

HiFive Unmatched:
$ jdk-satp/bin/java -version
vm_mode: sv39
uarch: sifive,u74-mc
stap mode: sv39
openjdk version "20" 2022-11-24
OpenJDK Runtime Environment OpenJDK (build 20)
OpenJDK 64-Bit Server VM OpenJDK (build 20, mixed mode)
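As a rough illustration of the startup check discussed in this thread, the sketch below parses the `mmu` field out of the text of /proc/cpuinfo and decides whether to stop early. It is not the actual HotSpot code: the function names (`parse_mmu_mode`, `va_bits_for`, `should_stop_early`) are hypothetical, and treating a missing `mmu` field (the qemu-user case reported above) as "unknown, keep going" is an assumption rather than something the thread settled.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Strip leading and trailing whitespace (spaces, tabs, CR, LF).
static std::string trim(const std::string& s) {
  const char* ws = " \t\r\n";
  std::size_t b = s.find_first_not_of(ws);
  if (b == std::string::npos) return "";
  std::size_t e = s.find_last_not_of(ws);
  return s.substr(b, e - b + 1);
}

// Hypothetical helper: return the value of the first "mmu" field found in
// the given /proc/cpuinfo text, or "" if there is none (for example when
// the file reads as empty, as reported for qemu-user).
std::string parse_mmu_mode(const std::string& cpuinfo) {
  std::istringstream in(cpuinfo);
  std::string line;
  while (std::getline(in, line)) {
    std::size_t colon = line.find(':');
    if (colon == std::string::npos) continue;   // not a "key : value" line
    if (trim(line.substr(0, colon)) != "mmu") continue;
    return trim(line.substr(colon + 1));
  }
  return "";
}

// Virtual address width implied by each satp mode name; 0 means unknown.
int va_bits_for(const std::string& mode) {
  if (mode == "sv39") return 39;
  if (mode == "sv48") return 48;
  if (mode == "sv57") return 57;
  if (mode == "sv64") return 64;
  return 0;
}

// Assumed policy: stop early only when the mode is known to be wider than
// sv48. An unknown mode does not stop the VM.
bool should_stop_early(const std::string& mode) {
  return va_bits_for(mode) > 48;
}
```

A caller would read /proc/cpuinfo into a string once at startup, pass it to `parse_mmu_mode`, and print a warning and exit if `should_stop_early` returns true. The same mode value could later select a cheaper address-materialization sequence for sv39 or sv48, which is the direction suggested earlier in the thread.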
From yangfei at iscas.ac.cn Thu Nov 24 13:43:59 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Thu, 24 Nov 2022 21:43:59 +0800 (GMT+08:00)
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To: <41985ab2042f4cceb15554158bb8661d@huawei.com>
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com>
Message-ID: <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn>

Hi,

I guess qemu-user doesn't provide /proc/cpuinfo for us to use the way qemu-system does.
But it would still be nice if we could enable that detection in qemu-system mode or on real hardware platforms.

Thanks,
Fei

From vladimir.kempik at gmail.com Thu Nov 24 14:12:11 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Thu, 24 Nov 2022 17:12:11 +0300
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To: <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn>
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Hello

Do we really have that issue (with mmu mode) in qemu-user-mode?

I was running qemu7 in user mode with openjdk without any issues.

Regards, Vladimir.
From yangfei at iscas.ac.cn Thu Nov 24 14:32:59 2022
From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn)
Date: Thu, 24 Nov 2022 22:32:59 +0800 (GMT+08:00)
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn>
Message-ID: <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>

Hi,

That's a good question. Maybe someone familiar with qemu could help answer?
I am not sure how qemu-user mode works or whether it is bound to certain kernel versions.
I am told by the people who raised this discussion that they are actually using qemu-system mode.

Thanks,
Fei
> > > > Hi, > > I?m trying to get satp mode on QEMU-USER, QEMU-SYSTEM and > > hardware (HiFive Unmatched) by reading /proc/cpuinfo. > > Turns out we can only get mmu info on QEMU-SYSTEM and hardware. > > QEMU-USER will return empty string when reading /proc/cpuinfo. > > > > Any ideas about that? > > > > > > Here are the outputs: > > QEMU-USER: > > $ /riscv-qemu/bin/qemu-riscv64 -L ~/riscv-sysroot/ release/images/jdk/bin/java -version > > stap mode: > > openjdk version "20" 2022-11-24 > > OpenJDK Runtime Environment OpenJDK (build 20) > > OpenJDK 64-Bit Server VM OpenJDK (build 20, mixed mode) > > > > QEMU-SYSTEM > > [root at fedora-riscv ~]# jdk-satp/bin/java -version > > vm_mode: sv48 > > stap mode: sv48 > > openjdk version "20" 2022-11-24 > > OpenJDK Runtime Environment OpenJDK (build 20) > > OpenJDK 64-Bit Server VM OpenJDK (build 20, mixed mode) > > > > HiFive Unmatched: > > $ jdk-satp/bin/java -version > > vm_mode: sv39 > > uarch: sifive,u74-mc > > stap mode: sv39 > > openjdk version "20" 2022-11-24 > > OpenJDK Runtime Environment OpenJDK (build 20) > > OpenJDK 64-Bit Server VM OpenJDK (build 20, mixed mode) > > > > > > From:riscv-port-dev On Behalf Of yangfei at iscas.ac.cn > > Sent: Thursday, November 24, 2022 6:16 PM > > To: Ludovic Henry > > Cc: Vladimir Kempik ; Zixian Cai ; riscv-port-dev at openjdk.org; Ze Zhang > > Subject: Re: Re: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM? > > > > I think at this stage we should do some detection for sv57+ at JVM startup time and explicitly issue a warning and stop early. > > That would be better and more elegant than simply crashing the JVM afterwards. 
> > I see this kind of information is availble on linux-riscv64 at /proc/cpuinfo: > > > > $ cat /proc/cpuinfo > > processor : 0 > > hart : 1 > > isa : rv64imafdc > > mmu : sv39 > > uarch : sifive,u74-mc > > Regards, > > Fei > > -----Original Messages----- > > From:"Ludovic Henry" > > Sent Time:2022-11-24 07:25:19 (Thursday) > > To: "Vladimir Kempik" > > Cc: "Zixian Cai" , "riscv-port-dev at openjdk.org" , "Ze Zhang" > > Subject: Re: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM? > > Hi, > > > > > > We are currently working on contributing to Qemu a command-line option to disable/enable certain modes in Qemu. I'll keep you posted as soon as I've anything material to share. > > > > > > > > The solution on the OpenJDK should IMO to probe at startup for the satp mode (sv39/sv48/sv57/sv64) and generate the appropriate and cheapest movptr according to this value. I wouldn't want to pay the full cost of sv57 or sv64 while no existing boards or hardware even support anything more than sv48. Especially given the current discussions in RISC-V on reducing the cost of auipc/jalr, movptr, and trampolines. > > > > > > > > Thanks, > > > > Ludovic > > > > > > > > On Wed, Nov 23, 2022 at 9:33 AM Vladimir Kempik wrote: > > > > Hello > > > > > > A kind of workaround for this case > > > > > > > > disable sv57 csr and rebuild qemu - https://github.com/qemu/qemu/blob/master/target/riscv/csr.c#L1027 put 0 here > > > > > > > > Regards, Vladimir > > > > > > 23 ????. 2022 ?., ?04:08, Zixian Cai ???????(?): > > > > > > This has been discussed in a previous thread. https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000636.html > > > > > > > > I agree that it would be nice to support different modes. 
> > Although, with the patches in QEMU/Kernel that can restrict the OS to run
> > with sv39/sv48 only, and the fact that there are not enough real hardware
> > boards supporting anything above sv48, I don't know whether there will be
> > sufficient motivation to fix the problem in the short term.
> >
> > Also worth noting: until very recently (Intel implemented 5-level paging
> > around the release of Ice Lake), x86_64 stayed at 48 bits for a long time.
> >
> > Sincerely,
> > Zixian
> >
> > On 23/11/2022, 11:58, "riscv-port-dev" wrote:
> >
> > hi,
> >
> > openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
> >
> > But I think OpenJDK, as an application, should not have any limitation on
> > virtual address length, even if it has a very close relationship with
> > hardware. Just as qemu should not depend on the hardware VM size, this is
> > none of an application's business, because all other apps run very well.
> >
> > When will it support the newest kernel and qemu?
> >
> > https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/pILY0WGHhOs

From ludovic at rivosinc.com  Thu Nov 24 18:01:26 2022
From: ludovic at rivosinc.com (Ludovic Henry)
Date: Thu, 24 Nov 2022 19:01:26 +0100
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To: <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn> <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Hi.

You wouldn't have that issue in qemu user-mode unless the underlying
architecture supports sv57+. Given x86 is sv48 and I am guessing most of us
are using an x86 machine to run qemu on, it doesn't have that issue.

On Thu, Nov 24, 2022 at 3:33 PM wrote:

> Hi,
>
> That's a good question.
> Maybe someone familiar with qemu could help answer?
> I am not sure how qemu-user mode works and whether it is bound to
> certain kernel versions.
> I am told by the people who raised this discussion that they are
> actually using qemu-system mode.
>
> Thanks,
> Fei
>
> > -----Original Messages-----
> > From: "Vladimir Kempik"
> > Sent Time: 2022-11-24 22:12:11 (Thursday)
> > To: yangfei at iscas.ac.cn
> > Cc: jiangfeilong, "Ludovic Henry" <ludovic at rivosinc.com>, "Zixian Cai", "riscv-port-dev at openjdk.org", "Ze Zhang" <zhangze.linux at gmail.com>
> > Subject: Re: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
> >
> > Hello
> >
> > Do we really have that issue (with mmu mode) in qemu-user-mode?
> >
> > I was running qemu7 in user mode with openjdk without any issues.
> >
> > Regards, Vladimir.
> >
> > > On 24 Nov 2022, at 16:43, yangfei at iscas.ac.cn wrote:
> > >
> > > Hi,
> > >
> > > I guess qemu-user doesn't provide /proc/cpuinfo for us to use like
> > > qemu-system.
> > > But it's still nice if we can enable that detection in qemu-system
> > > mode or on real hardware platforms.
> > >
> > > Thanks,
> > > Fei

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zhangze.linux at gmail.com  Fri Nov 25 04:16:17 2022
From: zhangze.linux at gmail.com (Ze Zhang)
Date: Fri, 25 Nov 2022 12:16:17 +0800
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn> <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Only qemu 7.10 system mode + RISC-V kernel 5.19 can provide SV57. So, inside
qemu system mode, I ran qemu-riscv64 -L xxxx /usr/bin/java -version and the
result is a segmentation fault. But other applications also segfault:

root at qemuriscv64:~# qemu-riscv64 -L recipe-sysroot /bin/ls
Segmentation fault

Even with qemu 7.10 system mode (sv57 disabled, so sv48) + RISC-V kernel 5.19,
the result is the same:

root at qemuriscv64:~# cat /proc/cpuinfo
processor : 0
hart : 3
isa : rv64imafdch
mmu : sv48
root at qemuriscv64:~# qemu-riscv64 -L recipe-sysroot /bin/ls
Segmentation fault

I don't know whether this is a qemu issue, but no one will do this in normal
use.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From vladimir.kempik at gmail.com  Fri Nov 25 09:56:29 2022
From: vladimir.kempik at gmail.com (Vladimir Kempik)
Date: Fri, 25 Nov 2022 12:56:29 +0300
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn> <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Hello

> run qemu-riscv64 -L xxxx /usr/bin/java -version, result is Segmentation fault,

That's actually user-mode, isn't it?

Regards, Vladimir

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zhangze.linux at gmail.com  Sat Nov 26 00:06:34 2022
From: zhangze.linux at gmail.com (Ze Zhang)
Date: Sat, 26 Nov 2022 08:06:34 +0800
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn> <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Yes,

everything I tried in the previous mail was user-mode. We can ONLY get an
SV57 VM by running linux kernel 5.19 on qemu 7.10 system mode, so I started
qemu system mode first and then, inside that environment, ran the
qemu-riscv64 user-mode command line. In other words, I ran qemu user mode
inside a qemu system mode linux system; otherwise the test would not cover
SV57.
The openjdk team should do the detailed testing for this; I only care about
qemu system mode.

Sincerely

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ludovic at rivosinc.com  Sat Nov 26 01:49:39 2022
From: ludovic at rivosinc.com (Ludovic Henry)
Date: Sat, 26 Nov 2022 02:49:39 +0100
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn> <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
Message-ID:

The more proper fix in QEMU is being upstreamed with
https://lists.gnu.org/archive/html/qemu-riscv/2022-11/msg00105.html

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zhangze.linux at gmail.com  Sat Nov 26 04:48:22 2022
From: zhangze.linux at gmail.com (Ze Zhang)
Date: Sat, 26 Nov 2022 12:48:22 +0800
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn> <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
Message-ID:

But I think there is an error in this part of the patch:

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 5c9a7ee287..d26b830f1a 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1109,10 +1109,12 @@ static RISCVException read_mstatus(CPURISCVState *env, int csrno,

 static int validate_vm(CPURISCVState *env, target_ulong vm)
 {
+    vm &= 0xf;
+
     if (riscv_cpu_mxl(env) == MXL_RV32) {
-        return valid_vm_1_10_32[vm & 0xf];
+        return valid_vm_1_10_32[vm] && (vm <= RISCV_CPU(env_cpu(env))->cfg.satp_mode);
     } else {
-        return valid_vm_1_10_64[vm & 0xf];
+        return valid_vm_1_10_64[vm] && (vm <= RISCV_CPU(env_cpu(env))->cfg.satp_mode);
     }
 }

Maybe it should be:

 static int validate_vm(CPURISCVState *env, target_ulong vm)
 {
+    vm &= 0xf;
+
     if (riscv_cpu_mxl(env) == MXL_RV32) {
-        return valid_vm_1_10_32[vm & 0xf];
+        if (vm <= RISCV_CPU(env_cpu(env))->cfg.satp_mode)
+        {
+            return valid_vm_1_10_32[vm];
+        } else {
+            return 0;
+        }
     } else {
-        return valid_vm_1_10_64[vm & 0xf];
+        if (vm <= RISCV_CPU(env_cpu(env))->cfg.satp_mode)
+        {
+            return valid_vm_1_10_64[vm];
+        } else {
+            return 0;
+        }
     }
 }

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From zhangze.linux at gmail.com  Sat Nov 26 05:09:17 2022
From: zhangze.linux at gmail.com (Ze Zhang)
Date: Sat, 26 Nov 2022 13:09:17 +0800
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To:
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn> <6726cc10.722.184aa0e5fd5.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Ignore my misunderstanding.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jiangfeilong at huawei.com  Mon Nov 28 03:08:30 2022
From: jiangfeilong at huawei.com (jiangfeilong)
Date: Mon, 28 Nov 2022 03:08:30 +0000
Subject: openjdk 20 crash on linux kernel 5.19, because it can not support huge VM?
In-Reply-To: <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn>
References: <2aeabcd1.38b.184a923222d.Coremail.yangfei@iscas.ac.cn> <41985ab2042f4cceb15554158bb8661d@huawei.com> <28239ffd.5f6.184a9e1812a.Coremail.yangfei@iscas.ac.cn>
Message-ID:

Hi,

Following the /proc/cpuinfo pattern shown earlier, I have added mmu detection
logic. It now prints "Unsupported satp mode" when the JDK is running on an mmu
mode higher than SV48 (rv64) or SV32 (rv32).
Output would be like:

root at qemuriscv64:~# jdk/bin/java -version
Error occurred during initialization of VM
Unsupported stap mode: 10   // 10 represents SV57, which is defined in the privileged ISA

Patch link: https://github.com/openjdk/jdk/commit/946aad566dc0c4e36a53c7d604ed3705cf1be3f1
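
For readers following the thread, the detection discussed above (read the "mmu" line from /proc/cpuinfo, map it to a satp mode, and refuse anything above sv48 on rv64) can be sketched roughly as below. This is only a minimal illustration in C and is not the code from the linked commit. All function and enum names are hypothetical. The numeric values follow the satp MODE encoding in the RISC-V privileged specification (10 is Sv57, matching the output above). Treating a missing "mmu" line as acceptable, as seen with qemu-user, is an assumption of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* satp MODE encodings from the RISC-V privileged specification.
 * SATP_UNKNOWN is a hypothetical sentinel used only by this sketch. */
enum satp_mode {
    SATP_UNKNOWN = -1,
    SATP_BARE = 0,
    SATP_SV32 = 1,   /* RV32 only */
    SATP_SV39 = 8,
    SATP_SV48 = 9,
    SATP_SV57 = 10,
    SATP_SV64 = 11
};

/* Hypothetical helper: map one /proc/cpuinfo line such as
 * "mmu : sv39" to a satp mode. Any other line yields SATP_UNKNOWN. */
enum satp_mode satp_mode_from_cpuinfo_line(const char *line)
{
    static const struct { const char *name; enum satp_mode mode; } known[] = {
        { "sv32", SATP_SV32 }, { "sv39", SATP_SV39 }, { "sv48", SATP_SV48 },
        { "sv57", SATP_SV57 }, { "sv64", SATP_SV64 },
    };
    const char *value;
    size_t i;

    assert(line != NULL);
    if (strncmp(line, "mmu", 3) != 0) {
        return SATP_UNKNOWN;
    }
    value = strchr(line, ':');
    if (value == NULL) {
        return SATP_UNKNOWN;
    }
    value++;                                  /* skip the colon */
    while (*value == ' ' || *value == '\t') { /* skip padding */
        value++;
    }
    for (i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (strncmp(value, known[i].name, 4) == 0) {
            return known[i].mode;
        }
    }
    return SATP_UNKNOWN;
}

/* Hypothetical helper: scan a cpuinfo-style file for the first mmu line.
 * Returns SATP_UNKNOWN if the file is missing or has no mmu line,
 * which is the qemu-user situation reported in this thread. */
enum satp_mode detect_satp_mode(const char *path)
{
    char line[256];
    enum satp_mode mode = SATP_UNKNOWN;
    FILE *f = fopen(path, "r");

    if (f == NULL) {
        return SATP_UNKNOWN;
    }
    while (fgets(line, sizeof line, f) != NULL) {
        mode = satp_mode_from_cpuinfo_line(line);
        if (mode != SATP_UNKNOWN) {
            break;
        }
    }
    fclose(f);
    return mode;
}

/* Policy discussed in the thread for rv64: stop early above sv48.
 * An unknown mode is allowed here (assumption of this sketch). */
int satp_mode_supported_rv64(enum satp_mode mode)
{
    return mode <= SATP_SV48;
}
```

A caller would run detect_satp_mode("/proc/cpuinfo") once at startup and exit with an error message when satp_mode_supported_rv64() returns 0, instead of crashing later.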