From yangfei at iscas.ac.cn Tue Oct 4 07:39:20 2022 From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn) Date: Tue, 4 Oct 2022 15:39:20 +0800 (GMT+08:00) Subject: Discuss the RVC implementation In-Reply-To: <3d27d06d-6cac-44d2-90b7-15b4ebb07ddd.yunyao.zxl@alibaba-inc.com> References: <2d7bbad2-7ade-4b38-91b5-12c4c0a91602.yunyao.zxl@alibaba-inc.com>, <77e347f0.29ad8.183887bfeb5.Coremail.yangfei@iscas.ac.cn> <3d27d06d-6cac-44d2-90b7-15b4ebb07ddd.yunyao.zxl@alibaba-inc.com> Message-ID: <4a86a30d.339e1.183a1ef5784.Coremail.yangfei@iscas.ac.cn> Hi Xiaolin, Thanks for the thorough consierations. Comments inlined. > Hi Felix, > > Thank you for taking the time to consider this, and the discussions. > > I think it's certainly a fairly good observation, regarding the three versions that can theoretically cover any case in combination, in an instruction-level granularity. But in reality, I may have some of my personal practices to share: such might be too fine-grained to implement a high-level control, please let me explain it. > > Let alone correctness, there are also code styles and maintenance that we have to focus on for sure. For example, if we want to rewrite one piece of code[1] with a fixed length by removing the `IncompressibleRegion` thing, to an instruction-level granularity, it might become [2]. Please see my comments in that gist. > > 1. From the code style aspect: > We can see it is not looking so promising. In fact, my RVC prototype was in exactly the same way as your thought (so I guess it might be an intuitive and general thought :-) ), in an instruction-level granularity. And I sadly found the code style was messy even to myself. We have to overload lots of things such as _ld(Register, Address), _ld(Register, address), (see my comments) and so on to fulfill any usage in an incompressible piece of code: the overall API changes (like _ld in any form) are not convergent. > > In the comments from the gist, we can see we certainly have to make incompressible all the callees, even the callees of the callees, and so on, in a transitive relation. For example, the 'la(Register, Address)' API itself must be incompressible if we are in an instruction granularity. So we have to make its callee, 'la(Register, address)' API incompressible as well, and so on. It might be indeed an inferno... Yes, I think the 'la' case here already demonstrates the complexity of my proposal. I agree an 'IncompressibleRegion' mark would be simpler and easier for the developers. > 2. From the compression rate aspect: > Besides, we are just talking about la() here. If we directly mark la()s as incompressible, then the la()s called by actually safe and compressible code will be left as incompressible forever. The compression rate will be definitely lower: the main issue here is, of course, the granularity problem -- instruction-level granularity is too fine-grained, which cannot allow us to make high-level controls. > > The current `CompressibleRegion` combined with `IncompressibleRegion` can implement a function-level granularity (neither too fine nor too coarse), which I think is very suitable for the current backend, that we can use them combined to mark everything without many efforts and with a concentration (like the current implementation: the unified relocate() with a lambda[3] and an IncompressibleRegion hidden inside). 
With them both, we can avoid the above problems with no effort, please see the first line of [1]: the incompressible region directly controls the current function, marking THE 'la' it currently uses incompressible, without affecting the 'la' definitions themselves (movptr, ld ... are as well). So we can avoid lots of invasions to the current backend code base. Nice, right? I have went through your local changes about unified relocate() with a lambda. And I think it looks better and we can go on with this solution if no objections. > 3. From the maintenance aspect: Explicitly adding '_' to every compressible instruction might be a burden for developers and porters. One may say, just adding some '_'s, why burdens? In fact, considering we are porting code like [1] again from AArch64 port. We not only have to change instructions to RISC-V's, but also have to consider RVC... does one instruction have '_' or not? Do its callees, even its callees' callees, have an incompressible version? Even if to myself, it might be a heavy burden :-) I might feel very troublesome - I may just want to ctrl+c and ctrl+v some code without other confusion. So, why not directly throw an `IncompressionRegion` to that stub with a fixed length, so that programmers can normally write their code with the normal "ld", "la" and "addi"? Everything is easily solved without caring for the trifling :-) That makes sense :-) > Just sharing some practices from the same thought and might be verbose again -- there are things not easy to foresee at a glance. When implementing, the pitfalls might be obvious then. From my personal perspective, I may consider the CompressibleRegions plan looks better though, and I am looking forward to your views and suggestions. > > > Best, > Xiaolin > > > [1] https://github.com/zhengxiaolinX/jdk/blob/2ee3204ace5a7767482819be2240982cc0744f8c/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L196-L275 > [2] https://gist.github.com/zhengxiaolinX/3151db356a9001f58827d272c8330bb7 > [3] https://github.com/zhengxiaolinX/jdk/blob/2ee3204ace5a7767482819be2240982cc0744f8c/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2167-L2178 -------------- next part -------------- An HTML attachment was scrubbed... URL: From yangfei at iscas.ac.cn Tue Oct 4 08:35:45 2022 From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn) Date: Tue, 4 Oct 2022 16:35:45 +0800 (GMT+08:00) Subject: Issue with llvm compiled jvm In-Reply-To: <4795B212-813F-4D48-AEC8-3EC6740F55D0@gmail.com> References: <4795B212-813F-4D48-AEC8-3EC6740F55D0@gmail.com> Message-ID: <11a69784.33b38.183a222fd6c.Coremail.yangfei@iscas.ac.cn> > -----Original Messages----- > From: "Vladimir Kempik" > Sent Time: 2022-09-28 22:37:23 (Wednesday) > To: riscv-port-dev at openjdk.org > Cc: > Subject: Issue with llvm compiled jvm > > Hello > > I was playing with clang compiled hotspot and found an issue in one configuration: > clang + sysroot from gcc ( aka link with libgcc_s.so.1) > in such combo the __builtin___clear_cache() function calls __clear_cache from libgcc_s.so which is basically a dummy function doing nothing. > it doesn?t happen when using gcc, it shouldn?t happen if clang is used with compiler-rt libs ( where __clear_cache is properly implemented) > it?s a bug of compiler, but we may want to make a workaround: > > #IFDEF llvm THEN (use old style direct call of syscall OR __riscv_flush_icache(..)) ELSE __builtin___clear_cache(..) > > Looking for opinions - should we implement a workaround in openjdk or just ignore it ? 
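For concreteness, a minimal sketch of what the proposed guard could look like (illustrative only, not a patch against the actual OpenJDK sources; the helper name is a placeholder and keying off `__clang__` is just one possible detection strategy):

```
#include <sys/cachectl.h>   // glibc on riscv64 declares __riscv_flush_icache() here

// Hypothetical helper: flush the instruction cache for [start, end).
static void flush_icache_range(char* start, char* end) {
#if defined(__clang__)
  // With clang linked against libgcc_s, __builtin___clear_cache() may resolve to
  // libgcc's dummy __clear_cache(), so call the glibc wrapper around the
  // riscv_flush_icache syscall directly (flags = 0 requests a flush for all threads).
  __riscv_flush_icache(start, end, 0);
#else
  __builtin___clear_cache(start, end);
#endif
}
```

Whether such a guard belongs in the JDK or should stay a toolchain-side fix is exactly the question raised above.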
I assume that's not an valid deployment for the clang/llvm toolchain, right? It looks strange to me for the clang/llvm toolchain to link the application code to libgcc_s.so. Note that the libgcc_s.so as a low-level runtime library is there only for the GNU/GCC compiler [1]. So I would not suggest we handle this kind of problem in the openjdk code. Thanks, Fei [1] https://gcc.gnu.org/onlinedocs/gccint/Libgcc.html From ludovic at rivosinc.com Tue Oct 4 10:45:59 2022 From: ludovic at rivosinc.com (Ludovic Henry) Date: Tue, 4 Oct 2022 11:45:59 +0100 Subject: Non-zero build crash on kernel 5.17+? In-Reply-To: References: Message-ID: Hello, Some updates on working around that issue. In order to force QEMU to use sv48 in place of sv57, you can comment out this line and compile your own version of QEMU. There is unfortunately no way currently to disable sv57 through an option, and Linux will probe the CSR SATP to know whether the machine supports sv57 while ignoring the device tree provided by QEMU. I hope that helps, Ludovic On Fri, Sep 23, 2022 at 3:05 PM Ludovic Henry wrote: > Hi, > > I did run into the same issue locally. Unfortunately, there doesn't seem > to be an option to disable sv57 support in Qemu (I couldn't find anything > in the sources either). Using an older kernel (5.17) seems to be the only > solution for now. > > Thanks, > Ludovic > > On Fri, Sep 23, 2022 at 1:17 PM Xiaolin Zheng > wrote: > >> Hi Zixian, >> >> The current backend supports sv48 and below only. Please see [1] for more >> details. >> >> The kernel 5.17 supports sv48 and 5.18 supports sv57. Your address ` >> 0xfffffff71136b8` is a 56-bit address, which is not supported by the >> backend currently. >> >> To bypass this issue, you can try to use kernel 5.17 directly or find if >> there are options for QEMU to limit the address space to an sv48 one. >> >> Not sure if there will be support for a larger address space recently in >> the backend, for there seems no hardware supporting even sv48 now. >> >> >> Thanks, >> Xiaolin >> >> [1] >> https://github.com/openjdk/jdk/blob/master/src/hotspot/cpu/riscv/macroAssembler_riscv.hpp#L910-L914 >> >> ------------------------------------------------------------------ >> From:Zixian Cai >> Send Time:2022?9?23?(???) 16:57 >> To:riscv-port-dev at openjdk.org >> Subject:Non-zero build crash on kernel 5.17+? >> >> Hi all, >> >> >> >> I found that a non-zero build of jdk-20+16 crashes on Ubuntu 22.10 >> (kernel 5.19) running on QEMU. >> >> The same build works on Ubuntu 22.04 (kernel 5.15) running on QEMU. >> >> The error message is as follows. >> >> >> >> # To suppress the following error report, specify this argument >> >> # after -XX: or in .hotspotrc: SuppressErrorAt=/assembler_riscv.cpp:285 >> >> # >> >> # A fatal error has been detected by the Java Runtime Environment: >> >> # >> >> # Internal Error >> (/home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/assembler_riscv.cpp:285), >> pid=907, tid=908 >> >> # assert(is_unsigned_imm_in_range(imm64, 47, 0) || (imm64 == >> (int64_t)-1)) failed: bit 47 overflows in address constant >> >> # >> >> # JRE version: (20.0) (slowdebug build ) >> >> # Java VM: OpenJDK 64-Bit Server VM (slowdebug >> 20-testing-builds.shipilev.net-openjdk-jdk-b212-20220922, mixed mode, >> sharing, tiered, compressed oops, compressed class ptrs, g1 gc, >> linux-riscv64) >> >> # Problematic frame: >> >> # V [libjvm.so+0x39f41c] Assembler::movptr_with_offset(Register, >> unsigned char*, int&)+0x96 >> >> # >> >> # Core dump will be written. 
Default location: Core dumps may be >> processed with "/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g >> -- %E" (or dumping to /home/ubuntu/core.907) >> >> # >> >> # An error report file with more information is saved as: >> >> # /home/ubuntu/hs_err_pid907.log >> >> # >> >> # >> >> >> >> Here is the backtrace and local variables seen in gdb. >> >> >> >> (gdb) bt >> >> #0 0x00fffffff674941c in Assembler::movptr_with_offset >> (this=0xfffffff0000e30, Rd=..., >> >> addr=0xfffffff71136b8 >> > char*)> "9q\006\374\"\370", , >> offset=@0xfffffff632f00c: 0) >> >> at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/assembler_riscv.cpp:284 >> >> #1 0x00fffffff6f17c5c in MacroAssembler::call_VM_leaf_base >> (this=0xfffffff0000e30, >> >> entry_point=0xfffffff71136b8 >> > char*)> "9q\006\374\"\370", , >> number_of_arguments=2, >> >> retaddr=0x0) at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:568 >> >> #2 0x00fffffff6f17da2 in MacroAssembler::call_VM_leaf >> (this=0xfffffff0000e30, >> >> entry_point=0xfffffff71136b8 >> > char*)> "9q\006\374\"\370", , arg_0=..., >> arg_1=...) >> >> at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/macroAssembler_riscv.cpp:588 >> >> #3 0x00fffffff7222308 in StubGenerator::generate_forward_exception >> (this=0xfffffff632f1e8) at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp:546 >> >> #4 0x00fffffff7231506 in StubGenerator::generate_initial >> (this=0xfffffff632f1e8) at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp:3870 >> >> #5 0x00fffffff7231956 in StubGenerator::StubGenerator >> (this=0xfffffff632f1e8, code=0xfffffff632f3c8, phase=0) >> >> at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp:3971 >> >> #6 0x00fffffff721faa0 in StubGenerator_generate (code=0xfffffff632f3c8, >> phase=0) at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/cpu/riscv/stubGenerator_riscv.cpp:3988 >> >> #7 0x00fffffff72322c8 in StubRoutines::initialize1 () at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/share/runtime/stubRoutines.cpp:228 >> >> #8 0x00fffffff72330d2 in stubRoutines_init1 () at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/share/runtime/stubRoutines.cpp:389 >> >> #9 0x00fffffff6c7823a in init_globals () at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/share/runtime/init.cpp:123 >> >> #10 0x00fffffff72bcc34 in Threads::create_vm (args=0xfffffff632f7e0, >> canTryAgain=0xfffffff632f70b) at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/share/runtime/threads.cpp:570 >> >> #11 0x00fffffff6d891ae in JNI_CreateJavaVM_inner (vm=0xfffffff632f838, >> penv=0xfffffff632f840, args=0xfffffff632f7e0) >> >> at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/share/prims/jni.cpp:3628 >> >> #12 0x00fffffff6d893a8 in JNI_CreateJavaVM (vm=0xfffffff632f838, >> penv=0xfffffff632f840, args=0xfffffff632f7e0) at >> /home/buildbot/worker/build-jdkX-debian10/build/src/hotspot/share/prims/jni.cpp:3714 >> >> #13 0x00fffffff7fb1a44 in InitializeJVM (pvm=0xfffffff632f838, >> penv=0xfffffff632f840, ifn=0xfffffff632f890) >> >> at >> /home/buildbot/worker/build-jdkX-debian10/build/src/java.base/share/native/libjli/java.c:1457 >> >> #14 0x00fffffff7faef16 in JavaMain (_args=0xffffffffffc0d8) at >> 
/home/buildbot/worker/build-jdkX-debian10/build/src/java.base/share/native/libjli/java.c:413 >> >> #15 0x00fffffff7fb50ea in ThreadJavaMain (args=0xffffffffffc0d8) at >> /home/buildbot/worker/build-jdkX-debian10/build/src/java.base/unix/native/libjli/java_md.c:650 >> >> #16 0x00fffffff7ed7450 in start_thread (arg=) at >> ./nptl/pthread_create.c:442 >> >> #17 0x00fffffff7f24ed2 in __thread_start () at >> ../sysdeps/unix/sysv/linux/riscv/clone.S:85 >> >> (gdb) info locals >> >> imm64 = 0xfffffff71136b8 >> >> imm = 0xfffffff632efb0 >> >> upper = 0xfffffff632efb0 >> >> lower = 0xffffff80000000 >> >> >> >> I suspect that the issue is due to the newer kernels (5.17+) supports >> sv48, and that increases the bits in the addresses that the assembler needs >> to handle. See kernel changelog >> https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.17. >> >> >> >> To reproduce the issue, I use the following. >> >> Guest Ubuntu 22.10: >> https://cdimage.ubuntu.com/ubuntu-server/daily-preinstalled/current/kinetic-preinstalled-server-riscv64+unmatched.img.xz >> >> Guest Ubuntu 22.04: >> https://cdimage.ubuntu.com/releases/22.04.1/release/ubuntu-22.04.1-preinstalled-server-riscv64+unmatched.img.xz >> >> JDK slowdebug build: >> https://builds.shipilev.net/openjdk-jdk/openjdk-jdk-linux-riscv64-server-slowdebug-gcc8-glibc2.28.tar.xz >> (OpenJDK 64-Bit Server VM (slowdebug build >> 20-testing-builds.shipilev.net-openjdk-jdk-b212-20220922, mixed mode)) >> >> QEMU: installed via apt on Ubuntu 22.04 host >> >> QEMU setup: https://wiki.ubuntu.com/RISC-V >> >> >> >> Sincerely, >> >> Zixian >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From zixian.cai at anu.edu.au Tue Oct 4 11:14:51 2022 From: zixian.cai at anu.edu.au (Zixian Cai) Date: Tue, 4 Oct 2022 11:14:51 +0000 Subject: Non-zero build crash on kernel 5.17+? In-Reply-To: References: Message-ID: Hi Ludovic, Thanks! Yes, I settled on the exact same workaround, but haven?t gotten around to post it to this list. I agree that this is awkward as it requires people to build their own QEMU from source. By coincidence, I had a Twitter exchange with Felix (Archlinux developer) regarding this. Felix opened a kernel issue for providing an option to disable sv57 (https://bugzilla.kernel.org/show_bug.cgi?id=216545) and mentioned that Ubuntu is likely to disable sv57 in the kernel to be shipped with 22.10 (http://sprunge.us/WGq6Pm). Sincerely, Zixian On 4/10/2022 21:45, Ludovic Henry wrote: Hello, Some updates on working around that issue. In order to force QEMU to use sv48 in place of sv57, you can comment out this line and compile your own version of QEMU. There is unfortunately no way currently to disable sv57 through an option, and Linux will probe the CSR SATP to know whether the machine supports sv57 while ignoring the device tree provided by QEMU. I hope that helps, Ludovic -------------- next part -------------- An HTML attachment was scrubbed... URL: From zixian.cai at anu.edu.au Tue Oct 4 11:59:57 2022 From: zixian.cai at anu.edu.au (Zixian Cai) Date: Tue, 4 Oct 2022 11:59:57 +0000 Subject: Choosing registers for assembly snippets Message-ID: Hi all, I?m just wondering what the best practices are in terms of choosing registers. I was looking at the assembly code for tlab allocation (https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L131) and noticed the following. 1. tmp1 register is invalid. 2. 
tmp2 register is valid but sometimes clashes with var_size_in_bytes, which requires the workaround here https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L155 3. t0 is used as a temporary register despite not passed into the method as a temporary register https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L149 I?d really appreciate if someone could shed insight on how these decisions are made so that I can avoid common pitfalls when writing assembly code. The only resource I found at the moment is the architecture description file https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/riscv.ad#L73 which suggests x5(t0)-x6(t1) can always be used as temporary registers. Sincerely, Zixian -------------- next part -------------- An HTML attachment was scrubbed... URL: From ludovic at rivosinc.com Wed Oct 5 09:09:43 2022 From: ludovic at rivosinc.com (Ludovic Henry) Date: Wed, 5 Oct 2022 10:09:43 +0100 Subject: Choosing registers for assembly snippets In-Reply-To: References: Message-ID: Hi, My understanding is it's quite a mixed-bag. There are some places where it is assumed which register is used (ex: StubGenerator::generate_zero_blocks assumes x28 and x29 are used in MacroAssembler::zero_words) while other places pass the registers to be used. It's surprising that the TLAB allocation passes the registers to be used (tmp1 and tmp2) but there are cases where tmp1 is noreg (like templateTable_riscv.cpp [1]). In that specific case of TLAB allocation, it seems to me the right approach is to make sure `tmp1` is always valid (AFAIU, the only required change is to pass `t0` in [1]), and use `tmp1` in place of `t0` in BarrierSetAssembler::tlab_allocate. It would also require asserting that both tmp1 and tmp2 are not noreg. Finally, on always using t0 and t1 as temporary registers, it's the case most of the time, but there are cases where you can't (StubGenerator::generate_zero_blocks and MacroAssembler::zero_words for example). So IMHO the best practice is to pass the registers you can use. The exception is when you can't (like the StubGenerator which generates the code at startup); then it's a tradeoff between passing the parameters as normal arguments (c_rarg0, c_rarg1, etc.) or make an assumption or where the value is currently stored (x28. x29 in MacroAssembler::zero_words case for example). Cheers, Ludovic [1] https://github.com/openjdk/jdk/blob/953ce8da2c7ddd60b09a18c7875616a2477e5ba5/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3406 On Tue, Oct 4, 2022 at 1:01 PM Zixian Cai wrote: > Hi all, > > > > I?m just wondering what the best practices are in terms of choosing > registers. > > > > I was looking at the assembly code for tlab allocation ( > https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L131) > and noticed the following. > > 1. tmp1 register is invalid. > 2. tmp2 register is valid but sometimes clashes with > var_size_in_bytes, which requires the workaround here > https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L155 > 3. 
t0 is used as a temporary register despite not passed into the > method as a temporary register > https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L149 > > > > I?d really appreciate if someone could shed insight on how these decisions > are made so that I can avoid common pitfalls when writing assembly code. > The only resource I found at the moment is the architecture description file > > https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/riscv.ad#L73 > which suggests x5(t0)-x6(t1) can always be used as temporary registers. > > > > Sincerely, > > Zixian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From zixian.cai at anu.edu.au Wed Oct 5 11:26:40 2022 From: zixian.cai at anu.edu.au (Zixian Cai) Date: Wed, 5 Oct 2022 11:26:40 +0000 Subject: Choosing registers for assembly snippets In-Reply-To: References: Message-ID: Hi Ludovic, Thanks for the information. Regarding using the registers passed in, is there any guarantee in terms of register conflict or we should handle all corner cases? One example I mentioned earlier is, again in TLAB allocate, https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L155. The code handles the case where var_size_in_bytes is the same register as tmp2. And I can confirm that such conflict indeed happens in practice, and if not handled, causes issues, for example, in java.util.Arrays.copyOfRange, where the object size register is used after allocation to perform the copying. Sincerely, Zixian On 5/10/2022, 20:10, "Ludovic Henry" wrote: Hi, My understanding is it's quite a mixed-bag. There are some places where it is assumed which register is used (ex: StubGenerator::generate_zero_blocks assumes x28 and x29 are used in MacroAssembler::zero_words) while other places pass the registers to be used. It's surprising that the TLAB allocation passes the registers to be used (tmp1 and tmp2) but there are cases where tmp1 is noreg (like templateTable_riscv.cpp [1]). In that specific case of TLAB allocation, it seems to me the right approach is to make sure `tmp1` is always valid (AFAIU, the only required change is to pass `t0` in [1]), and use `tmp1` in place of `t0` in BarrierSetAssembler::tlab_allocate. It would also require asserting that both tmp1 and tmp2 are not noreg. Finally, on always using t0 and t1 as temporary registers, it's the case most of the time, but there are cases where you can't (StubGenerator::generate_zero_blocks and MacroAssembler::zero_words for example). So IMHO the best practice is to pass the registers you can use. The exception is when you can't (like the StubGenerator which generates the code at startup); then it's a tradeoff between passing the parameters as normal arguments (c_rarg0, c_rarg1, etc.) or make an assumption or where the value is currently stored (x28. x29 in MacroAssembler::zero_words case for example). Cheers, Ludovic [1] https://github.com/openjdk/jdk/blob/953ce8da2c7ddd60b09a18c7875616a2477e5ba5/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3406 -------------- next part -------------- An HTML attachment was scrubbed... 
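To make the aliasing case concrete, here is a paraphrased sketch of the pattern being discussed in tlab_allocate (condensed from memory, not an exact excerpt of the linked code):

```
// tmp2 doubles as the register holding the new TLAB top ("end"); callers are
// allowed to pass the same register for var_size_in_bytes and tmp2.
Register end = tmp2;
if (var_size_in_bytes == noreg) {
  __ la(end, Address(obj, con_size_in_bytes));        // fixed-size allocation
} else {
  __ add(end, obj, var_size_in_bytes);                // end = obj + size (clobbers size if aliased)
}
// ... compare end against tlab_end, branch to slow_case, store the new tlab_top ...
if (var_size_in_bytes == end) {
  __ sub(var_size_in_bytes, var_size_in_bytes, obj);  // recover size = end - obj for the caller
}
```

So the effective contract today is "the size register may be clobbered and is recomputed on the way out", which is why the conflict surfaces in callers like java.util.Arrays.copyOfRange that still need the size after the allocation.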
URL: From ludovic at rivosinc.com Wed Oct 5 14:03:15 2022 From: ludovic at rivosinc.com (Ludovic Henry) Date: Wed, 5 Oct 2022 15:03:15 +0100 Subject: Choosing registers for assembly snippets In-Reply-To: References: Message-ID: Hi Zixian, This is where the abstraction can be leaky and the callee (in this case BarrierSetAssembler::tlab_allocate) needs to know how its called by callers (C1_MacroAssembler::allocate_array, C1_MacroAssembler::allocate_object, TemplateTable::_new in this case). It might go down to precedents (is it done like that in similar cases?) and to reviewers. I'll always prefer a clear contract with clear separation of concerns between callers and callees, but that needs to be balanced with the performance overhead of the solution. I'd be happy to hear of others' points of view on the RISC-V port. Thanks, Ludovic On Wed, Oct 5, 2022 at 12:27 PM Zixian Cai wrote: > Hi Ludovic, > > > > Thanks for the information. Regarding using the registers passed in, is > there any guarantee in terms of register conflict or we should handle all > corner cases? > > One example I mentioned earlier is, again in TLAB allocate, > https://github.com/openjdk/jdk/blob/5a9cd33632862aa2249794902d4168a7fe143054/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L155 > . > > > > The code handles the case where var_size_in_bytes is the same register as > tmp2. And I can confirm that such conflict indeed happens in practice, and > if not handled, causes issues, for example, in > java.util.Arrays.copyOfRange, where the object size register is used after > allocation to perform the copying. > > > > Sincerely, > > Zixian > > > > On 5/10/2022, 20:10, "Ludovic Henry" wrote: > > Hi, > > My understanding is it's quite a mixed-bag. There are some places where it > is assumed which register is used (ex: StubGenerator::generate_zero_blocks > assumes x28 and x29 are used in MacroAssembler::zero_words) while other > places pass the registers to be used. It's surprising that the TLAB > allocation passes the registers to be used (tmp1 and tmp2) but there are > cases where tmp1 is noreg (like templateTable_riscv.cpp [1]). > > > > In that specific case of TLAB allocation, it seems to me the right > approach is to make sure `tmp1` is always valid (AFAIU, the only required > change is to pass `t0` in [1]), and use `tmp1` in place of `t0` in > BarrierSetAssembler::tlab_allocate. It would also require asserting that > both tmp1 and tmp2 are not noreg. > > > > Finally, on always using t0 and t1 as temporary registers, it's the case > most of the time, but there are cases where you can't > (StubGenerator::generate_zero_blocks and MacroAssembler::zero_words for > example). > > > > So IMHO the best practice is to pass the registers you can use. The > exception is when you can't (like the StubGenerator which generates the > code at startup); then it's a tradeoff between passing the parameters as > normal arguments (c_rarg0, c_rarg1, etc.) or make an assumption or where > the value is currently stored (x28. x29 in MacroAssembler::zero_words case > for example). > > > > Cheers, > > Ludovic > > > > [1] > https://github.com/openjdk/jdk/blob/953ce8da2c7ddd60b09a18c7875616a2477e5ba5/src/hotspot/cpu/riscv/templateTable_riscv.cpp#L3406 > > > -------------- next part -------------- An HTML attachment was scrubbed... 
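One way to make that contract explicit, sketched under the assumption that the helper keeps its current signature and every caller is updated to pass two real temporaries (illustrative only, not the current code):

```
void BarrierSetAssembler::tlab_allocate(MacroAssembler* masm,
                                        Register obj, Register var_size_in_bytes,
                                        int con_size_in_bytes,
                                        Register tmp1, Register tmp2,
                                        Label& slow_case, bool is_far) {
  // Enforce the register contract up front instead of special-casing later.
  assert(tmp1 != noreg && tmp2 != noreg, "caller must provide two scratch registers");
  assert_different_registers(obj, tmp1, tmp2, t0);
  if (var_size_in_bytes != noreg) {
    assert_different_registers(var_size_in_bytes, tmp1, tmp2);
  }
  // ... the body may then use tmp1/tmp2 freely and never needs to fall back on t0
  //     or to recover a clobbered size register ...
}
```

The asserts document the contract for porters and make a violating caller fail fast in debug builds, at the cost of touching every call site once.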
URL: From yunyao.zxl at alibaba-inc.com Sat Oct 8 03:05:00 2022 From: yunyao.zxl at alibaba-inc.com (Xiaolin Zheng) Date: Sat, 08 Oct 2022 11:05:00 +0800 Subject: =?UTF-8?B?UmU6IFJlOiBEaXNjdXNzIHRoZSBSVkMgaW1wbGVtZW50YXRpb24=?= In-Reply-To: <4a86a30d.339e1.183a1ef5784.Coremail.yangfei@iscas.ac.cn> References: <2d7bbad2-7ade-4b38-91b5-12c4c0a91602.yunyao.zxl@alibaba-inc.com>, <77e347f0.29ad8.183887bfeb5.Coremail.yangfei@iscas.ac.cn> <3d27d06d-6cac-44d2-90b7-15b4ebb07ddd.yunyao.zxl@alibaba-inc.com>, <4a86a30d.339e1.183a1ef5784.Coremail.yangfei@iscas.ac.cn> Message-ID: <4a4d34ae-c9a5-4687-a780-ee26bd08e8b3.yunyao.zxl@alibaba-inc.com> Hi Felix, Nice to reach a consensus! Seems loom's backend code only needs to adapt with the post-call nop. So they are orthogonal with just a little overlapping. Therefore, whichever merged secondly could do that minor adjustment based on the former one. (Sorry for the late reply for being on holiday without opening my mailbox) Regards, Xiaolin ------------------------------------------------------------------ From:yangfei Send Time:2022?10?4?(???) 15:39 To:???(??) Cc:riscv-port-dev Subject:Re: Re: Discuss the RVC implementation Hi Xiaolin, Thanks for the thorough consierations. Comments inlined. > Hi Felix, > > Thank you for taking the time to consider this, and the discussions. > > I think it's certainly a fairly good observation, regarding the three versions that can theoretically cover any case in combination, in an instruction-level granularity. But in reality, I may have some of my personal practices to share: such might be too fine-grained to implement a high-level control, please let me explain it. > > Let alone correctness, there are also code styles and maintenance that we have to focus on for sure. For example, if we want to rewrite one piece of code[1] with a fixed length by removing the `IncompressibleRegion` thing, to an instruction-level granularity, it might become [2]. Please see my comments in that gist. > > 1. From the code style aspect: > We can see it is not looking so promising. In fact, my RVC prototype was in exactly the same way as your thought (so I guess it might be an intuitive and general thought :-) ), in an instruction-level granularity. And I sadly found the code style was messy even to myself. We have to overload lots of things such as _ld(Register, Address), _ld(Register, address), (see my comments) and so on to fulfill any usage in an incompressible piece of code: the overall API changes (like _ld in any form) are not convergent. > > In the comments from the gist, we can see we certainly have to make incompressible all the callees, even the callees of the callees, and so on, in a transitive relation. For example, the 'la(Register, Address)' API itself must be incompressible if we are in an instruction granularity. So we have to make its callee, 'la(Register, address)' API incompressible as well, and so on. It might be indeed an inferno... Yes, I think the 'la' case here already demonstrates the complexity of my proposal. I agree an 'IncompressibleRegion' mark would be simpler and easier for the developers. > 2. From the compression rate aspect: > Besides, we are just talking about la() here. If we directly mark la()s as incompressible, then the la()s called by actually safe and compressible code will be left as incompressible forever. 
The compression rate will be definitely lower: the main issue here is, of course, the granularity problem -- instruction-level granularity is too fine-grained, which cannot allow us to make high-level controls. > > The current `CompressibleRegion` combined with `IncompressibleRegion` can implement a function-level granularity (neither too fine nor too coarse), which I think is very suitable for the current backend, that we can use them combined to mark everything without many efforts and with a concentration (like the current implementation: the unified relocate() with a lambda[3] and an IncompressibleRegion hidden inside). With them both, we can avoid the above problems with no effort, please see the first line of [1]: the incompressible region directly controls the current function, marking THE 'la' it currently uses incompressible, without affecting the 'la' definitions themselves (movptr, ld ... are as well). So we can avoid lots of invasions to the current backend code base. Nice, right? I have went through your local changes about unified relocate() with a lambda. And I think it looks better and we can go on with this solution if no objections. > 3. From the maintenance aspect: Explicitly adding '_' to every compressible instruction might be a burden for developers and porters. One may say, just adding some '_'s, why burdens? In fact, considering we are porting code like [1] again from AArch64 port. We not only have to change instructions to RISC-V's, but also have to consider RVC... does one instruction have '_' or not? Do its callees, even its callees' callees, have an incompressible version? Even if to myself, it might be a heavy burden :-) I might feel very troublesome - I may just want to ctrl+c and ctrl+v some code without other confusion. So, why not directly throw an `IncompressionRegion` to that stub with a fixed length, so that programmers can normally write their code with the normal "ld", "la" and "addi"? Everything is easily solved without caring for the trifling :-) That makes sense :-) > Just sharing some practices from the same thought and might be verbose again -- there are things not easy to foresee at a glance. When implementing, the pitfalls might be obvious then. From my personal perspective, I may consider the CompressibleRegions plan looks better though, and I am looking forward to your views and suggestions. > > > Best, > Xiaolin > > > [1] https://github.com/zhengxiaolinX/jdk/blob/2ee3204ace5a7767482819be2240982cc0744f8c/src/hotspot/cpu/riscv/gc/shared/barrierSetAssembler_riscv.cpp#L196-L275 > [2] https://gist.github.com/zhengxiaolinX/3151db356a9001f58827d272c8330bb7 > [3] https://github.com/zhengxiaolinX/jdk/blob/2ee3204ace5a7767482819be2240982cc0744f8c/src/hotspot/cpu/riscv/assembler_riscv.hpp#L2167-L2178 -------------- next part -------------- An HTML attachment was scrubbed... URL: From zixian.cai at anu.edu.au Sun Oct 9 06:38:15 2022 From: zixian.cai at anu.edu.au (Zixian Cai) Date: Sun, 9 Oct 2022 06:38:15 +0000 Subject: Cross compiling hsdis for RISC-V Message-ID: Hi all, I?m just wondering how I can cross compile hsdis for RISC-V. I downloaded the source of binutils-2.36.1, and configured jdk. 
bash configure --openjdk-target=riscv64-linux-gnu --with-sysroot=../sysroot-riscv64/ --with-boot-jdk=/usr/lib/jvm/temurin-19-jdk-amd64 --with-debug-level=slowdebug --with-jvm-variants=server --disable-warnings-as-errors --with-hsdis=binutils --with-binutils-src=$PWD/../binutils-2.36.1 OpenJDK configure exited with error when configuring binutils, because binutils wants to be configured with --host when cross compiling. I workarounded the problem by configuring and building binutils manually, so OpenJDK configure will skip building binutils. The flags I used are from jdk/make/autoconf/lib-hsdis.m4 ./configure --host=riscv64-linux-gnu --disable-nls CFLAGS=" -fPIC -O0" make -j I then run jdk configure again, and build jdk with hsdis. make CONF=linux-riscv64-server-slowdebug install-hsdis The build succeeded. But when I try to run a program, JVM will crash with the following backtrace. /mnt/jdk/build/linux-riscv64-server-slowdebug/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+DebugNonSafepoints -XX:+PrintStubCode -Xint -version - - - [BEGIN] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - ICache::fake_stub_for_inlined_icache_flush [0x00007fff95000080, 0x00007fff95000084] (4 bytes) -------------------------------------------------------------------------------- # # A fatal error has been detected by the Java Runtime Environment: # # SIGSEGV (0xb) at pc=0x00007fff9d467e84, pid=14142, tid=14143 # The backtrace is as follows. Current thread (0x00007fffa8028f30): JavaThread "Unknown thread" [_thread_in_vm, id=13140, stack(0x00007fffaee75000,0x00007fffaf075000)] Stack: [0x00007fffaee75000,0x00007fffaf075000], sp=0x00007fffaf072560, free space=2037k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [hsdis-riscv64.so+0xcbe84] riscv_get_disassembler+0x1c C [hsdis-riscv64.so+0xc5bb6] disassembler+0x36 C [hsdis-riscv64.so+0x3b6ea] setup_app_data+0x110 C [hsdis-riscv64.so+0x3b0d0] decode+0x2c C [hsdis-riscv64.so+0x3afea] decode_instructions_virtual+0xbc V [libjvm.so+0x6f09c2] decode_env::decode_instructions(unsigned char*, unsigned char*, unsigned char*)+0x238 V [libjvm.so+0x6f14f2] Disassembler::decode(unsigned char*, unsigned char*, outputStream*, AsmRemarks const*, long)+0x10c V [libjvm.so+0xe9c846] StubCodeGenerator::stub_epilog(StubCodeDesc*)+0xea V [libjvm.so+0xe9ca80] StubCodeMark::~StubCodeMark()+0x124 V [libjvm.so+0x8cf36e] ICacheStubGenerator::generate_icache_flush(int (**)(unsigned char*, int, int))+0x72 V [libjvm.so+0x8cef06] AbstractICache::initialize()+0x9a V [libjvm.so+0x8cf146] icache_init()+0xc V [libjvm.so+0x5b4e40] CodeCache::initialize()+0x1ba V [libjvm.so+0x5b4e6c] codeCache_init()+0xc V [libjvm.so+0x8e13dc] init_globals()+0x38 V [libjvm.so+0xf3d58e] Threads::create_vm(JavaVMInitArgs*, bool*)+0x348 V [libjvm.so+0x9f32da] JNI_CreateJavaVM_inner(JavaVM_**, void**, void*)+0xe0 V [libjvm.so+0x9f34d4] JNI_CreateJavaVM+0x2a C [libjli.so+0x6a82] InitializeJVM+0x118 (java.c:1457) C [libjli.so+0x3f00] JavaMain+0xa0 (java.c:413) C [libjli.so+0xa050] ThreadJavaMain+0x24 (java_md.c:650) C [libc.so.6+0x6a450] C [libc.so.6+0xb7ef2] siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000350 FWIW, if I use the hsdis from https://builds.shipilev.net/hsdis/hsdis-riscv64.so, everything works fine. Sincerely, Zixian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From zixian.cai at anu.edu.au  Sun Oct 9 06:55:41 2022
From: zixian.cai at anu.edu.au (Zixian Cai)
Date: Sun, 9 Oct 2022 06:55:41 +0000
Subject: Cross compiling hsdis for RISC-V
In-Reply-To: 
References: 
Message-ID: 

Hi all,

I made a mistake. I used the binutils version on the FSF wiki https://directory.fsf.org/wiki/Binutils , which is neither the latest nor what's supported by hsdis. I built binutils-2.37 per hsdis/README.md, and everything works fine now.

Sincerely,
Zixian

On 9/10/2022, 17:39, "riscv-port-dev" wrote:

Hi all,

I'm just wondering how I can cross compile hsdis for RISC-V. I downloaded the source of binutils-2.36.1, and configured jdk.

bash configure --openjdk-target=riscv64-linux-gnu --with-sysroot=../sysroot-riscv64/ --with-boot-jdk=/usr/lib/jvm/temurin-19-jdk-amd64 --with-debug-level=slowdebug --with-jvm-variants=server --disable-warnings-as-errors --with-hsdis=binutils --with-binutils-src=$PWD/../binutils-2.36.1

OpenJDK configure exited with an error when configuring binutils, because binutils wants to be configured with --host when cross compiling. I worked around the problem by configuring and building binutils manually, so OpenJDK configure will skip building binutils. The flags I used are from jdk/make/autoconf/lib-hsdis.m4

./configure --host=riscv64-linux-gnu --disable-nls CFLAGS=" -fPIC -O0"
make -j

I then ran jdk configure again, and built jdk with hsdis.

make CONF=linux-riscv64-server-slowdebug install-hsdis

The build succeeded. But when I try to run a program, the JVM will crash with the following backtrace.

/mnt/jdk/build/linux-riscv64-server-slowdebug/jdk/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+DebugNonSafepoints -XX:+PrintStubCode -Xint -version

- - - [BEGIN] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
ICache::fake_stub_for_inlined_icache_flush [0x00007fff95000080, 0x00007fff95000084] (4 bytes)
--------------------------------------------------------------------------------
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007fff9d467e84, pid=14142, tid=14143
#

The backtrace is as follows.

Current thread (0x00007fffa8028f30): JavaThread "Unknown thread" [_thread_in_vm, id=13140, stack(0x00007fffaee75000,0x00007fffaf075000)] Stack: [0x00007fffaee75000,0x00007fffaf075000], sp=0x00007fffaf072560, free space=2037k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) C [hsdis-riscv64.so+0xcbe84] riscv_get_disassembler+0x1c C [hsdis-riscv64.so+0xc5bb6] disassembler+0x36 C [hsdis-riscv64.so+0x3b6ea] setup_app_data+0x110 C [hsdis-riscv64.so+0x3b0d0] decode+0x2c C [hsdis-riscv64.so+0x3afea] decode_instructions_virtual+0xbc V [libjvm.so+0x6f09c2] decode_env::decode_instructions(unsigned char*, unsigned char*, unsigned char*)+0x238 V [libjvm.so+0x6f14f2] Disassembler::decode(unsigned char*, unsigned char*, outputStream*, AsmRemarks const*, long)+0x10c V [libjvm.so+0xe9c846] StubCodeGenerator::stub_epilog(StubCodeDesc*)+0xea V [libjvm.so+0xe9ca80] StubCodeMark::~StubCodeMark()+0x124 V [libjvm.so+0x8cf36e] ICacheStubGenerator::generate_icache_flush(int (**)(unsigned char*, int, int))+0x72 V [libjvm.so+0x8cef06] AbstractICache::initialize()+0x9a V [libjvm.so+0x8cf146] icache_init()+0xc V [libjvm.so+0x5b4e40] CodeCache::initialize()+0x1ba V [libjvm.so+0x5b4e6c] codeCache_init()+0xc V [libjvm.so+0x8e13dc] init_globals()+0x38 V [libjvm.so+0xf3d58e] Threads::create_vm(JavaVMInitArgs*, bool*)+0x348 V [libjvm.so+0x9f32da] JNI_CreateJavaVM_inner(JavaVM_**, void**, void*)+0xe0 V [libjvm.so+0x9f34d4] JNI_CreateJavaVM+0x2a C [libjli.so+0x6a82] InitializeJVM+0x118 (java.c:1457) C [libjli.so+0x3f00] JavaMain+0xa0 (java.c:413) C [libjli.so+0xa050] ThreadJavaMain+0x24 (java_md.c:650) C [libc.so.6+0x6a450] C [libc.so.6+0xb7ef2] siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x0000000000000350 FWIW, if I use the hsdis from https://builds.shipilev.net/hsdis/hsdis-riscv64.so , everything works fine. Sincerely, Zixian -------------- next part -------------- An HTML attachment was scrubbed... URL: From yunyao.zxl at alibaba-inc.com Tue Oct 11 05:20:22 2022 From: yunyao.zxl at alibaba-inc.com (Xiaolin Zheng) Date: Tue, 11 Oct 2022 13:20:22 +0800 Subject: =?UTF-8?B?UmU6IERpc2N1c3MgdGhlIFJWQyBpbXBsZW1lbnRhdGlvbg==?= In-Reply-To: <42bdf74a.1322.1834b9400ed.Coremail.yangfei@iscas.ac.cn> References: <2d7bbad2-7ade-4b38-91b5-12c4c0a91602.yunyao.zxl@alibaba-inc.com>, <42bdf74a.1322.1834b9400ed.Coremail.yangfei@iscas.ac.cn> Message-ID: <924c628d-6770-4aa8-9ba1-0c9596f7228b.yunyao.zxl@alibaba-inc.com> Hi team, Following up on the performance data on SPECjbb2015 (composite mode) for RVC. These days I have been going in for some performance data on SPECjbb2015. Shortly, there seems to be a (maybe) 1.5%~2.5% performance gain of mutators under some observations and it might be reasonable for it aligns the results I have observed. Results are at [0]. Saying "maybe" because the SPECjbb2015 results on my Hifive Unmatched board seem to have a ~?5% fluctuations and I think they are reasonable too. So there's a question of if the seeming performance gain is legal or not. Wrote a simple program to calculate the average max-JOPS results for convenience. (There has been a result from philosophers evaluating the "whitelist mode" implementation of RVC, see [1]; I follow a similar style.) Let us have [A] RVC branch at [2] but without the histogram patch [B] The simple unaligned access patch at [3] for I am interested in the unaligned access thing (though reading from the results afterward it seems to behave normally, having nothing special) 1. 
[A] + [B] + g1 http://cr.openjdk.java.net/~xlinzheng/rvc-size/performance-specjbb2015/g1.1.jpg Mutators seem to have a 1.69% gain; The T.TEST result, shows the confidence level is only 62%, with 2-tailed. 2. [A] + [B] + parallel gc http://cr.openjdk.java.net/~xlinzheng/rvc-size/performance-specjbb2015/parallel.1.jpg There seems to have a 2.62% gain; The T.TEST result shows the confidence level is 98.6%, so seems okay. 3. [A] + g1 http://cr.openjdk.java.net/~xlinzheng/rvc-size/performance-specjbb2015/g1.2.jpg Seems a 3.64% gain? A confidence level 99.1% is shown by Excel; but I doubt it a bit because I have never observed such data though. The max-JOPS data at last are too high to be considered normal. I guess my board overdosed at that time, so I keep a reserved attitude toward it. 4. [A] + parallel gc http://cr.openjdk.java.net/~xlinzheng/rvc-size/performance-specjbb2015/parallel.2.jpg I just invoked this yesterday so the sample data is not enough. I didn't drop the lowest/highest results this time accordingly. Showing a 1.5% gain, the confidence level is only 70% though. (maybe samples are not big enough) Evaluated on a general Hifive Unmatched board which (seemingly) we all have, so the results should be reproducible I guess. Though I believe there should be performance gain theoretically in generated code for the potential "I-cache enlargement" from RVC's code size reduction, well, for this feature currently I think no regression is enough though. The performance gain from RVC is a special bonus to me (or to us), so this post is just showing some evaluations to follow up on the performance aspect mentioned weeks ago. Accordingly, going to submit a PR for the rest part of RVC (to implement the "blacklist mode"). Thanks, Xiaolin [0] http://cr.openjdk.java.net/~xlinzheng/rvc-size/performance-specjbb2015/ [1] https://mail.openjdk.org/pipermail/riscv-port-dev/2022-September/000629.html [2] https://github.com/zhengxiaolinX/jdk/commits/REBASE-rvc-beautify-histogram [3] https://github.com/zhengxiaolinX/jdk/commit/f9e28e72ce1ac51b3da1a501e8ea33eaf076c343 ------------------------------------------------------------------ From:yangfei Send Time:2022?9?17?(???) 21:12 To:???(??) Cc:riscv-port-dev Subject:Re: Discuss the RVC implementation Hi Xiaolin, Your new proposal for supporting the RVC extension looks interesting. May I ask if you have any performance data including code size measured? Also it's appreciated if you have more details about the issue with MachBranch nodes. Thanks, Fei -----Original Messages----- From:"Xiaolin Zheng" Sent Time:2022-09-15 10:52:59 (Thursday) To: riscv-port-dev Cc: Subject: Discuss the RVC implementation Hi team, I am going to describe a different implementation of RVC for our backend. ## Background The RISC-V C extension, also known as RVC, could transform 4-byte instructions to 2-byte counterparts when eligible (for example, as the manual, Rd/Rs of instruction ranges from [x8,x15] might be one common requirement, etc.). ## The current implementation in the Hotspot The current implementation[0] is a transient one, introducing a "CompressibleRegion" by using RTTI[1] to indicate that instructions inside these regions can be safely substituted by the RVC counterparts, if convertible; and the implementation also uses a, say, "whitelist mode" by using the "CompressibleRegion" mentioned above to "manually mark out safe regions", then batch emit them if could. 
However, after a deeper look, we might discover the current "whitelist mode" has several shortages: ## Shortages of the current implementation 1. Coverages: The current implementation only covers some of C2 match rules, and only some small part of stub code, so there is obviously far more space to reduce the total code size. In my observations, some RISC-V instruction sequences generally occupy a bit more space than AArch64 ones[2]. With the new implementations, we could achieve a code size level alike AArch64's generated code. Some better, some still worse than AArch64 one in my simple observation. 2. Though safe, I'd say it's very much not easy to maintain. The background is, most of the patchable instructions cannot be easily transformed into their shorter counterparts[3], and they need to be prevented from being compressed. So comes the question: we must make sure no patchable relocation is inside the range of a "CompressibleRegion". For example, the string comparison intrinsic function[4] looks very delicious: transforming it and its siblings may result in a yummy compression rate. But programmers might have to check lots of its callees to find if there is just one patchable relocation hidden inside that causes the whole intrinsic incompressible. This could cause extra burden for programmers, so I bet no one would like to add "CompressibleRegion" for his/her code :-) 3. Performance: Better performance of generated code is a little side effect this extension gives us, the smaller the I$ size, the better performance though - please see Andrew Waterman's paper[5] for more reference there. Anyway, it looks like a higher general compression rate is better for performance. The main issue here is the granularity of "CompressibleRegion" is a bit coarse. "Why not exclude the incompressible parts" may come up to us naturally. And after some diggings, we may find: we just need to exclude countable places that would be patched back (mostly relocations), and several code slices with a fixed length, which will be calculated, such as "emit_static_call_stub". All remaining instructions could be safely transformed into RVC counterparts if eligible. So maybe, say, the "blacklist mode"? ## The new implementation To implement the "blacklist mode" in the backend, we need two things: 1. an "IncompressibleRegion", indicating instructions inside it should remain in their normal 4-byte form no matter what happens. 2. a simple strategy to exclude patchable instructions, mainly for relocations. So we can see the new strategy is highly bounded to relocations' positions: We all know the "relocate()" in Hotspot VM is a mark that only has an explicit "start point" without an end point, and some of them could be patched back. Therefore, we can use a simple strategy: introduce a lambda as another argument to assign "end point" semantics to the relocations, for completing our requirements without extra costs. For example: Originally: ``` __ relocate(safepoint_pc.rspec()); __ la(t0, safepoint_pc.target()); __ sd(t0, Address(xthread, JavaThread::saved_exception_pc_offset())); ``` After introducing a simple lambda as an extra argument: ``` __ relocate(safepoint_pc.rspec(), [&] { // The relocate() hides an "IncompressibleRegion" in it __ la(t0, safepoint_pc.target()); // This patchable instruction sequence is incompressible }); _ sd(t0, Address(xthread, JavaThread::saved_exception_pc_offset())); ``` Well, simple but effective. 
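For readers who want to see the shape of that API, a minimal sketch of the unified relocate() could look like this (names follow the branch in [6], but this is illustrative rather than the final upstream code):

```
// Overload of relocate() that also receives the instruction sequence belonging
// to the relocation, giving the relocation an implicit "end point".
template <typename Callback>
void relocate(RelocationHolder const& rspec, Callback emit_insts) {
  AbstractAssembler::relocate(rspec);   // mark the start point as before
  IncompressibleRegion ir(this);        // RAII: suppress RVC while the lambda runs
  emit_insts();                         // the patchable sequence, e.g. the la() above
}
```

With that in place, call sites stay almost unchanged and the incompressibility of patchable sequences is enforced in exactly one spot.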
Excluding such countable dynamically patchable places and unifying all relocations, all other instructions can be safely transformed, without messing up the current code style. Programmers could just keep aligning the same style; most of the time they have no need to care about whether the RVC exists or not and things get converted automatically. The proposed new sample code is again, here[6]. ## Other things worth being noticed 1. Instruction patching issues With the C extension, the backend mixes with both 2-byte and 4-byte instructions. It gets a little CISC alike. We know the Hotspot would patch instructions when code is running at full speed, such as call instructions, nops used for deoptimizations (the nops at the entry points, and post-call nops after loom). Instruction patching is delicate so we must carefully handle such places, to keep these 4-byte instructions from spanning cachelines. Though remaining a 4-byte normal form even with RVC, they might sit at a 2-byte aligned boundary. Such cases should definitely not happen, for patching such places spanning cachelines would lose the atomicity. So shortly, we must properly align them, such as [7][8]. Such a problem could exist with RVC, no matter "whitelist mode" or "blacklist mode". It is a general problem for instruction patching. I will add more strong assertions to the potential places (trampoline_call might be a very good spot, for patchable "static_call", "opt_virtual" and "virtual" relocations) to check alignment in the future patches. 2. MachBranch Nodes And MachBranch nodes: they are not easy to be tamed because the "fake label"[9] in PhaseOutput::scratch_emit_size() cannot tell us the real distance of the label. But we can leave them alone in this discussion, for there will be patches to handle those afterward. That's nearly all. Thanks for reaching here despite the verbosity. It would be very nice to receive any suggestions. Best, Xiaolin [0] Original patch: https://github.com/openjdk/riscv-port/pull/34 [1] Of course, the "CompressibleRegion" is good, I like it; and this idea is not from myself. [2] For a simple example, a much commonly used fixed-length movptr() uses up six 4-byte instructions (lui+addi+slli+addi+slli+addi, MIPS alike instructions using arithmetical calculations with signed extensions, but not anyone's fault :-) ), while the AArch64 counterpart only takes three 4-byte instructions (movz+movk+movk). They are both going to mov a 48-bit immediate. After accumulation, the size differs quite a lot. [3] 2-byte instructions have fewer bits, so comes shorter immediate encoding etc. compared to the 4-byte counterparts. After we transform patchable instructions (ones at marks of patchable relocations, etc.) to 2-byte ones, when they are patched to a larger value or farther distances afterward, it is possible that they sadly find themselves, the shorter instructions, cannot cover the newly patched value. So we need to exclude patchable instructions (at the relocation marks etc.) from being compressed. 
[4] https://github.com/openjdk/jdk/blob/7f3250d71c4866a64eb73f52140c669fe90f122f/src/hotspot/cpu/riscv/riscv.ad#L10032-L10035 [5] https://digitalassets.lib.berkeley.edu/etd/ucb/text/Waterman_berkeley_0028E_15908.pdf , Page 64: "5.4 The RVC Extension, Performance Implications" [6] https://github.com/zhengxiaolinX/jdk/tree/REBASE-rvc-beautify [7] https://github.com/openjdk/jdk/blob/7f3250d71c4866a64eb73f52140c669fe90f122f/src/hotspot/cpu/riscv/riscv.ad#L9873 [8] https://github.com/openjdk/jdk/blob/7f3250d71c4866a64eb73f52140c669fe90f122f/src/hotspot/cpu/riscv/c1_LIRAssembler_riscv.cpp#L1348-L1353 [9] https://github.com/openjdk/jdk/blob/211fab8d361822bbd1a34a88626853bf4a029af5/src/hotspot/share/opto/output.cpp#L3331-L3340 -------------- next part -------------- An HTML attachment was scrubbed... URL: From ludovic at rivosinc.com Wed Oct 12 09:57:51 2022 From: ludovic at rivosinc.com (Ludovic Henry) Date: Wed, 12 Oct 2022 11:57:51 +0200 Subject: Cross compiling hsdis for RISC-V In-Reply-To: References: Message-ID: Hi, I've made a local patch to be able to cross-compile hsdis based on binutils at https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 I'm blocked on getting access to my JBS again in order to submit it upstream. Feel free to use it in the meantime. Thanks, Ludovic On Sun, Oct 9, 2022 at 8:57 AM Zixian Cai wrote: > Hi all, > > > > I made a mistake. I used the binutils version on the FSF wiki > https://directory.fsf.org/wiki/Binutils , which is neither the latest nor > what?s supported by hsdis. I built binutils-2.37 per hsdis/README.md, and > everything works fine now. > > > > Sincerely, > > Zixian > > > > On 9/10/2022, 17:39, "riscv-port-dev" > wrote: > > Hi all, > > > > I?m just wondering how I can cross compile hsdis for RISC-V. > > I downloaded the source of binutils-2.36.1, and configured jdk. > > > > bash configure --openjdk-target=riscv64-linux-gnu > --with-sysroot=../sysroot-riscv64/ > --with-boot-jdk=/usr/lib/jvm/temurin-19-jdk-amd64 > --with-debug-level=slowdebug --with-jvm-variants=server > --disable-warnings-as-errors --with-hsdis=binutils > --with-binutils-src=$PWD/../binutils-2.36.1 > > > > OpenJDK configure exited with error when configuring binutils, because > binutils wants to be configured with --host when cross compiling. > > I workarounded the problem by configuring and building binutils manually, > so OpenJDK configure will skip building binutils. > > The flags I used are from jdk/make/autoconf/lib-hsdis.m4 > > > > ./configure --host=riscv64-linux-gnu --disable-nls CFLAGS=" -fPIC -O0" > > make -j > > > > I then run jdk configure again, and build jdk with hsdis. > > make CONF=linux-riscv64-server-slowdebug install-hsdis > > > > The build succeeded. But when I try to run a program, JVM will crash with > the following backtrace. > > > > /mnt/jdk/build/linux-riscv64-server-slowdebug/jdk/bin/java > -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:+DebugNonSafepoints > -XX:+PrintStubCode -Xint -version > > > > - - - [BEGIN] - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - > > ICache::fake_stub_for_inlined_icache_flush [0x00007fff95000080, > 0x00007fff95000084] (4 bytes) > > > -------------------------------------------------------------------------------- > > # > > # A fatal error has been detected by the Java Runtime Environment: > > # > > # SIGSEGV (0xb) at pc=0x00007fff9d467e84, pid=14142, tid=14143 > > # > > > > The backtrace is as follows. 
> > > > Current thread (0x00007fffa8028f30): JavaThread "Unknown thread" > [_thread_in_vm, id=13140, stack(0x00007fffaee75000,0x00007fffaf075000)] > > > > Stack: > [0x00007fffaee75000,0x00007fffaf075000], sp=0x00007fffaf072560, free > space=2037k > > Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native > code) > > C [hsdis-riscv64.so+0xcbe84] riscv_get_disassembler+0x1c > > C [hsdis-riscv64.so+0xc5bb6] disassembler+0x36 > > C [hsdis-riscv64.so+0x3b6ea] setup_app_data+0x110 > > C [hsdis-riscv64.so+0x3b0d0] decode+0x2c > > C [hsdis-riscv64.so+0x3afea] decode_instructions_virtual+0xbc > > V [libjvm.so+0x6f09c2] decode_env::decode_instructions(unsigned char*, > unsigned char*, unsigned char*)+0x238 > > V [libjvm.so+0x6f14f2] Disassembler::decode(unsigned char*, unsigned > char*, outputStream*, AsmRemarks const*, long)+0x10c > > V [libjvm.so+0xe9c846] StubCodeGenerator::stub_epilog(StubCodeDesc*)+0xea > > V [libjvm.so+0xe9ca80] StubCodeMark::~StubCodeMark()+0x124 > > V [libjvm.so+0x8cf36e] ICacheStubGenerator::generate_icache_flush(int > (**)(unsigned char*, int, int))+0x72 > > V [libjvm.so+0x8cef06] AbstractICache::initialize()+0x9a > > V [libjvm.so+0x8cf146] icache_init()+0xc > > V [libjvm.so+0x5b4e40] CodeCache::initialize()+0x1ba > > V [libjvm.so+0x5b4e6c] codeCache_init()+0xc > > V [libjvm.so+0x8e13dc] init_globals()+0x38 > > V [libjvm.so+0xf3d58e] Threads::create_vm(JavaVMInitArgs*, bool*)+0x348 > > V [libjvm.so+0x9f32da] JNI_CreateJavaVM_inner(JavaVM_**, void**, > void*)+0xe0 > > V [libjvm.so+0x9f34d4] JNI_CreateJavaVM+0x2a > > C [libjli.so+0x6a82] InitializeJVM+0x118 (java.c:1457) > > C [libjli.so+0x3f00] JavaMain+0xa0 (java.c:413) > > C [libjli.so+0xa050] ThreadJavaMain+0x24 (java_md.c:650) > > C [libc.so.6+0x6a450] > > C [libc.so.6+0xb7ef2] > > > > > > siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: > 0x0000000000000350 > > > > FWIW, if I use the hsdis from > https://builds.shipilev.net/hsdis/hsdis-riscv64.so < > https://aus01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbuilds.shipilev.net%2Fhsdis%2Fhsdis-riscv64.so&data=05%7C01%7Czixian.cai%40anu.edu.au%7C33c79318a1244d1202f608daa9c0f2fc%7Ce37d725cab5c46249ae5f0533e486437%7C0%7C0%7C638008943428531908%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2FyJ9RFlwnkhXCWZdihUsHQtMdHlpMCqL8fo26Czakmc%3D&reserved=0> > , > everything works fine. > > > > Sincerely, > > Zixian > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shade at redhat.com Wed Oct 12 10:13:38 2022 From: shade at redhat.com (Aleksey Shipilev) Date: Wed, 12 Oct 2022 12:13:38 +0200 Subject: Cross compiling hsdis for RISC-V In-Reply-To: References: Message-ID: <1c56d2fa-efa6-9581-66bf-862d2c241dfe@redhat.com> On 10/12/22 11:57, Ludovic Henry wrote: > I've made a local patch to be able to cross-compile hsdis based on binutils at > https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 > > > I'm blocked on getting access to my JBS again in order to submit it upstream. Feel free to use it in > the meantime. 
You don't need this anymore, as mainline how has this: https://github.com/openjdk/jdk/commit/392f35df4be1a9a8d7a67a25ae01230c7dd060ac It allows to cross-compile hsdis on every arch, including RISC-V :) -- Thanks, -Aleksey From ludovic at rivosinc.com Wed Oct 12 11:28:10 2022 From: ludovic at rivosinc.com (Ludovic Henry) Date: Wed, 12 Oct 2022 13:28:10 +0200 Subject: Cross compiling hsdis for RISC-V In-Reply-To: <1c56d2fa-efa6-9581-66bf-862d2c241dfe@redhat.com> References: <1c56d2fa-efa6-9581-66bf-862d2c241dfe@redhat.com> Message-ID: Great! It's still using the same binutils src folder to build multiple architectures, meaning there will be a conflict when cross-compiling and/or native-compiling. The change I propose also integrates using the folder `build//binutils` as the build folder for binutils, so that the binutils source tree stays clean and can be used for multiple targets. Whenever I can submit a bug in JBS (or if anyone feels like doing it and send me the link), I'll submit these changes as well. Thanks, Ludovic On Wed, Oct 12, 2022 at 12:13 PM Aleksey Shipilev wrote: > On 10/12/22 11:57, Ludovic Henry wrote: > > I've made a local patch to be able to cross-compile hsdis based on > binutils at > > > https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 > > < > https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 > > > > > > I'm blocked on getting access to my JBS again in order to submit it > upstream. Feel free to use it in > > the meantime. > > You don't need this anymore, as mainline how has this: > > https://github.com/openjdk/jdk/commit/392f35df4be1a9a8d7a67a25ae01230c7dd060ac > > It allows to cross-compile hsdis on every arch, including RISC-V :) > > -- > Thanks, > -Aleksey > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dingli at iscas.ac.cn Thu Oct 13 03:15:43 2022 From: dingli at iscas.ac.cn (Dingli Zhang) Date: Thu, 13 Oct 2022 11:15:43 +0800 Subject: Cross compiling hsdis for RISC-V In-Reply-To: References: <1c56d2fa-efa6-9581-66bf-862d2c241dfe@redhat.com> Message-ID: <191EE71D-AD78-473F-B780-8298789CB2E9@iscas.ac.cn> Hi Ludovic, Thank you for your patch! I have created one for you: https://bugs.openjdk.org/browse/JDK-8295251 Best regards, Dingli > On Oct 12, 2022, at 19:28, Ludovic Henry wrote: > > Great! It's still using the same binutils src folder to build multiple architectures, meaning there will be a conflict when cross-compiling and/or native-compiling. The change I propose also integrates using the folder `build//binutils` as the build folder for binutils, so that the binutils source tree stays clean and can be used for multiple targets. > > Whenever I can submit a bug in JBS (or if anyone feels like doing it and send me the link), I'll submit these changes as well. > > Thanks, > Ludovic > > On Wed, Oct 12, 2022 at 12:13 PM Aleksey Shipilev wrote: > On 10/12/22 11:57, Ludovic Henry wrote: > > I've made a local patch to be able to cross-compile hsdis based on binutils at > > https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 > > > > > > I'm blocked on getting access to my JBS again in order to submit it upstream. Feel free to use it in > > the meantime. 
> > You don't need this anymore, as mainline how has this: > https://github.com/openjdk/jdk/commit/392f35df4be1a9a8d7a67a25ae01230c7dd060ac > > It allows to cross-compile hsdis on every arch, including RISC-V :) > > -- > Thanks, > -Aleksey > From ludovic at rivosinc.com Thu Oct 13 09:07:22 2022 From: ludovic at rivosinc.com (Ludovic Henry) Date: Thu, 13 Oct 2022 11:07:22 +0200 Subject: Cross compiling hsdis for RISC-V In-Reply-To: <191EE71D-AD78-473F-B780-8298789CB2E9@iscas.ac.cn> References: <1c56d2fa-efa6-9581-66bf-862d2c241dfe@redhat.com> <191EE71D-AD78-473F-B780-8298789CB2E9@iscas.ac.cn> Message-ID: Hi Dingli, Sorry I saw your message too late. I recovered access to my JBS account this morning (yay!) and submitted https://bugs.openjdk.org/browse/JDK-8295262. The corresponding PR is https://github.com/openjdk/jdk/pull/10689. I'll mark yours as duplicate to help keep track. Thank you! Ludovic On Thu, Oct 13, 2022 at 5:15 AM Dingli Zhang wrote: > Hi Ludovic, > > Thank you for your patch! > I have created one for you: https://bugs.openjdk.org/browse/JDK-8295251 > > Best regards, > Dingli > > > On Oct 12, 2022, at 19:28, Ludovic Henry wrote: > > > > Great! It's still using the same binutils src folder to build multiple > architectures, meaning there will be a conflict when cross-compiling and/or > native-compiling. The change I propose also integrates using the folder > `build//binutils` as the build folder for binutils, so that the > binutils source tree stays clean and can be used for multiple targets. > > > > Whenever I can submit a bug in JBS (or if anyone feels like doing it and > send me the link), I'll submit these changes as well. > > > > Thanks, > > Ludovic > > > > On Wed, Oct 12, 2022 at 12:13 PM Aleksey Shipilev > wrote: > > On 10/12/22 11:57, Ludovic Henry wrote: > > > I've made a local patch to be able to cross-compile hsdis based on > binutils at > > > > https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 > > > < > https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 > > > > > > > > I'm blocked on getting access to my JBS again in order to submit it > upstream. Feel free to use it in > > > the meantime. > > > > You don't need this anymore, as mainline how has this: > > > https://github.com/openjdk/jdk/commit/392f35df4be1a9a8d7a67a25ae01230c7dd060ac > > > > It allows to cross-compile hsdis on every arch, including RISC-V :) > > > > -- > > Thanks, > > -Aleksey > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dingli at iscas.ac.cn Thu Oct 13 09:19:31 2022 From: dingli at iscas.ac.cn (Dingli Zhang) Date: Thu, 13 Oct 2022 17:19:31 +0800 Subject: Cross compiling hsdis for RISC-V In-Reply-To: References: <1c56d2fa-efa6-9581-66bf-862d2c241dfe@redhat.com> <191EE71D-AD78-473F-B780-8298789CB2E9@iscas.ac.cn> Message-ID: <57359ED8-80DD-4DDC-B938-459DD3EFBC2A@iscas.ac.cn> Hi Ludovic, I'm very glad to hear that your account is recovered! : ) I will close the duplicate issue. Best regards, Dingli > On Oct 13, 2022, at 17:07, Ludovic Henry wrote: > > Hi Dingli, > > Sorry I saw your message too late. I recovered access to my JBS account this morning (yay!) and submitted https://bugs.openjdk.org/browse/JDK-8295262. The corresponding PR is https://github.com/openjdk/jdk/pull/10689. > > I'll mark yours as duplicate to help keep track. > > Thank you! > Ludovic > > On Thu, Oct 13, 2022 at 5:15 AM Dingli Zhang wrote: > Hi Ludovic, > > Thank you for your patch! 
> I have created one for you: https://bugs.openjdk.org/browse/JDK-8295251 > > Best regards, > Dingli > > > On Oct 12, 2022, at 19:28, Ludovic Henry wrote: > > > > Great! It's still using the same binutils src folder to build multiple architectures, meaning there will be a conflict when cross-compiling and/or native-compiling. The change I propose also integrates using the folder `build//binutils` as the build folder for binutils, so that the binutils source tree stays clean and can be used for multiple targets. > > > > Whenever I can submit a bug in JBS (or if anyone feels like doing it and send me the link), I'll submit these changes as well. > > > > Thanks, > > Ludovic > > > > On Wed, Oct 12, 2022 at 12:13 PM Aleksey Shipilev wrote: > > On 10/12/22 11:57, Ludovic Henry wrote: > > > I've made a local patch to be able to cross-compile hsdis based on binutils at > > > https://github.com/rivosinc/jdk/commit/4c88e66b654e4c29ede7c221d223298497fa06c2 > > > > > > > > > I'm blocked on getting access to my JBS again in order to submit it upstream. Feel free to use it in > > > the meantime. > > > > You don't need this anymore, as mainline how has this: > > https://github.com/openjdk/jdk/commit/392f35df4be1a9a8d7a67a25ae01230c7dd060ac > > > > It allows to cross-compile hsdis on every arch, including RISC-V :) > > > > -- > > Thanks, > > -Aleksey > > > From zixian.cai at anu.edu.au Tue Oct 18 13:41:59 2022 From: zixian.cai at anu.edu.au (Zixian Cai) Date: Tue, 18 Oct 2022 13:41:59 +0000 Subject: Assertion error in codegen due to sharing C2 node in post store barrier? Message-ID: Hi all, I?m working on a C2 write barrier. Interesting enough, the barrier code works fine on x86_64, and only fails with RISC-V. I?m wondering whether there?s anything in the RISC-V codegen that might cause the above behaviour. I would really appreciate if people could point out the correct pattern for writing such barrier code, or any hint for further debugging. Below are more details. The barrier in its simplest form transforms each oop store into an oop store and a runtime call. Here is an example for CAS. virtual Node* atomic_cmpxchg_val_at_resolved(C2AtomicParseAccess& access, Node* expected_val, Node* new_val, const Type* value_type) const { Node* result = BarrierSetC2::atomic_cmpxchg_val_at_resolved(access, expected_val, new_val, value_type); if (access.is_oop()) object_reference_write_post(access.kit(), access.base(), access.addr().node(), new_val); return result; } void MMTkObjectBarrierSetC2::object_reference_write_post(GraphKit* kit, Node* src, Node* slot, Node* val) const { if (can_remove_barrier(kit, &kit->gvn(), src, slot, val, /* skip_const_null */ true)) return; MMTkIdealKit ideal(kit, true); const TypeFunc* tf = __ func_type(src->bottom_type(), slot->bottom_type(), val->bottom_type()); Node* x = __ make_leaf_call(tf, FN_ADDR(MMTkBarrierSetRuntime::object_reference_write_post_call), "mmtk_barrier_call", src, slot, val); kit->final_sync(ideal); // Final sync IdealKit and GraphKit. } Currently, I?m having an assertion error in ReduceInst (frame #0). # Internal Error (/home/zixianc/mmtk-riscv/jdk-mmtk/src/hotspot/share/opto/matcher.cpp:1791), pid=1275, tid=1291 # assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf)) failed: duplicating node that's already been matched I have been debugging in gdb and narrowed down the offending method to java.util.concurrent.ConcurrentHashMap::casTabAt. The s->_leaf in above is the AddP (o48) below. 
In gdb, the call stack of the C2 compiler starts with match_tree(CompareAndSwapP (o58)) (frame #3), which in turn calls ReduceInst (frame #2), ReduceInst_Interior (frame #1), and finally ReduceInst on AddP (o48) (frame #0). Dumping AddP (o48) gives the following. leaf->dump(3) o28 ConI === o0 [[ o29 ]] #int:3 o27 ConvI2L === _ o11 [[ o29 ]] #long:minint..maxint o0 Root === o0 o44 o74 [[ o0 o1 o3 o39 o36 o28 o32 o76 0 ]] inner o29 LShiftL === _ o27 o28 [[ o49 ]] o3 Start === o3 o0 [[ o3 o5 o6 o7 o8 o9 o10 o11 o12 o13 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:java/util/concurrent/ConcurrentHashMap$Node *[int:>=0] *, 6:int, 7:java/util/concurrent/ConcurrentHashMap$Node *, 8:java/util/co ncurrent/ConcurrentHashMap$Node *} o32 ConL === o0 [[ o48 ]] #long:24 o49 AddP === _ o10 o10 o29 [[ o48 ]] o10 Parm === o3 [[ o37 o61 o49 o49 o48 ]] Parm0: java/util/concurrent/ConcurrentHashMap$Node *[int:>=0] * o48 AddP === _ o10 o49 o32 [[ o58 o61 ]] leaf->dump(-3) o48 AddP === _ o10 o49 o32 [[ o58 o61 ]] o58 CompareAndSwapP === o55 o56 o48 o75 |o45 [[ o59 o69 o74 9 11 34 ]] o61 CallLeaf === o55 o1 o56 o1 o1 (o10 o48 o13 ) [[ o62 o63 ]] # mmtk_barrier_call void ( java/util/concurrent/ConcurrentHashMap$Node *[int:>=0]:NotNull *, java/util/concurrent/ConcurrentHashMap$Node *[int:>=0]:NotNull+any *, java/util/concurrent/ConcurrentH ashMap$Node * ) o59 SCMemProj === o58 [[ o64 32 ]] Memory: @BotPTR *+bot, idx=Bot; o69 MemBarAcquire === o66 o1 o67 o1 o1 o58 [[ o70 o71 10 ]] o74 Return === o70 o6 o71 o8 o9 returns o58 [[ o0 1 ]] 9 Ret === o70 o6 o71 o8 o9 o58 [[ ]] 11 membar_acquire === o66 o1 o67 o1 o1 o58 [[ ]] !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) 34 SCMemProj === o58 [[ ]] Memory: @BotPTR *+bot, idx=Bot; !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) o62 Proj === o61 [[ o65 ]] #0 o63 Proj === o61 [[ o64 ]] #2 Memory: @rawptr:BotPTR, idx=Raw; o64 MergeMem === _ o1 o56 o63 o59 [[ o65 13 ]] { N63:rawptr:BotPTR N59:java/lang/Object *[int:>=0]+any * } Memory: @BotPTR *+bot, idx=Bot; 32 MergeMem === _ 0 25 33 o59 [[ ]] { N33:rawptr:BotPTR N59:java/lang/Object *[int:>=0]+any * } Memory: @BotPTR *+bot, idx=Bot; !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) o70 Proj === o69 [[ o74 9 ]] #0 o71 Proj === o69 [[ o74 9 ]] #2 Memory: @BotPTR *+bot, idx=Bot; 10 MachProj === o69 [[ ]] #0/unmatched !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) o0 Root === o0 o44 o74 [[ o0 o1 o3 o39 o36 o28 o32 o76 0 ]] inner 1 Root === 1 2 o74 [[ 1 6 ]] inner o65 MemBarCPUOrder === o62 o1 o64 o1 o1 [[ o66 o67 12 ]] My guess is that the AddP node (o48) has been reduced when generating machine code for CallLeaf (o61), so when visiting CompareAndSwapP (o58), AddP (o48) will be visited again, and thus failing the assertion. If I change const TypeFunc* tf = __ func_type(src->bottom_type(), slot->bottom_type(), val->bottom_type()); Node* x = __ make_leaf_call(tf, FN_ADDR(MMTkBarrierSetRuntime::object_reference_write_post_call), "mmtk_barrier_call", src, slot, val); into const TypeFunc* tf = __ func_type(src->bottom_type()); Node* x = __ make_leaf_call(tf, FN_ADDR(MMTkBarrierSetRuntime::object_reference_write_post_call), "mmtk_barrier_call", src); the assertion error will disappear. Note that the slot is the AddP node (an offset from the src pointer). Sincerely, Zixian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From zixian.cai at anu.edu.au Wed Oct 19 07:56:34 2022 From: zixian.cai at anu.edu.au (Zixian Cai) Date: Wed, 19 Oct 2022 07:56:34 +0000 Subject: Assertion error in codegen due to sharing C2 node in post store barrier? In-Reply-To: References: Message-ID: Hi all, I put more print statements in matcher.cpp and narrowed down the bug. When transforming CallLeaf, a bunch of arguments are pushed to the mstack. The AddP (o48) node is turned into a MachNode after CallLeaf is matched. When transforming CompareAndSwapP, the AddP operand is recursively passed to ReduceInst. https://github.com/openjdk/jdk/blob/f502ab85c987be827d36b0a29f77ec5ce5bb3d01/src/hotspot/share/opto/matcher.cpp#L1973 But because AddP has already been transformed (and thus set_new_node has been called), the second call to ReduceInst with AddP will fail the assertion (C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf)) https://github.com/openjdk/jdk/blob/f502ab85c987be827d36b0a29f77ec5ce5bb3d01/src/hotspot/share/opto/matcher.cpp#L1787 because AddP is not a new node (not a MachNode), and it has been matched (has a new node). This is not a problem on x86_64, because when matching CompareAndSwapP, the AddP operand is passed to ReduceOper instead of ReduceInst. https://github.com/openjdk/jdk/blob/f502ab85c987be827d36b0a29f77ec5ce5bb3d01/src/hotspot/share/opto/matcher.cpp#L1960 I'm wondering whether any of you have had this problem before, and what a clean solution might look like. Sincerely, Zixian On 19/10/2022, 00:42, "Zixian Cai" wrote: Hi all, I'm working on a C2 write barrier. Interesting enough, the barrier code works fine on x86_64, and only fails with RISC-V. I'm wondering whether there's anything in the RISC-V codegen that might cause the above behaviour. I would really appreciate if people could point out the correct pattern for writing such barrier code, or any hint for further debugging. Below are more details. The barrier in its simplest form transforms each oop store into an oop store and a runtime call. Here is an example for CAS. virtual Node* atomic_cmpxchg_val_at_resolved(C2AtomicParseAccess& access, Node* expected_val, Node* new_val, const Type* value_type) const { Node* result = BarrierSetC2::atomic_cmpxchg_val_at_resolved(access, expected_val, new_val, value_type); if (access.is_oop()) object_reference_write_post(access.kit(), access.base(), access.addr().node(), new_val); return result; } void MMTkObjectBarrierSetC2::object_reference_write_post(GraphKit* kit, Node* src, Node* slot, Node* val) const { if (can_remove_barrier(kit, &kit->gvn(), src, slot, val, /* skip_const_null */ true)) return; MMTkIdealKit ideal(kit, true); const TypeFunc* tf = __ func_type(src->bottom_type(), slot->bottom_type(), val->bottom_type()); Node* x = __ make_leaf_call(tf, FN_ADDR(MMTkBarrierSetRuntime::object_reference_write_post_call), "mmtk_barrier_call", src, slot, val); kit->final_sync(ideal); // Final sync IdealKit and GraphKit. } Currently, I'm having an assertion error in ReduceInst (frame #0). # Internal Error (/home/zixianc/mmtk-riscv/jdk-mmtk/src/hotspot/share/opto/matcher.cpp:1791), pid=1275, tid=1291 # assert(C->node_arena()->contains(s->_leaf) || !has_new_node(s->_leaf)) failed: duplicating node that's already been matched I have been debugging in gdb and narrowed down the offending method to java.util.concurrent.ConcurrentHashMap::casTabAt. The s->_leaf in above is the AddP (o48) below.
In gdb, the call stack of the C2 compiler starts with match_tree(CompareAndSwapP (o58)) (frame #3), which in turn calls ReduceInst (frame #2), ReduceInst_Interior (frame #1), and finally ReduceInst on AddP (o48) (frame #0). Dumping AddP (o48) gives the following. leaf->dump(3) o28 ConI === o0 [[ o29 ]] #int:3 o27 ConvI2L === _ o11 [[ o29 ]] #long:minint..maxint o0 Root === o0 o44 o74 [[ o0 o1 o3 o39 o36 o28 o32 o76 0 ]] inner o29 LShiftL === _ o27 o28 [[ o49 ]] o3 Start === o3 o0 [[ o3 o5 o6 o7 o8 o9 o10 o11 o12 o13 ]] #{0:control, 1:abIO, 2:memory, 3:rawptr:BotPTR, 4:return_address, 5:java/util/concurrent/ConcurrentHashMap$Node *[int:>=0] *, 6:int, 7:java/util/concurrent/ConcurrentHashMap$Node *, 8:java/util/co ncurrent/ConcurrentHashMap$Node *} o32 ConL === o0 [[ o48 ]] #long:24 o49 AddP === _ o10 o10 o29 [[ o48 ]] o10 Parm === o3 [[ o37 o61 o49 o49 o48 ]] Parm0: java/util/concurrent/ConcurrentHashMap$Node *[int:>=0] * o48 AddP === _ o10 o49 o32 [[ o58 o61 ]] leaf->dump(-3) o48 AddP === _ o10 o49 o32 [[ o58 o61 ]] o58 CompareAndSwapP === o55 o56 o48 o75 |o45 [[ o59 o69 o74 9 11 34 ]] o61 CallLeaf === o55 o1 o56 o1 o1 (o10 o48 o13 ) [[ o62 o63 ]] # mmtk_barrier_call void ( java/util/concurrent/ConcurrentHashMap$Node *[int:>=0]:NotNull *, java/util/concurrent/ConcurrentHashMap$Node *[int:>=0]:NotNull+any *, java/util/concurrent/ConcurrentH ashMap$Node * ) o59 SCMemProj === o58 [[ o64 32 ]] Memory: @BotPTR *+bot, idx=Bot; o69 MemBarAcquire === o66 o1 o67 o1 o1 o58 [[ o70 o71 10 ]] o74 Return === o70 o6 o71 o8 o9 returns o58 [[ o0 1 ]] 9 Ret === o70 o6 o71 o8 o9 o58 [[ ]] 11 membar_acquire === o66 o1 o67 o1 o1 o58 [[ ]] !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) 34 SCMemProj === o58 [[ ]] Memory: @BotPTR *+bot, idx=Bot; !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) o62 Proj === o61 [[ o65 ]] #0 o63 Proj === o61 [[ o64 ]] #2 Memory: @rawptr:BotPTR, idx=Raw; o64 MergeMem === _ o1 o56 o63 o59 [[ o65 13 ]] { N63:rawptr:BotPTR N59:java/lang/Object *[int:>=0]+any * } Memory: @BotPTR *+bot, idx=Bot; 32 MergeMem === _ 0 25 33 o59 [[ ]] { N33:rawptr:BotPTR N59:java/lang/Object *[int:>=0]+any * } Memory: @BotPTR *+bot, idx=Bot; !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) o70 Proj === o69 [[ o74 9 ]] #0 o71 Proj === o69 [[ o74 9 ]] #2 Memory: @BotPTR *+bot, idx=Bot; 10 MachProj === o69 [[ ]] #0/unmatched !jvms: ConcurrentHashMap::casTabAt @ bci:17 (line 765) o0 Root === o0 o44 o74 [[ o0 o1 o3 o39 o36 o28 o32 o76 0 ]] inner 1 Root === 1 2 o74 [[ 1 6 ]] inner o65 MemBarCPUOrder === o62 o1 o64 o1 o1 [[ o66 o67 12 ]] My guess is that the AddP node (o48) has been reduced when generating machine code for CallLeaf (o61), so when visiting CompareAndSwapP (o58), AddP (o48) will be visited again, and thus failing the assertion. If I change const TypeFunc* tf = __ func_type(src->bottom_type(), slot->bottom_type(), val->bottom_type()); Node* x = __ make_leaf_call(tf, FN_ADDR(MMTkBarrierSetRuntime::object_reference_write_post_call), "mmtk_barrier_call", src, slot, val); into const TypeFunc* tf = __ func_type(src->bottom_type()); Node* x = __ make_leaf_call(tf, FN_ADDR(MMTkBarrierSetRuntime::object_reference_write_post_call), "mmtk_barrier_call", src); the assertion error will disappear. Note that the slot is the AddP node (an offset from the src pointer). Sincerely, Zixian -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From weikai at isrc.iscas.ac.cn Fri Oct 21 10:08:25 2022 From: weikai at isrc.iscas.ac.cn (=?UTF-8?B?5L2V5Lyf5Yev?=) Date: Fri, 21 Oct 2022 18:08:25 +0800 (GMT+08:00) Subject: Test Graphical Application: Netbeans and Luke Message-ID: <4244db29.1688d.183fa03f089.Coremail.weikai@isrc.iscas.ac.cn> Recently, the basic functions of NetBeans and Luke have been tested. NetBeans is an IDE for Java development. Luke is the GUI tool for introspecting Lucene indexes. ## Version OpenJDK: 19-ea https://builds.shipilev.net/openjdk-jdk/ Luke: 9.4.0 https://lucene.apache.org/core/downloads.html NetBeans: 15.0 https://netbeans.apache.org/download/nb15/ ## X forwarding X forwarding can open a graphical application on a remote machine and display it on your personal computer. On the remote machine, the package `xorg-x11-xauth` must be installed and sshd must enable X forwarding. To enable X forwarding, set `X11Forwarding` to yes in `/etc/ssh/sshd_config` and then restart sshd. You can also find configuration guides online[1]. The graphical application running on the remote machine is considered an X client program; for X forwarding over SSH to work, your personal computer must be running an X server program. X forwarding is enabled by default on my system, so `ssh -X` works directly. For configuration in other environments, refer to this article[2]. ## Details A Java project has been created and run in the process of testing. In addition, some other functions were tested. Picture: https://raw.githubusercontent.com/eikalida/pics/main/image-20221015155741043.png When testing Luke, a simple index file is used. Luke is able to open and inspect the index file successfully. The index file can be generated by following the demo in `docs/demo`. Picture: https://raw.githubusercontent.com/eikalida/pics/main/image-20221016153444265.png [1] https://ostechnix.com/how-to-configure-x11-forwarding-using-ssh-in-linux/ [2] https://kb.iu.edu/d/bdnt From yangfei at iscas.ac.cn Tue Oct 25 08:20:06 2022 From: yangfei at iscas.ac.cn (yangfei at iscas.ac.cn) Date: Tue, 25 Oct 2022 16:20:06 +0800 (GMT+08:00) Subject: Test Graphical Application: Netbeans and Luke In-Reply-To: <4244db29.1688d.183fa03f089.Coremail.weikai@isrc.iscas.ac.cn> References: <4244db29.1688d.183fa03f089.Coremail.weikai@isrc.iscas.ac.cn> Message-ID: Hi Weikai, Thanks for taking the time to run those testing workloads. Great to know that the GUI for those tests works as expected. I think I will also perform those tests regularly as we go. Regards, Fei > -----Original Messages----- > From: "???" > Sent Time: 2022-10-21 18:08:25 (Friday) > To: riscv-port-dev at openjdk.org > Cc: > Subject: Test Graphical Application: Netbeans and Luke > > Recently, basic functions of NetBeans and Luke have been tested. NetBeans is an IDE for Java development. Luke is the GUI tool for introspecting Lucene index. > > > ## Version > > Openjdk: 19-ea https://builds.shipilev.net/openjdk-jdk/ > Luke: 9.4.0 https://lucene.apache.org/core/downloads.html > NetBeans: 15.0 https://netbeans.apache.org/download/nb15/ > > > ## X forwarding > > X forwarding can open a graphical applications in a remote machine and display it in your personal computer. > In the remote, package `xorg-x11-xauth` must be installed and sshd must enable X forwarding. > To enable X forwarding, switch `X11Forwarding` in `/etc/ssh/ssh_config` to yes and then restart sshd. > You also can search how to configure on the network[1].
> > The graphical application running in remote is considered a X client program, for X forwarding in SSH to work, your personal computer must be running an X server program. > X forwarding is enable by default in my system, `ssh -X` works directly. Configuration in other environment can reference to this article[2]. > > > ## Details > > A java project has been created and run in the process of testing. In addition, some other functions are tested. > Picture: https://raw.githubusercontent.com/eikalida/pics/main/image-20221015155741043.png > > When testing Luke, a simple index file is used. Luke is able to open the index file and inspect the index file successfully. > The index file can be generated by following demo in `docs/demo`. > Picture: https://raw.githubusercontent.com/eikalida/pics/main/image-20221016153444265.png > > > [1] https://ostechnix.com/how-to-configure-x11-forwarding-using-ssh-in-linux/ > [2] https://kb.iu.edu/d/bdnt > > From weikai at isrc.iscas.ac.cn Wed Oct 26 06:15:51 2022 From: weikai at isrc.iscas.ac.cn (=?UTF-8?B?5L2V5Lyf5Yev?=) Date: Wed, 26 Oct 2022 14:15:51 +0800 (GMT+08:00) Subject: Panama-FFI API Porting for RISCV64 Message-ID: <4eaaac87.5c7.18412eece8d.Coremail.weikai@isrc.iscas.ac.cn> ## Summary Recently, the Panama FFI API has become a preview feature. In many scenarios, the Panama FFI API can replace JNI for native function access. The FFI API provides more secure and convenient access to native functions. The specific implementation of the FFI API depends on the architecture and the OS, so porting is required to enable the FFI API on RISCV64. ## Notable Things There is a different return value passing convention for special structures like `struct {int, float}` on RISCV64[1]. When making an upcall, that is, calling a Java method from a native function, the return value of the Java method will be saved in a segment of stack memory called the return buffer, and then the riscv backend will transfer the data from memory to `a0` and `fa0`. The instructions used for this data transfer are closely related to correctness. If the floating-point field contained in the special structure mentioned above is a float, 'flw' must be used; otherwise, 'fld'. This requires a means to pass the width information of fields to the riscv backend. Unfortunately, according to my understanding, the current interface does not provide a direct means to pass the width information of struct fields to the riscv backend, so the width information is encoded in other ways. Although this makes the riscv porting slightly different from other arches, it does not make the code more difficult to understand and maintain. For details, see the comments in the code[2]. ## Testing All the jtreg tests have passed. Some tests may fail under QEMU user mode; however, they pass on the development board. ## Reference [1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc [2] https://github.com/feilongjiang/jdk/tree/riscv-foreign-api [3] https://github.com/openjdk/jdk/pull/8842/files
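To make the `struct {int, float}` upcall case above concrete, here is a minimal sketch from the Java side, assuming the JDK 19 preview shape of the java.lang.foreign API (it needs --enable-preview to compile and run). The class and method names (PairUpcall, makePair) are made up for illustration and are not taken from the port or its tests; the point is only that a callback with this descriptor is exactly the situation where the returned int and float have to end up in `a0` and `fa0`, so the backend must know the floating-point field is 32 bits wide when it reads the return buffer.

import java.lang.foreign.*;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class PairUpcall {
    // Layout of the native 'struct { int i; float f; }' discussed above.
    static final GroupLayout PAIR = MemoryLayout.structLayout(
            ValueLayout.JAVA_INT.withName("i"),
            ValueLayout.JAVA_FLOAT.withName("f"));

    // Java implementation of the callback. The linker copies the returned
    // segment into the return buffer; the riscv backend then moves the int
    // field to a0 and the float field to fa0 (with flw, since it is 32-bit).
    static MemorySegment makePair(MemorySegment out) {
        out.set(ValueLayout.JAVA_INT, 0, 42);      // field 'i' at offset 0
        out.set(ValueLayout.JAVA_FLOAT, 4, 1.5f);  // field 'f' at offset 4
        return out;
    }

    public static void main(String[] args) throws Throwable {
        Linker linker = Linker.nativeLinker();
        try (MemorySession session = MemorySession.openConfined()) {
            MemorySegment out = MemorySegment.allocateNative(PAIR, session);
            MethodHandle target = MethodHandles.insertArguments(
                    MethodHandles.lookup().findStatic(PairUpcall.class, "makePair",
                            MethodType.methodType(MemorySegment.class, MemorySegment.class)),
                    0, out);
            // Descriptor: () -> struct { int; float; }
            MemorySegment stub = linker.upcallStub(target, FunctionDescriptor.of(PAIR), session);
            // 'stub' can now be passed to native code that expects a
            // 'struct pair (*cb)(void)' function pointer.
        }
    }
}

Under the calling convention referenced in [1], such a structure is returned with the integer field in `a0` and the float field in `fa0`, which is why the field width information discussed above matters when the generated stub copies data out of the return buffer.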