Re: Re: Discuss the RVC implementation

Fri Sep 23 13:11:57 UTC 2022

I forgot to describe something about MachBranchNodes.
The thing is, C2 needs to calculate node sizes in order to allocate buffers, so it has a scratch_emit phase that estimates each node's size first. It uses a clever strategy to measure the size of MachBranchNodes. When estimating the size, only the MachBranchNode itself matters, not the Label; the labels are just tools for generating branch instructions. So a "fake label"[1] is used instead, bound directly at the same pc as the MachBranchNode to simplify the code logic.
On other platforms like x86 and aarch64, the size of a branch instruction does not change, and these platforms have no code size reduction extension like RISC-V's. For example, on those platforms a jcc is always a jcc, and a bl is always a bl. In our implementation, we have:
```
#define INSN(NAME)                                  \
  void NAME(Register Rd, const int32_t offset) {    \
    /* jal -> c.j */                                \
    if (do_compress() ...) {                        \
      c_j(offset);                                  \
      return;                                       \
    }                                               \
    /* otherwise emit the normal 4-byte jal */      \
    _jal(Rd, offset);                               \
  }

  INSN(jal);

#undef INSN
```
The size of an emitted instruction is determined by the `offset`. Though reasonable, this is not compatible with the "fake label" strategy: with the fake label, the offset is always 0 when scratch-emitting a MachBranchNode, which does not match the real offset. Therefore, the size of a MachBranchNode might differ between scratch_emit and the real emission, which breaks the assumption behind C2's strategy.
To emit the code we want, a basic approach would be to pass the real offset into the MachBranchNode and read it instead of the "0" every time.
So for now, in these patches, all MachBranchNodes are temporarily incompressible in C2 when RVC is enabled.
Thanks,
Xiaolin
------------------------------------------------------------------
From:yangfei <yangfei at iscas.ac.cn>
Send Time: 2022-09-20 (Tuesday) 19:48
To:郑孝林(云矅) <yunyao.zxl at alibaba-inc.com>
Cc:riscv-port-dev <riscv-port-dev at openjdk.org>
Subject:Re: Re: Discuss the RVC implementation
Hi Xiaolin,
> -----Original Messages-----
> From: "Xiaolin Zheng" <yunyao.zxl at alibaba-inc.com>
> Sent Time: 2022-09-20 18:44:21 (Tuesday)
> To: yangfei <yangfei at iscas.ac.cn>
> Cc: riscv-port-dev <riscv-port-dev at openjdk.org>
> Subject: Re: Discuss the RVC implementation
> 
> Hi Felix,
> 
> TL;DR of code size evaluations, stably reproduced:
> 
> If a piece of code is 100 bytes full of 4-byte instructions:
> 1. In the current master branch with RVC, it may shrink to 95 bytes. (compression rate is 5%)
> 2. With the new implementation at [1], it may shrink to 84 bytes. (compression rate is 16%; ~11% more than master)
> 3. With the special patch at [2] (a special optimization that compresses the two "slli"s in movptr), it may shrink to 79 bytes. (compression rate is 21%; ~5% on top of the previous one, because movptr() is used very frequently. But this patch might need further cleanup of its hard-coded enumeration and would add complexity for reviewers, so we'd postpone it for now)
> These figures were measured with a hand-written toy histogram[3], excluding scratch_emit, and tested with a release build (for a fastdebug build the compression rate is much higher than in release mode, but we may not care about that). They are for evaluation purposes only.
> 
> About the performance, I need more time to make some more evaluations. Since the patch for the new implementation has to wait for the loom port to be merged first, we have plenty of time. I am going to do a long run of SPECjbb2015 to measure the average. I will post the results in this thread.
> 
> ---------------
> 
> Precisely, here are some detailed data about the code size.
> 
> The histogram mentioned above counts all the instructions emitted in a JVM process and is printed at exit; see, for example, the picture in [4].
> 
> The second row (RVC instructions) + The third row (4-byte normal instructions) = The fourth row (total instructions); sorted by the fourth row.
> If RVC is not enabled, the second row is always 0 and the third row is always equal to the fourth row.
> 
> Tested on the new RVC branch [1] with springboot / springboot-petclinic / SPECjvm2008 / SPECjbb2015 (measured at exit), the results are all a stable ~84% of the original size. The SPECjvm2008 results are at [5]. Please search for the keywords "Ideally Code Size Could Shrink to" in those files in the browser for more details.
> 
> P.S.: for future reference, the result with the special patch [2] is ~79% and is available at [6], but that patch may be held back for now.
> 
> Best,
> Xiaolin
> 
> [1] https://github.com/zhengxiaolinX/jdk/commits/REBASE-rvc-beautify
> [2] https://github.com/zhengxiaolinX/jdk/commit/3a4d80197da0c497c844016b9a9fbae541eca9c8
> [3] https://github.com/zhengxiaolinX/jdk/commit/5312cbd8ac860f47b109ab2a99750041865c018d
> [4] https://github.com/openjdk/riscv-port/pull/34
> [5] http://cr.openjdk.java.net/~xlinzheng/rvc-size/size/
> [6] http://cr.openjdk.java.net/~xlinzheng/rvc-size/size-full/
Thanks for taking the time to measure those figures :-)
It's great to know that your new proposal for supporting RVC does better on the code size metric.
I am currently looking at the details of your code changes at [1].
I just realized that your work includes some code cleanup in the first two commits. I would suggest we upstream that cleanup first if possible.
Regards,
Fei
[1] https://github.com/zhengxiaolinX/jdk/commits/REBASE-rvc-beautify