RFR: 8314125: RISC-V: implement Base64 intrinsic - encoding [v6]

Fei Yang fyang at openjdk.org
Mon Aug 5 03:00:42 UTC 2024


On Fri, 26 Jul 2024 08:10:01 GMT, Hamlin Li <mli at openjdk.org> wrote:

>> Hi,
>> Can you help to review the patch?
>> 
>> I'm also working a base64 decode instrinsic, but there is some performance regression in some cases, and decode and encode are totally independent with each other, so I will send out review of decode in another pr when I fix the performance regression in it.
>> 
>> Thanks.
>> 
>> ## Test
>> benchmarks run on CanVM-K230 (vlenb == 16), and banana-pi (vlenb == 32)
>> 
>> I've tried several implementations, respectively with vector group
>> * m2+m1+scalar
>> * m2+scalar
>> * m1+scalar
>> * pure scalar
>> The best one is combination of m2+m1, it have best performance in all source size.
>> 
>> ### K230
>> 
>> this implementation (m2+m1)
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">
>> Benchmark | (maxNumBytes) | Mode | Cnt | Score -intrinsic | Score + instrinsic, m1+m2 | Error | Units | -intrinsic/+intrinsic
>> -- | -- | -- | -- | -- | -- | -- | -- | --
>> Base64Encode.testBase64Encode | 1 | avgt | 10 | 86.784 | 86.996 | 0.459 | ns/op | 0.9975631063
>> Base64Encode.testBase64Encode | 2 | avgt | 10 | 93.603 | 94.026 | 1.081 | ns/op | 0.9955012443
>> Base64Encode.testBase64Encode | 3 | avgt | 10 | 121.927 | 123.227 | 0.342 | ns/op | 0.989450364
>> Base64Encode.testBase64Encode | 6 | avgt | 10 | 139.554 | 137.4 | 1.221 | ns/op | 1.015676856
>> Base64Encode.testBase64Encode | 7 | avgt | 10 | 160.698 | 162.25 | 2.36 | ns/op | 0.9904345146
>> Base64Encode.testBase64Encode | 9 | avgt | 10 | 161.085 | 153.772 | 1.505 | ns/op | 1.047557423
>> Base64Encode.testBase64Encode | 10 | avgt | 10 | 187.963 | 174.763 | 1.204 | ns/op | 1.075530862
>> Base64Encode.testBase64Encode | 48 | avgt | 10 | 405.212 | 199.4 | 6.374 | ns/op | 2.032156469
>> Base64Encode.testBase64Encode | 512 | avgt | 10 | 3652.555 | 1111.009 | 3.462 | ns/op | 3.287601631
>> Base64Encode.testBase64Encode | 1000 | avgt | 10 | 7217.187 | 2011.943 | 227.784 | ns/op | 3.587172698
>> Base64Encode.testBase64Encode | 20000 | avgt | 10 | 135165.706 | 33864.592 | 57.557 | ns/op | 3.991357876
>> 
>> </google-sheets-html-origin>
>> 
>> vector with only m2
>> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-style: normal; font-variant-caps: normal; font-weight: 4...
>
> Hamlin Li has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:
> 
>  - merge master
>  - Merge branch 'master' into baes64-encode-integrated
>  - move label
>  - refine code
>  - use pure scalar version when rvv is not supported
>  - clean code
>  - Initial commit

Some minor comments.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5201:

> 5199: 
> 5200:     Register codec = c_rarg6;
> 5201:     Register length = c_rarg7; // total length of src data in bytes

Seems better to keep these aligned: 

    Register src   = c_rarg0;
    Register soff  = c_rarg1;
    Register send  = c_rarg2;
    Register dst   = c_rarg3;
    Register doff  = c_rarg4;
    Register isURL = c_rarg5;

    Register codec  = c_rarg6;
    Register length = c_rarg7; // total length of src data in bytes

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5204:

> 5202: 
> 5203:     Label ProcessData, Exit, Greater;
> 5204: 

Better to build a frame for proper stackwalking of RuntimeStub frame. Like:

    __ enter(); // Required for proper stackwalking of RuntimeStub frame

    // Your code here.

    __ leave(); // Required for proper stackwalking of RuntimeStub frame
    __ ret();

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5206:

> 5204: 
> 5205:     RegSet saved_regs;
> 5206:     __ push_reg(saved_regs, sp);

But `saved_regs` is empty?

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 5211:

> 5209:     __ sub(length, send, soff);
> 5210:     __ mv(t0, 3);
> 5211:     __ div(length, length, t0);

Could this `div` be avoided? Maybe we could simply check and update `length` at the byte granunarity.

src/hotspot/cpu/riscv/stubGenerator_riscv.cpp line 6224:

> 6222:     if (UseBASE64Intrinsics) {
> 6223:       StubRoutines::_base64_encodeBlock = generate_base64_encodeBlock();
> 6224:     }

Nit: put a new line after this if block.

-------------

PR Review: https://git.openjdk.org/jdk/pull/19973#pullrequestreview-2217446699
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703473913
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703084532
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703451782
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703483528
PR Review Comment: https://git.openjdk.org/jdk/pull/19973#discussion_r1703082172


More information about the hotspot-dev mailing list