RFR: 8285711: riscv: RVC: Support disassembler show-bytes option [v2]
Thomas Stuefe
stuefe at openjdk.java.net
Fri Apr 29 04:05:35 UTC 2022
On Thu, 28 Apr 2022 07:38:51 GMT, Xiaolin Zheng <xlinzheng at openjdk.org> wrote:
>> Not a riscv expert, but looks good to me.
>>
>> One question, this only works if the pointer points to the start of an instruction, right? So, it would not work if the pointer pointed to the second half word of a four byte instruction?
>>
>> In other words, in riscv, is it possible to take an arbitrary half word address into code, and determine the start of the instruction, and possibly go back n instructions? e.g. when dumping arbitrary pieces of code as hex?
>
> Hi Thomas, thank you for the review!
>
> In my personal opinion, it might be hard to do so:
>
> Practically, using `objdump` to disassemble a hello world C program:
>
>
> ubuntu@ubuntu:~$ ./a.out
> hello, world!
>
> --------------------------
>
> ubuntu@ubuntu:~$ objdump -C -D -m riscv:rv64 -M numeric -M no-aliases --start-address=0x668 --stop-address=0x680 a.out
>
> a.out: file format elf64-littleriscv
>
>
> Disassembly of section .text:
>
> 0000000000000668 <main>:
> 668: 1141 c.addi x2,-16
> 66a: e406 c.sdsp x1,8(x2)
> 66c: e022 c.sdsp x8,0(x2)
> 66e: 0800 c.addi4spn x8,x2,16
> 670: 00000517 auipc x10,0x0 // Here @ 0x670, objdump could tell
> // it is a 32-bit auipc instruction
> 674: 02050513 addi x10,x10,32 # 690 <_IO_stdin_used+0x8>
> 678: f29ff0ef jal x1,5a0 <puts@plt>
> 67c: 4781 c.li x15,0
> 67e: 853e c.mv x10,x15
>
> --------------------------
>
> ubuntu@ubuntu:~$ objdump -C -D -m riscv:rv64 -M numeric -M no-aliases --start-address=0x672 --stop-address=0x680 a.out
>
> a.out: file format elf64-littleriscv
>
>
> Disassembly of section .text:
>
> 0000000000000672 <main+0xa>:
> 672: 0000 c.unimp // The new result seems broken when we
> // start from '0x672' -- but it is inside the 'auipc'.
> 674: 02050513 addi x10,x10,32
> 678: f29ff0ef jal x1,5a0 <puts@plt>
> 67c: 4781 c.li x15,0
> 67e: 853e c.mv x10,x15
>
>
> Theoretically,
>
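> The encoding of `auipc` is U-type: `imm[31:12]` in bits 31:12, `rd` in bits 11:7, and the opcode in bits 6:0 (the encoding figure is not preserved in this archive). The manual is [here](https://github.com/riscv/riscv-isa-manual/releases).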
>
> From the disassembly result, `auipc x10,0x0` encodes as `0x00000517`. Instructions are required to be stored as a sequence of [16-bit little-endian](https://github.com/riscv/riscv-isa-manual/blob/04cc07bccea63f6587371b6c75b228af3e5ebb02/src/intro.tex#L612-L618) parcels, so in a little-endian memory system it would be: `0x00000670: 17 05 00 00`. If we fetch the first halfword we get `0x0517`, and by examining it (its two lowest bits are `11`) we can tell it is a 32-bit instruction. But if we start from the second halfword we only get `0x0000`, which lies inside the `imm[31:12]` field. I think it is hard to interpret that `0x0000` on its own; in theory it could be any value, since it is part of an immediate.
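The check described above can be sketched in a few lines of C++. This is only an illustration, not code from the PR or from HotSpot: the names `load_parcel` and `length_from_first_parcel` are hypothetical, and the sketch assumes only the standard 16-bit and 32-bit encodings are in use (longer instruction formats are ignored).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Read one 16-bit little-endian parcel from a byte buffer.
// (Hypothetical helper, for illustration only.)
static uint16_t load_parcel(const uint8_t* p) {
  assert(p != nullptr);
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}

// Instruction length in bytes implied by a parcel, valid ONLY if the parcel
// is the first parcel of an instruction. Assumption: only 16-bit (RVC) and
// 32-bit encodings occur; low bits '11' mean a 32-bit instruction.
static size_t length_from_first_parcel(uint16_t parcel) {
  return ((parcel & 0x3) == 0x3) ? 4 : 2;
}
```

Applied to the bytes `17 05 00 00`, the first parcel `0x0517` yields a length of 4, while the second parcel `0x0000` would yield 2 if it were mistaken for an instruction start. The function cannot tell the two situations apart, which is the ambiguity visible in the second `objdump` run.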
>
> So we probably have to decode from the first halfword of an instruction. I may have been too verbose, but I hope this is right.
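One way to act on that conclusion, sketched here under stated assumptions, is to walk forward from an address already known to be an instruction start (for example a function entry) and see whether the walk lands exactly on the address of interest. The names `insn_length` and `is_insn_start` are hypothetical, this is not what the PR implements, and only 16-bit and 32-bit encodings are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Length of the instruction starting at p, assuming p really is an
// instruction start and only 16-bit/32-bit encodings are present.
static size_t insn_length(const uint8_t* p) {
  uint16_t parcel = static_cast<uint16_t>(p[0] | (p[1] << 8));
  return ((parcel & 0x3) == 0x3) ? 4 : 2;
}

// Returns true if byte offset `target` is an instruction boundary.
// Assumption: offset 0 of `code` is a known instruction start.
static bool is_insn_start(const uint8_t* code, size_t size, size_t target) {
  assert(target <= size);
  size_t off = 0;
  // Step instruction by instruction until we reach or pass the target.
  while (off < target && off + 2 <= size) {
    off += insn_length(code + off);
  }
  return off == target;
}
```

With the bytes of `main` from the listing above loaded at offset 0 (address `0x668`), offset 8 (`0x670`) is reported as an instruction start and offset 10 (`0x672`) is not. Walking backwards by n instructions has no equivalent shortcut; it would need a known start somewhere before the target.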
@zhengxiaolinX Thanks for your explanation!
-------------
PR: https://git.openjdk.java.net/jdk/pull/8421