RFR: 8285711: riscv: RVC: Support disassembler show-bytes option

Xiaolin Zheng xlinzheng at openjdk.java.net
Thu Apr 28 07:42:41 UTC 2022


On Thu, 28 Apr 2022 05:29:12 GMT, Thomas Stuefe <stuefe at openjdk.org> wrote:

> Not a riscv expert, but looks good to me.
> 
> One question, this only works if the pointer points to the start of an instruction, right? So, it would not work if the pointer pointed to the second half word of a four byte instruction?
> 
> In other words, in riscv, is it possible to take an arbitrary half word address into code, and determine the start of the instruction, and possibly go back n instructions? e.g. when dumping arbitrary pieces of code as hex?

Hi Thomas, thank you for the review! 

I think that would be hard to do in general.

In practice, here is `objdump` disassembling a hello world C program:


ubuntu@ubuntu:~$ ./a.out
hello, world!

--------------------------

ubuntu@ubuntu:~$ objdump -C -D -m riscv:rv64 -M numeric -M no-aliases --start-address=0x668 --stop-address=0x680 a.out

a.out:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000668 <main>:
 668:  1141        c.addi  x2,-16
 66a:  e406        c.sdsp  x1,8(x2)
 66c:  e022        c.sdsp  x8,0(x2)
 66e:  0800        c.addi4spn  x8,x2,16
 670:  00000517    auipc  x10,0x0          // Here @ 0x670, objdump can tell 
                                           //   it is a 32-bit auipc instruction
 674:  02050513    addi  x10,x10,32 # 690 <_IO_stdin_used+0x8>
 678:  f29ff0ef    jal  x1,5a0 <puts@plt>
 67c:  4781        c.li  x15,0
 67e:  853e        c.mv  x10,x15

--------------------------

ubuntu@ubuntu:~$ objdump -C -D -m riscv:rv64 -M numeric -M no-aliases --start-address=0x672 --stop-address=0x680 a.out

a.out:     file format elf64-littleriscv


Disassembly of section .text:

0000000000000672 <main+0xa>:
 672:  0000        c.unimp                 // The result is broken when we start 
                                           //   from '0x672', which is inside the 'auipc'.
 674:  02050513    addi  x10,x10,32
 678:  f29ff0ef    jal  x1,5a0 <puts@plt>
 67c:  4781        c.li  x15,0
 67e:  853e        c.mv  x10,x15


In theory, the encoding leads to the same conclusion.

The encoding of `auipc` looks like this:
![image](https://user-images.githubusercontent.com/38156692/165698493-52ed76cb-0eef-496f-a935-cc6c23ded040.png)
(the manual is available [here](https://github.com/riscv/riscv-isa-manual/releases)).

From the disassembly, `auipc  x10,0x0` is encoded as `0x00000517`. Instructions are stored little-endian, so in memory it is `0x00000670:     17 05 00 00`. If we fetch the first halfword we get `0x0517` directly, and examining it tells us this is a 32-bit instruction. If we start from the second halfword we only get `0x0000`, which lies entirely inside the `imm[31:12]` field. There is no reliable way to interpret that `0x0000` on its own, and because it is part of an immediate it could in theory be any value.
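
To make the byte layout concrete, here is a small Python sketch of the point above. The helper names (`to_memory`, `halfword_at`) are made up for illustration and are not taken from HotSpot or objdump; only the instruction word `0x00000517` comes from the listing.

```python
# Sketch only: helper names are hypothetical, not HotSpot or binutils APIs.

AUIPC_X10_0 = 0x00000517  # 'auipc x10,0x0' as shown by objdump at 0x670


def to_memory(insn: int) -> bytes:
    """Lay out a 32-bit instruction word the way it is stored (little-endian)."""
    return insn.to_bytes(4, "little")


def halfword_at(buf: bytes, offset: int) -> int:
    """Fetch one 16-bit parcel at a byte offset, little-endian."""
    return int.from_bytes(buf[offset:offset + 2], "little")


mem = to_memory(AUIPC_X10_0)
first = halfword_at(mem, 0)   # holds the opcode, rd and the low immediate bits
second = halfword_at(mem, 2)  # holds only imm[31:16], i.e. arbitrary data

print(" ".join(f"{b:02x}" for b in mem))
print(f"first halfword  = {first:#06x}")
print(f"second halfword = {second:#06x}")
```

The second halfword carries no opcode bits at all, so nothing in it marks it as "the middle of an instruction".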

So we probably have to start decoding from the first halfword of an instruction. Sorry if this is too verbose, but I hope it is right.
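
As a rough illustration of forward-only decoding, the sketch below walks the bytes of `main` from the listing and picks each instruction's length from the low two bits of its first halfword (`11` means 32-bit, anything else means a 16-bit compressed instruction). It assumes only 16-bit and 32-bit encodings are present, and the names `insn_length` and `walk` are hypothetical, not the code in this PR.

```python
# Sketch only: assumes just 16-bit (RVC) and 32-bit encodings are in use.
# Function names are illustrative and do not come from HotSpot.

# Bytes of 'main' from 0x668 to 0x680, as stored in memory (little-endian).
MAIN = bytes.fromhex("4111 06e4 22e0 0008 17050000 13050502 eff09ff2 8147 3e85")


def insn_length(first_halfword: int) -> int:
    """Length in bytes, decided by bits [1:0] of the first 16-bit parcel."""
    return 4 if (first_halfword & 0b11) == 0b11 else 2


def walk(code: bytes, start: int) -> list:
    """Return (offset, length) pairs, decoding forward from 'start'."""
    result = []
    offset = start
    while offset + 2 <= len(code):
        half = int.from_bytes(code[offset:offset + 2], "little")
        length = insn_length(half)
        if offset + length > len(code):
            break  # truncated instruction at the end of the buffer
        result.append((offset, length))
        offset += length
    return result


# Starting at a real instruction boundary (0x668 is offset 0).
print(walk(MAIN, 0))
# Starting inside the auipc (0x672 is offset 10).
print(walk(MAIN, 10))
```

Starting at offset 10 treats the `0x0000` immediate bits as a 2-byte instruction, which matches the `c.unimp` in the second objdump run. It only resynchronizes because those bits happen not to end in `11`; with a different immediate the walk could consume the following instruction as well.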

-------------

PR: https://git.openjdk.java.net/jdk/pull/8421

