RFR: 8314020: Print instruction blocks in byte units

Thu Aug 10 07:03:59 UTC 2023

On Wed, 9 Aug 2023 17:40:43 GMT, Aleksey Shipilev <shade at openjdk.org> wrote:

> When following up on JVM crashes in the field, we frequently want to disassemble the "Instructions:" block in hs_err. Unfortunately, some architectures print the instructions in N-byte chunks, whose byte order depends on platform endianness. Simple scripts then fail to parse the instruction stream, because they make no assumptions about endianness, and the chunks need additional re-assembling (pun intended).
> 
> See more details in the bug.
> 
> I understand it is natural to print the full instructions on architectures with fixed-size instructions like AArch64, RISC-V, PPC, ARM. But I'd argue we should simplify the mechanical parsing of those instruction dumps. What do you think, @theRealAph, @RealFYang, @TheRealMDoerr, @tstuefe?
> 
> After this fix, the following scripts work well with the `Instructions:` block from an artificial AArch64 crash:
> 
> 
> % head asm.txt 
> 0x000000010763c8c4:   e0 a3 00 91 e1 03 13 aa fd 0d c9 97 e0 83 00 91
> 0x000000010763c8d4:   47 9e da 97 68 ce 40 f9 08 05 40 f9 09 31 40 b9
> 0x000000010763c8e4:   29 05 00 11 09 31 00 b9 25 16 f2 97 e0 23 00 91
> 0x000000010763c8f4:   e1 03 13 aa e2 03 16 aa e3 03 15 aa f2 3d 00 94
> 0x000000010763c904:   f6 83 40 a9 28 00 80 52 c8 42 13 39 e0 1f 00 f9
> 0x000000010763c914:   a8 2c 00 b0 1f 20 03 d5 08 8d 41 f9 1f 01 00 f1
> 0x000000010763c924:   04 18 40 fa 40 00 00 54 00 01 3f d6 78 2c 00 b0
> 0x000000010763c934:   18 1b 3f 91 08 03 40 39 68 00 00 34 e0 e3 00 91
> 0x000000010763c944:   90 e2 f1 97 e1 0f 40 f9 e0 e3 00 91 ef 3c 00 94
> 0x000000010763c954:   f5 03 00 aa 08 03 40 39 48 03 00 34 e0 e3 00 91
> 
> % cat asm.txt | cut -d: -f2 | sed -e 's/  +/ /g' -e 's/ / 0x/g' | xxd -r -p > asm.bin; objdump -D -m aarch64 -b binary asm.bin | head -n 20
> 
> asm.bin:     file format binary
> 
> 
> Disassembly of section .data:
> 
> 0000000000000000 <.data>:
>    0:	9100a3e0 	add	x0, sp, #0x28
>    4:	aa1303e1 	mov	x1, x19
>    8:	97c90dfd 	bl	0xffffffffff2437fc
>    c:	910083e0 	add	x0, sp, #0x20
>   10:	97da9e47 	bl	0xffffffffff6a792c
>   14:	f940ce68 	ldr	x8, [x19, #408]
>   18:	f9400508 	ldr	x8, [x8, #8]
>   1c:	b9403109 	ldr	w9, [x8, #48]
>   20:	11000529 	add	w9, w9, #0x1
> 
> % cat asm.txt | cut -d: -f2 | sed -e 's/  +/ /g' -e 's/ / 0x/g' | llvm-mc --disassemble --show-encoding | head -n 20
> 	.text
> 	add	x0, sp, #40                     // encoding: [0xe0,0xa3,0x00,0x91]
>                                         // =40
> 	mov	x1, x19                         // encoding: [0xe1,0x03,0x13,0xaa]
> 	bl	#-14403596                      // encoding: [0xfd,0x0d,0xc9,0x97]
> 	add	x...
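
For context on the endianness problem in the quoted description: if a dump prints each 4-byte instruction as a hex integer (as in the second column of the objdump output above), a script has to know the target's byte order and swap bytes before it can feed them to a disassembler. Below is a minimal sketch of that extra step, assuming a little-endian target and words of exactly 8 hex digits; `words_to_bytes` is a hypothetical helper, not something HotSpot provides.

```shell
#!/bin/sh
# Hypothetical helper: turn 32-bit words printed as hex integers for a
# little-endian target (e.g. "9100a3e0") back into memory byte order
# (e.g. "e0 a3 00 91"), which is the order a byte-unit dump prints.
# Assumes each word is exactly 8 hex digits; big-endian words need no swap.
words_to_bytes() {
  printf '%s\n' "$@" |
    # Emit the least significant byte (last two digits) first.
    awk '{ print substr($0, 7, 2), substr($0, 5, 2), substr($0, 3, 2), substr($0, 1, 2) }'
}
```

For example, `words_to_bytes 9100a3e0 aa1303e1` prints `e0 a3 00 91` and `e1 03 13 aa`, the same bytes that open the byte-unit `asm.txt` dump. With byte-unit dumps this step, and the endianness guess behind it, is simply not needed.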

Hi, this works for me too. In fact, RISC-V already does the byte-unit dump when the "C" extension (Compressed Instructions) is present, since that extension makes the encoding variable-size. And the "C" extension is available on today's popular RISC-V CPU implementations.
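
To illustrate why a variable-size encoding fits a byte-unit dump: with the "C" extension, 16-bit and 32-bit instructions are mixed, so fixed 4-byte chunks need not line up with instruction boundaries. In the RISC-V encoding, an instruction whose lowest two bits are `11` is at least 32 bits long, and anything else is a 16-bit compressed instruction; since RISC-V is little-endian, those bits sit in the first byte in memory. A minimal sketch of splitting a byte-unit dump that way, assuming only 16- and 32-bit instructions (the longer 48-bit-and-up formats are ignored); `rv_split_insns` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: split a RISC-V byte-unit dump (bytes in memory order)
# into instructions, one per output line. Assumes only 16-bit ("C" extension)
# and 32-bit encodings; longer formats (also low bits 0b11) are not handled.
rv_split_insns() {
  set -- $*   # intentionally unquoted: re-split the input into single bytes
  while [ $# -gt 0 ]; do
    # Little-endian: the first byte carries the instruction's lowest bits.
    if [ $(( 0x$1 & 3 )) -eq 3 ]; then
      [ $# -ge 4 ] || return 1   # truncated 32-bit instruction
      echo "$1 $2 $3 $4"
      shift 4
    else
      [ $# -ge 2 ] || return 1   # truncated 16-bit instruction
      echo "$1 $2"
      shift 2
    fi
  done
}
```

For example, `rv_split_insns "13 00 00 00 01 00 82 80"` prints `13 00 00 00` (`nop`), `01 00` (`c.nop`) and `82 80` (`c.jr ra`), one per line. A 4-byte chunk starting at `01 00` would instead have glued two compressed instructions together.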

-------------

PR Comment: https://git.openjdk.org/jdk/pull/15211#issuecomment-1672655068