SA and disassemblers

Sat Sep 12 17:06:53 PDT 2009

On Sep 11, 2009, at 5:59 AM, Volker Simonis wrote:

These are good ideas, Volker; thanks for thinking it all through.

> I like the extension of the decode_instructions interface, especially
> because this is something we are already using anyway. We have
> extended the disassembler in the VM to decode instructions which had
> been dumped by the VM earlier when no disassembler library was
> available. We realized this with an 'AbstractDisassaembler' which in
> our case is the base class of 'Disassembler'. If the native
> disassembler can not be loaded, we automatically fall back to the
> abstract disassembler which prints a kind of hex dump, but with all
> the usual annotations. Later on, we can take such an output produced
> by the abstract disassembler (for example from an hs_err file) and
> decode it to real assembly code. Perhaps this is an extension you also
> want to consider? I could contribute some code if you'd like.
>
> Besides this I have some remarks/suggestions regarding hsdis:
>
> - I don't really think it is necessary for hsdis.c to export two
> functions: decode_instructions() and decode_instructions_virtual(). I
> think the more general one decode_instructions_virtual() would be
> enough together with a small change in disassembler.{cpp,hpp}:

Moving forward, hsdis should keep compatibility with older VMs, so  
that a JVM developer can keep a recent version in the load library  
path, and still be able to work with multiple JVMs.  This is, of  
course, less important for developers who work with only one JVM at a  
time.

The implication is that hsdis should add new entry points that  
generalize the old ones.

> address decode_env::decode_instructions(address start, address end,
> address virtual_start /* = 0*/) {
>  if (virtual_start == 0) virtual_start = start;
>
> This would unify the hsdis interface used by the VM and the SA and
> allow one to easily implement the decoding of "abstract disassembly"
> as described above.

Good point.  In that way, decode_instructions_virtual extends  
decode_instructions.  This reasoning takes the design back to Tom's  
original choice of name, something like decode_instructions_v2.

> - in hsdis.c in hsdis_read_memory_func you use a C++ style comment
> which I think should be better changed into a C style comment and
> 'membar' should probably read 'mamaddr':

(Yes.)

> ...
> +   // convert the virtual address memaddr into an address within  
> memory buffer
> +   uintptr_t memaddr_p = ((uintptr_t) memaddr) - app_data->start_va +
> app_data->start;
> ...
>
> - The Readme file should contain the hint:
>
> "The binutils should be configured with the '--disable-nls' flag to
> disable Native Language Support, otherwise you might get an "undefined
> reference to `libintl_gettext'" if you try to load hsdis.so on systems
> which don't have NLS by default."
>
> - There's a typo in the README file of the hsdis plugin:
> line 70: 'hsdid-$LIBARCH.so' should read 'hsdis-$LIBARCH.so'
>
> - On IA64 and PPC, function pointers are pointers to function
> descriptors, so hsdis-demo.c should be changed accordingly:
>
> static const char* lookup(void* addr) {
> #if defined(__ia64) || defined(__powerpc__)
>  /* On IA64 and PPC function pointers are pointers to function  
> descriptors */
>  #define CHECK_NAME(fn) \
>    if (addr == *((void**) &fn))  return #fn;
> #else
>  #define CHECK_NAME(fn) \
>    if (addr == (void*) &fn)  return #fn;
> #endif

Thanks!

> ...
>
> void disassemble(void* from, void* to) {
> #if defined(__ia64) || defined(__powerpc__)
>  /* On IA64 and PPC function pointers are pointers to function  
> descriptors */
>  from = *((void**)from);
>  to = *((void**)to);
> #endif
>
> - In the Makefile I need to link hsdis-demo against 'ld'. So I suggest
> adding (at least for Linux):
>
> LDFLAGS		+= -ldl

Yes.

> - I don't understand why 'decode_instructions' in hsdis.c needs to
> print a newline after every instruction. I think it would be more
> consistent if this newline would be printed in the appropriate
> callback in the disassembler class. I would therefore suggest to
> delete the following lines from decode_instructions in hsdis.c:
>
>     /* follow each complete insn by a nice newline */
>     /*(*printf_callback)(printf_stream, "\n");*/

The output protocol for decode_insns is loosely structured text, a la  
XML (with no schema).  In that format, newlines are about as good as  
any other delimiter.  I suggest that if you want to translate up to a  
more structured output, just special-case a printf of a single  
newline.  Remember that you get to specify that printf callback to be  
anything you want.  I suppose the newlines could be enclosed by event  
pairs, but that seems like overkill to me.

> - The previous change would also make it easier to use
> decode_instructions() to decode a single instruction. But it would be
> additionally necessary to change the line:
>
>    while (p < end && !app_data.losing) {
>
> to:
>
>    while (p <= end && !app_data.losing) {
>
> to make that possible. Additionally, "out of bound check" in
> hsdis_read_memory_func() will have to be removed, if we want to decode
> single instructions:
>
> if (memaddr_p + length > app_data->end) {
>  /* read is out of bounds */
>  return EIO;
> } else {
>
> I don't think that this does any harm, but on the other side it will
> make the hsdis plugin more generally usable.

That's a very good idea.  Or, instead of having start==end be a magic  
condition, maybe we can have an optional argument which says "stop  
after emitting N instructions and return the updated pointer".

> Finally I would like to suggest moving the printing of nmethods and
> code blobs into the corresponding print methods of the CodeBlob and
> nmethod classes.This could be easily achieved with the new
> functionality which allows the decoding of a single assembler
> instruction. I don't see a reason why the disassembler class needs to
> know the internals of nmethod and CodeBlob. In my opinion it would be
> a cleaner design if the disassembler class would export a clear
> interface for decoding assembler instructions and leave all the rest
> to the class like nmethod or CodeBlob which use it.

That's a reasonable refactoring cleanup.  We still need "raw"  
disassembly of ad hoc stubs, BTW.

> Again this would
> simplify the implementation of an "AbstractDisassembler" class as
> described above because the code which prints all the various
> annotations around the instructions would not have to be duplicated in
> the "AbstractDisassembler" class.
>
> Thank you and best regards,
> Volker
>
>
> On 8/27/09, Tom Rodriguez <Thomas.Rodriguez at sun.com> wrote:
>> One of the limitations of the current SA is that it doesn't have a
>> disassembler for amd64 which makes looking at 64 bit cores somewhat
>> challenging.  Instead of trying to implement a full disassembler  
>> for x64 in
>> Java I think we should switch the SA to using hsdis-arch for the  
>> decoding.
>> It requires a minor extension to the decode_instructions interface  
>> so that
>> we can disassemble a buffer of code that's not physically located  
>> where the
>> code came from.  I've got a working version of this that also fixes  
>> the
>> printing to match up better with what the JVM itself prints.  I was  
>> planning
>> on blowing away the other disassemblers completely because they  
>> don't really
>> fit into the new model.  I thought I'd float this before finishing  
>> it to see
>> if there are any concerns.  The initial webrev is at
>> http://cr.openjdk.java.net/~never/sadis.  This includes
>> building the SA with source 1.5 to allow use of printf which will  
>> probably
>> be pushed separately.
>>
>> tom
>>