x86 interpreters in hotspot

Tue May 15 09:12:30 PDT 2012

On Tue, May 15, 2012 at 12:03 PM, Krystal Mok <rednaxelafx at gmail.com> wrote:

> It means that most bytecode has an expected entry TOS and a fixed exit
> TOS. Well, not every bytecode, because some bytecode could accept multiple
> entry TOS and/or produce varying exit TOS, e.g. dup / getfield
>
> For the ones that have a fixed expected entry TOS, if the current TOS
> doesn't match the expectation, then some adaptation would have to take place.
> That's why there are multiple entry points for these bytecodes: to adapt to
> the expected entry TOS.
>
> - Kris
>
>
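The multiple-entry-point idea described above can be sketched as a toy model. Everything here (Frame, the bc_* names, run, and the runtime `state` variable) is hypothetical and heavily simplified; in the real template interpreter each template dispatches through the table for its own statically known exit state instead of tracking a state variable.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified model; none of these names are HotSpot declarations.
enum TosState { itos, vtos, number_of_states };  // int cached / nothing cached
enum Bytecode { bc_iconst_1, bc_iadd, number_of_codes };

struct Frame {
    int stack[16];
    int sp = 0;
    int tos = 0;  // stands in for the TOS register (rax on x86_64)
    void push(int v) { assert(sp < 16); stack[sp++] = v; }
    int  pop()       { assert(sp > 0);  return stack[--sp]; }
};

using Entry = void (*)(Frame&);

// iconst_1 is vtos -> itos: its natural entry expects nothing cached.
void iconst_1_vtos(Frame& f) { f.tos = 1; }
// Entered with an int cached: spill it to the stack, then run the vtos entry.
void iconst_1_itos(Frame& f) { f.push(f.tos); iconst_1_vtos(f); }

// iadd is itos -> itos: its natural entry expects one operand cached.
void iadd_itos(Frame& f) { f.tos += f.pop(); }
// Entered with nothing cached: load the top operand into the register first.
void iadd_vtos(Frame& f) { f.tos = f.pop(); iadd_itos(f); }

// First index: TOS state left by the previous bytecode. Second index: bytecode.
const Entry dispatch_table[number_of_states][number_of_codes] = {
    /* itos */ { iconst_1_itos, iadd_itos },
    /* vtos */ { iconst_1_vtos, iadd_vtos },
};

// Exit TOS state of each bytecode (fixed for these two).
const TosState tos_out[number_of_codes] = { itos, itos };

// Interprets n bytecodes and returns the cached top-of-stack value.
int run(const Bytecode* code, std::size_t n) {
    Frame f;
    TosState state = vtos;  // nothing is cached at entry
    for (std::size_t i = 0; i < n; ++i) {
        dispatch_table[state][code[i]](f);
        state = tos_out[code[i]];
    }
    return f.tos;
}
```

Running iconst_1, iconst_1, iadd enters the first iconst_1 through its vtos entry and the second through its itos entry, which is the adaptation step: the cached value is spilled before the new constant is loaded.
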
> On Tue, May 15, 2012 at 11:53 PM, Xin Tong <xerox.time.tech at gmail.com> wrote:
>
>> I am looking at the template interpreter code in hotspot.
>>
>> The interpreter dispatch table is 2-D: the first dimension is the tos
>> state and the second dimension is the bytecode.
>>
>> The tos state is the out tos state of the current bytecode. Does that mean
>> that a bytecode could have multiple tos states, e.g. itos, vtos?
>>
>> Thanks
>>
>> Xin
>>
>>
>> On Sat, May 12, 2012 at 11:55 PM, Xin Tong <xerox.time.tech at gmail.com> wrote:
>>
>>> I am hacking on the interpreter dispatch code and found something I
>>> do not understand.
>>>
>>> In the function
>>>
>>> void InterpreterMacroAssembler::dispatch_base(TosState state,
>>>                                              address* table,
>>>                                              bool verifyoop) {
>>>   ...
>>>   ...
>>>
>>>  // load the table address
>>>  lea(rscratch1, ExternalAddress((address)table));
>>>  // jmp based on the table address and bytecode ( loaded into rbx)
>>>  jmp(Address(rscratch1, rbx, Address::times_8));
>>> }
>>>
>>> However, when I take a profile and look at the generated interpreter
>>> code, I do not see the lea being generated; instead, r10 is used directly.
>>> It seems that hotspot does optimizations on the generated interpreter
>>> sequences (maybe something like peephole optimizations).
>>>
>>> Address            Offset     Bytes                  Disassembly                      % br_misp_exec
>>> 0x00007f2d146fe6ab 0x0000040b 0x488d24dc             LEA RSP,QWORD PTR [RSP+RBX*8]
>>> 0x00007f2d146fe6af 0x0000040f 0x410fb65d00           MOVZX RBX,BYTE PTR [R13]
>>> 0x00007f2d146fe6b4 0x00000414 0x49ba0052371d2d7f0000 MOV RDX,7F2D1D375200H
>>> // no lea ?
>>> 0x00007f2d146fe6be 0x0000041e 0x41ff24da             * JMP DWORD PTR [R10+RBX*8]  *
>>>
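One possible reading of the listing above is that the table load is present after all. The bytes 49 ba followed by eight immediate bytes are the x86-64 encoding of a 64-bit move-immediate whose destination, once the REX.B bit is taken into account, is r10 (rscratch1) and not rdx. As far as I recall, HotSpot's lea of an address literal on x86_64 is emitted as such a move-immediate, which would explain why no LEA opcode shows up; treat that part as an assumption. The hypothetical helper below only decodes this one instruction form, to check the register number and the immediate by hand.

```cpp
#include <cstddef>
#include <cstdint>

// Result of decoding "REX.W + (B8+rd) imm64", i.e. mov r64, imm64.
struct MovImm64 {
    bool          ok;   // true if the bytes match that instruction form
    unsigned      reg;  // register number 0..15 (2 is rdx, 10 is r10)
    std::uint64_t imm;  // 64-bit immediate, read little-endian
};

// Hypothetical helper written for this note; not part of any real disassembler.
MovImm64 decode_mov_imm64(const std::uint8_t* p, std::size_t n) {
    MovImm64 r{false, 0, 0};
    if (n < 10) return r;
    const std::uint8_t rex = p[0];
    const std::uint8_t op  = p[1];
    if ((rex & 0xF8) != 0x48) return r;  // REX prefix (0100WRXB) with W = 1
    if ((op  & 0xF8) != 0xB8) return r;  // opcode B8+rd
    // Low three register bits come from the opcode, the fourth from REX.B.
    r.reg = (op & 7u) | ((rex & 1u) << 3);
    for (int i = 7; i >= 0; --i) {
        r.imm = (r.imm << 8) | p[2 + i];
    }
    r.ok = true;
    return r;
}
```

Fed the ten bytes from the listing, the helper reports register 10 and the immediate 0x00007f2d1d375200, which matches the constant the profiler printed. Dropping the REX.B bit gives register 2 (rdx), which is what the profiler displayed.
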
>>>
>>> I am also trying to record the current bytecode index into a memory
>>> buffer that I allocate myself. However, the following code gives me a
>>> java.lang.NullPointerException when running one of the test cases.
>>> allocated_channel is a malloc-allocated buffer.
>>>
>>> void InterpreterMacroAssembler::dispatch_base(TosState state,
>>>                                              address* table,
>>>                                              bool verifyoop) {
>>>
>>>  ...
>>>  ...
>>>  // the current bytecode pc is kept in r13.
>>>  lea(rscratch2, ExternalAddress((address)allocated_channel));
>>>  movptr(Address(rscratch2, 0), r13);
>>>
>>>  lea(rscratch1, ExternalAddress((address)table));
>>>  jmp(Address(rscratch1, rbx, Address::times_8));
>>> }
>>>
>>>
>>> Thanks
>>>
>>> Xin
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Tue, May 8, 2012 at 8:53 AM, Coleen Phillimore <coleen.phillimore at oracle.com> wrote:
>>> >
>>> > There's a PrintBytecodeHistogram which will tell you how many
>>> > times each bytecode is called.
>>> >
>>> > There's PrintInterpreter which will tell you the size of the
>>> > template for each bytecode.
>>> >
>>> > There's only one tos (top of stack) element; we print two tos elements
>>> > because if the tos is a double or long, it takes two slots.
>>> >
>>> > I'm not sure exactly what you want to do with this information, but
>>> > hopefully this helps.
>>> >
>>> > Coleen
>>> >
>>> >
>>> > On 5/8/2012 2:34 AM, Krystal Mok wrote:
>>> >
>>> > On Tue, May 8, 2012 at 11:17 AM, Xin Tong <xerox.time.tech at gmail.com> wrote:
>>> >>
>>> >> For example, for bipush interpreter code, it is like this in x86_64
>>> >>
>>> >> void TemplateTable::bipush() {
>>> >>  transition(vtos, itos);
>>> >>  __ load_signed_byte(rax, at_bcp(1));
>>> >> }
>>> >>
>>> >>
>>> >> I would like to know the size of the assembly generated by
>>> >> TemplateTable::bipush, given a bcp.
>>> >
>>> >
>>> > Not sure why you would want that, but here's what you could do:
>>> >
>>> > // Bytecodes::Code code = (Bytecodes::Code) i;
>>> > address ep = Interpreter::dispatch_table()[i]; // or normal_table()
>>> > InterpreterCodelet* codelet = Interpreter::codelet_containing(ep);
>>> > int size = codelet->size(); // or code_size() or code_size_to_size()
>>> >
>>> > Code varies in detail depending on what you really want.
>>> >
>>> >>
>>> >>
>>> >> Also, btw, I traced through trace_bytecode. It calls overloaded
>>> >> traces, one with 1 tos and one with 2 toses. Does that mean Java opcodes
>>> >> can take up to 2 tos elements?
>>> >>
>>> >>
>>> > Short answer: no.
>>> > tos and tos2 are there because on some architectures (e.g. 32-bit x86)
>>> > the top-of-stack value may be stored in two registers (e.g. LTOS on
>>> > x86_32 stores the long value in eax:edx).
>>> > On 64-bit architectures, tos2 tends to do nothing, since tos is 64-bit,
>>> > wide enough to hold any TOS value.
>>> >
>>> > - Kris
>>> >
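The tos/tos2 point above can be illustrated with a small sketch of how one 64-bit long occupies two 32-bit registers. The TosPair type and both function names are hypothetical, and which half a given port passes as tos versus tos2 is not stated in this thread, so the fields are simply named lo and hi.

```cpp
#include <cstdint>

// Two 32-bit halves, standing in for a register pair on a 32-bit machine.
struct TosPair {
    std::uint32_t lo;  // low 32 bits of the long
    std::uint32_t hi;  // high 32 bits of the long
};

// Split one 64-bit value into the two halves a 32-bit port has to carry.
TosPair split_long(std::int64_t v) {
    const std::uint64_t u = static_cast<std::uint64_t>(v);
    return TosPair{static_cast<std::uint32_t>(u & 0xFFFFFFFFu),
                   static_cast<std::uint32_t>(u >> 32)};
}

// Reassemble the halves; on a 64-bit port this step is unnecessary because
// a single register already holds the whole value.
std::int64_t join_long(TosPair p) {
    const std::uint64_t u = (static_cast<std::uint64_t>(p.hi) << 32) | p.lo;
    return static_cast<std::int64_t>(u);
}
```

It is still a single top-of-stack value in both cases; only the number of machine registers needed to hold it differs.
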
>>> >>
>>> >>
>>> >> Xin
>>> >
>>> >
>>>
I imagine the adaptation happens in some trampoline code that sets up the
context. Also, I see dispatch_epilogue called 220 times, but there are
about 240 bytecodes in Java. Shouldn't it be called for every Java
bytecode?

Thanks

Xin