8008243: Zero: implement fast-bytecodes [Was: Re: [RFC] Fast-bytecodes for Zero]

Wed Feb 20 10:58:55 PST 2013

I'm glad bytecode rewriting gives a good performance improvement with 
the c++ interpreter.   This is great.  I have a couple of questions 
about the code.

In templateInterpreter_<cpu>.cpp we only patch the bytecode with 
_fast_{x}putfield if the constant pool cache is resolved. This isn't 
always the case.  You can call putfield on a class that isn't completely 
initialized yet and don't want other threads to execute this code (we 
have a test for this somewhere).   Also, there's some conditional code 
with posting breakpoints.  If this bytecode is a breakpoint bytecode, we 
patch into the method's stream.   You might not have to do this.  It can 
try again to patch the bytecode if there's a breakpoint there.   Anyway, 
the code here has the patching unconditional, and I think it needs to be 
conditioned on these two things.

The second is that if I'm reading this right, the patching code assumes 
that the order of the fast bytecodes doesn't change.  This could 
silently break if we do change the order.   Maybe an assert is in order 
here.

I didn't see how this tests the flag value of RewriteBytecodes and 
RewriteFrequentPairs.   If Zero supports Class data sharing, you need to 
turn off this bytecode rewriting (unfortunately).

This looks like good work.   Someday someone has to break out the cases 
here into functions because I can't imagine how the C++ compiler can 
optimize this massive function.

Thanks,
Coleen

On 2/20/2013 3:45 AM, Roman Kennke wrote:
> Am Montag, den 18.02.2013, 12:30 -0800 schrieb Christian Thalinger:
>> On Feb 15, 2013, at 6:39 AM, Vitaly Davidovich <vitalyd at gmail.com> wrote:
>>
>>> Hi Roman,
>>>
>>> One suggestion I have is to add functions to JvmtiExport like has_field_access_watchers() and change this accordingly:
>>>
>>> count_addr = (int *)JvmtiExport::get_field_access_count_addr(); \
>>> 436     if ( *count_addr > 0 ) {
>>>
>>> The has_XXX function can simply have that code inside.
>>>
>>> Reason I think it's useful is because the field is an int but the get_* function declares address as return type requiring you to cast and thus know internal representation.  Seems cleaner to encapsulate inside JvmtiExport?
>> I agree on that but that would be beyond the scope of this change.  Roman only moved the code into a macro so he can reuse it in multiple places.  Actually I would prefer to move the code into a method because that would make the huge method a little smaller.
> I agree with both of you.  :-)
>
> I also tried to move this code into separate methods. However, this
> seems to be very difficult, because the CALL_VM macro depends on a lot
> of variables local to the interpreter loop: cp, locals, cache, thread,
> topOfStack, pc, istate and handle_exception, handle_Pop_frame
> (goto-labels). Yes, it is possible to somehow pass everything into such
> a method, but this seems awkward. I am not sure what is the lesser evil
> here.
>
> Roman
>
>> -- Chris
>>
>>> Thanks
>>>
>>> Sent from my phone
>>>
>>> On Feb 15, 2013 5:23 AM, "Roman Kennke" <rkennke at redhat.com> wrote:
>>> Hi David,
>>>
>>> Am Freitag, den 15.02.2013, 17:25 +1000 schrieb David Holmes:
>>>> There's a lot of changes to shared code here - what is the impact on
>>>> non-zero platforms?
>>> The changes to shared code are in bytecodeInterpreter.cpp, which is, to
>>> my knowledge, only used by Zero. *If* there are other interpreters that
>>> use it (maybe in closed trees?) they would most likely just pick up and
>>> benefit from the improvements. But of course, I cannot tell...
>>>
>>> Regards,
>>> Roman
>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 15/02/2013 4:51 AM, Christian Thalinger wrote:
>>>>> I've filed:
>>>>>
>>>>> 8008243: Zero: implement fast-bytecodes
>>>>>
>>>>> -- Chris
>>>>>
>>>>> On Feb 13, 2013, at 11:40 AM, Roman Kennke <rkennke at redhat.com> wrote:
>>>>>
>>>>>> The following change implements support for the following fast-bytecodes
>>>>>> in the Zero interpreter:
>>>>>>
>>>>>> fast_agetfield
>>>>>> fast_bgetfield
>>>>>> fast_cgetfield
>>>>>> fast_dgetfield
>>>>>> fast_fgetfield
>>>>>> fast_igetfield
>>>>>> fast_lgetfield
>>>>>> fast_sgetfield
>>>>>> fast_aputfield
>>>>>> fast_bputfield
>>>>>> fast_cputfield
>>>>>> fast_dputfield
>>>>>> fast_fputfield
>>>>>> fast_iputfield
>>>>>> fast_lputfield
>>>>>> fast_sputfield
>>>>>> fast_aload_0
>>>>>> fast_iaccess_0
>>>>>> fast_aaccess_0
>>>>>> fast_faccess_0
>>>>>> fast_iload
>>>>>> fast_iload2
>>>>>> fast_icaload
>>>>>> fast_invokevfinal
>>>>>>
>>>>>> All together this leads to a speedup of the interpreter of about 25%.
>>>>>>
>>>>>> Some notes:
>>>>>> - I extracted the JVMTI related blocks into a macro to avoid repetition.
>>>>>> - The field get/put opcodes are only rewritten for non-volatile
>>>>>> non-static field access, this makes the fast one really fast (no
>>>>>> additional branches needed), and static/volatile field accesses seem
>>>>>> rare enough anyway.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~rkennke/zero-fast-opcodes/webrev.00/
>>>>>>
>>>>>>
>>>>>> Opinions? Can this be included in hotspot? And Can I have a bug-ID?
>>>>>>
>>>>>> Best regards,
>>>>>> Roman
>>>>>>
>>>>>>
>>>
>