RFR: 8074345: Enable RewriteBytecodes when VM runs with CDS

Fri Mar 20 20:53:23 UTC 2015

Hi, Coleen and all

   New version with suggested changes can be reviewed at:
   http://cr.openjdk.java.net/~minqi/8074345/webrev02/

   Removed _fast_invokeinvirtual from last version, disable rewriting 
_invokevirtual if UseSharedSpaces turned on. Only on sparc 
_invokevirtual got rewritten.  Other platforms as unimplemented.

   Thanks
   Yumin

On 3/11/2015 1:23 PM, Yumin Qi wrote:
> Thanks, I will have another webrev after build/test/perf test.
>
> Yumin
>
> On 3/11/2015 1:11 PM, Coleen Phillimore wrote:
>>
>> Hi Yumin,  One comment embedded.
>>
>> On 3/11/15, 2:04 PM, Yumin Qi wrote:
>>> Hi, Coleen
>>>
>>>   Thanks for the review. See embedded.
>>>
>>> On 3/10/2015 2:54 PM, Coleen Phillimore wrote:
>>>>
>>>> Yumin,
>>>>
>>>> The new bytecode approach came out pretty cleanly, or as cleanly as 
>>>> this could be.
>>>>
>>>> The file templateTable_x86_32 and 64 have just been merged, so 
>>>> you'll have to make your change in the new version.
>>>>
>>>> I also have some comments:
>>>>
>>>> http://cr.openjdk.java.net/~minqi/8074345/src/share/vm/interpreter/rewriter.cpp.udiff.html 
>>>>
>>>>
>>>> Typo "rewirting"
>>>>
>>> Sure.
>>>> In these files, can you break up the long lines into three lines?
>>>>
>>>> -  if (!is_static) { patch_bytecode(Bytecodes::_fast_fgetfield, 
>>>> Rbc, Rscratch); }
>>>> +  if (!is_static && rc == MAY_REWRITE) { 
>>>> patch_bytecode(Bytecodes::_fast_fgetfield, Rbc, Rscratch); }
>>>>
>>> Sure.
>>>>
>>>> http://cr.openjdk.java.net/~minqi/8074345/src/share/vm/interpreter/bytecodes.hpp.udiff.html 
>>>>
>>>>
>>>> How many bytecodes do we have now?  We're limited to 255 (or 256) 
>>>> and there are other new bytecodes being added.
>>>>
>>> now total is 234 (after the fix). See below answer.
>>>> What was the performance benefit to this?   I think if we wanted to 
>>>> be conservative, we'd turn off RewriteFrequentPairs and only do 
>>>> nofast_getfield and nofast_putfield.  I think they were the only 
>>>> bytecodes that actually affected performance.
>>>>
>>>> In this file above, can you remove the last block of comments about 
>>>> fast_linearswitch and fast_ldc?  I think this confuses rewriting in 
>>>> the interpreter and rewriting in the rewriter, or rather makes the 
>>>> confusion worse.  I don't think this comment is helpful.
>>>>
>>>> I'd prefer to see the first comment smaller also, like:
>>>>
>>>> +    // These bytecodes are rewritten at CDS dump time, so that we 
>>>> can prevent them from being
>>>> +    // rewritten at run time. This way, the ConstMethods can be 
>>>> placed in the CDS ReadOnly
>>>> +    // section, and RewriteByteCodes/RewriteFrequentPairs can 
>>>> rewrite non-CDS bytecodes
>>>> +    // at run time.
>>>> +    _nofast_getfield      ,
>>>> +    _nofast_putfield      ,
>>>> +    _nofast_aload_0       ,
>>>> +    _nofast_iload         ,
>>>> +    _nofast_invokevirtual ,
>>>>
>>>> It's sort of obvious which bytecode they rewrite.  I don't know how 
>>>> much performance fast_invokevfinal is worth.  I thought I deleted 
>>>> it.  Can we not rewrite this so we don't waste another bytecode on 
>>>> it?  Maybe add a RewriteVFinal option and consider removing it for 
>>>> the future?  x86 doesn't use it and I can't see how this would save 
>>>> any significant performance to be worth having!
>>>>
>>> _invokevirtual got rewritten on sparc and ppc. Now ppc is removed, 
>>> no need to take care for it. For sparc, it does patch code. I am 
>>> thinking of a way if we need to add _nofast_code as you indicated, 
>>> we only have 255 codes to use.
>>
>> PPC isn't removed from the open repository.
>>>
>>>    bool not_rewrite = UseSharedSpaces && RewriteBytecodes && 
>>> RewriteFrequentPair;
>>
>> I think the conditional would be
>>
>>    bool not_rewrite = UseSharedSpaces || !RewriteBytecodes;
>>>
>>>    Can this boolean decide if we not rewrite the bytecode to fast? 
>>> If so, I can remove all the _nofast_code and do not patch code when 
>>> it is on.
>>
>> Yes, this would be nice to not add the bytecode.
>>
>>>> http://cr.openjdk.java.net/~minqi/8074345/src/share/vm/interpreter/templateTable.hpp.udiff.html 
>>>>
>>>>
>>>> +  enum RewriteControl { MAY_REWRITE, MAY_NOT_REWRITE };  // 
>>>> control for fast code under CDS
>>>>
>>>>
>>>> I don't know what our coding standard is but in the 
>>>> templateTable_<cpu>.cpp files these strings look like macros. I'd 
>>>> rather see them as MayRewrite or MayNotRewrite.
>>>>
>>> Agree.
>>>> http://cr.openjdk.java.net/~minqi/8074345/src/cpu/sparc/vm/templateTable_sparc.cpp.udiff.html 
>>>>
>>>>
>>>> I think there's a java_code() function that returns the original 
>>>> bytecode that you could use instead of the case statement in 
>>>> resolve_cache_and_index().   The indentation is odd in the webrev. 
>>>> This probably applies to the other cpu directories.
>>>>
>>>> One last question below:
>>>>
>>>> On 3/5/15, 4:21 PM, Yumin Qi wrote:
>>>>> Please review:
>>>>>
>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8074345
>>>>> webrev: http://cr.openjdk.java.net/~minqi/8074345/
>>>>>
>>>>> Summary: Currently CDS when is disabled, RewriteBytecodes and 
>>>>> RewriteFrequentPairs are disabled due to ConstantMethod in CDS are 
>>>>> mapped read only. So memory fault will be triggered when 
>>>>> RewriteBytecodes turned on. This also disable all method 
>>>>> rewritten, leads interpreter run slower. Observed about 2% 
>>>>> regression with C2 on some benchmarks, since interpreter speed is 
>>>>> important to C2. By enable RewriteBytecodes and 
>>>>> RewriteFrequentPairs under CDS enabled, adding _nofast_xxxx for 
>>>>> corresponding fast codes at dump time to avoid byte code rewritten 
>>>>> at run time, we prevent byte code rewritten and modify the read 
>>>>> only shared portion in CDS. Meanwhile other byte codes with fast 
>>>>> codes still get speed up.
>>>>>
>>>>> Tests: JPRT, jtreg, refworkload (20+ benchmarks) on all supported 
>>>>> platforms.  Interpreter only tests showed about 3% improvement.
>>>>
>>>> What performance did you measure?  Is it -Xint -Xshare:on with and 
>>>> without your patch?  It was only 3% better?
>>>>
>>>> What was the difference in performance with -Xint 
>>>> -XX:-RewriteBytecodes vs. -Xint 
>>>> -XX:+RewriteBytecodes/FrequentPairs? I thought this was around 15%.
>>>>
>>> I will send you a separate email of the links which run with 
>>> CDS/NoCDS/CDS+Int
>>
>> Thanks for the links.  From your experiments, I think your 
>> performance improvement with your patch and CDS with -Xmixed is 4%. 
>> That's good enough for a couple of bytecodes.
>>
>> Coleen
>>
>>>
>>> Thanks
>>> Yumin
>>>> Thanks,
>>>> Coleen
>>>>
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>
>>>>
>>>
>>
>