RFR: 8074345: Enable RewriteBytecodes when VM runs with CDS
Yumin Qi
yumin.qi at oracle.com
Wed Mar 25 16:55:09 UTC 2015
Thanks
On 3/25/2015 9:38 AM, Ioi Lam wrote:
> Hi Yumin,
>
> The changes look good. Just a few nits:
>
> *src/share/vm/interpreter/bytecodes.hpp:**
> *
> 293 // Rewritten at CDS dump time to | Original bytecode
> 294 // _invoke_virtual rewritten on sparc, will be disabled if
> UseSharedSpaces turned on.
> 295 // ------------------------------+------------------
> 296 _nofast_getfield , // <- _getfield
> 297 _nofast_putfield , // <- _putfield
> 298 _nofast_aload_0 , // <- _aload_0
> 299 _nofast_iload , // <- _iload
>
> I think it should be reformatted to line up the columns:
>
> 293 // Rewritten at CDS dump time to | Original bytecode
> 295 // ------------------------------+------------------
> 296 _nofast_getfield , // <- _getfield
> 297 _nofast_putfield , // <- _putfield
> 298 _nofast_aload_0 , // <- _aload_0
> 299 _nofast_iload , // <- _iload
> 230 // NOTE: _invoke_virtual is rewritten only on sparc. This will
> be disabled if
> // UseSharedSpaces turned on.
>
Sure.
> *src/share/vm/interpreter/rewriter.cpp:*
>
> There are many places that modify the Method object. Instead of
> putting asserts at all the places where an actual modification
> happens, I think it's better to use only one assert at the Rewriter
> entry point, and remove the other assets that you added:
>
> 516 void Rewriter::rewrite(instanceKlassHandle klass, TRAPS) {
> + if (!DumpSharedSpaces) {
> + assert(!MetaspaceShared::is_in_shared_space(klass()), "archive
> methods must not be rewritten at run time");
> + }
> 517 ResourceMark rm(THREAD);
> 518 Rewriter rw(klass, klass->constants(), klass->methods(),
> CHECK);
> 519 // (That's all, folks.)
> 520 }
>
Sure.
> Also, I am not sure if the PPC directories in the repo have been
> 'locked' or not, but I guess you will find out when you do the push.
>
I will get rid of the ppc change in next webrev.
Yumin
> I am not a Reviewer, so probably Coleen needs to look at this as well.
>
> Thanks
> - Ioi
>
> On 3/20/15, 1:53 PM, Yumin Qi wrote:
>> Hi, Coleen and all
>>
>> New version with suggested changes can be reviewed at:
>> http://cr.openjdk.java.net/~minqi/8074345/webrev02/
>>
>> Removed _fast_invokeinvirtual from last version, disable rewriting
>> _invokevirtual if UseSharedSpaces turned on. Only on sparc
>> _invokevirtual got rewritten. Other platforms as unimplemented.
>>
>> Thanks
>> Yumin
>>
>> On 3/11/2015 1:23 PM, Yumin Qi wrote:
>>> Thanks, I will have another webrev after build/test/perf test.
>>>
>>> Yumin
>>>
>>> On 3/11/2015 1:11 PM, Coleen Phillimore wrote:
>>>>
>>>> Hi Yumin, One comment embedded.
>>>>
>>>> On 3/11/15, 2:04 PM, Yumin Qi wrote:
>>>>> Hi, Coleen
>>>>>
>>>>> Thanks for the review. See embedded.
>>>>>
>>>>> On 3/10/2015 2:54 PM, Coleen Phillimore wrote:
>>>>>>
>>>>>> Yumin,
>>>>>>
>>>>>> The new bytecode approach came out pretty cleanly, or as cleanly
>>>>>> as this could be.
>>>>>>
>>>>>> The file templateTable_x86_32 and 64 have just been merged, so
>>>>>> you'll have to make your change in the new version.
>>>>>>
>>>>>> I also have some comments:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~minqi/8074345/src/share/vm/interpreter/rewriter.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>> Typo "rewirting"
>>>>>>
>>>>> Sure.
>>>>>> In these files, can you break up the long lines into three lines?
>>>>>>
>>>>>> - if (!is_static) { patch_bytecode(Bytecodes::_fast_fgetfield,
>>>>>> Rbc, Rscratch); }
>>>>>> + if (!is_static && rc == MAY_REWRITE) {
>>>>>> patch_bytecode(Bytecodes::_fast_fgetfield, Rbc, Rscratch); }
>>>>>>
>>>>> Sure.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~minqi/8074345/src/share/vm/interpreter/bytecodes.hpp.udiff.html
>>>>>>
>>>>>>
>>>>>> How many bytecodes do we have now? We're limited to 255 (or 256)
>>>>>> and there are other new bytecodes being added.
>>>>>>
>>>>> now total is 234 (after the fix). See below answer.
>>>>>> What was the performance benefit to this? I think if we wanted
>>>>>> to be conservative, we'd turn off RewriteFrequentPairs and only
>>>>>> do nofast_getfield and nofast_putfield. I think they were the
>>>>>> only bytecodes that actually affected performance.
>>>>>>
>>>>>> In this file above, can you remove the last block of comments
>>>>>> about fast_linearswitch and fast_ldc? I think this confuses
>>>>>> rewriting in the interpreter and rewriting in the rewriter, or
>>>>>> rather makes the confusion worse. I don't think this comment is
>>>>>> helpful.
>>>>>>
>>>>>> I'd prefer to see the first comment smaller also, like:
>>>>>>
>>>>>> + // These bytecodes are rewritten at CDS dump time, so that
>>>>>> we can prevent them from being
>>>>>> + // rewritten at run time. This way, the ConstMethods can be
>>>>>> placed in the CDS ReadOnly
>>>>>> + // section, and RewriteByteCodes/RewriteFrequentPairs can
>>>>>> rewrite non-CDS bytecodes
>>>>>> + // at run time.
>>>>>> + _nofast_getfield ,
>>>>>> + _nofast_putfield ,
>>>>>> + _nofast_aload_0 ,
>>>>>> + _nofast_iload ,
>>>>>> + _nofast_invokevirtual ,
>>>>>>
>>>>>> It's sort of obvious which bytecode they rewrite. I don't know
>>>>>> how much performance fast_invokevfinal is worth. I thought I
>>>>>> deleted it. Can we not rewrite this so we don't waste another
>>>>>> bytecode on it? Maybe add a RewriteVFinal option and consider
>>>>>> removing it for the future? x86 doesn't use it and I can't see
>>>>>> how this would save any significant performance to be worth having!
>>>>>>
>>>>> _invokevirtual got rewritten on sparc and ppc. Now ppc is removed,
>>>>> no need to take care for it. For sparc, it does patch code. I am
>>>>> thinking of a way if we need to add _nofast_code as you indicated,
>>>>> we only have 255 codes to use.
>>>>
>>>> PPC isn't removed from the open repository.
>>>>>
>>>>> bool not_rewrite = UseSharedSpaces && RewriteBytecodes &&
>>>>> RewriteFrequentPair;
>>>>
>>>> I think the conditional would be
>>>>
>>>> bool not_rewrite = UseSharedSpaces || !RewriteBytecodes;
>>>>>
>>>>> Can this boolean decide if we not rewrite the bytecode to fast?
>>>>> If so, I can remove all the _nofast_code and do not patch code
>>>>> when it is on.
>>>>
>>>> Yes, this would be nice to not add the bytecode.
>>>>
>>>>>> http://cr.openjdk.java.net/~minqi/8074345/src/share/vm/interpreter/templateTable.hpp.udiff.html
>>>>>>
>>>>>>
>>>>>> + enum RewriteControl { MAY_REWRITE, MAY_NOT_REWRITE }; //
>>>>>> control for fast code under CDS
>>>>>>
>>>>>>
>>>>>> I don't know what our coding standard is but in the
>>>>>> templateTable_<cpu>.cpp files these strings look like macros. I'd
>>>>>> rather see them as MayRewrite or MayNotRewrite.
>>>>>>
>>>>> Agree.
>>>>>> http://cr.openjdk.java.net/~minqi/8074345/src/cpu/sparc/vm/templateTable_sparc.cpp.udiff.html
>>>>>>
>>>>>>
>>>>>> I think there's a java_code() function that returns the original
>>>>>> bytecode that you could use instead of the case statement in
>>>>>> resolve_cache_and_index(). The indentation is odd in the
>>>>>> webrev. This probably applies to the other cpu directories.
>>>>>>
>>>>>> One last question below:
>>>>>>
>>>>>> On 3/5/15, 4:21 PM, Yumin Qi wrote:
>>>>>>> Please review:
>>>>>>>
>>>>>>> bug: https://bugs.openjdk.java.net/browse/JDK-8074345
>>>>>>> webrev: http://cr.openjdk.java.net/~minqi/8074345/
>>>>>>>
>>>>>>> Summary: Currently CDS when is disabled, RewriteBytecodes and
>>>>>>> RewriteFrequentPairs are disabled due to ConstantMethod in CDS
>>>>>>> are mapped read only. So memory fault will be triggered when
>>>>>>> RewriteBytecodes turned on. This also disable all method
>>>>>>> rewritten, leads interpreter run slower. Observed about 2%
>>>>>>> regression with C2 on some benchmarks, since interpreter speed
>>>>>>> is important to C2. By enable RewriteBytecodes and
>>>>>>> RewriteFrequentPairs under CDS enabled, adding _nofast_xxxx for
>>>>>>> corresponding fast codes at dump time to avoid byte code
>>>>>>> rewritten at run time, we prevent byte code rewritten and modify
>>>>>>> the read only shared portion in CDS. Meanwhile other byte codes
>>>>>>> with fast codes still get speed up.
>>>>>>>
>>>>>>> Tests: JPRT, jtreg, refworkload (20+ benchmarks) on all
>>>>>>> supported platforms. Interpreter only tests showed about 3%
>>>>>>> improvement.
>>>>>>
>>>>>> What performance did you measure? Is it -Xint -Xshare:on with
>>>>>> and without your patch? It was only 3% better?
>>>>>>
>>>>>> What was the difference in performance with -Xint
>>>>>> -XX:-RewriteBytecodes vs. -Xint
>>>>>> -XX:+RewriteBytecodes/FrequentPairs? I thought this was around 15%.
>>>>>>
>>>>> I will send you a separate email of the links which run with
>>>>> CDS/NoCDS/CDS+Int
>>>>
>>>> Thanks for the links. From your experiments, I think your
>>>> performance improvement with your patch and CDS with -Xmixed is 4%.
>>>> That's good enough for a couple of bytecodes.
>>>>
>>>> Coleen
>>>>
>>>>>
>>>>> Thanks
>>>>> Yumin
>>>>>> Thanks,
>>>>>> Coleen
>>>>>>
>>>>>>>
>>>>>>> Thanks
>>>>>>> Yumin
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
More information about the hotspot-runtime-dev
mailing list