Request for review (L): 7071653: JSR 292: call site change notification should be pushed not pulled
Vladimir Kozlov
vladimir.kozlov at oracle.com
Mon Aug 8 07:55:32 PDT 2011
Christian,
Should we put "skip bytecode quickening" code under flag to do this only when invoke dynamic is enabled? Or put_code is
zero only in invoke dynamic case?
On 8/8/11 6:56 AM, Christian Thalinger wrote:
>> Why on sparc you use ld_ptr() to load from cache but on X86 and X64 you use movl() (only 32 bit)?
>
> Good question. I took the code from TemplateTable::resolve_cache_and_index without thinking about it and that one uses ld_ptr.
>
> _indices in CosntantPoolCacheEntry is defined as intx:
>
> volatile intx _indices; // constant pool index& rewrite bytecodes
>
> and bytecode 1 and 2 are in the upper 16-bit of the lower 32-bit word:
>
> // bit number |31 0|
> // bit length |-8--|-8--|---16----|
> // --------------------------------
> // _indices [ b2 | b1 | index ]
>
> Loading 32-bit on LE gives you the right bits but on BE it does not. I think that's the reason for the "optimization" on x64.
I don't like this "optimization" but I understand why we using it. Add a comment (especially in x64 file).
>>
>> I am concern about using next short branch in new code in templateTable_sparc.cpp:
>>
>> cmp_and_br_short(..., L_patch_done); // don't patch
>>
>> There is __ stop() call which generates a lot of code so that label L_patch_done could be far.
>
> Yeah, I thought I give it a try if it works. cmp_and_br_short should assert if the branch displacement is too far, right?
>
Yes, it will assert but may be only in some worst case which we do not test. For example, try to run 64 bit fastdebug VM
on Sparc + compressed oops + VerifyOops.
>>
>>
>> Why you added new #include into ciEnv.cpp and nmethod.cpp, what code needs it? Nothing else is changed in these files.
>
> Both files use dependencies and I got linkage errors on Linux while working on the fix (because of inline methods). It seems that the include is not required in ciEnv.cpp because ciEnv.hpp already includes it. I missed that. But nmethod.cpp needs it because nmethod.hpp only declares class Dependencies.
>
OK.
>
>>
>> Why you did not leave "volatile" call site inlining with guard? You did not explain why virtual call is fine for it.
>
> The spec of MutableCallSite says:
>
> "For target values which will be frequently updated, consider using a volatile call site instead."
>
> And VolatileCallSite says:
>
> "A VolatileCallSite is a CallSite whose target acts like a volatile variable. An invokedynamic instruction linked to a VolatileCallSite sees updates to its call site target immediately, even if the update occurs in another thread. There may be a performance penalty for such tight coupling between threads.
>
> Unlike MutableCallSite, there is no syncAll operation on volatile call sites, since every write to a volatile variable is implicitly synchronized with reader threads.
>
> In other respects, a VolatileCallSite is interchangeable with MutableCallSite."
>
> Since VolatileCallSite really should only be used when you know the target changes very often we don't do optimizations for this case. Obviously this is just a guess how people will use VolatileCallSite but I think for now this is a safe bet.
>
Thank you for explaining it.
> Additionally I had to do two small changes because the build was broken on some configurations:
>
> - klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : NULL;
> + klassOop new_type = _changes.is_klass_change() ? _changes.as_klass_change()->new_type() : (klassOop) NULL;
>
> and
>
> - MutexLockerEx ccl(CodeCache_lock, thread);
> + MutexLockerEx ccl(CodeCache_lock, Mutex::_no_safepoint_check_flag);
>
> I updated the webrev.
Good.
Vladimir
>
> -- Christian
>
>>
>>
>> Vladimir
>>
>> On 8/5/11 6:32 AM, Christian Thalinger wrote:
>>> http://cr.openjdk.java.net/~twisti/7071653
>>>
>>> 7071653: JSR 292: call site change notification should be pushed not pulled
>>> Reviewed-by:
>>>
>>> Currently every speculatively inlined method handle call site has a
>>> guard that compares the current target of the CallSite object to the
>>> inlined one. This per-invocation overhead can be removed if the
>>> notification is changed from pulled to pushed (i.e. deoptimization).
>>>
>>> I had to change the logic in TemplateTable::patch_bytecode to skip
>>> bytecode quickening for putfield instructions when the put_code
>>> written to the constant pool cache is zero. This is required so that
>>> every execution of a putfield to CallSite.target calls out to
>>> InterpreterRuntime::resolve_get_put to do the deoptimization of
>>> depending compiled methods.
>>>
>>> I also had to change the dependency machinery to understand other
>>> dependencies than class hierarchy ones. DepChange got the super-type
>>> of two new dependencies, KlassDepChange and CallSiteDepChange.
>>>
>>> Tested with JRuby tests and benchmarks, hand-written testcases, JDK
>>> tests and vm.mlvm tests.
>>>
>>> Here is the speedup for the JRuby fib benchmark (first is JDK 7 b147,
>>> second with 7071653). Since the CallSite targets don't change during
>>> the runtime of this benchmark we can see the performance benefit of
>>> eliminating the guard:
>>>
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>> 0.883000 0.000000 0.883000 ( 0.854000)
>>> 0.715000 0.000000 0.715000 ( 0.715000)
>>> 0.712000 0.000000 0.712000 ( 0.712000)
>>> 0.713000 0.000000 0.713000 ( 0.713000)
>>> 0.713000 0.000000 0.713000 ( 0.712000)
>>>
>>> $ jruby --server bench/bench_fib_recursive.rb 5 35
>>> 0.772000 0.000000 0.772000 ( 0.742000)
>>> 0.624000 0.000000 0.624000 ( 0.624000)
>>> 0.621000 0.000000 0.621000 ( 0.621000)
>>> 0.622000 0.000000 0.622000 ( 0.622000)
>>> 0.622000 0.000000 0.622000 ( 0.621000)
>>>
>
More information about the mlvm-dev
mailing list