RFR 8012371: Adjust Tiered compile threshold according to available space in codecache
Vladimir Kozlov
vladimir.kozlov at oracle.com
Tue May 14 13:45:49 PDT 2013
Thank you, Igor
For the record, these changes are based on your suggestion :)
Thanks,
Vladimir
On 5/14/13 1:25 PM, Igor Veresov wrote:
> Yup, I had something similar in mind. But it didn't occur to me that you can just throttle C1 exclusively. Awesome solution!
> That way you just throttle C1 compiles, and in the meantime you'll just use the interpreter for profiling.
>
> Looks good!
>
> igor
>
> On May 14, 2013, at 11:21 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> http://cr.openjdk.java.net/~kvn/8012371/webrev
>>
>> On 5/14/13 6:23 AM, Albert Noll wrote:
>>
>> Hi,
>>
>> I think I found a solution to the code cache fill-up problem.
>>
>> The problem with the previous solution was that the recompilation
>> threshold was increased equally for all tiers. As a result, peak
>> performance was not reached when the code cache was "rather full":
>> frequently invoked methods that WERE compiled with C2 in the
>> non-tiered configuration WERE NOT compiled with C2 under tiered
>> compilation.
>>
>> The proposed solution is to start increasing the threshold early
>> (e.g., once the code cache is 50% full) but never to increase the
>> threshold for C2 compilation. As a result, enough space remains for
>> C2 code, and we reach peak performance.
>> The drawback of this solution, of course, is that tiered compilation
>> may perform worse than it would with a larger code cache. However,
>> it should not perform worse than non-tiered compilation.
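To make the difference between the previous and the proposed policy concrete, here is a minimal C++ sketch. The `Thresholds` struct, the function names, and the sample numbers are hypothetical and are not HotSpot code. `k >= 1` stands for whatever fill-dependent scale factor the policy computes.

```cpp
#include <cassert>

// Hypothetical per-tier invocation thresholds (illustrative values only).
struct Thresholds {
  long c1;  // count that triggers a C1 (profiling) compile
  long c2;  // count that triggers a C2 (optimizing) compile
};

// Previous approach: every tier is delayed by the same factor k, so hot
// methods may never reach C2 when the code cache is rather full.
Thresholds scale_all_tiers(Thresholds base, double k) {
  return {static_cast<long>(base.c1 * k), static_cast<long>(base.c2 * k)};
}

// Proposed approach: only the C1 threshold grows. The C2 threshold is left
// unchanged, so space and priority remain for peak-performance code.
Thresholds scale_c1_only(Thresholds base, double k) {
  return {static_cast<long>(base.c1 * k), base.c2};
}
```

In this simplified model with `k = 4`, a method still qualifies for C2 at its normal count under `scale_c1_only`. Only C1 compiles are deferred, and the interpreter keeps profiling in the meantime.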
>>
>> I evaluated the proposed changes using the nashorn benchmarks with
>> ReservedCodeCacheSize=80m, running all benchmarks in the same JVM
>> instance. We start increasing the threshold for recompilation (but not
>> for recompiling with C2) when the code cache is 50% full. As a result,
>> the warning that the code cache is full and compilation has stopped is
>> no longer printed. Furthermore, we achieve peak performance similar to
>> non-tiered compilation, but with faster startup.
>>
>>
>> Many thanks for your comments,
>> Albert
>>
>> On 07/05/2013 22:38, Vladimir Kozlov wrote:
>>> And add a product flag for the initial ratio value so people can
>>> adjust it as they wish.
>>>
>>> Vladimir
>>>
>>> On 5/7/13 9:40 AM, Vladimir Kozlov wrote:
>>>> Albert,
>>>>
>>>> You should start using Nashorn/octane for performance testing, since
>>>> TieredCompilation has a big effect on it. Roland can help you with it.
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 5/7/13 9:08 AM, Vladimir Kozlov wrote:
>>>>> On 5/7/13 6:49 AM, Albert Noll wrote:
>>>>>> Hi Vladimir,
>>>>>>
>>>>>> I performed a preliminary evaluation of the effect on the size of
>>>>>> generated code, using the eclipse benchmark from the DaCapo
>>>>>> benchmarks with ReservedCodeCacheSize limited to 32m.
>>>>>>
>>>>>> With the changes in advancedThresholdPolicy, 2 runs generate 76 MB
>>>>>> of code. Without the changes, 2 runs generate 116 MB of code.
>>>>>
>>>>> This is good, but I am mostly concerned about the effect on
>>>>> performance, both startup and peak. Also look at code cache usage
>>>>> with the default size at the end of execution. Use
>>>>> -XX:+PrintCompilation, which prints time stamps (the first number),
>>>>> to see how the behavior changes. Note that the third number in the
>>>>> output is the compilation type: 3 - C1 with profiling, 4 - C2.
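As an illustration of reading that output, here is a minimal C++ sketch that extracts the timestamp, compile ID, and compilation level from a -XX:+PrintCompilation line and counts compiles per level. It assumes the tiered layout described above: timestamp, compile ID, optional attribute flags such as `%` or `s`, then the level. The exact format varies between JDK versions, so treat this parser as an assumption, not a specification.

```cpp
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

struct CompileEvent {
  long timestamp_ms;  // first number: milliseconds since VM start
  long compile_id;    // second number
  int level;          // third number: 3 = C1 with profiling, 4 = C2
};

// True if `tok` consists only of decimal digits.
static bool is_number(const std::string& tok) {
  if (tok.empty()) return false;
  for (unsigned char c : tok) {
    if (!std::isdigit(c)) return false;
  }
  return true;
}

// Takes the first three purely numeric tokens. Attribute flags such as
// "%", "s", "!", "b" or "n" are skipped because they are not numeric.
std::optional<CompileEvent> parse_line(const std::string& line) {
  std::istringstream in(line);
  std::vector<long> nums;
  std::string tok;
  while (nums.size() < 3 && in >> tok) {
    if (is_number(tok)) nums.push_back(std::stol(tok));
  }
  if (nums.size() < 3) return std::nullopt;
  return CompileEvent{nums[0], nums[1], static_cast<int>(nums[2])};
}

// Counts how many compiles were issued at each level.
std::map<int, int> count_by_level(const std::vector<std::string>& lines) {
  std::map<int, int> counts;
  for (const auto& l : lines) {
    if (auto ev = parse_line(l)) ++counts[ev->level];
  }
  return counts;
}
```

Comparing `count_by_level` results, and the timestamps of level-4 events, between runs with and without the change is one way to observe the startup versus peak behavior discussed here.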
>>>>>
>>>>> Sorry about the dexp() suggestion; yes, it needs to be called in the
>>>>> correct thread state.
>>>>>
>>>>> And I made a mistake with my suggested expressions. If we want to
>>>>> scale only when 25% or less space is free, we need:
>>>>>
>>>>> if (free_reverse_ratio > 4.) {
>>>>>   k *= exp(free_reverse_ratio - 4.);
>>>>> }
>>>>>
>>>>> But I will leave it to you to determine the best ratio value
>>>>> experimentally. The goal is the same startup and peak performance
>>>>> with a smaller code cache. Maybe your 50% will be a better value.
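Here is a minimal sketch of the corrected expression. It uses plain `std::exp` in place of `SharedRuntime::dexp()`, and a hypothetical `start_free_percent` parameter replaces the hard-coded constant. Because the reverse ratio is max_capacity / unallocated_capacity, starting when P percent of the cache is still free corresponds to a threshold of 100 / P. So 25% free gives 4, and 50% free gives 2.

```cpp
#include <cassert>
#include <cmath>

// Threshold on the reverse free ratio (max_capacity / unallocated_capacity)
// at which scaling starts, given the percentage of the cache still free.
// 25% free -> 4.0, 50% free -> 2.0.
double start_ratio_for(double start_free_percent) {
  return 100.0 / start_free_percent;
}

// Scales k by exp(r - start) once the reverse free ratio r exceeds the start
// ratio. The factor equals 1 at r == start, so the curve has no step there.
double scale_threshold(double k, double free_reverse_ratio,
                       double start_free_percent) {
  double start = start_ratio_for(start_free_percent);
  if (free_reverse_ratio > start) {
    k *= std::exp(free_reverse_ratio - start);
  }
  return k;
}
```

With this parameterization, comparing 25% against 50% is a one-argument change, which may make the suggested experiments easier to run.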
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Best,
>>>>>> Albert
>>>>>>
>>>>>> On 05/07/2013 02:13 PM, Albert Noll wrote:
>>>>>>> Hi Vladimir,
>>>>>>>
>>>>>>> thank you very much for your feedback. I made the changes as you
>>>>>>> proposed. I could not use SharedRuntime::dexp(d), since the VM
>>>>>>> crashed (see below). Rick explained to me why: the current thread
>>>>>>> is in the wrong state.
>>>>>>>
>>>>>>> What do you think of the current version? Do you think we need to
>>>>>>> evaluate the
>>>>>>> performance impact of that change?
>>>>>>>
>>>>>>> Best,
>>>>>>> Albert
>>>>>>>
>>>>>>> P.S.: Is it OK if I ask you for early feedback, or should I just send
>>>>>>> out an RFR?
>>>>>>>
>>>>>>> # To suppress the following error report, specify this argument
>>>>>>> # after -XX: or in .hotspotrc: SuppressErrorAt=/gcLocker.cpp:223
>>>>>>> #
>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>> #
>>>>>>> # Internal Error
>>>>>>> (/export/anoll/JDK-8012371/src/share/vm/memory/gcLocker.cpp:223),
>>>>>>> pid=5663, tid=140406973806336
>>>>>>> # Error: ShouldNotReachHere()
>>>>>>> #
>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b86) (build
>>>>>>> 1.8.0-ea-b86)
>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>>> (25.0-b32-internal-fastdebug mixed mode linux-amd64 compressed oops)
>>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>>> #
>>>>>>> # An error report file with more information is saved as:
>>>>>>> # /export/anoll/hs_err_pid5663.log
>>>>>>> #
>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>> # http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>> #
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 05/06/2013 11:01 PM, Vladimir Kozlov wrote:
>>>>>>>> I don't think this should be C1-specific code. Before the Tiered
>>>>>>>> code, we had counter decay code for all compilers, so I think it
>>>>>>>> should be the same now.
>>>>>>>>
>>>>>>>> Make a new method:
>>>>>>>> double CodeCache::free_space_ratio()
>>>>>>>>
>>>>>>>> Also subtract CodeCacheMinimumFreeSpace to get the correct size
>>>>>>>> available for JIT code.
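Here is one possible shape for the suggested `CodeCache::free_space_ratio()`, written as a free function so it compiles on its own. The parameters mirror the quantities mentioned in the thread (maximum capacity, unallocated capacity, `CodeCacheMinimumFreeSpace`). However, the function, its signature, and the choice of denominator are assumptions and do not reflect HotSpot's implementation. The `size_t` values are converted to `double` before dividing to avoid integer division.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for CodeCache::free_space_ratio().
// Returns the fraction of the JIT-usable code cache that is still free, where
// "JIT-usable" excludes the reserve kept back by CodeCacheMinimumFreeSpace.
double free_space_ratio(std::size_t max_capacity,
                        std::size_t unallocated_capacity,
                        std::size_t minimum_free_space) {
  if (max_capacity <= minimum_free_space) return 0.0;
  // Only the reserve (or less) is left: nothing is available for JIT code.
  if (unallocated_capacity <= minimum_free_space) return 0.0;
  double available = static_cast<double>(unallocated_capacity - minimum_free_space);
  double usable = static_cast<double>(max_capacity - minimum_free_space);
  return available / usable;
}
```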
>>>>>>>>
>>>>>>>> I would prefer one scaling expression that starts at *1 and ends
>>>>>>>> at e**k. It would also be nice for the switch from one expression
>>>>>>>> to the other to be smooth (a graph without steps). For example,
>>>>>>>> the following code sharply increases the scale by 2 at the switch
>>>>>>>> point, which is not good:
>>>>>>>> + k += (free_ratio < 0.50) ? 1/free_ratio : 0;
>>>>>>>>
>>>>>>>> Also, you use 2 divisions when you could use just one. And 50%
>>>>>>>> empty is too early. If we start at 25% and use
>>>>>>>> SharedRuntime::dexp(d), I think we can simplify the code:
>>>>>>>>
>>>>>>>> double free_reverse_ratio = max_capacity / unallocated_capacity;
>>>>>>>> if (free_reverse_ratio > 2.) {
>>>>>>>> k *= SharedRuntime::dexp(free_reverse_ratio - 2.);
>>>>>>>> }
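The step problem can be checked numerically. The sketch below compares the additive expression quoted above with the exponential form on either side of the switch point at 50% free. It assumes `k` starts at 1 and uses `std::exp` in place of `SharedRuntime::dexp()`. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Additive form from the patch: jumps by about 1/0.5 == 2 at free_ratio == 0.5.
double step_scale(double free_ratio) {
  double k = 1.0;
  k += (free_ratio < 0.50) ? 1 / free_ratio : 0;
  return k;
}

// Exponential form: uses the reverse ratio (max / unallocated == 1 / free_ratio)
// and equals 1 at the switch point, so the curve has no step.
double exp_scale(double free_ratio) {
  double k = 1.0;
  double free_reverse_ratio = 1.0 / free_ratio;
  if (free_reverse_ratio > 2.) {
    k *= std::exp(free_reverse_ratio - 2.);
  }
  return k;
}

// Change in scale when crossing `at` from just above to just below it.
double jump(double (*scale)(double), double at, double eps) {
  return scale(at - eps) - scale(at + eps);
}
```

Near 0.5, `jump(step_scale, ...)` is about 2, while `jump(exp_scale, ...)` is close to 0. The exponential form also needs a single division (computing the reverse ratio) in place of one division per evaluation of the branch.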
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>> On 5/6/13 6:00 AM, Albert Noll wrote:
>>>>>>>>> Hi Vladimir,
>>>>>>>>>
>>>>>>>>> I looked at https://jbs.oracle.com/bugs/browse/JDK-8012371.
>>>>>>>>> I attached a possible solution to this mail. Could I get some
>>>>>>>>> early feedback from you?
>>>>>>>>>
>>>>>>>>> Many thanks,
>>>>>>>>> Albert
>>>>>>>
>>>>>>
>>
>>
>>
>>
>
More information about the hotspot-compiler-dev mailing list