RFR 8012371: Adjust Tiered compile threshold according to available space in codecache

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed May 15 09:04:15 PDT 2013


Thanks, Albert

Move the CodeCache::reverse_free_ratio() call under the first condition so it executes only when needed.
Put parentheses around each comparison, as in (a == b).
The comment is too complex, I think. How about this:

    // Increase C1 compile threshold when the code cache is filled more
    // than specified by IncreaseFirstTierCompileThresholdAt percentage.
    // The main intention is to keep enough free space for C2 compiled code
    // to achieve peak performance if the code cache is under stress.
    if ((TieredStopAtLevel == CompLevel_full_optimization) && (level != CompLevel_full_optimization))  {
      double current_reverse_free_ratio = CodeCache::reverse_free_ratio();
      if (current_reverse_free_ratio > _increase_threshold_at_ratio) {

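For illustration only, here is a standalone sketch of how this check could fit together with the exp() scaling discussed earlier in the thread. It is not the webrev code: the CompLevel enum below, scale_threshold() and increase_threshold_at_ratio() are hypothetical stand-ins, and the mapping from the IncreaseFirstTierCompileThresholdAt percentage to a reverse free ratio (100 / (100 - percent)) is an assumption derived from reverse ratio = max capacity / unallocated capacity.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the HotSpot compilation levels named above.
enum CompLevel {
  CompLevel_none = 0,
  CompLevel_simple = 1,
  CompLevel_full_profile = 3,
  CompLevel_full_optimization = 4
};

// Reverse free ratio as quoted in the thread: max capacity / unallocated capacity.
double reverse_free_ratio(double max_capacity, double unallocated_capacity) {
  assert(unallocated_capacity > 0.0);
  return max_capacity / unallocated_capacity;
}

// ASSUMPTION: a "filled by N percent" setting maps to the ratio 100 / (100 - N),
// e.g. 50% filled -> 2, 75% filled -> 4.
double increase_threshold_at_ratio(double percent_filled) {
  assert(percent_filled >= 0.0 && percent_filled < 100.0);
  return 100.0 / (100.0 - percent_filled);
}

// Scale the threshold factor k for non-C2 levels only, and only when C2 is
// enabled, so that C1-only mode and C2 compiles are not penalized.
double scale_threshold(double k, CompLevel stop_at_level, CompLevel level,
                       double current_reverse_free_ratio, double at_ratio) {
  if ((stop_at_level == CompLevel_full_optimization) &&
      (level != CompLevel_full_optimization)) {
    if (current_reverse_free_ratio > at_ratio) {
      k *= std::exp(current_reverse_free_ratio - at_ratio);
    }
  }
  return k;
}
```

Under these assumptions a 50% setting gives a ratio of 2, so the scale factor for C1 stays unchanged until less than half of the code cache is free, and it is never applied to CompLevel_full_optimization.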
Thanks,
Vladimir

On 5/15/13 6:33 AM, Albert Noll wrote:
> Hi all,
>
> thank you very much for your feedback and advice. I made the changes
> as proposed by Vladimir and Igor. Please see the webrev:
>
> http://cr.openjdk.java.net/~anoll/8012371/webrev.01/
>
> Also, I attached a spread sheet that contains the performance
> evaluation for the proposed changes.
> NT ..... non-tiered
> T ...... tiered, without patch
> T-50 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=50
> T-75 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=75
> T-90 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=90
>
> The sheet contains three evaluations, one for each value of ReservedCodeCacheSize:
> - 80m
> - 160m
> - 240m
>
> All combinations are executed 3 times and the average is reported in
> the right-most columns.
>
> Executive summary: TieredCompilation provides better startup times,
> especially for benchmarks that are executed early (and as a result
> there is enough free code cache). There is no performance regression
> for benchmarks that are executed later (and the code cache is full).
>
> Please let me know what you think about the evaluation.
>
> Many thanks in advance,
> Albert
>
> On 15/05/2013 11:15, Igor Veresov wrote:
>> Another thought is that it might be useful for future debugging to add "TieredStopAtLevel == CompLevel_full_optimization" to the predicate at advancedThresholdPolicy.cpp:211; otherwise the c1-only mode is penalized. That is, if you guys haven't pushed yet. Not really important.
>>
>> Thanks!
>> igor
>>
>> On May 14, 2013, at 1:45 PM, Vladimir Kozlov<vladimir.kozlov at oracle.com>  wrote:
>>
>>> Thank you, Igor
>>>
>>> For the record, these changes are based on your suggestion :)
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 5/14/13 1:25 PM, Igor Veresov wrote:
>>>> Yup, I had something similar in mind. But it didn't occur to me that you can just throttle C1 exclusively. Awesome solution!
>>>> That way you just throttle C1 compiles, and in the meanwhile you'll just use the interpreter for profiling.
>>>>
>>>> Looks good!
>>>>
>>>> igor
>>>>
>>>> On May 14, 2013, at 11:21 AM, Vladimir Kozlov<vladimir.kozlov at oracle.com>  wrote:
>>>>
>>>>> http://cr.openjdk.java.net/~kvn/8012371/webrev
>>>>>
>>>>> On 5/14/13 6:23 AM, Albert Noll wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I think I found a solution to the code cache fill-up-problem.
>>>>>
>>>>> The problem with the previous solution was that the threshold for
>>>>> recompilation is increased equally
>>>>> for different tiers. As a result, peak performance was not reached if
>>>>> the code cache was "rather full", since frequently invoked methods that
>>>>> WERE compiled to C2 in a non-tiered version WERE NOT compiled with C2
>>>>> when using tiered compilation.
>>>>>
>>>>> The proposed solution to the problem is that we start increasing the
>>>>> threshold rather early (e.g., if the code cache is filled up by 50%),
>>>>> and do not increase the threshold for C2 compilation. As a result, we
>>>>> have enough space for C2 code (we reach peak performance).
>>>>> The drawback of this solution, of course, is that tiered compilation
>>>>> potentially performs worse than if we provide more code cache. However,
>>>>> this solution should not perform worse compared to not using tiered
>>>>> compilation.
>>>>>
>>>>> I evaluated the proposed changes using the nashorn benchmarks with
>>>>> ReservedCodeCacheSize=80m, letting all benchmarks run in the same JVM
>>>>> instance. We start increasing the threshold for recompilation (not for
>>>>> recompiling to C2) when the code cache is filled up by 50%. The result
>>>>> is that the warning that the code cache is filled up and compilation
>>>>> stops is not printed out. Furthermore, we achieve similar peak
>>>>> performance compared to non-tiered but a faster startup time.
>>>>>
>>>>>
>>>>> Many thanks for your comments,
>>>>> Albert
>>>>>
>>>>> On 07/05/2013 22:38, Vladimir Kozlov wrote:
>>>>>> And add product flag for initial ratio value so people can adjust it
>>>>>> as they wish.
>>>>>>
>>>>>> Vladimir
>>>>>>
>>>>>> On 5/7/13 9:40 AM, Vladimir Kozlov wrote:
>>>>>>> Albert,
>>>>>>>
>>>>>>> You should start using Nashorn/octane for performance testing since
>>>>>>> TieredCompilation has a big effect on it. Roland can help you with it.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>> On 5/7/13 9:08 AM, Vladimir Kozlov wrote:
>>>>>>>> On 5/7/13 6:49 AM, Albert Noll wrote:
>>>>>>>>> Hi Vladimir,
>>>>>>>>>
>>>>>>>>> I performed a preliminary evaluation of the effects on the size of
>>>>>>>>> generated code.
>>>>>>>>> I used the eclipse benchmark from the DaCapo benchmarks.
>>>>>>>>> In the test, I limited the ReservedCodeCacheSize to 32m
>>>>>>>>>
>>>>>>>>> With the changes in advancedThresholdPolicy, 2 runs generate 76mb
>>>>>>>>> code.
>>>>>>>>> Without the changes, 2 runs generate 116mb code.
>>>>>>>> This is good, but I am mostly concerned about the effect on performance,
>>>>>>>> startup and peak. Also look at codecache usage with the default size at
>>>>>>>> the end of execution. Use -XX:+PrintCompilation, which has time stamps
>>>>>>>> (first number) in its output, to see how the behavior changes. Note, the
>>>>>>>> third number in the output is the compilation type: 3 - C1 with
>>>>>>>> profiling, 4 - C2.
>>>>>>>>
>>>>>>>> Sorry about dexp() suggestion, yes it needs to be called at correct
>>>>>>>> thread state.
>>>>>>>>
>>>>>>>> And I made a mistake in my suggested expressions. If we want to scale
>>>>>>>> only for 25% and less free space we need:
>>>>>>>>
>>>>>>>>   if (free_reverse_ratio > 4.) {
>>>>>>>>     k *= exp(free_reverse_ratio - 4.);
>>>>>>>>
>>>>>>>> But I will leave it to you to determine the best ratio value by
>>>>>>>> experiments: get the same startup and peak with less codecache.
>>>>>>>> Maybe your 50% will be a better value.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Albert
>>>>>>>>>
>>>>>>>>> On 05/07/2013 02:13 PM, Albert Noll wrote:
>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>
>>>>>>>>>> thank you very much for your feedback. I made the changes as you
>>>>>>>>>> proposed.
>>>>>>>>>> I could not use SharedRuntime::dexp(d), since the VM crashed (see
>>>>>>>>>> below). Rick explained to me why: the current thread is in the
>>>>>>>>>> wrong state.
>>>>>>>>>>
>>>>>>>>>> What do you think of the current version? Do you think we need to
>>>>>>>>>> evaluate the
>>>>>>>>>> performance impact of that change?
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Albert
>>>>>>>>>>
>>>>>>>>>> P.S.: Is it OK if I ask you for early feedback, or should I just send
>>>>>>>>>> out an RFR?
>>>>>>>>>>
>>>>>>>>>> # To suppress the following error report, specify this argument
>>>>>>>>>> # after -XX: or in .hotspotrc: SuppressErrorAt=/gcLocker.cpp:223
>>>>>>>>>> #
>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>> #
>>>>>>>>>> #  Internal Error
>>>>>>>>>> (/export/anoll/JDK-8012371/src/share/vm/memory/gcLocker.cpp:223),
>>>>>>>>>> pid=5663, tid=140406973806336
>>>>>>>>>> #  Error: ShouldNotReachHere()
>>>>>>>>>> #
>>>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b86) (build
>>>>>>>>>> 1.8.0-ea-b86)
>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>>>>>> (25.0-b32-internal-fastdebug mixed mode linux-amd64 compressed oops)
>>>>>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>>>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>>>>>> #
>>>>>>>>>> # An error report file with more information is saved as:
>>>>>>>>>> # /export/anoll/hs_err_pid5663.log
>>>>>>>>>> #
>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>> #http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>>>>> #
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 05/06/2013 11:01 PM, Vladimir Kozlov wrote:
>>>>>>>>>>> I don't think it should be only C1 specific code. Before Tiered code
>>>>>>>>>>> we had decay counters code for all compilers. So I think it
>>>>>>>>>>> should be
>>>>>>>>>>> the same now.
>>>>>>>>>>>
>>>>>>>>>>> Make new method:
>>>>>>>>>>>   double CodeCache::free_space_ratio()
>>>>>>>>>>>
>>>>>>>>>>> Also subtract CodeCacheMinimumFreeSpace to get correct size
>>>>>>>>>>> available
>>>>>>>>>>> for JIT code.
>>>>>>>>>>>
>>>>>>>>>>> I would prefer to have one scaling expression which starts with *1
>>>>>>>>>>> and ends with e**k. It would also be nice if the switch from one
>>>>>>>>>>> expression to the other were smooth (a graph without steps). For
>>>>>>>>>>> example, the next code sharply increases the scale by 2, which is
>>>>>>>>>>> not good:
>>>>>>>>>>> +        k += (free_ratio < 0.50) ? 1/free_ratio : 0;
>>>>>>>>>>>
>>>>>>>>>>> Also you use 2 divisions when you could use just one. And 50% empty
>>>>>>>>>>> is too early. If we start at 25% and use SharedRuntime::dexp(d) I
>>>>>>>>>>> think we can simplify code:
>>>>>>>>>>>
>>>>>>>>>>> double free_reverse_ratio = max_capacity / unallocated_capacity;
>>>>>>>>>>> if (free_reverse_ratio > 2.) {
>>>>>>>>>>>   k *= SharedRuntime::dexp(free_reverse_ratio - 2.);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Vladimir
>>>>>>>>>>>
>>>>>>>>>>> On 5/6/13 6:00 AM, Albert Noll wrote:
>>>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>>>
>>>>>>>>>>>> I looked at: https://jbs.oracle.com/bugs/browse/JDK-8012371
>>>>>>>>>>>> I attached a possible solution to this mail. Could I get some
>>>>>>>>>>>> early feedback from you?
>>>>>>>>>>>>
>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>> Albert
>>>>>
>>>>>
>>>>>
>

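As a footnote to the scaling discussion quoted above, a small sketch comparing the stepped expression with the exp() variant may help. The names stepped_scale() and smooth_scale() are hypothetical, and free_ratio here is assumed to mean unallocated capacity divided by max capacity; the CodeCacheMinimumFreeSpace adjustment is left out.

```cpp
#include <cassert>
#include <cmath>

// Stepped variant quoted in the thread:
//   k += (free_ratio < 0.50) ? 1/free_ratio : 0;
// Just below 50% free, k jumps by about 2 in one step.
double stepped_scale(double k, double free_ratio) {
  assert(free_ratio > 0.0);
  return k + ((free_ratio < 0.50) ? 1.0 / free_ratio : 0.0);
}

// Smooth variant: multiply by exp(reverse_ratio - start), which equals 1 at
// the switch point, so the graph has no step. A start of 4.0 corresponds to
// 25% free space, a start of 2.0 to 50% free space.
double smooth_scale(double k, double free_ratio, double start_reverse_ratio) {
  assert(free_ratio > 0.0);
  double free_reverse_ratio = 1.0 / free_ratio;
  if (free_reverse_ratio > start_reverse_ratio) {
    k *= std::exp(free_reverse_ratio - start_reverse_ratio);
  }
  return k;
}
```

With a start ratio of 4.0 nothing changes while more than 25% of the code cache is free; at 12.5% free the reverse ratio is 8, so the factor is exp(4), roughly 55.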

More information about the hotspot-compiler-dev mailing list