RFR 8012371: Adjust Tiered compile threshold according to available space in codecache

Vladimir Kozlov vladimir.kozlov at oracle.com
Wed May 15 09:04:15 PDT 2013

Thanks, Albert

Move CodeCache::reverse_free_ratio() under first condition to execute only when needed.
Put parentheses around (a == b).
The comment is too complex, I think. How about this:

    // Increase C1 compile threshold when the code cache is filled more
    // than specified by IncreaseFirstTierCompileThresholdAt percentage.
    // The main intention is to keep enough free space for C2 compiled code
    // to achieve peak performance if the code cache is under stress.
    if ((TieredStopAtLevel == CompLevel_full_optimization) && (level != CompLevel_full_optimization))  {
      double current_reverse_free_ratio = CodeCache::reverse_free_ratio();
      if (current_reverse_free_ratio > _increase_threshold_at_ratio) {
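
For readers following the thread, the shape of the check under review can be sketched as a standalone function. This is only a minimal sketch under assumptions, not the code from the webrev: the enum below, the function name c1_threshold_scale, and passing both ratios as parameters are hypothetical stand-ins, and the exp() scaling follows the expression suggested earlier in the thread.

```cpp
#include <cmath>

// Hypothetical stand-ins for HotSpot's compilation levels (illustration only).
enum CompLevel {
  CompLevel_none = 0,
  CompLevel_simple = 1,
  CompLevel_limited_profile = 2,
  CompLevel_full_profile = 3,
  CompLevel_full_optimization = 4
};

// Returns the factor by which a compile threshold would be multiplied.
// reverse_free_ratio: code cache capacity divided by its free space (assumed input).
// increase_at_ratio:  ratio above which non-C2 thresholds start to grow (assumed input).
double c1_threshold_scale(CompLevel level, CompLevel stop_at_level,
                          double reverse_free_ratio, double increase_at_ratio) {
  double k = 1.0;
  // Throttle only when C2 is available and the requested level is not C2,
  // so a C1-only configuration is not penalized.
  if ((stop_at_level == CompLevel_full_optimization) &&
      (level != CompLevel_full_optimization)) {
    // The ratio is only consulted when the outer condition holds.
    if (reverse_free_ratio > increase_at_ratio) {
      k *= std::exp(reverse_free_ratio - increase_at_ratio);
    }
  }
  return k;
}
```

With these assumptions, a C2 request always gets a scale of 1, and a C1 request gets a scale above 1 only once the ratio passes the configured value.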


On 5/15/13 6:33 AM, Albert Noll wrote:
> Hi all,
> thank you very much for your feedback and advice. I made the changes
> as proposed by Vladimir and Igor. Please see the webrev:
> http://cr.openjdk.java.net/~anoll/8012371/webrev.01/
> Also, I attached a spread sheet that contains the performance
> evaluation for the proposed changes.
> NT ..... non-tiered
> T ...... tiered, without patch
> T-50 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=50
> T-75 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=75
> T-90 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=90
> The sheet contains three evaluations, one for each of these values of ReservedCodeCacheSize:
> - 80m
> - 160m
> - 240m
> All combinations are executed 3 times and the average is reported in
> the right-most columns.
> Executive summary: TieredCompilation provides better startup times,
> especially for benchmarks that are executed early (and as a result
> there is enough free code cache). There is no performance regression
> for benchmarks that are executed later (and the code cache is full).
> Please let me know what you think about the evaluation.
> Many thanks in advance,
> Albert
> On 15/05/2013 11:15, Igor Veresov wrote:
>> Another thought is that it might be useful for future debugging to add "TieredStopAtLevel == CompLevel_full_optimization" to the predicate at advancedThresholdPolicy.cpp:211; otherwise the c1-only mode is penalized. That is, if you guys haven't pushed yet; not really important.
>> Thanks!
>> igor
>> On May 14, 2013, at 1:45 PM, Vladimir Kozlov<vladimir.kozlov at oracle.com>  wrote:
>>> Thank you, Igor
>>> For the record, these changes are based on your suggestion :)
>>> Thanks,
>>> Vladimir
>>> On 5/14/13 1:25 PM, Igor Veresov wrote:
>>>> Yup, I had something similar in mind. But it didn't occur to me that you can just throttle C1 exclusively. Awesome solution!
>>>> That way you just throttle C1 compiles, and in the meanwhile you'll just use the interpreter for profiling.
>>>> Looks good!
>>>> igor
>>>> On May 14, 2013, at 11:21 AM, Vladimir Kozlov<vladimir.kozlov at oracle.com>  wrote:
>>>>> http://cr.openjdk.java.net/~kvn/8012371/webrev
>>>>> On 5/14/13 6:23 AM, Albert Noll wrote:
>>>>> Hi,
>>>>> I think I found a solution to the code cache fill-up problem.
>>>>> The problem with the previous solution was that the threshold for
>>>>> recompilation is increased equally
>>>>> for different tiers. As a result, peak performance was not reached if
>>>>> the code cache was "rather full", since frequently invoked methods that
>>>>> WERE compiled to C2 in a non-tiered version WERE NOT compiled with C2
>>>>> when using tiered compilation.
>>>>> The proposed solution to the problem is that we start increasing the
>>>>> threshold rather early (e.g., if the code cache is filled up by 50%),
>>>>> and do not increase the threshold for C2 compilation. As a result, we
>>>>> have enough space for C2 code (we reach peak performance).
>>>>> The drawback of this solution, of course, is that tiered compilation
>>>>> potentially performs worse than if we provide more code cache. However,
>>>>> this solution should not perform worse compared to not using tiered
>>>>> compilation.
>>>>> I evaluated the proposed changes using the nashorn benchmarks with
>>>>> ReservedCodeCacheSize=80m, letting all benchmarks run in the same JVM
>>>>> instance. We start increasing the threshold for recompilation (not for
>>>>> recompiling to C2) when the code cache is filled up by 50%. The result
>>>>> is that
>>>>> the warning that the code cache is filled up and compilation stops is
>>>>> not printed out. Furthermore, we achieve similar peak performance
>>>>> compared to non-tiered but a faster startup time.
>>>>> Many thanks for your comments,
>>>>> Albert
>>>>> On 07/05/2013 22:38, Vladimir Kozlov wrote:
>>>>>> And add product flag for initial ratio value so people can adjust it
>>>>>> as they wish.
>>>>>> Vladimir
>>>>>> On 5/7/13 9:40 AM, Vladimir Kozlov wrote:
>>>>>>> Albert,
>>>>>>> You should start using Nashorn/octane for performance testing since
>>>>>>> TieredCompilation has a big effect on it. Roland can help you with it.
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>> On 5/7/13 9:08 AM, Vladimir Kozlov wrote:
>>>>>>>> On 5/7/13 6:49 AM, Albert Noll wrote:
>>>>>>>>> Hi Vladimir,
>>>>>>>>> I performed a preliminary evaluation of the effects on the size of
>>>>>>>>> generated code.
>>>>>>>>> I used the eclipse benchmark from the DaCapo benchmarks.
>>>>>>>>> In the test, I limited the ReservedCodeCacheSize to 32m
>>>>>>>>> With the changes in advancedThresholdPolicy, 2 runs generate 76mb
>>>>>>>>> code.
>>>>>>>>> Without the changes, 2 runs generate 116mb code.
>>>>>>>> This is good, but I am mostly concerned about the effect on performance,
>>>>>>>> startup and peak. Also look at codecache usage with the default size at
>>>>>>>> the end of execution. Use -XX:+PrintCompilation, which has time stamps
>>>>>>>> (first number) in its output, to see how behavior changes. Note, the
>>>>>>>> third number in the output is the compilation type: 3 - C1 with
>>>>>>>> profiling, 4 - C2.
>>>>>>>> Sorry about the dexp() suggestion; yes, it needs to be called in the
>>>>>>>> correct thread state.
>>>>>>>> And I made a mistake with my suggested expressions. If we want to scale
>>>>>>>> only when 25% or less space is free, we need:
>>>>>>>>   if (free_reverse_ratio > 4.) {
>>>>>>>>     k *= exp(free_reverse_ratio - 4.);
>>>>>>>>   }
>>>>>>>> But I will leave it to you to determine the best ratio value by experiments
>>>>>>>> to get the best results: the same startup and peak with less codecache.
>>>>>>>> Maybe your 50% will be a better value.
>>>>>>>> Thanks,
>>>>>>>> Vladimir
>>>>>>>>> Best,
>>>>>>>>> Albert
>>>>>>>>> On 05/07/2013 02:13 PM, Albert Noll wrote:
>>>>>>>>>> Hi Vladimir,
>>>>>>>>>> thank you very much for your feedback. I made the changes as you
>>>>>>>>>> proposed.
>>>>>>>>>> I could not use SharedRuntime::dexp(d), since the VM crashed (see
>>>>>>>>>> below). Rick
>>>>>>>>>> explained to me why (the current thread is in the wrong state).
>>>>>>>>>> What do you think of the current version? Do you think we need to
>>>>>>>>>> evaluate the
>>>>>>>>>> performance impact of that change?
>>>>>>>>>> Best,
>>>>>>>>>> Albert
>>>>>>>>>> P.S.: Is it OK if I ask you for early feedback, or should I just send
>>>>>>>>>> out an RFR?
>>>>>>>>>> # To suppress the following error report, specify this argument
>>>>>>>>>> # after -XX: or in .hotspotrc: SuppressErrorAt=/gcLocker.cpp:223
>>>>>>>>>> #
>>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>>> #
>>>>>>>>>> #  Internal Error
>>>>>>>>>> (/export/anoll/JDK-8012371/src/share/vm/memory/gcLocker.cpp:223),
>>>>>>>>>> pid=5663, tid=140406973806336
>>>>>>>>>> #  Error: ShouldNotReachHere()
>>>>>>>>>> #
>>>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b86) (build
>>>>>>>>>> 1.8.0-ea-b86)
>>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>>>>>> (25.0-b32-internal-fastdebug mixed mode linux-amd64 compressed oops)
>>>>>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>>>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>>>>>> #
>>>>>>>>>> # An error report file with more information is saved as:
>>>>>>>>>> # /export/anoll/hs_err_pid5663.log
>>>>>>>>>> #
>>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>>> #http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>>>>> #
>>>>>>>>>> On 05/06/2013 11:01 PM, Vladimir Kozlov wrote:
>>>>>>>>>>> I don't think it should be only C1 specific code. Before Tiered code
>>>>>>>>>>> we had decay counters code for all compilers. So I think it
>>>>>>>>>>> should be
>>>>>>>>>>> the same now.
>>>>>>>>>>> Make new method:
>>>>>>>>>>>   double CodeCache::free_space_ratio()
>>>>>>>>>>> Also subtract CodeCacheMinimumFreeSpace to get correct size
>>>>>>>>>>> available
>>>>>>>>>>> for JIT code.
>>>>>>>>>>> I would prefer to have one scaling expression which starts with *1
>>>>>>>>>>> and ends with e**k. But it would be nice to have the switch (from one
>>>>>>>>>>> expression to another) be smooth (a graph without steps). For example,
>>>>>>>>>>> the next code will sharply increase the scale by 2, which is not good:
>>>>>>>>>>> +        k += (free_ratio < 0.50) ? 1/free_ratio : 0;
>>>>>>>>>>> Also you use 2 divisions when you could use just one. And 50% empty
>>>>>>>>>>> is too early. If we start at 25% and use SharedRuntime::dexp(d) I
>>>>>>>>>>> think we can simplify code:
>>>>>>>>>>> double free_reverse_ratio = max_capacity / unallocated_capacity;
>>>>>>>>>>> if (free_reverse_ratio > 2.) {
>>>>>>>>>>>   k *= SharedRuntime::dexp(free_reverse_ratio - 2.);
>>>>>>>>>>> }
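
The difference between the two expressions quoted above can be made concrete with a standalone comparison. This is a sketch under assumptions: std::exp stands in for SharedRuntime::dexp so the code runs outside the VM, and the function names step_scale and smooth_scale are hypothetical.

```cpp
#include <cmath>

// Step-style scaling from the quoted line: adds 1/free_ratio once less than
// half of the code cache is free. k is the scale before the adjustment.
double step_scale(double k, double free_ratio) {
  k += (free_ratio < 0.50) ? 1.0 / free_ratio : 0.0;
  return k;
}

// Smooth scaling from the suggested replacement. std::exp is used here in
// place of SharedRuntime::dexp (an assumption made for this sketch).
double smooth_scale(double k, double max_capacity, double unallocated_capacity) {
  double free_reverse_ratio = max_capacity / unallocated_capacity;
  if (free_reverse_ratio > 2.0) {
    k *= std::exp(free_reverse_ratio - 2.0);
  }
  return k;
}
```

Just past the 50% mark the step version jumps from k to a little over k + 2, while the smooth version stays close to k and then grows continuously.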
>>>>>>>>>>> Regards,
>>>>>>>>>>> Vladimir
>>>>>>>>>>> On 5/6/13 6:00 AM, Albert Noll wrote:
>>>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>>> I looked at: https://jbs.oracle.com/bugs/browse/JDK-8012371
>>>>>>>>>>>> I attached a possible solution to this mail. Could I get some
>>>>>>>>>>>> early feedback from you?
>>>>>>>>>>>> Many thanks,
>>>>>>>>>>>> Albert
