RFR 8012371: Adjust Tiered compile threshold according to available space in codecache

Wed May 15 06:33:37 PDT 2013

Hi all,

thank you very much for your feedback and advice. I made the changes
as proposed by Vladimir and Igor. Please see the webrev:

http://cr.openjdk.java.net/~anoll/8012371/webrev.01/ 

Also, I attached a spreadsheet that contains the performance
evaluation for the proposed changes. The configurations are:
NT ..... non-tiered
T ...... tiered, without patch
T-50 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=50
T-75 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=75
T-90 ... tiered with -XX:IncreaseFirstTierCompileThresholdAt=90

The sheet contains three evaluations, one for each of the following
values of ReservedCodeCacheSize:
- 80m
- 160m
- 240m

Each combination was executed 3 times, and the average is reported in
the right-most columns.

Executive summary: TieredCompilation provides better startup times,
especially for benchmarks that are executed early (when there is still
enough free code cache). There is no performance regression for
benchmarks that are executed later (when the code cache is full).
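
For readers who have not opened the webrev, the throttling behind the
T-50/T-75/T-90 configurations can be sketched roughly as follows. This
is a standalone approximation and not the patch itself: the enum values
and the functions threshold_ratio_from_percent() and threshold_scale()
are hypothetical names, and the conversion from the flag's percentage to
a "max capacity / free capacity" ratio is an assumption derived from the
exp() expression discussed in the quoted thread. Only levels below C2
are scaled, and only when compilation may proceed all the way to C2.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-ins for the compilation levels mentioned in the thread
// (3 = C1 with profiling, 4 = C2).
enum CompLevel { kLevelC1Profile = 3, kLevelC2 = 4 };

// Assumption: the flag value is a fill percentage in [0, 100).
// It is converted to a reverse free ratio, e.g. 50 -> 2.0 and 75 -> 4.0.
double threshold_ratio_from_percent(double fill_percent) {
  assert(fill_percent >= 0.0 && fill_percent < 100.0);
  return 100.0 / (100.0 - fill_percent);
}

// Returns a multiplier for the compile threshold of `level`.
// `stop_at_level` plays the role of TieredStopAtLevel: when it is below C2
// (C1-only mode), nothing is throttled, so that mode is not penalized.
double threshold_scale(CompLevel level, CompLevel stop_at_level,
                       double max_capacity, double free_capacity,
                       double fill_percent) {
  assert(max_capacity > 0.0 && free_capacity > 0.0);
  double k = 1.0;
  if (stop_at_level == kLevelC2 && level != kLevelC2) {
    double reverse_free_ratio = max_capacity / free_capacity;
    double start_ratio = threshold_ratio_from_percent(fill_percent);
    if (reverse_free_ratio > start_ratio) {
      // Grows smoothly from 1.0 once the cache is fuller than the flag value.
      k *= std::exp(reverse_free_ratio - start_ratio);
    }
  }
  return k;  // C2 thresholds are never scaled in this sketch.
}
```

With an 80m cache and 20m free, a flag value of 50 gives a C1 multiplier
of exp(4 - 2), while the C2 multiplier stays at 1, which is the intended
effect of leaving room for C2 code.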

Please let me know what you think about the evaluation.

Many thanks in advance,
Albert

On 15/05/2013 11:15, Igor Veresov wrote:
> Another thought is that it might be useful for future debugging to add "TieredStopAtLevel == CompLevel_full_optimization" to the predicate at advancedThresholdPolicy.cpp:211; otherwise the c1-only mode is penalized. That is, if you guys haven't pushed yet; it's not really important.
>
> Thanks!
> igor
>
> On May 14, 2013, at 1:45 PM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>
>> Thank you, Igor
>>
>> For the record, these changes are based on your suggestion :)
>>
>> Thanks,
>> Vladimir
>>
>> On 5/14/13 1:25 PM, Igor Veresov wrote:
>>> Yup, I had something similar in mind. But it didn't occur to me that you can just throttle C1 exclusively. Awesome solution!
>>> That way you just throttle C1 compiles, and in the meanwhile you'll just use the interpreter for profiling.
>>>
>>> Looks good!
>>>
>>> igor
>>>
>>> On May 14, 2013, at 11:21 AM, Vladimir Kozlov <vladimir.kozlov at oracle.com> wrote:
>>>
>>>> http://cr.openjdk.java.net/~kvn/8012371/webrev
>>>>
>>>> On 5/14/13 6:23 AM, Albert Noll wrote:
>>>>
>>>> Hi,
>>>>
>>>> I think I found a solution to the code cache fill-up-problem.
>>>>
>>>> The problem with the previous solution was that the threshold for
>>>> recompilation is increased equally for different tiers. As a result,
>>>> peak performance was not reached if the code cache was "rather full",
>>>> since frequently invoked methods that WERE compiled to C2 in a
>>>> non-tiered version WERE NOT compiled with C2 when using tiered
>>>> compilation.
>>>>
>>>> The proposed solution to the problem is that we start increasing the
>>>> threshold rather early (e.g., if the code cache is filled up by 50%),
>>>> and do not increase the threshold for C2 compilation. As a result, we
>>>> have enough space for C2 code (we reach peak performance).
>>>> The drawback of this solution, of course, is that tiered compilation
>>>> potentially performs worse than if we provide more code cache. However,
>>>> this solution should not perform worse compared to not using tiered
>>>> compilation.
>>>>
>>>> I evaluated the proposed changes using the nashorn benchmarks with
>>>> ReservedCodeCacheSize=80m, letting all benchmarks run in the same JVM
>>>> instance. We start increasing the threshold for recompilation (but not
>>>> for recompiling to C2) when the code cache is filled up by 50%. The
>>>> result is that the warning that the code cache is filled up and
>>>> compilation stops is no longer printed. Furthermore, we achieve similar
>>>> peak performance compared to non-tiered, but a faster startup time.
>>>>
>>>>
>>>> Many thanks for your comments,
>>>> Albert
>>>>
>>>> On 07/05/2013 22:38, Vladimir Kozlov wrote:
>>>>> And add a product flag for the initial ratio value so people can
>>>>> adjust it as they wish.
>>>>>
>>>>> Vladimir
>>>>>
>>>>> On 5/7/13 9:40 AM, Vladimir Kozlov wrote:
>>>>>> Albert,
>>>>>>
>>>>>> You should start using Nashorn/octane for performance testing since
>>>>>> TieredCompilation has a big effect on it. Roland can help you with it.
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 5/7/13 9:08 AM, Vladimir Kozlov wrote:
>>>>>>> On 5/7/13 6:49 AM, Albert Noll wrote:
>>>>>>>> Hi Vladimir,
>>>>>>>>
>>>>>>>> I performed a preliminary evaluation of the effects on the size of
>>>>>>>> generated code.
>>>>>>>> I used the eclipse benchmark from the DaCapo benchmarks.
>>>>>>>> In the test, I limited the ReservedCodeCacheSize to 32m
>>>>>>>>
>>>>>>>> With the changes in advancedThresholdPolicy, 2 runs generate 76mb
>>>>>>>> code.
>>>>>>>> Without the changes, 2 runs generate 116mb code.
>>>>>>> This is good, but I am mostly concerned about the effect on
>>>>>>> performance, startup and peak. Also look at codecache usage with the
>>>>>>> default size at the end of execution. Use -XX:+PrintCompilation,
>>>>>>> which has time stamps (first number) in its output, to see how the
>>>>>>> behavior changes. Note, the third number in the output is the
>>>>>>> compilation type: 3 - C1 with profiling, 4 - C2.
>>>>>>>
>>>>>>> Sorry about the dexp() suggestion; yes, it needs to be called in the
>>>>>>> correct thread state.
>>>>>>>
>>>>>>> And I made a mistake with my suggested expressions. If we want to
>>>>>>> scale only for 25% and less free space we need:
>>>>>>>
>>>>>>>   if (free_reverse_ratio > 4.) {
>>>>>>>     k *= exp(free_reverse_ratio - 4.);
>>>>>>>   }
>>>>>>>
>>>>>>> But I will leave it to you to determine the best ratio value by
>>>>>>> experiments to get the best results: the same startup and peak with
>>>>>>> less codecache. Maybe your 50% will be a better value.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Vladimir
>>>>>>>
>>>>>>>> Best,
>>>>>>>> Albert
>>>>>>>>
>>>>>>>> On 05/07/2013 02:13 PM, Albert Noll wrote:
>>>>>>>>> Hi Vladimir,
>>>>>>>>>
>>>>>>>>> thank you very much for your feedback. I made the changes as you
>>>>>>>>> proposed.
>>>>>>>>> I could not use SharedRuntime::dexp(d), since the VM crashed (see
>>>>>>>>> below). Rick explained to me why: the current thread is in the
>>>>>>>>> wrong state.
>>>>>>>>>
>>>>>>>>> What do you think of the current version? Do you think we need to
>>>>>>>>> evaluate the
>>>>>>>>> performance impact of that change?
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Albert
>>>>>>>>>
>>>>>>>>> P.S.: Is it OK if I ask you for early feedback, or should I just send
>>>>>>>>> out an RFR?
>>>>>>>>>
>>>>>>>>> # To suppress the following error report, specify this argument
>>>>>>>>> # after -XX: or in .hotspotrc: SuppressErrorAt=/gcLocker.cpp:223
>>>>>>>>> #
>>>>>>>>> # A fatal error has been detected by the Java Runtime Environment:
>>>>>>>>> #
>>>>>>>>> #  Internal Error
>>>>>>>>> (/export/anoll/JDK-8012371/src/share/vm/memory/gcLocker.cpp:223),
>>>>>>>>> pid=5663, tid=140406973806336
>>>>>>>>> #  Error: ShouldNotReachHere()
>>>>>>>>> #
>>>>>>>>> # JRE version: Java(TM) SE Runtime Environment (8.0-b86) (build
>>>>>>>>> 1.8.0-ea-b86)
>>>>>>>>> # Java VM: Java HotSpot(TM) 64-Bit Server VM
>>>>>>>>> (25.0-b32-internal-fastdebug mixed mode linux-amd64 compressed oops)
>>>>>>>>> # Failed to write core dump. Core dumps have been disabled. To enable
>>>>>>>>> core dumping, try "ulimit -c unlimited" before starting Java again
>>>>>>>>> #
>>>>>>>>> # An error report file with more information is saved as:
>>>>>>>>> # /export/anoll/hs_err_pid5663.log
>>>>>>>>> #
>>>>>>>>> # If you would like to submit a bug report, please visit:
>>>>>>>>> #   http://bugreport.sun.com/bugreport/crash.jsp
>>>>>>>>> #
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 05/06/2013 11:01 PM, Vladimir Kozlov wrote:
>>>>>>>>>> I don't think it should be only C1 specific code. Before Tiered code
>>>>>>>>>> we had decay counters code for all compilers. So I think it should
>>>>>>>>>> be the same now.
>>>>>>>>>>
>>>>>>>>>> Make new method:
>>>>>>>>>>   double CodeCache::free_space_ratio()
>>>>>>>>>>
>>>>>>>>>> Also subtract CodeCacheMinimumFreeSpace to get the correct size
>>>>>>>>>> available for JIT code.
>>>>>>>>>>
>>>>>>>>>> I would prefer to have one scaling expression which starts with *1
>>>>>>>>>> and ends with e**k. But it would be nice if the switch from one
>>>>>>>>>> expression to the other were smooth (a graph without steps). For
>>>>>>>>>> example, the next code will sharply increase the scale by 2, which
>>>>>>>>>> is not good:
>>>>>>>>>> +        k += (free_ratio < 0.50) ? 1/free_ratio : 0;
>>>>>>>>>>
>>>>>>>>>> Also you use 2 divisions when you could use just one. And 50% empty
>>>>>>>>>> is too early. If we start at 25% and use SharedRuntime::dexp(d) I
>>>>>>>>>> think we can simplify code:
>>>>>>>>>>
>>>>>>>>>> double free_reverse_ratio = max_capacity / unallocated_capacity;
>>>>>>>>>> if (free_reverse_ratio > 2.) {
>>>>>>>>>>   k *= SharedRuntime::dexp(free_reverse_ratio - 2.);
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Vladimir
>>>>>>>>>>
>>>>>>>>>> On 5/6/13 6:00 AM, Albert Noll wrote:
>>>>>>>>>>> Hi Vladimir,
>>>>>>>>>>>
>>>>>>>>>>> I looked at: https://jbs.oracle.com/bugs/browse/JDK-8012371 .
>>>>>>>>>>> I attached a possible solution to this mail. Could I get some
>>>>>>>>>>> early feedback from you?
>>>>>>>>>>>
>>>>>>>>>>> Many thanks,
>>>>>>>>>>> Albert
>>>>
>>>>
>>>>
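
The difference between the stepped and the smooth scaling expressions
discussed in the oldest messages of the thread can be illustrated with
the small sketch below. It is only an illustration under assumptions:
free_space_ratio(), scale_with_step() and scale_smooth() are hypothetical
helpers taking plain doubles, not HotSpot functions, and the minimum free
space is simply subtracted from the unallocated capacity as suggested in
the thread.

```cpp
#include <cassert>
#include <cmath>

// Fraction of the code cache usable for JIT code. `min_free` stands in for
// CodeCacheMinimumFreeSpace; all arguments are hypothetical sizes in bytes.
double free_space_ratio(double max_capacity, double unallocated,
                        double min_free) {
  assert(max_capacity > 0.0);
  double usable = unallocated - min_free;
  return usable > 0.0 ? usable / max_capacity : 0.0;
}

// Stepped variant from the first proposal: as soon as less than 50% is
// free, k jumps by at least 2 (a step in the graph).
double scale_with_step(double k, double free_ratio) {
  assert(free_ratio > 0.0);
  return k + ((free_ratio < 0.50) ? 1.0 / free_ratio : 0.0);
}

// Smooth variant: one division, and the factor is exactly 1 at the
// switch-over point, so the graph has no step.
double scale_smooth(double k, double free_ratio, double start_ratio) {
  assert(free_ratio > 0.0);
  double free_reverse_ratio = 1.0 / free_ratio;
  if (free_reverse_ratio > start_ratio) {
    k *= std::exp(free_reverse_ratio - start_ratio);
  }
  return k;
}
```

A start ratio of 2 begins scaling at 50% free space and a start ratio of
4 at 25% free space, which matches the correction made later in the
thread. Callers of this sketch would need to handle a free ratio of 0
separately.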

-------------- next part --------------
A non-text attachment was scrubbed...
Name: nashorn-benchmarks-all-summary.ods
Type: application/vnd.oasis.opendocument.spreadsheet
Size: 210488 bytes
Desc: not available
Url : http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/attachments/20130515/4a34745f/nashorn-benchmarks-all-summary-0001.ods