[9] RFR(S): 8059606: Enable per-method usage of CompileThresholdScaling (per-method compilation thresholds)

Tue Jan 20 13:29:32 UTC 2015

Hi Vladimir,

thanks for the review!

On 01/19/2015 07:36 PM, Vladimir Kozlov wrote:
> What was changed in globalDefinitions.hpp? Webrev shows nothing 
> (sometimes it does not show spacing changes).

Yes, it was some whitespace I've removed around line 1150:

-  uintptr_t p =  1;
+  uintptr_t p = 1;

The newest webrev does not show that either. In addition to that, I've 
updated two comments in the newest webrev.

> In arguments.cpp, please, add check that CompileThresholdScaling is 
> not negative.

I added the check. If CompileThresholdScaling is negative, we set its 
value to 1.0 (the default value).

Here is the new webrev: http://cr.openjdk.java.net/~zmajo/8059606/webrev.04/

The webrev link is the same as the one in my latest reply to John.

All JPRT tests pass.

Thank you!

Best regards,

Zoltan

>
> Otherwise this looks good to me.
>
> Thanks,
> Vladimir
>
> On 1/19/15 8:56 AM, Zoltán Majó wrote:
>> Hi John,
>>
>>
>> On 01/16/2015 08:59 PM, John Rose wrote:
>>> On Jan 16, 2015, at 5:05 AM, Zoltán Majó <zoltan.majo at oracle.com> 
>>> wrote:
>>>> Here is the new webrev: 
>>>> http://cr.openjdk.java.net/~zmajo/8059606/webrev.02/
>>> Reviewed, with some comments.
>>
>> thank you for your review and for the feedback! Please see detailed 
>> comments below.
>>
>>> Overall, I like the cleanups along the way.
>>>
>>> The basic idea of replacing a hard-coded 'mask' with an addressable 
>>> variable is sound and nicely executed.
>>>
>>> I suppose that idea by itself is "S" small, but this really is a "M" 
>>> or even "L" change, as Vladimir says, especially
>>> since the enhanced logic is spread all around many files.
>>
>> I agree that this is not a small change any more, but I would like to 
>> keep the subject of this discussion unchanged so
>> that we don't move to a different thread (unless you or Vladimir 
>> think it is better to change it).
>>
>>> How have you regression tested this?
>>
>> I used our performance infrastructure. I collected performance data 
>> on 6 different architectures for 41 benchmark
>> programs/suites. The data show that per-method compilation thresholds 
>> do not result in a statistically significant
>> change of performance. One benchmark, Footprint3-Client, degrades 
>> ~0.5% on the X86 Client VM, but I think that is
>> negligible.
>>
>>> Have you verified that the compilation sequence doesn't change for a 
>>> non-trivial use case?  A slip in the assembly
>>> code (for example) might cause a comparison against a garbage mask 
>>> or limit that could cause compilation decisions to
>>> go crazy quietly.  I didn't spot any such bug, but they are hard to 
>>> spot and sometimes quiet.
>>
>> I did extensive manual testing on all architectures targeted by the 
>> patch.
>>
>> I checked that a target method (a method for which per-method 
>> compilation thresholds have been specified) is indeed
>> compiled sooner/later than methods for which global compilation 
>> thresholds are in effect. I ran tests for all
>> combinations of +/-TieredCompilation, +/-ProfileInterpreter, and 
>> +/-UseOnStackReplacement
>> on each architecture that we support. While working on this issue 
>> I've discovered and reported two interpreter-related
>> bugs, 8068652 and 8068505.
>>
>> I cannot, unfortunately, guarantee that my changes are error-free, 
>> but I did my best to catch any possible error that I
>> can think of and that I can check.
>>
>>> In the sparc assembly code (more than one file), the live range of 
>>> Rcounters has increased, since it is used to supply
>>> limits as well as to update the counter (which happens early in the 
>>> code).
>>>
>>> To make it easier to maintain the code, I suggest renaming Rcounters 
>>> to G3_method_counters.
>>> (As you can see, when a register has a logical name but has a 
>>> complicated live range, we add the hardware name is to
>>> the logical name, to make it easier to spot interfering uses, when 
>>> manually editing code.)
>>
>> Thanks for the suggestion, I've changed the register's name.
>>
>>> If scale==0.0 is a valid input checked specially in compileBroker, 
>>> perhaps the effect of a zero should be documented?
>>> Suggest adding to globals.hpp:
>>>     "; values greater than one delay counter overflow; zero forces 
>>> counter overflow immediately"
>>>     "; can be set as a per-method  option."
>>
>> If CompileThresholdScaling is set to 0.0, all methods are interpreted.
>>
>> The reason is that CompileThreshold==0 has been historically 
>> equivalent to setting -Xint to true. Setting
>> CompileThresholdScaling to 0.0 scales down CompileThreshold to 0 and 
>> we wanted to keep VM behavior consistent when we've
>> added support for the global CompileThresholdScaling flag in 
>> JDK-8059604.
>>
>> If you think it would be good if we changed the meaning of 
>> CompileThreshold==0, please let me know. I'll file an RFE for
>> it and change it. As CompileThreshold is a product flag, we will need 
>> CCC approval for that change.
>>
>> As you've suggested, I added a comment to globals.hpp that precisely 
>> describes the behavior of CompileThresholdScaling.
>>
>>> Question:  What if both a global scale and a method option for scale 
>>> are both set?  Is the global one ignored?  Do
>>> they multiply?  It's worth specifying it explicitly (and checking 
>>> that the logic DTRT).
>>
>> Global and per-method values multiply. That behavior is now described 
>> in the comment in globals.hpp.
>>
>>> Question:  How are the log values (like Tier0InvokeNotifyFreqLog or 
>>> the result of get_scaled_freq_log) constrained to
>>> fit in a reasonable range?  (I would suppose that range is 0..31 or 
>>> less.)  Should we have range clipping code in the
>>> scaler functions?
>>
>> Currently InvocationCounter::number_of_count_bits=29 bits are 
>> reserved for counting. As a result, the value of the log2
>> of the notification frequency can be at most 30. I updated the source 
>> code accordingly.
>>
>>> It would give notably simpler code, in MethodData::init, to use a 
>>> branch-free setup of tier_0_invoke_notify_freq_log
>>> etc.  Set scale = 1.0 and then update it conditionally. Special-case 
>>> scale=1.0 in get_scaled_freq_log to taste.
>>
>> I did that.
>>
>>> Same comment about less-branchy code for methodCounters.hpp.  (It's 
>>> better to have a one-way branch that sets up
>>> 'scale' than a two-way branch with duplicate setups of one or more 
>>> variables.)
>>
>> I changed the code according to your suggestions.
>>
>>> In MethodCounters, I think the conditional scaling of 
>>> _interpreter_backward_branch_limit is going to confuse someone,
>>> at some point.  It should be scaled, unconditionally, like its 
>>> sibling variables.  (That would remove another somewhat
>>> verbose initialization branch!)
>>
>> The value of the *global* interpreter backward branch limit 
>> (InvocationCounter::InterpreterBackwardBranchLimit) is
>> computed based on the value of CompileThreshold (in 
>> InvocationCounter::reinitialize()). The backward branch limit is
>> assigned a different value depending on whether interpreter profiling 
>> is enabled or not.
>>
>> The logic of the *per-method* interpreter backward branch limit 
>> (MethodCounters::_interpreter_backward_branch_limit) is
>> intended to be identical to that of the global branch limit. 
>> Therefore, I'm afraid that we have to keep the if-then-else
>> construct initializing _interpreter_backward_branch_limit in 
>> methodCounters.hpp. I hope that I understood right what
>> you've previously suggested.
>>
>>> Small style nit:  The noise-word "get_" is discouraged in the style 
>>> doc:
>>>
>>>>   • Getter accessor names are noun phrases, with no "get_" noise 
>>>> word. Boolean getters can also begin with "is_" or
>>>> "has_".
>>>>    https://wiki.openjdk.java.net/display/HotSpot/StyleGuide
>>> Arguments.cpp follows this rule partially (in older code maybe?).  
>>> It would be better to decrease counter-examples to
>>> the rule instead of increase them.
>>>
>>> Bigger style nit:  Since the functions are not getting a preset 
>>> value (from the arguments) but rather normalizing a
>>> provided argument value, I suggest naming them 
>>> "scale_compile_threshold" (i.e., a verb phrase instead of a noun
>>> phrase).  Again from the style doc:
>>>
>>>>   • Other method names are verb phrases, as if commands to the 
>>>> receiver.
>>
>> I did not know that, thanks for pointing it out to me. I changed
>>
>> get_scaled_compile_threshold -> scaled_compiled_threshold
>> get_scaled_freq_log -> scaled_freq_log.
>>
>>> Since you are providing overloads of the scaling functions, the 
>>> header file should either contain inline code for the
>>> convenience methods, or else document how the optional argument 
>>> ('scale') defaults.  I'd prefer inline code, since it
>>> is simple.  It's as much text to document with a comment as just to 
>>> supply the inline overload.
>>
>> I inlined the convenience methods into the header file.
>>
>> Here is the new webrev: 
>> http://cr.openjdk.java.net/~zmajo/8059606/webrev.03/
>>
>>>
>>> As I said before, nice work!
>>
>> Thank you!
>>
>> Best regards,
>>
>>
>> Zoltan
>>
>>>
>>> — John
>>