RemoveNeverExecutedCode
Yudi Zheng
yudi.zheng at usi.ch
Fri Aug 22 19:05:00 UTC 2014
I had the same problem before.
The threshold is defined in share/vm/interpreter/invocationCounter.cpp#InvocationCounter::reinitialize
> InterpreterProfileLimit = ((CompileThreshold * InterpreterProfilePercentage) / 100) << number_of_noncount_bits;
where InterpreterProfilePercentage is 33 by default.
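As a rough illustration of the arithmetic in that formula, here is a minimal Java sketch. The class and method names are hypothetical, the CompileThreshold values are only example inputs, and the shift width of 3 is an assumption made for the example; the real number_of_noncount_bits is defined by InvocationCounter in the HotSpot sources.

```java
final class InterpreterProfileLimitSketch {
    // Assumption for illustration only: the real value of
    // number_of_noncount_bits comes from InvocationCounter in HotSpot.
    static final int ASSUMED_NONCOUNT_BITS = 3;

    // Mirrors the quoted formula: the percentage of the compile threshold,
    // shifted left past the non-count bits of the counter word.
    static int profileLimit(int compileThreshold, int profilePercentage, int noncountBits) {
        return ((compileThreshold * profilePercentage) / 100) << noncountBits;
    }

    // The same limit expressed as a plain count (before the shift).
    static int countBeforeProfiling(int compileThreshold, int profilePercentage) {
        return (compileThreshold * profilePercentage) / 100;
    }

    public static void main(String[] args) {
        int percentage = 33; // InterpreterProfilePercentage default quoted above
        for (int threshold : new int[] {1500, 10000}) { // example inputs only
            System.out.println("CompileThreshold=" + threshold
                    + " count=" + countBeforeProfiling(threshold, percentage)
                    + " shiftedLimit=" + profileLimit(threshold, percentage, ASSUMED_NONCOUNT_BITS));
        }
    }
}
```

With these example inputs the plain counts come out as 495 and 3300, so the point at which profiling starts depends directly on the CompileThreshold in effect.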
On X86_64, it is used in cpu/x86/vm/templateInterpreter_x86_64.cpp#InterpreterGenerator::generate_counter_incr
> if (ProfileInterpreter && profile_method != NULL) {
>   // Test to see if we should create a method data oop
>   __ cmp32(rcx, ExternalAddress((address)&InvocationCounter::InterpreterProfileLimit));
>   __ jcc(Assembler::less, *profile_method_continue);
>
>   // if no method data exists, go to profile_method
>   __ test_method_data_pointer(rax, *profile_method);
> }
Initially, the invocation counter is stored in the MethodCounters*.
If profiling is enabled, the interpreter creates a MethodData once the threshold is reached, so branches executed before that point are not recorded in the profile.
> class Method : public Metadata {
>  friend class VMStructs;
>  private:
>   ConstMethod*      _constMethod;      // Method read-only data.
>   MethodData*       _method_data;
>   MethodCounters*   _method_counters;
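
To see why the mismatching elements can vanish from the profile, here is a toy Java model of a branch profile that only starts recording after a warm-up count. It is not HotSpot code: the class, its methods, and the warm-up value of 512 (taken from Tom Deneau's observation quoted below) are assumptions made for illustration.

```java
final class DelayedProfileSketch {
    /** Counts kept for one branch once recording has started. */
    static final class BranchProfile {
        long executionCount;
        long mismatchCount;

        // Fraction of recorded executions in which the elements were equal.
        double equalProbability() {
            return executionCount == 0
                    ? Double.NaN
                    : (double) (executionCount - mismatchCount) / executionCount;
        }
    }

    // Compares the arrays element by element, but records nothing for the
    // first `warmup` executions, mimicking a profile that is created late.
    static BranchProfile run(double[] a1, double[] a2, int warmup) {
        BranchProfile profile = new BranchProfile();
        for (int n = 0; n < a1.length; n++) {
            boolean mismatch = a1[n] != a2[n];
            if (n < warmup) {
                continue; // executed, but not yet profiled
            }
            profile.executionCount++;
            if (mismatch) {
                profile.mismatchCount++;
            }
        }
        return profile;
    }

    // Builds an all-zero array with `count` non-zero elements starting at `first`.
    static double[] withMismatches(int length, int first, int count) {
        double[] a = new double[length];
        for (int i = first; i < first + count; i++) {
            a[i] = 1.0;
        }
        return a;
    }

    public static void main(String[] args) {
        double[] a1 = new double[10000];
        BranchProfile early = run(a1, withMismatches(10000, 0, 10), 512);
        BranchProfile late = run(a1, withMismatches(10000, 600, 10), 512);
        System.out.println("early: count=" + early.executionCount
                + " mismatches=" + early.mismatchCount + " p=" + early.equalProbability());
        System.out.println("late:  count=" + late.executionCount
                + " mismatches=" + late.mismatchCount + " p=" + late.equalProbability());
    }
}
```

In this model, mismatches placed in the first 10 elements leave a recorded count of 9488 with no mismatches, while the same mismatches placed at index 600 are all recorded.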
Yudi
On 22 Aug 2014, at 19:44, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:
>
> On Aug 22, 2014, at 9:10 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
>
>> Doug --
>>
>> Yes, for the bytecode in question this tells me that the branchProbability is 1.0 but I was trying to understand why.
>>
>> For instance, 10000 elements in array, the first 10 should all not take the branch, but I see...
>>
>> executionCount at 13: 9488; branchProbability at 13: 1.000000; exceptionSeen at 13: FALSE;
>>
>> It looks like maybe the profiling only kicks in after the first 512 times a bytecode is executed (since executionCount above is 9488)? Also if I move the location of the differing elements to past 512, I see that it gets recorded. Is there some logic like that in the hotspot profiling interpreter?
>
> Profiling only kicks in after a certain number of invocations. It’s usually 33% of the first tier compile threshold, though I think it works differently in tiered. I’m having trouble finding the exact logic. Anyway, it’s a profile so there’s no guarantee about what’s in it. For unreachable code it will eventually come to a proper stable state about what has been reached which is all we normally care about.
>
> For GPU you might want a more conservative notion of never executed, possibly taking into account the maturity of the branch itself or maybe trusting those counts is just a bad idea for GPU. Is that what you’re trying to figure out?
>
> tom
>
>>
>> -- Tom
>>
>>
>>
>> -----Original Message-----
>> From: Doug Simon [mailto:doug.simon at oracle.com]
>> Sent: Friday, August 22, 2014 7:39 AM
>> To: Deneau, Tom
>> Cc: Tom Rodriguez; graal-dev at openjdk.java.net
>> Subject: Re: RemoveNeverExecutedCode
>>
>> You should get all the info you need with ResolvedJavaMethod.getProfilingInfo().toString().
>>
>> On Aug 22, 2014, at 2:34 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>
>>> I don't know the details of how the profile is collected but...
>>> In my case with 2000000 elements and 1 different, in the profile, the branchTakenProbability is showing up as 1.0.
>>> For other combinations of elements and differs I saw this:
>>> Elems     Differs   BTP
>>> 1000000   1000      < 1.0
>>> 1000000   10        1.0
>>> 100000    10        1.0
>>> 10000     10        1.0
>>> 1000      10        < 1.0
>>>
>>> -- Tom
>>>
>>>
>>> -----Original Message-----
>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com]
>>> Sent: Thursday, August 21, 2014 7:12 PM
>>> To: Deneau, Tom
>>> Cc: graal-dev at openjdk.java.net
>>> Subject: Re: RemoveNeverExecutedCode
>>>
>>>
>>> On Aug 21, 2014, at 4:39 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>>
>>>> I am trying to benchmark an HSAIL kernel that is implementing the following lambda
>>>> n -> {
>>>>     if (a1[n] != a2[n]) {
>>>>         isEqual = false;
>>>>     }
>>>> }
>>>>
>>>> where a1 and a2 are arrays of doubles. So it is basically doing an Arrays.equals.
>>>>
>>>> I set up the test to have 2,000,000 elements in the arrays and one of them does not match.
>>>> Before I compile for HSAIL, I enable some profiling by running the above lambda on the cpu in non-parallel mode (IntStream.range().forEach()) so the lambda gets executed 2000000 times (and takes the false branch once).
>>>>
>>>> But even though one of the elements does not match, when I compile thru graal with the default -G:+RemoveNeverExecutedCode, I see that the isEqual = false path has been considered "not executed" and has been removed.
>>>>
>>>> Is taking a branch once out of 2,000,000 times considered "not executed"?
>>>> Or is there some other flaw here?
>>>
>>> Remember this means never executed according to the profile, so if it doesn't show up in the profile then it didn't happen. Did the profile include the taken branch?
>>>
>>> tom
>>>
>>>>
>>>> Of course I can work around this in this case by forcing -G:-RemoveNeverExecutedCode but I'd like to understand this.
>>>> This is running in -server mode.
>>>>
>>>> -- Tom
>>>>
>>>
>>
>