RemoveNeverExecutedCode

Deneau, Tom tom.deneau at amd.com
Fri Aug 22 20:01:53 UTC 2014


Thanks, Yudi.

Although if CompileThreshold is 10000, I still don't see how this would explain my case, where the profiled execution count was 512 less than the actual execution count.

-- Tom

-----Original Message-----
From: Yudi Zheng [mailto:yudi.zheng at usi.ch] 
Sent: Friday, August 22, 2014 2:05 PM
To: Tom Rodriguez
Cc: Deneau, Tom; graal-dev at openjdk.java.net
Subject: Re: RemoveNeverExecutedCode 

I had the same problem before.

The threshold is defined in share/vm/interpreter/invocationCounter.cpp#InvocationCounter::reinitialize

> InterpreterProfileLimit = ((CompileThreshold * InterpreterProfilePercentage) / 100) << number_of_noncount_bits;

where InterpreterProfilePercentage is 33 by default.
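
As a rough worked example of that formula, the sketch below plugs in CompileThreshold = 10000 (the value mentioned elsewhere in this thread) and InterpreterProfilePercentage = 33. The class and method names are made up for illustration, and the value 3 for number_of_noncount_bits is an assumption (2 state bits plus 1 carry bit), not something stated in this thread.

```java
// Illustrative arithmetic only; this is not HotSpot code.
class InterpreterProfileLimitSketch {
    // Assumption: 2 state bits + 1 carry bit sit below the count in the counter word.
    static final int NUMBER_OF_NONCOUNT_BITS = 3;

    // Mirrors the quoted formula: the limit is stored pre-shifted so it can be
    // compared directly against the raw counter word.
    static int rawLimit(int compileThreshold, int interpreterProfilePercentage) {
        return ((compileThreshold * interpreterProfilePercentage) / 100)
                << NUMBER_OF_NONCOUNT_BITS;
    }

    // The same limit expressed as a plain number of counted events.
    static int limitInCounts(int compileThreshold, int interpreterProfilePercentage) {
        return rawLimit(compileThreshold, interpreterProfilePercentage)
                >> NUMBER_OF_NONCOUNT_BITS;
    }

    public static void main(String[] args) {
        System.out.println(rawLimit(10000, 33));      // prints 26400
        System.out.println(limitInCounts(10000, 33)); // prints 3300
    }
}
```

Under these assumptions the limit corresponds to 3300 counted events, which does not match a gap of 512, so this formula alone does not seem to account for the numbers reported in this thread.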

On X86_64, it is used in cpu/x86/vm/templateInterpreter_x86_64.cpp#InterpreterGenerator::generate_counter_incr

>     if (ProfileInterpreter && profile_method != NULL) {
>       // Test to see if we should create a method data oop
>       __ cmp32(rcx, ExternalAddress((address)&InvocationCounter::InterpreterProfileLimit));
>       __ jcc(Assembler::less, *profile_method_continue);
> 
>       // if no method data exists, go to profile_method
>       __ test_method_data_pointer(rax, *profile_method);
>     }


Initially, the invocation counter is stored in the MethodCounters*.
If profiling is enabled, the interpreter creates a MethodData once the threshold is met.

> class Method : public Metadata {
>  friend class VMStructs;
>  private:
>   ConstMethod*      _constMethod;                // Method read-only data.
>   MethodData*       _method_data;
>   MethodCounters*   _method_counters;
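
The effect of creating the profile late can be illustrated with a toy model. This is only a sketch: it is not how the interpreter is implemented, the class and parameter names are hypothetical, and the warm-up value of 512 is simply the gap observed in this thread, not a documented constant. The model ignores every execution that happens before the profile exists and counts the rest.

```java
// Toy model: a profile that only starts recording after `warmup` executions.
class LateProfilingSketch {
    // Returns {recordedExecutions, recordedMismatches}.
    static long[] simulate(int elems, int differs, int firstDifferIndex, int warmup) {
        long recorded = 0;
        long mismatches = 0;
        for (int n = 0; n < elems; n++) {
            // Elements in [firstDifferIndex, firstDifferIndex + differs) do not match.
            boolean mismatch = n >= firstDifferIndex && n < firstDifferIndex + differs;
            if (n >= warmup) { // the profile does not exist before this point
                recorded++;
                if (mismatch) {
                    mismatches++;
                }
            }
        }
        return new long[] {recorded, mismatches};
    }

    public static void main(String[] args) {
        long[] early = simulate(10000, 10, 0, 512);
        System.out.println(early[0] + " " + early[1]); // prints 9488 0
        long[] late = simulate(10000, 10, 600, 512);
        System.out.println(late[0] + " " + late[1]);   // prints 9488 10
    }
}
```

With the mismatching elements in the first 10 positions the model records 9488 executions and no mismatch, so the mismatch path looks never executed. Moving them past index 512 makes all 10 show up.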



Yudi


On 22 Aug 2014, at 19:44, Tom Rodriguez <tom.rodriguez at oracle.com> wrote:

> 
> On Aug 22, 2014, at 9:10 AM, Deneau, Tom <tom.deneau at amd.com> wrote:
> 
>> Doug --
>> 
>> Yes, for the bytecode in question this tells me that the branchProbability is 1.0 but I was trying to understand why.
>> 
>> For instance, 10000 elements in array, the first 10 should all not take the branch, but I see...
>> 
>> executionCount at 13: 9488; branchProbability at 13: 1.000000; exceptionSeen at 13: FALSE;
>> 
>> It looks like maybe the profiling only kicks in after the first 512 times a bytecode is executed (since executionCount above is 9488)?   Also if I move the location of the differing elements to past 512, I see that it gets recorded. Is there some logic like that in the hotspot profiling interpreter?
> 
> Profiling only kicks in after a certain number of invocations.  It's usually 33% of the first tier compile threshold, though I think it works differently in tiered.  I'm having trouble finding the exact logic.  Anyway, it's a profile so there's no guarantee about what's in it.  For unreachable code it will eventually come to a proper stable state about what has been reached which is all we normally care about.
> 
> For GPU you might want a more conservative notion of never executed, possibly taking into account the maturity of the branch itself or maybe trusting those counts is just a bad idea for GPU.  Is that what you're trying to figure out?
> 
> tom
> 
>> 
>> -- Tom
>> 
>> 
>> 
>> -----Original Message-----
>> From: Doug Simon [mailto:doug.simon at oracle.com] 
>> Sent: Friday, August 22, 2014 7:39 AM
>> To: Deneau, Tom
>> Cc: Tom Rodriguez; graal-dev at openjdk.java.net
>> Subject: Re: RemoveNeverExecutedCode
>> 
>> You should get all the info you need with ResolvedJavaMethod.getProfilingInfo().toString().
>> 
>> On Aug 22, 2014, at 2:34 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>> 
>>> I don't know the details of how the profile is collected but...
>>> In my case with 2000000 elements and 1 different, in the profile, the branchTakenProbability is showing up as 1.0.
>>> For other combinations of elements and differs I saw this:
>>>   Elems    Differs      BTP 
>>>  1000000    1000        < 1.0
>>>  1000000      10        1.0
>>>   100000      10        1.0
>>>    10000      10        1.0
>>>     1000      10        < 1.0
>>> 
>>> -- Tom
>>> 
>>> 
>>> -----Original Message-----
>>> From: Tom Rodriguez [mailto:tom.rodriguez at oracle.com] 
>>> Sent: Thursday, August 21, 2014 7:12 PM
>>> To: Deneau, Tom
>>> Cc: graal-dev at openjdk.java.net
>>> Subject: Re: RemoveNeverExecutedCode
>>> 
>>> 
>>> On Aug 21, 2014, at 4:39 PM, Deneau, Tom <tom.deneau at amd.com> wrote:
>>> 
>>>> I am trying to benchmark an HSAIL kernel that is implementing the following lambda
>>>>  n -> {
>>>>    if (a1[n] != a2[n]) {
>>>>      isEqual = false;
>>>>    }
>>>>  }
>>>> 
>>>> where a1 and a2 are arrays of doubles.   So it is basically doing an Arrays.equals.
>>>> 
>>>> I set up the test to have 2,000,000 elements in the arrays and one of them does not match.
>>>> Before I compile for HSAIL, I enable some profiling by running the above lambda on the cpu in non-parallel mode (IntStream.range().forEach()) so the lambda gets executed 2000000 times (and takes the false branch once).
>>>> 
>>>> But even though one of the elements does not match, when I compile through Graal with the default -G:+RemoveNeverExecutedCode, I see that the isEqual = false path has been considered "not executed" and has been removed.
>>>> 
>>>> Is taking a branch once out of 2,000,000 times considered "not executed"?
>>>> Or is there some other flaw here?
>>> 
>>> Remember this means never executed according to the profile, so if it doesn't show up in the profile then it didn't happen.  Did the profile include the taken branch?
>>> 
>>> tom
>>> 
>>>> 
>>>> Of course I can work around this  in this case by forcing -G:-RemoveNeverExecutedCode but I'd like to understand this.
>>>> This is running in -server mode.
>>>> 
>>>> -- Tom
>>>> 
>>> 
>> 
> 


