[PATCH] 8217561 : X86: Add floating-point Math.min/max intrinsics, approval request

Vladimir Kozlov vladimir.kozlov at oracle.com
Tue Feb 26 20:50:02 UTC 2019


Hi Bernard,

Unfortunately HotSpot does not collect profiling information per call site.
Currently calee's data is collected from all call sites. For example, you can have 2 calls with same *swapped* arguments 
and you will get your balance == 0.
I would assume if you have multiply call sites in application the profiling data will be "flat" - balance value will be 
near 0.

We did experiment few years ago to collect data per call site but we got negative result - no significant improvement in 
code quality in cost of a lot more memory used for MethodData.
But your case (and cmove) is special since MDO is small for these methods. The only downside is to implement that you 
will need a lot more complex changes in HotSpot.

https://bugs.openjdk.java.net/browse/JDK-8015416

Note, your code is correct in a sense that it use the same profiling data as it currently used for inlining these 
methods. So I don't want to reject changes from the start.

The only simple solution I can think of is to use intrinsic only for hottest call site. Compare invocation_count with 
caller's call site count to find if it is most used site which contributed the most profiling data.

Thanks,
Vladimir

On 2/26/19 3:16 AM, B. Blaser wrote:
> On Mon, 25 Feb 2019 at 19:42, Andrew Haley <aph at redhat.com> wrote:
>>
>> On 2/25/19 4:09 PM, B. Blaser wrote:
>>> Here it is:
>>>
>>> http://cr.openjdk.java.net/~bsrbnd/jdk8217561/webrev.05/
>>>
>>> I've just added the interesting use-case of inserting elements in a search tree.
>>> The insertion is absolutely unpredictable with random elements whereas
>>> fully predictable with sorted elements.
>>>
>>> Benchmarking this example is hard as it'd compute the average
>>> insertion time in a search tree which highly depends on its
>>> topography.
>>> That's why I decided to add it to the test-case doing some manual
>>> measurement using 'System.nanoTime()'.
>>>
>>> This experiment revealed at no surprise that the intrinsic is really
>>> worth being used only in the non-predictable case which the heuristic
>>> is for.
>>>
>>> Any Reviewer feedback and hopefully approval would be welcome
>>> (hotspot:tier1 is still OK on x86_64 xeon)!
>>
>> 6612 #ifdef AMD64
>>
>> Why? It's the same for every out-of-order processor with speculation.
>>
>> And the code *really* needs a comment to say what you're doing.
> 
> Thanks for your comments, I've updated the patch accordingly here under.
> 
> As it affects shared code, should I already push it to jdk/submit?
> Is the following procedure up-to-date?
> 
> https://wiki.openjdk.java.net/display/Build/Submit+Repo
> 
> Thanks,
> Bernard
> 
> 
> diff --git a/src/hotspot/share/opto/library_call.cpp
> b/src/hotspot/share/opto/library_call.cpp
> --- a/src/hotspot/share/opto/library_call.cpp
> +++ b/src/hotspot/share/opto/library_call.cpp
> @@ -6609,6 +6609,31 @@
> 
>   //------------------------------inline_fp_min_max------------------------------
>   bool LibraryCallKit::inline_fp_min_max(vmIntrinsics::ID id) {
> +  // The intrinsic should be used only when the API branches aren't
> predictable,
> +  // the last one performing the most important comparison. The
> following heuristic
> +  // uses the branch statistics to eventually bail out if necessary.
> +
> +  ciMethod *c = callee();
> +  ciMethodData *d = c->method_data();
> +
> +  int taken = 0, not_taken = 0;
> +  int invocations = d->invocation_count();
> +
> +  if (invocations > 0) {
> +    for (ciProfileData *p = d->first_data(); d->is_valid(p); p =
> d->next_data(p)) {
> +      if (p->is_BranchData()) {
> +        taken = ((ciBranchData*)p)->taken();
> +        not_taken = ((ciBranchData*)p)->not_taken();
> +      }
> +    }
> +
> +    double balance = (((double)taken) - ((double)not_taken)) /
> ((double)invocations);
> +    balance = balance < 0 ? -balance : balance;
> +    if ( balance > 0.2 ) {
> +      return false;
> +    }
> +  }
> +
>     Node *a = NULL;
>     Node *b = NULL;
>     Node *n = NULL;
> 


More information about the hotspot-compiler-dev mailing list