Adding an intrinsic to the interpreter

Paul Sandoz paul.sandoz at oracle.com
Wed Sep 16 09:36:57 UTC 2015


Hi,

Here is a quick and dirty patch:

  http://cr.openjdk.java.net/~psandoz/tmp/interpreter-unsafe-getLong-intrinsic/webrev/

It wires up getLong and also getLongUnaligned (if unaligned access is supported) to an intrinsic. It seems to work, but I am not entirely sure I did things correctly regarding the generating method.

Benchmark with results is here:

  http://cr.openjdk.java.net/~psandoz/tmp/interpreter-unsafe-getLong-intrinsic/LongAccess.java

When the intrinsic is enabled the costs are reduced.

Here are some benchmark results run against the lexico patch comparing array equals:

# VM options: -XX:+UnlockDiagnosticVMOptions -XX:+UseUnsafeInterpreterIntrinsics -Xint
Benchmark              (lastNEQ)   (n)  Mode  Cnt      Score      Error  Units
LongArray.base_equals       true     1  avgt   10    203.886 ±    3.971  ns/op
LongArray.base_equals       true  1024  avgt   10  15695.233 ±  204.514  ns/op
LongArray.jdk_equals        true     1  avgt   10    860.569 ±   14.528  ns/op
LongArray.jdk_equals        true  1024  avgt   10  65302.751 ± 1129.216  ns/op
ByteArray.base_equals       true     1  avgt   10    210.963 ±    2.743  ns/op
ByteArray.base_equals       true  1024  avgt   10  14883.093 ±  387.772  ns/op
ByteArray.jdk_equals        true     1  avgt   10    277.830 ±    5.126  ns/op
ByteArray.jdk_equals        true  1024  avgt   10   8935.940 ±  121.070  ns/op

# VM options: -XX:-UnlockDiagnosticVMOptions -XX:-UseUnsafeInterpreterIntrinsics -Xint
Benchmark              (lastNEQ)   (n)  Mode  Cnt       Score       Error  Units
LongArray.base_equals       true     1  avgt   10     212.514 ±    23.749  ns/op
LongArray.base_equals       true  1024  avgt   10   16191.692 ±   717.162  ns/op
LongArray.jdk_equals        true     1  avgt   10    1057.496 ±   102.620  ns/op
LongArray.jdk_equals        true  1024  avgt   10  355476.908 ± 12577.777  ns/op
ByteArray.base_equals       true     1  avgt   10     200.575 ±     3.199  ns/op
ByteArray.base_equals       true  1024  avgt   10   14907.001 ±   297.510  ns/op
ByteArray.jdk_equals        true     1  avgt   10     270.780 ±     2.692  ns/op
ByteArray.jdk_equals        true  1024  avgt   10   44466.436 ±   623.087  ns/op

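With the intrinsic enabled the cost of jdk_equals is reduced (for long arrays at n=1024, from about 355,000 ns/op to about 65,000 ns/op), and in the case of bytes jdk_equals beats base_equals once the array length gets large enough (about 8,900 ns/op versus about 14,900 ns/op at n=1024).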
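
One way to read the byte-array numbers: a comparison that loads eight bytes per getLong call makes roughly n/8 calls rather than n element loads, so the fixed per-call overhead in the interpreter is amortized as n grows. The sketch below models that word-at-a-time comparison in plain C++. It is an illustration only: the function name is hypothetical and this is not the code from the lexico patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical illustration, not code from the patch: compare two byte
// ranges eight bytes at a time, then finish the tail byte by byte.
bool equals_by_words(const unsigned char* a, const unsigned char* b, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        std::uint64_t wa, wb;
        // memcpy is the portable way to express an unaligned 8-byte load.
        std::memcpy(&wa, a + i, sizeof wa);
        std::memcpy(&wb, b + i, sizeof wb);
        if (wa != wb) return false;
    }
    // Fewer than eight bytes remain; compare them individually.
    for (; i < n; ++i) {
        if (a[i] != b[i]) return false;
    }
    return true;
}
```

For n=1 the loop over words never runs, which fits the observation that the wide load only pays off once the arrays are long enough.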

I am not sure we can really make more improvements to reduce the cost per element without going further up the stack, and that defeats the purpose of not pushing specialisations down into the VM. And I suspect that, given the high cost of making invocations in the interpreter, such differences are likely to be less of a concern in real-world cases where C1/C2 kick in.

My conclusion is we have a potential tweak we can use if necessary.


On 15 Sep 2015, at 06:42, John Rose <john.r.rose at oracle.com> wrote:

> On Sep 14, 2015, at 11:35 AM, Paul Sandoz <paul.sandoz at oracle.com> wrote:
>> 
>> Thanks. Those patches provide a useful guide of changes required. If I take the plunge I would prefer to tackle getLong, as a quicker hack, rather than vectorizedMismatch in terms of the generated machine code.
> 
> I agree this is worth a first try.  Try to intrinsify the smaller bits first.
> The interpreter has math intrinsics (AbstractInterpreter::java_lang_math_sqrt / vmIntrinsics::_dsqrt)
> The enum in AbsInterp predates the vmIntrinsics enum, and there is duplication between them.
> 

Thanks, yes I see the mapping.


> If we add new special cases to AbstractInterpreter, they might just vector through the Method::_intrinsic_id slot to a leaf C function.
> (Perhaps different distinct leaf-function signatures get distinct MethodKind values.)
> The extra indirections (via a function pointer table indexed by intrinsic_id) are noise in the interpreter.
> The existing hardwired math functions could (in principle) be treated this way, as an additional cleanup.
> 

I am not really following all of that. How can we wire up to a C function rather than generating machine code?
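
As a rough illustration of the table-driven idea in the quoted text (dispatching through the intrinsic id to a leaf C function instead of generating a machine-code entry per intrinsic), here is a self-contained sketch. Every name in it (IntrinsicId, LongLeafFn, kLeafTable, MethodModel, try_intrinsic) is made up for the example and none of it is HotSpot code. How a real interpreter entry would marshal arguments and call out to such a function is not shown.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// All names below are hypothetical stand-ins, not HotSpot declarations.
enum IntrinsicId { kNone = 0, kGetLong, kGetLongUnaligned, kIntrinsicCount };

// One leaf-function signature. A different signature would need its own
// table (the analogue of a distinct MethodKind in the quoted suggestion).
using LongLeafFn = std::int64_t (*)(const void* base, std::size_t offset);

// Leaf function: read eight bytes at base + offset.
static std::int64_t leaf_get_long(const void* base, std::size_t offset) {
    std::int64_t v;
    std::memcpy(&v, static_cast<const unsigned char*>(base) + offset, sizeof v);
    return v;
}

// Table indexed by intrinsic id; a null entry means "no leaf function".
static const LongLeafFn kLeafTable[kIntrinsicCount] = {
    nullptr,        // kNone
    leaf_get_long,  // kGetLong
    leaf_get_long,  // kGetLongUnaligned (memcpy covers both cases here)
};

struct MethodModel {
    IntrinsicId intrinsic_id;  // models the Method::_intrinsic_id slot
};

// Returns true and stores the result if the method has a leaf function;
// returns false so the caller can fall back to the normal entry.
bool try_intrinsic(const MethodModel& m, const void* base,
                   std::size_t offset, std::int64_t* out) {
    LongLeafFn fn = kLeafTable[m.intrinsic_id];
    if (fn == nullptr) return false;
    *out = fn(base, offset);
    return true;
}
```

The only per-intrinsic work in this model is adding a table entry; the extra indirection is a single indexed load and an indirect call.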

Paul.

