How to add a new intrinsic

Mon Nov 10 01:01:33 UTC 2014

Hi Jaromir,

Here are the slides from a presentation I gave on HotSpot intrinsics:
http://www.slideshare.net/RednaxelaFX/green-teajug-hotspotintrinsics02232013
Hope it helps.

That said, back at Taobao we implemented intrinsics for the pause
instruction in HotSpot. In our experience, it's actually not a good
entry-level task for someone new to the HotSpot code base, because of the
instruction's semantics.

In my opinion, a good intrinsic for the "pause" instruction should be
implemented as follows:

1. Don't use the "pause" name for the intrinsic. It doesn't convey the
intrinsic's purpose. Gil suggested something like
"sun.misc.Unsafe.spinLoopHint()", which I think is much better than
"pause()". I'll refer to this intrinsic as "spinLoopHint()" in the rest of
this email.

2. The interpreter version of this intrinsic should actually be implemented
with the EmptyMethod intrinsic, instead of an actual x86 pause instruction.
The interpreter contains a dispatch loop itself (in the HotSpot template
interpreter it's token-threaded / indirect-threaded, but logically it's
still a dispatch loop), whereas the loop that you want to affect is a
Java-level loop, which is one level of abstraction away from the
interpreter. A pause instruction in the interpreter would sit on a
different level from the Java-level loop.

The EmptyMethod intrinsic (-XX:+UseFastEmptyMethods) has already been
removed from the current version of the HotSpot VM, because it interferes
with the tiered compilation system: it has no logic to update the method
invocation counter. But you can probably revive that code to implement the
pause intrinsic in the interpreter.

3. The C1 and C2 versions. These should only treat the "spinLoopHint()"
call as a hint. Instead of generating an explicit "PauseNode" in place of
this call, you should probably mark the hint on a LoopNode (or in the case
of C1, mark it on the basic block with the backedge). Then emit the x86
pause instruction at the backedge only if the loop is not a CountedLoop,
on the assumption that a spin loop shouldn't look like a counted loop.
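
To make the points above a bit more concrete, here is a minimal sketch of
the kind of Java-level spin loop the hint is meant for. Everything in it is
an assumption for illustration: spinLoopHint() is a hypothetical empty
static method standing in for the proposed
"sun.misc.Unsafe.spinLoopHint()", and the class and method names are made
up. Note that the loop has no fixed trip count, so it would not look like a
counted loop to the compiler.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLoopHintDemo {
    // Hypothetical stand-in for the proposed sun.misc.Unsafe.spinLoopHint().
    // Here it is just an empty method; a JIT that knows the intrinsic would
    // treat the call as a hint about the enclosing loop.
    static void spinLoopHint() { }

    // Spins until the flag becomes true; returns how many times it spun.
    static long spinUntilSet(AtomicBoolean flag) {
        long spins = 0;
        while (!flag.get()) {   // exit depends on another thread, not a counter
            spinLoopHint();
            spins++;
        }
        return spins;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean ready = new AtomicBoolean(false);
        Thread setter = new Thread(() -> ready.set(true));
        setter.start();
        long spins = spinUntilSet(ready);   // returns once the setter has run
        setter.join();
        System.out.println("flag observed after " + spins + " spins");
    }
}
```

The number of spins printed varies from run to run, since it depends on
thread scheduling.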

Treating the call as a loop-level hint is very different from the way other
intrinsics are implemented, say, String.equals() or
Unsafe.compareAndSwapInt(), where you can simply emit the intrinsic's code
unconditionally in place of the call.
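
As a rough illustration of that difference, here is a toy model in Java.
It is not HotSpot code: the Loop class, recordSpinLoopHint() and
emitBackedge() are hypothetical names invented for this sketch, and the
"instructions" are just strings. It only shows the shape of the idea:
seeing the call records a flag on the enclosing loop and emits nothing in
place, and the pause is emitted later at the backedge, and only for a loop
that is not counted.

```java
import java.util.ArrayList;
import java.util.List;

public class SpinHintLoweringSketch {
    // Toy stand-in for a compiler's loop node; not a HotSpot class.
    static final class Loop {
        final String name;
        final boolean counted;   // true if the trip count is known up front
        boolean spinHint;        // set when the body called spinLoopHint()

        Loop(String name, boolean counted) {
            this.name = name;
            this.counted = counted;
        }
    }

    // Seeing the call: mark the enclosing loop, emit nothing in place.
    static void recordSpinLoopHint(Loop enclosing) {
        enclosing.spinHint = true;
    }

    // Emitting the backedge: add "pause" only for hinted, non-counted loops.
    static List<String> emitBackedge(Loop loop) {
        List<String> code = new ArrayList<>();
        if (loop.spinHint && !loop.counted) {
            code.add("pause");
        }
        code.add("jmp " + loop.name + "_head");
        return code;
    }

    public static void main(String[] args) {
        Loop spin = new Loop("spin", false);
        Loop counted = new Loop("counted", true);
        recordSpinLoopHint(spin);
        recordSpinLoopHint(counted);
        System.out.println(emitBackedge(spin));     // [pause, jmp spin_head]
        System.out.println(emitBackedge(counted));  // [jmp counted_head]
    }
}
```

In this model the hint on the counted loop is simply dropped, which matches
the assumption that a spin loop shouldn't look like a counted loop.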

Just my two cents.

- Kris

On Sun, Nov 9, 2014 at 12:38 PM, Jaromir Hamala <jaromir.hamala at gmail.com>
wrote:

> Hi Aleksey,
>
> thanks again for your feedback & help! I'm working on C1 right now and your
> examples made it way easier for me. I'll sort out the OCA thing.
>
> Cheers,
> Jaromir
>
>
>
> On Sun, Nov 9, 2014 at 7:48 PM, Aleksey Shipilev <
> aleksey.shipilev at oracle.com> wrote:
>
> > Hi Jaromir,
> >
> > On 11/09/2014 09:53 PM, Jaromir Hamala wrote:
> > > I do not have ambitions to include it in OpenJDK,
> >
> > Why not? Please follow the step 0 from here:
> > http://openjdk.java.net/contribute/ -- submit the OCA.
> >
> >
> > > but I greatly appreciate any feedback / help. I guess the next step
> > > for me is to include C1 and eventually C2 support - again - any
> > > pointers are very highly appreciated!
> >
> > This may be the shortest example of a simple Unsafe intrinsic handled in
> > the interpreter, C1 and C2 (notice how much shorter the interpreter code
> > is, since we "just" use the native methods as interpreter handlers):
> >   http://hg.openjdk.java.net/jdk8/jdk8/jdk/rev/ad6097d547e1
> >   http://hg.openjdk.java.net/jdk8/jdk8/hotspot/rev/1e41b0bc58a0
> >
> >
> > IIRC, for C1, you would need to:
> >
> >  1) Handle the intrinsic in LIRGenerator::do_Intrinsic
> > (c1_LIRGenerator.cpp), it should add the nodes to C1 IR, see e.g.
> > membar_acquire(). You will have to create a new LirOp, with 0 arguments,
> > in LIR_Code enum, say, "lir_pause".
> >
> >  2) Lower the lir_pause to machine code in LIR_Assembler::emit_op0 (see
> > c1_LIRAssembler.cpp). There should be a call to macro-assembler defining
> > the PAUSE instruction.
> >
> >
> > For C2, you would need to:
> >
> >  1) Add the intrinsic definitions and intrinsic code into
> > library_call.cpp/hpp. It would be easier to follow the code for some
> > already-existing simple intrinsic, see e.g. inline_unsafe_prefetch. Your
> > code should emit a new, special-named IR node, say, PauseNode.
> >
> >  2) Add the matching rule for PauseNode into the architecture
> > description file (x86_32.ad or x86_64.ad). This file matches the IR node
> > to the concrete machine code to emit. It is usually macroed into an
> > assembler call. E.g. PrefetchRead node with mem argument is matched to
> > Assembler::prefetchr in assembler_x86.cpp. There, the exact machine code
> > is emitted.
> >
> >
> > I think this is enough to make a working example.
> >
> > -Aleksey.
> >
> >
>
>
> --
> “Perfection is achieved, not when there is nothing more to add, but when
> there is nothing left to take away.”
> Antoine de Saint Exupéry
>