8226721: Missing intrinsics for Math.ceil, floor, rint

Mon Sep 9 12:31:45 UTC 2019

Hi Jatin and Doug,

Could we also add a stub routine and call it from both the interpreter
and C1?
I experimented with something like the code below for the fp min/max
intrinsics, and it showed a large gain in both the standard and
reduction scenarios.

In 'stubGenerator_x86_64.cpp':

address generate_libmMinD() {
  StubCodeMark mark(this, "StubRoutines", "libmMinD");

  address start = __ pc();

  const XMMRegister a = xmm0;
  const XMMRegister b = xmm1;
  const XMMRegister atmp = xmm2;
  const XMMRegister btmp = xmm3;
  const XMMRegister tmp = xmm4;
  const XMMRegister dst = xmm0;

  __ enter(); // required for proper stackwalking of RuntimeStub frame

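  int vector_len = Assembler::AVX_128bit;
  // Order the operands by a's sign bit: vminsd returns its second source
  // when both inputs are zero or either is NaN, so this picks -0.0 over +0.0.
  __ blendvpd(atmp, a, b, a, vector_len);
  __ blendvpd(btmp, b, a, a, vector_len);
  __ vminsd(tmp, atmp, btmp);
  // If atmp is NaN (unordered with itself), select it instead of tmp.
  __ cmppd(btmp, atmp, atmp, Assembler::_false, vector_len);
  __ blendvpd(dst, tmp, atmp, btmp, vector_len);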

  __ leave(); // required for proper stackwalking of RuntimeStub frame
  __ ret(0);

  return start;
}
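
For anyone who wants to check the corner cases without building HotSpot, here is a rough scalar model of the five instructions in the stub body. It is only a sketch: min_d_model, minsd_model and check_examples are made-up names, and the model assumes the usual x86 behaviour that MINSD returns its second source unless the first is strictly smaller, and that the blend selects on the sign bit of the mask operand.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Assumed MINSD behaviour: the second source is returned unless
// src1 < src2, which covers equal zeros and NaN inputs.
static double minsd_model(double src1, double src2) {
  return src1 < src2 ? src1 : src2;
}

// Hypothetical scalar model of the stub body, one line per instruction.
double min_d_model(double a, double b) {
  bool a_neg = std::signbit(a);             // mask: sign bit of a
  double atmp = a_neg ? b : a;              // blendvpd(atmp, a, b, a)
  double btmp = a_neg ? a : b;              // blendvpd(btmp, b, a, a)
  double tmp = minsd_model(atmp, btmp);     // vminsd(tmp, atmp, btmp)
  bool atmp_is_nan = std::isnan(atmp);      // cmppd(btmp, atmp, atmp, unordered)
  return atmp_is_nan ? atmp : tmp;          // blendvpd(dst, tmp, atmp, btmp)
}

// A few of the cases Math.min has to get right.
void check_examples() {
  const double nan = std::numeric_limits<double>::quiet_NaN();
  assert(min_d_model(1.0, 2.0) == 1.0);
  assert(min_d_model(-1.0, -2.0) == -2.0);
  assert(std::signbit(min_d_model(0.0, -0.0)));   // -0.0 wins over +0.0
  assert(std::signbit(min_d_model(-0.0, 0.0)));
  assert(std::isnan(min_d_model(nan, 1.0)));      // NaN propagates
  assert(std::isnan(min_d_model(-1.0, nan)));
}
```

Calling check_examples() from a small main() exercises the signed-zero and NaN cases, which are the ones a plain minsd would get wrong.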

In 'templateInterpreterGenerator_x86_64.cpp':

address TemplateInterpreterGenerator::generate_minD_entry(AbstractInterpreter::MethodKind kind) {
  address entry = __ pc();

  __ movdbl(xmm0, Address(rsp, wordSize));
  __ movdbl(xmm1, Address(rsp, 3 * wordSize));

  __ call(RuntimeAddress(CAST_FROM_FN_PTR(address, StubRoutines::minD())));

  __ pop(rax);
  __ mov(rsp, r13);
  __ jmp(rax);

  return entry;
}
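
The two movdbl offsets can be pictured with a small model of the stack on entry. This is a sketch under assumptions, not taken from the patch: it assumes the return address sits at rsp, that each double argument occupies two word-sized slots with its value in the lower-addressed one, and that the last argument pushed is nearest rsp. make_entry_stack, load_double, load_min_args and check_layout are hypothetical helpers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr int kWordSize = 8;  // wordSize on x86_64

using Stack = std::array<std::uint64_t, 5>;

// Assumed layout on entry to min(a, b):
//   slot 0: return address          <-- rsp
//   slot 1: b, slot 2: b's second slot (unused)
//   slot 3: a, slot 4: a's second slot (unused)
Stack make_entry_stack(double a, double b) {
  Stack slots{};
  slots[0] = 0xdeadbeef;  // placeholder return address
  std::memcpy(&slots[1], &b, sizeof b);
  std::memcpy(&slots[3], &a, sizeof a);
  return slots;
}

// Model of movdbl(xmm, Address(rsp, offset)).
double load_double(const Stack& stack, int offset) {
  double d;
  std::memcpy(&d, &stack[offset / kWordSize], sizeof d);
  return d;
}

struct MinArgs { double xmm0; double xmm1; };

// Mirrors the two loads in generate_minD_entry.
MinArgs load_min_args(const Stack& stack) {
  return { load_double(stack, kWordSize), load_double(stack, 3 * kWordSize) };
}

void check_layout() {
  MinArgs r = load_min_args(make_entry_stack(1.5, -2.5));
  assert(r.xmm0 == -2.5 && r.xmm1 == 1.5);
}
```

If the layout assumption holds, xmm0 ends up with the second Java argument and xmm1 with the first. That should not matter for min, since the result does not depend on operand order, but it would for a non-commutative operation.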

And in 'c1_LIRGenerator_x86.cpp':

void LIRGenerator::do_MinDIntrinsic(Intrinsic* x) {
  LIRItem value(x->argument_at(0), this);
  value.set_destroys_register();

  LIR_Opr calc_result = rlock_result(x);
  LIR_Opr result_reg = result_register_for(x->type());

  CallingConvention* cc = NULL;

  LIRItem value1(x->argument_at(1), this);
  value1.set_destroys_register();

  BasicTypeList signature(2);
  signature.append(T_DOUBLE);
  signature.append(T_DOUBLE);
  cc = frame_map()->c_calling_convention(&signature);
  value.load_item_force(cc->at(0));
  value1.load_item_force(cc->at(1));

  __ call_runtime_leaf(StubRoutines::minD(), getThreadTemp(),
                       result_reg, cc->args());

  __ move(result_reg, calc_result);
}

If the comments are encouraging, I'll probably post an RFR for something
like this soon, along with Graal support for fp min/max, unless someone
else is already working on it.

Thanks,
Bernard

On Wed, 4 Sep 2019 at 14:18, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
>
> Hi Doug,
>
> Thanks for sharing the link.
> As suggested, I will open a follow-up issue for Graal support for these intrinsics and work on it.
>
> Regards,
> Jatin
>
> > -----Original Message-----
> > From: Doug Simon <doug.simon at oracle.com>
> > Sent: Tuesday, September 3, 2019 5:31 PM
> > To: Bhateja, Jatin <jatin.bhateja at intel.com>
> > Cc: hotspot-compiler-dev at openjdk.java.net
> > Subject: Re: 8226721: Missing intrinsics for Math.ceil, floor, rint
> >
> > Hi Jatin,
> >
> > It would be great to see these intrinsics applied to Graal as well, either in this
> > issue or a follow up issue.
> >
> > As an example of how to do this, you can look at
> > https://github.com/oracle/graal/pull/1171
> >
> > -Doug
> >
> > > On 3 Sep 2019, at 11:41, Bhateja, Jatin <jatin.bhateja at intel.com> wrote:
> > >
> > > Hi All,
> > >
> > > Please find a patch with the following changes:
> > > 1) Intrinsification for Math.ceil/floor/rint.
> > > 2) Auto-vectorizer handling.
> > >
> > > JBS: https://bugs.openjdk.java.net/browse/JDK-8226721
> > > Webrev: http://cr.openjdk.java.net/~jbhateja/8226721/webrev.05
> > >
> > > Kindly review it.
> > >
> > > Best Regards,
> > > Jatin
>