Intel AMX and feature detection

Andrii Lomakin andrii0lomakin at gmail.com
Sat Jun 22 04:49:59 UTC 2024


Hi Paul.
Thank you very much. That could be useful.

On Fri, 21 Jun 2024, 23:53 Paul Sandoz, <paul.sandoz at oracle.com> wrote:

>
>
> > On Jun 20, 2024, at 11:17 PM, Andrii Lomakin <andrii0lomakin at gmail.com>
> > wrote:
> >
> > Hi Paul.
> >
> > Thank you for all your help.
> >
> > I will raise this topic again once float16 support has landed. You will
> > probably have new ideas about the details of the implementation by
> > then.
> > I remember we already discussed the possibility of implementing
> > special vector shapes in another thread.
> >
> > I am afraid that fine-grained foreign function calls will kill all
> > performance benefits.
>
> Yes, it might in its current form, even when linking to “critical”
> functions. I mostly mention it as a way to more quickly experiment with AMX
> and Java.
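>
> For reference, here is a minimal sketch of such a critical downcall with
> the FFM API; the library name "amxkernels" and the native function
> amx_dot_i8 are hypothetical:
>
>     import java.lang.foreign.*;
>     import java.lang.invoke.MethodHandle;
>
>     class AmxDowncall {
>         // Bind a hypothetical native AMX kernel. Linker.Option.critical
>         // skips the usual thread-state transition; critical(true) also
>         // lets the callee access heap memory without copying.
>         static final MethodHandle DOT;
>         static {
>             Linker linker = Linker.nativeLinker();
>             SymbolLookup lookup =
>                 SymbolLookup.libraryLookup("amxkernels", Arena.global());
>             DOT = linker.downcallHandle(
>                 lookup.find("amx_dot_i8").orElseThrow(),
>                 FunctionDescriptor.of(ValueLayout.JAVA_INT,
>                     ValueLayout.ADDRESS, ValueLayout.ADDRESS,
>                     ValueLayout.JAVA_INT),
>                 Linker.Option.critical(true));
>         }
>     }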
>
> Our first experiments leveraging vector hardware instructions in Java used
> a technique we called code snippets, where we could bind a method handle to
> some x86 code via a calling convention (see this presentation [1] from
> slide 37). It was effective for experimenting, and did not have the
> VM/native transition costs currently associated with Panama. Something of
> that nature might allow users to link to more specialized instructions,
> much as if it were a compiler intrinsic (a downside is that the code
> snippet is opaque to C2).
>
> Paul.
>
> [1]
> https://cr.openjdk.org/~psandoz/conferences/2015-JavaOne/j1-2015-unsafe-CON7076.pdf
>
> >
> > On Thu, Jun 20, 2024 at 8:55 PM Paul Sandoz <paul.sandoz at oracle.com>
> > wrote:
> >>
> >> Hi Andrii,
> >>
> >> We have thought about AMX a little bit, but nothing concrete has
> >> emerged so far. It may be that we can lean on special vector shapes
> >> (e.g. viewed linearly, with a max size of 1024 bytes), where vectors of
> >> such shapes would correspond to tile registers that can be used with
> >> the limited set of operators supported in the hardware, e.g. DOT. I
> >> believe the element types supported are int8 and bfloat16 (float16 on
> >> the newest hardware), and the Vector API would need to be extended for
> >> that (we are investigating float16). One challenge might be managing
> >> the register file, which I believe is programmable; it may require some
> >> sort of scoped execution to configure/release.
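> >>
> >> To illustrate the scoped idea, a purely hypothetical sketch (none of
> >> these types exist today):
> >>
> >>     // Hypothetical API: configure the AMX tile file for the scope of
> >>     // a computation and release it on exit, try-with-resources style.
> >>     try (TileScope tiles = TileScope.configure(16, 64)) { // rows, bytes/row
> >>         TileVector a = tiles.load(left, 0);
> >>         TileVector b = tiles.load(right, 0);
> >>         TileVector c = a.dot(b);     // would map to a TDPBSSD-like op
> >>         c.intoArray(result, 0);
> >>     } // tile configuration released (cf. TILERELEASE) on scope exit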
> >>
> >> As an interim experiment, it may be possible to leverage Panama and
> >> native methods using the AMX intrinsics.
> >>
> >> No current plans to support a feature-detection API. On architectures
> >> that don't support explicit mask registers and mask-register-accepting
> >> instructions, we emulate masking using vector registers and blend
> >> instructions, as you indicate.
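> >>
> >> For example, with the current incubating API (jdk.incubator.vector,
> >> run with --add-modules jdk.incubator.vector), a masked tail and an
> >> explicit blend can compile to mask registers on AVX-512 and are
> >> emulated with vector-register blends elsewhere. A minimal sketch:
> >>
> >>     import jdk.incubator.vector.*;
> >>
> >>     class BlendExample {
> >>         static final VectorSpecies<Float> S =
> >>             FloatVector.SPECIES_PREFERRED;
> >>
> >>         // r[i] = a[i] > 0 ? a[i] : b[i], with a masked tail loop.
> >>         static void select(float[] a, float[] b, float[] r) {
> >>             int i = 0;
> >>             for (; i < S.loopBound(a.length); i += S.length()) {
> >>                 FloatVector va = FloatVector.fromArray(S, a, i);
> >>                 FloatVector vb = FloatVector.fromArray(S, b, i);
> >>                 // Lanes where the mask is set are taken from va.
> >>                 vb.blend(va, va.compare(VectorOperators.GT, 0f))
> >>                   .intoArray(r, i);
> >>             }
> >>             VectorMask<Float> tail = S.indexInRange(i, a.length);
> >>             FloatVector va = FloatVector.fromArray(S, a, i, tail);
> >>             FloatVector vb = FloatVector.fromArray(S, b, i, tail);
> >>             vb.blend(va, va.compare(VectorOperators.GT, 0f))
> >>               .intoArray(r, i, tail);
> >>         }
> >>     }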
> >>
> >> Paul.
> >>
> >>> On Jun 16, 2024, at 9:26 PM, Andrii Lomakin <andrii0lomakin at gmail.com>
> >>> wrote:
> >>>
> >>> Hi guys.
> >>>
> >>> I have three questions:
> >>>
> >>> 1. Do you plan to add support for Intel AMX instructions? According
> >>> to Intel's reports, they can provide a 2-3x speedup in deep-learning
> >>> model inference.
> >>> 2. The next question follows from the first one. Even now, masks are
> >>> not supported on every architecture, but AFAIK there is no way to
> >>> detect at runtime whether they are supported. Do you plan to provide a
> >>> so-called "feature detection" API?
> >>> 3. And the last question: even older instruction sets include some
> >>> instructions that use register values as masks (blending, for
> >>> example). Will those be supported on architectures that do not have
> >>> mask registers per se?
> >>
>
>