Vectorized BitSet operations

Vitaly Davidovich vitalyd at gmail.com
Fri Nov 14 19:39:34 UTC 2014


I asked about vector math support on compiler dev a while ago, and was told
it's on the radar (there's a linked jira in what Kris posted) but seems
like it's not being worked on.  Right now, I believe only some copy
operations are vectorized, but not arithmetic or logical stuff.  Would be
cool to have autovectorization for those things.

Sent from my phone
On Nov 14, 2014 2:34 PM, "Krystal Mok" <rednaxelafx at gmail.com> wrote:

> Hi Jerven,
>
> (I thought I was reading email from hotspot-compiler-dev and then realized
> it's graal-dev...but here I go anyway)
>
> If you're interested in taking HotSpot C2's auto vectorization for a
> reference, here's the RFE that implemented the vectorization feature:
> JDK-6340864: Implement vectorization optimizations in hotspot-server
> [1][2]. It's mostly still based on the superword algorithm described in
> [3].
>
> I don't think Graal implements that algorithm now. It certainly would be
> useful to have it.
>
> Graal does use vector (SIMD) instructions for some intrinsics, e.g.
> ArrayEqualsOp and UnsafeArrayCopySnippet. If you want to improve
> performance for specific methods, this is the easier way to go, with good
> existing examples.
>
> - Kris
>
> [1]: https://bugs.openjdk.java.net/browse/JDK-6340864
> [2]: http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/006050192a5a
> [3]: Exploiting superword level parallelism with multimedia instruction
> sets, http://dl.acm.org/citation.cfm?id=358438.349320
>
> On Fri, Nov 14, 2014 at 8:38 AM, Jerven Bolleman <me at jerven.eu> wrote:
>
> > Hi All,
> >
> > My apologies in advance if I am a rather clueless.
> >
> > I am interested in seeing the use of SSE/AVX instructions use for some
> > methods in java.util.Bitset. e.g. the "or" and "and" methods.
> >
> > For example I believe that this loop
> > <
> >
> http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/BitSet.java#681
> > >
> > can be vectorized.
> >
> > My first thought was that an intrinsic/MethodSubstitution would be a good
> > start.
> > However, it would be even nicer if this type of code can be auto
> > vectorized.
> > I was wondering if anyone has worked on this or if this is already the
> > case?
> >
> > Because working auto vectorization could accelerate this loop
> > <
> >
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.lucene/lucene-core/3.0.1/org/apache/lucene/util/OpenBitSet.java#661
> > >
> > used in Lucene
> > a lot. While the intrinsic has a smaller use case.
> >
> > In my specific case a bitset of 20mb or larger is not uncommon. So in
> these
> > cases doing more work per tick is interesting.
> >
> > If I wanted to have a go at implementing the intrinsic are there any
> > pitfalls I need to be aware of?
> >
> > Regards,
> > Jerven
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > --
> > Jerven Bolleman
> > me at jerven.eu
> >
>


More information about the graal-dev mailing list