Some New Vector API Code + Code Snippets

Mon Jun 6 21:00:01 UTC 2016

All,

We've been working more on the Vector API concrete classes, restructuring them to better support specialization (à la Stream) and to enable loads from and stores to arrays of primitive types.  Additionally, we've been working on better structuring our Code Snippets and adding modRM and SIB encoding support for memory accesses.  Vladimir Ivanov has been incredibly helpful in getting us to where we are now!

Where before we structured our Vector inheritance as Vector > Concrete Instance (Element x Shape/Size), we now go Vector > Elemental Superclass > Sized Concrete Class.  For example, for our 256-bit float implementation, the structure is Vector > FloatVector > Float256Vector.  Methods supporting reads from and writes to float[] arrays appear in FloatVector as abstract methods and are implemented fully in Float256Vector.  Right now the elemental classes are pretty slim; most of the methods still reside in the Vector class.

In the case of streams, it seems that the superclass is the lightweight one and the intermediate specialized classes are the heavyweight ones (BaseStream vs. Stream, IntStream, etc.).  It might make sense to slim down the Vector superclass and bulk out the specialized classes here so that we can elide most or all of the boxed primitives from the design.  There is definitely more design work to be done on how these classes are structured, and I would love to get some opinions on that.
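To make the three-level layout concrete, here is a minimal sketch of what Vector > FloatVector > Float256Vector could look like.  The class names come from the description above, but the method names (fromArray, intoArray, add) and the scalar float[] backing store are assumptions for illustration only; they are not taken from the webrev, where the concrete class would be backed by vector registers rather than an array.

```java
// Hypothetical sketch of the hierarchy; not the code in the webrev.
abstract class Vector<E> {
    // Number of lanes (elements) held by this vector shape.
    abstract int length();
}

// Elemental superclass: declares the primitive-array load/store methods
// so callers never have to go through boxed Float values.
abstract class FloatVector extends Vector<Float> {
    abstract FloatVector fromArray(float[] src, int offset); // load
    abstract void intoArray(float[] dst, int offset);        // store
    abstract FloatVector add(FloatVector other);
}

// Sized concrete class: 256 bits of float lanes.
final class Float256Vector extends FloatVector {
    static final int LANES = 256 / Float.SIZE; // 8 lanes of 32 bits each

    // Scalar stand-in for the real register-backed payload (assumption).
    private final float[] lanes = new float[LANES];

    @Override
    int length() {
        return LANES;
    }

    @Override
    FloatVector fromArray(float[] src, int offset) {
        Float256Vector v = new Float256Vector();
        System.arraycopy(src, offset, v.lanes, 0, LANES);
        return v;
    }

    @Override
    void intoArray(float[] dst, int offset) {
        System.arraycopy(lanes, 0, dst, offset, LANES);
    }

    @Override
    FloatVector add(FloatVector other) {
        // Read the other operand through the abstract store method so this
        // works against any FloatVector implementation with 8 lanes.
        float[] tmp = new float[LANES];
        other.intoArray(tmp, 0);
        Float256Vector r = new Float256Vector();
        for (int i = 0; i < LANES; i++) {
            r.lanes[i] = lanes[i] + tmp[i];
        }
        return r;
    }
}
```

In this sketch the only thing left in Vector is the lane count, which is roughly the "lightweight superclass" arrangement that BaseStream uses; everything that touches float values lives in the elemental class or below.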

I've included some tests in the webrev that drive the classes as they currently stand.  Vladimir suggested that we use the double-register approach (Object + long offset) for invoking the vmovups/vmovdqu instructions.  The tests work for small examples, but there seem to be persistent VM-crashing bugs that show up when the code gets inlined by C2.  I haven't been able to identify the cause yet, but I do know that the crashes don't occur if you disable C2.  The examples include commented-out loops that you can uncomment to reproduce the behavior.

One thing we've been particularly interested in is the memory overhead of this API.  In a discussion we had with John, Paul, and Mikael a month or so ago, one idea that came up was that we could lean pretty heavily on escape analysis to head off the object allocation issue.  I've been able to do some quick sampling of the heap using jmap to get some histograms, and it seems that Vector API objects make little, if any, appearance on the heap.  That is to say, escape analysis seems to be working on that part of the stack.  The escape analysis pass does seem to have trouble reasoning about the Long2/4/8 data types when they appear in loops, however; in other cases it works.  We've noticed significant memory usage when putting Vector operations in tight loops like one might find in a libblas sgemm implementation.  A cursory analysis suggests that a lot of Long2/4/8 objects are making it to the heap that don't need to be there.  All things considered, though, escape analysis is working very well!
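For anyone who wants to poke at the loop case in isolation, the pattern in question looks roughly like the sketch below.  The Long4 class here is a hypothetical stand-in written for this example, not the actual Long4 type used by the code in the webrev, and whether C2 scalar-replaces the temporaries in any given run is exactly the open question; the sketch only shows the shape of the code, with a fresh temporary allocated on every iteration and an accumulator carried around the loop.

```java
// Hypothetical stand-in for a 256-bit payload holder (not the Panama Long4).
final class Long4 {
    final long l0, l1, l2, l3;

    Long4(long l0, long l1, long l2, long l3) {
        this.l0 = l0;
        this.l1 = l1;
        this.l2 = l2;
        this.l3 = l3;
    }

    // Lane-wise add; returns a new object, as an immutable value would.
    Long4 add(Long4 o) {
        return new Long4(l0 + o.l0, l1 + o.l1, l2 + o.l2, l3 + o.l3);
    }
}

final class EscapeDemo {
    // Sums the array four lanes at a time. Any trailing elements that do
    // not fill a whole group of four are ignored in this sketch.
    static long reduce(long[] data) {
        Long4 acc = new Long4(0, 0, 0, 0);   // loop-carried value
        for (int i = 0; i + 4 <= data.length; i += 4) {
            // Short-lived temporary: never stored to a field or returned,
            // so in principle it does not need a heap allocation.
            Long4 v = new Long4(data[i], data[i + 1], data[i + 2], data[i + 3]);
            acc = acc.add(v);
        }
        return acc.l0 + acc.l1 + acc.l2 + acc.l3;
    }
}
```

Running a long-lived driver around reduce and comparing `jmap -histo <pid>` output with and without -XX:-DoEscapeAnalysis is one way to see how many of these objects actually reach the heap.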

The higher-order function (HOF) components of the Vector API are still implemented in scalar form in this code.  Right now I'm focused on getting the concrete methods working well before putting more cycles into the higher-order bits.

Let me know your thoughts.

The webrev is located here:  http://cr.openjdk.java.net/~vdeshpande/Panama_Collaboration/webrev.01/

Thanks,
Ian