SuperWord optimization
John Rose
John.Rose at Sun.COM
Mon Jan 5 12:57:31 PST 2009
I'd love to see that webrev. (You've got some great back-burner
stuff, Tom!)
An alternative to adding pointer alignment logic (in the compiler and
also the hand-tuned assembly code) is allocating all arrays above
some fixed length (e.g., 10 words) with an additional alignment
constraint.
This has the advantage of making any pair of large-enough arrays
mutually aligned, if their access indexes are aligned. The compiler
and assembly stubs already have special paths for very short arrays;
these cut-outs could be adapted to take into account the strong
alignment size. Or we could just unswitch the whole loop (or
predicate it with an uncommon trap, given the right length profile
data).
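
To make the mutual-alignment point concrete, here is a rough sketch of
the pre-loop arithmetic. It is illustrative only, not HotSpot code: the
class name, the peelCount helper, the 16-byte array header, and the
sample addresses are all assumptions made up for the example.

```java
// Hypothetical sketch: how many scalar pre-loop iterations are needed
// before an array access reaches a 128-bit (16-byte) boundary.
final class AlignmentSketch {
    static final int VECTOR_BYTES = 16; // 128-bit vector width

    // base:        address of the array object (assumed element-aligned)
    // headerBytes: assumed size of the array header before element 0
    // elemBytes:   element size in bytes (4 for int)
    // index:       first index touched by the loop
    static int peelCount(long base, int headerBytes, int elemBytes, int index) {
        long addr = base + headerBytes + (long) index * elemBytes;
        long misalign = addr % VECTOR_BYTES;
        // Bytes to the next vector boundary, converted to whole elements.
        return (int) (((VECTOR_BYTES - misalign) % VECTOR_BYTES) / elemBytes);
    }

    public static void main(String[] args) {
        // Heap aligned to 64 bits only: the two bases differ mod 16,
        // so the same start index needs different pre-loop counts (0 and 2).
        System.out.println(peelCount(0x1000L, 16, 4, 0));
        System.out.println(peelCount(0x2008L, 16, 4, 0));

        // Both arrays strongly aligned: any common start index gives the
        // same pre-loop count for both (3 and 3 here).
        System.out.println(peelCount(0x1000L, 16, 4, 1));
        System.out.println(peelCount(0x2000L, 16, 4, 1));
    }
}
```

With only 64-bit heap alignment, no single pre-loop count aligns both
arrays in the first case, which is why a pointer alignment calculation
would be needed. With strongly aligned arrays, the count depends only
on the index, so index alignment is enough.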
-- John
P.S. In general, if there's an optimization that makes sense mainly
for large arrays, the JVM has the option of allocating large arrays
with special tactics (alignment, chunking, multianewarray layout,
cache line coloring, CPU affinity for work-stealing, etc., etc.).
This option is not yet exercised, except in the simple case of slow-
pathing truly huge arrays, bigger than FastAllocateSizeLimit. This
class of optimizations gets more valuable as CPU-to-memory distances
increase, but it is still waiting for the right PhD student to come
along.
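
As a minimal sketch of the alignment tactic, a bump-pointer allocator
could choose the alignment from the requested size. Everything named
here is hypothetical (AlignedBumpAllocator, the constants, the 10-word
threshold taken from the example above); a real allocator would also
have to fill the padding gap so the heap stays parsable, and element 0
is only aligned if the array header size cooperates.

```java
// Hypothetical sketch: size-dependent alignment in a bump-pointer allocator.
final class AlignedBumpAllocator {
    static final int WORD_BYTES = 8;              // 64-bit heap word
    static final int STRONG_ALIGN_BYTES = 16;     // 128-bit boundary
    static final int STRONG_ALIGN_MIN_WORDS = 10; // assumed threshold

    private long top; // next free address

    AlignedBumpAllocator(long start) {
        this.top = start;
    }

    // Round addr up to a multiple of alignment (must be a power of two).
    static long alignUp(long addr, int alignment) {
        return (addr + alignment - 1) & -(long) alignment;
    }

    // Returns the address of a new block of sizeWords heap words.
    // Blocks above the threshold get the stronger alignment.
    long allocate(int sizeWords) {
        int alignment = sizeWords > STRONG_ALIGN_MIN_WORDS
                ? STRONG_ALIGN_BYTES
                : WORD_BYTES;
        long result = alignUp(top, alignment);
        // Any padding between the old top and result is simply skipped
        // here; filler objects are omitted from this sketch.
        top = result + (long) sizeWords * WORD_BYTES;
        return result;
    }
}
```

Short arrays keep the existing word alignment and waste no space, while
every array over the threshold starts on a 16-byte boundary, at a cost
of at most one padding word per large allocation.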
On Jan 5, 2009, at 11:39 AM, Tom Rodriguez wrote:
> I have a workspace that combines Ross's initial work supporting
> arbitrary ALU operations for vectorization with some work I did for
> adding a new RegQ type for 128-bit vector operations. It was
> limping along when I stopped looking at it but I could post a
> webrev of it if there is interest. It's not something I'm working
> on but I'd wanted to bring the parts forward so they didn't get
> completely lost. All the code generation pieces were working and
> the main remaining piece is fixing the loop alignment code to
> support aligning to 128-bit boundaries. The heap is only aligned
> to 64 bits so the alignment code that superword uses needs to
> switch to a pointer alignment calculation instead of using index
> alignment.
>
> Once we add 128-bit vectors we'll need some policy work to choose
> vector sizes based on the code we see. Currently the code assumes
> there is only one vector register size.