Vector API and dot product
Chris Hegarty
chegar999 at gmail.com
Mon Dec 12 16:27:44 UTC 2022
Hi Adam,
Thanks for your reply, Adam. Comments inline...
On 12/12/2022 15:43, Adam Pocock wrote:
> I agree that this kind of primitive would be useful,
Great to hear that you would also find it useful.
> but I’m unsure how it would be specified especially if it’s in java.lang.Math.
I already regret making that concrete suggestion. ;-) Let's not get hung
up on the possible location. I'm sure we can find somewhere more
appropriate, if needed.
Your point regarding accumulation ordering and slight variations in
results is spot on.
> There are a few different ways we could implement such a dot product (e.g. is fma the building block or mul + add as separate operations, and the order of accumulation/reduction), and I’ve already been bitten by methods in java.lang.Math producing slightly different outputs on different platforms resulting in different ML computations. I worry that specifying the dot product to the same level as java.lang.Math would prevent future optimizations, but also if it’s not specified that much then I’ll get different answers on different JVMs which also seems problematic (and those answers would be more different than any of the current methods in java.lang.Math, as differences would accumulate as a function of array length).
My intuition is that the spec should not constrain potential
optimizations resulting from accumulation order. And this should be
explicit and clear. I'm sure we can find appropriate specification
wording for this, and that it would not affect the usefulness of the method.
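Here is a small plain-Java sketch of the fma versus separate multiply-and-add point (class and method names are illustrative, not an existing API). `dotMulAdd` rounds each product to float before adding it, as Java's `*` and `+` require. `dotFma` uses `Math.fma`, which rounds only once per step. Both are reasonable implementations of "dot product", yet for general inputs their results can differ in the last bits, and the gap can grow with the vector length.

```java
import java.util.Random;

// Illustrative only: two equally valid ways to accumulate a float dot product.
class DotOrdering {

    // Separate multiply and add: each product is rounded to float, then the
    // sum is rounded again after every addition.
    static float dotMulAdd(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Fused multiply-add: a[i] * b[i] + sum is computed with a single rounding.
    static float dotFma(float[] a, float[] b) {
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        int k = 768; // an arbitrary embedding dimension for the demo
        float[] a = new float[k];
        float[] b = new float[k];
        for (int i = 0; i < k; i++) {
            a[i] = rnd.nextFloat() * 2f - 1f;
            b[i] = rnd.nextFloat() * 2f - 1f;
        }
        // The two results are close, but need not be bit-for-bit identical.
        System.out.println("mul+add: " + dotMulAdd(a, b));
        System.out.println("fma:     " + dotFma(a, b));
    }
}
```

For small integer-valued inputs both agree exactly (for example {1, 2, 3} · {4, 5, 6} = 32 either way), which is one reason such differences tend to go unnoticed in simple tests.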
> For the ANN use case as you’ve already given up on the exact nature of the solution I guess it doesn’t matter so much, but it may be unexpected if a platform with a 256-bit vector width returned slightly different search results from one with a 512-bit vector width, or if a Vector API equipped version produced a different answer to a version without that.
Yes, for ANN the exact result is less important.
Again, my intuition here is that this should be explicit in the API, and
should follow precedent already set in the platform, e.g.
DoubleStream::sum [1], and DoubleAdder::sum [2].
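The 256-bit versus 512-bit concern can be simulated without the Vector API: a 256-bit register holds 8 floats and a 512-bit register holds 16, so a vectorized loop effectively keeps that many independent partial sums. The sketch below uses hypothetical names and shows just one of several possible lane-reduction orders; it illustrates why the same inputs may give slightly different results on the two widths, which is exactly the kind of variation an unspecified accumulation order (as with DoubleStream::sum) permits.

```java
import java.util.Random;

// Illustrative simulation of SIMD-style accumulation; not how any particular
// JIT or the Vector API is guaranteed to order its operations.
class LaneWidth {

    // Keeps `lanes` partial sums: lane j accumulates elements j, j + lanes,
    // j + 2 * lanes, ... (8 float lanes ~ 256-bit, 16 float lanes ~ 512-bit).
    static float stridedDot(float[] a, float[] b, int lanes) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        float[] acc = new float[lanes];
        int bound = a.length - (a.length % lanes);
        int i = 0;
        for (; i < bound; i += lanes) {
            for (int j = 0; j < lanes; j++) {
                acc[j] += a[i + j] * b[i + j];
            }
        }
        // Horizontal reduction, here simply left to right across the lanes.
        float sum = 0f;
        for (float v : acc) {
            sum += v;
        }
        // Scalar tail for the elements that did not fill a whole "register".
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        Random rnd = new Random(1);
        int k = 768;
        float[] a = new float[k];
        float[] b = new float[k];
        for (int i = 0; i < k; i++) {
            a[i] = rnd.nextFloat() * 2f - 1f;
            b[i] = rnd.nextFloat() * 2f - 1f;
        }
        System.out.println(" 8 lanes (256-bit): " + stridedDot(a, b, 8));
        System.out.println("16 lanes (512-bit): " + stridedDot(a, b, 16));
    }
}
```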
-Chris.
> Thanks,
>
> Adam
[1] https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/stream/DoubleStream.html#sum()
[2] https://docs.oracle.com/en/java/javase/19/docs/api/java.base/java/util/concurrent/atomic/DoubleAdder.html#sum()
> --
> Adam Pocock
> Principal Member of Technical Staff
> Machine Learning Research Group
> Oracle Labs, Burlington, MA
>
>> On 12 Dec 2022, at 09:33, Chris Hegarty <chegar999 at gmail.com> wrote:
>>
>> Hi,
>>
>> I'm sure that this must have come up before, but I cannot find it, so I'm asking here.
>>
>> Over in Elasticsearch and Lucene we've been experimenting a little with the incubating Vector API, for use-cases relating to Approximate Nearest Neighbour vector search. We see an approximately 4x performance improvement in some experiments, which is amazing. Thank you. By far, our most significant use-case is the dot product of two k-dimensional vectors (float/double arrays).
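For context, a hedged sketch of where the dot product sits in a vector search: an exact, brute-force scan that scores every stored vector against a query. Approximate Nearest Neighbour structures score far fewer candidates, but each comparison is still a similarity computation such as a dot product, so that inner loop is what benefits from vectorization. This is plain scalar Java and the names are illustrative.

```java
// Illustrative exact (not approximate) maximum-inner-product search.
class BruteForceSearch {

    // Scalar dot product of two k-dimensional vectors.
    static float dot(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Returns the index of the stored vector with the largest dot product
    // with the query, or -1 if there are no vectors.
    static int argMaxDot(float[][] vectors, float[] query) {
        int best = -1;
        float bestScore = Float.NEGATIVE_INFINITY;
        for (int v = 0; v < vectors.length; v++) {
            float score = dot(vectors[v], query); // the hot inner loop
            if (score > bestScore) {
                bestScore = score;
                best = v;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        float[][] vectors = {
            {1f, 0f, 0f},
            {0f, 1f, 0f},
            {0.6f, 0.8f, 0f},
        };
        float[] query = {0.5f, 0.9f, 0f};
        System.out.println(argMaxDot(vectors, query));
    }
}
```

For the query {0.5, 0.9, 0} the scores are roughly 0.5, 0.9 and 1.02, so index 2 is returned.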
>>
>> The Vector API is low-level and incredibly flexible, and will be a great addition to the Platform when it eventually moves out of incubation. But, since dot product is an extremely common operation it could be helpful to provide a "convenience" for it out-of-the-box, say Math.dot(float[], float[]) [*], etc.
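A minimal sketch of what such a convenience could look like, assuming a hypothetical holder class `DotProducts` (not an existing JDK class; the location is deliberately left open). The Javadoc follows the spirit of DoubleStream::sum by leaving the accumulation order, and the use of fused multiply-add, unspecified. The body is only a scalar placeholder that a later implementation could swap for a faster kernel without changing the contract.

```java
/**
 * Hypothetical home for the proposed convenience; the class name is
 * illustrative only.
 */
final class DotProducts {
    private DotProducts() {}

    /**
     * Returns the dot product of two vectors of equal length.
     *
     * <p>The order in which products are accumulated, and whether fused
     * multiply-add is used, is intentionally not specified, so results may
     * differ slightly between platforms and implementations.
     *
     * @throws NullPointerException if either array is null
     * @throws IllegalArgumentException if the arrays differ in length
     */
    public static float dot(float[] a, float[] b) {
        checkLengths(a.length, b.length);
        float sum = 0f;
        for (int i = 0; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum); // placeholder: sequential fma
        }
        return sum;
    }

    /** Double-precision overload with the same (relaxed) contract. */
    public static double dot(double[] a, double[] b) {
        checkLengths(a.length, b.length);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum = Math.fma(a[i], b[i], sum);
        }
        return sum;
    }

    private static void checkLengths(int la, int lb) {
        if (la != lb) {
            throw new IllegalArgumentException("length mismatch: " + la + " != " + lb);
        }
    }
}
```

Usage would be as simple as `float score = DotProducts.dot(queryVector, docVector);`, with the relaxed Javadoc making clear that callers should not rely on bit-exact results.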
>>
>> Ultimately, I believe that the Vector API is the right approach to supporting the rich set of linear algebra and machine learning algorithms, but there could be a sweet-spot for supporting a minimal set of extremely common functions too. I am explicitly not proposing that the JDK should become a linear algebra library! ;-)
>>
>> Additionally, `Math::dot` could be added to the platform now, no need to wait, or preview or incubate. The implementation could use already existing internal primitives (without a dependency on the Vector API), and eventually become a "simple convenience", when the Vector API is finalized.
>>
>> I believe that this approach remains largely in line with the goals of the project (as I understand them), but just reframes the potential to deliver small improvements in this area incrementally.
>>
>> Yes, but who's going to do the work? If the idea is positively received, then I am happy to drive this non-trivial contribution.
>>
>> -Chris.
>>
>> [*] or some other location where it can be easily found and imported
>
More information about the panama-dev mailing list