<div dir="ltr">I am looking forward to intrinsic support for 128-bit math using "Long2" and XMM (or even YMM and ZMM) instructions. <div>I hope this is the best way forward.</div><div><br></div><div>Personally, I would like to see a long long type, or even uint128, uint256 and uint512-style notation.</div><div><br></div><div>Another option might be something like long<128>, an annotation such as @uint128 long, or even @decimal128 double, but who knows.</div><div><br></div><div>Regards, Peter.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 25 September 2017 at 18:48, Andrew Haley <span dir="ltr"><<a href="mailto:aph@redhat.com" target="_blank">aph@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 25/09/17 18:21, Adam Petcher wrote:<br>

> I agree that an unsigned multiplyHigh would be useful for crypto<br>

> purposes, and we should consider adding it. Of course, I would much<br>

> rather have multiply operations that return both 64-bit parts of the<br>

> result, but that is going to be hard to do well without value types. So<br>

> it would be nice to have something like this in the meantime.<br>

<br>

</span>I take your point, but it won't be excruciatingly difficult for the C2<br>

compiler to combine the two multiply operations into a single instruction<br>

if the CPU provides one.  From what I've seen recently, though, on non-x86 it's<br>

common for the two halves of the result to be calculated by separate<br>

instructions.<br>
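As a rough illustration of the two-operation pattern described above, the sketch below forms the full signed 128-bit product of two longs from an ordinary multiply (the low half) and Math.multiplyHigh (the high half, available since Java 9). The WideMul and Int128 names are hypothetical, chosen only for this example. Whether C2 actually fuses the two operations into one widening multiply depends on the JIT and the target CPU.

```java
// Sketch only: WideMul and Int128 are hypothetical names, not JDK API.
// Requires Java 9+ for Math.multiplyHigh(long, long).
final class WideMul {

    /** Hypothetical holder for a 128-bit two's-complement value. */
    static final class Int128 {
        final long hi; // upper 64 bits
        final long lo; // lower 64 bits

        Int128(long hi, long lo) {
            this.hi = hi;
            this.lo = lo;
        }
    }

    /** Full signed 64x64 -> 128-bit product, computed as two separate operations. */
    static Int128 multiply128(long a, long b) {
        long lo = a * b;                   // low half: ordinary wrapping multiply
        long hi = Math.multiplyHigh(a, b); // high half of the signed 128-bit product
        return new Int128(hi, lo);
    }
}
```

For example, multiply128(Long.MAX_VALUE, 2) gives hi == 0 and lo == -2 (bit pattern 0xFFFFFFFFFFFFFFFE), because the product 2^64 - 2 does not fit in a signed long.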

<span class=""><br>

> If we are going to add this operation, it should probably be added<br>

> along with an intrinsic. I think the Java code can simply factor out<br>

> the else branch from the existing multiplyHigh code. This way,<br>

> unsignedMultiplyHigh will be at least as fast as multiplyHigh,<br>

> whether the intrinsic implementation is available or not.<br>

<br>

</span>Sure.  I can do that.<br>
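A minimal sketch of one branch-free way to get the unsigned high half, assuming only the signed Math.multiplyHigh (Java 9+): reading a negative long as unsigned adds 2^64 to it, so the unsigned high half is the signed high half plus y when x is negative and plus x when y is negative (mod 2^64). The class and method names are illustrative, and this is not presented as the code that was actually committed.

```java
// Sketch only: UnsignedMulHigh is a hypothetical name, not JDK API.
// Requires Java 9+ for Math.multiplyHigh(long, long).
final class UnsignedMulHigh {

    /**
     * High 64 bits of the unsigned 128-bit product of x and y.
     *
     * Reading a negative long as unsigned adds 2^64, so modulo 2^128
     *   ux * uy = x * y + 2^64 * ((x < 0 ? y : 0) + (y < 0 ? x : 0)),
     * i.e. the signed high half needs y added when x < 0, and x when y < 0.
     */
    static long unsignedMultiplyHigh(long x, long y) {
        long hi = Math.multiplyHigh(x, y);
        hi += (x >> 63) & y; // x >> 63 is all ones if x < 0, else 0
        hi += (y >> 63) & x; // likewise for y: a select without a branch
        return hi;
    }
}
```

The correction adds only two shifts, two ANDs and two adds on top of multiplyHigh, so its cost tracks the signed version whether or not an intrinsic is present. For instance, unsignedMultiplyHigh(-1L, -1L) returns -2, the high word of (2^64 - 1)^2.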

<span class=""><br>

> If possible, the implementation of this operation should not branch on<br>

> either operand. This would make it more widely useful for constant-time<br>

> crypto implementations. However, this property would need to go into the<br>

> spec in order for constant-time crypto code to use this method, and I<br>

> don't know how reasonable it is to put something like this in the spec.<br>

<br>

</span>OK.  I can do it so that there are no branches in the Java.  The Java<br>

code for signed multiplyHigh has some data-dependent branches in an<br>

attempt to speed it up, though.  I don't know how effective they are,<br>

and I could have a look at taking them out.<br>
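For a branch-free pure-Java fallback, one option (sketched below, and not the JDK's actual code) is to apply the 32-bit-digit method from Hacker's Delight, section 8-2, unconditionally instead of selecting a code path based on the operand values. Every input then takes the same straight-line path. Branch-free Java source is not, by itself, a constant-time guarantee, since the JIT and the hardware can still introduce timing differences; that is why a promise in the spec would matter for crypto code.

```java
// Sketch only: PortableMulHigh is a hypothetical name, not JDK API.
final class PortableMulHigh {

    /**
     * High 64 bits of the signed 128-bit product x * y, using only 64-bit
     * multiplies of 32-bit digits (after Hacker's Delight, section 8-2).
     * All inputs follow the same straight-line path: no data-dependent branch.
     */
    static long multiplyHigh(long x, long y) {
        long x1 = x >> 32;          // signed high digit
        long x0 = x & 0xFFFFFFFFL;  // unsigned low digit
        long y1 = y >> 32;
        long y0 = y & 0xFFFFFFFFL;

        long w0 = x0 * y0;               // low x low, read as unsigned
        long t  = x1 * y0 + (w0 >>> 32); // high x low, plus the carry from w0
        long w1 = t & 0xFFFFFFFFL;       // low 32 bits of t
        long w2 = t >> 32;               // signed high part of t
        w1 += x0 * y1;                   // low x high
        return x1 * y1 + w2 + (w1 >> 32);
    }
}
```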

<span class=""><br>

> Side note: at the moment, I am using signed arithmetic in prototypes for<br>

> Poly1305, X25519, and EdDSA, partially due to lack of support for<br>

> unsigned operations like this one. I don't think having<br>

> unsignedMultiplyHigh would, on its own, convince me to use an unsigned<br>

> representation, but the forces are different for each<br>

> algorithm/implementation.<br>

<br>

</span>Sure.  I don't think it really matters from a performance point of<br>

view which you use, given intrinsics for both.<br>
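To show where an unsigned high-half multiply sits in an unsigned representation, here is a hedged sketch of schoolbook multiplication over little-endian arrays of unsigned 64-bit limbs, the inner loop of multi-precision arithmetic. LimbMul, umulh and carry are hypothetical names; umulh applies the signed-to-unsigned correction to Math.multiplyHigh, and carries are detected with the standard branch-free carry-out identity. The sketch is not taken from any particular Poly1305, X25519 or EdDSA implementation.

```java
// Sketch only: LimbMul, umulh and carry are hypothetical names, not JDK API.
// Requires Java 9+ for Math.multiplyHigh(long, long).
final class LimbMul {

    /** High 64 bits of the unsigned 128-bit product, without branches. */
    static long umulh(long x, long y) {
        return Math.multiplyHigh(x, y) + ((x >> 63) & y) + ((y >> 63) & x);
    }

    /** Carry (0 or 1) out of the unsigned 64-bit addition x + y == s. */
    static long carry(long x, long y, long s) {
        return ((x & y) | ((x | y) & ~s)) >>> 63;
    }

    /**
     * Schoolbook product of two unsigned integers stored as little-endian
     * arrays of 64-bit limbs; the result has a.length + b.length limbs.
     */
    static long[] multiply(long[] a, long[] b) {
        long[] r = new long[a.length + b.length];
        for (int i = 0; i < a.length; i++) {
            long c = 0; // carry limb propagated along this row
            for (int j = 0; j < b.length; j++) {
                // 128-bit accumulator (hi:lo) = a[i] * b[j]
                long lo = a[i] * b[j];
                long hi = umulh(a[i], b[j]);
                // (hi:lo) += r[i + j]
                long s = lo + r[i + j];
                hi += carry(lo, r[i + j], s);
                // (hi:s) += c; the total is at most (2^64-1)^2 + 2(2^64-1)
                // = 2^128 - 1, so hi never overflows
                long s2 = s + c;
                hi += carry(s, c, s2);
                r[i + j] = s2;
                c = hi;
            }
            r[i + b.length] = c; // this limb has not been written yet
        }
        return r;
    }
}
```

For example, multiplying {-1L} by {-1L}, that is (2^64 - 1)^2, gives {1L, -2L}. Every step is a multiply, add or bitwise operation, so with an intrinsic for the unsigned high half the cost per limb is comparable to a signed-limb design.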

<div class="HOEnZb"><div class="h5"><br>

--<br>

Andrew Haley<br>

Java Platform Lead Engineer<br>

Red Hat UK Ltd. <<a href="https://www.redhat.com" rel="noreferrer" target="_blank">https://www.redhat.com</a>><br>

EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671<br>

</div></div></blockquote></div><br></div>