<div dir="ltr">I am looking forward to intrinsic support for 128-bit math using "Long2" and XMM (or even YMM and ZMM) instructions. <div>This is the best way forward, I hope.</div><div><br></div><div>Personally, I would like to see a long long type, or even a uint128, uint256, uint512-style notation.</div><div><br></div><div>Another option might be something like long&lt;128&gt;, or an annotation such as @uint128 long or even @decimal128 double, but who knows.</div><div><br></div><div>Regards, Peter.</div></div><div class="gmail_extra"><br><div class="gmail_quote">On 25 September 2017 at 18:48, Andrew Haley <span dir="ltr">&lt;<a href="mailto:aph@redhat.com" target="_blank">aph@redhat.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 25/09/17 18:21, Adam Petcher wrote:<br>
> I agree that an unsigned multiplyHigh would be useful for crypto<br>
> purposes, and we should consider adding it. Of course, I would much<br>
> rather have multiply operations that return both 64-bit parts of the<br>
> result, but that is going to be hard to do well without value types. So<br>
> it would be nice to have something like this in the meantime.<br>
<br>
</span>I take your point, but it won't be excruciatingly difficult for the C2<br>
compiler to turn the two multiply operations into a single instruction, if the CPU<br>
can do that. From what I've seen recently, though, on non-x86 architectures it's<br>
common for the two halves of the result to be computed by separate<br>
instructions.<br>
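<br>A minimal sketch of the two-operation pattern in Java, assuming Java 9 or later for Math.multiplyHigh: the low half is the ordinary wrapping product x * y and the high half comes from a separate call, much as AArch64 pairs MUL (low half) with SMULH (high half). The class FullProduct and its helpers are hypothetical, and BigInteger is used only to check the result; whether C2 actually fuses the two operations is not shown here.<br>

```java
import java.math.BigInteger;

// Hypothetical demo class: computes a signed 64x64 -> 128-bit product as
// two independent operations, the way many non-x86 ISAs do it.
public class FullProduct {
    // Returns {high, low}: the two 64-bit halves of the signed 128-bit product.
    static long[] multiplyFull(long x, long y) {
        long low = x * y;                    // low 64 bits (wrapping multiply)
        long high = Math.multiplyHigh(x, y); // high 64 bits, signed (Java 9+)
        return new long[] {high, low};
    }

    // Reassembles high * 2^64 + unsigned(low) as a BigInteger for checking.
    static BigInteger toBigInteger(long high, long low) {
        BigInteger lowUnsigned = new BigInteger(Long.toUnsignedString(low));
        return BigInteger.valueOf(high).shiftLeft(64).add(lowUnsigned);
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, Long.MAX_VALUE, Long.MIN_VALUE, 0x123456789ABCDEFL};
        for (long x : samples) {
            for (long y : samples) {
                long[] r = multiplyFull(x, y);
                BigInteger expected = BigInteger.valueOf(x).multiply(BigInteger.valueOf(y));
                if (!toBigInteger(r[0], r[1]).equals(expected)) {
                    throw new AssertionError("mismatch for " + x + " * " + y);
                }
            }
        }
        System.out.println("all products match");
    }
}
```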
<span class=""><br>
> If we are going to add this operation, it should probably be added<br>
> along with an intrinsic. I think the Java code can simply factor out<br>
> the else branch from the existing multiplyHigh code. This way,<br>
> unsignedMultiplyHigh will be at least as fast as multiplyHigh,<br>
> whether the intrinsic implementation is available or not.<br>
<br>
</span>Sure. I can do that.<br>
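<br>One way to derive the unsigned high half from the existing signed Math.multiplyHigh, sketched here as an assumption rather than as the actual JDK code: reading a negative x as unsigned adds 2^64 to it, which adds 2^64 * y to the product (and likewise 2^64 * x for a negative y), so the unsigned high half is the signed high half plus y when x is negative and plus x when y is negative, modulo 2^64. The masks (x >> 63) & y and (y >> 63) & x select those corrections without branching on either operand. The class UnsignedHigh is hypothetical.<br>

```java
// Hypothetical helper class; a sketch, not the JDK's implementation.
public class UnsignedHigh {
    // High 64 bits of the 128-bit product of x and y, both read as unsigned.
    // (x >> 63) is all ones when x is negative and zero otherwise, so each
    // correction term is selected with a mask instead of a branch.
    static long unsignedMultiplyHigh(long x, long y) {
        long high = Math.multiplyHigh(x, y); // signed high half (Java 9+)
        high += (x >> 63) & y;               // add y if x's top bit is set
        high += (y >> 63) & x;               // add x if y's top bit is set
        return high;
    }

    public static void main(String[] args) {
        // (2^64 - 1) * (2^64 - 1) = 2^128 - 2^65 + 1, whose high half is 2^64 - 2
        System.out.println(Long.toUnsignedString(unsignedMultiplyHigh(-1L, -1L)));
        // prints 18446744073709551614
    }
}
```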
<span class=""><br>
> If possible, the implementation of this operation should not branch on<br>
> either operand. This would make it more widely useful for constant-time<br>
> crypto implementations. However, this property would need to go into the<br>
> spec for constant-time crypto code to rely on this method, and I<br>
> don't know how reasonable it is to put something like this in the spec.<br>
<br>
</span>OK. I can write it so that there are no branches in the Java code. The<br>
Java code for signed multiplyHigh does, however, have some data-dependent<br>
branches in an attempt to speed it up. I don't know how effective they are;<br>
I could look at taking them out.<br>
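<br>For a fallback path that does not rely on any intrinsic, a branch-free pure-Java version is also possible. This sketch swaps in schoolbook multiplication on 32-bit limbs (it is not the existing multiplyHigh code) and contains no data-dependent branches; note that branch-free source is not by itself a constant-time guarantee once the JIT compiler is involved. The class BranchFreeMulHigh is hypothetical, and BigInteger is used only to check results.<br>

```java
import java.math.BigInteger;

// Hypothetical class: branch-free, pure-Java unsigned multiplyHigh using
// schoolbook multiplication on 32-bit limbs.
public class BranchFreeMulHigh {
    static long unsignedMultiplyHigh(long x, long y) {
        long x0 = x & 0xFFFFFFFFL, x1 = x >>> 32; // split into 32-bit limbs
        long y0 = y & 0xFFFFFFFFL, y1 = y >>> 32;
        long p00 = x0 * y0; // each partial product fits in 64 unsigned bits
        long p01 = x0 * y1;
        long p10 = x1 * y0;
        long p11 = x1 * y1;
        // Middle column: at most 3 * (2^32 - 1), so this sum cannot overflow.
        long mid = (p00 >>> 32) + (p01 & 0xFFFFFFFFL) + (p10 & 0xFFFFFFFFL);
        // High half = top limb product + carries out of the middle column.
        return p11 + (p01 >>> 32) + (p10 >>> 32) + (mid >>> 32);
    }

    static BigInteger unsigned(long v) {
        return new BigInteger(Long.toUnsignedString(v));
    }

    public static void main(String[] args) {
        long[] samples = {0L, 1L, -1L, Long.MIN_VALUE, Long.MAX_VALUE, 0xDEADBEEFCAFEBABEL};
        for (long x : samples) {
            for (long y : samples) {
                BigInteger expected = unsigned(x).multiply(unsigned(y)).shiftRight(64);
                if (!unsigned(unsignedMultiplyHigh(x, y)).equals(expected)) {
                    throw new AssertionError("mismatch for " + x + ", " + y);
                }
            }
        }
        System.out.println("all high halves match");
    }
}
```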
<span class=""><br>
> Side note: at the moment, I am using signed arithmetic in prototypes for<br>
> Poly1305, X25519, and EdDSA, partially due to lack of support for<br>
> unsigned operations like this one. I don't think having<br>
> unsignedMultiplyHigh would, on its own, convince me to use an unsigned<br>
> representation, but the forces are different for each<br>
> algorithm/implementation.<br>
<br>
</span>Sure. I don't think it really matters from a performance point of<br>
view which you use, given intrinsics for both.<br>
<div class="HOEnZb"><div class="h5"><br>
--<br>
Andrew Haley<br>
Java Platform Lead Engineer<br>
Red Hat UK Ltd. <<a href="https://www.redhat.com" rel="noreferrer" target="_blank">https://www.redhat.com</a>><br>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671<br>
</div></div></blockquote></div><br></div>