<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div>My rule of thumb is to use AVX-512 only on systems where AVX-512 VBMI2 is available. I think it would be quite reasonable for Java to adopt the same convention. If anyone has a criticism of this rule, I'd really like to hear it.<br></div><div><br></div><div>I understand that you are not disabling 512-bit vectors outright; you are only changing the 'preferred' setting, so I cannot see people getting *too* upset.<br></div><div><br></div><div><br></div><div><br></div><div><br></div><div>On Tue, Jun 25, 2024, at 22:51, John Rose wrote:<br></div><blockquote type="cite" id="qt" style=""><div>On 25 Jun 2024, at 19:18, Robert Muir wrote:<br></div><div><br></div><div>> On Tue, Jun 25, 2024 at 10:05 PM John Rose &lt;<a href="mailto:john.r.rose@oracle.com">john.r.rose@oracle.com</a>&gt; wrote:<br></div><div>>><br></div><div>>> On 25 Jun 2024, at 18:51, Robert Muir wrote:<br></div><div>>><br></div><div>>>> Java VM knows the cpu flags and cpu family, but won't give up the<br></div><div>>>> goods to users like us :)<br></div><div>>><br></div><div>>> I hear you on this, but is the VM really your only hope?<br></div><div>>><br></div><div>>> If you were programming in C wouldn’t you grab some C header<br></div><div>>> file and ask a function in there somewhere? And if that’s the<br></div><div>>> case surely there is a way to spin up a Panama FFM access to<br></div><div>>> the same API. 
Or is that already one of Uwe’s “too hacky”<br></div><div>>> solutions?<br></div><div>><br></div><div>> Or we could parse /proc/cpuinfo like some other java projects are<br></div><div>> doing for these issues, but then it only works on linux, and Uwe won't<br></div><div>> approve!<br></div><div>><br></div><div>> I'm just trying to communicate the struggle to get good performance<br></div><div>> and still try to be "portable", it is not easy.<br></div><div><br></div><div>It’s inherently hard given the vagaries of VPUs out there, even<br></div><div>from a single vendor. I’m encouraged, frankly amazed, that we have<br></div><div>as much portability as we do.<br></div><div><br></div><div>> The vector API could theoretically solve it for us too, maybe by<br></div><div>> making SPECIES_PREFERRED more fine-grained, rather than just set to<br></div><div>> "512" for everything. On such machines, for some vector operations,<br></div><div>> "512" is really not the "preferred" size because it causes bad<br></div><div>> performance side effects.<br></div><div><br></div><div>I think we should consider setting the SPECIES_PREFERRED to 256 bits on<br></div><div>machines that support 512 but are known (somehow) to down-clock when 512<br></div><div>is used. That might disappoint a different but smaller set of users.<br></div><div>Any advice on this?<br></div><div><br></div></blockquote><div><br></div></body></html>