better random numbers
John Rose
john.r.rose at oracle.com
Sun Sep 5 22:23:19 UTC 2021
On Sep 5, 2021, at 7:44 AM, Andrew Haley <aph-open at littlepinkcloud.com> wrote:
>
> On 9/3/21 12:35 AM, John Rose wrote:
>
>> The reference I’d like to give here is to Dr. Melissa O’Neill’s
>> website and articles:
>
> I'm quite sceptical. Anyone who says a (non-cryptographic) random-
> number generator is "hard to predict" is either quite naive or in a
> state of sin, (;-) and while O’Neill’s argument seems sound, it
> doesn't seem to have convinced the academic world.
She meets one of these objections on her blog. But IMO there’s not much that can be said here, since “hard” is hard to define. It’s a practical question to be addressed (tentatively) by practical arguments.
>
> Lemire is thoughtful:
> https://lemire.me/blog/2017/08/15/on-melissa-oneills-pcg-random-number-generator/
That’s a good link, thanks.
>
> I wonder about AES, which can do (on Apple M1) 2 parallel rounds per
> clock cycle. I'm quite tempted to try a reduced- round AES on the
> TestU01 statistical tests. Maybe 6 rounds?
I think I’ve mentioned using AES as a mix step at one or two JVMLS meetings. Off-label, low-round use of HW-accelerated crypto steps is a good way to build sub-cryptographic but “hard” (that word!) PRNGs and hashes.
I expect a 128-bit multiply has a comparable amount of bit cascading to two AES rounds; the AES probably does a better job, since it has no weak lanes. At least it looks that way on paper. So I’d start with 2 rounds and try various ways to replace a big multiply with the AES.
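For concreteness, here is the kind of wide-multiply mix step I have in mind, as an untested sketch (the name mulMix and the constants are just placeholders, nothing tuned):

    // Sketch: a 64x64->128 multiply used as a mix step, folding the high and
    // low halves of the product together so every input bit cascades widely.
    // This is the "big multiply" that two AES rounds would be compared against.
    static long mulMix(long x) {
        final long M = 0x9E3779B97F4A7C15L;  // golden-ratio constant, a common illustrative choice
        long hi = Math.multiplyHigh(x, M);   // high 64 bits of the 128-bit product (Java 9+)
        long lo = x * M;                     // low 64 bits of the same product
        return hi ^ lo;                      // fold the halves; the xor covers the weak low bits
    }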
To increase throughput, use vectors or generate more than one random sample per crank turn. But back-to-back AES steps are probably always twice the latency of a single wide multiply. So I think there might be some more room for cleverly using single AES rounds, say two in parallel with 1-cycle input transforms.
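Roughly like this, as a sketch of the more-than-one-sample-per-crank idea, reusing the mulMix step above (the state layout, names, and constants are placeholders, not a proposal):

    // Sketch: advance a 128-bit state once, then emit two 64-bit samples from it.
    // The advance is a plain 128-bit counter add; the two outputs come from two
    // cheap, independent input transforms feeding the same mix step.
    static final long[] state = { 0x0123456789ABCDEFL, 0xFEDCBA9876543210L };
    static final long[] out = new long[2];

    static void crank() {
        long lo = state[0] + 0x9E3779B97F4A7C15L;                               // 128-bit add, low word
        long hi = state[1] + (Long.compareUnsigned(lo, state[0]) < 0 ? 1 : 0);  // carry into high word
        state[0] = lo;
        state[1] = hi;
        out[0] = mulMix(lo ^ hi);                                               // first sample
        out[1] = mulMix(Long.rotateLeft(hi, 32) ^ lo);                          // second sample, different 1-cycle transform
    }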
A data-driven ror/lsl is less valuable as an output transform here, since it doesn’t need to cover a structural problem in the main advance function. Xoroshiro and PCG are drastically linear, so they need a nonlinear mix function somewhere.
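For reference, the data-driven rotate I mean is the one in PCG’s XSH-RR output path (64-bit LCG state, 32-bit output); in Java it is roughly:

    // PCG-style output path: a plain 64-bit LCG advance, then an xorshift fold
    // and a rotate whose count is taken from the top bits of the old state.
    // Constants are the commonly published PCG ones, used here only to
    // illustrate "the rotation count comes from the data".
    static long pcgState = 0x853C49E6748FEA9BL;

    static int nextIntXshRr() {
        long old = pcgState;
        pcgState = old * 6364136223846793005L + 1442695040888963407L;  // LCG advance
        int xorshifted = (int) (((old >>> 18) ^ old) >>> 27);          // xorshift, folded to 32 bits
        int rot = (int) (old >>> 59);                                  // top 5 bits choose the rotation
        return Integer.rotateRight(xorshifted, rot);                   // data-driven rotate
    }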
That said, AES probably has its own odd quirks and regularities at very low round numbers. One reason I like LCG methods is that they use very well-understood basic components, which means that if they have a flaw (in the low bits) it’s easier to account for and doesn’t require as much original research. So I tend to prefer LCGs over XOR networks and AES, simply because that ground has been examined more carefully.
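The low-bit flaw is easy to exhibit: for a modulus-2^64 LCG the bottom bit simply alternates, and the low k bits have period at most 2^k. A toy demo (illustrative only):

    // Demo of the well-understood weakness of a modulus-2^64 LCG: the low-order
    // bits have short periods, which is why such generators drop or mix away
    // their low bits before output.
    static void demoLowBitPeriod() {
        long s = 1;
        StringBuilder bit0 = new StringBuilder();
        for (int i = 0; i < 16; i++) {
            s = s * 6364136223846793005L + 1442695040888963407L;  // odd multiplier, odd increment
            bit0.append(s & 1);                                   // record only the bottom bit
        }
        System.out.println(bit0);  // prints a strictly alternating 0/1 pattern
    }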
Put simply, engineers and academics approach novelty differently. Perhaps the academic publishing world has an opposite value function: if you are using old research in a new way to solve problems, they don’t care as much as if there is a new technique. After all, finding the flaws in something new will fill new journal slots and get more traffic.
> However, there can be a
> long latency between FPU and integer CPU, so perhaps it's not such a
> great idea.
That’s why I’d consider batching.
> Also, you have to load the key registers before you can
> generate a random number, so it only really works if you want to
> generate a lot of bits at a time.
Yes, that.
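In JDK terms, the natural way to batch is to pull a large block of keystream out of hardware-accelerated full-round AES in one call, e.g. AES/CTR through javax.crypto; the reduced-round experiments above would need intrinsics or native code, since the JCE only exposes the full cipher. A sketch (key and counter here are placeholders):

    import javax.crypto.Cipher;
    import javax.crypto.spec.IvParameterSpec;
    import javax.crypto.spec.SecretKeySpec;

    // Sketch: amortize AES key setup by generating random bits in large batches.
    // Encrypting a block of zeros in CTR mode yields the raw keystream, and the
    // JDK accelerates AES with hardware intrinsics on common platforms.
    static byte[] aesCtrBatch(byte[] key16, byte[] counter16, int numBytes) throws Exception {
        Cipher aes = Cipher.getInstance("AES/CTR/NoPadding");
        aes.init(Cipher.ENCRYPT_MODE,
                 new SecretKeySpec(key16, "AES"),
                 new IvParameterSpec(counter16));
        return aes.doFinal(new byte[numBytes]);
    }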
> But it is maybe 128 randomish bits
> per a few clock cycles.
That’s the attraction. But you can do similar things with vectorized 64-bit operations. As long as you can clean up the weak low bits, you get good mixing of the state.
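Something like this scalar recipe, where every step is lane-wise and so vectorizes directly, say four 64-bit lanes per 256-bit vector (constants again illustrative, not tuned):

    // Sketch: cover the weak low bits of a 64-bit multiply by folding the strong
    // high bits down over them, then multiplying again to spread them back up.
    static long cleanMix(long x) {
        x *= 0xD1342543DE82EF95L;  // 64-bit multiply: high bits mix well, low bits stay weak
        x ^= x >>> 32;             // pull high bits down to repair the low half
        x *= 0x9E3779B97F4A7C15L;  // second multiply spreads the repaired bits upward
        x ^= x >>> 29;             // final fold
        return x;
    }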
Thanks for the comments.
>
> --
> Andrew Haley (he/him)
> Java Platform Lead Engineer
> Red Hat UK Ltd. <https://www.redhat.com>
> https://keybase.io/andrewhaley
> EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
>