[RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64?

Doerr, Martin martin.doerr at sap.com
Tue Apr 3 12:50:29 UTC 2018


Hi Gustavo,

thanks for posting your change.

I think the Java and shared C++ code should not use PPC64-specific names, because it may get used for other platforms as well.

Some people don't want to rely solely on a hardware random number generator that cannot be reviewed publicly. So would it make sense to use the instruction mixed with something else?
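
A minimal sketch of what such mixing could look like, assuming the hardware bytes arrive through some hook (the `hardwareFill` parameter and the `MixedEntropy` class are hypothetical names, not part of the posted change). It hashes the hardware output together with bytes from a software generator, so the result does not depend on the hardware source alone. This only illustrates the idea and is not a reviewed construction:

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.function.Consumer;

final class MixedEntropy {
    // Hash hardware bytes together with software bytes (SHA-256, 32-byte result).
    static byte[] mix(byte[] hardwareBytes, byte[] softwareBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(hardwareBytes);
            md.update(softwareBytes);
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // hardwareFill is a hypothetical hook that fills the array from the
    // hardware instruction; software is any existing SecureRandom.
    static byte[] nextBlock(Consumer<byte[]> hardwareFill, SecureRandom software) {
        byte[] hw = new byte[32];
        byte[] sw = new byte[32];
        hardwareFill.accept(hw);
        software.nextBytes(sw);
        return mix(hw, sw);
    }
}
```

With this shape the software generator could be NativePRNG or any other provider, so the mixing code itself would not need a platform-specific name.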

It would be good to have the complete change in one webrev for easier reviewing.

Thanks and best regards,
Martin


-----Original Message-----
From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com] 
Sent: Monday, 2 April 2018 13:55
To: Volker Simonis <volker.simonis at gmail.com>; Doerr, Martin <martin.doerr at sap.com>; vladimir.kozlov at oracle.com
Cc: ppc-aix-port-dev at openjdk.java.net
Subject: [RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64?
Importance: High

Hi Martin, Volker, Vladimir

Sorry for the huge delay in replying to this...

I hope Martin (and Lutz) are feeling better and fully recovered.

On 11/24/2017 08:04 PM, Volker Simonis wrote:
> Hi Gustavo,
> 
> in one of my talks [1,2] I have an example on how to intrinsify
> Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But
> please notice that this is just a "toy" example - it is not production
> ready. In fact I think the right way would be to create a new
> SecureRandom provider where you may implement "engineNextBytes" by
> using  the new Power instruction (maybe by calling a native function).
> This special implementation of "engineNextBytes" could then be
> intrinsified as described in the talk.
> 
> Also, before you start this, please contact the security mailing list
> just to make sure you're not going into the wrong direction (I'm not a
> security expert :)

I've created a new JCA provider called 'P9TRNG', implemented a darn
intrinsic for the Interpreter, C1, and C2, and ran a couple of
micro-benchmarks [1, 2] to check the latency and throughput of getting
random numbers through generateSeed() and nextBytes() with darn in
place.

The 'P9TRNG' provider is basically a copy of 'NativePRNG', since a
software fallback is necessary in case the darn instruction fails to
return a valid random number after ten attempts (although that is a
very rare condition). Otherwise, 'P9TRNG' uses the darn intrinsic
whenever it's available.
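
The retry-then-fallback logic described above could be sketched roughly as follows. This is not the code from the patches: the `DarnWithFallback` class, the `hardware` supplier, and the use of an empty `OptionalLong` to signal a failed read are assumptions made only for illustration.

```java
import java.security.SecureRandom;
import java.util.OptionalLong;
import java.util.function.Supplier;

final class DarnWithFallback {
    // Number of hardware attempts before giving up, as described in the text.
    static final int MAX_ATTEMPTS = 10;

    // hardware is a hypothetical hook standing in for the intrinsic; it
    // returns an empty OptionalLong when no valid random number was delivered.
    static long nextLong(Supplier<OptionalLong> hardware, SecureRandom fallback) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            OptionalLong value = hardware.get();
            if (value.isPresent()) {
                return value.getAsLong();
            }
        }
        // All attempts failed: use the software generator instead.
        return fallback.nextLong();
    }
}
```

For example, `DarnWithFallback.nextLong(() -> OptionalLong.empty(), new SecureRandom())` would try the hardware hook ten times and then return a value from the software generator.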

The maximum theoretical throughput on the machine I'm testing on (a
POWER9 Witherspoon) is 128 Mbps, and there is one RNG per socket, so
only one RNG per CPU. Simple C code gets very close to that value
(please see the C code [3] for details and the log [4] for the
expected output). Unrolling the tight loop does not help; it actually
degrades performance.

On HotSpot, for the Interpreter and C1, the throughput is ~3x higher
than with the version that does not use the darn instruction (measured
with micro-benchmarks [1, 2]):

gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG 1024 10
3.8759432E7 ns
2.113550 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG  1024 100000
2.65902244E10 ns
30.808313 Mbps

gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100
7.1741008E7 ns
11.418853 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100000
2.74547937E10 ns
29.838140 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:TieredStopAtLevel=3 next_bytes NativePRNG 1024 100000
5.5632339E10 ns
14.725248 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:-TieredCompilation next_bytes NativePRNG 1024 100000
2.78629519E10 ns
29.401051 Mbps

[With darn disabled: performance like NativePRNG]
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100
7.0272888E7 ns
11.657412 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100000
2.75566244E10 ns
29.727880 Mbps
...

[With darn enabled]
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100
8305029.0 ns
98.639030 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100000
6.442112E9 ns
127.163261 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:TieredStopAtLevel=3 next_bytes P9TRNG 1024 100000
1.57303337E10 ns
52.077728 Mbps
gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:-TieredCompilation next_bytes P9TRNG 1024 100000
6.46914E9 ns
126.631973 Mbps
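
The Mbps figures above are consistent with the arguments being the buffer size in bytes and the number of iterations, and with throughput being total bits divided by elapsed time. For instance, 1024 bytes x 100000 iterations x 8 bits in 6.442112E9 ns gives about 127.16 Mbps. The following is a sketch under that assumption and is not the actual benchmark from [1]; the class and method names are made up for illustration.

```java
import java.security.SecureRandom;

final class ThroughputSketch {
    // Throughput in megabits per second: total bits / elapsed seconds / 1e6.
    static double mbps(long bytesPerCall, long iterations, double elapsedNanos) {
        double bits = 8.0 * bytesPerCall * iterations;
        double seconds = elapsedNanos / 1e9;
        return bits / seconds / 1e6;
    }

    // Time repeated nextBytes() calls on the given generator.
    static double measure(SecureRandom rng, int bytesPerCall, int iterations) {
        byte[] buffer = new byte[bytesPerCall];
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            rng.nextBytes(buffer);
        }
        long elapsed = System.nanoTime() - start;
        return mbps(bytesPerCall, iterations, elapsed);
    }
}
```

A call such as `ThroughputSketch.measure(SecureRandom.getInstance("NativePRNG"), 1024, 100000)` would roughly mirror the NativePRNG runs shown above, although absolute numbers depend on the machine and JVM flags.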


For the C2 compiler, using darn is better until it reaches ~128 Mbps
(the maximum theoretical throughput). On the other hand, it never
blocks, so, for instance, generateSeed(), which normally uses
/dev/random (blocking), is not affected by a lack of entropy in the
Linux entropy pool.

@Vladimir, Volker mentioned that you already experimented with rdrand
on Intel. Do you know if creating a new JCA provider as I did is a
reasonable approach to exploit darn on POWER9? Also, in my
implementation I had to create a VM intrinsic (_darn) in vmSymbols
that is, let's say, arch-dependent, and that seems to be the only such
case so far. On the other hand, a new JCA provider (with methods to be
intrinsified) is necessary: I don't see another way to intrinsify the
methods in the NativePRNG/SHA1PRNG providers, since I need a software
fallback for darn. Do you have any recommendation about it?

The patchset rebased on top of
jdk11 (http://hg.openjdk.java.net/jdk/hs) is:

http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/0_PPC64_Add_JCA_provider_to_exploit_HW_RNG_on_POWER9.patch
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/1_PPC64_Assembler_add_support_for_darn_Deliver_A_Random_Number_instruction.patch
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/2_PPC64_Interpreter_add_template_to_exploit_darn_instruction.patch
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/3_PPC64_C2_Compiler_add_new_node_to_exploit_darn_instruction.patch
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/4_PPC64_C1_Compiler_add_intrinsic_to_exploit_darn_instruction.patch

I intend to contribute that change as an experimental feature.

Thank you.

Best regards,
Gustavo

[1] https://github.com/gromero/darn/blob/master/next_bytes.java
[2] https://github.com/gromero/darn/blob/master/generate_seed.java
[3] https://github.com/gromero/darn/blob/master/C/darn.c
[4] https://github.com/gromero/darn/blob/master/C/darn.log


