[RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64?
Gustavo Romero
gromero at linux.vnet.ibm.com
Mon Apr 16 02:44:58 UTC 2018
Hi Martin,
Thank you very much for your comments.
On 04/03/2018 09:50 AM, Doerr, Martin wrote:
> I think the Java and shared C++ code should not use PPC64-specific names because it may get used for other platforms as well?
I fixed all the names and changed the provider's name to HWTRNG (previously it
was P9TRNG). I renamed the helpers to something more "neutral" and removed the
snake_case from their names. Yes, the code may be used on other platforms: a
platform only needs to provide an intrinsic for the randomLong() method.
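
A minimal sketch of the shape such a method could take, not the code in the
webrev. The class name, the LongSupplier standing in for the hardware
instruction, and the constant names are all hypothetical. The ten-attempt limit
comes from the earlier message quoted below, and the all-ones error value is
how, as far as I know, Power ISA 3.0 defines a failed darn.

```java
import java.security.SecureRandom;
import java.util.function.LongSupplier;

class RandomLongSketch {
    // Assumption: a failed darn returns all ones (0xFFFFFFFFFFFFFFFF).
    static final long HW_ERROR = 0xFFFFFFFFFFFFFFFFL;
    // Number of hardware attempts before giving up, as described in the thread.
    static final int MAX_ATTEMPTS = 10;

    // Software fallback used when the hardware source keeps failing.
    private static final SecureRandom SOFTWARE = new SecureRandom();

    // Hypothetical stand-in for the method a platform would intrinsify.
    // 'hardware' models the instruction; a real intrinsic would replace the call.
    public static long randomLong(LongSupplier hardware) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            long value = hardware.getAsLong();
            if (value != HW_ERROR) {
                return value;
            }
        }
        return SOFTWARE.nextLong();
    }

    public static void main(String[] args) {
        // A source that always fails forces the software fallback.
        long fromFallback = randomLong(() -> HW_ERROR);
        // A source that works is used on the first attempt.
        long fromHardware = randomLong(() -> 42L);
        System.out.println(fromFallback + " " + fromHardware);
    }
}
```

Keeping the fallback in the Java body means a platform without an intrinsic
still gets a working method.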
I provided a unique value for serialVersionUID in the new JCA provider by
incrementing a preexisting one. Do you know if there is any other (formal)
way to determine that value?
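
For reference, one formal route is the default value defined by the Java
Object Serialization Specification, which the JDK exposes through
java.io.ObjectStreamClass (the serialver tool prints the same number). The
sketch below only illustrates that API; the class names are hypothetical and it
is not taken from the webrev.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidSketch {
    // Hypothetical serializable class with no explicit serialVersionUID.
    static class ExampleProvider implements Serializable {
        private String name = "example";
    }

    // Returns the declared serialVersionUID, or the default one computed
    // from the class structure when none is declared.
    public static long uidOf(Class<?> cls) {
        ObjectStreamClass desc = ObjectStreamClass.lookup(cls);
        if (desc == null) {
            // lookup() returns null for classes that are not serializable.
            throw new IllegalArgumentException(cls + " is not serializable");
        }
        return desc.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(ExampleProvider.class) + "L");
    }
}
```

The computed value changes whenever the class's serialized shape changes, which
is why it is usually pinned as an explicit constant once chosen.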
> Some people don't want to rely solely on a hardware random number generator which cannot get reviewed publicly. So would it make sense to use the instruction mixed with something else?
Yes, I'm aware of the caveat... In the past Intel's 'rdrand' received a lot of
criticism in that sense. I've talked to the NX RNG designers and we've tried to
find documentation about it from the OpenPOWER Foundation, but it's not
available yet.
In any case, just as the use of 'rdrand' in OpenSSL is disabled by default,
wouldn't it be fine to use 'darn' in OpenJDK provided its use is optional and
deliberate? Currently the user needs to (a) explicitly request the new
provider with SecureRandom.getInstance("HWTRNG") and (b) unlock it using
"-XX:+UnlockExperimentalVMOptions -XX:+UseRANDOMIntrinsics". In that sense, do
you think it would be acceptable?
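
A minimal sketch of what step (a) could look like from the application side.
The "HWTRNG" algorithm name is the one proposed in this message and exists only
on a JVM built with the patch; the class and method names are hypothetical. On
any other JVM the lookup fails and the sketch falls back to the default
SecureRandom.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

class HwTrngOptIn {
    // Requests the named algorithm explicitly; falls back to the platform
    // default when no installed provider offers it.
    public static SecureRandom open(String algorithm) {
        try {
            return SecureRandom.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            return new SecureRandom();
        }
    }

    public static void main(String[] args) {
        SecureRandom rng = open("HWTRNG");
        byte[] buffer = new byte[16];
        rng.nextBytes(buffer);
        // Shows which algorithm actually served the request.
        System.out.println("Using algorithm: " + rng.getAlgorithm());
    }
}
```

Because the provider is never selected implicitly, existing applications keep
their current random source unless they ask for the new one by name.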
> It would be good to have the complete change in one webrev for easier reviewing.
Sure. Thanks for letting me know. Here is the new webrev:
http://cr.openjdk.java.net/~gromero/POWER9/darn/v3/webrev/
Thanks a lot.
Best regards,
Gustavo
> Thanks and best regards,
> Martin
>
>
> -----Original Message-----
> From: Gustavo Romero [mailto:gromero at linux.vnet.ibm.com]
> Sent: Montag, 2. April 2018 13:55
> To: Volker Simonis <volker.simonis at gmail.com>; Doerr, Martin <martin.doerr at sap.com>; vladimir.kozlov at oracle.com
> Cc: ppc-aix-port-dev at openjdk.java.net
> Subject: [RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64?
> Importance: High
>
> Hi Martin, Volker, Vladimir
>
> Sorry for the huge delay replying on this...
>
> I hope Martin (and Lutz) are feeling better and fully recovered.
>
> On 11/24/2017 08:04 PM, Volker Simonis wrote:
>> Hi Gustavo,
>>
>> in one of my talks [1,2] I have an example on how to intrinsify
>> Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But
>> please notice that this is just a "toy" example - it is not production
>> ready. In fact I think the right way would be to create a new
>> SecureRandom provider where you may implement "engineNextBytes" by
>> using the new Power instruction (maybe by calling a native function).
>> This special implementation of "engineNextBytes" could then be
>> intrinsified as described in the talk.
>>
>> Also, before you start this, please contact the security mailing list
>> just to make sure you're not going in the wrong direction (I'm not a
>> security expert :)
> I've created a new JCA provider called 'P9TRNG' and implemented a darn
> intrinsic for Interpreter, C1, and C2 compiler and did a couple of
> tests using micro benches [1, 2] to check the latency and throughput
> to get a random number using generateSeed() and nextBytes() with darn
> in place.
>
> The 'P9TRNG' provider is basically a copy of 'NativePRNG', since a
> software fallback is necessary in case the darn instruction fails to
> return a valid random number after ten attempts (although that is a
> very rare condition). Otherwise, 'P9TRNG' uses the darn intrinsic when
> it's available.
>
> The maximum theoretical throughput on the machine I'm testing on (a
> POWER9 witherspoon) is 128Mbps and there is one RNG per socket, so
> only one RNG per CPU. With simple C code it's possible to get very
> close to that value (please see the C code [3] for details and the
> log [4] for the expected outputs). Unrolling the tight loop does not
> help and causes a performance degradation.
>
> On HotSpot, for the Interpreter and C1 the throughput is ~3x higher
> than the version that does not use the darn instruction (using micro
> benches [1, 2]):
>
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG 1024 10
> 3.8759432E7 ns
> 2.113550 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG 1024 100000
> 2.65902244E10 ns
> 30.808313 Mbps
>
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100
> 7.1741008E7 ns
> 11.418853 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100000
> 2.74547937E10 ns
> 29.838140 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:TieredStopAtLevel=3 next_bytes NativePRNG 1024 100000
> 5.5632339E10 ns
> 14.725248 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:-TieredCompilation next_bytes NativePRNG 1024 100000
> 2.78629519E10 ns
> 29.401051 Mbps
>
> [With darn disabled: performance like NativePRNG]
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100
> 7.0272888E7 ns
> 11.657412 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100000
> 2.75566244E10 ns
> 29.727880 Mbps
> ...
>
> [With darn enabled]
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100
> 8305029.0 ns
> 98.639030 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100000
> 6.442112E9 ns
> 127.163261 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:TieredStopAtLevel=3 next_bytes P9TRNG 1024 100000
> 1.57303337E10 ns
> 52.077728 Mbps
> gromero at gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:-TieredCompilation next_bytes P9TRNG 1024 100000
> 6.46914E9 ns
> 126.631973 Mbps
>
>
> For the C2 compiler, using darn is better until it reaches ~128Mbps
> (the maximum theoretical throughput), but on the other hand it never
> blocks, so, for instance, generateSeed(), which normally uses
> /dev/random (blocking), is not affected by a lack of entropy in the
> Linux entropy pool.
>
> @Vladimir, Volker mentioned that you already experimented with 'rdrand'
> on Intel. Do you know if creating a new JCA provider as I did is a
> reasonable approach to exploit darn on POWER9? Also, in my
> implementation I had to create a VM intrinsic (_darn) in vmSymbols
> that is, let's say, arch-dependent, and that seems to be the only such
> case so far. On the other hand, a new JCA provider (with methods to be
> intrinsified) is necessary: I don't see another way to intrinsify the
> methods in the NativePRNG/SHA1PRNG providers, since I need a software
> fallback for darn. Do you have any recommendation about it?
>
> The patchset rebased on top of
> jdk11 (http://hg.openjdk.java.net/jdk/hs) is:
>
> http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/0_PPC64_Add_JCA_provider_to_exploit_HW_RNG_on_POWER9.patch
> http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/1_PPC64_Assembler_add_support_for_darn_Deliver_A_Random_Number_instruction.patch
> http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/2_PPC64_Interpreter_add_template_to_exploit_darn_instruction.patch
> http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/3_PPC64_C2_Compiler_add_new_node_to_exploit_darn_instruction.patch
> http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/4_PPC64_C1_Compiler_add_intrinsic_to_exploit_darn_instruction.patch
>
> I intend to contribute that change as an experimental feature.
>
> Thank you.
>
> Best regards,
> Gustavo
>
> [1] https://github.com/gromero/darn/blob/master/next_bytes.java
> [2] https://github.com/gromero/darn/blob/master/generate_seed.java
> [3] https://github.com/gromero/darn/blob/master/C/darn.c
> [4] https://github.com/gromero/darn/blob/master/C/darn.log
>