POWER9: Is there a way to improve the random number generation on PPC64?
Hi,
POWER9 processors introduced a new single instruction to generate a random number called 'darn' (Deliver A Random Number) [1, 2]. The random number generator behind this instruction is NIST SP800-90B and SP800-90C compliant and provides a minimum of 0.5 bits of entropy per bit. The instruction is as simple as "darn RT, L", where RT is a 64-bit general-purpose register and L is a 2-bit operand that selects the random number format. One can call 'darn' many times to obtain a new random number each time.
Initially I think it can help improve the throughput of the SecureRandom.generateSeed() method & friends from JCE (NativePRNG provider). If that holds, it has to be done for both the interpreter and the JIT.
Currently generateSeed() from NativePRNG basically reads from /dev/random by default (which blocks from time to time) or from /dev/urandom if instructed to do so. Could somebody please help me figure out the appropriate place to exploit such a P9 instruction in interpreted mode, given that the code for generateSeed() is pure Java and behind the scenes just opens the /dev/random file and reads from it? For instance, is it correct to exploit it in C/C++ code and attach that by means of JNI?
Finally, for JITed mode, I think a way to exploit such a feature would be to match a specific sub-tree in the Ideal Graph and emit a `darn` instruction from it. However, I could not figure out one sound sub-tree with known nodes (AddI, LoadN, Parm, etc.) that could be matched for that purpose. How do porters usually proceed in this case?
Any comments shedding some light on that are much appreciated.
Thanks and best regards,
Gustavo
[1] https://www.docdroid.net/tWT7hjD/powerisa-v30.pdf, p. 79
[2] https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
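The blocking versus non-blocking behaviour described above can be seen from the Java side with a small sketch. It assumes a Unix-like JDK 8 or later, where the `NativePRNGNonBlocking` algorithm (which seeds from /dev/urandom) is registered; the class and method names are made up for illustration, and the code falls back to the platform default if the algorithm is missing.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Illustrative only: class and method names are hypothetical.
final class SeedSourceSketch {
    // Prefer the NativePRNG variant that reads /dev/urandom for seeding;
    // fall back to the platform default where it is not registered.
    static SecureRandom nonBlockingIfAvailable() {
        try {
            return SecureRandom.getInstance("NativePRNGNonBlocking");
        } catch (NoSuchAlgorithmException e) {
            return new SecureRandom();
        }
    }

    public static void main(String[] args) {
        SecureRandom rng = nonBlockingIfAvailable();
        byte[] seed = rng.generateSeed(8);
        System.out.println(rng.getAlgorithm() + " produced " + seed.length + " seed bytes");
    }
}
```

This only changes which device file the existing provider reads; it does not touch the 'darn' instruction at all.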
Hi Gustavo,
in one of my talks [1,2] I have an example of how to intrinsify Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But please notice that this is just a "toy" example - it is not production ready. In fact I think the right way would be to create a new SecureRandom provider where you may implement "engineNextBytes" by using the new Power instruction (maybe by calling a native function). This special implementation of "engineNextBytes" could then be intrinsified as described in the talk.
Also, before you start this, please contact the security mailing list just to make sure you're not going in the wrong direction (I'm not a security expert :)
Regards,
Volker
[1] https://vimeo.com/182074382
[2] https://rawgit.com/simonis/JBreak2016/master/jbreak2016.xhtml#/
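The provider idea suggested above could look roughly like the following sketch. Everything named `Darn...` here is hypothetical, and the body of `engineNextBytes` uses an ordinary software `SecureRandom` as a stand-in, since plain Java cannot issue the 'darn' instruction; a real port would replace that stand-in with the native or intrinsified source. It assumes Java 9 or later for the `Provider(String, String, String)` constructor.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.SecureRandom;
import java.security.SecureRandomSpi;

// Illustrative sketch: provider, algorithm, and class names are hypothetical.
final class DarnProviderSketch {

    // SPI whose engineNextBytes() is the single place where a hardware-backed
    // (and later intrinsified) source would be plugged in.
    public static final class DarnSpi extends SecureRandomSpi {
        private static final long serialVersionUID = 1L;
        // Stand-in for the hardware source; a real port would not use this.
        private final SecureRandom software = new SecureRandom();

        public DarnSpi() { }

        @Override
        protected void engineSetSeed(byte[] seed) {
            // A hardware source takes no external seed; ignored in this sketch.
        }

        @Override
        protected void engineNextBytes(byte[] bytes) {
            software.nextBytes(bytes);
        }

        @Override
        protected byte[] engineGenerateSeed(int numBytes) {
            byte[] seed = new byte[numBytes];
            engineNextBytes(seed);
            return seed;
        }
    }

    // Minimal provider that maps the algorithm name to the SPI class.
    static final class DarnProvider extends Provider {
        private static final long serialVersionUID = 1L;

        DarnProvider() {
            super("DarnSketch", "1.0", "Sketch of a darn-backed SecureRandom");
            put("SecureRandom.DarnSketch", DarnSpi.class.getName());
        }
    }

    // Requests the algorithm directly from the provider object, so nothing
    // has to be registered in java.security for this sketch.
    static SecureRandom newInstance() {
        try {
            return SecureRandom.getInstance("DarnSketch", new DarnProvider());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Application code would then call `DarnProviderSketch.newInstance().nextBytes(buf)` and never see how the bytes are produced.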
Hi Volker,
On 24-11-2017 20:04, Volker Simonis wrote:
> in one of my talks [1,2] I have an example of how to intrinsify Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But please notice that this is just a "toy" example - it is not production ready. In fact I think the right way would be to create a new SecureRandom provider where you may implement "engineNextBytes" by using the new Power instruction (maybe by calling a native function). This special implementation of "engineNextBytes" could then be intrinsified as described in the talk.
Thanks for the references :-)
> Also, before you start this, please contact the security mailing list just to make sure you're not going in the wrong direction (I'm not a security expert :)
Sure. I just want to do a few experiments first to get at least an initial working "toy" example for 'darn'. The references you pointed out will help a lot. Thanks!
Regards,
Gustavo
Hi Volker,
On 24-11-2017 20:04, Volker Simonis wrote:
> in one of my talks [1,2] I have an example of how to intrinsify Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But please notice that this is just a "toy" example - it is not production ready. In fact I think the right way would be to create a new SecureRandom provider where you may implement "engineNextBytes" by using the new Power instruction (maybe by calling a native function). This special implementation of "engineNextBytes" could then be intrinsified as described in the talk.
I've implemented a simple interpreter intrinsic for 'darn' for a given class/method provided by the user, similarly to what you did for HelloWorld.sayHello() in your example. Thanks! I'm now looking for the correct way to call back from the intrinsic into a Java method that acts as a fallback, since the ISA says [1]:
"When the error value is obtained [i.e. 'darn' did not return a random number], software is expected to repeat the operation. If a non-error value has not been obtained after several attempts, a software random number generation method should be used. The recommended number of attempts may be implementation specific. In the absence of other guidance, ten attempts should be adequate."
and so I need to call back from the intrinsic to, let's say, the non-intrinsified SecureRandom.nextInt() method after about 10 failures to get a random number, so that it can take over the task again. You did something like that here:
https://github.com/simonis/JBreak2016/blob/master/examples/hs_patches/JBreak...
but for fputs() from libc.
Do you know if it's possible to call, for instance, a loaded method like SecureRandom.nextInt() from the intrinsic?
Thanks!
Regards,
Gustavo
On Wed, Nov 29, 2017 at 2:04 PM, Gustavo Romero <gromero@linux.vnet.ibm.com> wrote:
> Do you know if it's possible to call, for instance, a loaded method like SecureRandom.nextInt() from the intrinsic?
I don't think that would be easy to do (if possible at all).
The correct way to handle such situations would be to define a Java method with the exact semantics of your 'darn' instruction. All the other logic should be implemented in Java. So for example you would implement SecureRandom.darn() and call it from engineNextBytes(). At the call site of darn() you check the error value and dispatch to the corresponding Java implementation if necessary.
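The split described above (a thin method with the instruction's semantics, with retry and fallback logic kept in Java at the call site) can be sketched as follows. All names are hypothetical: the `darn` supplier stands in for the intrinsified method, the all-ones error value is an assumption taken from my reading of the ISA description, and the retry count of ten comes from the ISA text quoted earlier in the thread.

```java
import java.security.SecureRandom;
import java.util.function.LongSupplier;

// Illustrative sketch: all names here are hypothetical, not JDK API.
final class DarnWithFallback {
    // Error value assumed from the ISA description of 'darn' (all bits set).
    static final long DARN_ERROR = 0xFFFFFFFFFFFFFFFFL;
    // "ten attempts should be adequate" per the ISA text quoted in the thread.
    static final int MAX_ATTEMPTS = 10;

    private final LongSupplier darn;      // stands in for the intrinsified method
    private final LongSupplier fallback;  // pure-Java software generator

    DarnWithFallback(LongSupplier darn, LongSupplier fallback) {
        this.darn = darn;
        this.fallback = fallback;
    }

    // The call site checks the error value and dispatches to Java if needed.
    long nextLong() {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            long value = darn.getAsLong();
            if (value != DARN_ERROR) {
                return value;
            }
        }
        return fallback.getAsLong();
    }

    public static void main(String[] args) {
        SecureRandom software = new SecureRandom();
        // Simulate hardware that always reports the error value.
        DarnWithFallback rng = new DarnWithFallback(() -> DARN_ERROR, software::nextLong);
        System.out.println(Long.toHexString(rng.nextLong()));
    }
}
```

With this shape the intrinsic never has to call back into Java: it only returns a value, and the interpreter or JIT sees an ordinary Java loop around it.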
Hi Volker,
On 29-11-2017 11:21, Volker Simonis wrote:
> I don't think that would be easy to do (if possible at all).
> The correct way to handle such situations would be to define a Java method with the exact semantics of your 'darn' instruction. All the other logic should be implemented in Java. So for example you would implement SecureRandom.darn() and call it from engineNextBytes(). At the call site of darn() you check the error value and dispatch to the corresponding Java implementation if necessary.
I see. Thanks for advising.
Hi Volker,
On 29-11-2017 11:21, Volker Simonis wrote:
> The correct way to handle such situations would be to define a Java method with the exact semantics of your 'darn' instruction. All the other logic should be implemented in Java. So for example you would implement SecureRandom.darn() and call it from engineNextBytes(). At the call site of darn() you check the error value and dispatch to the corresponding Java implementation if necessary.
I've implemented a Java SecureRandom.darn() method [1]. It works as expected, i.e. it returns 8 bytes of fake random number (using the example in [3]). However, when I proceeded to intrinsify it [2, 0], as I did for the method provided by the user (similarly to your HelloWorld example and to the user-provided darn() method I mentioned previously), I hit the following check:
Compiler intrinsic is defined for method [_darn: static SecureRandom.darn()[B], but the method is not available in class [java/security/SecureRandom]. Exiting.
The SecureRandom.darn() signature looks correct, and I know that java/security/SecureRandom::darn() is present in the core libs because it worked fine before I tried to intrinsify it (I got the 8 bytes of fake random number using darning.java, listed in the references below), and 'javap' also shows it is in the .class file:
gromero@gromero16:~/hg/jdk10/hs$ javap -c -s ./build/linux-ppc64le-normal-server-slowdebug/jdk/modules/java.base/java/security/SecureRandom.class | fgrep -i darn
public byte[] darn();
I thought that no additional hack was necessary to get that intrinsic working since it's in the core libs, hence nothing like this is needed:
https://github.com/simonis/JBreak2016/blob/master/examples/hs_patches/JBreak...
On the other hand, if I add @HotSpotIntrinsicCandidate to SecureRandom.darn() I get:
Method [java.security.SecureRandom.darn()[B] is annotated with @HotSpotIntrinsicCandidate, but no compiler intrinsic is defined for the method. Exiting.
Any clue about what I'm missing? Is it correct to assume that, since darn() is now in the core libs, no check is necessary?
Thanks a lot!
Regards,
Gustavo
The hs patches:
[0] https://github.com/gromero/darn/blob/master/patches/0_darn_macroassembler.pa...
[1] https://github.com/gromero/darn/blob/master/patches/1_SecureRandom_darn_Java...
[2] https://github.com/gromero/darn/blob/master/patches/2_SecureRandom_darn_intr...
and the test-case:
[3] https://github.com/gromero/darn/blob/master/patches/darning.java
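The `()[B` that appears in both error messages is the JVM method descriptor: no arguments, returning a byte array. When checking whether a declared method and an intrinsic definition agree on the signature, the descriptor can be printed from plain Java. The helper below is a hypothetical illustration using the standard `java.lang.invoke.MethodType` API.

```java
import java.lang.invoke.MethodType;

// Illustrative helper; the class name is hypothetical.
final class DescriptorSketch {
    // JVM descriptor for a no-argument method with the given return type.
    static String noArgDescriptor(Class<?> returnType) {
        return MethodType.methodType(returnType).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        System.out.println(noArgDescriptor(byte[].class)); // ()[B
        System.out.println(noArgDescriptor(long.class));   // ()J
    }
}
```

The descriptor covers only parameter and return types; modifiers such as `static` are not part of it and have to be compared separately.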
The solution is quite simple (though it took me a while to see it as well :) - you have to define the intrinsic with the correct access flags, i.e.
do_intrinsic(_darn, securerandom, darn_name, serializePropertiesToByteArray_signature, F_R)
instead of using "F_S" as you did in your first attempt. The access flags are defined at the bottom of vmSymbols.hpp:
F_R, // !static ?native !synchronized (R="regular")
F_S, // static ?native !synchronized
If they don't correspond to the actual function, it won't be matched.
If you do this right you'll get the following error:
Compiler intrinsic is defined for method [java.security.SecureRandom.darn()[B], but the method is not annotated with @HotSpotIntrinsicCandidate. Exiting.
which can be pacified by using "-XX:-CheckIntrinsics" or, better, by using the @HotSpotIntrinsicCandidate annotation on SecureRandom.darn().
Notice that in the real implementation you won't be able to add a public method to SecureRandom.
Regards,
Volker
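The flag mismatch discussed above comes down to whether the Java method is declared `static`. A quick way to check that from Java is reflection, as in the sketch below. The `Example` class and the `flagFor` helper are hypothetical, and the mapping deliberately covers only the static versus non-static distinction between F_S and F_R; it ignores `native` and `synchronized`, which the real flag set also takes into account.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Illustrative sketch: class, method, and helper names are hypothetical.
final class IntrinsicFlagSketch {
    // Stand-in for a class declaring an instance method like darn().
    static class Example {
        public byte[] darn() { return new byte[8]; }
        public static byte[] staticDarn() { return new byte[8]; }
    }

    // Returns the flag name a no-argument method would need, looking only at
    // whether it is static (F_S) or not (F_R).
    static String flagFor(Class<?> owner, String methodName) {
        try {
            Method m = owner.getDeclaredMethod(methodName);
            return Modifier.isStatic(m.getModifiers()) ? "F_S" : "F_R";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(flagFor(Example.class, "darn"));       // F_R
        System.out.println(flagFor(Example.class, "staticDarn")); // F_S
    }
}
```

An instance method declared as `public byte[] darn()` therefore lines up with F_R, which matches the javap output shown earlier in the thread.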
Hi Volker,
On 04-12-2017 12:00, Volker Simonis wrote:
> The solution is quite simple (though it took me a while to see it as well :) - you have to define the intrinsic with the correct access flags.
Pretty cool! Thanks a lot for the help and the quick answer :-)
> If you do this right you'll get the following error:
> Compiler intrinsic is defined for method [java.security.SecureRandom.darn()[B], but the method is not annotated with @HotSpotIntrinsicCandidate. Exiting.
> which can be pacified by using "-XX:-CheckIntrinsics" or, better, by using the @HotSpotIntrinsicCandidate annotation on SecureRandom.darn().
Right. I also had to tweak the darn() intrinsic to return a 'long' instead of a byte array, just to quickly avoid dealing with any allocation inside it. The prototype worked fine in interpreted mode. As expected, it generated an illegal instruction on P8 but returned 50 random numbers on P9 without getting blocked :-)
So test [0], which uses SecureRandom.generateSeed(), blocks from time to time (on both BM and VM it always blocks on a second run), but test [1], which uses SecureRandom.darn(), never blocks:
http://cr.openjdk.java.net/~gromero/logs/darn_prototype.log
Here is the final change for reference (hs + corelib):
https://github.com/gromero/darn/blob/a8ea0802018b0c4de946b548d4ede4bfc5fb645...
I intend to implement the fallback now and run it against the DayTrade7 bench. If you have any other idea on how to test it, please let me know.
> Notice that in the real implementation you won't be able to add a public method to SecureRandom.
Yup, I'm aware of it. Initially I thought I could keep all the changes in arch-specific files, but due to the need to fall back to a Java method if the 'darn' intrinsic fails, I understand that there is no way to avoid touching .java files. In that case, is your suggestion to create an entirely new provider by adding a new one to ./java.base/share/classes/com/sun/crypto/provider and listing it in java.security, for instance?
Regards,
Gustavo
[0] https://github.com/gromero/darn/blob/eee8f0a480d7fd5cf6a307d3e7520e867d784ba...
[1] https://github.com/gromero/darn/blob/0591eaf338664222c2cd26188d56fdb5a56883e...
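If the intrinsified method returns a `long` rather than a byte array, as described above, the byte-array interface that SecureRandom callers expect has to be rebuilt on the Java side. The sketch below shows one way to do that; the class name and the `LongSupplier` standing in for darn() are hypothetical, and the big-endian byte order is an arbitrary choice for illustration.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

// Illustrative sketch: names are hypothetical, not part of any JDK API.
final class LongsToBytesSketch {
    // Fills 'out' from 64-bit words, most significant byte first. A partial
    // tail consumes one extra word and discards its unused bytes.
    static void fill(byte[] out, LongSupplier source) {
        int i = 0;
        while (i < out.length) {
            long word = source.getAsLong();
            for (int shift = 56; shift >= 0 && i < out.length; shift -= 8) {
                out[i++] = (byte) (word >>> shift);
            }
        }
    }

    public static void main(String[] args) {
        byte[] out = new byte[10];
        // Fixed word instead of a random one so the byte layout is visible.
        fill(out, () -> 0x0102030405060708L);
        System.out.println(Arrays.toString(out)); // [1, 2, 3, 4, 5, 6, 7, 8, 1, 2]
    }
}
```

All allocation stays in Java here, so the intrinsic itself only has to move a register value into the return slot.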
On Tue, Dec 5, 2017 at 12:24 AM, Gustavo Romero <gromero@linux.vnet.ibm.com> wrote:
I intend to implement the fallback now and run it against the DayTrade7 bench. If you have any other idea on how to test it, please let me know.
What is "DayTrade7 bench"? I don't know it, and a quick Google search didn't return anything useful.
Notice that in the real implementation you won't be able to add a public method to SecureRandom.
Yup, I'm aware of it. Initially I thought I could keep all the changes in arch-specific files, but due to the need to fall back to a Java method if the 'darn' intrinsic fails, I understand that there is no way to avoid touching .java files. In that case, is your suggestion to create an entirely new provider by adding a new one to ./java.base/share/classes/com/sun/crypto/provider and listing it in java.security, for instance?
Probably yes, but I'm not sure about it either. I think once you have a complete implementation you should start a new thread on the security mailing list (and maybe CC hotspot-dev) to ask for the experts' opinions. As Intel has had the similar 'rdrand' instruction for quite some time, it may be reasonable to create a special provider which is intended to intrinsically use the native CPU instructions if available and fall back to the default implementation otherwise. I think Vladimir Kozlov from the HotSpot team tried to build something similar for 'rdrand' some time ago, so I'm sure you'll get some good comments and advice :)
Regards, Volker
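A rough Java sketch of the "special provider" idea raised in this reply, written under stated assumptions rather than as a proposal for the real implementation. HwRngDemo, HwRngSpi, HwRngProvider, the algorithm name "HwRng", and the hardwareAvailable() probe are all hypothetical. The probe always returns false so that the sketch runs anywhere; it stands in for whatever check a real port would use before calling the intrinsified CPU instruction.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.SecureRandom;
import java.security.SecureRandomSpi;

public class HwRngDemo {
    // Hypothetical SPI: would use the CPU instruction when present, else delegate.
    static final class HwRngSpi extends SecureRandomSpi {
        private final SecureRandom fallback = new SecureRandom(); // default implementation

        // Placeholder probe; always false in this sketch.
        static boolean hardwareAvailable() { return false; }

        @Override protected void engineSetSeed(byte[] seed) { fallback.setSeed(seed); }

        @Override protected void engineNextBytes(byte[] bytes) {
            if (hardwareAvailable()) {
                throw new UnsupportedOperationException("hardware path not sketched here");
            }
            fallback.nextBytes(bytes); // fall back to the default implementation
        }

        @Override protected byte[] engineGenerateSeed(int numBytes) {
            return fallback.generateSeed(numBytes);
        }
    }

    // Hypothetical provider that registers the SPI under the name "HwRng".
    static final class HwRngProvider extends Provider {
        HwRngProvider() {
            super("HwRngDemo", "1.0", "hypothetical hardware-RNG provider sketch");
            putService(new Provider.Service(this, "SecureRandom", "HwRng",
                                            HwRngSpi.class.getName(), null, null) {
                @Override public Object newInstance(Object constructorParameter) {
                    return new HwRngSpi(); // construct directly instead of via reflection
                }
            });
        }
    }

    public static SecureRandom open() {
        try {
            return SecureRandom.getInstance("HwRng", new HwRngProvider());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecureRandom sr = open();
        byte[] buf = new byte[16];
        sr.nextBytes(buf);
        System.out.println(sr.getAlgorithm() + " from " + sr.getProvider().getName());
    }
}
```

The provider is passed to getInstance() directly, so nothing has to be listed in java.security for this sketch to run. Whether a real provider should live in java.base and how it should be registered is exactly the question for the security mailing list.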
Hi Volker,
On 05-12-2017 06:16, Volker Simonis wrote:
I intend to implement the fallback now and run it against the DayTrade7 bench. If you have any other idea on how to test it, please let me know.
What is "DayTrade7 bench"? I don't know it, and a quick Google search didn't return anything useful.
I have never used it, but it was suggested to me that DayTrade7 (https://github.com/WASdev/sample.daytrader7) with security enabled (GCM128, for instance) will spend more than half the time on crypto and will stress the seed generator. I'm not sure why the repository name has "sample" in it, though. The README.md says "This sample contains the DayTrader 7 benchmark [...]", so I'm hoping it contains the complete benchmark...
Notice that in the real implementation you won't be able to add a public method to SecureRandom.
Yup, I'm aware of it. Initially I thought I could keep all the changes in arch-specific files, but due to the need to fall back to a Java method if the 'darn' intrinsic fails, I understand that there is no way to avoid touching .java files. In that case, is your suggestion to create an entirely new provider by adding a new one to ./java.base/share/classes/com/sun/crypto/provider and listing it in java.security, for instance?
Probably yes, but I'm not sure about it either. I think once you have a complete implementation you should start a new thread on the security mailing list (and maybe CC hotspot-dev) to ask for the experts' opinions. As Intel has had the similar 'rdrand' instruction for quite some time, it may be reasonable to create a special provider which is intended to intrinsically use the native CPU instructions if available and fall back to the default implementation otherwise. I think Vladimir Kozlov from the HotSpot team tried to build something similar for 'rdrand' some time ago, so I'm sure you'll get some good comments and advice :)
OK. I'll complete the implementation, adding the fallback and the JIT support, and start a new thread asking about it on the security ML. Looks like the 'rdrand' instruction is not exploited on Intel? Interesting... I'll CC Vladimir as well.
Thanks Volker!
Regards, Gustavo
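The fallback mentioned above (retry 'darn' a few times, then hand over to a software generator, with the decision made in Java at the call site) can be sketched in plain Java as follows. This is only an illustration under assumptions: DarnFallback and its members are hypothetical names, the hardware call is simulated by a LongSupplier, the limit of ten attempts follows the ISA guidance quoted earlier in the thread, and the error value is assumed to be all ones (-1L as a Java long), which should be checked against the ISA text.

```java
import java.security.SecureRandom;
import java.util.function.LongSupplier;

public class DarnFallback {
    // Assumed error value: all ones. Verify against the ISA before relying on it.
    static final long DARN_ERROR = -1L;
    // "ten attempts should be adequate", per the ISA text quoted in the thread.
    static final int MAX_ATTEMPTS = 10;

    // 'hw' stands in for the intrinsified method with the exact semantics of 'darn';
    // 'sw' is the software generator that takes over after repeated failures.
    public static long nextLong(LongSupplier hw, LongSupplier sw) {
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            long v = hw.getAsLong();
            if (v != DARN_ERROR) {
                return v; // hardware delivered a value
            }
        }
        return sw.getAsLong(); // hardware kept failing: use the software method
    }

    public static void main(String[] args) {
        SecureRandom software = new SecureRandom();
        LongSupplier alwaysFailing = () -> DARN_ERROR; // simulates a failing 'darn'
        System.out.println(nextLong(alwaysFailing, software::nextLong));
    }
}
```

Keeping the retry loop and the dispatch in Java means the intrinsic itself only has to mirror the instruction, which is the split suggested earlier in the thread.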
Hello Volker,
I finished a v1 random implementation for the Interpreter and the C2 compiler. However, I'm struggling a bit with the C1 implementation... There is probably something wrong with my new LIR node for random. At runtime, C1 Linear Scan hits the following assert():
.../hs/src/hotspot/share/c1/c1_LinearScan.cpp:855)da, pid=13334, tid=13382
assert(false) failed: live_in set of first block must be empty
Error: live_in set of first block must be empty (when this fails, virtual registers are used before they are defined)
affected registers: 262
* vreg 262 (HIR instruction l68) used in block B3
When I inspect Block 3 it shows as:
B3 [24, 47] preds: B2 sux: B1
__id_Instruction___________________________________________
254 label [label:0x0000712c20027f80]
256 null_check [R252|L] [bci:25]
258 move [R252|L] [R261|L]
260 profile_call main.seed_darn @ 25 [R259|L] [R261|L] [R260|J]
262 random [R262|J] <==========================
264 null_check [R253|L] [bci:32]
266 move [R253|L] [R265|L]
268 profile_call main.seed_darn @ 32 [R263|L] [R265|L] [R264|J] 40
270 move [int:0|I] [R4|I]
272 move [R262|J] [R5R5|J]
274 move [R253|L] [R3|L]
276 icvirtual call: [addr: 0x0000000000000000] [recv: [R3|L]] [result: [R3|L]] [bci:32]
278 move [R3|L] [R266|L]
280 move [obj:0x0000712bec000e40|L] [R267|L]
282 move [Base:[R26797|L] Disp: 116|L] [R268|L]
284 null_check [R268|L] [bci:41]
286 move [R268|L] [R271|L]
288 profile_call main.seed_darn @ 41 [R269|L] [R271|L] [R270|J]
290 move [obj:0x0000712bec000e48|L] [R4|L]
292 move [R268|L] [R3|L]
294 optvirtual call: [addr: 0x0000000000000000] [recv: [R3e0|L]] [bci:41]
296 move [R254|I] [R274|I]
298 move [R253|L] [R273|L]
300 move [R252|L] [R272|L]
302 move [int:0|I] [R275|I]
304 branch [AL] [B1]
I mapped the intrinsic to do_Random() (please find the full diff here [1]):
--- a/src/hotspot/share/c1/c1_LIRGenerator.cpp Tue Jan 23 10:52:33 2018 -0600
+++ b/src/hotspot/share/c1/c1_LIRGenerator.cpp Tue Jan 30 23:21:24 2018 -0600
@@ -3215,6 +3215,8 @@
   case vmIntrinsics::_fmaD: do_FmaIntrinsic(x); break;
   case vmIntrinsics::_fmaF: do_FmaIntrinsic(x); break;
+  case vmIntrinsics::_darn: do_Random(x); break;
+
--- a/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp Tue Jan 23 10:52:33 2018 -0600
+++ b/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp Tue Jan 30 23:21:24 2018 -0600
@@ -1531,6 +1531,12 @@
 }
 }
+void LIRGenerator::do_Random(Intrinsic* x) {
+
+  LIR_Opr result = rlock_result(x);
+  __ rng(result);
+}
+
...and I expected that all allocation for the vreg would be done by rlock_result(). Besides that, I assumed that lir_random is a LIR_Op1, since there is no input and 1 output (result). Have you ever encountered that error when implementing a new LIR instruction?
Best regards, Gustavo
[1] http://cr.openjdk.java.net/~gromero/misc/darn_C1.diff
On 12/05/2017 10:03 AM, Gustavo Romero wrote:
Hi Volker,
On 05-12-2017 06:16, Volker Simonis wrote:
I intend to implement the fallback now and run it against DayTrade7 bench, if you have any other idea on how to test it, please let me know.
What is "DayTrade7 bench" ? I don't know it and a quick Google search didn't returned anything useful.
I have never used it, but it was suggested to me that DayTrade7 (https://github.com/WASdev/sample.daytrader7) with security enabled (GCM128, for instance) will spend more than half the type on crypto and will stress the seed generator. But I'm not sure why the "sample" on it. The README.md says "This sample contains the DayTrader 7 benchmark [...]", so I'm hoping it contains the complete benchmark...
Notice that in the real implementation you won't be able to add a public method to SecureRandom.
Yup, I'm aware of it. Initially I thought I could keep all the changes in arch-specific files but due to the need to fallback to a Java method if 'darn' intrinsic fails I understand that there is no way to not touch .java files. In that case your suggestion is to create an entire new provider by adding a new on to ./java.base/share/classes/com/sun/crypto/provider and listing it in java.security, for instance?
Probably yes, but I'm not sure about it as well. I think once you have a complete implementation you should start a new thread on the security mailing list (and maybe CC hostspot-dev) to ask about the expert's opinions. As Intel also has the similar 'randr' instruction since quite some time it may be reasonable to create a special provider which is intended to intrinsically use the native CPU instructions if available and fall back to the default implementation otherwise. I think Vladimir Kozlov from the HotSpot team has tried to build something similar for 'randr' some time ago so I'm sure you'll get some good comments and advices :)
OK. I'll complete the implementation adding the fallback and the JIT and start a new thread asking about it on the security ML. Looks like 'rdrand|randr' instruction is not exploited on Intel? Interesting... I'll CC Vladimir as well.
Thanks Volker!
Regards, Gustavo
[0] https://github.com/gromero/darn/blob/eee8f0a480d7fd5cf6a307d3e7520e867d784ba... [1] https://github.com/gromero/darn/blob/0591eaf338664222c2cd26188d56fdb5a56883e...
Regards, Volker
On Fri, Dec 1, 2017 at 10:44 PM, Gustavo Romero <gromero@linux.vnet.ibm.com> wrote:
Hi Volker,
On 29-11-2017 11:21, Volker Simonis wrote:
On Wed, Nov 29, 2017 at 2:04 PM, Gustavo Romero <gromero@linux.vnet.ibm.com> wrote: > Hi Volker, > > On 24-11-2017 20:04, Volker Simonis wrote: >> in one of my talks [1,2] I have an example on how to intrinsify >> Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But >> please notice that this is just a "toy" example - it is not production >> ready. In fact I think the right way would be to create a new >> SecureRandom provider where you may implement "engineNextBytes" by >> using the new Power instruction (maybe by calling a native function). >> This special implementation of "engineNextBytes" could then be >> intrinsified as described in the talk. > > I've implemented a simple interpreter intrinsic for 'darn' for a given > class/method provided by the user, similarly to what you did for > Helloword.sayHello() in your example. Thanks! I'm now looking for the correct > way to call back from the intrinsic a Java method to act as a fallback method, > since ISA says [1]: > > When the error value is obtained [i.e. 'darn' did not return a random number], > software is expected to repeat the operation. If a non-error value has not been > obtained after several attempts, a software random number generation method > should be used. The recommended number of attempts may be implementation > specific. In the absence of other guidance, ten attempts should be adequate. > > and so I need to call back from the intrinsic, let's say, SecureRandom.netInt() > non-intrinsified method after about 10 failures to get the random number so it > can take over the task again. You did something like that here: > > https://github.com/simonis/JBreak2016/blob/master/examples/hs_patches/JBreak... > > but for fputs() from libc. > > Do you know if it's possible to call, for instance, a loaded method like > SecureRandom.nextInt() from the instrinsic? >
I don't think that would be easy to do (if possible at all).
The correct way to handle such situations would be to define a Java method with the exact semantics of your 'darn' instruction. All the other logic should be implemented in Java. So for example you would implement SecureRandom.darn() and call it from engineNextBytes(). At the call site of darn() you check the error value and dispatch to the corresponding Java implementation if necessary.
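The dispatch described above can be sketched in plain Java. This is only an illustration under stated assumptions: the class and member names are hypothetical, the hardware call is represented by a LongSupplier stand-in rather than a real intrinsic, and the all-ones error value is taken from a reading of Power ISA 3.0 that should be checked against the manual. The retry count of ten comes from the ISA text quoted earlier in the thread.

```java
import java.security.SecureRandom;
import java.util.function.LongSupplier;

/** Sketch only: retry a hardware source, then fall back to a software generator. */
final class DarnWithFallback {
    // Assumption: an all-ones result (0xFFFFFFFFFFFFFFFF) is the darn error value.
    static final long DARN_ERROR = -1L;
    // "In the absence of other guidance, ten attempts should be adequate."
    static final int MAX_ATTEMPTS = 10;

    private final LongSupplier hardware;                      // stand-in for an intrinsified darn()
    private final SecureRandom software = new SecureRandom(); // software fallback

    DarnWithFallback(LongSupplier hardware) {
        this.hardware = hardware;
    }

    /** Returns a hardware value if one arrives within MAX_ATTEMPTS, else a software value. */
    long nextLong() {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            long value = hardware.getAsLong();
            if (value != DARN_ERROR) {
                return value;
            }
        }
        return software.nextLong();
    }
}
```

Keeping the retry loop and the fallback on the Java side means the intrinsic only has to model the single instruction, which is the split suggested above.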
I've implemented a Java SecureRandom.darn() method [1]. It works as expected, i.e. it returns 8 bytes of fake random data (using the example in [3]). However, when I proceeded to intrinsify it [2, 0], as I did for the user-provided method (similarly to your HelloWorld example, and for a user-provided darn() method as I mentioned previously), I hit the following check:
Compiler intrinsic is defined for method [_darn: static SecureRandom.darn()[B], but the method is not available in class [java/security/SecureRandom]. Exiting.
SecureRandom.darn() signature looks correct and I know that java/security/SecureRandom::darn() is present in core libs because before trying to intrinsify it worked ok (I've got the 8 bytes of fake random number - using darning.java, below in references) and also 'javap' shows it's in .class:
gromero@gromero16:~/hg/jdk10/hs$ javap -c -s ./build/linux-ppc64le-normal-server-slowdebug/jdk/modules/java.base/java/security/SecureRandom.class | fgrep -i darn
  public byte[] darn();
I thought that no additional hack was necessary to get that intrinsic working as it's in core libs, hence nothing like this is needed:
https://github.com/simonis/JBreak2016/blob/master/examples/hs_patches/JBreak...
On the other hand if I add @HotSpotIntrinsicCandidate to SecureRandom.darn() I get:
Method [java.security.SecureRandom.darn()[B] is annotated with @HotSpotIntrinsicCandidate, but no compiler intrinsic is defined for the method. Exiting.
Any clue on what I'm missing? Is it correct to assume that since darn() now is in core libs no check is necessary?
Thanks a lot!
Regards, Gustavo
The hs patches:
[0] https://github.com/gromero/darn/blob/master/patches/0_darn_macroassembler.pa...
[1] https://github.com/gromero/darn/blob/master/patches/1_SecureRandom_darn_Java...
[2] https://github.com/gromero/darn/blob/master/patches/2_SecureRandom_darn_intr...
and the test-case:
[3] https://github.com/gromero/darn/blob/master/patches/darning.java
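A possible lead, not confirmed anywhere in this thread: the check message names the intrinsic as "static SecureRandom.darn()[B", while the javap output shows "public byte[] darn();", i.e. an instance method. If the intrinsic lookup takes the static/instance flag into account, that mismatch might explain both messages. A small reflection check, using hypothetical helper names and a stand-in class rather than the patched SecureRandom, shows how to confirm which kind a method is:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Sketch only: report whether a no-argument method is static or an instance method. */
final class IntrinsicFlagCheck {

    static String kindOf(Class<?> klass, String name) {
        try {
            Method m = klass.getDeclaredMethod(name);
            return Modifier.isStatic(m.getModifiers()) ? "static" : "instance";
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    /** Hypothetical stand-in: one instance and one static variant of the same method. */
    static final class Example {
        public byte[] darn() { return new byte[8]; }
        public static byte[] darnStatic() { return new byte[8]; }
    }

    public static void main(String[] args) {
        System.out.println("darn: " + kindOf(Example.class, "darn"));             // prints "darn: instance"
        System.out.println("darnStatic: " + kindOf(Example.class, "darnStatic")); // prints "darnStatic: static"
    }
}
```

If the two sides disagree, either the Java method or the intrinsic declaration would need to change so that both describe the same kind of method.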
Hi Gustavo,

I think lir_random needs to be modelled as LIR_Op0 (i.e. 0 input operands) like e.g. lir_get_thread.

Best regards,
Martin

-----Original Message-----
From: ppc-aix-port-dev [mailto:ppc-aix-port-dev-bounces@openjdk.java.net] On Behalf Of Gustavo Romero
Sent: Wednesday, 31 January 2018 06:47
To: Volker Simonis <volker.simonis@gmail.com>
Cc: ppc-aix-port-dev@openjdk.java.net
Subject: Re: POWER9: Is there a way to improve the random number generation on PPC64?

Hello Volker,

I finished a v1 random implementation for the Interpreter and the C2 compiler. However, I'm struggling a bit with the C1 implementation...

There is probably something wrong with my new LIR node for random. At runtime, C1 Linear Scan hits the following assert():

.../hs/src/hotspot/share/c1/c1_LinearScan.cpp:855)da, pid=13334, tid=13382
assert(false) failed: live_in set of first block must be empty

Error: live_in set of first block must be empty (when this fails, virtual registers are used before they are defined)
affected registers: 262
* vreg 262 (HIR instruction l68) used in block B3

When I inspect Block 3 it shows as:

B3 [24, 47] preds: B2 sux: B1
__id_Instruction___________________________________________
 254 label [label:0x0000712c20027f80]
 256 null_check [R252|L] [bci:25]
 258 move [R252|L] [R261|L]
 260 profile_call main.seed_darn @ 25 [R259|L] [R261|L] [R260|J]
 262 random [R262|J] <==========================
 264 null_check [R253|L] [bci:32]
 266 move [R253|L] [R265|L]
 268 profile_call main.seed_darn @ 32 [R263|L] [R265|L] [R264|J] 40
 270 move [int:0|I] [R4|I]
 272 move [R262|J] [R5R5|J]
 274 move [R253|L] [R3|L]
 276 icvirtual call: [addr: 0x0000000000000000] [recv: [R3|L]] [result: [R3|L]] [bci:32]
 278 move [R3|L] [R266|L]
 280 move [obj:0x0000712bec000e40|L] [R267|L]
 282 move [Base:[R26797|L] Disp: 116|L] [R268|L]
 284 null_check [R268|L] [bci:41]
 286 move [R268|L] [R271|L]
 288 profile_call main.seed_darn @ 41 [R269|L] [R271|L] [R270|J]
 290 move [obj:0x0000712bec000e48|L] [R4|L]
 292 move [R268|L] [R3|L]
 294 optvirtual call: [addr: 0x0000000000000000] [recv: [R3e0|L]] [bci:41]
 296 move [R254|I] [R274|I]
 298 move [R253|L] [R273|L]
 300 move [R252|L] [R272|L]
 302 move [int:0|I] [R275|I]
 304 branch [AL] [B1]

I mapped the intrinsic to do_Random() (please find the full diff here [1]):

--- a/src/hotspot/share/c1/c1_LIRGenerator.cpp Tue Jan 23 10:52:33 2018 -0600
+++ b/src/hotspot/share/c1/c1_LIRGenerator.cpp Tue Jan 30 23:21:24 2018 -0600
@@ -3215,6 +3215,8 @@
   case vmIntrinsics::_fmaD: do_FmaIntrinsic(x); break;
   case vmIntrinsics::_fmaF: do_FmaIntrinsic(x); break;

+  case vmIntrinsics::_darn: do_Random(x); break;
+

--- a/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp Tue Jan 23 10:52:33 2018 -0600
+++ b/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp Tue Jan 30 23:21:24 2018 -0600
@@ -1531,6 +1531,12 @@
   }
 }

+void LIRGenerator::do_Random(Intrinsic* x) {
+
+  LIR_Opr result = rlock_result(x);
+  __ rng(result);
+}
+

...and I expected that all allocation for the vreg would be done by rlock_result().

Besides that, I assumed that lir_random is LIR_Op1 since there is no input and 1 output (result).

Have you ever encountered that error when implementing a new LIR instruction?

Best regards,
Gustavo

[1] http://cr.openjdk.java.net/~gromero/misc/darn_C1.diff

On 12/05/2017 10:03 AM, Gustavo Romero wrote:
Hi Volker,
On 05-12-2017 06:16, Volker Simonis wrote:
I intend to implement the fallback now and run it against DayTrade7 bench. If you have any other idea on how to test it, please let me know.
What is "DayTrade7 bench"? I don't know it and a quick Google search didn't return anything useful.
I have never used it, but it was suggested to me that DayTrade7 (https://github.com/WASdev/sample.daytrader7) with security enabled (GCM128, for instance) will spend more than half the time on crypto and will stress the seed generator. But I'm not sure why the "sample" is in its name. The README.md says "This sample contains the DayTrader 7 benchmark [...]", so I'm hoping it contains the complete benchmark...
Notice that in the real implementation you won't be able to add a public method to SecureRandom.
Yup, I'm aware of it. Initially I thought I could keep all the changes in arch-specific files, but due to the need to fall back to a Java method if the 'darn' intrinsic fails, I understand that there is no way to avoid touching .java files. In that case, is your suggestion to create an entirely new provider by adding a new one to ./java.base/share/classes/com/sun/crypto/provider and listing it in java.security, for instance?
Probably yes, but I'm not sure about it either. I think once you have a complete implementation you should start a new thread on the security mailing list (and maybe CC hotspot-dev) to ask for the experts' opinions. As Intel has also had the similar 'rdrand' instruction for quite some time, it may be reasonable to create a special provider which is intended to intrinsically use the native CPU instructions if available and fall back to the default implementation otherwise. I think Vladimir Kozlov from the HotSpot team tried to build something similar for 'rdrand' some time ago, so I'm sure you'll get some good comments and advice :)
OK. I'll complete the implementation adding the fallback and the JIT and start a new thread asking about it on the security ML. Looks like 'rdrand|randr' instruction is not exploited on Intel? Interesting... I'll CC Vladimir as well.
Thanks Volker!
Regards, Gustavo
[0] https://github.com/gromero/darn/blob/eee8f0a480d7fd5cf6a307d3e7520e867d784ba...
[1] https://github.com/gromero/darn/blob/0591eaf338664222c2cd26188d56fdb5a56883e...
Regards, Volker
Hi Martin, On 01/31/2018 12:00 PM, Doerr, Martin wrote:
I think lir_random needs to be modelled as LIR_Op0 (i.e. 0 input operands) like e.g. lir_get_thread.
Great. I'm trying it as LIR_Op0. Indeed it was a major question I had (Op0 or Op1). Thank you. Best regards, Gustavo
Hi Martin, On 01/31/2018 12:00 PM, Doerr, Martin wrote:
I think lir_random needs to be modelled as LIR_Op0 (i.e. 0 input operands) like e.g. lir_get_thread.
Modeling as a LIR_Op0 fixed the error [1]. I'll prepare a final version, do a few more tests, and start a discussion about using 'darn' for SecureRandom on POWER9 on the security ML as Volker suggested.

Thanks a lot, Martin.

Best regards,
Gustavo

[1] http://cr.openjdk.java.net/~gromero/misc/darn_C1_v2.patch
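One plausible reading of the assert, offered as a toy model and not as HotSpot code: if the only operand of the new instruction sits in an input slot, a liveness pass sees virtual register 262 being read before anything writes it, which matches the "used before they are defined" message. Placing the operand in the result slot makes the instruction a definition instead. The Java sketch below uses hypothetical names (ToyLiveness, Op) and a deliberately simplified straight-line block:

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: finds registers that are read before being written in one block. */
final class ToyLiveness {
    static final int NONE = -1;

    /** Toy instruction: the register it writes (NONE if none) and the registers it reads. */
    static final class Op {
        final String name;
        final int result;
        final int[] inputs;

        Op(String name, int result, int... inputs) {
            this.name = name;
            this.result = result;
            this.inputs = inputs;
        }
    }

    static Set<Integer> usedBeforeDefined(List<Op> block) {
        Set<Integer> defined = new LinkedHashSet<>();
        Set<Integer> liveIn = new LinkedHashSet<>();
        for (Op op : block) {
            for (int in : op.inputs) {
                if (!defined.contains(in)) {
                    liveIn.add(in); // read with no earlier write in this block
                }
            }
            if (op.result != NONE) {
                defined.add(op.result);
            }
        }
        return liveIn;
    }

    /** Operand 262 placed in the input slot: it looks like a use. */
    static List<Op> modelledAsInput() {
        return Arrays.asList(new Op("random", NONE, 262), new Op("move", 5, 262));
    }

    /** Operand 262 placed in the result slot, with zero inputs: it is a definition. */
    static List<Op> modelledAsResult() {
        return Arrays.asList(new Op("random", 262), new Op("move", 5, 262));
    }

    public static void main(String[] args) {
        System.out.println(usedBeforeDefined(modelledAsInput()));  // prints [262]
        System.out.println(usedBeforeDefined(modelledAsResult())); // prints []
    }
}
```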
Hi Martin, Volker, Vladimir

Sorry for the huge delay in replying to this... I hope Martin (and Lutz) are feeling better and have fully recovered.

On 11/24/2017 08:04 PM, Volker Simonis wrote:
Hi Gustavo,
in one of my talks [1,2] I have an example on how to intrinsify Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But please notice that this is just a "toy" example - it is not production ready. In fact I think the right way would be to create a new SecureRandom provider where you may implement "engineNextBytes" by using the new Power instruction (maybe by calling a native function). This special implementation of "engineNextBytes" could then be intrinsified as described in the talk.
Also, before you start this, please contact the security mailing list just to make sure you're not going into the wrong direction (I'm not a security expert :)
I've created a new JCA provider called 'P9TRNG', implemented a darn intrinsic for the Interpreter, C1, and C2, and ran a couple of tests using micro benchmarks [1, 2] to check the latency and throughput of getting a random number using generateSeed() and nextBytes() with darn in place.
The 'P9TRNG' provider is basically a copy of 'NativePRNG', since a software fallback is necessary in case the darn instruction fails to return a valid random number after ten attempts (although that is a very rare condition). On the other hand, 'P9TRNG' uses the darn intrinsic when it's available.
The maximum theoretical throughput on the machine I'm testing on (a POWER9 witherspoon) is 128Mbps and there is one RNG per socket, so only one RNG per CPU. With simple C code it's possible to get very close to that value (please see C code [3] for code details and log [4] for the expected outputs). Unrolling the tight loop does not help and causes a performance degradation.
On Hotspot, for the Interpreter and C1 the throughput is ~3x higher than the version that does not use the darn instruction (using micro benchmarks [1, 2]):
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG 1024 10
3.8759432E7 ns 2.113550 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG 1024 100000
2.65902244E10 ns 30.808313 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100
7.1741008E7 ns 11.418853 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100000
2.74547937E10 ns 29.838140 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:TieredStopAtLevel=3 next_bytes NativePRNG 1024 100000
5.5632339E10 ns 14.725248 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:-TieredCompilation next_bytes NativePRNG 1024 100000
2.78629519E10 ns 29.401051 Mbps
[With darn disabled: performance like NativePRNG]
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100
7.0272888E7 ns 11.657412 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100000
2.75566244E10 ns 29.727880 Mbps
...
[With darn enabled]
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100
8305029.0 ns 98.639030 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100000
6.442112E9 ns 127.163261 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:TieredStopAtLevel=3 next_bytes P9TRNG 1024 100000
1.57303337E10 ns 52.077728 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:-TieredCompilation next_bytes P9TRNG 1024 100000
6.46914E9 ns 126.631973 Mbps
For the C2 compiler, using darn is better until it reaches ~128Mbps (the maximum theoretical throughput); on the other hand it never blocks, so, for instance, generateSeed(), which normally uses /dev/random (blocking), is not affected by a lack of entropy in the Linux entropy pool.
@Vladimir, Volker mentioned that you already experimented with rand on Intel. Do you know if creating a new JCA provider as I did is a reasonable approach to exploit darn on POWER9? Also, in my implementation I had to create a VM intrinsic (_darn) in vmSymbols that is, let's say, arch dependent, and that seems to be the only such case so far. On the other hand, a new JCA provider (with methods to be intrinsified) is necessary: I don't see another way to intrinsify the methods in the NativePRNG/SHA1PRNG providers, since I need a software fallback for darn. Do you have any recommendation about it?
The patchset rebased on top of jdk11 (http://hg.openjdk.java.net/jdk/hs) is:
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/0_PPC64_Add_JCA_provider_...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/1_PPC64_Assembler_add_sup...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/2_PPC64_Interpreter_add_t...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/3_PPC64_C2_Compiler_add_n...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/4_PPC64_C1_Compiler_add_i...
I intend to contribute that change as an experimental feature.
Thank you.
Best regards, Gustavo
[1] https://github.com/gromero/darn/blob/master/next_bytes.java
[2] https://github.com/gromero/darn/blob/master/generate_seed.java
[3] https://github.com/gromero/darn/blob/master/C/darn.c
[4] https://github.com/gromero/darn/blob/master/C/darn.log
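For readers checking the figures above, the Mbps column appears to follow directly from the two numeric arguments and the elapsed time. The sketch below assumes that the arguments "1024 100000" mean 1024 bytes per call and 100000 calls; that reading is inferred from the numbers, not confirmed by the benchmark sources, and the class and method names are made up for illustration.

```java
/** Hypothetical helper reproducing the Mbps figures from the log above. */
final class Throughput {
    /**
     * Assumption: total bits = 8 * bytesPerCall * calls, and "Mbps" means
     * 10^6 bits per second. Bits per nanosecond times 1000 gives Mbit/s.
     */
    static double mbps(long bytesPerCall, long calls, double elapsedNanos) {
        double bits = 8.0 * bytesPerCall * calls;
        return bits / elapsedNanos * 1000.0;
    }

    public static void main(String[] args) {
        // Values taken from the "[With darn enabled]" run quoted above.
        System.out.printf("%.3f Mbps%n", mbps(1024, 100000, 6.442112E9));
        // Values taken from the plain NativePRNG run quoted above.
        System.out.printf("%.3f Mbps%n", mbps(1024, 100000, 2.74547937E10));
    }
}
```

Under that assumption the reported values are consistent with the elapsed times to within rounding, which suggests the benchmark measures the whole loop rather than a single call.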
Hi Gustavo,
thanks for posting your change. I think the Java and shared C++ code should not use PPC64 specific names because it may get used for other platforms as well?
Some people don't want to trust relying solely on the hardware number generator which cannot get reviewed publicly. So would it make sense to use the instruction mixed with something else?
It would be good to have the complete change in one webrev for easier reviewing.
Thanks and best regards, Martin
Hi Martin, Thank you very much for your comments. On 04/03/2018 09:50 AM, Doerr, Martin wrote:
I think the Java and shared C++ code and should not use PPC64 specific names because it may get used for other platforms as well?
I fixed all the names and changed the provider's name to HWTRNG (previously it was P9TRNG). I changed the helper names to something more "neutral" and removed the snake_case from them. Yes, it may get used for other platforms: another platform only needs to provide an intrinsic for the randomLong() method. I provided a unique value for serialVersionUID in the new JCA provider by incrementing a preexisting one. Do you know if there is any other (formal) way to determine that value?
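A minimal sketch of the platform-neutral shape discussed here: a 64-bit hardware source that is retried a fixed number of times before a software fallback takes over. It assumes the hardware source reports failure with an all-ones value (as I understand darn does for its 64-bit formats). The class name, the LongSupplier stand-in for the intrinsic, and the constants are hypothetical and not taken from the webrev; only the "ten attempts" figure comes from the earlier description of the provider.

```java
import java.security.SecureRandom;
import java.util.function.LongSupplier;

/** Hypothetical sketch: retry a hardware source, then fall back to software. */
final class RetryingRandomSource {
    // Assumption: the hardware source signals an error with all ones (-1L).
    static final long HW_ERROR = -1L;
    // "Ten attempts", as in the description of the provider in this thread.
    static final int MAX_ATTEMPTS = 10;

    private final LongSupplier hardware;                  // stand-in for the intrinsic
    private final SecureRandom fallback = new SecureRandom();

    RetryingRandomSource(LongSupplier hardware) {
        this.hardware = hardware;
    }

    /** Returns a hardware value if one arrives within MAX_ATTEMPTS tries. */
    long randomLong() {
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            long v = hardware.getAsLong();
            if (v != HW_ERROR) {
                return v;
            }
        }
        // Hardware kept failing: use the software generator instead.
        return fallback.nextLong();
    }

    public static void main(String[] args) {
        RetryingRandomSource healthy = new RetryingRandomSource(() -> 42L);
        RetryingRandomSource failing = new RetryingRandomSource(() -> HW_ERROR);
        System.out.println(healthy.randomLong());
        System.out.println(failing.randomLong());
    }
}
```

Keeping the hardware access behind a single randomLong() style method is what would let another platform plug in its own instruction without touching the provider logic.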
Some people don't want to trust relying solely on the hardware number generator which cannot get reviewed publicly. So would it make sense to use the instruction mixed with something else?
Yes, I'm aware of the caveat... In the past Intel's 'rdrand' received a lot of criticism in that sense. I've talked to the NX RNG designer and we've tried to find documentation about it on the OpenPOWER Foundation site, but it's not available yet. In any case, just like 'rdrand' in OpenSSL, which is disabled by default, wouldn't it be fine to use 'darn' in OpenJDK provided its use is optional and deliberate? Currently the user needs to (a) explicitly select the new provider with SecureRandom.getInstance("HWTRNG") and (b) unlock it using "-XX:+UnlockExperimentalVMOptions -XX:+UseRANDOMIntrinsics". In that sense, do you think it would be acceptable?
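From the application side, the opt-in in (a) could look roughly like the sketch below. The "HWTRNG" algorithm name is the one proposed in this thread and only exists with the patch applied; on any other JDK getInstance() throws NoSuchAlgorithmException, so the sketch falls back to the default SecureRandom. The class and method names are hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

/** Hypothetical sketch: prefer the proposed HWTRNG provider when present. */
final class HwRandomSelector {
    static SecureRandom hardwareOrDefault() {
        try {
            // Only succeeds on a JDK that ships the provider from this thread.
            return SecureRandom.getInstance("HWTRNG");
        } catch (NoSuchAlgorithmException e) {
            // Provider not available: use the platform default instead.
            return new SecureRandom();
        }
    }

    public static void main(String[] args) {
        SecureRandom rng = hardwareOrDefault();
        byte[] buf = new byte[16];
        rng.nextBytes(buf);
        System.out.println("Using algorithm: " + rng.getAlgorithm());
    }
}
```

With the patch, the JVM would additionally have to be started with the experimental flags from (b) for the intrinsic to be used.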
It would be good to have the complete change in one webrev for easier reviewing.
Sure. Thanks for letting me know. Here is the new webrev: http://cr.openjdk.java.net/~gromero/POWER9/darn/v3/webrev/ Thanks a lot. Best regards, Gustavo
Hi Gustavo,
thanks for providing the webrev. Please note that it needs to get reviewed on the official mailing lists hotspot-compiler-dev and security-dev (you should subscribe before posting).
ppc.ad: I think the pipe classes are not implemented in a very useful way on PPC64 at the moment. Adding a new one just for this doesn't make any sense in my opinion. If you would like to improve the OptoScheduling, I suggest to do this separately. Will probably take quite some effort.
templateInterpreterGenerator_ppc.cpp: I think the generator should return NULL for Power8 and below.
templateInterpreterGenerator: Please move the generation below the basic entries.
HWTRNG.java: The closing braces look weird at the end.
A few declarations like generate_HWTRNG_randomLong_entry() or LIRGenerator::do_Random are in shared code, but only defined in PPC64 code. But it may make sense to fix that after you got a few reviews.
Do you know if there is any other (formal) way to determine that value? You can use the tool "serialver" from the jdk/bin. I think Eclipse can also generate it if you have a project for it.
I'm ok with using darn directly in the initial version as long as it's not used by default. If it is supposed to get used by default, I think we should add something similar to linux' dev/random. I haven't checked how it's implemented on PPC64.
Best regards, Martin
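On the idea of not relying solely on the hardware generator: one common pattern is to hash the hardware output together with bytes from an independent software source, so that the result is no weaker than the stronger of the two inputs. The sketch below is only an illustration of that idea, not a vetted construction and not what the webrev does. The LongSupplier stands in for the hardware instruction, and all names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.function.LongSupplier;

/** Hypothetical sketch: mix hardware and software randomness via SHA-256. */
final class MixedSeed {
    static byte[] mixedSeed(LongSupplier hardware, SecureRandom software) {
        // 32 bytes from the software generator.
        byte[] sw = new byte[32];
        software.nextBytes(sw);

        // 32 bytes (four 64-bit values) from the hardware stand-in.
        ByteBuffer hw = ByteBuffer.allocate(32);
        for (int i = 0; i < 4; i++) {
            hw.putLong(hardware.getAsLong());
        }

        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(hw.array());
            sha.update(sw);
            return sha.digest();   // 32-byte mixed output
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] seed = mixedSeed(() -> 0x0123456789abcdefL, new SecureRandom());
        System.out.println(seed.length + " bytes of mixed seed material");
    }
}
```

The cost is that throughput is then bounded by the slower source plus the hash, which is the trade-off against using the instruction directly.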
Hi Martin, On 04/16/2018 08:10 AM, Doerr, Martin wrote:
thanks for providing the webrev.
Thanks a lot for reviewing it.
Please note that it needs to get reviewed on the official mailing lists hotspot-compiler-dev and security-dev (you should subscribe before posting).
Yup, I'm aware of that. I'm extending it to these MLs after I address all your comments. Thanks for reminding me about the need to subscribe before posting :-)
ppc.ad: I think the pipe classes are not implemented in a very useful way on PPC64 at the moment. Adding a new one just for this doesn't make any sense in my opinion. If you would like to improve the OptoScheduling, I suggest to do this separately. Will probably take quite some effort.
I see. I removed the new one and replaced that simply by the default (pipe_class_default).
templateInterpreterGenerator_ppc.cpp: I think the generator should return NULL for Power8 and below.
Done.
templateInterpreterGenerator: Please move the generation below the basic entries.
Done. I also added a few comments on the new entry.
HWTRNG.java: The closing braces look weird at the end.
Yes. It's taken from NativePRNG.java:
--- a/src/java.base/unix/classes/sun/security/provider/NativePRNG.java Fri Mar 30 08:59:14 2018 -0500
+++ b/src/java.base/unix/classes/sun/security/provider/NativePRNG.java Mon May 28 16:44:15 2018 -0500
@@ -566,5 +566,5 @@
 throw new ProviderException("nextBytes() failed", e);
 }
 }
- }
+ }
 }
I incorporated that change to the new webrev (below).
A few declarations like generate_HWTRNG_randomLong_entry() or LIRGenerator::do_Random are in shared code, but only defined in PPC64 code. But it may make sense to fix that after you got a few reviews.
OK.
Do you know if there is any other (formal) way to determine that value? You can use the tool "serialver" from the jdk/bin. I think Eclipse can also generate it if you have a project for it.
Thanks. As I understand it, 'serialver' basically extracts the serialVersionUID from the class. So if I keep the value hardcoded, 'serialver' only extracts it and does not generate a new one. I had to remove the hardcoded value so that a default one was generated, which I then extracted using 'serialver' and finally used in the new class.
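The same observation can be reproduced from Java with ObjectStreamClass, which reports the declared serialVersionUID when there is one and a computed default otherwise. The two nested classes below are made up for the demonstration and have nothing to do with the provider.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

/** Hypothetical demo: declared versus default serialVersionUID. */
final class SerialVersionDemo {
    // Declares its own value, so that value is what gets reported.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
    }

    // Declares nothing, so a default value is computed from the class shape.
    static class Undeclared implements Serializable {
        int field;
    }

    static long uidOf(Class<?> cls) {
        // lookup() returns a descriptor for any Serializable class.
        return ObjectStreamClass.lookup(cls).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Declared:   " + uidOf(Declared.class));
        System.out.println("Undeclared: " + uidOf(Undeclared.class));
    }
}
```

The default value depends on the class members, so it should be taken once and then hardcoded, as described above.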
I'm ok with using darn directly in the initial version as long as it's not used by default. If it is supposed to get used by default, I think we should add something similar to linux' dev/random. I haven't checked how it's implemented on PPC64.
OK.
New webrev : http://cr.openjdk.java.net/~gromero/POWER9/darn/v4
Interdiff : http://cr.openjdk.java.net/~gromero/POWER9/darn/v4/v3_v4.diff (changes made from last webrev to the current one)
If you don't have any objection to that version I'll start to discuss it on the security-dev ML.
Best regards, Gustavo
Best regards, Martin
-----Original Message----- From: Gustavo Romero [mailto:gromero@linux.vnet.ibm.com] Sent: Montag, 16. April 2018 04:45 To: Doerr, Martin <martin.doerr@sap.com>; Volker Simonis <volker.simonis@gmail.com>; vladimir.kozlov@oracle.com Cc: ppc-aix-port-dev@openjdk.java.net Subject: Re: [RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64?
Hi Martin,
Thank you very much for your comments.
On 04/03/2018 09:50 AM, Doerr, Martin wrote:
I think the Java and shared C++ code and should not use PPC64 specific names because it may get used for other platforms as well?
I fixed all the names and changed the provider's name to HWTRNG (previously it was P9TRNG). I changed the names for the helpers to something more "neutral" and removed the snake_case from their names. Yes, it may get used for other platforms and can get used readily by other platforms by providing an intrinsic for the randomLong() method.
Do you know if there is any other (formal) way to determine that value?
Some people don't want to trust relying solely on the hardware number generator which cannot get reviewed publicly. So would it make sense to use the instruction mixed with something else?
Yes, I'm aware of the caveat... I the past Intel's 'rdrand' received a lot of criticism in that sense. I've talked to NX RNG designed and we've tried to find out a documentation about it on OpenPOWER foundation but it's not available yet. In any case, just like the use of 'rdrand' on OpenSSL that is disabled by default, wouldn't that be fine to use 'darn' on OpenJDK provided its use is optional and deliberated? Currently the user needs to (a) explicitly use the new provider by SecureRandom.getInstance("HWTRNG") and (b) unlock it using "-XX:+UnlockExperimentalVMOptions -XX:+UseRANDOMIntrinsics". In that sense, do you think it would be acceptable?
It would be good to have the complete change in one webrev for easier reviewing.
Sure. Thanks for letting me know. Here is the new webrev: http://cr.openjdk.java.net/~gromero/POWER9/darn/v3/webrev/
Thanks a lot.
Best regards, Gustavo
Thanks and best regards, Martin
-----Original Message----- From: Gustavo Romero [mailto:gromero@linux.vnet.ibm.com] Sent: Montag, 2. April 2018 13:55 To: Volker Simonis <volker.simonis@gmail.com>; Doerr, Martin <martin.doerr@sap.com>; vladimir.kozlov@oracle.com Cc: ppc-aix-port-dev@openjdk.java.net Subject: [RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64? Importance: High
Hi Martin, Volker, Vladimir
Sorry for the huge delay replaying on this...
I hope Martin (and Lutz) are feeling better and fully recovered.
On 11/24/2017 08:04 PM, Volker Simonis wrote:
Hi Gustavo,
in one of my talks [1,2] I have an example on how to intrinsify Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But please notice that this is just a "toy" example - it is not production ready. In fact I think the right way would be to create a new SecureRandom provider where you may implement "engineNextBytes" by using the new Power instruction (maybe by calling a native function). This special implementation of "engineNextBytes" could then be intrinsified as described in the talk.
Also, before you start this, please contact the security mailing list just to make sure you're not going into the wrong direction (I'm not a security expert :)
I've created a new JCA provider called 'P9TRNG' and implemented a darn intrinsic for the Interpreter, C1, and C2, and ran a couple of tests using microbenchmarks [1, 2] to check the latency and throughput of getting a random number with generateSeed() and nextBytes() with darn in place.
The 'P9TRNG' provider is basically a copy of 'NativePRNG', since a software fallback is necessary in case the darn instruction fails to return a valid random number after ten attempts (although that is a very rare condition). Otherwise, 'P9TRNG' uses the darn intrinsic when it's available.
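The retry-then-fallback rule described above can be sketched in plain Java. This is an illustration under assumptions, not the code from the webrev: the class RetryThenFallbackSketch and both supplier parameters are hypothetical, and the all-ones error value is my reading of the Power ISA 3.0 description of darn rather than something stated in this thread.

```java
import java.util.function.LongSupplier;

// Hypothetical helper, not the code from the webrev.
final class RetryThenFallbackSketch {

    // Assumed error sentinel: darn is understood to return all ones when it
    // cannot deliver a random number (this equals -1L as a signed long).
    static final long ERROR_VALUE = 0xFFFFFFFFFFFFFFFFL;

    // Number of attempts before giving up on the hardware source, as in the thread.
    static final int MAX_ATTEMPTS = 10;

    /**
     * Asks the hardware source up to MAX_ATTEMPTS times and returns the first
     * non-error value; if every attempt fails, asks the software source once.
     */
    static long nextLong(LongSupplier hardware, LongSupplier software) {
        for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
            long value = hardware.getAsLong();
            if (value != ERROR_VALUE) {
                return value;
            }
        }
        return software.getAsLong();
    }
}
```

A caller would pass the intrinsified method as the first supplier and something like a NativePRNG-backed nextLong() as the second.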
The maximum theoretical throughput on the machine I'm testing on (a POWER9 Witherspoon) is 128 Mbps, and there is one RNG per socket, so only one RNG per CPU. With simple C code it's possible to get very close to that value (please see the C code [3] for details and the log [4] for the expected output). Unrolling the tight loop does not help; it causes a performance degradation.
On HotSpot, for the Interpreter and C1 the throughput is ~3x higher than with the version that does not use the darn instruction (using microbenchmarks [1, 2]):
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG 1024 10
3.8759432E7 ns 2.113550 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes SHA1PRNG 1024 100000
2.65902244E10 ns 30.808313 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100
7.1741008E7 ns 11.418853 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes NativePRNG 1024 100000
2.74547937E10 ns 29.838140 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:TieredStopAtLevel=3 next_bytes NativePRNG 1024 100000
5.5632339E10 ns 14.725248 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -Xcomp -XX:-TieredCompilation next_bytes NativePRNG 1024 100000
2.78629519E10 ns 29.401051 Mbps
[With darn disabled: performance like NativePRNG]
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100
7.0272888E7 ns 11.657412 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java next_bytes P9TRNG 1024 100000
2.75566244E10 ns 29.727880 Mbps
...
[With darn enabled]
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100
8305029.0 ns 98.639030 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN next_bytes P9TRNG 1024 100000
6.442112E9 ns 127.163261 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:TieredStopAtLevel=3 next_bytes P9TRNG 1024 100000
1.57303337E10 ns 52.077728 Mbps
gromero@gromero1:~/git/darn$ /tmp/jdk11/jvm/openjdk-11-internal/bin/java -XX:+UnlockExperimentalVMOptions -XX:+UseDARN -Xcomp -XX:-TieredCompilation next_bytes P9TRNG 1024 100000
6.46914E9 ns 126.631973 Mbps
For the C2 compiler, using darn is faster but tops out at ~128 Mbps (the maximum theoretical throughput); on the other hand, it never blocks, so, for instance, generateSeed(), which normally reads from /dev/random (blocking), is not affected by a lack of entropy in the Linux entropy pool.
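The Mbps figures in the output above are consistent with counting 10^6 bits per megabit over buffer size times iterations. The sketch below reproduces that arithmetic and adds a tiny measuring loop; it is not the author's next_bytes.java (whose contents are not shown in this thread), and the class and method names are hypothetical.

```java
import java.security.SecureRandom;

// Hypothetical helper; not the author's next_bytes.java benchmark.
final class ThroughputSketch {

    /**
     * Megabits per second (10^6 bits/s) when bufferBytes * iterations bytes
     * were produced in elapsedNanos nanoseconds.
     */
    static double mbps(int bufferBytes, long iterations, double elapsedNanos) {
        double bits = 8.0 * bufferBytes * iterations;
        // bits/ns * 1e9 ns/s / 1e6 bits/Mbit == bits/ns * 1000
        return bits / elapsedNanos * 1_000.0;
    }

    /** Fills a buffer repeatedly and reports the observed throughput in Mbps. */
    static double measure(SecureRandom random, int bufferBytes, long iterations) {
        byte[] buffer = new byte[bufferBytes];
        long start = System.nanoTime();
        for (long i = 0; i < iterations; i++) {
            random.nextBytes(buffer);
        }
        return mbps(bufferBytes, iterations, System.nanoTime() - start);
    }
}
```

For example, mbps(1024, 100000, 6.442112E9) comes out at roughly 127.16, which agrees with the darn-enabled run reported above up to the rounding of the printed elapsed time.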
@Vladimir, Volker mentioned that you already experimented with rdrand on Intel. Do you know if creating a new JCA provider as I did is a reasonable approach to exploit darn on POWER9? Also, in my implementation I had to create a VM intrinsic (_darn) in vmSymbols that is, let's say, arch-dependent, and that seems to be the only such case so far. On the other hand, a new JCA provider (with methods to be intrinsified) is necessary: I don't see another way to intrinsify the methods in the NativePRNG/SHA1PRNG providers, since I need a software fallback for darn. Do you have any recommendation about it?
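For readers unfamiliar with the JCA side, the general shape of a SecureRandom provider with a single method that a VM intrinsic could replace looks roughly like the sketch below. Everything here is illustrative: the names SketchRandomSpi, SketchProvider, "SketchHW" and "SKETCH-HWTRNG" are hypothetical, randomLong() is backed by an ordinary SecureRandom, and none of this is taken from the webrev.

```java
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.SecureRandom;
import java.security.SecureRandomSpi;

// Hypothetical SPI: all randomness funnels through randomLong().
final class SketchRandomSpi extends SecureRandomSpi {
    private static final long serialVersionUID = 1L;
    private static final SecureRandom SOFTWARE = new SecureRandom();

    // Plain-Java body; in a design like the one discussed, the VM could
    // substitute a hardware instruction for this method on capable CPUs.
    static long randomLong() {
        return SOFTWARE.nextLong();
    }

    @Override
    protected void engineSetSeed(byte[] seed) {
        // A hardware-backed source has no seed to set; ignored in this sketch.
    }

    @Override
    protected void engineNextBytes(byte[] bytes) {
        int i = 0;
        while (i < bytes.length) {
            long r = randomLong();
            // Spread the 64-bit value over up to eight output bytes.
            for (int k = 0; k < Long.BYTES && i < bytes.length; k++) {
                bytes[i++] = (byte) r;
                r >>>= 8;
            }
        }
    }

    @Override
    protected byte[] engineGenerateSeed(int numBytes) {
        byte[] seed = new byte[numBytes];
        engineNextBytes(seed);
        return seed;
    }
}

// Hypothetical provider registering the SPI under an illustrative name.
final class SketchProvider extends Provider {
    private static final long serialVersionUID = 1L;

    SketchProvider() {
        super("SketchHW", "1.0", "Illustrative SecureRandom provider");
        putService(new Provider.Service(this, "SecureRandom", "SKETCH-HWTRNG",
                SketchRandomSpi.class.getName(), null, null) {
            @Override
            public Object newInstance(Object constructorParameter) {
                return new SketchRandomSpi();
            }
        });
    }

    static SecureRandom newSecureRandom() {
        try {
            return SecureRandom.getInstance("SKETCH-HWTRNG", new SketchProvider());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the hardware access behind one small static method is what makes the intrinsic and the software fallback interchangeable without touching the rest of the provider.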
The patchset rebased on top of jdk11 (http://hg.openjdk.java.net/jdk/hs) is:
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/0_PPC64_Add_JCA_provider_...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/1_PPC64_Assembler_add_sup...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/2_PPC64_Interpreter_add_t...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/3_PPC64_C2_Compiler_add_n...
http://cr.openjdk.java.net/~gromero/POWER9/darn/v1/4_PPC64_C1_Compiler_add_i...
I intend to contribute that change as an experimental feature.
Thank you.
Best regards, Gustavo
[1] https://github.com/gromero/darn/blob/master/next_bytes.java
[2] https://github.com/gromero/darn/blob/master/generate_seed.java
[3] https://github.com/gromero/darn/blob/master/C/darn.c
[4] https://github.com/gromero/darn/blob/master/C/darn.log
Hi Gustavo,
generate_HWTRNG_randomLong_entry() is still misplaced in templateInterpreterGenerator.cpp, violating "We expect the normal and native entry points to be generated first so we can reuse them.".
I think you should create a rebased webrev once 8203669 is pushed because you'll get merge conflicts.
Best regards, Martin
-----Original Message----- From: Gustavo Romero [mailto:gromero@linux.vnet.ibm.com] Sent: Dienstag, 29. Mai 2018 01:04 To: Doerr, Martin <martin.doerr@sap.com> Cc: Volker Simonis <volker.simonis@gmail.com>; vladimir.kozlov@oracle.com; ppc-aix-port-dev@openjdk.java.net Subject: Re: [RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64?
Hi Martin,
On 04/16/2018 08:10 AM, Doerr, Martin wrote:
thanks for providing the webrev.
Thanks a lot for reviewing it.
Please note that it needs to get reviewed on the official mailing lists hotspot-compiler-dev and security-dev (you should subscribe before posting).
Yup, I'm aware of that. I'm extending it to these MLs after I address all your comments. Thanks for reminding me about the need to subscribe before posting :-)
ppc.ad: I think the pipe classes are not implemented in a very useful way on PPC64 at the moment. Adding a new one just for this doesn't make any sense in my opinion. If you would like to improve the OptoScheduling, I suggest doing this separately. It will probably take quite some effort.
I see. I removed the new one and replaced that simply by the default (pipe_class_default).
templateInterpreterGenerator_ppc.cpp: I think the generator should return NULL for Power8 and below.
Done.
templateInterpreterGenerator: Please move the generation below the basic entries.
Done. I also added a few comments on the new entry.
HWTRNG.java: The closing braces look weird at the end.
Yes. It's taken from NativePRNG.java:
--- a/src/java.base/unix/classes/sun/security/provider/NativePRNG.java Fri Mar 30 08:59:14 2018 -0500
+++ b/src/java.base/unix/classes/sun/security/provider/NativePRNG.java Mon May 28 16:44:15 2018 -0500
@@ -566,5 +566,5 @@
 throw new ProviderException("nextBytes() failed", e);
 }
 }
- }
+ }
 }
I incorporated that change into the new webrev (below).
A few declarations like generate_HWTRNG_randomLong_entry() or LIRGenerator::do_Random are in shared code, but only defined in PPC64 code. But it may make sense to fix that after you got a few reviews.
OK.
Do you know if there is any other (formal) way to determine that value?
You can use the tool "serialver" from jdk/bin. I think Eclipse can also generate it if you have a project for it.
Thanks. As I understand it, 'serialver' basically extracts the serialVersionUID from the class, so if I keep a hardcoded value it only reports that value rather than generating a new one. I had to remove the hardcoded value so that a default one was computed, which I then extracted using 'serialver' and finally used in the new class.
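The same value can also be read programmatically through java.io.ObjectStreamClass, which makes the behaviour described above easy to observe: a declared serialVersionUID is returned as-is, while a class without one gets a default computed from its structure. A minimal sketch follows; the class names are hypothetical and unrelated to HWTRNG.java.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical demo classes, unrelated to HWTRNG.java.
final class SerialVersionSketch {

    /** Declares its own ID, so lookups simply report 42. */
    static final class WithExplicitId implements Serializable {
        private static final long serialVersionUID = 42L;
    }

    /** Declares no ID, so a default is computed from the class structure. */
    static final class WithoutId implements Serializable {
        int field;
    }

    /** Returns the serialVersionUID the serialization runtime uses for the class. */
    static long uidOf(Class<?> type) {
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(type);
        if (descriptor == null) {
            throw new IllegalArgumentException(type + " is not Serializable");
        }
        return descriptor.getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(uidOf(WithExplicitId.class));
        System.out.println(uidOf(WithoutId.class));
    }
}
```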
I'm OK with using darn directly in the initial version as long as it's not used by default. If it is supposed to be used by default, I think we should add something similar to Linux's /dev/random. I haven't checked how it's implemented on PPC64.
OK.
New webrev : http://cr.openjdk.java.net/~gromero/POWER9/darn/v4
Interdiff : http://cr.openjdk.java.net/~gromero/POWER9/darn/v4/v3_v4.diff (changes made from last webrev to the current one)
If you don't have any objection to that version I'll start to discuss it on the security-dev ML.
Best regards, Gustavo
Best regards, Martin
-----Original Message----- From: Gustavo Romero [mailto:gromero@linux.vnet.ibm.com] Sent: Montag, 16. April 2018 04:45 To: Doerr, Martin <martin.doerr@sap.com>; Volker Simonis <volker.simonis@gmail.com>; vladimir.kozlov@oracle.com Cc: ppc-aix-port-dev@openjdk.java.net Subject: Re: [RFC] Re: POWER9: Is there a way to improve the random number generation on PPC64?
Hi Martin,
Thank you very much for your comments.
On 04/03/2018 09:50 AM, Doerr, Martin wrote:
I think the Java and shared C++ code should not use PPC64-specific names because it may get used for other platforms as well.
I fixed all the names and changed the provider's name to HWTRNG (previously it was P9TRNG). I changed the names of the helpers to something more "neutral" and removed the snake_case from their names. Yes, other platforms can readily use it by providing an intrinsic for the randomLong() method.
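Because the provider is selected by name, application code that wants to opt in can stay portable by falling back when the name is unknown. The sketch below assumes nothing beyond the standard SecureRandom API; "HWTRNG" is the name proposed in this thread and is not expected to exist in a stock JDK, and the class OptInLookupSketch is hypothetical.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical helper showing an opt-in lookup with a portable fallback.
final class OptInLookupSketch {

    /**
     * Returns a SecureRandom for the requested algorithm name if some
     * installed provider offers it, otherwise the platform default.
     */
    static SecureRandom preferred(String algorithm) {
        try {
            return SecureRandom.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            return new SecureRandom();
        }
    }

    public static void main(String[] args) {
        // On a JDK without the proposed provider this prints the default algorithm.
        System.out.println(preferred("HWTRNG").getAlgorithm());
    }
}
```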
Do you know if there is any other (formal) way to determine that value?
Hi Martin,
On 05/29/2018 10:16 AM, Doerr, Martin wrote:
generate_HWTRNG_randomLong_entry() is still misplaced in templateInterpreterGenerator.cpp, violating "We expect the normal and native entry points to be generated first so we can reuse them.".
Done.
I think you should create a rebased webrev once 8203669 is pushed because you'll get merge conflicts.
Yes, at least for the feature detection part there will be a conflict. I'll rebase after 8203669 is pushed.
webrev: http://cr.openjdk.java.net/~gromero/POWER9/darn/v5
Thank you.
Best regards, Gustavo
Best regards, Martin
participants (3): Doerr, Martin; Gustavo Romero; Volker Simonis