POWER9: Is there a way to improve the random number generation on PPC64?

Wed Jan 31 05:46:45 UTC 2018

Hello Volker,

I finished a v1 random implementation for Interpreter and C2 Compiler.

However I'm struggling a bit on C1 implementation...

There is probably something wrong with my new LIR node for random.

At runtime C1 Linear Scan hits the following assert():

.../hs/src/hotspot/share/c1/c1_LinearScan.cpp:855), pid=13334, tid=13382
assert(false) failed: live_in set of first block must be empty

Error: live_in set of first block must be empty (when this fails, virtual registers are used before they are defined)
affected registers:
262
* vreg 262 (HIR instruction l68)
  used in block B3

When I inspect Block 3 it shows as:

B3 [24, 47] preds: B2 sux: B1
__id_Instruction___________________________________________
 254 label [label:0x0000712c20027f80]
 256 null_check [R252|L]   [bci:25]
 258 move [R252|L] [R261|L]
 260 profile_call main.seed_darn @ 25 [R259|L] [R261|L] [R260|J]
 262 random [R262|J] <==========================
 264 null_check [R253|L]   [bci:32]
 266 move [R253|L] [R265|L]
 268 profile_call main.seed_darn @ 32 [R263|L] [R265|L] [R264|J]
 270 move [int:0|I] [R4|I]
 272 move [R262|J] [R5R5|J]
 274 move [R253|L] [R3|L]
 276 icvirtual call: [addr: 0x0000000000000000] [recv: [R3|L]] [result: [R3|L]] [bci:32]
 278 move [R3|L] [R266|L]
 280 move [obj:0x0000712bec000e40|L] [R267|L]
 282 move [Base:[R267|L] Disp: 116|L] [R268|L]
 284 null_check [R268|L]   [bci:41]
 286 move [R268|L] [R271|L]
 288 profile_call main.seed_darn @ 41 [R269|L] [R271|L] [R270|J]
 290 move [obj:0x0000712bec000e48|L] [R4|L]
 292 move [R268|L] [R3|L]
 294 optvirtual call: [addr: 0x0000000000000000] [recv: [R3|L]] [bci:41]
 296 move [R254|I] [R274|I]
 298 move [R253|L] [R273|L]
 300 move [R252|L] [R272|L]
 302 move [int:0|I] [R275|I]
 304 branch [AL] [B1]


I mapped the intrinsic to do_Random() (please find full diff here [1]):

--- a/src/hotspot/share/c1/c1_LIRGenerator.cpp  Tue Jan 23 10:52:33 2018 -0600
+++ b/src/hotspot/share/c1/c1_LIRGenerator.cpp  Tue Jan 30 23:21:24 2018 -0600
@@ -3215,6 +3215,8 @@
   case vmIntrinsics::_fmaD:           do_FmaIntrinsic(x); break;
   case vmIntrinsics::_fmaF:           do_FmaIntrinsic(x); break;

+  case vmIntrinsics::_darn:           do_Random(x); break;
+

--- a/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp       Tue Jan 23 10:52:33 2018 -0600
+++ b/src/hotspot/cpu/ppc/c1_LIRGenerator_ppc.cpp       Tue Jan 30 23:21:24 2018 -0600
@@ -1531,6 +1531,12 @@
   }
 }

+void LIRGenerator::do_Random(Intrinsic* x) {
+
+  LIR_Opr result = rlock_result(x);
+  __ rng(result);
+}
+

...and I expected that all allocation for the vreg would be done by
rlock_result().

Besides that, I assumed that lir_random is a LIR_Op1 since there is no input and
1 output (result).
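
The assertion text says that virtual registers are used before they are
defined. As a rough illustration of that condition only, here is a toy
single-block model in Java. It is not HotSpot code and not a diagnosis of the
patch above: the Op class and the liveIn() helper are hypothetical names made
up for this sketch. It shows how the same operand yields an empty or a
non-empty live-in set depending on whether an instruction records it as an
output or as an input.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class LiveInSketch {
    /** Toy instruction (hypothetical): the virtual registers it reads and writes. */
    static final class Op {
        final String name;
        final List<Integer> inputs;
        final List<Integer> outputs;

        Op(String name, List<Integer> inputs, List<Integer> outputs) {
            this.name = name;
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    /** Registers that are read in the block before any instruction in it writes them. */
    static Set<Integer> liveIn(List<Op> block) {
        Set<Integer> defined = new LinkedHashSet<>();
        Set<Integer> liveIn = new LinkedHashSet<>();
        for (Op op : block) {
            for (int r : op.inputs) {
                if (!defined.contains(r)) {
                    liveIn.add(r); // read with no earlier definition in this block
                }
            }
            defined.addAll(op.outputs);
        }
        return liveIn;
    }

    public static void main(String[] args) {
        // Operand recorded as an output: vreg 262 is defined, then read by the move.
        List<Op> asOutput = List.of(
            new Op("random", List.of(), List.of(262)),
            new Op("move", List.of(262), List.of(5)));
        // Same operand recorded as an input: vreg 262 is read but never defined.
        List<Op> asInput = List.of(
            new Op("random", List.of(262), List.of()),
            new Op("move", List.of(262), List.of(5)));

        System.out.println(liveIn(asOutput)); // prints []
        System.out.println(liveIn(asInput));  // prints [262]
    }
}
```

In this toy model the second block reports vreg 262 as live on entry, which is
the kind of situation the assert message describes.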

Have you ever encountered that error when implementing a new LIR instruction?


Best regards,
Gustavo

[1] http://cr.openjdk.java.net/~gromero/misc/darn_C1.diff	
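
For reference, the retry-then-fallback scheme discussed in the quoted thread
below (repeat the operation, and switch to a software generator after about ten
failed attempts) could be sketched in plain Java roughly as follows. This is a
minimal sketch under stated assumptions, not the code from the patches: the
class name, the nextLong() helper and the LongSupplier stand-in for the
intrinsified method are hypothetical, and the error sentinel of all ones is an
assumption that should be checked against the ISA document.

```java
import java.security.SecureRandom;
import java.util.function.LongSupplier;

final class DarnFallbackSketch {
    // Assumption: the hardware signals failure with an all-ones value.
    static final long ERROR_VALUE = -1L;
    // Attempt count taken from the ISA guidance quoted in the thread.
    static final int MAX_ATTEMPTS = 10;

    /**
     * Tries the hardware source up to MAX_ATTEMPTS times and returns the first
     * non-error value; otherwise asks the software source once.
     */
    static long nextLong(LongSupplier hardware, LongSupplier software) {
        for (int i = 0; i < MAX_ATTEMPTS; i++) {
            long value = hardware.getAsLong();
            if (value != ERROR_VALUE) {
                return value;
            }
        }
        return software.getAsLong();
    }

    public static void main(String[] args) {
        SecureRandom fallback = new SecureRandom();
        // Hypothetical stand-in for a 'darn' that keeps returning the error value.
        LongSupplier alwaysFailing = () -> ERROR_VALUE;
        long value = nextLong(alwaysFailing, fallback::nextLong);
        System.out.println(value);
    }
}
```

Keeping the retry loop and the dispatch in Java means the intrinsic itself only
needs the exact semantics of the instruction, as suggested in the thread.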



On 12/05/2017 10:03 AM, Gustavo Romero wrote:
> Hi Volker,
> 
> On 05-12-2017 06:16, Volker Simonis wrote:
>>> I intend to implement the fallback now and run it against DayTrade7 bench, if
>>> you have any other idea on how to test it, please let me know.
>>>
>>
>> What is "DayTrade7 bench" ? I don't know it and a quick Google search
>> didn't return anything useful.
> 
> I have never used it, but it was suggested to me that DayTrade7
> (https://github.com/WASdev/sample.daytrader7) with security enabled (GCM128,
> for instance) will spend more than half the time on crypto and will stress
> the seed generator. But I'm not sure why the "sample" on it. The README.md
> says "This sample contains the DayTrader 7 benchmark [...]", so I'm hoping
> it contains the complete benchmark...
> 
> 
>>>> Notice that in the real implementation you won't be able to add a
>>>> public method to SecureRandom.
>>>
>>> Yup, I'm aware of it. Initially I thought I could keep all the changes in
>>> arch-specific files but due to the need to fallback to a Java method if 'darn'
>>> intrinsic fails I understand that there is no way to not touch .java files. In
>>> that case your suggestion is to create an entire new provider by adding a new one
>>> to ./java.base/share/classes/com/sun/crypto/provider and listing it in
>>> java.security, for instance?
>>>
>>
>> Probably yes, but I'm not sure about it as well. I think once you have
>> a complete implementation you should start a new thread on the
>> security mailing list (and maybe CC hotspot-dev) to ask for the
>> experts' opinions. As Intel has had the similar 'rdrand' instruction
>> for quite some time, it may be reasonable to create a special
>> provider which is intended to intrinsically use the native CPU
>> instructions if available and fall back to the default implementation
>> otherwise. I think Vladimir Kozlov from the HotSpot team tried to
>> build something similar for 'rdrand' some time ago, so I'm sure you'll
>> get some good comments and advice :)
> 
> OK. I'll complete the implementation adding the fallback and the JIT and
> start a new thread asking about it on the security ML. Looks like 'rdrand|randr'
> instruction is not exploited on Intel? Interesting... I'll CC Vladimir as well.
> 
> Thanks Volker!
> 
>>>
>>> Regards,
>>> Gustavo
>>>
>>> [0] https://github.com/gromero/darn/blob/eee8f0a480d7fd5cf6a307d3e7520e867d784ba3/patches/seed_current.java
>>> [1] https://github.com/gromero/darn/blob/0591eaf338664222c2cd26188d56fdb5a56883ea/patches/seed_darn.java
>>>
>>>> Regards,
>>>> Volker
>>>>
>>>>
>>>> On Fri, Dec 1, 2017 at 10:44 PM, Gustavo Romero
>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>> Hi Volker,
>>>>>
>>>>> On 29-11-2017 11:21, Volker Simonis wrote:
>>>>>> On Wed, Nov 29, 2017 at 2:04 PM, Gustavo Romero
>>>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>>>> Hi Volker,
>>>>>>>
>>>>>>> On 24-11-2017 20:04, Volker Simonis wrote:
>>>>>>>> in one of my talks [1,2] I have an example on how to intrinsify
>>>>>>>> Random.nextInt() in C2 by using the Intel 'rdrandl' instruction. But
>>>>>>>> please notice that this is just a "toy" example - it is not production
>>>>>>>> ready. In fact I think the right way would be to create a new
>>>>>>>> SecureRandom provider where you may implement "engineNextBytes" by
>>>>>>>> using  the new Power instruction (maybe by calling a native function).
>>>>>>>> This special implementation of "engineNextBytes" could then be
>>>>>>>> intrinsified as described in the talk.
>>>>>>>
>>>>>>> I've implemented a simple interpreter intrinsic for 'darn' for a given
>>>>>>> class/method provided by the user, similarly to what you did for
>>>>>>> HelloWorld.sayHello() in your example. Thanks! I'm now looking for the correct
>>>>>>> way to call back from the intrinsic a Java method to act as a fallback method,
>>>>>>> since ISA says [1]:
>>>>>>>
>>>>>>> When the error value is obtained [i.e. 'darn' did not return a random number],
>>>>>>> software is expected to repeat the operation. If a non-error value has not been
>>>>>>> obtained after several attempts, a software random number generation method
>>>>>>> should be used. The recommended number of attempts may be implementation
>>>>>>> specific. In the absence of other guidance, ten attempts should be adequate.
>>>>>>>
>>>>>>> and so I need to call back from the intrinsic, let's say, SecureRandom.nextInt()
>>>>>>> non-intrinsified method after about 10 failures to get the random number so it
>>>>>>> can take over the task again. You did something like that here:
>>>>>>>
>>>>>>> https://github.com/simonis/JBreak2016/blob/master/examples/hs_patches/JBreak_HelloWorldIntrinsic.patch#L55
>>>>>>>
>>>>>>> but for fputs() from libc.
>>>>>>>
>>>>>>> Do you know if it's possible to call, for instance, a loaded method like
>>>>>>> SecureRandom.nextInt() from the intrinsic?
>>>>>>>
>>>>>>
>>>>>> I don't think that would be easy to do (if possible at all).
>>>>>>
>>>>>> The correct way to handle such situations would be to define a Java
>>>>>> method with the exact semantics of your 'darn' instruction. All the
>>>>>> other logic should be implemented in Java. So for example you would
>>>>>> implement SecureRandom.darn() and call it from engineNextBytes(). At
>>>>>> the call site of darn() you check the error value and dispatch to the
>>>>>> corresponding Java implementation if necessary.
>>>>>
>>>>> I've implemented a Java SecureRandom.darn() method [1]. It works as expected,
>>>>> i.e. it returns 8 bytes of fake random number (using [3] example). However, when
>>>>> I proceeded to intrinsify it [2, 0] as I did for the method provided by the user
>>>>> (similarly to your HelloWorld example and for a user provided darn() method as I
>>>>> mentioned previously) I hit the following check:
>>>>>
>>>>> Compiler intrinsic is defined for method [_darn: static SecureRandom.darn()[B], but the method is not available in class [java/security/SecureRandom]. Exiting.
>>>>>
>>>>> SecureRandom.darn() signature looks correct and I know that
>>>>> java/security/SecureRandom::darn() is present in core libs because before trying
>>>>> to intrinsify it worked ok (I've got the 8 bytes of fake random number - using
>>>>> darning.java, below in references) and also 'javap' shows it's in .class:
>>>>>
>>>>> gromero at gromero16:~/hg/jdk10/hs$ javap -c -s ./build/linux-ppc64le-normal-server-slowdebug/jdk/modules/java.base/java/security/SecureRandom.class | fgrep -i darn
>>>>>   public byte[] darn();
>>>>>
>>>>> I thought that no additional hack was necessary to get that intrinsic working as
>>>>> it's in core libs, hence nothing like this is needed:
>>>>>
>>>>> https://github.com/simonis/JBreak2016/blob/master/examples/hs_patches/JBreak2JavaZone.patch#L57-L59
>>>>>
>>>>> On the other hand if I add @HotSpotIntrinsicCandidate to SecureRandom.darn() I
>>>>> get:
>>>>>
>>>>> Method [java.security.SecureRandom.darn()[B] is annotated with @HotSpotIntrinsicCandidate, but no compiler intrinsic is defined for the method. Exiting.
>>>>>
>>>>> Any clue on what I'm missing? Is it correct to assume that since darn() now
>>>>> is in core libs no check is necessary?
>>>>>
>>>>> Thanks a lot!
>>>>>
>>>>> Regards,
>>>>> Gustavo
>>>>>
>>>>> The hs patches:
>>>>>
>>>>> [0] https://github.com/gromero/darn/blob/master/patches/0_darn_macroassembler.patch
>>>>> [1] https://github.com/gromero/darn/blob/master/patches/1_SecureRandom_darn_Java.patch
>>>>> [2] https://github.com/gromero/darn/blob/master/patches/2_SecureRandom_darn_intrinsic.patch
>>>>>
>>>>> and the test-case:
>>>>>
>>>>> [3] https://github.com/gromero/darn/blob/master/patches/darning.java
>>>>>
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Regards,
>>>>>>> Gustavo
>>>>>>>
>>>>>>>> Also, before you start this, please contact the security mailing list
>>>>>>>> just to make sure you're not going into the wrong direction (I'm not a
>>>>>>>> security expert :)
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Volker
>>>>>>>>
>>>>>>>> [1] https://vimeo.com/182074382
>>>>>>>> [2] https://rawgit.com/simonis/JBreak2016/master/jbreak2016.xhtml#/
>>>>>>>>
>>>>>>>> On Fri, Nov 24, 2017 at 12:58 PM, Gustavo Romero
>>>>>>>> <gromero at linux.vnet.ibm.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> POWER9 processors introduced a new single instruction to generate a random
>>>>>>>>> number called 'darn' (Deliver A Random Number) [1, 2]. The random number
>>>>>>>>> generator behind this instruction is NIST SP800-90B and SP800-90C compliant and
>>>>>>>>> provides a minimum of 0.5 bits of entropy per bit. That instruction is as simple
>>>>>>>>> as "darn RT, L", where RT is a 64-bit general-purpose register and L is a 2-bit
>>>>>>>>> operand to select the random number format. One can call 'darn' many times to
>>>>>>>>> obtain a new random number each time.
>>>>>>>>>
>>>>>>>>> Initially I think it can help improve the throughput of the SecureRandom.generateSeed()
>>>>>>>>> method & friends from JCE (NativePRNG provider). If that holds, it has to
>>>>>>>>> be done for both the interpreter and the JIT.
>>>>>>>>>
>>>>>>>>> Currently generateSeed() from NativePRNG basically reads from /dev/random by
>>>>>>>>> default (which blocks from time to time) or /dev/urandom if instructed to do so.
>>>>>>>>> Could somebody please help me to figure out the appropriate place to exploit
>>>>>>>>> such a P9 instruction for interpreted mode, given that code for generateSeed()
>>>>>>>>> is pure Java and behind the scenes just opens the /dev/random file and reads from
>>>>>>>>> it? For instance, is it correct to exploit it in C/C++ code and attach that
>>>>>>>>> by means of JNI?
>>>>>>>>>
>>>>>>>>> Finally, for JITed mode, I think that a way to exploit such a feature would be
>>>>>>>>> by matching a specific sub-tree in the Ideal Graph and from that emit a `darn`
>>>>>>>>> instruction, however I could not figure one sound sub-tree with known nodes
>>>>>>>>> (AddI, LoadN, Parm, etc) that could be matched for that purpose. How do porters
>>>>>>>>> usually proceed in this case?
>>>>>>>>>
>>>>>>>>> Any comments shedding some light on that are much appreciated.
>>>>>>>>>
>>>>>>>>> Thanks and best regards,
>>>>>>>>> Gustavo
>>>>>>>>>
>>>>>>>>> [1] https://www.docdroid.net/tWT7hjD/powerisa-v30.pdf, p. 79
>>>>>>>>> [2] https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>