8020292: j.u.SplittableRandom

Wed Aug 21 09:43:37 UTC 2013

On Aug 20, 2013, at 11:50 PM, Mike Duigou <mike.duigou at oracle.com> wrote:
>>> - Additional seed material might be desirable for "seeder". I worry about how many of the actual bits are random. If no local host address is available the seed might be fairly predictable. In the murmur3 implementation I included also System.identityHashCode(String.class), System.identityHashCode(System.class), System.identityHashCode(Thread.currentThread()), Thread.currentThread().getId() and Runtime.getRuntime().freeMemory(). Mixing multiply with XOR operations also helps to spread the random bits out. Perhaps just call mix64 on each component and XOR against previous?
>>> 
>> 
>> Are you concerned that the values passed to mix64 may be of low entropy? i.e. that small differences in input (such as counting numbers) do not result in large enough differences in the output. I suppose it might be possible to switch from MurmurHash3 to using Stafford's "Mix13" constants or a combination to avoid any potential correlations. However, given that there is further mixing going on for the values and gamma generation perhaps it is not so important?
> 
> It was more that "seeder" is initialized with a potentially predictable value. currentTimeMillis probably has only a couple of bits of entropy (and on a webserver it is probably observable), maybe a dozen bits for nanoTime. Those random bits are likely in the same range of bits (around bits 5-15). If getLocalHostAddress() returns a loopback address then we've seeded the shared state with only a few bits of randomness. (Also, since the local host address is directly visible it could be reversed out by an attacker). The suggestion of using mix64 was mostly to take best advantage of the entropy available. 
> 
> Plus, more bits would be better. Imagine an attack where the attacker looked for newly launched instances of a clustered webserver. It would not be too hard to reduce the search space for the initial value of its SplittableRandom seeder to less than 2^16 possibilities. Whether this is useful would depend upon the application being run on that webserver, but it seems insufficient to me.
> 

OK. So it's predictability of the initial seed that is a concern, not the fact that different instances (created within a small period of time) don't produce seeds that are sufficiently different.

IMHO, using a non-secure PRNG (e.g. as a seed for hashing keys in maps) in combination with data exposed to non-trusted parties is probably the wrong thing to be doing.

Does using System.identityHashCode on common classes really help, given that those classes are likely to be initialized in a known order? 

Perhaps it is really System.identityHashCode(Thread.currentThread()), Thread.currentThread().getId() and Runtime.getRuntime().freeMemory() that are more likely to offer some variance that is harder to predict, and even those are likely to be predictable for multiple runs of the same program.

How about this:

    /**
     * The seed generator for default constructors.
     */
    private static final AtomicLong seeder = getSeeder();

    private static AtomicLong getSeeder() {
        long seed = mix64((((long) hashedHostAddress()) << 32) ^
                          System.currentTimeMillis()) ^
                    mix64(System.nanoTime()) ^
                    mix64(Runtime.getRuntime().freeMemory()) ^
                    mix64(Thread.currentThread().getId()) ^
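                    // Mask the low word so sign extension of a negative int cannot overwrite the high word.
                    mix64((((long) System.identityHashCode(SplittableRandom.class)) << 32) |
                          (System.identityHashCode(Thread.currentThread()) & 0xffffffffL));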
        return new AtomicLong(seed);
    }

Although I must admit it feels a little over the top.
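
For anyone who wants to try the getSeeder() idea outside the JDK sources, here is a minimal standalone sketch. It rests on assumptions: mix64 is taken to be the MurmurHash3 64-bit finalizer mentioned earlier in the thread, hashedHostAddress() is a hypothetical stand-in that simply hashes InetAddress.getLocalHost() and falls back to 0, and the class name SeederSketch is made up for illustration. It is not the webrev code.

```java
import java.net.InetAddress;
import java.util.concurrent.atomic.AtomicLong;

final class SeederSketch {
    // Assumed mixer: the MurmurHash3 64-bit finalizer.
    static long mix64(long z) {
        z = (z ^ (z >>> 33)) * 0xff51afd7ed558ccdL;
        z = (z ^ (z >>> 33)) * 0xc4ceb9fe1a85ec53L;
        return z ^ (z >>> 33);
    }

    // Hypothetical stand-in for hashedHostAddress(); yields 0 if no
    // local host address can be resolved.
    static int hashedHostAddress() {
        try {
            return InetAddress.getLocalHost().hashCode();
        } catch (Exception e) {
            return 0;
        }
    }

    // Mixes each seed component separately and XORs the results.
    static AtomicLong newSeeder() {
        Thread t = Thread.currentThread();
        long seed = mix64((((long) hashedHostAddress()) << 32) ^ System.currentTimeMillis())
                  ^ mix64(System.nanoTime())
                  ^ mix64(Runtime.getRuntime().freeMemory())
                  ^ mix64(t.getId())
                  // Mask the low word so sign extension cannot clobber the high word.
                  ^ mix64((((long) System.identityHashCode(SeederSketch.class)) << 32)
                          | (System.identityHashCode(t) & 0xffffffffL));
        return new AtomicLong(seed);
    }

    public static void main(String[] args) {
        System.out.println("seed = " + Long.toHexString(newSeeder().get()));
        // Number of output bits that differ for two adjacent inputs.
        System.out.println("bits flipped = " + Long.bitCount(mix64(1L) ^ mix64(2L)));
    }
}
```

Running main a few times gives a rough feel for how much the seed varies between launches, though it says nothing about how predictable the inputs are to an attacker, which was Mike's actual concern.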

I did ponder whether it is worthwhile defining a boolean system property that, when declared and set to true, causes SecureRandom to be used instead.
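
A rough sketch of what such an opt-in might look like, under stated assumptions: the property name "example.secureRandomSeed" and the class name SecureSeederSketch are invented for illustration, and the non-secure fallback is only a placeholder for whatever the default seeding ends up being.

```java
import java.security.SecureRandom;
import java.util.concurrent.atomic.AtomicLong;

final class SecureSeederSketch {
    // Hypothetical property name; not something proposed in this thread.
    static final String PROP = "example.secureRandomSeed";

    static long initialSeed() {
        // Boolean.getBoolean is true only if the property exists and
        // equals "true" (ignoring case).
        if (Boolean.getBoolean(PROP)) {
            return new SecureRandom().nextLong();
        }
        // Placeholder for the default, non-secure seeding.
        return System.currentTimeMillis() ^ System.nanoTime();
    }

    static AtomicLong newSeeder() {
        return new AtomicLong(initialSeed());
    }

    public static void main(String[] args) {
        // Try with: java -Dexample.secureRandomSeed=true SecureSeederSketch
        System.out.println(Long.toHexString(newSeeder().get()));
    }
}
```

The cost of SecureRandom would be paid once, at class initialization, so the default constructors would stay cheap either way.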

Paul.