Optimized version of CopiesList.hashCode()
Ivan Gerasimov
ivan.gerasimov at oracle.com
Fri Nov 30 06:58:00 UTC 2018
Hi Zheka and Tagir!
On 11/29/18 10:37 PM, Zheka Kozlov wrote:
> Thanks, Tagir!
>
> I was also thinking of how to calculate hashCode quickly but my direction
> was wrong. I thought that we can use the formula of the sum of a geometric
> progression: Sum(p^k, k = 0..n) = (1-p^n)/(1-p). Unfortunately, this
> involves division which doesn't work with the overflow of integers. I
> didn't know the trick p^(2n) = (p^n)^2.
>
> Your formulas and implementation look correct.
>
> I would probably rewrite the loop to make it a bit simpler:
> for (int mask = n << (Integer.numberOfLeadingZeros(n) + 1); mask != 0; mask
> <<= 1) {
If n == 1, then it would become `mask = n << 32`, and the loop would run
32 times.
The check
if (((n << i) & 0x8000_0000) != 0) {
might be written as
if ((n << i) < 0 ) {
to save one bit-wise operation and avoid using extra constant.
With kind regards,
Ivan
> sum = sum * (pow + 1);
> pow *= pow;
> if ((mask & 0x8000_0000) != 0) {
> pow *= 31;
> sum = sum * 31 + 1;
> }
> }
>
> But that's just a matter of style.
>
> Great job!
>
> пт, 30 нояб. 2018 г. в 11:02, Tagir Valeev <amaembo at gmail.com>:
>
>> Hello!
>>
>> If you are doing it fast, why not doing it really fast? If you
>> deparenthesize and regroup terms, you'll got
>> h(e, n) = p ^ n + e * f(n)
>> Where h(e, n) is the hashCode of n elements with hashCode of single
>> element = e; p = 31 and
>> f(n) = Sum(p^k, k = 0..n-1)
>>
>> Using simple algebraic rules, you'll get:
>> p ^ (2n) = (p^n)^2
>> f(2n) = f(n) * (p^n + 1)
>> p ^ (n+1) = (p^n)*p
>> f(n+1) = f(n) * p + 1
>>
>> Thus the algorithm may look as follows:
>>
>> public int hashCode() {
>> int pow = 1; // -> 31^n
>> int sum = 0; // -> Sum(31^k, k = 0..n-1)
>> for (int i = Integer.numberOfLeadingZeros(n); i < Integer.SIZE; i++) {
>> sum = sum * (pow + 1);
>> pow *= pow;
>> if (((n << i) & 0x8000_0000) != 0) {
>> pow *= 31;
>> sum = sum * 31 + 1;
>> }
>> }
>> return pow + sum * (element == null ? 0 : element.hashCode());
>> }
>>
>> It seems reasonable to peel off the n = 0 case, which helps to reduce
>> number of multiplications for other cases (also for n = 0 we don't
>> need to calculate element hashCode at all):
>>
>> public int hashCode() {
>> if (n == 0) return 1;
>> int pow = 31; // -> 31^n
>> int sum = 1; // -> Sum(31^k, k = 0..n-1)
>> for (int i = Integer.numberOfLeadingZeros(n) + 1; i < Integer.SIZE;
>> i++) {
>> sum = sum * (pow + 1);
>> pow *= pow;
>> if (((n << i) & 0x8000_0000) != 0) {
>> pow *= 31;
>> sum = sum * 31 + 1;
>> }
>> }
>> return pow + sum * (element == null ? 0 : element.hashCode());
>> }
>>
>> Assuming that element hashCode is simple (I used String "foo" as an
>> element), I got the following results for different collection sizes:
>>
>> Benchmark (size) Score Error Units
>> hashCodeFast 0 2,299 ± 0,017 ns/op
>> hashCodeFast 1 2,731 ± 0,021 ns/op
>> hashCodeFast 2 4,073 ± 0,077 ns/op
>> hashCodeFast 3 4,315 ± 0,032 ns/op
>> hashCodeFast 5 5,470 ± 0,074 ns/op
>> hashCodeFast 10 6,904 ± 0,060 ns/op
>> hashCodeFast 30 9,102 ± 0,173 ns/op
>> hashCodeFast 100 10,093 ± 0,069 ns/op
>> hashCodeFast 1000 14,129 ± 0,074 ns/op
>> hashCodeFast 10000 17,028 ± 0,249 ns/op
>> hashCodeFast 100000 20,795 ± 0,194 ns/op
>> hashCodeFast 1000000 23,622 ± 0,264 ns/op
>>
>> Compared to Zheka's implementation:
>>
>> Benchmark (size) Score Error Units
>> hashCodeZheka 0 2,584 ± 0,024 ns/op
>> hashCodeZheka 1 2,868 ± 0,022 ns/op
>> hashCodeZheka 2 3,730 ± 0,030 ns/op
>> hashCodeZheka 3 4,323 ± 0,027 ns/op
>> hashCodeZheka 5 5,285 ± 0,037 ns/op
>> hashCodeZheka 10 8,254 ± 0,057 ns/op
>> hashCodeZheka 30 24,793 ± 0,218 ns/op
>> hashCodeZheka 100 89,017 ± 0,764 ns/op
>> hashCodeZheka 1000 923,792 ± 28,194 ns/op
>> hashCodeZheka 10000 9157,411 ± 98,902 ns/op
>> hashCodeZheka 100000 91705,599 ± 689,299 ns/op
>> hashCodeZheka 1000000 919723,545 ± 13092,935 ns/op
>>
>> So results are quite similar for one-digit counts, but we start
>> winning from n = 10 and after that logarithmic algorithm really rocks.
>>
>> I can file an issue and create a webrev, but I still need a sponsor
>> and review for such change. Martin, can you help with this?
>>
>> With best regards,
>> Tagir Valeev.
>> On Tue, Nov 27, 2018 at 5:49 PM Martin Buchholz <martinrb at google.com>
>> wrote:
>>> I agree!
>>>
>>> (but don't have time ...)
>>>
>>> On Sun, Nov 25, 2018 at 9:01 PM, Zheka Kozlov <orionllmain at gmail.com>
>> wrote:
>>>> Currently, CopiesList.hashCode() is inherited from AbstractList which:
>>>>
>>>> - calls hashCode() for each element,
>>>> - creates a new Iterator every time.
>>>>
>>>> However, for Collections.nCopies():
>>>>
>>>> - All elements are the same. So hashCode() can be called only once.
>>>> - An Iterator is unnecessary.
>>>>
>>>> So, I propose overridding hashCode() implementation for CopiesList:
>>>>
>>>> @Override
>>>> public int hashCode() {
>>>> int hashCode = 1;
>>>> final int elementHashCode = (element == null) ? 0 :
>> element.hashCode();
>>>> for (int i = 0; i < n; i++) {
>>>> hashCode = 31*hashCode + elementHashCode;
>>>> }
>>>> return hashCode;
>>>> }
>>>>
>>>> Benchmark:
>>>> List<List<String>> list = Collections.nCopies(10_000, new
>>>> ArrayList<>(Collections.nCopies(1_000_000, "a")));
>>>> long nano = System.nanoTime();
>>>> System.out.println(list.hashCode());
>>>> System.out.println((System.nanoTime() - nano) / 1_000_000);
>>>>
>>>> Result:
>>>> Old version - ~12 seconds.
>>>> New version - ~10 milliseconds.
>>>>
--
With kind regards,
Ivan Gerasimov
More information about the core-libs-dev
mailing list