ImmutableCollections.MapN optimisation ?
Playing with the value types, i've remarked that ImmutableCollections.MapN is missing an override for getOrDefault which making get() 2 times faster than getOrDefault on my test. The other problem is more subtle, get() uses probe() and probe() uses an infinite loop and not a counted loop, the result is that c2 generates lot of assembly codes for probe() as a consequence it doesn't inline probe() in get() making get() really slow compared to HashMap.get() which is fully inlined. There are several wys to fix that: - you can try to transform the while(true) loop in probe into 2 subsequents loops from idx to length + from 0 to idx. - you can inline the code of probe() into get() and even peel the first loop body evaluation as HashMap.get() does. - you can do both. - your idea here :) Also, i wonder if it's not better to make Map1 to also take cares of the zero case instead of making MapN to manage that case, because it's trading a size check with a null check and usually nullcheck can be removed by the VM. regards, Rémi
On Sat, Jun 30, 2018 at 3:54 AM, Remi Forax <forax@univ-mlv.fr> wrote:
The other problem is more subtle, get() uses probe() and probe() uses an infinite loop and not a counted loop, the result is that c2 generates lot of assembly codes for probe() as a consequence it doesn't inline probe() in get() making get() really slow compared to HashMap.get() which is fully inlined. There are several wys to fix that: - you can try to transform the while(true) loop in probe into 2 subsequents loops from idx to length + from 0 to idx.
In ArrayDeque we had success transforming loops into 2 nested loops, the inner one being hotspot friendly, and it seems like the same approach would work for probe(). E.g. look at ArrayDeque#contains
On Jun 30, 2018, at 8:39 AM, Martin Buchholz <martinrb@google.com> wrote:
On Sat, Jun 30, 2018 at 3:54 AM, Remi Forax <forax@univ-mlv.fr> wrote:
The other problem is more subtle, get() uses probe() and probe() uses an infinite loop and not a counted loop, the result is that c2 generates lot of assembly codes for probe() as a consequence it doesn't inline probe() in get() making get() really slow compared to HashMap.get() which is fully inlined. There are several wys to fix that: - you can try to transform the while(true) loop in probe into 2 subsequents loops from idx to length + from 0 to idx.
In ArrayDeque we had success transforming loops into 2 nested loops, the inner one being hotspot friendly, and it seems like the same approach would work for probe(). E.g. look at ArrayDeque#contains
That was my thought too. Best case is that the JIT unrolls both levels of loop. Test it though; YMMV. I like to make the outer loop iterate from false to true, as a style thing. When it unrolls you get two customized inner loops. – John
On Sat, Jun 30, 2018 at 9:05 AM, John Rose <john.r.rose@oracle.com> wrote:
On Jun 30, 2018, at 8:39 AM, Martin Buchholz <martinrb@google.com> wrote:
On Sat, Jun 30, 2018 at 3:54 AM, Remi Forax <forax@univ-mlv.fr> wrote:
The other problem is more subtle, get() uses probe() and probe() uses an infinite loop and not a counted loop, the result is that c2 generates
lot
of assembly codes for probe() as a consequence it doesn't inline probe() in get() making get() really slow compared to HashMap.get() which is fully inlined. There are several wys to fix that: - you can try to transform the while(true) loop in probe into 2 subsequents loops from idx to length + from 0 to idx.
In ArrayDeque we had success transforming loops into 2 nested loops, the inner one being hotspot friendly, and it seems like the same approach would work for probe(). E.g. look at ArrayDeque#contains
That was my thought too. Best case is that the JIT unrolls both levels of loop. Test it though; YMMV.
I like to make the outer loop iterate from false to true, as a style thing. When it unrolls you get two customized inner loops.
I'd be interested in seeing a model example of such an outer loop that iterates from false to true. In ArrayDeque#contains we do ad-hoc if (to == end) break; since the outer loop does either 1 or 2 iterations, never 0.
participants (3)
-
John Rose
-
Martin Buchholz
-
Remi Forax