RFR (S) 8136500: Integer/Long getChars and stringSize should be more idiomatic
Aleksey Shipilev
aleksey.shipilev at oracle.com
Sat Nov 21 22:41:56 UTC 2015
Hi Ivan!
Update:
http://cr.openjdk.java.net/~shade/8136500/webrev.02/
Performance data updated (shaves off 0.5 ns at most):
http://cr.openjdk.java.net/~shade/8136500/notes.txt
On 11/21/2015 11:38 PM, Ivan Gerasimov wrote:
> Just by a coincidence, I was experimenting with the same piece of code :)
Not a surprise: that code screams for cleaning up. I know others tried
to optimize at least stringSize before.
> 1)
> When benchmarking different implementation of Integer.stringSize() the
> fastest (on my machine) variants were with explicitly unrolled
> switch-like constructions and hard-coded binary search with guaranteed
> no-more-than-four comparisons:
> They appeared to be twice as fast as the original implementation.
> However, the simple loop that you have is surely cleaner.
>
> static int stringSize_switch(int x) {
> if (x < 10) return 1;
> if (x < 100) return 2;
> if (x < 1000) return 3;
> if (x < 10000) return 4;
> if (x < 100000) return 5;
> if (x < 1000000) return 6;
> if (x < 10000000) return 7;
> if (x < 100000000) return 8;
> if (x < 1000000000) return 9;
> return 10;
> }
Yes. The thing is, if you look into the code generated by C2, then you
will see the similar shape, because the loop is aggressively unrolled
and constants are computed. This seems like a no-brainer for any decent
compiler. I said that in the stringSize docs as the reminder.
> static int stringSize_binchop(int x) {
Binary search is an obvious idea too, but again the trouble is that many
integers are small, and you are much better off linear-searching from
small to large numbers. This is further exacerbated by the fact
byte/short values are normally coerced to integers, and go through
Integer.toString. I tried to capture that in stringSize docs as well.
>
> 2)
> The lookup tables DigitTens and DigitOnes are always used together.
> Packing them in an one table can save us one array access for the price
> of little arithmetic.
>
> static final int[] TwoDigits = {
> ('0'<<16)+'0', ('0'<<16)+'1', ('0'<<16)+'2', ('0'<<16)+'3',
> ...
> int twoDig = TwoDigits[r];
> buf[--charPos] = (byte)twoDig;
> buf[--charPos] = (byte)(twoDig >> 16);
> ...
I agree, that's a nifty idea. As the improvement, you can do char[]
array and access as (byte)v and (byte)(v >> 8). However, both variants
are not improving against my current patch...
# JDK 9 Patched (webrev.01)
IntegerToString.test 0 avgt 9 7.008 ± 0.252 ns/op
IntegerToString.test 1 avgt 9 16.352 ± 0.206 ns/op
IntegerToString.test 2 avgt 9 18.064 ± 0.598 ns/op
IntegerToString.test 3 avgt 9 18.626 ± 0.127 ns/op
IntegerToString.test 4 avgt 9 20.591 ± 0.295 ns/op
IntegerToString.test 5 avgt 9 20.796 ± 0.106 ns/op
IntegerToString.test 6 avgt 9 22.585 ± 0.083 ns/op
IntegerToString.test 7 avgt 9 23.247 ± 1.231 ns/op
IntegerToString.test 8 avgt 9 24.090 ± 0.289 ns/op
# JDK 9 Patched (webrev.01 + lookup tables combining)
Benchmark (mag) Mode Cnt Score Error Units
IntegerToString.test 0 avgt 9 6.846 ± 0.119 ns/op
IntegerToString.test 1 avgt 9 16.315 ± 0.157 ns/op
IntegerToString.test 2 avgt 9 17.967 ± 0.127 ns/op
IntegerToString.test 3 avgt 9 18.429 ± 0.226 ns/op
IntegerToString.test 4 avgt 9 20.653 ± 0.419 ns/op
IntegerToString.test 5 avgt 9 21.104 ± 0.049 ns/op
IntegerToString.test 6 avgt 9 22.728 ± 0.292 ns/op
IntegerToString.test 7 avgt 9 24.014 ± 0.153 ns/op
IntegerToString.test 8 avgt 9 25.626 ± 0.340 ns/op
Therefore, I'd prefer to keep the cleaner (i.e. split lookup tables)
alternative.
But that reminds me we can omit duplicating the lookup tables for
different coder, and use only a byte[] pair. This caters for the most
frequent case of Latin1-encoded Strings.
> 3)
> In Integer.toString(int i)
>
> 463 if (i == Integer.MIN_VALUE)
> 464 return "-2147483648";
> 465 int size = (i < 0) ? stringSize(-i) + 1 : stringSize(i);
>
> It might be reasonable to reorganize the code a bit:
>
> int size = 0, i1 = i;
> if (i < 0) {
> if (i == Integer.MIN_VALUE)
> return "-2147483648";
> size = 1;
> i1 = -i;
> }
> size += stringSize(i1);
>
> First, we would skip one comparison of positive i with MIN_INT.
Yes, I like this one. Grabbed a modified version of this approach.
> Second, I think, the code of stringSize() might be better inlined, if it
> was used only once.
No-no! That's a helpful little fella. Not-yet-integrated Indify String
Concat uses Integer/Long.stringSize.
Thanks,
-Aleksey
More information about the core-libs-dev
mailing list