Strange speed dependency; help needed for benchmark, please
Ulf Zibis
Ulf.Zibis at gmx.de
Thu Feb 5 03:30:32 PST 2009
On 05.02.2009 08:55, Christian Thalinger wrote:
> On Wed, 2009-02-04 at 20:50 +0100, Ulf Zibis wrote:
>
>> Hi,
>>
>> I get very different, mutually contradictory times when running my
>> UTF-8 benchmark
>> https://java-nio-charset-enhanced.dev.java.net/source/browse/java-nio-charset-enhanced/trunk/test/sun/nio/cs/UTF_8Benchmark.java?diff_format=s&rev=620&view=markup
>> for only 1 code unit pattern _as against_ running it for all code unit patterns:
>>
>> selected pattern 4
>> time for sun.nio.cs.UTF_8_60$Decoder: 2851 ms
>> time for sun.nio.cs.UTF_8_70$Decoder: 2501 ms
>> time for sun.nio.cs.UTF_8_last$Decoder: 2310 ms
>> time for sun.nio.cs.UTF_8_new$Decoder: 2048 ms
>>
>
> This is solaris-amd64:
>
> $ gamma -Xbootclasspath/a:. sun/nio/cs/UTF_8Benchmark
> time for warm up 1: 2455 ms
> time for warm up 2: 1150 ms
> time for warm up 3: 1151 ms
> time for warm up 4: 1180 ms
> time for sun.nio.cs.UTF_8_60$Decoder: 1553 ms
> time for sun.nio.cs.UTF_8_70$Decoder: 836 ms
> time for sun.nio.cs.UTF_8_last$Decoder: 746 ms
> time for sun.nio.cs.UTF_8_new$Decoder: 943 ms
> last warm up ./. test loops: 1.1573023
>
>
>> selected all patterns (SRC_BUF = -1)
>> time for sun.nio.cs.UTF_8_60$Decoder: 4951 4925 5351 4531 5269 4686 ms
>> time for sun.nio.cs.UTF_8_70$Decoder: 476 3066 4815 3921 4907 3369 ms
>> time for sun.nio.cs.UTF_8_last$Decoder: 560 3237 4539 4575 4772 3334 ms
>> time for sun.nio.cs.UTF_8_new$Decoder: 531 3642 4741 4462 5497 3742 ms
>>
>
> $ gamma -Xbootclasspath/a:. sun/nio/cs/UTF_8Benchmark
> time for warm up 1: 3942 ms
> time for warm up 2: 2394 ms
> time for warm up 3: 2409 ms
> time for warm up 4: 2388 ms
> time for sun.nio.cs.UTF_8_60$Decoder: 859 854 2134 1694 2049 1044 ms
> time for sun.nio.cs.UTF_8_70$Decoder: 290 781 1933 1448 2010 900 ms
> time for sun.nio.cs.UTF_8_last$Decoder: 414 681 1462 1452 1524 765 ms
> time for sun.nio.cs.UTF_8_new$Decoder: 396 686 2291 1500 1760 876 ms
> last warm up ./. test loops: 1.9226139
>
>
>> I also see significant differences when I change the
>> chronological order of the tested decoders, see line 29... :
>> int dec = 0; // change process order of decoders for different results:
>> decoders[dec++] = new UTF_8_60().newDecoder();
>> decoders[dec++] = new UTF_8_70().newDecoder();
>> decoders[dec++] = new UTF_8_last().newDecoder();
>> decoders[dec++] = new UTF_8_new().newDecoder();
>>
>
> $ gamma -Xbootclasspath/a:. sun/nio/cs/UTF_8Benchmark
> time for warm up 1: 2569 ms
> time for warm up 2: 1140 ms
> time for warm up 3: 1143 ms
> time for warm up 4: 1176 ms
> time for sun.nio.cs.UTF_8_new$Decoder: 745 ms
> time for sun.nio.cs.UTF_8_60$Decoder: 1536 ms
> time for sun.nio.cs.UTF_8_last$Decoder: 1001 ms
> time for sun.nio.cs.UTF_8_70$Decoder: 747 ms
> last warm up ./. test loops: 1.166911
>
> $ gamma -Xbootclasspath/a:. sun/nio/cs/UTF_8Benchmark
> time for warm up 1: 4145 ms
> time for warm up 2: 2598 ms
> time for warm up 3: 2606 ms
> time for warm up 4: 2651 ms
> time for sun.nio.cs.UTF_8_new$Decoder: 388 1822 1210 1368 912 1775 ms
> time for sun.nio.cs.UTF_8_60$Decoder: 738 733 2558 1988 2203 1035 ms
> time for sun.nio.cs.UTF_8_last$Decoder: 421 685 2154 1967 1731 844 ms
> time for sun.nio.cs.UTF_8_70$Decoder: 289 781 2364 1444 2010 947 ms
> last warm up ./. test loops: 1.9654278
>
> Hope that helps.
>
> -- Christian
>
>
Yes, that helps.
Many thanks, Christian.
First, I can see that your machine also becomes slower when running
all 6 patterns (compare the warm-up times), but I am surprised that the
test loops later become 1.9 times faster.
Could you set WARMUP_LOOPS = 16 once, to see whether there is a 2nd
acceleration from the HotSpot compiler later on?
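To make the request concrete, here is a minimal sketch of such a warm-up probe. It is not the code from UTF_8Benchmark.java: the class name WarmupProbe, the constant DECODES_PER_ROUND and the sample text are made up for illustration, and the JDK's standard UTF-8 decoder stands in for the UTF_8_xx decoders. It prints the time of every warm-up round, so a second drop in the times would show up in the output:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class WarmupProbe {
    // Hypothetical values; the real benchmark defines its own WARMUP_LOOPS.
    static final int WARMUP_LOOPS = 16;
    static final int DECODES_PER_ROUND = 20000;

    /** Runs the given number of rounds and returns the time of each round in ms. */
    public static long[] timeRounds(CharsetDecoder decoder, byte[] input, int rounds) {
        ByteBuffer src = ByteBuffer.wrap(input);
        // UTF-8 decoding never produces more chars than input bytes.
        CharBuffer dst = CharBuffer.allocate(input.length);
        long[] millis = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            for (int i = 0; i < DECODES_PER_ROUND; i++) {
                src.clear();
                dst.clear();
                decoder.reset();
                decoder.decode(src, dst, true);
                decoder.flush(dst);
            }
            millis[r] = (System.nanoTime() - start) / 1000000L;
        }
        return millis;
    }

    public static void main(String[] args) {
        // Assumed sample text mixing 1-, 2- and 3-byte sequences.
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 64; i++) {
            text.append("h\u00e4\u00dflich \u20ac text ");
        }
        byte[] input = text.toString().getBytes(StandardCharsets.UTF_8);
        long[] millis = timeRounds(StandardCharsets.UTF_8.newDecoder(), input, WARMUP_LOOPS);
        for (int r = 0; r < millis.length; r++) {
            System.out.println("time for warm up " + (r + 1) + ": " + millis[r] + " ms");
        }
    }
}
```

If the times settle after round 2 and then drop again several rounds later, that would point to a later recompilation; if they stay flat, 4 warm-up loops are probably enough.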
There are 3 factors that have a significant influence on the relative
benchmark times:
1.) number of different source patterns in the test:
e.g.: when testing only pattern 4, UTF_8_70$Decoder is about twice
as fast as UTF_8_60$Decoder, but not when testing all 6 patterns:
836 vs. 1553 ms (pattern 4 only) against 2010 vs. 2049 ms (all 6 patterns)
2.) chronological order of the tested decoders:
e.g.: if UTF_8_new$Decoder is in last position, it is much slower
than UTF_8_last$Decoder, but the reverse is true if it is in 1st position:
943 vs. 746 ms (last position) against 745 vs. 1001 ms (1st position)
3.) type of CPU, machine and OS:
e.g.: the ASCII-only loop is 10 times faster on [Intel Pentium M,
Centrino, Windows], but only 2 times faster on [amd64, ?, Solaris];
other benchmark times, e.g. for pattern 4, differ by up to a factor of 2.
... so it is very difficult to work out which of my 4 alternatives (+
XOR vs. ADD in the uc calculation; + order of uc calculation vs. overflow
check) in UTF_8_new$Decoder is best for the real-world case, because
they also behave completely differently under the 3 factors mentioned above.
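One way to at least average out factor 2 might be to rotate the start position between rounds, so that every decoder runs once in every position. The following is only a sketch under assumptions: the class RotatingOrder and its method names are hypothetical, and four instances of the JDK's standard UTF-8 decoder stand in for UTF_8_60, UTF_8_70, UTF_8_last and UTF_8_new.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class RotatingOrder {
    /** Index of the decoder that runs at the given position in the given round. */
    public static int decoderAt(int round, int position, int count) {
        return (round + position) % count;
    }

    /** Decodes the input 'loops' times and returns the elapsed nanoseconds. */
    public static long timeDecoder(CharsetDecoder decoder, byte[] input, int loops) {
        ByteBuffer src = ByteBuffer.wrap(input);
        CharBuffer dst = CharBuffer.allocate(input.length);
        long start = System.nanoTime();
        for (int i = 0; i < loops; i++) {
            src.clear();
            dst.clear();
            decoder.reset();
            decoder.decode(src, dst, true);
            decoder.flush(dst);
        }
        return System.nanoTime() - start;
    }

    /** Runs as many rounds as there are decoders; each decoder visits each position once. */
    public static long[] totalNanos(CharsetDecoder[] decoders, byte[] input, int loops) {
        long[] totals = new long[decoders.length];
        for (int round = 0; round < decoders.length; round++) {
            for (int pos = 0; pos < decoders.length; pos++) {
                int d = decoderAt(round, pos, decoders.length);
                totals[d] += timeDecoder(decoders[d], input, loops);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // Stand-ins only: the real test would put the four UTF_8_xx decoders here.
        CharsetDecoder[] decoders = new CharsetDecoder[4];
        for (int i = 0; i < decoders.length; i++) {
            decoders[i] = StandardCharsets.UTF_8.newDecoder();
        }
        byte[] input = "gr\u00fc\u00dfe \u20ac 123".getBytes(StandardCharsets.UTF_8);
        long[] totals = totalNanos(decoders, input, 100000);
        for (int i = 0; i < totals.length; i++) {
            System.out.println("decoder " + i + ": " + totals[i] / 1000000L + " ms in total");
        }
    }
}
```

This only averages the position effect over all decoders; it does not explain or remove it. Running each decoder in a fresh JVM would be another option to keep the candidates from influencing each other.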
... and not to forget that in the real-world case a single byte sequence
will only be decoded once, so it won't reside in the L1 or L2 CPU cache, and
additionally its length would be much shorter (1..50 in most cases;
entire texts should be rare compared with single words and short phrases).
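A test closer to that real-world case could decode a large number of distinct short inputs exactly once each. The sketch below is an assumption about how that might look, not part of the existing benchmark: the class ShortInputBench, the alphabet and the input count are made up, lengths are taken as 1..50 characters (the range above may well mean bytes), and the JDK's standard UTF-8 decoder again stands in for the decoders under test.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Random;

class ShortInputBench {
    // Assumed alphabet with 1-, 2- and 3-byte UTF-8 sequences (all in the BMP).
    static final String ALPHABET = "abcxyz \u00e4\u00f6\u00fc\u00df\u20ac\u4e2d";

    /** Builds 'count' random strings of 1..maxChars chars, each encoded to its own byte array. */
    public static byte[][] makeInputs(int count, int maxChars, long seed) {
        Random rnd = new Random(seed);
        byte[][] inputs = new byte[count][];
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.setLength(0);
            int len = 1 + rnd.nextInt(maxChars);
            for (int j = 0; j < len; j++) {
                sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
            }
            inputs[i] = sb.toString().getBytes(StandardCharsets.UTF_8);
        }
        return inputs;
    }

    /** Decodes every input exactly once and returns the total number of chars produced. */
    public static long decodeEachOnce(CharsetDecoder decoder, byte[][] inputs, CharBuffer dst) {
        long chars = 0;
        for (byte[] in : inputs) {
            dst.clear();
            decoder.reset();
            // Wrapping allocates a small object per input; this overhead is part of the measured time.
            decoder.decode(ByteBuffer.wrap(in), dst, true);
            decoder.flush(dst);
            chars += dst.position();
        }
        return chars;
    }

    public static void main(String[] args) {
        // Several hundred thousand short inputs: far more data than fits in L1 or L2.
        byte[][] inputs = makeInputs(500000, 50, 42L);
        CharBuffer dst = CharBuffer.allocate(50);
        long start = System.nanoTime();
        long chars = decodeEachOnce(StandardCharsets.UTF_8.newDecoder(), inputs, dst);
        long ms = (System.nanoTime() - start) / 1000000L;
        System.out.println("decoded " + chars + " chars in " + ms + " ms");
    }
}
```

Because each input is touched only once, the decoder loop itself still gets compiled by HotSpot, but the data no longer stays hot in the cache. The inputs are allocated and read in the same order, so hardware prefetching may still help; this only approximates a cold-cache run.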
Do you have any idea how to get closer to the best choice?
Thanks in advance,
-Ulf