jvm handling of long indexed for loops
Vladimir Ivanov
vladimir.x.ivanov at oracle.com
Fri Jun 20 12:20:15 UTC 2014
On 6/19/14 9:20 PM, Alexander Weld wrote:
> Okay, that makes sense. The PrintAssembly flag is quite useful.
You can also try -XX:+TraceLoopOpts for that particular case (it is a
non-product flag, so it needs a debug build of the JVM).
> Are there plans for optimizing for loops that have a long counter in the
> next releases of hotspot? I guess there is no flag to force the jvm to
> unroll a long-loop.. Is there anything else I can do now, other than
> unrolling the loop by hand? (what I want to avoid)
No, you can't. An int trip counter is a prerequisite for HotSpot to
treat a loop as counted.
Lifting this limitation in the HotSpot code is fairly easy, but such a
change has to be carefully weighed and heavily tested before it gets
into the product. We do a number of optimizations for counted loops, and
some of them are not suitable for counted loops with a long trip counter
(e.g. safepoint check removal from counted loops).
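
To make the int-counter prerequisite concrete, here is a minimal sketch of a
long range split into chunks so that the inner loop has an int counter. It was
not proposed in this thread; the class name, method names, and CHUNK value are
hypothetical, and whether it is actually faster than the plain long loop has to
be measured on the JVM in question.

```java
public class ChunkedLongLoop {

    // Hypothetical chunk size; any positive int works.
    private static final int CHUNK = 1 << 20;

    // Sums 0 .. maxIter-1 using an outer long loop over chunks and an
    // inner loop with an int counter.
    // Assumes maxIter <= Long.MAX_VALUE - CHUNK so that "base += CHUNK"
    // cannot overflow.
    public static long sumChunked(long maxIter) {
        long x = 0;
        for (long base = 0; base < maxIter; base += CHUNK) {
            // Number of iterations in this chunk; always fits in an int.
            int n = (int) Math.min(CHUNK, maxIter - base);
            for (int i = 0; i < n; i++) {
                x += base + i;
            }
        }
        return x;
    }

    // Plain long-indexed loop, kept for comparison.
    public static long sumPlain(long maxIter) {
        long x = 0;
        for (long l = 0; l < maxIter; l++) {
            x += l;
        }
        return x;
    }

    public static void main(String[] args) {
        long n = Integer.MAX_VALUE;
        System.out.println("chunked = " + sumChunked(n));
        System.out.println("plain   = " + sumPlain(n));
    }
}
```

Both methods return the same value; only the shape of the inner loop differs.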
I haven't found any RFEs to enhance optimizations for counted loops with
a long trip counter. Loops with long trip counters are rare.
IMO it would be nice to turn on loop optimizations for such loops and
measure how much benefit they bring on larger benchmarks.
> And for my second question: Do i understand it correctly, that the OSR
> compilation performs better than the "normal" compilation for the
> long-loop?
It may be the case here, but OSR code usually doesn't outperform normal
compilations. Its goal is to provide an interim remedy: switch from the
interpreter to compiled code while the loop is still running, not to
achieve peak performance. That comes once the normally compiled method
is used. Profile data is usually immature and scarce when OSR occurs.
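
A minimal sketch of a situation where OSR can be expected, assuming a
hypothetical class OsrDemo: the method is entered only once, so the JVM can
leave the interpreter only by replacing the running loop on the stack. When
run with -XX:+PrintCompilation, OSR compilations are marked with a '%' in the
output.

```java
public class OsrDemo {

    // A single long-running loop. If this method is called only once, its
    // normal (non-OSR) compiled version is never entered by that call.
    static long hotLoop(long maxIter) {
        long x = 0;
        for (long l = 0; l < maxIter; l++) {
            x += l;
        }
        return x;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        // One call only: any compiled code used here is entered via OSR.
        long x = hotLoop(Integer.MAX_VALUE);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("x = " + x + " time = " + elapsedMs + " ms");
    }
}
```

Calling hotLoop several times, as the test class in this thread does, lets
later calls enter the normally compiled method instead.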
Best regards,
Vladimir Ivanov
>
> On 6/18/2014 5:15 PM, Vitaly Davidovich wrote:
>>
>> Yes, hotspot has a bunch of loop optimizations that only kick in when
>> the counter is an int. My hunch is that, given your code, the
>> difference is that the int loop is unrolled and the long one isn't.
>> Checking the generated assembly should deny/confirm that, and
>> generally shed light on the compiled differences.
>>
>> As an aside, System.nanoTime is more appropriate for timing execution;
>> prefer that over System.currentTimeMillis.
>>
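
A minimal sketch of the same measurement taken with System.nanoTime, which
measures elapsed time and is not tied to the wall clock. The class and method
names are hypothetical; the loop body is the int version from the test class
in this thread.

```java
public class NanoTiming {

    // Same loop as testInt in the original test class.
    static long testInt(int maxIter) {
        long x = 0;
        for (int i = 0; i < maxIter; i++) {
            x += i;
        }
        return x;
    }

    // Returns the elapsed time of one testInt call in nanoseconds.
    static long timeTestInt(int maxIter) {
        long start = System.nanoTime();
        long x = testInt(maxIter);
        long elapsed = System.nanoTime() - start;
        // Use the result so the call cannot be treated as dead code.
        if (x < 0) {
            throw new IllegalStateException("unexpected negative sum");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            long ns = timeTestInt(Integer.MAX_VALUE);
            System.out.println("Iteration " + i + ": time(int loop) = "
                    + (ns / 1000000L) + " ms");
        }
    }
}
```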
>> Sent from my phone
>>
>> On Jun 18, 2014 7:52 PM, "Alexander Weld" <alexander.weld at oracle.com> wrote:
>>
>> Hi,
>>
>> I have some questions on how the JVM handles long values, especially
>> as index in for loops.
>>
>> I have the following test class, which tracks the running time of two
>> versions of a simple for loop, one using an int-value as index and one
>> using a long-value as index:
>>
>> public class Test {
>>
>>     public static long testInt(int maxIter) {
>>         long x = 0;
>>
>>         for (int i = 0; i < maxIter; i++) {
>>             x += i;
>>         }
>>
>>         return x;
>>     }
>>
>>     public static long testLong(long maxIter) {
>>         long x = 0;
>>
>>         for (long l = 0; l < maxIter; l++) {
>>             x += l;
>>         }
>>
>>         return x;
>>     }
>>
>>     public static void main(String[] args) {
>>
>>         final int MAX_ITER_INT = Integer.MAX_VALUE;
>>         final long MAX_ITER_LONG = (long) MAX_ITER_INT;
>>
>>         long t1, t2;
>>
>>         for (int i = 0; i < 10; i++) {
>>             long x = 0;
>>
>>             t1 = System.currentTimeMillis();
>>             x = testInt(MAX_ITER_INT);
>>             t2 = System.currentTimeMillis();
>>
>>             System.out.println("Iteration " + i + ": x = " + x
>>                     + " time(int loop) = " + (t2 - t1));
>>         }
>>
>>         for (int i = 0; i < 10; i++) {
>>             long x = 0;
>>
>>             t1 = System.currentTimeMillis();
>>             x = testLong(MAX_ITER_LONG);
>>             t2 = System.currentTimeMillis();
>>
>>             System.out.println("Iteration " + i + ": x = " + x
>>                     + " time(long loop) = " + (t2 - t1));
>>         }
>>
>>     }
>>
>> }
>>
>> When I run this on our cluster (*) I get the following results:
>>
>> $ java Test
>> Iteration 0: x = 2305843005992468481 time(int loop) = 945
>> Iteration 1: x = 2305843005992468481 time(int loop) = 937
>> Iteration 2: x = 2305843005992468481 time(int loop) = 937
>> Iteration 3: x = 2305843005992468481 time(int loop) = 938
>> Iteration 4: x = 2305843005992468481 time(int loop) = 938
>> Iteration 5: x = 2305843005992468481 time(int loop) = 937
>> Iteration 6: x = 2305843005992468481 time(int loop) = 937
>> Iteration 7: x = 2305843005992468481 time(int loop) = 938
>> Iteration 8: x = 2305843005992468481 time(int loop) = 938
>> Iteration 9: x = 2305843005992468481 time(int loop) = 937
>> Iteration 0: x = 2305843005992468481 time(long loop) = 1510
>> Iteration 1: x = 2305843005992468481 time(long loop) = 1509
>> Iteration 2: x = 2305843005992468481 time(long loop) = 1799
>> Iteration 3: x = 2305843005992468481 time(long loop) = 1799
>> Iteration 4: x = 2305843005992468481 time(long loop) = 1799
>> Iteration 5: x = 2305843005992468481 time(long loop) = 1799
>> Iteration 6: x = 2305843005992468481 time(long loop) = 1799
>> Iteration 7: x = 2305843005992468481 time(long loop) = 1799
>> Iteration 8: x = 2305843005992468481 time(long loop) = 1799
>> Iteration 9: x = 2305843005992468481 time(long loop) = 1799
>>
>> My questions are:
>>
>> (1) Why is the long-version slower compared to the int-version? Is it
>> the int to long conversion? Or are there some optimizations that are
>> only applied to int loops?
>>
>> (2) What is the reason for the long-version getting slower after
>> "Iteration 1"?
>>
>> Thanks,
>> Alex
>>
>> (*) System information:
>>
>> Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>
>> $ lsb_release -d
>> Description: Oracle Linux Server release 6.5
>>
>> $ java -version
>> java version "1.7.0_55"
>> Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
>> Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
>>
>> (Let me know if you need more information)
>>
>