jvm handling of long indexed for loops

Vladimir Ivanov vladimir.x.ivanov at oracle.com
Fri Jun 20 12:25:40 UTC 2014


Small correction.

>> And for my second question: Do I understand it correctly that the OSR
>> compilation performs better than the "normal" compilation for the
>> long-loop?
> It may be the case here, but it usually doesn't outperform normal
> compilations. The goal is to get interim remedy and switch from
> interpreter to compiled code and not to achieve peak performance until
> normally compiled method is used. Profile data is usually immature and
> scarce when OSR occurs.

Read this as: "The goal is to provide an interim remedy by switching from
the interpreter to compiled code until the normally compiled method is
used. Peak performance isn't the goal in this case."
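
A minimal sketch of how one might observe this, not taken from the original
thread. The class name OsrSketch and the helper timeOnce are hypothetical;
sumLong has the same shape as testLong from the original post. On the first
call the method is entered only once while its loop runs for a very long
time, so the loop may be compiled via OSR; later calls may pick up the
normally compiled method. Running with the HotSpot flag
-XX:+PrintCompilation should show which compilations happen (OSR
compilations are marked with "%"). Timings will vary by machine and JVM.

```java
// Hypothetical demo class; not part of the original thread.
final class OsrSketch {

    // Same shape as testLong from the original post: a long-indexed counted loop.
    static long sumLong(long maxIter) {
        long x = 0;
        for (long l = 0; l < maxIter; l++) {
            x += l;
        }
        return x;
    }

    // Runs sumLong once, prints the result and elapsed time, and returns
    // the elapsed time in nanoseconds.
    static long timeOnce(long maxIter) {
        long start = System.nanoTime();
        long x = sumLong(maxIter);
        long elapsed = System.nanoTime() - start;
        System.out.println("x = " + x + " time = " + (elapsed / 1_000_000) + " ms");
        return elapsed;
    }

    public static void main(String[] args) {
        // The first call enters sumLong while it is still interpreted, so the
        // hot loop is a candidate for OSR. Later calls may use the normally
        // compiled version of the method instead.
        for (int i = 0; i < 5; i++) {
            timeOnce(Integer.MAX_VALUE);
        }
    }
}
```

For example: java -XX:+PrintCompilation OsrSketch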

Best regards,
Vladimir Ivanov

> Best regards,
> Vladimir Ivanov
>
>>
>> On 6/18/2014 5:15 PM, Vitaly Davidovich wrote:
>>>
>>> Yes, hotspot has a bunch of loop optimizations that only kick in when
>>> the counter is an int.  My hunch is that, given your code, the
>>> difference is that the int loop is unrolled and the long one isn't.
>>> Checking the generated assembly should deny/confirm that, and
>>> generally shed light on the compiled differences.
>>>
>>> As an aside, System.nanoTime is more appropriate for timing execution;
>>> prefer that over System.currentTimeMillis.
>>>
>>> Sent from my phone
>>>
>>> On Jun 18, 2014 7:52 PM, "Alexander Weld" <alexander.weld at oracle.com> wrote:
>>>
>>>     Hi,
>>>
>>>     I have some questions on how the JVM handles long values, especially
>>>     as index in for loops.
>>>
>>>     I have the following test class, which tracks the running time of two
>>>     versions of a simple for loop, one using an int-value as index and one
>>>     using a long-value as index:
>>>
>>>     public class Test {
>>>
>>>       public static long testInt(int maxIter) {
>>>         long x = 0;
>>>
>>>         for (int i = 0; i < maxIter; i++) {
>>>           x += i;
>>>         }
>>>
>>>         return x;
>>>       }
>>>       public static long testLong(long maxIter) {
>>>         long x = 0;
>>>
>>>         for (long l = 0; l < maxIter; l++) {
>>>           x += l;
>>>         }
>>>
>>>         return x;
>>>       }
>>>
>>>       public static void main(String[] args) {
>>>
>>>         final int MAX_ITER_INT = Integer.MAX_VALUE;
>>>         final long MAX_ITER_LONG = (long) MAX_ITER_INT;
>>>
>>>         long t1, t2;
>>>
>>>         for (int i = 0; i < 10; i++) {
>>>           long x = 0;
>>>
>>>           t1 = System.currentTimeMillis();
>>>           x = testInt(MAX_ITER_INT);
>>>           t2 = System.currentTimeMillis();
>>>
>>>           System.out.println("Iteration " + i + ": x = " + x + " time(int loop) = " + (t2 - t1));
>>>         }
>>>
>>>         for (int i = 0; i < 10; i++) {
>>>           long x = 0;
>>>
>>>           t1 = System.currentTimeMillis();
>>>           x = testLong(MAX_ITER_LONG);
>>>           t2 = System.currentTimeMillis();
>>>
>>>           System.out.println("Iteration " + i + ": x = " + x + " time(long loop) = " + (t2 - t1));
>>>         }
>>>
>>>       }
>>>
>>>     }
>>>
>>>     When I run this on our cluster (*) I get the following results:
>>>
>>>     $ java Test
>>>     Iteration 0: x = 2305843005992468481  time(int loop) = 945
>>>     Iteration 1: x = 2305843005992468481  time(int loop) = 937
>>>     Iteration 2: x = 2305843005992468481  time(int loop) = 937
>>>     Iteration 3: x = 2305843005992468481  time(int loop) = 938
>>>     Iteration 4: x = 2305843005992468481  time(int loop) = 938
>>>     Iteration 5: x = 2305843005992468481  time(int loop) = 937
>>>     Iteration 6: x = 2305843005992468481  time(int loop) = 937
>>>     Iteration 7: x = 2305843005992468481  time(int loop) = 938
>>>     Iteration 8: x = 2305843005992468481  time(int loop) = 938
>>>     Iteration 9: x = 2305843005992468481  time(int loop) = 937
>>>     Iteration 0: x = 2305843005992468481 time(long loop) = 1510
>>>     Iteration 1: x = 2305843005992468481 time(long loop) = 1509
>>>     Iteration 2: x = 2305843005992468481 time(long loop) = 1799
>>>     Iteration 3: x = 2305843005992468481 time(long loop) = 1799
>>>     Iteration 4: x = 2305843005992468481 time(long loop) = 1799
>>>     Iteration 5: x = 2305843005992468481 time(long loop) = 1799
>>>     Iteration 6: x = 2305843005992468481 time(long loop) = 1799
>>>     Iteration 7: x = 2305843005992468481 time(long loop) = 1799
>>>     Iteration 8: x = 2305843005992468481 time(long loop) = 1799
>>>     Iteration 9: x = 2305843005992468481 time(long loop) = 1799
>>>
>>>     My questions are:
>>>
>>>       (1) Why is the long-version slower compared to the int-version? Is it
>>>     the int to long conversion? Or are there some optimizations that are
>>>     only applied to int loops?
>>>
>>>       (2) What is the reason for the long-version getting slower after
>>>     "Iteration 1"?
>>>
>>>     Thanks,
>>>     Alex
>>>
>>>     (*) System information:
>>>
>>>     Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
>>>
>>>     $ lsb_release -d
>>>     Description:    Oracle Linux Server release 6.5
>>>
>>>     $ java -version
>>>     java version "1.7.0_55"
>>>     Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
>>>     Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
>>>
>>>     (Let me know if you need more information)
>>>
>>
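
If the int-only loop optimizations mentioned above are indeed the cause, one
possible workaround is to split the long range into chunks and walk each
chunk with an int counter. The following is a sketch under that assumption,
not something proposed in the thread: the class name ChunkedLoopSketch, the
method sumChunked and the chunk size are all hypothetical, and whether it
actually helps depends on the JVM version, so it should be measured rather
than assumed.

```java
// Hypothetical sketch; names and chunk size are illustrative assumptions.
final class ChunkedLoopSketch {

    // Baseline: long-indexed loop, same shape as testLong in the original post.
    static long sumLong(long maxIter) {
        long x = 0;
        for (long l = 0; l < maxIter; l++) {
            x += l;
        }
        return x;
    }

    // Same sum, but the inner loop uses an int counter over fixed-size chunks.
    static long sumChunked(long maxIter) {
        final int CHUNK = 1 << 20; // arbitrary chunk size, chosen for illustration
        long x = 0;
        long base = 0;
        while (base < maxIter) {
            // The remaining count is positive here and capped at CHUNK,
            // so the cast to int cannot overflow.
            int n = (int) Math.min(CHUNK, maxIter - base);
            for (int i = 0; i < n; i++) {
                x += base + i;
            }
            base += n;
        }
        return x;
    }

    public static void main(String[] args) {
        long n = Integer.MAX_VALUE;
        System.out.println("long loop:    " + sumLong(n));
        System.out.println("chunked loop: " + sumChunked(n));
    }
}
```

Both methods are meant to return the same value for any maxIter; the
generated assembly would still need to be checked to see whether the inner
int loop is in fact optimized differently.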


More information about the hotspot-compiler-dev mailing list