RFR: 8319423: Improve Year.isLeap by checking divisibility by 16 [v2]

Cassio Neri duke at openjdk.org
Sat Nov 4 20:59:07 UTC 2023


On Fri, 3 Nov 2023 23:22:27 GMT, Claes Redestad <redestad at openjdk.org> wrote:

>> https://github.com/cassioneri/eaf suggest this code for leap year calculation:
>> 
>>     public static boolean isLeap(long year) {
>>         int d = year % 100 != 0 ? 4 : 16;
>>         return (year & (d - 1)) == 0;
>>     }
>> 
>> .. with a claim this would compile down to branchless, easily pipelined code.
>> 
>> This doesn't currently happen with C2. In the meantime I think we can improve the current code in `Year.isLeap` and `IsoChronology.isLeapYear` by leveraging the fact that the `% 100` check is only needed if `(year & 15) != 0`:
>> 
>> 
>>     public static boolean isLeap(long year) {
>>         return (year & 15) == 0 ? (year & 3) == 0 : (year & 3) == 0 && year % 100 != 0;
>>     }
>> 
>> 
>> Mac M1:
>> 
>> Name                           Cnt  Base   Error   Test   Error   Unit  Change
>> LeapYearBench.isLeapYear        15 0,743 ± 0,009  0,994 ± 0,005 ops/us   1,34x (p = 0,000*)
>> LeapYearBench.isLeapYearChrono  15 0,748 ± 0,006  0,991 ± 0,003 ops/us   1,32x (p = 0,000*)
>> LeapYearBench.isLeapYearNS      15 0,558 ± 0,026  0,552 ± 0,033 ops/us   0,99x (p = 0,602 )
>>   * = significant
>> 
>> 
>> Linux x64:
>> 
>> Name                           Cnt  Base   Error   Test   Error   Unit  Change
>> LeapYearBench.isLeapYear        15 0.534 ± 0.001  0.765 ± 0.004 ops/us   1.43x (p = 0.000*)
>> LeapYearBench.isLeapYearChrono  15 0.535 ± 0.000  0.753 ± 0.040 ops/us   1.41x (p = 0.000*)
>> LeapYearBench.isLeapYearNS      15 0.352 ± 0.000  0.351 ± 0.001 ops/us   1.00x (p = 0.000*)
>>   * = significant
>> 
>> 
>> 30% higher throughput on M1, 40% on x64. `isLeapYearNS` runs a variant of the code from https://github.com/cassioneri/eaf ported to java - perhaps the JIT can be improved to do whatever clang/gcc does here and achieve an even better speed-up.
>> 
>> Testing: so far only java/time/tck/java/time locally, will run a few tiers before filing an enhancement and opening the PR for review.
>
> Claes Redestad has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Apply similar optimization to GregorianCalendar, sun.util.calendar

Thanks for your interest in my work. I'd love to assist porting our algorithms to Java. Notice, however, that I'm not a Java programmer and I don't have the complete picture of what goes on in the JVM. What follows is based on my experience with C/C++ compilers but, I reckon, most of it might apply to Java as well.

When determining if `year` is leap or not it's very reasonable to believe that checking divisibility by 4 first is the best strategy. As I told in [my talk](https://www.youtube.com/watch?v=0s9F4QWAl-E), virtually every implementation that I found made that assumption. However, this is not the case thanks to modern branch predictors, at least for x86_64 CPUs. My experiments showed that checking divisibility by 100 first is the best way:

    if (year % 100 != 0)
      return year % 4 == 0;
    return year % 400 == 0;

Maths results show that, in calculations of leap year, it's correct to replace `year % 400 == 0` by the cheaper expression `y % 16 == 0`. Except for [compiler bugs](https://bugs.llvm.org/show_bug.cgi?id=50334), this should always be done. Hence, the implementation that @cl4es  mentioned:

    public static boolean isLeap(long year) {
        int d = year % 100 != 0 ? 4 : 16;
        return (year & (d - 1)) == 0;
    }

In my talk I also said that I did other optimisations around `year % 100 == 0` but that was another story. Let me tell this story.

Similar mathematical arguments as above show that it's also correct to replace `year % 100 == 0` with `year % 25 == 0` and [the latter requires one fewer assembly instruction](https://godbolt.org/z/zKza1Yc83). (I've also discussed this topic in a [PR](https://github.com/nakedible/datealgo-rs/discussions/21#discussioncomment-6907992) for the Rust implementation.) However, contrary to the case of 400-and-16, it's not always profitable to replace `year % 100 == 0` with `year % 25 == 0`. It depends on whether the code executed by the CPU contains branches or not. (Despite the usage of the ternary operator `?`, some CPUs support conditional moves which allow the elimination of some branches.)

If there's no branch, then this is probably be the best option:

    public static boolean is_leap_v0(long year) {
      int d = year % 25 != 0 ? 4 : 16;
      return (year & (d - 1)) == 0;
    }


If there's a branch, then I'd expect this to perform better:

    public static boolean is_leap_v0(long year) {
      int d = year % 100 != 0 ? 4 : 16;
      return (year & (d - 1)) == 0;
    }

The reason is hinted in my talk: If there's a branch, the branch predictor can do a better job when execution is split into paths with (1/100, 99/100) = (1%, 99%) probability distribution than when the distribution is (1/25, 24/25) = (4%, 96%).

[Looking at byte-code](https://godbolt.org/z/MEq9c5hef) for the 4 different implementations floated in this discussion, I see some `goto`s but I can't tell if, in assembly, they translate to branches or conditional moves. With more knowledgeable eyes, the [C versions](https://godbolt.org/z/xTGKh5x4h) of the same implementations suggest me that the branchless `is_leap_v0` is the best. But, I ask you, to not trust in my eyes or my intuition and, instead, measure the performance of the several alternatives. This old [SO post](https://stackoverflow.com/a/60646967/1137388) might also be helpful.

I can also shed some light on the magical numbers that appear in the code that check divisibility by 100 and 25 by pointing to my series of article  on this topic:

* Quick Modular Calculations (Part 1) https://accu.org/journals/overload/27/154/overload154.pdf#page=13
* Quick Modular Calculations (Part 2) https://accu.org/journals/overload/28/155/overload155.pdf#page=16
* Quick Modular Calculations (Part 3) https://accu.org/journals/overload/28/156/overload156.pdf#page=12

I hope this helps.
Cassio.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/16491#issuecomment-1793544215


More information about the core-libs-dev mailing list