A faster full-range Leap Year function

Using a new modulus-equality technique, which might have broader applicability

11 December 2025

Welcome back to the fast date series, where we squeeze every last nanosecond out of standard date library functions.

In this article I present a new faster full-range function to calculate whether a year is a leap year. This type of calculation is a fundamental date library utility, it is used all over the place: validating and parsing dates, determining the Nth last weekday of a month, handling month overflow/underflow when adjusting the day, etc.

The faster full-range function builds upon techniques from others, but utilises a new modulus alternative technique, which accepts "errors" in the calculation that are corrected in the subsequent step. This technique appears generalisable, and might find use in other areas such as hashing algorithms.

The C++ code is released as free open source software (BSL-1.0 License), although it is simple to reimplement it yourself.

Note: a function of the same speed was designed in 2020 by Cassio Neri and Lorenz Schneider, but that is a restricted-range version correct over only about 25% of the 32-bit input domain. The method presented here is accurate over the full 32-bit range.

Sections:

Speed comparison at a glance

Relative Speeds of Fastest Full-Range Leap Year AlgorithmsAs tested on Intel x64, Snapdragon ARM and Apple M4 Pro processors
(smaller numbers = faster)
See benchmark section
for further information.

Neri-Schneider (full-range variant)(2020)
~1.1 to 1.4×
New Signed Function(2025)
New Unsigned Function(2025)~0.9×

The Leap Year Algorithm

Unlike previous blog posts in this series, this algorithm exploits overflow behaviour, so this time the pseudocode will specify variable types.

Given:   year (signed 32-bit integer)   —   Then:

New Faster Full-Range 32-Bit Leap Year AlgorithmSee in C++
  1. const uint32 CEN_MUL = (1 << 32) / 100 + 1   // 32-bit reciprocal of 100   (21,474,837)
  2. const uint32 CEN_CUTOFF = CEN_MUL * 4        // Over-estimated cutoff      (85,899,348)
  3. const uint32 CEN_BIAS = CEN_MUL / 2 * 100    // Large multiple of 100y  (1,073,741,800)
  4. uint32 low = (year + CEN_BIAS) * CEN_MUL     // Lower 32-bits
  5. bool maybe_cen = low < CEN_CUTOFF            // Year is likely divisible by 100
  6. bool is_leap = (year & (maybe_cen ? 15 : 3)) == 0
  • Note: For an unsigned version, the algorithm is the same but use CEN_BIAS = 0.
  • Note2: For other bit-widths such as 16-bit and 32-bit, very different constants are required. See the section other bit-widths

This probably looks nothing like your grandma‘s leap year formula.
— Where is the usual modulus-by-100 etc.?
— What on Earth is with the “likely” divisible thing?
— Even if you recognise the CEN_MUL stuff, how does this possibly support the full 32-bit range?
These are all reasonable questions.

Before answering them, it is easiest to first review prior algorithms.

How it Works

For the longest time, leap years were determined by the standard “textbook” formula:

The Textbook Formula
  1. is_leap = year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)

The above is quite slow, as every call must compute two costly modulus functions.

It is important to note that this whole line usually compiles to a branch-free formula, where each equality is always calculated and then boolean-compared, which, on many platforms, is faster than using branches, which can halt the pipeline when mis-predicted.

The textbook formula was improved significantly by Neri-Schneider as per the below:

Neri-Schneider - Faster Leap Year Algorithm
  1. bool is_cen = (year % 100 == 0)
  2. bool leap = (year & (is_cen ? 15 : 3) == 0)

The brilliant insight here is that all parts of the formula are going to run each time in a branchless version, so we might as well check for divisibility by 100 first. If it's divisible by 100 then checking for divisibility by 4 is redundant. In such century-years, a divisibility check by 16 (single-cycle) will accurately match centuries that are a multiple of 400, but not other centuries.

Note: The line: & (is_cen ? 15 : 3) is a bitwise equivalent of % (is_cen ? 16 : 4). The decision to use the more obscure looking version is that some compilers seem to produce more efficient assembly with the former (my testing shows MSVC treats the latter much slower).

This overall technique saves an "expensive" non-power-of-2 modulus call. Such modulus calls are a lot more expensive than they look. In order to compute year % 100, a compiler will usually calculate year - Y / 100 * 100. That is, two chained expensive operations.

The crazy thing is, we can actually get rid of all modulus operations.

It turns out, we don't need to compute the value of Y % 100 == 0, we only need to compute a weaker version.

What we need to do is to:

  • Categorise all years that are divisible by 100 one way.
  • Categorise all other years that are divisible by 4 another way.
  • And importantly: For all other years, categorisation does not matter.

The reason we don't care about other years, is that they are neither divisibile by 4 nor by 100, so whichever path we take will be the same.

With the above in-mind, we can use an approximation to Y % 400 == 0 as per the below:

New Fastest Leap Year Algorithm (32-Bit)
  1. const uint32 CEN_MUL = (1 << 32) / 100 + 1
  2. const uint32 CEN_CUTOFF = CEN_MUL * 4
  3. const uint32 CEN_BIAS = CEN_MUL / 2 * 100
  4. uint32_t low = (year + CEN_BIAS) * CEN_MUL   // Lower 32-bits
  5. bool maybe_cen = low < CEN_CUTOFF            // Year is likely divisible by 100

This is a variant of the classic fixed-point math operation, similar to what was used in the prior algorithms, but in this case only using the lower 32‑bits after multiplication by 100's 32-bit ceil fixed-point reciprocal. So long as those bits are low, then we know we are early in the century.

CEN_BIAS is needed to ensure that all negative-value centuries are pushed into the positive domain, to ensure the multiplication comes out with the correct sign. A handful of very low negative values will underflow, but still calculate correctly.

“Proof by Just Look at It”

While an exhaustive search script is available in the testcase library, I also present a “Proof by Just Look at It” via the table below.

Using the navigation buttons and scrollbar, you can scroll through all 4-billion+ possible 32-bit input values, and see the computation yourself. The scrollbar will snap to centuries, as that is where the important computation steps are found. As you can see, each century the value of low < CEN_CUTOFF evaluates to true, and nearby false positives (“Wrong (but OK)”) are never leap years.

yearCentury?Leap?A = uint32(
year + CEN_BIAS)
(2,147,483,600)
low = u32(
 A * CEN_MUL)
(42,949,673)
low < CEN_CUTOFF
(< 171,798,692)
Correct?
i.e. is a Cent.
0005NoNo2,147,483,605300,647,709falseCorrect
0004NoLeap2,147,483,604257,698,036falseCorrect
0003NoNo2,147,483,603214,748,363falseCorrect
0002NoNo2,147,483,602171,798,690trueWrong (but OK)
0001NoNo2,147,483,601128,849,017trueWrong (but OK)
0000CenturyLeap2,147,483,60085,899,344trueCorrect
-0001NoNo2,147,483,59942,949,671trueWrong (but OK)
-0002NoNo2,147,483,5984,294,967,294falseCorrect
-0003NoNo2,147,483,5974,252,017,621falseCorrect
-0004NoLeap2,147,483,5964,209,067,948falseCorrect
-0005NoNo2,147,483,5954,166,118,275falseCorrect

Other Bit-Widths

The reason this algorithm works is due to 232 being very close to a multiple of 100:

2^32 / 100 = 42,949,672.96

As you can see above, the fractional part of 0.96 is only 4% away from a whole number. That “error” aligns perfectly with the required tolerance, where we can allow the century check to be accurate within ±4 years.

On the other hand, 16-bit and 64-bit do not have as nice fixed-point reciprocals:

  • 2^16 / 100 = 655.36
  • 2^64 / 100 = 184,467,440,737,095,516.16

It turns out, this formulation of the technique is only possible in bit-widths of 2^(12 + 20n), for integers n ≥ 0.

Upon the first release of this article, it was stated that other bit-widths are not possible, however reddit user sporule provided a technique to compute alternative constants in these cases (and for the 32-bit case), using a divisor of 25:

Full range template (credit to Reddit user sporule:
  1. template <std::unsigned_integral T>
  2. constexpr T inverse(T r) {
  3.     assert((r & 1) != 0); // even case omitted for brevity
  4.     for (T c, d = r; (c = d * r) != 1; ) {
  5.         r *= T(2) - c;
  6.     }
  7.     return r;
  8. }
  9. template <std::unsigned_integral T>
  10. bool is_leap(T y) {
  11.     constexpr T threshold = T(~T(0)) / 25;
  12.     constexpr T mul = inverse<T>(25);
  13.     T iy = y * mul;
  14.     bool d25 = (iy <= threshold);
  15.     return (y & (d25 ? 15 : 3)) == 0;
  16. }

The resulting constants are, for 16-bit:

Full range 16-bit leap year function:
  1. const uint16 CEN_MUL = 23593
  2. const uint16 CEN_CUTOFF = 2622
  3. const uint16 CEN_BIAS = (1 << 15) / 100 * 100
  4. uint16 low = (year + CEN_BIAS) * CEN_MUL
  5. bool maybe_cen = low < CEN_CUTOFF
  6. bool is_leap = (year & (maybe_cen ? 15 : 3)) == 0

...and, for 64-bit:

Full range 64-bit leap year function:
  1. const uint64 CEN_MUL = 1106804644422573097
  2. const uint64 CEN_CUTOFF = 737869762948382065
  3. const uint64 CEN_BIAS = (1 << 63) / 100 * 100
  4. uint64 low = (year + CEN_BIAS) * CEN_MUL
  5. bool maybe_cen = low < CEN_CUTOFF
  6. bool is_leap = (year & (maybe_cen ? 15 : 3)) == 0

How it compares to similar prior work

It has been known for some time that there are ways to perform a faster modulus equality check by using the lower bits after multiplication. The following paper discusses this technique by using 64-bit promotion to accurately calculate the 32-bit modulus:
Faster Remainder by Direct Computation: Applications to Compilers and Software Libraries
— Daniel Lemire, Owen Kaser, Nathan Kurz
https://arxiv.org/abs/1902.01961

The reason the above technique is not often useful in general purpose date libraries is:

  1. It is not as portable, requiring 64-bit intermediates.
  2. It is platform-dependent, being slower on ARM than the Neri-Schneider formula.

Neri-Schneider also looked into a similar idea and documented it in his calendar algorithms repository .
It references work he published in the following Journal (see page 15):
https://accu.org/journals/overload/28/155/overload155.pdf

Neri-Schneider's technique is 32-bit, but accurate over only 25% of the 32-bit input range. It is therefore documented as useful for covering the full signed 16-bit input range. It becomes less suitable for general-purpose date libraries outside that range as the function has no built-in way to signal when inputs fall outside the safe interval, and maintainers and users of the library must ensure that all callers remain within that window.

So, this new algorithm is not a fundamentally new faster algorithm in general, but a faster full-width algorithm. Libraries that validate years, parse arbitrary inputs, or support wider ranges typically require a full-range function for safety and consistency, and that is where this new one may be useful.

Generalisation

The technique shown here applies not only to 100, but to any divisor for which the bias–multiply–cutoff produces a small, bounded “fuzzy” neighbourhood around true multiples. If that neighbourhood is narrower than the disambiguation step (a final modulus by a small power of two, such as 4 or 16), the result can be corrected. The smallest divisors with this property act as “seed” generators, and multiplying them by powers of two preserves the behaviour.

In 32-bit, the following (up to 100) work as seed generators for n >= 0:

  • 6 × 2n = 6, 12, 24, 96...
  • 20 × 2n = 20, 40, 80, 160...
  • 52 × 2n = 52, 104, 208, 416...
  • 100 × 2n = 100, 200, 400, 800...

In 64-bit, the following (up to 100) work as seed generators for n >= 0:

  • 6 × 2n = 6, 12, 24, 96...
  • 18 × 2n = 18, 36, 72, 144...
  • 20 × 2n = 20, 40, 80, 160...
  • 38 × 2n = 38, 76, 152, 304...
  • 54 × 2n = 54, 108, 216, 432...
  • 86 × 2n = 86, 172, 344, 688...

The support for 12 and 24 are of notable interest for timekeeping, and they might be utilised in future blog posts in this fast date series.

A practical method for identifying usable divisors is to use the qmodular repository from Neri-Schneider, and note the max-dividend for various integers. For a given integer, if it has, say ~25% coverage of the 32-bit range, then a modulus of 4x greater can be found. Larger powers-of-2 are then trivial to support by increasing the power-of-2 comparison.

For example, using qmodular, ./div mcomp 3 gives a max dividend of 2147483645, which is about half of the 32-bit range. This indicates that we can get full coverage for modulus of 6.

Example Mod 6 == 0
  1. const uint32 MOD_TARGET = 6
  2. const uint32 QMOD_DIV = 3
  3. const uint32 QMOD_MULTIPLIER = 1431655766
  4. const uint32 SCALE = MOD_TARGET / QMOD_DIV
  5. const uint32 DIV_MUL = QMOD_MULTIPLIER / SCALE
  6. const uint32 DIV_CUTOFF = DIV_MUL * SCALE
  7. const uint32 DIV_BIAS = DIV_MUL / 2 * MOD_TARGET
  8. uint32 low = (input + DIV_BIAS) * DIV_MUL
  9. bool result = (input % SCALE) == 0 && (low < DIV_CUTOFF)

Other constants for 32-bit are:

MOD_TARGETQMOD_DIVQMOD_MULTIPLIER
631431655766
205858993460
5213330382100
10025171798692

This is not the full formal classification, but it is sufficient to reproduce the results shown here. The complete derivation will be explored in a later blog post. Preliminary tests on Apple Silicon suggest that this technique can act as a faster general modulus-equality operation for selected divisors. Further testing on other platforms will be required to see if this applies elsewhere.

Benchmark results

The below table shows the number of milliseconds taken to process a large number of dates.
The number of dates processed differed per platform due to memory constraints (stated per row), but the relative times ratios noted in bold are comparable across the board.

The benchmarks were run 5 times on each platform, with the median result selected for each algorithm.

No meaningful difference in speed was detected on Apple Silicon for Signed vs Unsigned.

Algorithm:TextbookNeri-Schneider
(full-range
variant)
New - SignedNew - Unsigned
Dell Inspiron 13-5378 (Windows 10 22H2)
Intel Core i3-7100 @ 2.4 GHz
Compiler: MSVC 19.44 (x64)
Iterations: 100M
5.64x
(966.117)
1.49x
(255.572)
1.00x
(171.323)
0.92x
(157.26)
Lenovo IdeaPad Slim 5 (Windows 11 24H2)
Snapdragon X126100 @ 2.956 GHz
Compiler: MSVC 19.44 (ARM64)
Iterations: 100M
4.76x
(308.393)
1.47x
(95.5454)
1.00x
(64.856)
0.90x
(58.224)
MacBook Pro 2024 (MacOS 15.6.1)
Apple M4 Pro
Compiler: Apple clang 17.0.0
Iterations: 500M
6.28x
(702.688)
1.11x
(123.821)
1.00x
(111.865)
1.00x
(111.694)
MacBook Pro 2020 (MacOS 14.3.1)
Intel Core i5 @ 2 GHz
Compiler: Apple clang 15.0.0
Iterations: 500M
2.20x
(434.592)
1.06x
(209.565)
1.00x
(197.204)
0.91x
(178.917)

The same C++ file is used for verifying the range as well as benchmarking.


In the next article, yet to be published, we will explore how the calculation of "is leap year" can be made almost for "free", when combined with other calendar related algorithms that already need to compute the number of centuries that have elapsed. The technique explored will involve analysing both the high and low 32 bits resulting from the century calculation step, with a particular focus on ordinal dates, as used by the popular "Time" Rust library. If you are particularly interested, you can preview that new algorithm here .

If you found this interesting, you should follow me on X to get notified when I publish more date and algorithm related articles.