A Very Fast 64–Bit Date Algorithm: 30–40% faster

Counting years backwards unlocks faster speed.
The first date algorithm to use only 4 multiplications, instead of 7+

23 November 2025

In this article I present my final very fast date conversion algorithm. It represents a significant speed gain — being similar in magnitude to the speed gains achieved by the previous fastest algorithm (Neri-Schneider 2021) over its predecessor (C++ Boost). The full algorithm implementation in C++ is released as free open source software (MIT License).

The algorithm provides accurate results over a period of ±1.89 Trillion years, making it suitable to process the full UNIX 64–bit time (in seconds).

The entire algorithm has been re-written top-to-bottom, with various micro-optimisations and three main new ideas:

  • Years are calculated backwards, which removes various intermediate steps.
  • The step to calculate day-of-year is skipped, instead using a year-modulus-bitshift technique which removes a division.
  • The "Julian Map" technique is utilised from my previous article, which speeds up the 100/400 year calculation, removing two more hardware multiplications.

While fast date algorithms have always used 7 or more expensive computations (multiplication, division, or modulus by non-power-of-2 numbers), this algorithm uses only 4 multiplications. The speed-gain can be seen at a glance.

Relative Speeds of Fastest Algorithms
As tested on Intel x64 and Apple M4 Pro processors (smaller numbers = faster). See the benchmark section for further information.

  • C++ Boost (2011): 2.4×
  • Neri-Schneider (2021): 1.6×
  • This Algorithm (2025): 1.0× (baseline)

The benchmark results match what is expected by hand-counting operations:

Approx. CPU Cycles Comparison (x64)
M = Multiplications (non-power-of-2)
B = Basic Operations (e.g. Add, Shift, LEA etc.)

Algorithm         M     B     Approx. Cycles (3 * M + B)
C++ Boost         10    21    51
Neri-Schneider    7     19    40
This Algorithm    4     15    27

I will first present the general algorithm in pseudocode, then explain why it counts years backwards, then step through it line-by-line with an explanation of each step. Platform-specific optimisations are then outlined (x64 vs ARM / Apple Silicon), followed by details about range/accuracy, a discussion of various 32–bit fallback options, and finally the benchmark results (which you can test yourself).


The Full Algorithm

Colours are used to indicate the following:

  • Red: Expensive operations (i.e. the four multiplications)
  • Green: Operations that are “free” on x64 processors
  • Pink Comments: Highlighting the most notable new concepts

Given:   days = Days since epoch, where "1970-01-01" is zero   —   Then:

Very Fast 64–Bit Date Algorithm (x64 version)
  1. const ERAS = 4726498270         //  Use 14704 for 32–bit
  2. const D_SHIFT = 146097 * ERAS - 719469
  3. const Y_SHIFT = 400 * ERAS - 1
  4. const C1 = 505054698555331      //  floor(2^64*4/146097)
  5. const C2 = 50504432782230121    //  ceil(2^64*4/1461)
  6. const C3 = 8619973866219416     //  floor(2^64/2140)
  7. /* Adjust for 100/400 leap year rule. */
  8. rev = D_SHIFT - days            //  Reverse day count
  9. cen = (rev * C1) >> 64          //  Divide 36524.25
  10. jul = rev + cen - cen / 4       //  Julian map
  11. /* Determine year and year-part. */
  12. num = jul * C2                  //  Divide 365.25
  13. yrs = Y_SHIFT - (num >> 64)     //  Forward year
  14. low = num % (1 << 64)           //  Lower 64 bits
  15. ypt = (782432 * low) >> 64      //  Year-part (backwards)
  16. bump = ypt < 126464             //  Jan or Feb
  17. shift = bump ? 191360 : 977792  //  Month offset
  18. /* Year-modulo-bitshift for leap years. */
  19. /* Also revert to forward direction. */
  20. N = (yrs % 4) * 512 + shift - ypt
  21. D = ((N % 65536) * C3) >> 64    //  Divide 2140
  22. day = D + 1
  23. month = N / 65536
  24. year = yrs + bump

Note:

  • This is the x64 optimised version of the algorithm. See the section later for ARM-specific optimisations.
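For readers who prefer compilable code, the pseudocode above can be transcribed almost line for line. This is a minimal sketch and not the released implementation: the `Date` struct and the `to_date` name are invented for illustration, and `unsigned __int128` is assumed to be available (a GCC/Clang extension; MSVC would need an intrinsic such as `_umul128` instead).

```cpp
#include <cstdint>

// Assumption: unsigned __int128 is a GCC/Clang extension.
using u128 = unsigned __int128;

// Hypothetical result type for this sketch.
struct Date {
    int64_t  year;
    unsigned month;
    unsigned day;
};

constexpr uint64_t ERAS    = 4726498270ULL;
constexpr uint64_t D_SHIFT = 146097ULL * ERAS - 719469ULL;
constexpr uint64_t Y_SHIFT = 400ULL * ERAS - 1;
constexpr uint64_t C1 = 505054698555331ULL;    // floor(2^64 * 4 / 146097)
constexpr uint64_t C2 = 50504432782230121ULL;  // ceil(2^64 * 4 / 1461)
constexpr uint64_t C3 = 8619973866219416ULL;   // floor(2^64 / 2140)

// days: days since 1970-01-01 (negative values are earlier dates).
inline Date to_date(int64_t days) {
    uint64_t rev = D_SHIFT - static_cast<uint64_t>(days);           // reverse day count
    uint64_t cen = static_cast<uint64_t>((u128(rev) * C1) >> 64);   // reversed century
    uint64_t jul = rev + cen - cen / 4;                             // Julian map
    u128     num = u128(jul) * C2;                                  // full 128-bit product
    uint64_t yrs = Y_SHIFT - static_cast<uint64_t>(num >> 64);      // high half: year
    uint64_t low = static_cast<uint64_t>(num);                      // low half: year fraction
    uint64_t ypt = static_cast<uint64_t>((u128(782432) * low) >> 64);
    bool     bump  = ypt < 126464;                                  // January or February
    uint64_t shift = bump ? 191360 : 977792;                        // month offset
    uint64_t N = (yrs % 4) * 512 + shift - ypt;                     // month in high bits, day in low 16
    uint64_t D = static_cast<uint64_t>((u128(N % 65536) * C3) >> 64);
    return Date{ static_cast<int64_t>(yrs + bump),
                 static_cast<unsigned>(N / 65536),
                 static_cast<unsigned>(D + 1) };
}
```

Calling `to_date(0)` should give 1970-01-01, and `to_date(11016)` should give 2000-02-29.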

Why Count Backwards

It is no surprise that most (if not all) fast date conversion algorithms have counted years in the forward direction: it is the default, intuitive choice.

When doing so, one ends up with terms of the format (foo * 4 + 3) / N. This is one of the most common patterns you'll see across fast date algorithms. An example from C++ Boost is shown below:

C++ Boost (extract)
  1. century = (days * 4 + 3) / 146097       //  Divide by 36524.25
    // ...
  2. year = (dayOfCentury * 4 + 3) / 1461    //  Divide by 365.25

The multiplication by 4 and division by 146097 and 1461 is not too hard to understand, the larger constants being the number of days in 400-years and 4-years respectively — but what about those “+ 3” terms?

Those terms exist because the longer years and longer centuries are offset slightly from the epoch. This is best understood by viewing the tables below. The first table shows the distribution of leap years when counting from the year zero:

Traditional forward counting years — epoch: 0000-03-01
Year      0            1            2            3
Start     0000-03-01   0001-03-01   0002-03-01   0003-03-01
End       0001-02-28   0002-02-28   0003-02-28   0004-02-29
Length    365          365          365          366
Type      Normal       Normal       Normal       Leap

Since the Gregorian leap year rule omits a usual leap year every 100 years, except years divisible by 400, we get the same pattern for centuries:

Traditional forward counting centuries — epoch: 0000-03-01
Century   0            1            2            3
Start     0000-03-01   0100-03-01   0200-03-01   0300-03-01
End       0100-02-28   0200-02-28   0300-02-28   0400-02-29
Length    36,524       36,524       36,524       36,525
Type      Short        Short        Short        Long

If we could find a way to count the dates from an epoch where these both start with a Leap/Long immediately (instead of offset-by-3) we could delete the “+ 3” terms. Not only that, we will have a way of merging the terms: foo * 4 / N into a single multiplication and bit-shift. This has the potential to remove up to four CPU-cycles from the algorithm (although the exact speed gain will be platform dependent).

I first tried setting the epoch at the year −100 (101 BC), but that only solves half the problem: the century slicing must end on a date XX00-02-2Y, which means it must start immediately after a day divisible by 4.

As you may have guessed, counting backwards turns out to be the way to solve this problem fully:

Backwards counting years — epoch: 2400-02-29
Year      0            1            2            3
Start     2400-02-29   2399-02-28   2398-02-28   2397-02-28
End       2399-03-01   2398-03-01   2397-03-01   2396-03-01
Length    366          365          365          365
Type      Leap         Normal       Normal       Normal

This allows the epoch to begin in a leap-year as shown above, as well as to start in a long century as shown below:

Backwards counting centuries — epoch: 2400-02-29
Century   0            1            2            3
Start     2400-02-29   2300-02-28   2200-02-28   2100-02-28
End       2300-03-01   2200-03-01   2100-03-01   2000-03-01
Length    36,525       36,524       36,524       36,524
Type      Long         Short        Short        Short

Fortunately, reversing the timeline comes at zero speed cost, as all date algorithms tend to have epoch adjustments. It will just be a matter of using a minus sign where an addition would normally be.
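The year lengths in the tables above can be reproduced with plain integer division over one 4-year block of 1461 days. The sketch below counts how many day numbers fall into each of the four slots, once with the traditional `(d * 4 + 3) / 1461` and once with the offset-free `(d * 4) / 1461` that the reversed timeline permits. The function name `slot_lengths` is made up for this illustration.

```cpp
#include <array>
#include <cstdint>

// Counts how many day numbers d in [0, 1461) land in each 4-year slot.
//   forward : slot = (d * 4 + 3) / 1461   (traditional, needs the "+ 3")
//   backward: slot = (d * 4) / 1461       (reversed timeline, no offset)
inline std::array<int, 4> slot_lengths(bool backward) {
    std::array<int, 4> lengths{};  // zero-initialised
    for (uint32_t d = 0; d < 1461; ++d) {
        uint32_t slot = backward ? (d * 4) / 1461 : (d * 4 + 3) / 1461;
        ++lengths[slot];
    }
    return lengths;
}
```

The forward variant yields 365, 365, 365, 366 and the backward variant yields 366, 365, 365, 365, matching the "Length" rows of the two year tables.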

Now that this general concept is out of the way, we can proceed with the nitty-gritty of the individual steps of the algorithm.


The Algorithm - Line by Line

With the timeline reversed, we can now begin explaining the algorithm from the top:

Given:   days = Days since epoch, where "1970-01-01" is zero   —   Then:

The Algorithm — Step 1
  1. const ERAS = 4726498270
  2. const D_SHIFT = 146097 * ERAS - 719469
    // ...
  3. rev = D_SHIFT - days            //  Reverse day count

The constant D_SHIFT is calculated at compile-time from two values:

  • 719469 is the number of days to count back from the UNIX epoch of 1970-01-01 to 0000-02-29, our natural alignment point.
    Note that this is one day earlier than many other algorithms which typically count from the date 0000-03-01.
  • 146097 is the number of days per 400 year era, and we will add a large multiple ERAS of these to set our far future count-back point.
    In the example tables, we counted back from 2400, which would mean in that case ERAS = 6.

The specific value for ERAS = 4726498270 was selected to provide the maximum possible range centred around the UNIX epoch in 64–bit.
For 32–bit applications, the value of 14704 is appropriate — see the section about ranges for further details about this.

Finally, the calculation of rev involves that minus sign to reverse the timeline.
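As a quick sanity check of these constants, the sketch below recomputes D_SHIFT for both the ERAS = 6 example from the tables and the 64–bit value. The helper names (`d_shift`, `reverse_days`, `DAYS_PER_ERA`, `EPOCH_OFFSET`) are invented for this illustration, and the day numbers 157113 (2400-02-29) and 11016 (2000-02-29) were worked out by hand for it.

```cpp
#include <cstdint>

constexpr uint64_t DAYS_PER_ERA = 146097;  // days in 400 Gregorian years
constexpr uint64_t EPOCH_OFFSET = 719469;  // days from 0000-02-29 to 1970-01-01

// D_SHIFT for a given number of 400-year eras.
constexpr uint64_t d_shift(uint64_t eras) {
    return DAYS_PER_ERA * eras - EPOCH_OFFSET;
}

// Reversed day count: 0 on (400 * eras)-02-29, growing towards the past.
constexpr uint64_t reverse_days(int64_t days, uint64_t eras) {
    return d_shift(eras) - static_cast<uint64_t>(days);
}

// ERAS = 6: 2400-02-29 is day 157113 after 1970-01-01 and maps to rev = 0.
static_assert(reverse_days(157113, 6) == 0, "count-back point maps to zero");
static_assert(reverse_days(0, 6) == 157113, "UNIX epoch in the reversed count");
// The 64-bit ERAS gives the D_SHIFT used by the full algorithm.
static_assert(d_shift(4726498270ULL) == 690527217032721ULL, "64-bit D_SHIFT");
```

For example, `reverse_days(11016, 6)` is exactly 146097, because 2000-02-29 lies one full era before 2400-02-29.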
 

The Algorithm — Step 2
  1. const C1 = 505054698555331      //  floor(2^64*4/146097)
    // ...
  2. cen = (rev * C1) >> 64          //  Divide 36524.25

In this step, we are performing what was previously cen = (days * 4 + 3) / 146097, but with just a single multiplication.
Performing a division by doing multiplication followed by bit-shift ("mul-shift") is a very common technique in low-level algorithm design. If one uses actual division, the compiler will typically output something similar, but with a larger bit-shift.

There are two reasons that we are hand-writing our own mul-shift in this case:

  1. We want to integer-divide by 36524.25, something that cannot be done with a standard integer division.
  2. We can specify a bit-shift of precisely 64, which is “free” in 64–bit computers.

The reason a bit-shift of 64 in this case is “free” is due to the way 64–bit computers work. Since 64–bit computers cannot hold a number larger than 64–bits in a single register, a multiplication will place the full 128–bit result into two registers, one holding the upper 64 bits and one holding the lower 64 bits. The bit-shift of 64 is therefore just the compiled machine-code reading the register that holds the upper half, and hence does not actually involve any work.

The reason a compiler would normally use a larger bit-shift when using a normal division, is that eventually, with astronomically large values, this calculation will no longer be correct. We don't need this to be correct forever, just for a very large range to make the algorithm useful. A compiler does not know such requirements, and will usually err on the safe side and go with the more accurate, larger, and costlier, bit-shift.
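A minimal sketch of this mul-shift in C++ follows, assuming `unsigned __int128` is available (a GCC/Clang extension). The names `mul_high` and `century_of` are illustrative only.

```cpp
#include <cstdint>

using u128 = unsigned __int128;  // assumption: GCC/Clang extension

constexpr uint64_t C1 = 505054698555331ULL;  // floor(2^64 * 4 / 146097)

// Upper 64 bits of the 128-bit product, i.e. the "free" shift by 64.
inline uint64_t mul_high(uint64_t a, uint64_t b) {
    return static_cast<uint64_t>((u128(a) * b) >> 64);
}

// Reversed century index for a reversed day count.
// Because C1 is rounded down, a non-zero exact multiple of 146097 lands one
// century lower than rev * 4 / 146097 would (for example 146097 gives 3).
inline uint64_t century_of(uint64_t rev) {
    return mul_high(rev, C1);
}
```

With this, `century_of(36524)` is 0 and `century_of(36525)` is 1, matching the 36,525-day long century at the start of the backwards table.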
 

The Algorithm — Step 3
  1. jul = rev + cen - cen / 4       //  Julian map

This is also a “new” step not found in recent fast date algorithms.

What we are doing here is quickly, and efficiently, accounting for the 100-year and 400-year leap year rules. Boost and Neri-Schneider both use alternative steps to calculate the specific day within the associated 400-year block (costly modulus or subtraction and multiplication), and later construct the year with another costly addition and multiplication.

I wrote about this technique in a previous article earlier this month. It is a technique used by some former date-algorithm designers, but the speed gains were either not well understood, or not communicated clearly to later designers. I recommend reading the previous article for more details.
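As a small illustration of the step, the sketch below applies the map to a few reversed day counts. The function name `julian_map` is hypothetical, and the reading in the comments (re-inserting the skipped century leap days so that every 4-year block is 1461 days long) is my interpretation of the formula rather than a quote from the previous article.

```cpp
#include <cstdint>

// rev: reversed day count; cen: reversed century index from the previous step.
// cen - cen / 4 is the number of skipped century leap days (three per four
// centuries) encountered so far when counting backwards.
constexpr uint64_t julian_map(uint64_t rev, uint64_t cen) {
    return rev + cen - cen / 4;
}

// Inside the first (long) century nothing is added.
static_assert(julian_map(36524, 0) == 36524, "century 0 is unchanged");
// The first day of century 1 skips over one missing 29 February.
static_assert(julian_map(36525, 1) == 36526, "one leap day re-inserted");
// A full 400-year era of 146097 days becomes 146100 = 100 * 1461 days.
static_assert(julian_map(146097, 4) == 146100, "era length after mapping");
```

Note that `julian_map(146097, 3)` also gives 146100, so the result is the same whether the century step reports 3 or 4 for that day.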
 

The Algorithm — Step 4
  1. const Y_SHIFT = 400 * ERAS - 1
    // ...
  2. const C2 = 50504432782230121    //  ceil(2^64*4/1461)
    // ...
  3. num = jul * C2                  //  Divide 365.25
  4. yrs = Y_SHIFT - (num >> 64)     //  Forward year
  5. low = num % (1 << 64)           //  Lower 64 bits

Once again, we are doing a fast division with a single multiplication, where previous algorithms would have to first calculate 4 * jul + 3.
This time we are not immediately performing the bit-shift by 64, as we will utilise both the high and low parts of the result. This technique was pioneered by Neri‑Schneider.

The high 64–bit part of num represents the number of years that have elapsed backwards since our forward-epoch, so subtracting it from a carefully selected constant will yield the correct year. As with most other fast date algorithms, this year will later need to be incremented if it is determined that the month is either January or February, since the internal logic of the algorithm treats the start of the year as 1 March.

While the upper 64 bits are easy to explain, the lower 64 bits are going to be treated differently to usual.
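The high/low split can be sketched as follows, assuming `unsigned __int128` (a GCC/Clang extension). The struct and function names are invented for this example, and the sample inputs use the small ERAS = 6 setting from the tables (so Y_SHIFT = 2399) rather than the production constant.

```cpp
#include <cstdint>

using u128 = unsigned __int128;  // assumption: GCC/Clang extension

constexpr uint64_t C2 = 50504432782230121ULL;  // ceil(2^64 * 4 / 1461)

// Hypothetical holder for the two halves of the product.
struct YearSplit {
    uint64_t yrs;  // forward year (computational year starting 1 March)
    uint64_t low;  // lower 64 bits: fractional position within that year
};

inline YearSplit split_year(uint64_t jul, uint64_t y_shift) {
    u128 num = u128(jul) * C2;                              // one multiplication
    return { y_shift - static_cast<uint64_t>(num >> 64),    // high half
             static_cast<uint64_t>(num) };                  // low half
}
```

For ERAS = 6, the hand-derived value jul = 157116 (1970-01-01) gives `yrs == 1969`, which is later bumped to 1970 because the month is January, and jul = 157057 (1970-03-01) gives `yrs == 1970`.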
 

The Algorithm — Step 5
  1. ypt = (782432 * low) >> 64      //  Year-part (backwards)

This is where things start to get very different. In the Neri-Schneider algorithm, the lower bits are divided by a constant in order to calculate the exact day_of_year (0-365). That day_of_year is then later multiplied again to recover the month and day.

At first glance, a division followed by a multiplication sounds like it could be merged into a single combined operation. However, the division step introduces an important integer rounding, and that rounding cannot be removed without changing the result.

In this algorithm, we deliberately skip the explicit day_of_year step and merge the multiplication and division together to get just a year-part. This temporarily loses that rounding effect, causing a drift in the value of year-part equal to 1/4 of a day per year, resetting every 4 years. We will allow this “error” to occur and correct it later using a year-modulus-bitshift technique. Hold that thought for now: we will return to it when we construct N on line 20 (Step 7).
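The drift can be observed directly. In the sketch below (hypothetical function name, `unsigned __int128` assumed), the year-part is computed for 1 March in four consecutive years, using hand-derived jul values for the ERAS = 6 example: 157057 (1970), 156692 (1971), 156326 (1972) and 155961 (1973).

```cpp
#include <cstdint>

using u128 = unsigned __int128;  // assumption: GCC/Clang extension

constexpr uint64_t C2 = 50504432782230121ULL;  // ceil(2^64 * 4 / 1461)

// Year-part (still counted backwards) straight from the low product half,
// with no intermediate day_of_year.
inline uint64_t year_part(uint64_t jul) {
    uint64_t low = static_cast<uint64_t>(u128(jul) * C2);  // lower 64 bits
    return static_cast<uint64_t>((u128(782432) * low) >> 64);
}
```

The same calendar date produces 781360, 781896, 780289 and 780825 for 1970 to 1973: the value creeps by roughly a quarter of a day per year and snaps back after the leap year, which is the "error" that Step 7 compensates for.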
 

The Algorithm — Step 6
  1. bump = ypt < 126464             //  Jan or Feb
  2. shift = bump ? 191360 : 977792  //  Month offset

We will adopt the term bump used by Neri-Schneider to indicate that the year beginning 1 March has overflowed to the next calendar year — i.e. the month is January or February. Other algorithms often use day_of_year or month to calculate this, however we are not computing day_of_year, and we'll require bump in order to calculate month. As such, we'll determine this value early via this more cryptic looking method.
Note that ypt (year part) is still in the reverse direction, so the “final” months of January and February (final for a computational year beginning 1 March) are actually low numbers instead of the more natural high numbers one usually expects.

Next, shift is a value that will be used in the next line to shift a linear equation by 12-months to achieve a similar “bump” overflow to our year.
Note that the difference between the two “shift” options is 786,432, which is equal to 12 × 2^16. Since the next linear equation uses 2^16 as the denominator, this achieves the shift by 12 months where required, in only 1 CPU-cycle.

Other fast date algorithms often use a step near the end such as month = bump ? month - 12 : month, however on x64 processors that takes 2 CPU-cycles to compute, as M - 12 must always be calculated before the ternary check is performed.
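The relationship between the two offsets is easy to check at compile time. The constant and function names below are illustrative, not taken from the released code.

```cpp
#include <cstdint>

constexpr uint64_t SHIFT_JAN_FEB = 191360;  // used when bump is true
constexpr uint64_t SHIFT_OTHER   = 977792;  // used for March to December

// The two offsets sit exactly 12 months apart, where one month is 2^16 units.
static_assert(SHIFT_OTHER - SHIFT_JAN_FEB == 12 * 65536, "12 months apart");

// January and February are the low year-part values on the reversed timeline.
constexpr bool is_jan_or_feb(uint64_t ypt) {
    return ypt < 126464;
}

// Select the month offset once, instead of subtracting 12 from month later.
constexpr uint64_t month_offset(uint64_t ypt) {
    return is_jan_or_feb(ypt) ? SHIFT_JAN_FEB : SHIFT_OTHER;
}
```

For instance, a year-part of 125317 (a date in early January) selects 191360, while 781360 (1 March) selects 977792.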
 

The Algorithm — Step 7
  1. /* Year-modulo-bitshift for leap years. */
  2. /* Also revert to forward direction. */
  3. N = (yrs % 4) * 512 + shift - ypt

This is the year-modulus-bitshift magic hinted at earlier. The value of N will be split into two parts, where the high 16 bits will become the month and the low 16 bits will map to the day.

We have already explained shift, but (yrs % 4) * 512 is certainly an unusual looking term.
In simple terms, if we pretend that months are 32-days long, this represents a yearly shift of exactly 1/4 of a day (resetting every four years). This cancels the error discussed in Step 5.

How is it 1/4 of a day?
In our fake 32-day-month model, 512 units correspond to exactly one quarter of a day within the 16–bit space. In binary terms, 512 is (2^16 / 32) / 4, that is, one quarter of one fake day. The value 512 is also a power of two, so it compiles to a simple left-shift by 9.

Of course, actual Gregorian months are not exactly 32 days long, but they are just close enough to 32 days long to make this work. There is just enough wiggle room in the valid range of shift for the rounding to land correctly across all months.

Note: at first this trickery looks like it might save only minimal cycles, but it is actually a necessary step to avoid those pesky +3 terms that we worked so hard to remove earlier. If we used the Neri–Schneider technique from line 15 onwards, that offset would have reappeared.

It is very fortunate that the Julian and Gregorian calendars have month-lengths close to a power of two to make this trick possible. Other calendars that have month-lengths closer to the actual lunar-length might not be able to adopt this technique.
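Packing and unpacking N can be sketched as below. The function name `pack_n` is hypothetical, and the sample inputs (yrs = 1970 with ypt = 781360 for 1970-03-01, and yrs = 1969 with ypt = 125317 for 1970-01-01) were derived by hand for the ERAS = 6 example.

```cpp
#include <cstdint>

// One quarter of a fake day: a month is 2^16 units and has 32 fake days.
static_assert(512 == (65536 / 32) / 4, "quarter of one fake day");

// Builds N: the month ends up in the bits above the low 16, and the low
// 16 bits encode the day within the month.
constexpr uint64_t pack_n(uint64_t yrs, uint64_t ypt) {
    return (yrs % 4) * 512                         // leap-year drift correction
         + (ypt < 126464 ? 191360 : 977792)        // month offset (shift)
         - ypt;                                    // back to forward direction
}
```

Here `pack_n(1970, 781360)` is 197456, whose upper part (`>> 16`) is 3 for March and whose low 16 bits are 848, which the next step maps to day 1.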
 

The Algorithm — Step 8
  1. const C3 = 8619973866219416     //  floor(2^64/2140)
    // ...
  2. D = ((N % 65536) * C3) >> 64    //  Divide 2140

This is the last non-trivial step, where we take the lower 16–bits and map them to the day-number.
This day number will be a value in the range of 0-30, and will need to be incremented at the end.
Neri-Schneider use division by 2141 here, however I found the value of 2140 was seemingly required in order to achieve the year-modulus-bitshift technique.
We also use a hand-crafted mul-shift in order to ensure that the bit-shift is the “free” 64–bit variation.
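A sketch of this final mapping follows, again assuming `unsigned __int128` and using an invented function name. The sample values of N are hand-derived: 197456 (1970-03-01), 190801 (1970-02-28), 192896 (2000-02-29) and 850839 (1969-12-31).

```cpp
#include <cstdint>

using u128 = unsigned __int128;  // assumption: GCC/Clang extension

constexpr uint64_t C3 = 8619973866219416ULL;  // floor(2^64 / 2140)

// Maps the low 16 bits of N to a 1-indexed day of the month.
inline unsigned day_of_month(uint64_t N) {
    uint64_t low16 = N % 65536;                                       // low 16 bits
    uint64_t D = static_cast<uint64_t>((u128(low16) * C3) >> 64);     // roughly low16 / 2140
    return static_cast<unsigned>(D + 1);                              // 0-indexed to 1-indexed
}
```

One caveat for anyone reusing the helper elsewhere: since C3 is rounded down, an input that is an exact non-zero multiple of 2140 yields one less than a true division would, so it is not a drop-in replacement for `low16 / 2140`.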
 

The Algorithm — Step 9
  1. day = D + 1
  2. month = N / 65536
  3. year = yrs + bump

Finally, we clean up the values as previously described:

  • Increment day so it is 1-indexed
  • Take the upper 16 bits of N for month
  • Overflow the year when the month is January or February.

That's it. The date has now been calculated faster than ever before.

Optimisations for ARM64 and Apple Silicon

Many ARM chips take longer to load integers that are larger than 16 bits (max: 65,535).
There are a handful of integers between lines 15-23 that are larger than this threshold, namely: 782432, 126464, 191360, 977792, 65536.

The reason for their size is that they allow calculation of D to involve N % 65536, which compiles to taking the lowest 16–bits of N, which is a free operation on x64.

These constants have been carefully selected to each be divisible by 32. One can use a compile-time check to divide these constants by 32 if the target is ARM (or use the smaller constants and scale by 32 for x64). When doing so, they all fit under 16–bit in size, leading to a speed bump for many ARM chips. Other constants also need adjusting in this case: divide 512 by 32 (for calculating N), and the constant of C3 should be increased by a factor 32.

Additionally, in Step-6 of the explanation, I noted that the new approach using shift specifically improves the performance on x64 based computers, and should have no effect on ARM devices. From my testing, I have found that this particular optimisation negatively impacts the performance on Apple M4 Pro, presumably due to some parallelisation that is lost. As such, my recommendation is to use a compile-time check to limit this improvement specifically to x64 devices.

Adopting the above changes results in the following changes to the algorithm:

  • Note: terms highlighted in Green are simplified at compile-time.
ARM64 / Apple Silicon Improvement
  1. #if IS_ARM
  2.     const SCALE = 1
  3. #else
  4.     const SCALE = 32
  5. #endif
  6. // ...
  7. const C3 = 8619973866219416 * 32 / SCALE
  8. // ...
  9. ypt = ((24451 * SCALE) * low) >> 64
  10. #if IS_ARM
  11.     shift = (30556 * SCALE)
  12. #else
  13.     bump = ypt < 126464
  14.     shift = bump ? (5980 * SCALE) : (30556 * SCALE)
  15. #endif
  16. N = (yrs % 4) * (16 * SCALE) + shift - ypt
  17. D = ((N % (2048 * SCALE)) * C3) >> 64
  18. M = N / (2048 * SCALE)
  19. #if IS_ARM
  20.     bump = M > 12
  21.     month = bump ? M - 12 : M
  22. #else
  23.     month = M
  24. #endif
  25. // ...
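The reduced constants can be checked against their x64 counterparts at compile time. In this sketch the constant names are invented, and `IS_ARM` is treated as a project-defined macro as in the listing above; a real build might test a compiler-provided macro such as `__aarch64__` or `_M_ARM64` instead.

```cpp
#include <cstdint>

// Assumption: IS_ARM is defined by the build system when targeting ARM.
#if defined(IS_ARM) && IS_ARM
constexpr uint64_t SCALE = 1;
#else
constexpr uint64_t SCALE = 32;
#endif

// Reduced constants (hypothetical names): each is the x64 value divided by 32.
constexpr uint64_t YPT_MUL    = 24451;  // 782432 / 32
constexpr uint64_t BUMP_LIMIT = 3952;   // 126464 / 32
constexpr uint64_t SHIFT_LOW  = 5980;   // 191360 / 32
constexpr uint64_t SHIFT_HIGH = 30556;  // 977792 / 32
constexpr uint64_t MONTH_UNIT = 2048;   // 65536 / 32
constexpr uint64_t YEAR_STEP  = 16;     // 512 / 32

static_assert(YPT_MUL * 32 == 782432, "year-part multiplier");
static_assert(BUMP_LIMIT * 32 == 126464, "Jan/Feb threshold");
static_assert(SHIFT_LOW * 32 == 191360, "Jan/Feb month offset");
static_assert(SHIFT_HIGH * 32 == 977792, "Mar-Dec month offset");
static_assert(MONTH_UNIT * 32 == 65536, "units per month");
static_assert(YEAR_STEP * 32 == 512, "quarter-day correction");
static_assert(SHIFT_HIGH < 65536 && YPT_MUL < 65536, "reduced values fit in 16 bits");

// Value actually used on the current target.
constexpr uint64_t scaled(uint64_t reduced) {
    return reduced * SCALE;
}
```

The two month offsets stay exactly 12 month-units apart after scaling, so the selection trick from Step 6 is unaffected by the choice of SCALE.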

Accuracy and Range

Fortunately, the range of accurate values of the algorithm is very wide:

Total Days    1,381,054,434,006,886      ~1.381 × 10^15   ~1.381 Quadrillion   ~2^50.29
Total Years   3,781,198,611,900          ~3.781 × 10^12   ~3.781 Trillion      ~2^41.78
Max Date      +1,890,599,308,000-02-29   (Rata Die: +690,527,217,032,721)
Min Date      −1,890,599,303,900-03-01   (Rata Die: −690,527,216,974,164)

The first date above this range will overflow.
The first date below this range will incorrectly return February 29 in a non-leap year.

64–bit UNIX time (measured in seconds) covers a range of around 585 Billion years in total, while this algorithm is accurate in the range of Trillions of years. This makes it sufficient to handle the full 64–bit UNIX time range.
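This containment can be verified with a few compile-time checks. The sketch assumes the two "Rata Die" figures in the table are day counts relative to 1970-01-01 (the maximum equals D_SHIFT, which is consistent with that reading); all names are invented for the example.

```cpp
#include <cstdint>

// Assumption: table values interpreted as days since 1970-01-01.
constexpr int64_t MAX_DAY = 690527217032721LL;
constexpr int64_t MIN_DAY = -690527216974164LL;
constexpr int64_t SECONDS_PER_DAY = 86400;

// Whole days reachable by a signed 64-bit count of seconds.
constexpr int64_t UNIX_MAX_DAY = INT64_MAX / SECONDS_PER_DAY;
// Integer division truncates towards zero, so subtract one to floor.
constexpr int64_t UNIX_MIN_DAY = INT64_MIN / SECONDS_PER_DAY - 1;

static_assert(UNIX_MAX_DAY < MAX_DAY, "latest UNIX second is in range");
static_assert(UNIX_MIN_DAY > MIN_DAY, "earliest UNIX second is in range");
static_assert(MAX_DAY - MIN_DAY + 1 == 1381054434006886LL, "total days in range");

// Simple guard a caller could apply before converting.
constexpr bool in_range(int64_t days) {
    return days >= MIN_DAY && days <= MAX_DAY;
}
```

A library wrapping the algorithm could call `in_range` on untrusted day counts, although any input derived from 64–bit UNIX seconds passes automatically.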

While I have not developed a formal proof of this range, there is a testcase in the benchmark codebase which verifies the above range. It is very slow to check the entire range (around a month on Apple M4 Pro), so it first quickly checks:

  • [−2^32 ... 2^32]
  • [MAX_DATE − 2^32 ... MAX_DATE]
  • [MIN_DATE ... MIN_DATE + 2^32]
  • A random sample of 2^32 dates

Portability

Since this algorithm is 64–bit only, some library authors may be interested in techniques to safely deploy this in a portable manner that can support 32–bit computers.

The fast 32–bit algorithms from Article 1 (fastest) and Article 2 (full 32–bit input range safe) have been updated with techniques from this algorithm. Each is now notably faster than stated in its respective article.

If your API only needs to support a restricted day range of around 2^30 days (~1 Billion days / ~2.9 Million years), then the following combination is the fastest known for each platform type:

  • This algorithm for 64–bit computers
  • My algorithm from Article 1 for 32–bit computers

If you prefer your API to handle dates in the full 32–bit input range (often the best choice), then the following combination is suitable:

  • This algorithm for 64–bit computers
  • My algorithm from Article 2 for 32–bit computers.

My algorithm from Article 2 is the only fast 32–bit date algorithm that I am aware of that is 100% overflow safe within the 32–bit input range. It is both overflow-safe, and very fast. It is faster than Boost but either slower or faster than Neri-Schneider depending on the device. Note that Boost and Neri-Schneider only support around 25% of the 32–bit input range.
This overflow-safe algorithm was created with the sole purpose of supporting the scenario noted above.

Benchmark Results

The benchmark code is a direct fork of the benchmarks provided by Neri‑Schneider (GitHub link). The relative speed ratios are calculated by first subtracting the "scan" time from every measurement, which removes the overhead of calling the function from the benchmarks. This is the same approach used by Neri‑Schneider.
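The ratio calculation described here amounts to the following, shown as a small sketch with an invented function name. The sample figures in the note are the raw timings from the first benchmark row.

```cpp
// Relative speed of an algorithm against a baseline, after removing the
// "scan" overhead (the cost of the benchmark loop and function call).
constexpr double relative_speed(double algo_time, double baseline_time, double scan_time) {
    return (algo_time - scan_time) / (baseline_time - scan_time);
}
```

For the Dell row, `relative_speed(179847, 92987, 37620)` comes to roughly 2.57 for Boost and `relative_speed(136029, 92987, 37620)` to roughly 1.78 for Neri‑Schneider, matching the published ratios.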

Lower numbers are faster.

Each cell shows the relative speed, with the raw benchmark figure in brackets.

Algorithm:                                    Scan        Boost             Neri‑Schneider    Backwards 32      Backwards 32 Wide   Backwards 64
Support of Input Range on 32-Bit Computers:               ~25%              ~25%              ~25%              100%                N/A

Dell Inspiron 13-5378 (Windows 10 22H2)
Intel Core i3-7100 @ 2.4 GHz
Compiler: MSVC 19.44 (x64)
                                              - (37620)   2.57x (179847)    1.78x (136029)    1.21x (104728)    1.74x (133839)      1.00x (92987)

Lenovo MIIX 520 13-5378 (Windows 11 Pro 26200)
Intel Core i7-8550U @ 1.8 GHz
Compiler: MSVC 19.44 (x64)
                                              - (16575)   2.33x (107718)    1.66x (81670)     1.18x (62856)     1.64x (80878)       1.00x (55711)

MacBook Pro 2024 (MacOS 15.6.1)
Apple M4 Pro
Compiler: Apple clang 17.0.0
                                              - (3720)    2.45x (32147)     1.62x (22520)     1.38x (19751)     1.92x (25957)       1.00x (15304)

The above platforms were chosen because they produce stable, consistent results that match expectations. I also tested the algorithm on a Lenovo IdeaPad Slim 5 (Snapdragon ARM64), but it reported nearly a 60% speed gain. This strongly suggests a thermal or power-management behaviour that distorts short benchmark loops. Similarly, x86-based MacBook Pros from 2016 and 2020 reported speed improvements of only 2–10%, while other algorithms appeared twice as slow as expected. This again indicates that the CPU is applying aggressive battery or thermal optimisation, making those measurements unreliable.

