10 November 2025
This article is part 2 in a series of new fast date algorithms.
Part 1 introduced a new faster date conversion algorithm which outperforms existing algorithms by 2-11%.
Like most fast date algorithms, it works on around 25% of the 32–bit or 64-bit range.
In this article, I explore how Hinnant’s era technique (which expands coverage from ~25% to over ~99.9% of the 32–bit range) can be applied to other fast algorithms. With a small tweak, these algorithms can then be made 12-17% faster than previous algorithms, and expand the coverage to 100% of the 32–bit range.
Although slightly slower than the narrow 25% variant, it still outperforms Boost by ~15%, which, around five years ago, was the seemingly fastest algorithm around.
Once again, benchmark results are provided but you can also run the code yourself.
Finally, I will present arguments as to why 100% 32-bit coverage may often be preferable to the narrower classes of algorithms.
One of the disadvantages of most fast date algorithms is that they often overflow once the day-count reaches around 25% of the 32–bit or 64–bit range. In 32–bit, this gives around 2.94 million years of coverage rather than around 11.86 million years.
The overflow is due to a term that is commonly used like (days * 4 + 3) / 146097 - which efficiently maps to both the 400-year and 100-year Gregorian rules in one step. The part days * 4 in particular is the cause of the overflow.
Algorithms that are affected by this include Boost, Neri-Schneider and my new fast date variant. I will abbreviate these as "25%" coverage (although they're slightly less by varying amounts).
This tradeoff to accept 25% coverage is often acceptable, given that 2.94 million years is still a very long time, however some libraries are designed to take a performance hit to reach close to the full range. An example is the Howard Hinnant algorithm , as well as the Nachum Dershowitz / Edward Reingold algorithm in Calendrical Calculations: Ultimate Edition (2018). These algorithms avoid (most of) the overflow by first determining which 400-year "era" the date belongs to — a calculation which does not require multiplication by 4. This naturally slows down the algorithm as it introduces an extra division and two multiplications. I have not come across any fast wide-input algorithms which provide full 32–bit coverage (although I will update this page if I learn of any).
Prior to this article series, the seemingly fastest way to create a wide-input variant would be to combine the eras concept with the Neri-Schneider algorithm. The fastest way seems to be by adding a large multiple (S) of 146097 days (number of days per 400 years), to push the signed number line into an unsigned range, prior to division.
Given: days = Days since 1970-01-01 — Then:
/* Large offset to push most inputs into positive range. */,const S = 14694const YEAR_SHIFT = 719468 + 146097 * S // 2,147,468,786
const RATA_SHIFT = 400 * S // 5,877,600
/* New - Calculate era first. */uday = days + RATA_SHIFT // Unsignedera = uday / 146097 // 400-year blocks elapseddoe = uday % 146097 // "Day of era": [0, 146096]/* The rest of the Neri-Schneider formula here (using doe instead of days) *//* See link in top-right for full algorithm. */year += era * 400 - YEAR_SHIFT
The constant 14694 is found by calculating the number of eras that fit in half of signed space: round((2^31 - 719468)/146097).
Even with the slowdown from the era calculation, this still runs faster than the C++ Boost algorithm on some platforms (it's close). This shows just how impressive the Neri-Schneider speedups were.
Due to the initial change of epoch from RATA_SHIFT occurring prior to the era calculation, it is impossible for this technique to cover the full 32–bit range. In the specific example shown, once days reaches -2^31 + 14861 (in the negative direction), then uday will underflow. If the constant of S is increased, then overflow occurs for large positive numbers instead.
The interesting thing is, when the era technique is applied to algorithms that don't need eras for their internal logic, then there's no need for the era to be completely correct. The algorithm will continue to work perfectly accurately even if era is significantly wrong, and day-of-era is adjusted by a matching multiple of 146097.
This is much the same as how “5-foot-20” is the same as “6-foot-8”.
The only thing the era adjustment needs to achieve in order to avoid early overflow is to ensure that the day-of-era is non-negative, and comfortably less than MAX_INT / 4.
Noting this, to achieve full 32-bit coverage and faster speed, we can replace the 400-year era as an approximate power-of-two bucket:
era (using division), we'll determine an approximate bucket (using bit-shifting).A good choice for a bucket is 220 (1,048,576), which is about 2.5% more than 7*146097 (1,022,679).
The result is the below algorithm, which is faster, and works for all 32-bit input:
days += 2147483648 // Add 2^31 (map to unsigned int)bckt = days >> 20 // Bucket (approx 7*400 year)bday = days - bckt * 1022679 // Day of bucketbday += 131235 // Align to 400-year block/* Rest of algorithm here, using bday instead of days */year += bckt * 2800 - 5878000 // Undo line 1, 3 & 4 adjustmentsThere are various magic numbers here, so I will explain them in the table below:
| Constant | Meaning |
|---|---|
2147483648 | 231 — which maps the day into a continuous unsigned range. |
1022679 | The number of days in a 7 eras — i.e. 7 * 146097 |
131235 | A constant that realigns to a 400-year block. It is calculated from (719468 - (2^31 % 146097)) % 146097,where 719468 is the number of days from 0000-03-01 to 1970-01-01 |
2800 | The number of years in a 7 eras — i.e. 7 * 400 |
5878000 | This undoes the prior adjustments. It is calculated from (2^31 - 719468 + 131235) / 365.2425, which divides perfectly. |
The algorithm used a bucket size of 220, but it is not the only choice.
See Appendix A for analysis of other possible bucket sizes.
The only change made before completing the whole algorithm is to roll up the constant 131235 into the next line qday = bday * 4 + 3 by multiplying it by 4 and producing qday = bday * 4 + 524943.
Given: days = Days since 1970-01-01 — Then:
/* Overflow Protection */days += 2147483648 // Add 2^31 (map to unsigned int)bckt = days >> 20 // Bucket (approx 7*400 year)bday = days - bckt * 1022679 // Day of bucket/* Algorithm from article 1 */qday = bday * 4 + 524943 // Quarter-days since XX00-02-28 06:00cent = qday / 146097 // Century-Februaries elapsedqjul = qday - (cent & ~3) + cent * 4 // Map to Julian Quarter-Dayyear = qjul / 1461 // Year (later incremented if Jan/Feb)yday = (qjul % 1461) / 4 // Day of Year (starting 1 March)N = yday * 2141 + 197913 // Neri‑Schneider equivalent of:M = N / 65536 // M = (yday * 5 + 2) / 153D = N % 65536 / 2141 // D = yday - (M * 153 + 2) / 5bump = (yday >= 306) // Check if Jan or Febday = D + 1
month = bump ? M - 12 : M
year += bckt * 2800 - 5878000 + bump // Undo adjustmentsThe benefit of expanding the input range from 99.98% to 100% is not limited to just tidying up edge cases and making for a cleaner API. By supporting the full-range in 32-bit, we open the door for compile-time promotion to 64-bit on those platforms that support it.
Existing "wide" algorithms that have minor overflow are restricted from 64-bit alternatives because it is important for algorithms to behave identically across platforms, even for inputs that are outside the defined safe-zone.
The speed advantages of 64-bit algorithms will be explored in the next blog post.
At the time of writing this article I have only had time to test the algorithm on the following systems, however I plan to add more soon.
| Algorithm: | Scan | Boost | Hinnant | Neri‑Schneider (Wide) | New Way | Approx. Speedup |
|---|---|---|---|---|---|---|
| Input Range: | ~25% | 99.x% | 99.x% | 100% | ||
| Dell Inspiron 13-5378 (Windows 10 22H2) Intel Core i3-7100 @ 2.4 GHz Compiler: MSVC 19.44 (x64) | 27626 | 179625 | 295635 | 175385 | 152547 | 15.5% |
| MacBook Pro 2024 (MacOS 15.6.1) Apple M4 Pro Compiler: Apple clang 17.0.0 | 3659 | 30557 | 49533 | 29921 | 26713 | 12.2% |
| MacBook Pro 2020 (MacOS 14.3.1) Intel Core i5 @ 2 GHz Compiler: Apple clang 15.0.0 | 5541 | 87669 | 152419 | 91794 | 77139 | 17.0% |
Stay tuned for more Gregorian date fun. Upcoming articles will cover:
If you are particularly interested, you can have a preview of the fast 64-bit algorithm test-case: fast64bit (although it is currently light on explaination).
If you found this interesting, you should:
The table below shows various bucket sizes along with the largest number of eras that is less than the bucket size.
| Bucket Size (2^b) | Buckets (t) | Eras Per Bucket (e) | Remainder (r) | Max bucket padding (p) | Max day of bucket (m) | Max cent calculation | |
|---|---|---|---|---|---|---|---|
| 2^18 | 262,144 | 16,384 | 1 | 116,047 | 1,901,917,469 | 1,902,063,565 | Overflow |
| 2^19 | 524,288 | 8,192 | 3 | 85,997 | 705,120,895 | 705,559,185 | 19,317 |
| 2^20 | 1,048,576 | 4,096 | 7 | 25,897 | 106,767,683 | 107,790,361 | 2,951 |
| 2^21 | 2,097,152 | 2,048 | 14 | 51,794 | 106,741,786 | 108,787,143 | 2,978 |
| 2^22 | 4,194,304 | 1,024 | 28 | 103,588 | 106,689,992 | 110,780,707 | 3,033 |
| 2^23 | 8,388,608 | 512 | 57 | 61,079 | 31,930,837 | 40,258,365 | 1,102 |
| 2^24 | 16,777,216 | 256 | 114 | 122,158 | 31,869,758 | 48,524,815 | 1,328 |
| 2^25 | 33,554,432 | 128 | 229 | 98,219 | 13,193,281 | 46,649,493 | 1,277 |
| 2^26 | 67,108,864 | 64 | 459 | 50,341 | 3,890,951 | 70,949,473 | 1,942 |
| 2^27 | 134,217,728 | 32 | 918 | 100,682 | 3,840,610 | 137,957,655 | 3,777 |
| 2^28 | 268,435,456 | 16 | 1,837 | 55,267 | 1,548,473 | 269,928,661 | 7,390 |
| 2^29 | 536,870,912 | 8 | 3,674 | 110,534 | 1,493,206 | 538,253,583 | 14,736 |
| 2^30 | 1,073,741,824 | 4 | 7,349 | 74,971 | 944,381 | 1,074,611,233 | Overflow |
| Formula: | 2^32 2^b | 2^b 146097 | 2^b - e*146097 | r * (t - 1) + 719468 | 146097 * e + p - 1 | m * 4 + 3 146097 | |
As shown, the first and last options must be excluded as the maximum "day of bucket" is not less than 25% of 232, leading to integer overflow. Rows outside this range are not shown as smaller buckets are not big enough to hold a single era, and larger buckets also overflow.
This leads to 11 different options for bucket size, from 219 to 229, each with the same full accuracy and identical calculation speed.
Four options stand out as interesting choices (rows highlighted):
It is unfortunate that the minimum possible "day of bucket" is 40,258,365 days, as that is not low enough to enable the EAF from the Neri-Schneider algorithm, which only works in the range of [0, 28825529] (see the section Applying “EAF” from the first article for details about this). This figure would need to be significantly lower, as it would need to be multiplied by 4 and still fit within the safe range noted. Nonetheless, this option is possibly the best for exploring alternative useful mathematical tricks, if any exist.
The last option is of particular interest, as the number of buckets being minimised makes it potentially appropriate for a lookup table of 16 "offsets", removing two multiplications and some additions from the algorithm.An implementation of this is shown in the benchmark codebase . On some platforms it appears to speed up the algorithm by around 7%, on other platforms it seems to have no meaningful effect. This approach is not recommended for general purpose libraries due to using L1 Data Cache, potentially causing much worse performance if the calling software makes heavy use of L1 Data Cache, which may get busted.