Ben Joffe

A New Faster Overflow-Safe Date Algorithm

12-17% faster than comparable algorithm with no overflow.

10 November 2025

This article is part 2 in a series of new fast date algorithms.

Part 1 introduced a new faster date conversion algorithm which outperforms existing algorithms by 2-11%.
Like most fast date algorithms, it works on around 25% of the 32–bit or 64-bit range.

In this article, I explore how Hinnant’s era technique (which expands coverage from ~25% to over ~99.9% of the 32–bit range) can be applied to other fast algorithms. With a small tweak, these algorithms can then be made 12-17% faster than previous algorithms, and expand the coverage to 100% of the 32–bit range.

Although slightly slower than the narrow 25% variant, it still outperforms Boost by ~15%, which, around five years ago, was the seemingly fastest algorithm around.

Once again, benchmark results are provided but you can also run the code yourself.

Finally, I will present arguments as to why 100% 32-bit coverage may often be preferable to the narrower classes of algorithms.

Background

One of the disadvantages of most fast date algorithms is that they often overflow once the day-count reaches around 25% of the 32–bit or 64–bit range. In 32–bit, this gives around 2.94 million years of coverage rather than around 11.86 million years.

The overflow is due to a term that is commonly used like (days * 4 + 3) / 146097 - which efficiently maps to both the 400-year and 100-year Gregorian rules in one step. The part days * 4 in particular is the cause of the overflow.

Algorithms that are affected by this include Boost, Neri-Schneider and my new fast date variant. I will abbreviate these as "25%" coverage (although they're slightly less by varying amounts).

This tradeoff to accept 25% coverage is often acceptable, given that 2.94 million years is still a very long time, however some libraries are designed to take a performance hit to reach close to the full range. An example is the Howard Hinnant algorithm , as well as the Nachum Dershowitz / Edward Reingold algorithm in Calendrical Calculations: Ultimate Edition (2018). These algorithms avoid (most of) the overflow by first determining which 400-year "era" the date belongs to — a calculation which does not require multiplication by 4. This naturally slows down the algorithm as it introduces an extra division and two multiplications. I have not come across any fast wide-input algorithms which provide full 32–bit coverage (although I will update this page if I learn of any).

Comparison Algorithm

Prior to this article series, the seemingly fastest way to create a wide-input variant would be to combine the eras concept with the Neri-Schneider algorithm. The fastest way seems to be by adding a large multiple (S) of 146097 days (number of days per 400 years), to push the signed number line into an unsigned range, prior to division.

Given: days = Days since 1970-01-01 — Then:

Neri–Schneider + ErasFull Algorithm in C++

/*  Large offset to push most inputs into positive range.  */,

```
const S = 14694
```

const YEAR_SHIFT = 719468 + 146097 * S     //  2,147,468,786

const RATA_SHIFT = 400 * S                 //  5,877,600

```
/*  New - Calculate era first.  */
```

uday = days + RATA_SHIFT                   //  Unsigned

era = uday / 146097                        //  400-year blocks elapsed

doe = uday % 146097                        //  "Day of era": [0, 146096]

/* The rest of the Neri-Schneider formula here (using doe instead of days) */

/* See link in top-right for full algorithm. */

```
year += era * 400 - YEAR_SHIFT
```

The constant 14694 is found by calculating the number of eras that fit in half of signed space: round((2^31 - 719468)/146097).

Even with the slowdown from the era calculation, this still runs faster than the C++ Boost algorithm on some platforms (it's close). This shows just how impressive the Neri-Schneider speedups were.

Due to the initial change of epoch from RATA_SHIFT occurring prior to the era calculation, it is impossible for this technique to cover the full 32–bit range. In the specific example shown, once days reaches -2^31 + 14861 (in the negative direction), then uday will underflow. If the constant of S is increased, then overflow occurs for large positive numbers instead.

From Wide-Input to Full-Range Coverage

The interesting thing is, when the era technique is applied to algorithms that don't need eras for their internal logic, then there's no need for the era to be completely correct. The algorithm will continue to work perfectly accurately even if era is significantly wrong, and day-of-era is adjusted by a matching multiple of 146097.

This is much the same as how “5-foot-20” is the same as “6-foot-8”.

The only thing the era adjustment needs to achieve in order to avoid early overflow is to ensure that the day-of-era is non-negative, and comfortably less than MAX_INT / 4.

Noting this, to achieve full 32-bit coverage and faster speed, we can replace the 400-year era as an approximate power-of-two bucket:

The first addition will be exactly 2³¹ (rather than a multiple of 146097 that is close to 2³¹). This ensures that we have a continuous positive range with no input values overflowing or underflowing.
Rather than calculating a precise era (using division), we'll determine an approximate bucket (using bit-shifting).
We pick a bucket size that slightly over-counts a nearby multiple of 146097.
After the above is completed, we'll add a constant so that the day-count is properly aligned again to a 400-year block. This is required for subsequent calculations.
Finally, at the very end, we'll need to undo our adjustments by a multiple-of-400 years that is associated with our bucket size, as well as by a constant number of years.

A good choice for a bucket is 2²⁰ (1,048,576), which is about 2.5% more than 7*146097 (1,022,679).

The result is the below algorithm, which is faster, and works for all 32-bit input:

Oveflow–Safe Bucket Techniquedays += 2147483648            //  Add 2^31 (map to unsigned int)
bckt = days >> 20             //  Bucket (approx 7*400 year)
bday = days - bckt * 1022679  //  Day of bucket
bday += 131235                //  Align to 400-year block
/* Rest of algorithm here, using bday instead of days */
year += bckt * 2800 - 5878000  //  Undo line 1, 3 & 4 adjustments

There are various magic numbers here, so I will explain them in the table below:

Constant	Meaning
`2147483648`	2³¹ — which maps the day into a continuous unsigned range.
`1022679`	The number of days in a 7 eras — i.e. `7 * 146097`
`131235`	A constant that realigns to a 400-year block. It is calculated from `(719468 - (2^31 % 146097)) % 146097`, where `719468` is the number of days from `0000-03-01` to `1970-01-01`
`2800`	The number of years in a 7 eras — i.e. `7 * 400`
`5878000`	This undoes the prior adjustments. It is calculated from `(2^31 - 719468 + 131235) / 365.2425`, which divides perfectly.

The algorithm used a bucket size of 2²⁰, but it is not the only choice.
See Appendix A for analysis of other possible bucket sizes.

Full algorithm (32–bit)

The only change made before completing the whole algorithm is to roll up the constant 131235 into the next line qday = bday * 4 + 3 by multiplying it by 4 and producing qday = bday * 4 + 524943.

Given: days = Days since 1970-01-01 — Then:

Fast 32-bit overflow-safeC++ Example

```
/* Overflow Protection */
```

days += 2147483648                    //  Add 2^31 (map to unsigned int)

bckt = days >> 20                     //  Bucket (approx 7*400 year)

bday = days - bckt * 1022679          //  Day of bucket

```
/* Algorithm from article 1 */
```

qday = bday * 4 + 524943              //  Quarter-days since XX00-02-28 06:00

cent = qday / 146097                  //  Century-Februaries elapsed

qjul = qday - (cent & ~3) + cent * 4  //  Map to Julian Quarter-Day

year = qjul / 1461                    //  Year (later incremented if Jan/Feb)

yday = (qjul % 1461) / 4              //  Day of Year (starting 1 March)

N = yday * 2141 + 197913              //  Neri‑Schneider equivalent of:

M = N / 65536                         //  M = (yday * 5 + 2) / 153

D = N % 65536 / 2141                  //  D = yday - (M * 153 + 2) / 5

bump = (yday >= 306)                  //  Check if Jan or Feb

```
day = D + 1
```
```
month = bump ? M - 12 : M
```

year += bckt * 2800 - 5878000 + bump  //  Undo adjustments

Why 100% range may be preferable

The benefit of expanding the input range from 99.98% to 100% is not limited to just tidying up edge cases and making for a cleaner API. By supporting the full-range in 32-bit, we open the door for compile-time promotion to 64-bit on those platforms that support it.

Existing "wide" algorithms that have minor overflow are restricted from 64-bit alternatives because it is important for algorithms to behave identically across platforms, even for inputs that are outside the defined safe-zone.

The speed advantages of 64-bit algorithms will be explored in the next blog post.

Benchmarks

At the time of writing this article I have only had time to test the algorithm on the following systems, however I plan to add more soon.

Boost is only included for general comparison, it is not really the same category of algorithm due to being only ~25% bit-range.
The "Approx Speedup" column is only comparing the last two highlighted columns. It is calculated by first subtracting the "scan" performance, which removes the overhead of calling the function.
Lower numbers are faster.

Algorithm:	Scan	Boost	Hinnant	Neri‑Schneider (Wide)	New Way	Approx. Speedup
Input Range:		~25%	99.x%	99.x%	100%
Dell Inspiron 13-5378 (Windows 10 22H2) Intel Core i3-7100 @ 2.4 GHz Compiler: MSVC 19.44 (x64)	27626	179625	295635	175385	152547	15.5%
MacBook Pro 2024 (MacOS 15.6.1) Apple M4 Pro Compiler: Apple clang 17.0.0	3659	30557	49533	29921	26713	12.2%
MacBook Pro 2020 (MacOS 14.3.1) Intel Core i5 @ 2 GHz Compiler: Apple clang 15.0.0	5541	87669	152419	91794	77139	17.0%

Next Steps

Stay tuned for more Gregorian date fun. Upcoming articles will cover:

64-bit algorithms that take the next logical step, including one that performs about 20% faster than the 1/4 range Neri‑Schneider on the M4 Mac Pro processor.
Some other completely different techniques that I tried out while developing these algorithms, some that provide speedups in certain cases, others that are just interesting.

If you are particularly interested, you can have a preview of the fast 64-bit algorithm test-case: fast64bit (although it is currently light on explaination).

If you found this interesting, you should:

Follow me on X to get notified when I publish more date and algorithm related articles.
Check out my article on a new Martian timekeeping system.

Appendix A - Other bucket-size options

The table below shows various bucket sizes along with the largest number of eras that is less than the bucket size.

Bucket Size (2^b)		Buckets (t)	Eras Per Bucket (e)	Remainder (r)	Max bucket padding (p)	Max day of bucket (m)	Max cent calculation
2^18	262,144	16,384	1	116,047	1,901,917,469	1,902,063,565	Overflow
2^19	524,288	8,192	3	85,997	705,120,895	705,559,185	19,317
2^20	1,048,576	4,096	7	25,897	106,767,683	107,790,361	2,951
2^21	2,097,152	2,048	14	51,794	106,741,786	108,787,143	2,978
2^22	4,194,304	1,024	28	103,588	106,689,992	110,780,707	3,033
2^23	8,388,608	512	57	61,079	31,930,837	40,258,365	1,102
2^24	16,777,216	256	114	122,158	31,869,758	48,524,815	1,328
2^25	33,554,432	128	229	98,219	13,193,281	46,649,493	1,277
2^26	67,108,864	64	459	50,341	3,890,951	70,949,473	1,942
2^27	134,217,728	32	918	100,682	3,840,610	137,957,655	3,777
2^28	268,435,456	16	1,837	55,267	1,548,473	269,928,661	7,390
2^29	536,870,912	8	3,674	110,534	1,493,206	538,253,583	14,736
2^30	1,073,741,824	4	7,349	74,971	944,381	1,074,611,233	Overflow
Formula:		2^32 2^b	2^b 146097	2^b - e*146097	r * (t - 1) + 719468	146097 * e + p - 1	m * 4 + 3 146097

As shown, the first and last options must be excluded as the maximum "day of bucket" is not less than 25% of 2³², leading to integer overflow. Rows outside this range are not shown as smaller buckets are not big enough to hold a single era, and larger buckets also overflow.

This leads to 11 different options for bucket size, from 2¹⁹ to 2²⁹, each with the same full accuracy and identical calculation speed.
Four options stand out as interesting choices (rows highlighted):

2¹⁹ days ~ 3 × 400 years — Smallest possible bucket size.
2²⁰ days ~ 7 × 400 years — My choice: minimises the delta between the bucket size and the remander-after-eras (25,897 days ~ 2.5% difference).
2²³ days ~ 57 × 400 years — Minimises the maximum "day of bucket" (40,258,365 days).
2²⁹ days ~ 3,674 × 400 years — Minimises the number of buckets (8).

It is unfortunate that the minimum possible "day of bucket" is 40,258,365 days, as that is not low enough to enable the EAF from the Neri-Schneider algorithm, which only works in the range of [0, 28825529] (see the section Applying “EAF” from the first article for details about this). This figure would need to be significantly lower, as it would need to be multiplied by 4 and still fit within the safe range noted. Nonetheless, this option is possibly the best for exploring alternative useful mathematical tricks, if any exist.

The last option is of particular interest, as the number of buckets being minimised makes it potentially appropriate for a lookup table of 16 "offsets", removing two multiplications and some additions from the algorithm.An implementation of this is shown in the benchmark codebase . On some platforms it appears to speed up the algorithm by around 7%, on other platforms it seems to have no meaningful effect. This approach is not recommended for general purpose libraries due to using L1 Data Cache, potentially causing much worse performance if the calling software makes heavy use of L1 Data Cache, which may get busted.