Ben Joffe

Faster Ordinal Date Algorithms

Draft Date: 3 January 2026

This is an early draft blog post. Much more detail and explanation will be added later.
The algorithms are unlikely to change much unless something new comes up.

.......
.......
.......

The algorithm below computes the year, the ordinal day (1-366), and a leap flag (boolean) from a rata die, counted here as days since 1 January 1970.

Fast Ordinal Date Algorithm

```
const ERAS = 5949
const D_SHIFT = 146097 * ERAS + 719162 + 366
const Y_SHIFT = 400 * ERAS
const CEN_MUL = (4 << 47) / 146097
const JUL_MUL = (4 << 40) / 1461 + 1
const CEN_CUT = (365 << 32) / 36525

day += D_SHIFT                       // Epoch: -XX00-01-01
c_n = (day * CEN_MUL) >> 15          // Divide 36524.25
cen = c_n >> 32                      // Century
cpt = c_n % (1 << 32)                // Century-part
ijy = cen % 4 == 0 || cpt > CEN_CUT  // "Is Julian Year"
jul = day + cen - cen / 4            // Julian map
y_n = (jul * JUL_MUL) >> 8           // Divide 365.25
yrs = y_n >> 32                      // Year
ypt = y_n % (1 << 32)                // Year-part

year = yrs - Y_SHIFT
ordinal = ((ypt * 1461) >> 34) + ijy
leap = yrs % 4 == 0 & ijy
```
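For readers who want to run the steps, here is a minimal C++14 sketch of the same arithmetic. The names `OrdinalDate` and `to_ordinal_date` are illustrative assumptions, as are the signed 32-bit input and the unsigned 64-bit intermediates; the constants and operations follow the listing line by line.

```cpp
#include <cstdint>

// Hypothetical result type; not part of the original listing.
struct OrdinalDate {
    std::int32_t  year;
    std::uint32_t ordinal;  // 1..366
    bool          leap;
};

constexpr std::int64_t  ERAS    = 5949;
constexpr std::int64_t  D_SHIFT = 146097 * ERAS + 719162 + 366;
constexpr std::int64_t  Y_SHIFT = 400 * ERAS;
constexpr std::uint64_t CEN_MUL = (std::uint64_t{4} << 47) / 146097;
constexpr std::uint64_t JUL_MUL = (std::uint64_t{4} << 40) / 1461 + 1;
constexpr std::uint64_t CEN_CUT = (std::uint64_t{365} << 32) / 36525;

// Assumes unix_day (days since 1970-01-01) lies inside the range the post states.
constexpr OrdinalDate to_ordinal_date(std::int32_t unix_day) {
    // Shift the epoch to 1 January of a year divisible by 400, far in the past.
    const std::uint64_t day = static_cast<std::uint64_t>(std::int64_t{unix_day} + D_SHIFT);

    const std::uint64_t c_n = (day * CEN_MUL) >> 15;  // day / 36524.25 in 32.32 fixed point
    const std::uint64_t cen = c_n >> 32;              // century
    const std::uint64_t cpt = c_n & 0xFFFFFFFFu;      // fraction of the century
    const bool ijy = (cen % 4 == 0) || (cpt > CEN_CUT);  // "is Julian year"

    const std::uint64_t jul = day + cen - cen / 4;    // map onto the Julian day count
    const std::uint64_t y_n = (jul * JUL_MUL) >> 8;   // jul / 365.25 in 32.32 fixed point
    const std::uint64_t yrs = y_n >> 32;              // year (still shifted)
    const std::uint64_t ypt = y_n & 0xFFFFFFFFu;      // fraction of the year

    const std::int32_t  year    = static_cast<std::int32_t>(static_cast<std::int64_t>(yrs) - Y_SHIFT);
    const std::uint32_t ordinal = static_cast<std::uint32_t>((ypt * 1461) >> 34) + (ijy ? 1u : 0u);
    const bool          leap    = (yrs % 4 == 0) && ijy;
    return OrdinalDate{year, ordinal, leap};
}
```

For example, `to_ordinal_date(0)` gives year 1970, ordinal 1, not leap, and `to_ordinal_date(11016)` (29 February 2000) gives year 2000, ordinal 60, leap.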

Note that the above algorithm is mostly 32-bit friendly. The calculations of c_n and y_n are the only lines that don't translate directly to 32-bit computers: each compiles to a 4-cycle operation, because the bit-shift straddles two adjacent 32-bit registers. This small speed cost is partially offset by cen and yrs being "free" on 32-bit computers (grabbing the upper 32 bits just requires accessing the upper register), making the overall penalty 2 × (4 - 2) = 4 cycles.

The section below demonstrates the steps required to perform the bit-shift by 8 for y_n, followed by extracting the upper and lower 32 bits for yrs and ypt respectively. Each digit represents a single byte (8 bits), from highest byte to lowest.

In 64-bit: input = (jul * JUL_MUL) = 87654321

↳ yrs: 0876 – right-shift input by 5 bytes (40 bits)
↳ ypt: 5432 – right-shift input by 1 byte (8 bits) + truncate to 4 bytes (a free operation)

As seen above, this is two steps overall.
In 32-bit though, there are four steps:

In 32-bit: hi = 8765; lo = 4321

↳ yrs: 0876 – right-shift hi by 1 byte (8 bits)
↳ ypt: 5000 – left-shift hi by 3 bytes (24 bits)
↳ ypt: 0432 – right-shift lo by 1 byte (8 bits)
↳ ypt: 5432 – bitwise combine the above two results (| operator)

So the penalty here is 2 cycles on 32-bit.
The same principle applies to the calculation of cen and cpt, which is why the overall penalty is described as 4 cycles.
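The two-step and four-step extractions can be compared directly in code. This is a rough sketch: `YearParts`, `split_64` and `split_32` are hypothetical names, and the halves are combined with bitwise OR because the two shifted values occupy disjoint bits.

```cpp
#include <cstdint>

// Hypothetical pair holding the two 32-bit results.
struct YearParts {
    std::uint32_t yrs;
    std::uint32_t ypt;
};

// 64-bit path: two shifts; truncating to 32 bits costs nothing extra.
constexpr YearParts split_64(std::uint64_t product) {
    return YearParts{static_cast<std::uint32_t>(product >> 40),
                     static_cast<std::uint32_t>(product >> 8)};
}

// 32-bit path: the same product held as two registers, hi and lo.
constexpr YearParts split_32(std::uint32_t hi, std::uint32_t lo) {
    const std::uint32_t yrs      = hi >> 8;            // 0876
    const std::uint32_t top_byte = hi << 24;           // 5000
    const std::uint32_t low_part = lo >> 8;            // 0432
    const std::uint32_t ypt      = top_byte | low_part;  // 5432
    return YearParts{yrs, ypt};
}
```

With the product 0x8877665544332211 (hi = 0x88776655, lo = 0x44332211), both paths give yrs = 0x00887766 and ypt = 0x55443322.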

Accuracy and Range

The range of the algorithm in 32-bit is suitable for many applications:

| Quantity | Value | Notes |
|---|---|---|
| Total Days | 1,739,698,238 | ~1.7 Billion |
| Total Years | 4,763,130 | ~4.7 Million |
| Max Date | +2,383,532-12-30 | Rata Die: +869,848,022 |
| Min Date | −2,379,599-01-01 | Rata Die: −869,850,215 |
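The figures above can be cross-checked with a little compile-time arithmetic. The sketch below is only an illustration, and the names `MIN_DAY`, `MAX_DAY` and `TOTAL_DAYS` are assumptions. It checks that the total is the maximum minus the minimum plus one, that the minimum rata die lands on shifted day 366 (an observation from the numbers, not something stated here), and that the largest shifted day stays below 2^31.

```cpp
#include <cstdint>

constexpr std::int64_t ERAS    = 5949;
constexpr std::int64_t D_SHIFT = 146097 * ERAS + 719162 + 366;

// Range limits as quoted for the 32-bit algorithm.
constexpr std::int64_t MAX_DAY    = 869848022;
constexpr std::int64_t MIN_DAY    = -869850215;
constexpr std::int64_t TOTAL_DAYS = MAX_DAY - MIN_DAY + 1;

static_assert(TOTAL_DAYS == 1739698238, "total days = max - min + 1");
static_assert(MIN_DAY + D_SHIFT == 366, "minimum is shifted day 366");
static_assert(MAX_DAY + D_SHIFT < (std::int64_t{1} << 31), "shifted day fits in 31 bits");
```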

Ordinal to Month + Day

The function below calculates the month and day given Year/Ordinal/Leap.
It is similar to the algorithm presented in Calendrical Calculations, as well as the previous time-rs algorithm developed by Jacob Pratt, but with the following changes:

- Performs the shift after the multiplication by STEP instead of before it, which reduces the number of additions and lets superscalar processors run the multiplication in parallel with computing the shift.
- Uses the Neri-Schneider technique of taking both the high and low parts of a multiplication, which also speeds things up on modern superscalar processors.
- Uses a platform-specific scale for micro-optimisations (ARM vs x86).

Fast Month & Day from Year/Ordinal/Leap

```
#if IS_ARM
    const SCALE = 1
#else
    const SCALE = 2
#endif
const STEP = 1071 * SCALE
const DIVISOR = SCALE << 15
const SHIFT_0 = DIVISOR - 439 * SCALE
const SHIFT_1 = SHIFT_0 + STEP
const SHIFT_2 = SHIFT_1 + STEP

// January and February end at ordinal 59 (60 in a leap year)
shift = ordinal <= 59 + leap ? SHIFT_0 : (leap ? SHIFT_1 : SHIFT_2)
num = ordinal * STEP + shift
month = num / DIVISOR
day = (num % DIVISOR) / STEP + 1
```

The modulo by DIVISOR is a free operation on x86 processors: with SCALE = 2, DIVISOR is 1 << 16, so the remainder is simply the low 16 bits of num. On ARM this benefit does not exist, but the smaller SCALE gives an alternative benefit: all the constants fit in under 16 bits and thus load faster than they otherwise would.
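The month and day calculation can be tried out with the C++14 sketch below. It is a sketch under assumptions: `MonthDay` and `month_day` are hypothetical names, and SCALE is a template parameter rather than a preprocessor switch so that both variants can be compared on one machine. It treats ordinals up to and including 59 + leap as January or February, which is what makes 28 February (or 29 February in a leap year) come out right.

```cpp
#include <cstdint>

// Hypothetical result type for illustration.
struct MonthDay {
    std::uint32_t month;  // 1..12
    std::uint32_t day;    // 1..31
};

// SCALE = 1 corresponds to the ARM variant, SCALE = 2 to the x86 variant.
template <std::uint32_t SCALE>
constexpr MonthDay month_day(std::uint32_t ordinal, bool leap) {
    constexpr std::uint32_t STEP    = 1071 * SCALE;
    constexpr std::uint32_t DIVISOR = SCALE << 15;
    constexpr std::uint32_t SHIFT_0 = DIVISOR - 439 * SCALE;
    constexpr std::uint32_t SHIFT_1 = SHIFT_0 + STEP;
    constexpr std::uint32_t SHIFT_2 = SHIFT_1 + STEP;

    // Last ordinal of February: 59, or 60 in a leap year.
    const std::uint32_t jan_feb_len = 59 + (leap ? 1u : 0u);
    const std::uint32_t shift =
        ordinal <= jan_feb_len ? SHIFT_0 : (leap ? SHIFT_1 : SHIFT_2);

    const std::uint32_t num = ordinal * STEP + shift;
    return MonthDay{num / DIVISOR, (num % DIVISOR) / STEP + 1};
}
```

For instance, `month_day<2>(59, false)` gives 28 February and `month_day<2>(60, false)` gives 1 March, while `month_day<1>(60, true)` gives 29 February. Both scales return the same dates because every constant is scaled by the same factor.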

This DRAFT article will be improved with more content soon before "official" publication.