Problems with ancient date-time values
Sometimes you get questions that really lead you down rabbit holes:
Year Zero
My database uses
0000-12-31
as a timestamp and it crashes Google Big Query as it doesnt consider0000
as a year. So to solve this customer asks if we can replace all0000-12-31
values with0001-01-01
.
Why does the year 0000 crash Big Query?
Its a “bug” in big query: using Anno Domini calendaring instead of astronomical year numbering which you would think be more appropriate for scientific systems. Year 0000
doesn’t exist in AD, it goes 1BC, 1AD. In this case year 0000
should be converted to -0001
not 0001
.
What does year 0000 even mean to a computer?
Computers interpret date-time
as an instant in time represented by milliseconds relative to Epoch. This works with with remarkable consistency across implementations for dates after the adoption of UTC in 1972 but what about dates in the ancient past?
Parsing 0000-01-01T00:00:00Z
gives different results depending what programming language you are using:
$ java Main
-62135769600000
$ node date.js
-62135596800000
$ python date.py # smallest date allowed is "0001-01-02T00:00:00Z"
-62135546692000.0
$ bash date.bash
-62135596800000
Here are some of the important things happening around the this date:
- The Roman Empire was at its peak, and Emperor Augustus was in power.
- Herod the Great was king of Judea.
- The construction of the Great Wall of China began during the reign of Emperor Cheng of the Han dynasty.
- The Silk Road trade route was well-established, connecting China to the Mediterranean.
- The Jewish temple in Jerusalem was still standing, but would be destroyed in the year 70 AD by the Romans.
So why are the answers different? Basically it could be anything but some prime suspects are:
- Leap seconds
- Leap years
- Pope/Emperor decree
- Timezones (or lack of)
- Early dates are a great place to put “fiddle factors” since no one should be using them
- The length of a day in milliseconds in 1 AD was different from 2023. The Earth’s rotation is gradually slowing down so the length of a day in milliseconds in 1 AD was approximately 86,400.002 seconds, whereas in 2023, it is approximately 86,400.0002 seconds
Most importantly:
…And probably many more issues
I hope its now clear why referencing a precise instant of time in Year Zero is very difficult!
Why would you even need a precise definition of time for an instant in Year Zero?
This is a very good question.
In the past, I’ve dealt with datasets going back to the 1800s that needed accurate timing and Oracle gave us zero issues, but what on earth could someone be doing that needed accurate timestamps going this far back?
Turns out this customer was doing with realtime data.
What was the real intention?
I have no idea, but its likely 0000-12-31
is a one year offset to a more recent time, such as today. Notice how by missing the time component its not actually the full year too. Already we are seeing problems.
How do we fix this?
Ideally at the source, for example a Go app could calculate date offsets as below. Note that years have different lengths due to leap years:
package main
import (
"fmt"
"time"
)
func main() {
// Length of year in Java:
// https://stackoverflow.com/questions/29899299/is-there-a-way-to-have-a-java8-duration-of-one-year-that-accounts-for-leap-years#29899551
// surely there is a Golang module providing this functionality
durationString := "8760h" // duration string representing one year (365 days * 24 hours/day)
duration, err := time.ParseDuration(durationString)
if err != nil {
fmt.Println(err)
return
}
milliseconds := duration.Milliseconds()
// 31536000000
fmt.Printf("One year has a duration of %v milliseconds\n", milliseconds)
}
With fixed time offsets in milliseconds represented as long
integers all of the above issues and concerns vanish instantly and there is no need to worry about such nonsense or waste time and money identifying and working around the quirks of each system the data passes through.
1000 AD
I have this specific date
1000-01-01T00:00:00Z
, when I sink this data to Oracle I get0999-12-27T00:00:00Z
, all other dates work fine
This seems familiar, but lets ask GPT for its thoughts on timekeeping in 1000 AD:
The accuracy of time measurement in 1000 AD was limited by the available technology and the understanding of astronomy and mechanics at the time. It’s important to remember that the concept of precise minutes and seconds as we understand them today did not exist in the same form during this period. Time was often measured in broader intervals, and the exact time of day was not as crucial for most daily activities as it is in modern society.
Makes perfect sense. This time were using dates-time
values as magic numbers, eg:
- 1000 AD means no date set for “early” event
- 3000 AD means no date set for “late” event
Now we know how computers represent date-time
its obvious for us to see why this will give us problems vs a simple integer such as -1
. For a magic number to work properly we need to match an exact integer value but:
date-time
values in the Middle Ages are not scientifically definabledate-time
values in the Next Millennium are scientifically defined… But the definition may change at some point in the future, so perhaps your app will mysteriously break in a few (hundred?) years!
How can we fix this?
Ideally in the same way as our Year Zero problem. At the source - one/both of these approaches would work:
boolean
field to indicate whether a date time is relevant or expected to be a usable value (replaces magic number)null
value indate-time
field to indicate date is not expected to be usable
But I cant fix my data!
Unfortunately this is the reality for a lot of data consumers. In this case you need to clean the data before you can work on it - either once at the ingest point of your system or if thats not possible, at each point of use where problems are occurring.
Now we understand why the exact representation of this data is nonsense, its probably easiest to:
- Convert
date-time
values tostring
- Carry out any required cleanup by doing find and replace with regular expressions
- Convert back to
data-time
for downstream systems to use.
Yuk!
Im a programmer, how can I avoid these problems?
- Say what you mean. If you want to add 365 days to the current time write data structures to support this and then do so
- Dont shoe-horn multiple items of data into a single field
- If you have an on/off switch use a
boolean
value null
out fields that have no meaning
And most importantly:
Stay away from date-times in the ancient past unless you actually need to reference an instant of time in this period (you dont).
Sample Programs
Dont just take my word for it…
Java
Main.java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
public class Main {
public static void main(String[] args) throws ParseException {
String dateString = "0001-01-01T00:00:00Z";
SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
dateFormat.setTimeZone(TimeZone.getTimeZone("UTC"));
Date date = dateFormat.parse(dateString);
long milliseconds = date.getTime();
System.out.println(milliseconds);
}
}
JavaScript
date.js
const dateString = "0001-01-01T00:00:00Z";
const dateObject = new Date(Date.parse(dateString));
const milliseconds = dateObject.getTime();
console.log(milliseconds)
Python
date.py
import datetime
date_string = "0001-01-02T00:00:00Z"
date_format = "%Y-%m-%dT%H:%M:%S%fZ"
date_object = datetime.datetime.strptime(date_string, date_format)
milliseconds = date_object.timestamp() * 1000
print(milliseconds)
BASH
date.bash
#!/bin/bash
date_string="0001-01-01T00:00:00Z"
date_format="%Y-%m-%dT%H:%M:%S%Z"
# Convert date string to timestamp in seconds
timestamp=$(date -u -d "$date_string" +"%s")
# Multiply by 1000 to get timestamp in milliseconds
timestamp_ms=$(($timestamp * 1000))
echo $timestamp_ms