Precision Through Imprecision: Improving Time Objects

tl;dr When creating value objects representing time, I recommend choosing how fine-grained the time should be with your domain experts and rounding it off to that precision in the value object.

When modeling important numbers, it’s considered good form to specify the precision. Whether it’s money, size or weight, you’ll typically round off to a given decimal point. Even if it’s only for user display, rounding off makes the data more predictable for manipulation and storage.

Unfortunately, we don’t often do this when handling time and it bites us in the rear. Consider the following code:

$estimatedDeliveryDate = new DateTimeImmutable('2017-06-21');

// let's assume today is ALSO 2017-06-21
$now = new DateTimeImmutable('now');

if ($now > $estimatedDeliveryDate) {
    echo 'Package is late!';
} else {
    echo 'Package is on the way.';
}

Since it’s June 21 in the code sample, this code should print “Package is on the way.” After all, the day isn’t over yet, it might just be coming later in the afternoon.

Except the code doesn’t do that. Because we didn’t specify the time component, PHP helpfully zero pads $estimatedDeliveryDate to 2017-06-21 00:00:00. On the other hand, $now is calculated for…now. “Now” includes the current time (which probably isn’t midnight), so you’ll get 2017-06-21 15:33:34 which is indeed later than 2017-06-21 00:00:00.
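If you want to see the padding for yourself, print both values with an explicit format. This is just a small sketch: the hard-coded timestamp stands in for “now” (my assumption, so the output is repeatable).

```php
<?php
// Date-only string: PHP fills in the missing time with zeros.
$estimatedDeliveryDate = new DateTimeImmutable('2017-06-21');

// Stand-in for "now" (assumed value, fixed so the example is repeatable).
$now = new DateTimeImmutable('2017-06-21 15:33:34');

echo $estimatedDeliveryDate->format('Y-m-d H:i:s'), PHP_EOL; // 2017-06-21 00:00:00
echo $now->format('Y-m-d H:i:s'), PHP_EOL;                   // 2017-06-21 15:33:34

// Same calendar day, yet the comparison says we're past the estimate.
var_dump($now > $estimatedDeliveryDate); // bool(true)
```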

Solution 1

“Oh, this is a quick fix.” folks might say and update it to the following.

$estimatedDeliveryDate = new DateTimeImmutable('2017-06-21');
$estimatedDeliveryDate = $estimatedDeliveryDate->setTime(23, 59);

Cool, we changed the time to include up to midnight. Except the time is padded to 23:59:00 so if you look in the last 59 seconds of the day, you’ll have the same problem.

“Grrr, okay.” folks might say.

$estimatedDeliveryDate = new DateTimeImmutable('2017-06-21');
$estimatedDeliveryDate = $estimatedDeliveryDate->setTime(23, 59, 59);

Cool, now it’s fixed.

…Unless you’re on PHP 7.1, which adds microseconds to DateTime objects. So now it only occurs in the last second of the day. I may be biased after working on too many high traffic systems but sooner or later a user or an automated process will hit that and complain. Good luck tracking down THAT bug. :-/

Okay, let’s add microseconds.

$estimatedDeliveryDate = new DateTimeImmutable('2017-06-21');
$estimatedDeliveryDate = $estimatedDeliveryDate->modify('23:59:59.999999');

And this works.

Until we get nanoseconds.

In PHP 7.2.

Okay, okay, we CAN reduce the margin of error further and further to the point that errors become unrealistic. Still, at this point it should be clear this approach is flawed: we’re chasing an infinitely divisible value closer and closer to a point we can never reach. Let’s try a different approach.

Solution 2

Instead of calculating the last moment before our boundary, let’s check against the boundary instead.

$estimatedDeliveryDate = new DateTimeImmutable('2017-06-21');

// Start calculating when it's late instead of the last moment it's running on time
$startOfWhenPackageIsLate = $estimatedDeliveryDate->modify('+1 day');

$now = new DateTimeImmutable('now');

// We've changed the > operator to >=
if ($now >= $startOfWhenPackageIsLate) {
    echo 'Package is late!';
} else {
    echo 'Package is on the way.';
}

So this version works and it’s always accurate throughout the whole day. Unfortunately, it’s also become more complex. If you don’t encapsulate this logic within a value object or similar, it’ll get missed somewhere in your app.

Even if you do encapsulate it, we’ve made this one type of operation (>=) logical and consistent but it’s not a consistent fix for all operations. If we wanted to support equality checks, for example, we’d have to do another, different type of special data juggling to make that operation work correctly. Meh.
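To make that concrete: an “is it due today?” check under this approach can’t use == at all. It turns into a range check against two boundaries. Here’s a rough sketch; the helper name is made up and the timestamps are fixed so the results are repeatable.

```php
<?php
// Hypothetical helper: day-level "equality" expressed as a range check.
function isDueToday(DateTimeImmutable $estimatedDeliveryDate, DateTimeImmutable $now): bool
{
    // Assumes $estimatedDeliveryDate was built from a date-only string (midnight).
    $startOfDay = $estimatedDeliveryDate;
    $startOfNextDay = $estimatedDeliveryDate->modify('+1 day');

    // Half-open interval: the start is included, the next midnight is not.
    return $now >= $startOfDay && $now < $startOfNextDay;
}

$estimate = new DateTimeImmutable('2017-06-21');

var_dump(isDueToday($estimate, new DateTimeImmutable('2017-06-21 15:33:34'))); // bool(true)
var_dump(isDueToday($estimate, new DateTimeImmutable('2017-06-22 00:00:00'))); // bool(false)
```

Two different operators, two boundaries, and a hidden assumption about midnight, all for one “same day” question.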

Finally (and this might just be me) this solution has the misleading smell of a potentially missed domain concept. “Is there a LatePeriodRange? A DeliveryDeadline?” you might say. “The package enters into a late period, then….something happens? The domain expert never mentioned a deadline, but it seems to be there. Is that different than the EstimatedDeliveryDate? Where does it go?” It doesn’t go. It doesn’t go anywhere. It’s just a weird quirk of the implementation that’s now stuck in your head.

So, this is a better solution in that it consistently yields a correct answer…but it’s not a great solution. Let’s see if we can do better.

Solution 3

So, all we want to do is compare two days. Now, if we picture a DateTime object as a set of numbers (year, month, day, hour, minute, second, etc…) everything up to the day part is working fine. All of the problems we’ve had are due to the extra values after that: hour, minute, second, etc. We can argue about the annoying and insidious ways those values keep leaking in there, but the fact remains that the time component is what’s wrecking our checks.

If the day component is all that’s important to us, why do we put up with these extra values? Unless it rolls over into the next day, a few extra hours or minutes won’t change the outcome of the business rules.

So, let’s just throw the extra cruft away.

// Simplify the dates down to just the day, discarding the rest
$estimatedDeliveryDate = day(new DateTimeImmutable('2017-06-21'));
$now = day(new DateTimeImmutable('now'));

// Now the comparison is simple
if ($now > $estimatedDeliveryDate) {
    echo 'Package is late!';
} else {
    echo 'Package is on the way.';
}

// Clunky but effective way to discard extra precision. PHP
// will zero pad the remaining values such as microseconds.
// In the case of dates, we can just use the setTime function but for
// other precisions, you'll have to use other logic to discard the extra
// data beyond what you need.
function day(DateTimeImmutable $date): DateTimeImmutable {
    return $date->setTime(0, 0, 0);
}

This gives us the simpler comparison/calculation we saw in solution 1, with the accuracy we had in solution 2. It’s just…the ugliest version of the code yet, plus it’s super easy to forget to call day() within your code.

However, the code IS easy to abstract. More importantly though, it’s becoming clear that when we’re talking about estimated delivery dates, we’re ALWAYS talking about a day, never about a time. Both of these things make this a good candidate for pushing this into a type.

Encapsulation At Last

In other words, let’s make this a value object.

$estimatedDeliveryDate = EstimatedDeliveryDate::fromString('2017-06-21');
$today = EstimatedDeliveryDate::today();

if ($estimatedDeliveryDate->wasBefore($today)) {
    echo 'Package is late!';
} else {
    echo 'Package is on the way.';
}

Look how nice that reads. The value object itself is nice and boring:

class EstimatedDeliveryDate
{
    private $day;

    private function __construct(DateTimeImmutable $date)
    {
        $this->day = $date->setTime(0, 0, 0);
    }

    public static function fromString(string $date): self
    {
        // Possibly verify YYYY-MM-DD format, etc
        return new static(new DateTimeImmutable($date));
    }

    public static function today(): self
    {
        return new static(new DateTimeImmutable('now'));
    }

    public function wasBefore(EstimatedDeliveryDate $otherDate): bool
    {
        return $this->day < $otherDate->day;
    }
}

Because we’ve now made this a class, we’re automatically enforcing a lot of helpful rules:

You can only compare an EstimatedDeliveryDate to another EstimatedDeliveryDate, so the precision always lines up.

The correct precision handling is in a single internal place; the consuming code never needs to consider precision at all.

It’s easy to test.

You’ve got a great single place to centralize your timezone handling (not discussed here but super important).

One quick pro-tip: I’ve used a today() method here to show how you can have multiple constructors. In practice, I’d recommend creating a system clock and getting your “now” instances from that. It’ll make your unit tests much easier to write. The “real” version would probably look like:

$today = EstimatedDeliveryDate::fromDateTime($this->clock->now());
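For the curious, here’s roughly what that could look like end to end. Treat it as a sketch under assumptions: Clock, SystemClock and FixedClock are names I’m making up for illustration (not a real library), and fromDateTime() is a hypothetical extra named constructor on a trimmed-down copy of the value object.

```php
<?php
// Hypothetical clock abstraction; none of these names come from a real library.
interface Clock
{
    public function now(): DateTimeImmutable;
}

// Production implementation: asks PHP for the current time.
final class SystemClock implements Clock
{
    public function now(): DateTimeImmutable
    {
        return new DateTimeImmutable('now');
    }
}

// Test double that always returns the same instant.
final class FixedClock implements Clock
{
    private $now;

    public function __construct(DateTimeImmutable $now)
    {
        $this->now = $now;
    }

    public function now(): DateTimeImmutable
    {
        return $this->now;
    }
}

// Trimmed-down copy of the value object, with the extra named constructor.
final class EstimatedDeliveryDate
{
    private $day;

    private function __construct(DateTimeImmutable $date)
    {
        $this->day = $date->setTime(0, 0, 0);
    }

    public static function fromString(string $date): self
    {
        return new self(new DateTimeImmutable($date));
    }

    public static function fromDateTime(DateTimeImmutable $date): self
    {
        return new self($date);
    }

    public function wasBefore(EstimatedDeliveryDate $otherDate): bool
    {
        return $this->day < $otherDate->day;
    }
}

// In a test, "today" is whatever the fixed clock says it is.
$clock = new FixedClock(new DateTimeImmutable('2017-06-21 23:59:59.999999'));
$today = EstimatedDeliveryDate::fromDateTime($clock->now());
$estimate = EstimatedDeliveryDate::fromString('2017-06-21');

var_dump($estimate->wasBefore($today)); // bool(false): still on the way
```

Swapping FixedClock for SystemClock in production is the only change the consuming code needs.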

Precision Through Imprecision

The important takeaway here isn’t “value objects, yay, inline juggling, boo!” It’s that we were able to remove several classes of errors by reducing the precision of the DateTime we were handling. If we hadn’t done that, the value object would still be handling all of these edge cases and probably failing at some of them too.

Reducing the quality of data to get a correct answer might seem counter-intuitive but it’s actually a more realistic view of the system we’re trying to model. Our computers might run in picoseconds but our business (probably) doesn’t. Plus, the computer is probably lying anyways.

As devs, it might feel we’re being more flexible and future-proof by keeping all possible information. After all, who are you to decide what information to throw away? Yet, the truth is that while information can potentially be worth money in the future, it definitely costs money to keep it until that future. It’s not just the cost of a bigger hard drive either, it’s the cost of complexity, of people, of time, and in the case of bugs, reputation. Sometimes working with data in its most complex form will turn out to be worth the cost but sometimes it isn’t, so just blindly saving everything you can because you can isn’t always a winning game.

To be clear: I’m not recommending you just randomly remove available time information.

What I am recommending: Explicitly choose a precision for your time points, together with your domain experts. If you’re getting more precision than you expect, it can cause bugs and additional complexity. If you’re getting less precision than you expect, it can cause bugs and failed business rules. The important thing is that we define the expected and necessary level of precision.

Further, choose the precision separately for each use case. Rounding will usually be in the value object, not at the system clock level. As we’ll talk about later, some places still need nanosecond precision but others might only need a year. Getting the precision right makes the language clearer.

This Crap Is Everywhere

It’s worth pointing out that we’ve only talked about a specific type of bug here: excess precision throwing off greater than/less than checks. But this advice applies to a much wider set of errors. I won’t go into all of them, though I do want to point out a personal favorite, “leftover” precision.

// Let's assume today is June 21st, so this equals June 28
$oneWeekFromNow = new DateTimeImmutable('+7 days');
// Also June 28 but set explicitly or loaded from DB
$explicitDate = new DateTimeImmutable('2017-06-28');

// Comparing based on state, are these the same date?
var_dump($oneWeekFromNow == $explicitDate);

No, they’re not the same date because $oneWeekFromNow also has the current time whereas $explicitDate is set to 00:00:00. Delightful.
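Capping both sides to the same precision before comparing makes the problem go away. A quick sketch, reusing the day() helper idea from Solution 3 and a hard-coded “now” (my assumption, so the result is repeatable):

```php
<?php
// Same helper idea as in Solution 3: cut a timestamp down to its day.
function day(DateTimeImmutable $date): DateTimeImmutable
{
    return $date->setTime(0, 0, 0);
}

// Fixed stand-in for "now" (assumed value) so the result is repeatable.
$now = new DateTimeImmutable('2017-06-21 15:33:34');

$oneWeekFromNow = $now->modify('+7 days');           // 2017-06-28 15:33:34
$explicitDate = new DateTimeImmutable('2017-06-28'); // 2017-06-28 00:00:00

var_dump($oneWeekFromNow == $explicitDate);           // bool(false)
var_dump(day($oneWeekFromNow) == day($explicitDate)); // bool(true)
```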

The examples above talked about precision primarily in time vs date but modeling precision applies to any unit of time. Imagine how many scheduling apps only need times to the minute and how many financial apps need support for quarters of the year.
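Here’s a sketch of what discarding precision might look like for two of those units. The helper names are invented for this example, and quarter() assumes plain calendar quarters; a financial calendar may well cut the year differently, which is exactly the kind of thing to ask your domain experts.

```php
<?php
// Hypothetical helpers; the names are made up for this example.

// Keep year through minute, zero the seconds and microseconds.
function minute(DateTimeImmutable $date): DateTimeImmutable
{
    return $date->setTime((int) $date->format('G'), (int) $date->format('i'), 0);
}

// Snap to the first day of the calendar quarter, at midnight.
function quarter(DateTimeImmutable $date): DateTimeImmutable
{
    $month = (int) $date->format('n');
    $firstMonthOfQuarter = $month - (($month - 1) % 3); // 1, 4, 7 or 10

    return $date
        ->setDate((int) $date->format('Y'), $firstMonthOfQuarter, 1)
        ->setTime(0, 0, 0);
}

$moment = new DateTimeImmutable('2017-06-21 15:33:34.123456');

echo minute($moment)->format('Y-m-d H:i:s.u'), PHP_EOL;  // 2017-06-21 15:33:00.000000
echo quarter($moment)->format('Y-m-d H:i:s.u'), PHP_EOL; // 2017-04-01 00:00:00.000000
```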

Once you start looking at it, you realize how many time errors can be explained by undefined precision. They might look like bad range checks or poorly designed bounds but when you dive in, you start to see a pattern emerge.

My experience is that this class of errors is often missed in testing. System clock objects aren’t a common sight (yet), so testing code that uses the current time is a bit tricky. And when there are tests, the fixtures often don’t pad the date out completely, so it’s easy to miss the error windows.
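As a sketch of what fully padded fixtures might look like, here’s a tiny hand-rolled check that parks “now” right on the edges of the day. The isLate() function is a hypothetical stand-in for whatever rule you’re testing; in a real project this would live in your test framework of choice.

```php
<?php
// Hypothetical rule under test, using day precision on both sides.
function isLate(DateTimeImmutable $estimatedDeliveryDate, DateTimeImmutable $now): bool
{
    return $now->setTime(0, 0, 0) > $estimatedDeliveryDate->setTime(0, 0, 0);
}

$estimate = new DateTimeImmutable('2017-06-21');

// Fixtures padded out to the microsecond, sitting right on the edges of the day.
$cases = [
    '2017-06-21 00:00:00.000000' => false, // first instant of the day
    '2017-06-21 23:59:59.999999' => false, // last microsecond of the day
    '2017-06-22 00:00:00.000000' => true,  // first instant of the next day
];

foreach ($cases as $now => $expected) {
    if (isLate($estimate, new DateTimeImmutable($now)) !== $expected) {
        throw new RuntimeException("Unexpected result for $now");
    }
}

echo 'All boundary cases pass.', PHP_EOL;
```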

Nor is this a problem specific to PHP’s DateTime library. When I tweeted about this last week, Anthony Ferrara mentioned how Ruby’s time precision varies depending on the operating system yet the database library had a fixed level. That sounds fun to debug.

Time is just hard to work with. Time math doubly so.

Choosing A Level Of Precision

So we can say that choosing a level of precision for your time objects is super important but how do we select the right one? As a rule of thumb, I would say be open-ended with timepoints for your technical needs but set an explicit level of precision for all of your domain objects.

For your logs, your event sourcing data, your metrics, go as fine-grained as you need/want/can. These are primarily aimed at technical personnel who are more familiar with fine-grained dates and the extra precision is often necessary for debugging. You’ll likely need to get very fine-grained for system or sequenced data. That’s okay, it’s what the constraints demand.

For business concerns, talk to your domain experts about how fine-grained that information needs to be. They can help you balance what they’re using now vs what they might need in the future. Business rules are often an area where you’re playing with borrowed knowledge, so shedding complexity can be a smart move. Remember, you’re not after an accurate-to-real-life model, you’re after a useful one.

Within the code, this might occasionally lead to varying levels of precision, even within the same class. For example, consider this class in an event sourced application.

class OrderShipped
{
    // Order object that's capped to day precision.
    private $estimatedDeliveryDate;

    // Order object that's capped to second level precision.
    private $shippedAt;

    // Event sourcing object that's capped to microsecond
    private $eventRecordedAt;
}

If the varying levels of precision seem strange, remember that these time points have very different use cases. Even though $shippedAt and $eventRecordedAt might point to the same “time”, they belong to very different sections of the code.

You might also find the business is working with units (and therefore precisions) of time you don’t expect: quarters, financial calendars, shifts, morning/afternoon/evening parts. There’s a lot of interesting conversations to be had in exploring these extra units.

Changing Requirements

Another good part of having these conversations: If the business rules change in the future and you turn out to need more precision than originally recorded, then it was a joint decision and you can talk together about how to fix the legacy cases. Ideally, we can shift it from being a technical problem to a business one.

In most cases, this can be simple: “We originally only needed the date they signed up but now we need the time so we can see if it falls before business closing times.” Maybe this affects a small number of accounts and you can set them to the beginning of the next business day. Or just zero pad the dates. Or maybe there’s an extra business rule where signing up after 18:00 sets the subscription end date to tomorrow+1 year instead of today+1 year. Talk to them about it. Folks are more proactive and understanding with changes if you include them in the discussion from the beginning (if only to mitigate blame).
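To illustrate that last option, the 18:00 rule could be sketched like this. The function name is hypothetical, and whether a sign-up at exactly 18:00:00 counts as “after” is my guess; that’s one more question for the domain experts.

```php
<?php
// Hypothetical rule: sign-ups after 18:00 count from the next day.
function subscriptionEndDate(DateTimeImmutable $signedUpAt): DateTimeImmutable
{
    $startDay = $signedUpAt->setTime(0, 0, 0);
    $closingTime = $signedUpAt->setTime(18, 0, 0);

    // Assumption: exactly 18:00:00 still counts as "today".
    if ($signedUpAt > $closingTime) {
        $startDay = $startDay->modify('+1 day');
    }

    // The result is back at day precision.
    return $startDay->modify('+1 year');
}

echo subscriptionEndDate(new DateTimeImmutable('2017-06-21 09:15:00'))->format('Y-m-d'), PHP_EOL; // 2018-06-21
echo subscriptionEndDate(new DateTimeImmutable('2017-06-21 18:30:00'))->format('Y-m-d'), PHP_EOL; // 2018-06-22
```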

In more complex scenarios, you can look at reconstructing data based on other data in your system. Maybe we can derive it from event sourcing times or user registration dates. In some cases, it simply isn’t possible and you’ll have to construct new business rules about what to do with migrating legacy cases. But the truth is, you can’t plan for everything and you probably won’t know what will change. That’s life.

So that’s my thoughts about time precision: use what you need and no more.

Appendix: An Ideal Solution

Going forward, I feel there’s real practical benefits to choosing fixed precisions and modeling them as custom types. My ideal PHP time library would probably be something that provides units of time as abstract classes that I extend into my value objects and then build on.

class ExpectedDeliveryDate extends PointPreciseToDate
{
}
class OrderShippedAt extends PointPreciseToMinute
{
}
class EventGenerationTime extends PointPreciseToMicrosecond
{
}

By pushing the precision to the class, we force a decision about precision. We can limit methods like “setTime()” to the precisions they actually apply to (not on Dates!) and we can round DateInterval to whatever makes sense for the type. Most of the utility methods could have protected visibility and my value objects could expose only those that make sense for my domain. Also, we’d be encouraging folks to create value objects. So. Many. Value objects. Yessss.
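To give a feel for the shape, here’s a very rough sketch of one such base type. Everything in it is hypothetical (including the PointInTime parent and the truncate() hook), and it’s nowhere near a real library: just one precision, one protected utility method and one value object that exposes it.

```php
<?php
// Hypothetical base type: each subclass decides how much precision survives.
abstract class PointInTime
{
    private $value;

    final public function __construct(DateTimeImmutable $value)
    {
        $this->value = static::truncate($value);
    }

    // Subclasses discard everything below their own precision.
    abstract protected static function truncate(DateTimeImmutable $value): DateTimeImmutable;

    // Utility methods stay protected; value objects expose only what fits the domain.
    protected function isBefore(PointInTime $other): bool
    {
        // Refuse to compare points of different types (and so different precisions).
        if (get_class($other) !== static::class) {
            throw new InvalidArgumentException('Cannot compare different types of time points.');
        }

        return $this->value < $other->value;
    }
}

abstract class PointPreciseToDate extends PointInTime
{
    protected static function truncate(DateTimeImmutable $value): DateTimeImmutable
    {
        return $value->setTime(0, 0, 0);
    }
}

final class ExpectedDeliveryDate extends PointPreciseToDate
{
    public function wasBefore(ExpectedDeliveryDate $other): bool
    {
        return $this->isBefore($other);
    }
}

$estimate = new ExpectedDeliveryDate(new DateTimeImmutable('2017-06-21'));
$today = new ExpectedDeliveryDate(new DateTimeImmutable('2017-06-21 15:33:34'));

var_dump($estimate->wasBefore($today)); // bool(false)
```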

Bonus points if the library makes it easy to define custom units of time.

Actually building it though? Ain’t nobody got time for that.

Many thanks to Frank de Jonge, Jeroen Heijmans, Anna Baas and Matthias Noback for feedback on earlier drafts of this article!
