Everything Has a Timestamp and a Deadline
Every record in a DTC-built system answers three questions: when was I written? when did I last change upstream? when should I give up? If any of those answers is "I don't know," the system can't reason about freshness, can't detect drift, and can't escalate stuck work.
The discipline: every cached row, every job, every async operation carries the timestamps necessary to make those questions answerable in SQL.
The four timestamps
| Timestamp | Carries on | Meaning |
|---|---|---|
cached_at |
Cached entities | When we last wrote this to our local store |
<upstream>_updated_at |
Cached entities | When the upstream last modified this (from their payload) |
abandon_after |
Poll/long-running jobs | When to stop retrying and escalate |
available_at |
Queue jobs | When this job is eligible to run (for delayed dispatch) |
Freshness is cached_at + TTL > NOW(). Drift detection is upstream_updated_at <> cached upstream_updated_at. Stuck-work detection is abandon_after < NOW() AND status = 'pending'. All three are plain SQL, queryable by humans and by the escalation worker.
The idempotency marker
A fifth pattern, related but distinct: the idempotency guard that stops a one-shot operation from firing twice.
UPDATE guest_tokens
SET expires_at = NOW() + interval '30 days'
WHERE id = $1 AND expires_at IS NULL;
The WHERE expires_at IS NULL is the guard. It means "start the countdown only if it hasn't started yet." Without it, every incoming webhook would reset the clock, and the token would never expire.
Examples of the pattern:
portal_welcome_sent—INSERT ... ON CONFLICT (user_id) DO NOTHING. Table row existence = "already done."- Stripe PaymentIntent creation — idempotency key header prevents double-creation if we retry.
- Job queue dispatch dedup —
WHERE status = 'pending' AND payload LIKE '%entity_id:42%'before inserting.
The shape is always the same: a cheap-to-check condition that says "this work is already in flight or complete."
The escalation pipeline
Poll jobs have an abandon_after deadline set at dispatch time. A job that hits the deadline without reaching Completed triggers the escalation handler:
- Marks the job
escalatedin the queue (stops retries) - Creates a ticket to the on-call queue in HaloPSA with full context
- Logs an operational metric
The deadline is set based on what's reasonable for the workflow — the Client Portal defaults poll jobs to PENDING_JOB_TIMEOUT_DAYS = 1, which means "if Stripe hasn't confirmed this payment after a day, a human should look." Different workflows have different reasonable deadlines; never infinity.
Why we do this
- Debuggability. "Why hasn't X happened?" is a SQL query against the job queue, not a log dive.
- Drift detection is free. Compare
upstream_updated_atacross runs; if it moved, upstream changed. Fire a refetch. - Nothing runs forever silently. Every async operation has a bounded lifetime. If it exceeds it, a human finds out.
- Skip-if-unchanged upserts work. Compare two timestamps in SQL; skip the write if they match. Saves DB IO on hot reconciliation loops.
The NULL-safe skip guard
Timestamp comparison is where NULL bites. Plain <> returns NULL (not FALSE) when either side is NULL, and NULL in a WHERE clause evaluates as FALSE — so your UPDATE silently does nothing.
-- Wrong: silently swallows updates when either side is NULL
WHERE cached_at <> EXCLUDED.cached_at
-- Right: explicit NULL handling
WHERE cached_at IS NULL
OR EXCLUDED.cached_at IS NULL
OR cached_at <> EXCLUDED.cached_at
This isn't a stylistic preference. We've hit it in production — a quote's approvalstate stayed at 0 in the cache after twelve webhooks arrived with approvalstate=2, because Halo's quote webhook payload didn't include a timestamp field, so both sides were NULL, and every update was dropped. Explicit NULL handling is mandatory.
When this applies
Every cached record, every async job, every operation whose outcome the system needs to reason about.
When it does not apply
Truly ephemeral state where nothing needs to be debugged after the fact (in-memory request context, one-shot calculations with no persisted result). But "we don't need to debug this" is usually wrong — design for observability unless you're sure.
Related
- Explicit Handling of Absent Data — the general NULL-handling rule
- Webhook-Driven, Reconciler-Bounded — how timestamps drive the reconciler
- Operational UUID (OUID) Standard — timestamps are also embedded in the entity identity itself (UUIDv7)