Every Integration Has a Resume Endpoint
Users close tabs. Networks drop connections. Browsers reconnect SSE streams. Servers deploy. Workers miss events. Emails arrive hours after the user forgot what they were doing.
All of those paths need to converge on the same outcome: the user comes back, and the system shows them exactly where they left off. That convergence point is called a resume endpoint, and every long-running workflow has one.
The rule: for any workflow that spans more than a single request-response cycle, expose an endpoint that returns the current state directly, so any recovery path can call it and render the right UI.
The pattern
A long-running workflow in a DTC system has three possible entry points:
- The original flow. User approves a quote, inline Stripe form renders, they pay.
- The email fallback. Background job emails a resume link because the user closed the tab before paying.
- The page reload. Something happened (SSE disconnect, browser crash, server deploy), user hits refresh.
All three paths should land the user on the same UI in the same state. The mechanism that makes this work is a resume endpoint that takes an entity ID and returns the current in-flight state — which PaymentIntent is active, what's the client_secret, what amount is owed, has it already been paid?
GET /api/v1/quotes/{id}/resume-payment
→ { status: "payment_required", payment_intent_secret, amount, ... }
→ { status: "already_paid", quote_id }
→ { status: "no_active_payment", quote_id }
The frontend calls it on page load. If a PaymentIntent (PI) is active, it mounts the Stripe form inline with the returned client_secret. If the quote is already paid, it hides the payment UI entirely. If nothing is active, it falls through to normal page rendering.
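The branching described above can be sketched as a pure function. This is a minimal illustration, not the portal's actual code: the response fields come from the endpoint description, the elided fields are left out, `amount` is assumed to be a number, and `PageAction` and `decidePageAction` are hypothetical names.

```typescript
// Response shapes taken from the endpoint description; the elided fields
// ("...") in the payment_required case are intentionally not filled in.
// Assumption: amount is numeric.
type ResumeResponse =
  | { status: "payment_required"; payment_intent_secret: string; amount: number }
  | { status: "already_paid"; quote_id: string }
  | { status: "no_active_payment"; quote_id: string };

// Hypothetical description of what the page should do next.
type PageAction =
  | { kind: "mount_stripe_form"; clientSecret: string; amount: number }
  | { kind: "hide_payment_ui" }
  | { kind: "render_normally" };

// Pure mapping from backend state to UI action, so it can be exercised
// without a browser or a Stripe account.
function decidePageAction(res: ResumeResponse): PageAction {
  switch (res.status) {
    case "payment_required":
      return {
        kind: "mount_stripe_form",
        clientSecret: res.payment_intent_secret,
        amount: res.amount,
      };
    case "already_paid":
      return { kind: "hide_payment_ui" };
    case "no_active_payment":
      return { kind: "render_normally" };
  }
}
```

Keeping the decision separate from rendering means the original flow, the email link, and a page reload can all feed the same function.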
Why SSE alone isn't enough
Server-Sent Events are a great mechanism for pushing state while the user is connected. But:
- Publish-before-subscribe race. If the backend publishes an event before the frontend's EventSource connects, the event is lost. That connection window can be a fraction of a second or tens of seconds depending on browser timing.
- Auto-reconnect gaps. When SSE reconnects after a network blip, any events that fired during the gap are gone — the broadcast channel doesn't replay.
- Tab close. The user closes the tab. SSE stops. The background job publishes payment_ready to nobody.
The resume endpoint closes all three gaps. On page load — whether it's the first load or a reload after a gap — the frontend asks the backend "what's the current state?" and gets a definitive answer.
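The publish-before-subscribe race, and how a state query on load recovers from it, can be simulated in memory. This is a sketch under stated assumptions: `NoReplayChannel`, `Backend`, and `loadPage` are hypothetical stand-ins, not the real SSE transport or API.

```typescript
// In-memory stand-in for a broadcast channel with no replay: events published
// while nobody is subscribed are simply dropped.
class NoReplayChannel {
  private listeners: Array<(event: string) => void> = [];

  publish(event: string): void {
    for (const listener of this.listeners) {
      listener(event);
    }
  }

  subscribe(listener: (event: string) => void): void {
    this.listeners.push(listener);
  }
}

type PaymentState = "no_active_payment" | "payment_required";

// Hypothetical backend: the state is stored first, then announced.
class Backend {
  state: PaymentState = "no_active_payment";
  channel: NoReplayChannel;

  constructor(channel: NoReplayChannel) {
    this.channel = channel;
  }

  createPayment(): void {
    this.state = "payment_required";
    this.channel.publish("payment_ready");
  }

  // Stand-in for the resume endpoint: answers from stored state, not events.
  resume(): PaymentState {
    return this.state;
  }
}

// Page load: subscribe for live updates first, then ask for the current state,
// so nothing published between the two steps is missed.
function loadPage(backend: Backend): { seen: string[]; initial: PaymentState } {
  const seen: string[] = [];
  backend.channel.subscribe((event) => {
    seen.push(event);
  });
  return { seen, initial: backend.resume() };
}
```

If `createPayment()` runs before `loadPage()`, the page never sees the `payment_ready` event, but `initial` still reports `payment_required`, so the form can be mounted anyway.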
Guest + authenticated parity
A resume endpoint exists for both authenticated and guest flows. The Client Portal has:
- GET /api/v1/quotes/{id}/resume-payment (authenticated)
- GET /api/v1/quotes/{id}/guest/resume?token=... (guest, token-authenticated)
Same response shape. The frontend components (GuestStripePaymentForm) are shared between the two entry points. The resume endpoints are the bridge that makes the shared UI possible.
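One way the shared frontend might pick between the two endpoints is a small URL builder. The paths come from the list above; `Caller` and `resumeUrl` are hypothetical names, and the guest token is assumed to arrive via the emailed link.

```typescript
// How the caller is identified. Assumption: authenticated callers rely on
// their session, guests carry a token from the resume link.
type Caller = { kind: "authenticated" } | { kind: "guest"; token: string };

// Builds the resume URL for either entry point. Both endpoints return the
// same response shape, so everything downstream of this call is shared.
function resumeUrl(quoteId: string, caller: Caller): string {
  const id = encodeURIComponent(quoteId);
  if (caller.kind === "guest") {
    return `/api/v1/quotes/${id}/guest/resume?token=${encodeURIComponent(caller.token)}`;
  }
  return `/api/v1/quotes/${id}/resume-payment`;
}
```

Only the URL differs between the two flows; the response handling does not need to know which one was used.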
Belt-and-suspenders with the email fallback
The email fallback (invoice_payment_link job in the Client Portal) is not conditional on SSE failure — it always runs. Both delivery paths execute in parallel:
- Primary: SSE pushes payment_ready → inline Stripe form renders immediately
- Secondary: Email job dispatches a resume link → user gets the link in their inbox regardless
If the user is on the page when the event fires, they pay inline and move on. If they left, they get the email. Both paths route through the same resume endpoint.
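The unconditional dual dispatch might look like the following sketch. `announcePaymentReady`, `publishSse`, and `enqueueEmail` are hypothetical names, and the two sequential calls stand in for the parallel dispatch. The point shown is only that neither path depends on the other succeeding.

```typescript
type DeliveryResult = { sse: "sent" | "failed"; email: "queued" | "failed" };

// publishSse and enqueueEmail are hypothetical injected functions standing in
// for the SSE publisher and the job queue.
function announcePaymentReady(
  quoteId: string,
  publishSse: (event: string, quoteId: string) => void,
  enqueueEmail: (job: string, quoteId: string) => void,
): DeliveryResult {
  const result: DeliveryResult = { sse: "sent", email: "queued" };

  // Primary path: push to whoever is connected right now.
  try {
    publishSse("payment_ready", quoteId);
  } catch {
    result.sse = "failed";
  }

  // Secondary path: always runs, whether or not the push succeeded.
  try {
    enqueueEmail("invoice_payment_link", quoteId);
  } catch {
    result.email = "failed";
  }

  return result;
}
```

Because the email is never gated on an SSE failure signal, a silently dropped event still leaves the user with a working link.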
Why we do this
- Users recover from anything. Tab crash, network drop, server deploy — none of them become an "I lost my place" support ticket.
- Email fallback and primary flow share code. One endpoint, one set of shared components, two ways to arrive at them.
- Backend owns the state. Frontend is stateless about workflow position. This matches how REST services should work anyway.
- Testable. The resume endpoint is a plain GET. Unit and integration tests for "what state does the user see?" are straightforward.
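To illustrate the testability point, the state derivation behind the endpoint can be kept as a pure function over a stored record. This is a sketch only: `QuoteRecord` and its fields are assumptions, not the portal's schema, and `amount` is assumed to be numeric.

```typescript
// Hypothetical server-side quote record; field names are assumptions.
type QuoteRecord = {
  id: string;
  paid: boolean;
  activePaymentIntent: { clientSecret: string; amount: number } | null;
};

type ResumeBody =
  | { status: "payment_required"; payment_intent_secret: string; amount: number }
  | { status: "already_paid"; quote_id: string }
  | { status: "no_active_payment"; quote_id: string };

// Derives the resume response from stored state alone. "paid" is checked
// first so a leftover PaymentIntent can never re-show the payment form.
function resumePaymentBody(quote: QuoteRecord): ResumeBody {
  if (quote.paid) {
    return { status: "already_paid", quote_id: quote.id };
  }
  if (quote.activePaymentIntent !== null) {
    return {
      status: "payment_required",
      payment_intent_secret: quote.activePaymentIntent.clientSecret,
      amount: quote.activePaymentIntent.amount,
    };
  }
  return { status: "no_active_payment", quote_id: quote.id };
}
```

A test then needs only three records, one per status, with no SSE connection, email delivery, or browser involved.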
When this applies
Every async workflow that the user can return to after some delay:
- Payment flows (down payments, invoice payments)
- Approval workflows
- Signup / onboarding
- Ticket interactions spanning multiple steps
- Any flow that sends an email with a link back to the portal
When it does not apply
- Single-request workflows. A button that deletes something doesn't need a resume endpoint; the user either did it or didn't.
- Workflows that genuinely can't be resumed (a timed OTP that expired, for instance). Surface the expiry explicitly with a useful page, not a 404.
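For the non-resumable case, a link resolver might distinguish "expired" from "never existed" so the UI can show a useful page. Everything here is hypothetical: `ResumeLink`, `LinkOutcome`, `resolveLink`, and the message text are illustrative and not part of the original design.

```typescript
// Hypothetical stored link; the original does not define these fields.
type ResumeLink = { quoteId: string; expiresAtMs: number };

type LinkOutcome =
  | { kind: "resume"; quoteId: string }
  | { kind: "expired"; message: string }
  | { kind: "not_found" };

// Separates an expired link from an unknown one, so the expired case can
// render an explanatory page instead of a generic 404.
function resolveLink(link: ResumeLink | undefined, nowMs: number): LinkOutcome {
  if (link === undefined) {
    return { kind: "not_found" };
  }
  if (nowMs >= link.expiresAtMs) {
    return {
      kind: "expired",
      message: "This link has expired. Request a new one to continue.",
    };
  }
  return { kind: "resume", quoteId: link.quoteId };
}
```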
Related
- Writes Are Jobs, Reads Are Cached — the job queue that dispatches the email fallback
- Observability Is a First-Class Citizen — SSE is one of the UI-state mechanisms
- Build for Unreliable Integrations — resume is the graceful recovery for when the upstream misbehaves mid-flow