
Disaster Recovery Runbook

Document Type: Standard Operating Procedure (Emergency Reference)
Audience: T2 / T3 Technicians
Last Updated: February 2026
Version: 1.0


1. Purpose

Something catastrophic happened: the server won't boot, hardware failed, ransomware hit, there was a fire or flood, or an update went sideways, and the backups are the only way forward. This document tells you exactly what to do, and in what order, to get the practice operational again.

This is not a troubleshooting guide — if you're here, troubleshooting has failed. This is the recovery playbook.


2. Disaster Classification

Before you do anything, classify the disaster. The classification drives the recovery path.

Classification | Description | Examples | Target Recovery Time
Level 1: Single Application | One application is down, everything else works | Dental software corrupted, database locked, application won't launch after failed update | 1–2 hours
Level 2: Single Server | Server is down but network and workstations function | Server hardware failure, OS won't boot, failed Windows Update, VM corruption | 2–4 hours
Level 3: Infrastructure | Multiple systems down, network may be affected | Core switch failure, firewall failure, power event + UPS failure | 2–6 hours
Level 4: Total Loss | Everything is gone. Hardware destroyed or compromised. | Ransomware (encrypted everything), fire/flood, lightning strike, theft | 4–24 hours
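
The classification table can be read as a worst-first decision rule. The Python sketch below is illustrative only: the function name and the three yes/no inputs are assumptions made for this example, not part of any existing tooling, and real incidents may need judgment the flags cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisasterLevel:
    level: int
    name: str
    target_recovery: str  # copied from the classification table


LEVELS = {
    1: DisasterLevel(1, "Single Application", "1-2 hours"),
    2: DisasterLevel(2, "Single Server", "2-4 hours"),
    3: DisasterLevel(3, "Infrastructure", "2-6 hours"),
    4: DisasterLevel(4, "Total Loss", "4-24 hours"),
}


def classify(hardware_destroyed_or_compromised: bool,
             multiple_systems_down: bool,
             server_down: bool) -> DisasterLevel:
    """Return the highest level whose condition matches (checked worst-first)."""
    if hardware_destroyed_or_compromised:
        return LEVELS[4]
    if multiple_systems_down:
        return LEVELS[3]
    if server_down:
        return LEVELS[2]
    return LEVELS[1]  # one application down, everything else works
```

Checking the worst case first means an incident that matches several rows is always assigned the higher level, which errs on the side of escalating.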


3. Immediate Actions (All Levels)

Do these first regardless of what happened:

Step | Action | Done?
1 | Notify your manager / T3 lead immediately. Don't try to handle Level 2+ alone. | [ ]
2 | Open a Priority 1 HALO ticket. Include: client name, what's down, when it happened, what you know so far. | [ ]
3 | Notify the client. Be honest about what you know: "The server is down. We're working on recovery. I'll update you every 30 minutes." Give them a realistic timeframe, not a hopeful one. NOTE: If you commit to updates at a fixed interval, set a recurring alarm for yourself. Nothing is worse than making a commitment to a stressed client and then missing it because you lost track of time. | [ ]
4 | Assess what's still working. Workstations? Internet? Phones? Email? Identify what the practice can still do while you recover. | [ ]
5 | Identify the last known good backup. Check Veeam for the most recent successful backup job. Note the timestamp. | [ ]
6 | Do NOT make changes that could make it worse. Don't format drives, don't reinstall Windows, don't delete anything until you have a recovery plan. | [ ]
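
Step 5 amounts to picking the newest successful run from the job history. The sketch below assumes the run times and results have been copied by hand into a list (it does not talk to Veeam), and the optional `before` cutoff is an addition for this example, useful when the restore point must predate a known incident time.

```python
from datetime import datetime
from typing import Optional

# Hypothetical records, e.g. transcribed from the backup job history.
# Each entry: (job end time, result string).
BackupRun = tuple[datetime, str]


def last_known_good(runs: list[BackupRun],
                    before: Optional[datetime] = None) -> Optional[datetime]:
    """Return the newest successful run, optionally only runs that ended before `before`."""
    good = [
        ended for ended, result in runs
        if result == "Success" and (before is None or ended < before)
    ]
    return max(good, default=None)


runs = [
    (datetime(2026, 2, 1, 22, 0), "Success"),
    (datetime(2026, 2, 2, 22, 0), "Success"),
    (datetime(2026, 2, 3, 22, 0), "Failed"),
]
print(last_known_good(runs))
```

A return value of `None` means no usable restore point was found in the list, which should be escalated immediately rather than worked around.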

🔴 RANSOMWARE SPECIAL RULE: If this is ransomware, DISCONNECT THE SERVER FROM THE NETWORK IMMEDIATELY. Pull the Ethernet cable. Do NOT power off (forensics may need the RAM state). Do NOT pay the ransom. Do NOT attempt to decrypt anything yourself at this stage; known decryptors are evaluated later, in Section 7.1. Notify management and begin the Level 4 recovery path. Document everything you see.


4. Level 1 — Single Application Recovery

Scenario: Dental software (PMS, imaging, or ancillary) is down. Server and OS are fine.

Recovery Priority Order

1. Identify which application and what broke
2. Check the procedure-specific SOP for that application
3. If SOP troubleshooting fails → restore application from Veeam (file-level)
4. If file-level restore fails → restore full application directory from Veeam
5. If application is database-dependent → restore database files too
6. Verify application functionality
7. Verify no other systems were affected

Veeam File-Level Restore (Application Files or Config)

Step | Action
1 | Open Veeam Backup & Replication console
2 | Navigate to Home → Backups → Disk (or wherever the backup job lives)
3 | Right-click the most recent restore point → Restore guest files → Microsoft Windows
4 | Browse to the application directory (e.g., C:\PBSEndo\Client\, C:\Program Files\TDO\)
5 | Select the files/folders to restore
6 | Choose Overwrite or Keep both depending on whether you want to replace or compare
7 | Restore and verify the application launches

Application-Specific Recovery Notes

Application | Key Files to Restore | Special Considerations
PBS Endo | C:\PBSEndo\Client\endoui.exe.config, Updates folder | Config wipe is a documented failure mode; see PBS Endo Config Wipe SOP
TDO | SQL database, TDO application files | Must restore server-side first, then update workstations to match
Dentrix | Data directory (check Dentrix server config for path) | May need re-registration with Henry Schein after restore: 800-824-6375
Eaglesoft | Data directory, Patterson server config | May need reactivation: Patterson 800-475-5036
Open Dental | MySQL data directory | Must stop MySQL service before restoring database files
CS Imaging 8 | SQL database, CS Imaging service config | Verify SQL connectivity and CS Imaging service starts after restore
Sidexis 4 | SQL database, Sidexis config files | Verify SQL connectivity and Sidexis server service starts after restore
EzDent-i | PostgreSQL database, EzServer config | Must stop PostgreSQL before restoring. Version-specific restore process.
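
The two "stop the database first" rows are the easiest to miss under pressure. As a rough sketch, the table can be kept as a lookup that prints a reminder before a restore begins. The data layout and function name here are hypothetical, and the entries only restate the table; actual Windows service names vary by install and are deliberately not guessed.

```python
from typing import NamedTuple, Optional


class RestoreNote(NamedTuple):
    key_files: str
    stop_before_restore: Optional[str]  # database engine to stop first, if any


# Transcribed from the application-specific recovery notes.
RESTORE_NOTES = {
    "PBS Endo": RestoreNote(r"C:\PBSEndo\Client\endoui.exe.config, Updates folder", None),
    "TDO": RestoreNote("SQL database, TDO application files", None),
    "Dentrix": RestoreNote("Data directory", None),
    "Eaglesoft": RestoreNote("Data directory, Patterson server config", None),
    "Open Dental": RestoreNote("MySQL data directory", "MySQL"),
    "CS Imaging 8": RestoreNote("SQL database, CS Imaging service config", None),
    "Sidexis 4": RestoreNote("SQL database, Sidexis config files", None),
    "EzDent-i": RestoreNote("PostgreSQL database, EzServer config", "PostgreSQL"),
}


def pre_restore_warning(app: str) -> str:
    """Return a one-line reminder to read before starting a file-level restore."""
    note = RESTORE_NOTES[app]
    if note.stop_before_restore:
        return f"{app}: stop {note.stop_before_restore} before restoring {note.key_files}."
    return f"{app}: restore {note.key_files}."


print(pre_restore_warning("Open Dental"))
```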


5. Level 2 — Single Server Recovery

Scenario: Server won't boot, OS is corrupted, hardware failed, or VM is unrecoverable.

Decision Tree: Server Recovery Path

SERVER IS DOWN
│
├─ Is this a VIRTUAL MACHINE?
│   ├─ YES → Is a VM checkpoint/snapshot available?
│   │   ├─ YES → Revert to checkpoint. Fastest recovery (5 minutes).
│   │   │   ├─ Verify all services start after revert
│   │   │   └─ If checkpoint revert fails → proceed to Veeam restore
│   │   │
│   │   └─ NO → Proceed to Veeam restore
│   │       ├─ OPTION A: Restore entire VM from Veeam
│   │       │   └─ Veeam → Restore → Entire VM → select restore point → restore to original location
│   │       │
│   │       └─ OPTION B: Restore to different host (if original host hardware failed)
│   │           └─ Veeam → Restore → Entire VM → select restore point → restore to DIFFERENT host/datastore
│   │
│   └─ NO (Physical server) → Is the hardware functional?
│       ├─ YES (OS issue, not hardware) →
│       │   ├─ Try: Boot from Windows install media → Startup Repair
│       │   ├─ Try: Boot to Safe Mode → check Event Viewer → troubleshoot
│       │   ├─ If OS is recoverable → fix and verify
│       │   └─ If OS is not recoverable → Veeam Bare Metal Restore (see Section 5.2)
│       │
│       └─ NO (hardware failure — dead motherboard, failed RAID, bad PSU) →
│           ├─ Can replacement hardware be obtained quickly?
│           │   ├─ YES → Obtain hardware → Veeam Bare Metal Restore to new hardware
│           │   └─ NO → Temporary VM option:
│           │       ├─ Veeam Instant VM Recovery → boots the backup as a VM directly
│           │       ├─ Requires a Hyper-V or VMware host with capacity
│           │       └─ This is a TEMPORARY solution — plan for permanent hardware replacement
│           │
│           └─ No Hyper-V/VMware host available?
│               ├─ Veeam Bare Metal Restore to any available hardware
│               └─ Or: rebuild server from scratch + restore data only
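
For readers who prefer code to tree diagrams, the same decision tree can be sketched as a single function. This is one reading of the tree, not an authoritative implementation: the parameter names are invented for this example, and the last hardware-failure branch is interpreted as "no quick replacement hardware and no hypervisor host".

```python
def server_recovery_path(is_vm: bool,
                         checkpoint_available: bool = False,
                         original_host_ok: bool = True,
                         hardware_ok: bool = True,
                         os_recoverable: bool = False,
                         replacement_hw_quick: bool = False,
                         hypervisor_host_available: bool = False) -> str:
    """Return the first recovery action the decision tree points to."""
    if is_vm:
        if checkpoint_available:
            return "Revert to checkpoint; fall back to Veeam restore if it fails"
        if original_host_ok:
            return "Veeam: restore entire VM to original location"
        return "Veeam: restore entire VM to a different host/datastore"
    # Physical server from here on.
    if hardware_ok:
        if os_recoverable:
            return "Repair OS (Startup Repair / Safe Mode) and verify"
        return "Veeam Bare Metal Restore (Section 5.2)"
    if replacement_hw_quick:
        return "Obtain hardware, then Veeam Bare Metal Restore to new hardware"
    if hypervisor_host_available:
        return "Veeam Instant VM Recovery (temporary; Section 5.3)"
    return "Bare Metal Restore to any available hardware, or rebuild and restore data only"


print(server_recovery_path(is_vm=False, hardware_ok=False,
                           hypervisor_host_available=True))
```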

5.1 Veeam Full VM Restore

Step | Action
1 | Open Veeam Backup & Replication console on the Veeam server
2 | Home → Backups → Disk → locate the server backup job
3 | Right-click the most recent successful restore point → Restore entire VM
4 | Select restore destination: Original location (same host) or Different location
5 | Choose whether to overwrite existing VM or create new
6 | Start restore and monitor progress
7 | Once complete: boot the VM, verify OS loads, check all services

5.2 Veeam Bare Metal Restore (Physical Server)

Step | Action
1 | Create Veeam Recovery Media (USB) from the Veeam console if not already created
2 | Boot the target hardware from Veeam Recovery Media
3 | Select Bare Metal Recovery
4 | Connect to Veeam backup repository (network path or direct-attached storage)
5 | Select the server backup and restore point
6 | Map disks (source disk layout → target disk layout)
7 | Start restore and monitor progress
8 | Once complete: remove recovery media, boot normally, verify all services

Veeam Recovery Media should be pre-created and stored for each client. Don't wait until disaster strikes to create it. Add this to the recurring maintenance checklist.

5.3 Veeam Instant VM Recovery (Emergency Temporary)

Step | Action
1 | Open Veeam Backup & Replication console
2 | Home → Backups → Disk → right-click server backup → Instant VM Recovery
3 | Select the restore point
4 | Choose the Hyper-V or VMware host to run the VM on
5 | Veeam boots the backup directly as a running VM (reads from backup files)
6 | This is temporary: performance is limited by backup storage speed
7 | Use this to get the practice operational while you prepare permanent hardware
8 | Migrate to permanent hardware: on VMware, use Storage vMotion or Veeam Quick Migration; on Hyper-V, use Veeam's Migrate to production option

5.4 Post-Server-Recovery Checklist

Step | Action | Done?
1 | Server boots and OS loads | [ ]
2 | All server roles functional (AD, DNS, DHCP, File Server, Print Server) | [ ]
3 | SQL Server running and databases accessible | [ ]
4 | Dental software server services running | [ ]
5 | Workstations can connect to the server (ping, share access, DNS resolution) | [ ]
6 | Dental software launches on workstations and data is current | [ ]
7 | Imaging works (if server-dependent imaging; test acquisition and retrieval) | [ ]
8 | Printers working (if print server was on this server) | [ ]
9 | Backup job reconfigured and running (Veeam may need to be re-pointed after restore) | [ ]
10 | Remove VM checkpoint if one was used (don't leave checkpoints running long-term) | [ ]
11 | Client notified that systems are restored and operational | [ ]
12 | HALO ticket updated with full recovery timeline and actions taken | [ ]
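
Items 3 and 5 of the checklist (SQL reachable, share access, DNS resolution) can be spot-checked from a workstation with a few lines of Python. This is a minimal sketch using only the standard library: the server name in the usage example is a placeholder, and ports 445 (SMB) and 1433 (the default SQL Server port) are assumptions that may not match a given client. An open port shows that the service is listening, not that the application behind it is healthy.

```python
import socket


def resolves(hostname: str) -> bool:
    """True if the name resolves through the machine's configured DNS."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_checks(server: str, ports: dict[str, int]) -> dict[str, bool]:
    """Return a label -> pass/fail map for DNS plus each named TCP port."""
    results = {"DNS resolution": resolves(server)}
    for label, port in ports.items():
        results[label] = tcp_port_open(server, port)
    return results
```

Example call, with a hypothetical server name: `run_checks("dc01.example.local", {"SMB share access": 445, "SQL Server": 1433})`.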


6. Level 3 — Infrastructure Recovery

Scenario: Multiple systems down due to infrastructure failure.

Service Restoration Priority Order

This is the order in which things must come back online. You can't skip steps: each layer depends on the one before it.

PRIORITY 1: Physical Infrastructure (Power + Network Core)
   ├─ Verify UPS is online and providing power
   ├─ Verify modem/ONT is powered and synced (wait 2 min)
   └─ Verify core switch is powered and operational

PRIORITY 2: Firewall / Router (Gateway)
   ├─ Verify firewall boots and WAN link is active (wait 3–5 min)
   ├─ Verify DHCP is serving IPs
   └─ Verify inter-VLAN routing is functioning

PRIORITY 3: Server
   ├─ Boot server (physical or VM)
   ├─ Verify AD/DNS/DHCP services start
   ├─ Verify SQL Server starts
   └─ Verify file shares are accessible

PRIORITY 4: Dental Software Services
   ├─ Verify dental software server services are running
   ├─ Verify database connectivity from workstations
   └─ Verify imaging services (if server-dependent)

PRIORITY 5: Workstations
   ├─ Boot workstations (or ipconfig /renew if already on)
   ├─ Verify domain login works
   ├─ Verify dental software launches and data is accessible
   ├─ Verify printing works
   └─ Verify imaging works

PRIORITY 6: Peripheral Systems
   ├─ Printers, scanners, label printers
   ├─ VoIP phones (if on network)
   └─ WiFi APs (for guest/patient WiFi)

Don't let the client start using workstations until Priority 3 is confirmed. If the server isn't fully up and workstations connect with cached credentials, dental software may start in a degraded state or create data sync issues.
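
The "no skipping" rule can be expressed as a loop that walks the layers in priority order and stops at the first failure. In this sketch the check functions are placeholders (simple lambdas); in practice each would be whatever verification you actually perform for that layer.

```python
from typing import Callable, Optional

# Layer names follow the priority list above; the checks are stand-ins.
Layer = tuple[str, Callable[[], bool]]


def first_failing_layer(layers: list[Layer]) -> Optional[str]:
    """Run checks in priority order and stop at the first layer that fails."""
    for name, check in layers:
        if not check():
            return name  # later layers are not checked: they depend on this one
    return None


layers = [
    ("Priority 1: Physical Infrastructure", lambda: True),
    ("Priority 2: Firewall / Router", lambda: True),
    ("Priority 3: Server", lambda: False),  # simulated failure
    ("Priority 4: Dental Software Services", lambda: True),
]
print(first_failing_layer(layers))
```

Stopping early is the point of the design: a failure reported at a higher-numbered layer is usually a symptom of the lower layer that has not been confirmed yet.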


7. Level 4 — Total Loss Recovery

Scenario: Everything is destroyed, encrypted, or compromised. Starting from backup only.

7.1 Ransomware-Specific Steps

Step | Action | Done?
1 | Isolate ALL affected systems from network (pull cables, disable WiFi) | [ ]
2 | DO NOT power off systems (RAM forensics may be needed) | [ ]
3 | DO NOT pay the ransom | [ ]
4 | Notify DTC management immediately; this may require a cyber insurance claim | [ ]
5 | Document everything: screenshots of ransom notes, encrypted file extensions, timeline | [ ]
6 | Determine scope: which systems are encrypted? Are backups affected? | [ ]
7 | Are Veeam backups intact? (If Veeam backup files are also encrypted → worst case) | [ ]
8 | Are off-site / cloud backup copies available? | [ ]
9 | Determine ransomware variant: check nomoreransom.org for known decryptors | [ ]
10 | Plan clean rebuild: all affected systems must be wiped and rebuilt from scratch. Never trust a system that was compromised. | [ ]

7.2 Total Loss Recovery Sequence

Phase | Action | Estimated Time
1 | Procure replacement hardware (or clean/wipe existing hardware) | Hours to days
2 | Rebuild network infrastructure (firewall, switches, APs) from config backups or DTC templates | 2–4 hours
3 | Rebuild server: fresh OS install, rejoin/recreate domain | 2–4 hours
4 | Restore data from Veeam: database files, dental software data, shared files | 2–6 hours
5 | Reinstall dental software on server, point to restored database | 1–3 hours
6 | Rebuild workstations: follow New Workstation Deployment SOP for each | 1–2 hours each
7 | Reconfigure dental software on workstations | 30 min each
8 | Verify all systems operational | 1–2 hours
9 | Reconfigure Veeam backup to protect new environment | 30 min

Realistic total for complete rebuild: 12–24+ hours of labor

For ransomware recovery: the practice will likely be down for 1–3 business days minimum. Set expectations with the client early and honestly.
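
To set expectations for a specific practice, the per-phase estimates can be added up for the actual workstation count. The sketch below copies the ranges from the phase table and makes two assumptions of its own: Phase 1 (procurement) is left out because it is waiting time rather than labor, and workstations are rebuilt one after another by a single technician.

```python
# Per-phase labor ranges in hours (low, high), copied from the phase table.
FIXED_PHASES = {
    "Rebuild network infrastructure": (2.0, 4.0),
    "Rebuild server": (2.0, 4.0),
    "Restore data from Veeam": (2.0, 6.0),
    "Reinstall dental software on server": (1.0, 3.0),
    "Verify all systems operational": (1.0, 2.0),
    "Reconfigure Veeam backup": (0.5, 0.5),
}
PER_WORKSTATION = {
    "Rebuild workstation": (1.0, 2.0),
    "Reconfigure dental software": (0.5, 0.5),
}


def labor_estimate(workstations: int) -> tuple[float, float]:
    """Return (low, high) total labor hours for a rebuild with N workstations."""
    low = sum(lo for lo, _ in FIXED_PHASES.values())
    high = sum(hi for _, hi in FIXED_PHASES.values())
    low += workstations * sum(lo for lo, _ in PER_WORKSTATION.values())
    high += workstations * sum(hi for _, hi in PER_WORKSTATION.values())
    return low, high


print(labor_estimate(2))
```

With two workstations this works out to 11.5 to 24.5 hours, in line with the 12–24+ hour figure. Each additional workstation adds 1.5 to 2.5 hours under the serial assumption, so an eight-workstation practice lands at 20.5 to 39.5 hours.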


8. Critical Contact List During Disaster

Contact | When to Call | Number
DTC Management / T3 Lead | Immediately for Level 2+ | [Internal contact]
Client primary contact | After initial assessment; provide status and timeline | From HALO ticket
Microsoft Support | If M365/Entra/Exchange issues during recovery | 800-642-7676
Veeam Support | If backup restore fails or backup files appear corrupted | 614-339-8200
ISP | If internet circuit is down as part of the disaster | Client-specific (check HALO/documentation)
Dental software vendor | If application won't start after data restore (licensing, activation) | See Vendor Escalation Quick Reference
Cyber insurance provider | Ransomware or data breach confirmed | Client's policy (check with client/AM)


9. What the Practice Can Do While You Recover

Help the client stay partially operational during the outage:

If This Is Down | The Practice Can Still...
Server (PMS down) | See patients using paper charts, collect copays manually, reschedule non-urgent
Imaging only | See patients for non-imaging procedures, take impressions, do cleanings
Internet only | Use PMS and imaging (they're local), no insurance verification or claim submission
Everything | Triage: see emergency patients only, handwrite notes, collect payment manually, call patients directly to reschedule

Providing this guidance to the office manager during a disaster reduces their stress and keeps the practice generating some revenue while you work the recovery.


10. Post-Disaster Review

After recovery is complete and the practice is operational, conduct a post-incident review within 48 hours:

Question | Document In HALO
What happened? (Root cause) | Exact failure: hardware, software, human error, security incident
When did it happen? | Timeline from first symptom to full recovery
How long was the practice down? | Total downtime in hours
What was the recovery path? | Which Veeam method was used, what was the restore point date/time
Was any data lost? | Gap between last backup and failure = data loss window
What worked well in the recovery? |
What should be improved? | Backup frequency? Off-site copy? Recovery media pre-staged? Documentation gaps?
What preventive measures should be implemented? | UPS replacement, backup verification schedule, hardware refresh, security hardening
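
Two of the review answers are simple time arithmetic: the data loss window is the gap between the last good backup and the failure, and downtime runs from the failure to full recovery. A short worked example, using made-up timestamps:

```python
from datetime import datetime, timedelta


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time from start to end, in hours."""
    return (end - start) / timedelta(hours=1)


# Hypothetical timestamps for illustration only.
last_backup = datetime(2026, 2, 2, 22, 0)
failure = datetime(2026, 2, 3, 9, 30)
recovered = datetime(2026, 2, 3, 14, 0)

data_loss_window = hours_between(last_backup, failure)  # 11.5 hours
downtime = hours_between(failure, recovered)            # 4.5 hours
print(f"Data loss window: {data_loss_window:.1f} h, downtime: {downtime:.1f} h")
```

Record both numbers in the ticket. A data loss window that is longer than the practice can tolerate is the concrete argument for a more frequent backup schedule.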


11. Preventive Measures Checklist

These are the things that prevent disasters or minimize their impact. Reference: Recurring Maintenance Checklist (when created).

Measure | Frequency | Owner
Veeam backup verification (job status check) | Daily (automated alerts), weekly (manual review) | T1/T2
Veeam test restore (actually restore a file and verify) | Monthly | T2
Veeam Recovery Media created and stored for each client | Annually or after server hardware change | T2
VM checkpoint cleanup (remove old checkpoints) | After every maintenance window | T2
UPS battery test | Quarterly | T2
Server hardware health check (SMART, RAID, temps) | Monthly via NinjaRMM alerts | T1
Off-site / cloud backup copy verified | Monthly | T2
Firewall config exported and stored | After every change | T2
DNS records documented | After every change | T2
Emergency contact list current in HALO | Quarterly | AM
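
The calendar-based rows lend themselves to a simple overdue check. The sketch below is one possible way to track them. The day counts are approximations of the Frequency column (for example, 91 days for "quarterly"), the event-driven rows are left out, and the measure labels are shortened for this example.

```python
from datetime import date, timedelta

# Intervals approximated from the Frequency column. Rows triggered by events
# ("after every change", "after every maintenance window") are omitted.
INTERVALS = {
    "Veeam backup manual review": timedelta(days=7),
    "Veeam test restore": timedelta(days=30),
    "Off-site / cloud backup copy verified": timedelta(days=30),
    "UPS battery test": timedelta(days=91),
    "Emergency contact list review": timedelta(days=91),
    "Veeam Recovery Media refresh": timedelta(days=365),
}


def overdue(last_done: dict[str, date], today: date) -> list[str]:
    """Return measures never recorded, or last done longer ago than their interval."""
    return sorted(
        name for name, interval in INTERVALS.items()
        if name not in last_done or today - last_done[name] > interval
    )
```

A measure with no recorded date is treated as overdue, so a missing record shows up as work to do.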


12. Related Documents

Document | When to Reference
Pre-Upgrade / Pre-Flight Checklist | Should have been run before any change that led to disaster
Veeam Backup Alert Response Guide | For understanding backup health before disaster strikes
New Workstation Deployment SOP | For rebuilding workstations after total loss
Network Assessment Guide & Checklist | For understanding client network architecture during infrastructure recovery
Vendor Escalation Quick Reference | For contacting dental software vendors during application recovery
M365 Tenant Administration Transition SOP | If M365 services are affected and need recovery
HALO Ticket Documentation Standard | For documenting the disaster and recovery in the ticket


13. Document Control

Version | Date | Author | Changes
1.0 | February 2026 | IT Support Engineering | Initial release. Four-level disaster classification, recovery decision trees, Veeam restore procedures (file-level, full VM, bare metal, instant VM recovery), infrastructure restoration priority order, ransomware response, post-disaster review template, preventive measures checklist.


Confidential — Internal Use Only