Disaster Recovery Runbook
Document Type: Standard Operating Procedure, Emergency Reference
Audience: T2 / T3 Technicians
Last Updated: February 2026
Version: 1.0
1. Purpose
Something catastrophic happened: the server won't boot, hardware failed, ransomware hit, there was a fire or flood, or an update went sideways and the backups are the only way forward. This document tells you what to do, and in what order, to get the practice operational again.
This is not a troubleshooting guide — if you're here, troubleshooting has failed. This is the recovery playbook.
2. Disaster Classification
Before you do anything, classify the disaster. The classification drives the recovery path.
Classification | Description | Examples | Target Recovery Time |
|---|---|---|---|
Level 1 — Single Application | One application is down, everything else works | Dental software corrupted, database locked, application won't launch after failed update | 1–2 hours |
Level 2 — Single Server | Server is down but network and workstations function | Server hardware failure, OS won't boot, failed Windows Update, VM corruption | 2–4 hours |
Level 3 — Infrastructure | Multiple systems down, network may be affected | Core switch failure, firewall failure, power event + UPS failure | 2–6 hours |
Level 4 — Total Loss | Everything is gone. Hardware destroyed or compromised. | Ransomware (encrypted everything), fire/flood, lightning strike, theft | 4–24 hours |
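The classification table can also be kept as a small lookup, so a ticket template or script can echo the target recovery window. This is a minimal sketch for illustration: `DisasterLevel`, `target_recovery_hours`, and `requires_escalation` are hypothetical names, not part of HALO or Veeam.

```python
from enum import IntEnum


class DisasterLevel(IntEnum):
    """Levels from the classification table (hypothetical enum name)."""
    SINGLE_APPLICATION = 1
    SINGLE_SERVER = 2
    INFRASTRUCTURE = 3
    TOTAL_LOSS = 4


# Target recovery windows in hours, copied from the classification table.
TARGET_HOURS = {
    DisasterLevel.SINGLE_APPLICATION: (1, 2),
    DisasterLevel.SINGLE_SERVER: (2, 4),
    DisasterLevel.INFRASTRUCTURE: (2, 6),
    DisasterLevel.TOTAL_LOSS: (4, 24),
}


def target_recovery_hours(level: DisasterLevel) -> tuple[int, int]:
    """Return the (min, max) target recovery time in hours for a level."""
    return TARGET_HOURS[level]


def requires_escalation(level: DisasterLevel) -> bool:
    """Section 3 says not to handle Level 2 or higher alone."""
    return level >= DisasterLevel.SINGLE_SERVER
```

For example, `target_recovery_hours(DisasterLevel.INFRASTRUCTURE)` returns `(2, 6)`.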
3. Immediate Actions (All Levels)
Do these first regardless of what happened:
Step | Action | Done? |
|---|---|---|
1 | Notify your manager / T3 lead immediately. Don't try to handle Level 2+ alone. | ☐ |
2 | Open a Priority 1 HALO ticket. Include: client name, what's down, when it happened, what you know so far. | ☐ |
3 | Notify the client. Be honest about what you know: "The server is down. We're working on recovery. I'll update you every 30 minutes." Give them a realistic timeframe, not a hopeful one. NOTE: If you promise an update every XX minutes, set a recurring alarm for yourself. Nothing is worse than making a commitment to a stressed client and then missing it because you lost track of time. | ☐ |
4 | Assess what's still working. Workstations? Internet? Phones? Email? Identify what the practice can still do while you recover. | ☐ |
5 | Identify the last known good backup. Check Veeam for the most recent successful backup job. Note the timestamp. | ☐ |
6 | Do NOT make changes that could make it worse. Don't format drives, don't reinstall Windows, don't delete anything until you have a recovery plan. | ☐ |
🔴 RANSOMWARE SPECIAL RULE: If this is ransomware, DISCONNECT THE SERVER FROM THE NETWORK IMMEDIATELY. Pull the Ethernet cable. Do NOT power off (forensics may need the RAM state). Do NOT pay the ransom. Do NOT attempt to decrypt anything yourself at this stage; known decryptors are checked later, in Section 7.1. Notify management and begin the Level 4 recovery path. Document everything you see.
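Step 3 commits you to regular client updates. One way to keep that promise is to write down the due times the moment you make it. The sketch below only computes the schedule; `update_times` is a hypothetical helper, and the 30-minute default mirrors the example wording in step 3.

```python
from datetime import datetime, timedelta


def update_times(start: datetime, interval_minutes: int = 30, count: int = 4) -> list[datetime]:
    """Return the next `count` times a client update is due after `start`."""
    if interval_minutes <= 0 or count < 0:
        raise ValueError("interval_minutes must be positive and count non-negative")
    step = timedelta(minutes=interval_minutes)
    # First update is due one interval after the commitment was made.
    return [start + step * i for i in range(1, count + 1)]
```

Calling `update_times(datetime.now())` gives the next four due times. Paste them into the HALO ticket or your phone's alarm list.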
4. Level 1 — Single Application Recovery
Scenario: Dental software (PMS, imaging, or ancillary) is down. Server and OS are fine.
Recovery Priority Order
1. Identify which application and what broke
2. Check the procedure-specific SOP for that application
3. If SOP troubleshooting fails → restore application from Veeam (file-level)
4. If file-level restore fails → restore full application directory from Veeam
5. If application is database-dependent → restore database files too
6. Verify application functionality
7. Verify no other systems were affected
Veeam File-Level Restore (Application Files or Config)
Step | Action |
|---|---|
1 | Open Veeam Backup & Replication console |
2 | Navigate to Home → Backups → Disk (or wherever the backup job lives) |
3 | Right-click the most recent restore point → Restore guest files → Microsoft Windows |
4 | Browse to the application directory |
5 | Select the files/folders to restore |
6 | Choose Overwrite or Keep both depending on whether you want to replace or compare |
7 | Restore and verify the application launches |
Application-Specific Recovery Notes
Application | Key Files to Restore | Special Considerations |
|---|---|---|
PBS Endo | Updates folder | Config wipe is a documented failure mode; see PBS Endo Config Wipe SOP |
TDO | SQL database, TDO application files | Must restore server-side first, then update workstations to match |
Dentrix | Data directory (check Dentrix server config for path) | May need re-registration with Henry Schein after restore: 800-824-6375 |
Eaglesoft | Data directory, Patterson server config | May need reactivation: Patterson 800-475-5036 |
Open Dental | MySQL data directory | Must stop MySQL service before restoring database files |
CS Imaging 8 | SQL database, CS Imaging service config | Verify SQL connectivity and CS Imaging service starts after restore |
Sidexis 4 | SQL database, Sidexis config files | Verify SQL connectivity and Sidexis server service starts after restore |
EzDent-i | PostgreSQL database, EzServer config | Must stop PostgreSQL before restoring. Version-specific restore process. |
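Two rows in the notes table require the database engine to be stopped before files are restored. A small lookup like the one below can act as a pre-restore reminder. It is a sketch only: `pre_restore_step` is a hypothetical helper, and the actual Windows service names vary by install and are not given here.

```python
from typing import Optional

# Applications whose database engine must be stopped before a file restore,
# per the notes table. Keys are lowercase application names.
STOP_BEFORE_RESTORE = {
    "open dental": "MySQL",
    "ezdent-i": "PostgreSQL",
}


def pre_restore_step(application: str) -> Optional[str]:
    """Return the database engine to stop before restoring, or None."""
    return STOP_BEFORE_RESTORE.get(application.strip().lower())
```

`pre_restore_step("Open Dental")` returns `"MySQL"`. Applications with no stop requirement in the table return `None`.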
5. Level 2 — Single Server Recovery
Scenario: Server won't boot, OS is corrupted, hardware failed, or VM is unrecoverable.
Decision Tree: Server Recovery Path
SERVER IS DOWN
│
└─ Is this a VIRTUAL MACHINE?
   ├─ YES → Is a VM checkpoint/snapshot available?
   │   ├─ YES → Revert to checkpoint. Fastest recovery (about 5 minutes).
   │   │   ├─ Verify all services start after revert
   │   │   └─ If checkpoint revert fails → proceed to Veeam restore
   │   │
   │   └─ NO → Proceed to Veeam restore
   │       ├─ OPTION A: Restore entire VM from Veeam
   │       │   └─ Veeam → Restore → Entire VM → select restore point → restore to original location
   │       │
   │       └─ OPTION B: Restore to different host (if original host hardware failed)
   │           └─ Veeam → Restore → Entire VM → select restore point → restore to DIFFERENT host/datastore
   │
   └─ NO (Physical server) → Is the hardware functional?
       ├─ YES (OS issue, not hardware) →
       │   ├─ Try: Boot from Windows install media → Startup Repair
       │   ├─ Try: Boot to Safe Mode → check Event Viewer → troubleshoot
       │   ├─ If OS is recoverable → fix and verify
       │   └─ If OS is not recoverable → Veeam Bare Metal Restore (see Section 5.2)
       │
       └─ NO (hardware failure: dead motherboard, failed RAID, bad PSU) →
           ├─ Can replacement hardware be obtained quickly?
           │   ├─ YES → Obtain hardware → Veeam Bare Metal Restore to new hardware
           │   └─ NO → Temporary VM option:
           │       ├─ Veeam Instant VM Recovery → boots the backup as a VM directly
           │       ├─ Requires a Hyper-V or VMware host with capacity
           │       └─ This is a TEMPORARY solution; plan for permanent hardware replacement
           │
           └─ No Hyper-V/VMware host available?
               ├─ Veeam Bare Metal Restore to any available hardware
               └─ Or: rebuild server from scratch + restore data only
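The decision tree above can be sketched as a single function, which is handy for walking a junior technician through the branches. This is an illustrative encoding, not an official tool: `server_recovery_path` and its parameter names are hypothetical, and the returned strings paraphrase the tree.

```python
def server_recovery_path(
    is_vm: bool,
    checkpoint_available: bool = False,
    original_host_ok: bool = True,
    hardware_ok: bool = True,
    os_recoverable: bool = False,
    replacement_quick: bool = False,
    hypervisor_available: bool = False,
) -> str:
    """Return the recovery path the decision tree points to."""
    if is_vm:
        if checkpoint_available:
            return "Revert to checkpoint; if the revert fails, restore from Veeam"
        if original_host_ok:
            return "Veeam: restore entire VM to original location"
        return "Veeam: restore entire VM to a different host/datastore"
    # Physical server branches.
    if hardware_ok:
        if os_recoverable:
            return "Repair the OS (Startup Repair / Safe Mode) and verify"
        return "Veeam Bare Metal Restore (Section 5.2)"
    if replacement_quick:
        return "Obtain hardware, then Veeam Bare Metal Restore to new hardware"
    if hypervisor_available:
        return "Veeam Instant VM Recovery (temporary)"
    return "Veeam Bare Metal Restore to any available hardware, or rebuild and restore data only"
```

For a physical server with dead hardware, no quick replacement, and a spare Hyper-V host, `server_recovery_path(False, hardware_ok=False, hypervisor_available=True)` returns the Instant VM Recovery path.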
5.1 Veeam Full VM Restore
Step | Action |
|---|---|
1 | Open Veeam Backup & Replication console on the Veeam server |
2 | Home → Backups → Disk → locate the server backup job |
3 | Right-click the most recent successful restore point → Restore entire VM |
4 | Select restore destination: Original location (same host) or Different location |
5 | Choose whether to overwrite existing VM or create new |
6 | Start restore and monitor progress |
7 | Once complete: boot the VM, verify OS loads, check all services |
5.2 Veeam Bare Metal Restore (Physical Server)
Step | Action |
|---|---|
1 | Create Veeam Recovery Media (USB) from the Veeam console if not already created |
2 | Boot the target hardware from Veeam Recovery Media |
3 | Select Bare Metal Recovery |
4 | Connect to Veeam backup repository (network path or direct-attached storage) |
5 | Select the server backup and restore point |
6 | Map disks (source disk layout → target disk layout) |
7 | Start restore and monitor progress |
8 | Once complete: remove recovery media, boot normally, verify all services |
Veeam Recovery Media should be pre-created and stored for each client. Don't wait until disaster strikes to create it. Add this to the recurring maintenance checklist.
5.3 Veeam Instant VM Recovery (Emergency Temporary)
Step | Action |
|---|---|
1 | Open Veeam Backup & Replication console |
2 | Home → Backups → Disk → right-click server backup → Instant VM Recovery |
3 | Select the restore point |
4 | Choose the Hyper-V or VMware host to run the VM on |
5 | Veeam boots the backup directly as a running VM (reads from backup files) |
6 | This is temporary — performance is limited by backup storage speed |
7 | Use this to get the practice operational while you prepare permanent hardware |
8 | Migrate to permanent hardware using Storage vMotion or Veeam Quick Migration (both VMware); on Hyper-V, use Veeam's Migrate to production option |
5.4 Post-Server-Recovery Checklist
Step | Action | Done? |
|---|---|---|
1 | Server boots and OS loads | ☐ |
2 | All server roles functional (AD, DNS, DHCP, File Server, Print Server) | ☐ |
3 | SQL Server running and databases accessible | ☐ |
4 | Dental software server services running | ☐ |
5 | Workstations can connect to the server (ping, share access, DNS resolution) | ☐ |
6 | Dental software launches on workstations and data is current | ☐ |
7 | Imaging works (if server-dependent imaging — test acquisition and retrieval) | ☐ |
8 | Printers working (if print server was on this server) | ☐ |
9 | Backup job reconfigured and running (Veeam may need to be re-pointed after restore) | ☐ |
10 | Remove VM checkpoint if one was used (don't leave checkpoints running long-term) | ☐ |
11 | Client notified that systems are restored and operational | ☐ |
12 | HALO ticket updated with full recovery timeline and actions taken | ☐ |
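Checklist step 5 (workstations can reach the server) can be spot-checked from any workstation with a basic TCP probe. The sketch below assumes well-known default ports: 53 for DNS, 445 for SMB file shares, and 1433 for a default SQL Server instance. Named SQL instances and vendor services often listen elsewhere, so treat the port list as a placeholder. `port_open` and `check_server` are hypothetical helper names.

```python
import socket
from typing import Optional


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Well-known default ports (assumption); a given site may use different ones.
DEFAULT_CHECKS = {"DNS": 53, "SMB file shares": 445, "SQL Server": 1433}


def check_server(host: str, checks: Optional[dict[str, int]] = None) -> dict[str, bool]:
    """Map each service label to whether its TCP port accepted a connection."""
    checks = DEFAULT_CHECKS if checks is None else checks
    return {label: port_open(host, port) for label, port in checks.items()}
```

An open port only shows that something is listening. It does not replace launching the dental software and confirming the data is current.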
6. Level 3 — Infrastructure Recovery
Scenario: Multiple systems down due to infrastructure failure.
Service Restoration Priority Order
This is the order things must come back online. You can't skip steps, because each layer depends on the ones before it.
PRIORITY 1: Physical Infrastructure (Power + Network Core)
├─ Verify UPS is online and providing power
├─ Verify modem/ONT is powered and synced (wait 2 min)
└─ Verify core switch is powered and operational
PRIORITY 2: Firewall / Router (Gateway)
├─ Verify firewall boots and WAN link is active (wait 3–5 min)
├─ Verify DHCP is serving IPs
└─ Verify inter-VLAN routing is functioning
PRIORITY 3: Server
├─ Boot server (physical or VM)
├─ Verify AD/DNS/DHCP services start
├─ Verify SQL Server starts
└─ Verify file shares are accessible
PRIORITY 4: Dental Software Services
├─ Verify dental software server services are running
├─ Verify database connectivity from workstations
└─ Verify imaging services (if server-dependent)
PRIORITY 5: Workstations
├─ Boot workstations (or ipconfig /renew if already on)
├─ Verify domain login works
├─ Verify dental software launches and data is accessible
├─ Verify printing works
└─ Verify imaging works
PRIORITY 6: Peripheral Systems
├─ Printers, scanners, label printers
├─ VoIP phones (if on network)
└─ WiFi APs (for guest/patient WiFi)
Don't let the client start using workstations until Priority 3 is confirmed. If the server isn't fully up and workstations connect with cached credentials, dental software may start in a degraded state or create data sync issues.
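The "no skipping" rule and the workstation gate can be expressed as a simple ordered check. This is a sketch under the assumption that you track confirmed layers by hand; `next_layer` and `workstations_allowed` are hypothetical names.

```python
from typing import Optional

# Layers in restoration order, taken from the priority list.
PRIORITY_ORDER = [
    "Physical infrastructure (power + network core)",
    "Firewall / router (gateway)",
    "Server",
    "Dental software services",
    "Workstations",
    "Peripheral systems",
]


def next_layer(confirmed: set[str]) -> Optional[str]:
    """Return the first layer not yet confirmed, or None when all are up."""
    for layer in PRIORITY_ORDER:
        if layer not in confirmed:
            return layer
    return None


def workstations_allowed(confirmed: set[str]) -> bool:
    """Staff may use workstations only once Priorities 1-3 are confirmed."""
    return all(layer in confirmed for layer in PRIORITY_ORDER[:3])
```

Because `next_layer` always returns the earliest unconfirmed layer, a later layer that happens to be up never masks a problem lower in the stack.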
7. Level 4 — Total Loss Recovery
Scenario: Everything is destroyed, encrypted, or compromised. Starting from backup only.
7.1 Ransomware-Specific Steps
Step | Action | Done? |
|---|---|---|
1 | Isolate ALL affected systems from network (pull cables, disable WiFi) | ☐ |
2 | DO NOT power off systems (RAM forensics may be needed) | ☐ |
3 | DO NOT pay the ransom | ☐ |
4 | Notify DTC management immediately — this may require cyber insurance claim | ☐ |
5 | Document everything: screenshots of ransom notes, encrypted file extensions, timeline | ☐ |
6 | Determine scope: which systems are encrypted? Are backups affected? | ☐ |
7 | Are Veeam backups intact? (If Veeam backup files are also encrypted → worst case) | ☐ |
8 | Are off-site / cloud backup copies available? | ☐ |
9 | Determine ransomware variant: check nomoreransom.org for known decryptors | ☐ |
10 | Plan clean rebuild: all affected systems must be wiped and rebuilt from scratch. Never trust a system that was compromised. | ☐ |
7.2 Total Loss Recovery Sequence
Phase | Action | Estimated Time |
|---|---|---|
1 | Procure replacement hardware (or clean/wipe existing hardware) | Hours to days |
2 | Rebuild network infrastructure (firewall, switches, APs) from config backups or DTC templates | 2–4 hours |
3 | Rebuild server: fresh OS install, rejoin/recreate domain | 2–4 hours |
4 | Restore data from Veeam: database files, dental software data, shared files | 2–6 hours |
5 | Reinstall dental software on server, point to restored database | 1–3 hours |
6 | Rebuild workstations: follow New Workstation Deployment SOP for each | 1–2 hours each |
7 | Reconfigure dental software on workstations | 30 min each |
8 | Verify all systems operational | 1–2 hours |
9 | Reconfigure Veeam backup to protect new environment | 30 min |
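Realistic total for complete rebuild: 12–24+ hours of labor, which is at or beyond the upper end of the 4–24 hour Level 4 target in Section 2.
For ransomware recovery, the practice will likely be down for 1–3 business days minimum, well past that target. Set expectations with the client early and honestly.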
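The labor estimate can be worked out from the phase table for a specific site. The sketch below sums phases 2–5, 8, and 9, then adds the per-workstation phases 6 and 7. Phase 1 (procurement) is left out because the table gives it as "hours to days". It assumes the work is done serially by one technician, and `rebuild_labor_hours` is a hypothetical helper.

```python
# (min_hours, max_hours) for fixed phases 2, 3, 4, 5, 8 and 9 of the table.
FIXED_PHASES = [(2, 4), (2, 4), (2, 6), (1, 3), (1, 2), (0.5, 0.5)]
# Per workstation: rebuild (1-2 h) plus dental software reconfiguration (0.5 h).
PER_WORKSTATION = (1.5, 2.5)


def rebuild_labor_hours(workstations: int) -> tuple[float, float]:
    """Return (low, high) labor hours for a total-loss rebuild."""
    if workstations < 0:
        raise ValueError("workstations must be non-negative")
    low = sum(p[0] for p in FIXED_PHASES) + PER_WORKSTATION[0] * workstations
    high = sum(p[1] for p in FIXED_PHASES) + PER_WORKSTATION[1] * workstations
    return low, high
```

For a practice with four workstations, `rebuild_labor_hours(4)` gives `(14.5, 29.5)`. That is the fixed 8.5–19.5 hours plus 6–10 hours of workstation work, before any procurement delay.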
8. Critical Contact List During Disaster
Contact | When to Call | Number |
|---|---|---|
DTC Management / T3 Lead | Immediately for Level 2+ | [Internal contact] |
Client primary contact | After initial assessment — provide status and timeline | From HALO ticket |
Microsoft Support | If M365/Entra/Exchange issues during recovery | 800-642-7676 |
Veeam Support | If backup restore fails or backup files appear corrupted | 614-339-8200 |
ISP | If internet circuit is down as part of the disaster | Client-specific (check HALO/documentation) |
Dental software vendor | If application won't start after data restore (licensing, activation) | Vendor-specific (Henry Schein and Patterson numbers are in Section 4) |
Cyber insurance provider | Ransomware or data breach confirmed | Client's policy (check with client/AM) |
9. What the Practice Can Do While You Recover
Help the client stay partially operational during the outage:
If This Is Down | The Practice Can Still... |
|---|---|
Server (PMS down) | See patients using paper charts, collect copays manually, reschedule non-urgent |
Imaging only | See patients for non-imaging procedures, take impressions, do cleanings |
Internet only | Use PMS and imaging (they're local), no insurance verification or claim submission |
Everything | Triage: see emergency patients only, handwrite notes, collect payment manually, call patients directly to reschedule |
Providing this guidance to the office manager during a disaster reduces their stress and keeps the practice generating some revenue while you work the recovery.
10. Post-Disaster Review
After recovery is complete and the practice is operational, conduct a post-incident review within 48 hours:
Question | Document In HALO |
|---|---|
What happened? (Root cause) | Exact failure — hardware, software, human error, security incident |
When did it happen? | Timeline from first symptom to full recovery |
How long was the practice down? | Total downtime in hours |
What was the recovery path? | Which Veeam method was used, what was the restore point date/time |
Was any data lost? | Gap between last backup and failure = data loss window |
What worked well in the recovery? | — |
What should be improved? | Backup frequency? Off-site copy? Recovery media pre-staged? Documentation gaps? |
What preventive measures should be implemented? | UPS replacement, backup verification schedule, hardware refresh, security hardening |
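Two of the review answers are plain arithmetic on timestamps: the data loss window is the gap between the last good restore point and the failure, and downtime is the gap between the failure and full recovery. A minimal worked example follows, using made-up timestamps.

```python
from datetime import datetime


def hours_between(start: datetime, end: datetime) -> float:
    """Return the elapsed time from start to end in hours."""
    if end < start:
        raise ValueError("end must not be earlier than start")
    return (end - start).total_seconds() / 3600


# Hypothetical incident timestamps, for illustration only.
last_backup = datetime(2026, 2, 10, 1, 0)    # last successful restore point
failure = datetime(2026, 2, 10, 9, 30)       # first symptom
recovered = datetime(2026, 2, 10, 14, 0)     # practice fully operational

data_loss_window = hours_between(last_backup, failure)  # 8.5 hours
downtime = hours_between(failure, recovered)            # 4.5 hours
```

Record both figures in the HALO ticket. A large data loss window is a direct argument for more frequent backups.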
11. Preventive Measures Checklist
These are the things that prevent disasters or minimize their impact. Reference: Recurring Maintenance Checklist (when created).
Measure | Frequency | Owner |
|---|---|---|
Veeam backup verification (job status check) | Daily (automated alerts), weekly (manual review) | T1/T2 |
Veeam test restore (actually restore a file and verify) | Monthly | T2 |
Veeam Recovery Media created and stored for each client | Annually or after server hardware change | T2 |
VM checkpoint cleanup (remove old checkpoints) | After every maintenance window | T2 |
UPS battery test | Quarterly | T2 |
Server hardware health check (SMART, RAID, temps) | Monthly via NinjaRMM alerts | T1 |
Off-site / cloud backup copy verified | Monthly | T2 |
Firewall config exported and stored | After every change | T2 |
DNS records documented | After every change | T2 |
Emergency contact list current in HALO | Quarterly | AM |
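The time-based rows of the checklist lend themselves to a simple overdue check. The sketch below is an assumption-laden illustration: it treats "Monthly" as 31 days, "Quarterly" as 92, and "Annually" as 366, and it skips the event-driven rows ("after every change"). `overdue` and `INTERVAL_DAYS` are hypothetical names, and nothing here reads from HALO or NinjaRMM.

```python
from datetime import date, timedelta

# Approximate intervals in days for the time-based rows of the table.
INTERVAL_DAYS = {
    "Veeam test restore": 31,
    "Off-site / cloud backup copy verified": 31,
    "UPS battery test": 92,
    "Emergency contact list current in HALO": 92,
    "Veeam Recovery Media created": 366,
}


def overdue(last_done: dict[str, date], today: date) -> list[str]:
    """Return measures never done, or last done longer ago than their interval."""
    late = []
    for measure, days in INTERVAL_DAYS.items():
        done = last_done.get(measure)
        if done is None or today - done > timedelta(days=days):
            late.append(measure)
    return late
```

Passing an empty `last_done` mapping returns every measure, which is a reasonable starting state for a newly onboarded client.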
12. Related Documents
Document | When to Reference |
|---|---|
| | Should have been run before any change that led to disaster |
| | For understanding backup health before disaster strikes |
New Workstation Deployment SOP | For rebuilding workstations after total loss |
| | For understanding client network architecture during infrastructure recovery |
| | For contacting dental software vendors during application recovery |
| | If M365 services are affected and need recovery |
| | For documenting the disaster and recovery in the ticket |
13. Document Control
Version | Date | Author | Changes |
|---|---|---|---|
1.0 | February 2026 | IT Support Engineering | Initial release. Four-level disaster classification, recovery decision trees, Veeam restore procedures (file-level, full VM, bare metal, instant VM recovery), infrastructure restoration priority order, ransomware response, post-disaster review template, preventive measures checklist. |
Confidential — Internal Use Only