Disaster Recovery Runbook

Document Type: Standard Operating Procedure — Emergency Reference
Audience: T2 / T3 Technicians
Last Updated: February 2026
Version: 1.0

1. Purpose

Something catastrophic happened: the server won't boot, hardware failed, ransomware hit, there was a fire or flood, or an update went sideways and the backups are the only way forward. This document tells you exactly what to do, and in what order, to get the practice operational again.

This is not a troubleshooting guide — if you're here, troubleshooting has failed. This is the recovery playbook.

2. Disaster Classification

Before you do anything, classify the disaster. The classification drives the recovery path.

Classification	Description	Examples	Target Recovery Time
Level 1 — Single Application	One application is down, everything else works	Dental software corrupted, database locked, application won't launch after failed update	1–2 hours
Level 2 — Single Server	Server is down but network and workstations function	Server hardware failure, OS won't boot, failed Windows Update, VM corruption	2–4 hours
Level 3 — Infrastructure	Multiple systems down, network may be affected	Core switch failure, firewall failure, power event + UPS failure	2–6 hours
Level 4 — Total Loss	Everything is gone. Hardware destroyed or compromised.	Ransomware (encrypted everything), fire/flood, lightning strike, theft	4–24 hours
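
The classification table can also be expressed as a small lookup, which is handy if you want ticket templates or alerts to carry the target recovery time automatically. This is a minimal sketch: the `DisasterLevel` class, the `classify` function, and its three yes/no inputs are hypothetical names chosen for illustration, and the hour ranges are copied from the table above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DisasterLevel:
    level: int
    name: str
    target_hours: Tuple[int, int]  # (min, max) target recovery time from the table


LEVELS = {
    1: DisasterLevel(1, "Single Application", (1, 2)),
    2: DisasterLevel(2, "Single Server", (2, 4)),
    3: DisasterLevel(3, "Infrastructure", (2, 6)),
    4: DisasterLevel(4, "Total Loss", (4, 24)),
}


def classify(server_down: bool, multiple_systems_down: bool, total_loss: bool) -> DisasterLevel:
    """Return the highest level whose condition is met (hypothetical helper)."""
    if total_loss:
        return LEVELS[4]
    if multiple_systems_down:
        return LEVELS[3]
    if server_down:
        return LEVELS[2]
    return LEVELS[1]
```

The function checks the worst case first, so a ransomware event that also took the server down is still classified as Level 4.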

3. Immediate Actions (All Levels)

Do these first regardless of what happened:

Step	Action	Done?
1	Notify your manager / T3 lead immediately. Don't try to handle Level 2+ alone.	☐
2	Open a Priority 1 HALO ticket. Include: client name, what's down, when it happened, what you know so far.	☐
3	Notify the client. Be honest about what you know: "The server is down. We're working on recovery. I'll update you every 30 minutes." Give them a realistic timeframe, not a hopeful one. NOTE: If you promise an update every XX minutes, set a recurring alarm for yourself. Nothing is worse than making a commitment to a stressed client and then missing it because you lost track of time.	☐
4	Assess what's still working. Workstations? Internet? Phones? Email? Identify what the practice can still do while you recover.	☐
5	Identify the last known good backup. Check Veeam for the most recent successful backup job. Note the timestamp.	☐
6	Do NOT make changes that could make it worse. Don't format drives, don't reinstall Windows, don't delete anything until you have a recovery plan.	☐
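
Step 3 asks you to commit to a regular update interval and then keep it. A minimal sketch of working out when those updates are due follows; `update_times` is a hypothetical helper, and the 30-minute interval is only the example value from the table. The output is meant to be copied into whatever alarm or calendar tool you actually use.

```python
from datetime import datetime, timedelta
from typing import List


def update_times(first_contact: datetime, interval_minutes: int, count: int) -> List[datetime]:
    """Return the next `count` times a client update is due after first contact."""
    step = timedelta(minutes=interval_minutes)
    return [first_contact + step * i for i in range(1, count + 1)]


if __name__ == "__main__":
    # Example: client was first notified at 09:00, updates promised every 30 minutes.
    for due in update_times(datetime(2026, 2, 1, 9, 0), 30, 4):
        print("Client update due at", due.strftime("%H:%M"))
```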

🔴 RANSOMWARE SPECIAL RULE: If this is ransomware — DISCONNECT THE SERVER FROM THE NETWORK IMMEDIATELY. Pull the Ethernet cable. Do NOT power off (forensics may need the RAM state). Do NOT pay the ransom. Do NOT attempt to decrypt. Notify management and begin the Level 4 recovery path. Document everything you see.

4. Level 1 — Single Application Recovery

Scenario: Dental software (PMS, imaging, or ancillary) is down. Server and OS are fine.

Recovery Priority Order

1. Identify which application and what broke
2. Check the procedure-specific SOP for that application
3. If SOP troubleshooting fails → restore application from Veeam (file-level)
4. If file-level restore fails → restore full application directory from Veeam
5. If application is database-dependent → restore database files too
6. Verify application functionality
7. Verify no other systems were affected

Veeam File-Level Restore (Application Files or Config)

Step	Action
1	Open Veeam Backup & Replication console
2	Navigate to Home → Backups → Disk (or wherever the backup job lives)
3	Right-click the most recent restore point → Restore guest files → Microsoft Windows
4	Browse to the application directory (e.g., `C:\PBSEndo\Client\` , `C:\Program Files\TDO\` )
5	Select the files/folders to restore
6	Choose Overwrite or Keep both depending on whether you want to replace or compare
7	Restore and verify the application launches
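
If you chose Keep both in step 6, you will end up with the current file and the restored copy side by side. One way to check whether they actually differ is to compare hashes, sketched below. The function names are hypothetical, and the script assumes you already know the paths of both copies; it does not interact with Veeam in any way.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_differ(current: Path, restored: Path) -> bool:
    """True when the restored copy is not byte-identical to the current file."""
    return sha256_of(current) != sha256_of(restored)
```

A result of `False` means the current file was already identical to the backup, so the problem is probably somewhere else.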

Application-Specific Recovery Notes

Application	Key Files to Restore	Special Considerations
PBS Endo	`C:\PBSEndo\Client\endoui.exe.config` , Updates folder	Config wipe is documented failure mode — see PBS Endo Config Wipe SOP
TDO	SQL database, TDO application files	Must restore server-side first, then update workstations to match
Dentrix	Data directory (check Dentrix server config for path)	May need re-registration with Henry Schein after restore: 800-824-6375
Eaglesoft	Data directory, Patterson server config	May need reactivation: Patterson 800-475-5036
Open Dental	MySQL data directory	Must stop MySQL service before restoring database files
CS Imaging 8	SQL database, CS Imaging service config	Verify SQL connectivity and CS Imaging service starts after restore
Sidexis 4	SQL database, Sidexis config files	Verify SQL connectivity and Sidexis server service starts after restore
EzDent-i	PostgreSQL database, EzServer config	Must stop PostgreSQL before restoring. Version-specific restore process.

5. Level 2 — Single Server Recovery

Scenario: Server won't boot, OS is corrupted, hardware failed, or VM is unrecoverable.

Decision Tree: Server Recovery Path

SERVER IS DOWN
│
├─ Is this a VIRTUAL MACHINE?
│   ├─ YES → Is a VM checkpoint/snapshot available?
│   │   ├─ YES → Revert to checkpoint. Fastest recovery (5 minutes).
│   │   │   ├─ Verify all services start after revert
│   │   │   └─ If checkpoint revert fails → proceed to Veeam restore
│   │   │
│   │   └─ NO → Proceed to Veeam restore
│   │       ├─ OPTION A: Restore entire VM from Veeam
│   │       │   └─ Veeam → Restore → Entire VM → select restore point → restore to original location
│   │       │
│   │       └─ OPTION B: Restore to different host (if original host hardware failed)
│   │           └─ Veeam → Restore → Entire VM → select restore point → restore to DIFFERENT host/datastore
│   │
│   └─ NO (Physical server) → Is the hardware functional?
│       ├─ YES (OS issue, not hardware) →
│       │   ├─ Try: Boot from Windows install media → Startup Repair
│       │   ├─ Try: Boot to Safe Mode → check Event Viewer → troubleshoot
│       │   ├─ If OS is recoverable → fix and verify
│       │   └─ If OS is not recoverable → Veeam Bare Metal Restore (see Section 5.2)
│       │
│       └─ NO (hardware failure — dead motherboard, failed RAID, bad PSU) →
│           ├─ Can replacement hardware be obtained quickly?
│           │   ├─ YES → Obtain hardware → Veeam Bare Metal Restore to new hardware
│           │   └─ NO → Temporary VM option:
│           │       ├─ Veeam Instant VM Recovery → boots the backup as a VM directly
│           │       ├─ Requires a Hyper-V or VMware host with capacity
│           │       └─ This is a TEMPORARY solution — plan for permanent hardware replacement
│           │
│           └─ No Hyper-V/VMware host available?
│               ├─ Veeam Bare Metal Restore to any available hardware
│               └─ Or: rebuild server from scratch + restore data only
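
The decision tree above can be read as a short chain of yes/no questions. The sketch below encodes the same branches as a function so the logic can be checked or reused in a ticket form. The function name and parameter names are hypothetical, and the returned strings are summaries of the tree, not Veeam commands.

```python
def server_recovery_path(
    *,
    is_vm: bool,
    checkpoint_available: bool = False,
    hardware_ok: bool = True,
    os_recoverable: bool = False,
    replacement_hw_quick: bool = False,
    hypervisor_host_available: bool = False,
) -> str:
    """Return the recovery path suggested by the Section 5 decision tree."""
    if is_vm:
        if checkpoint_available:
            return "Revert to checkpoint; if the revert fails, proceed to Veeam restore"
        return "Veeam restore: entire VM (original location or different host)"
    # Physical server from here on.
    if hardware_ok:
        if os_recoverable:
            return "Repair the OS in place and verify"
        return "Veeam Bare Metal Restore (Section 5.2)"
    if replacement_hw_quick:
        return "Obtain hardware, then Veeam Bare Metal Restore to new hardware"
    if hypervisor_host_available:
        return "Veeam Instant VM Recovery (temporary)"
    return "Bare Metal Restore to any available hardware, or rebuild and restore data only"
```

The arguments are keyword-only on purpose: with this many booleans, positional calls are easy to get wrong under pressure.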

5.1 Veeam Full VM Restore

Step	Action
1	Open Veeam Backup & Replication console on the Veeam server
2	Home → Backups → Disk → locate the server backup job
3	Right-click the most recent successful restore point → Restore entire VM
4	Select restore destination: Original location (same host) or Different location
5	Choose whether to overwrite existing VM or create new
6	Start restore and monitor progress
7	Once complete: boot the VM, verify OS loads, check all services

5.2 Veeam Bare Metal Restore (Physical Server)

Step	Action
1	Create Veeam Recovery Media (USB) from the Veeam console if not already created
2	Boot the target hardware from Veeam Recovery Media
3	Select Bare Metal Recovery
4	Connect to Veeam backup repository (network path or direct-attached storage)
5	Select the server backup and restore point
6	Map disks (source disk layout → target disk layout)
7	Start restore and monitor progress
8	Once complete: remove recovery media, boot normally, verify all services

Veeam Recovery Media should be pre-created and stored for each client. Don't wait until disaster strikes to create it. Add this to the recurring maintenance checklist.

5.3 Veeam Instant VM Recovery (Emergency Temporary)

Step	Action
1	Open Veeam Backup & Replication console
2	Home → Backups → Disk → right-click server backup → Instant VM Recovery
3	Select the restore point
4	Choose the Hyper-V or VMware host to run the VM on
5	Veeam boots the backup directly as a running VM (reads from backup files)
6	This is temporary — performance is limited by backup storage speed
7	Use this to get the practice operational while you prepare permanent hardware
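8	Migrate to permanent hardware using Storage vMotion or Veeam Quick Migration (VMware hosts). Storage vMotion is a VMware feature, so on a Hyper-V host finalize the migration from the Veeam console instead.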

5.4 Post-Server-Recovery Checklist

Step	Action	Done?
1	Server boots and OS loads	☐
2	All server roles functional (AD, DNS, DHCP, File Server, Print Server)	☐
3	SQL Server running and databases accessible	☐
4	Dental software server services running	☐
5	Workstations can connect to the server (ping, share access, DNS resolution)	☐
6	Dental software launches on workstations and data is current	☐
7	Imaging works (if server-dependent imaging — test acquisition and retrieval)	☐
8	Printers working (if print server was on this server)	☐
9	Backup job reconfigured and running (Veeam may need to be re-pointed after restore)	☐
10	Remove VM checkpoint if one was used (don't leave checkpoints running long-term)	☐
11	Client notified that systems are restored and operational	☐
12	HALO ticket updated with full recovery timeline and actions taken	☐
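
Steps 3 and 5 of the checklist come down to "can a workstation resolve the server name and reach its ports". A minimal sketch of that check using only the Python standard library is below. The function names are hypothetical. Port 445 (SMB file shares) and port 1433 (default SQL Server instance) are standard defaults, but named SQL instances and vendor databases may listen elsewhere, so treat the port numbers as assumptions to confirm per client.

```python
import socket


def resolves(hostname: str) -> bool:
    """True if DNS (or the hosts file) can resolve the name."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    server = "SERVER01"  # hypothetical server name, replace with the client's
    print("DNS resolves:", resolves(server))
    print("SMB (445):", tcp_port_open(server, 445))
    print("SQL (1433):", tcp_port_open(server, 1433))
```

An open port only proves the service is listening. You still need to launch the dental software and confirm the data is current, as the checklist says.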

6. Level 3 — Infrastructure Recovery

Scenario: Multiple systems down due to infrastructure failure.

Service Restoration Priority Order

This is the order in which things must come back online. You can't skip steps: each layer depends on the layers that come before it.

PRIORITY 1: Physical Infrastructure (Power + Network Core)
   ├─ Verify UPS is online and providing power
   ├─ Verify modem/ONT is powered and synced (wait 2 min)
   └─ Verify core switch is powered and operational

PRIORITY 2: Firewall / Router (Gateway)
   ├─ Verify firewall boots and WAN link is active (wait 3–5 min)
   ├─ Verify DHCP is serving IPs
   └─ Verify inter-VLAN routing is functioning

PRIORITY 3: Server
   ├─ Boot server (physical or VM)
   ├─ Verify AD/DNS/DHCP services start
   ├─ Verify SQL Server starts
   └─ Verify file shares are accessible

PRIORITY 4: Dental Software Services
   ├─ Verify dental software server services are running
   ├─ Verify database connectivity from workstations
   └─ Verify imaging services (if server-dependent)

PRIORITY 5: Workstations
   ├─ Boot workstations (or ipconfig /renew if already on)
   ├─ Verify domain login works
   ├─ Verify dental software launches and data is accessible
   ├─ Verify printing works
   └─ Verify imaging works

PRIORITY 6: Peripheral Systems
   ├─ Printers, scanners, label printers
   ├─ VoIP phones (if on network)
   └─ WiFi APs (for guest/patient WiFi)
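
The priority order above is a strict dependency chain: stop at the first layer that is not confirmed and fix that before moving on. A minimal sketch of that rule follows. `first_blocked_layer` and the `checks` mapping are hypothetical; each check would be whatever manual or scripted verification you use for that layer.

```python
from typing import Callable, Dict, Optional

PRIORITY_ORDER = [
    "Physical Infrastructure",
    "Firewall / Router",
    "Server",
    "Dental Software Services",
    "Workstations",
    "Peripheral Systems",
]


def first_blocked_layer(checks: Dict[str, Callable[[], bool]]) -> Optional[str]:
    """Walk the layers in priority order and return the first one that fails.

    A layer with no check supplied counts as unconfirmed. Later layers are
    not evaluated, because they depend on the earlier ones.
    """
    for layer in PRIORITY_ORDER:
        check = checks.get(layer)
        if check is None or not check():
            return layer
    return None  # every layer confirmed
```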

Don't let the client start using workstations until Priority 3 is confirmed. If the server isn't fully up and workstations connect with cached credentials, dental software may start in a degraded state or create data sync issues.

7. Level 4 — Total Loss Recovery

Scenario: Everything is destroyed, encrypted, or compromised. Starting from backup only.

7.1 Ransomware-Specific Steps

Step	Action	Done?
1	Isolate ALL affected systems from network (pull cables, disable WiFi)	☐
2	DO NOT power off systems (RAM forensics may be needed)	☐
3	DO NOT pay the ransom	☐
4	Notify DTC management immediately — this may require cyber insurance claim	☐
5	Document everything: screenshots of ransom notes, encrypted file extensions, timeline	☐
6	Determine scope: which systems are encrypted? Are backups affected?	☐
7	Are Veeam backups intact? (If Veeam backup files are also encrypted → worst case)	☐
8	Are off-site / cloud backup copies available?	☐
9	Determine ransomware variant: check nomoreransom.org for known decryptors	☐
10	Plan clean rebuild: all affected systems must be wiped and rebuilt from scratch. Never trust a system that was compromised.	☐

7.2 Total Loss Recovery Sequence

Phase	Action	Estimated Time
1	Procure replacement hardware (or clean/wipe existing hardware)	Hours to days
2	Rebuild network infrastructure (firewall, switches, APs) from config backups or DTC templates	2–4 hours
3	Rebuild server: fresh OS install, rejoin/recreate domain	2–4 hours
4	Restore data from Veeam: database files, dental software data, shared files	2–6 hours
5	Reinstall dental software on server, point to restored database	1–3 hours
6	Rebuild workstations: follow New Workstation Deployment SOP for each	1–2 hours each
7	Reconfigure dental software on workstations	30 min each
8	Verify all systems operational	1–2 hours
9	Reconfigure Veeam backup to protect new environment	30 min

Realistic total for complete rebuild: 12–24+ hours of labor
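
To see how workstation count moves the total, the phase estimates in the table can simply be added up. The sketch below does that arithmetic. It assumes the phases run one after another with a single technician, and it leaves out Phase 1 (hardware procurement) because "hours to days" cannot be summed. `rebuild_estimate` is a hypothetical helper name.

```python
from typing import Tuple

# (min_hours, max_hours) for phases 2, 3, 4, 5, 8 and 9 from the table above.
FIXED_PHASES = [(2, 4), (2, 4), (2, 6), (1, 3), (1, 2), (0.5, 0.5)]

# Phase 6 (1-2 hours each) plus phase 7 (30 min each), per workstation.
PER_WORKSTATION = (1.5, 2.5)


def rebuild_estimate(workstations: int) -> Tuple[float, float]:
    """Return (min_hours, max_hours) of labor, excluding hardware procurement."""
    low = sum(p[0] for p in FIXED_PHASES) + PER_WORKSTATION[0] * workstations
    high = sum(p[1] for p in FIXED_PHASES) + PER_WORKSTATION[1] * workstations
    return (low, high)
```

For example, `rebuild_estimate(3)` gives 13.0 to 27.0 hours, which is consistent with the "12–24+" figure above for a small practice. Larger sites climb quickly unless workstation rebuilds are done in parallel.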

For ransomware recovery: the practice will likely be down for 1–3 business days minimum. Set expectations with the client early and honestly.

8. Critical Contact List During Disaster

Contact	When to Call	Number
DTC Management / T3 Lead	Immediately for Level 2+	[Internal contact]
Client primary contact	After initial assessment — provide status and timeline	From HALO ticket
Microsoft Support	If M365/Entra/Exchange issues during recovery	800-642-7676
Veeam Support	If backup restore fails or backup files appear corrupted	614-339-8200
ISP	If internet circuit is down as part of the disaster	Client-specific (check HALO/documentation)
Dental software vendor	If application won't start after data restore (licensing, activation)	See Vendor Escalation Quick Reference
Cyber insurance provider	Ransomware or data breach confirmed	Client's policy (check with client/AM)

9. What the Practice Can Do While You Recover

Help the client stay partially operational during the outage:

If This Is Down	The Practice Can Still...
Server (PMS down)	See patients using paper charts, collect copays manually, reschedule non-urgent
Imaging only	See patients for non-imaging procedures, take impressions, do cleanings
Internet only	Use PMS and imaging (they're local), no insurance verification or claim submission
Everything	Triage: see emergency patients only, handwrite notes, collect payment manually, call patients directly to reschedule

Providing this guidance to the office manager during a disaster reduces their stress and keeps the practice generating some revenue while you work the recovery.

10. Post-Disaster Review

After recovery is complete and the practice is operational, conduct a post-incident review within 48 hours:

Question	Document In HALO
What happened? (Root cause)	Exact failure — hardware, software, human error, security incident
When did it happen?	Timeline from first symptom to full recovery
How long was the practice down?	Total downtime in hours
What was the recovery path?	Which Veeam method was used, what was the restore point date/time
Was any data lost?	Gap between last backup and failure = data loss window
What worked well in the recovery?	—
What should be improved?	Backup frequency? Off-site copy? Recovery media pre-staged? Documentation gaps?
What preventive measures should be implemented?	UPS replacement, backup verification schedule, hardware refresh, security hardening
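
Two of the review answers are plain time arithmetic: the data loss window is the gap between the last good backup and the failure, and the downtime runs from first symptom to full recovery. A minimal sketch, with hypothetical function names and example timestamps:

```python
from datetime import datetime
from typing import Dict


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time in hours between two timestamps."""
    return (end - start).total_seconds() / 3600


def review_metrics(last_backup: datetime, first_symptom: datetime, recovered: datetime) -> Dict[str, float]:
    """Compute the two numeric answers for the post-incident review."""
    return {
        "data_loss_window_hours": hours_between(last_backup, first_symptom),
        "downtime_hours": hours_between(first_symptom, recovered),
    }


if __name__ == "__main__":
    # Example values only: backup at 22:00, failure at 08:30, recovered at 13:00.
    print(review_metrics(
        datetime(2026, 2, 10, 22, 0),
        datetime(2026, 2, 11, 8, 30),
        datetime(2026, 2, 11, 13, 0),
    ))
```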

11. Preventive Measures Checklist

These are the things that prevent disasters or minimize their impact. Reference: Recurring Maintenance Checklist (when created).

Measure	Frequency	Owner
Veeam backup verification (job status check)	Daily (automated alerts), weekly (manual review)	T1/T2
Veeam test restore (actually restore a file and verify)	Monthly	T2
Veeam Recovery Media created and stored for each client	Annually or after server hardware change	T2
VM checkpoint cleanup (remove old checkpoints)	After every maintenance window	T2
UPS battery test	Quarterly	T2
Server hardware health check (SMART, RAID, temps)	Monthly via NinjaRMM alerts	T1
Off-site / cloud backup copy verified	Monthly	T2
Firewall config exported and stored	After every change	T2
DNS records documented	After every change	T2
Emergency contact list current in HALO	Quarterly	AM
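
A checklist like this only helps if someone notices when an item slips. Below is a minimal sketch of an overdue check for the calendar-based rows. The `is_overdue` helper is hypothetical, and the day counts for each frequency are rough approximations chosen for illustration, not DTC policy.

```python
from datetime import date, timedelta

# Approximate lengths in days for the frequencies used in the table (assumption).
FREQUENCY_DAYS = {
    "Monthly": 30,
    "Quarterly": 91,
    "Annually": 365,
}


def is_overdue(last_done: date, frequency: str, today: date) -> bool:
    """True when more time has passed since last_done than the frequency allows."""
    return today - last_done > timedelta(days=FREQUENCY_DAYS[frequency])
```

For example, a test restore last done on January 1 is overdue by February 15 under the monthly rule, while a UPS battery test done on the same date is not yet overdue under the quarterly rule.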

12. Related Documents

Document	When to Reference
Pre-Upgrade / Pre-Flight Checklist	Should have been run before any change that led to disaster
Veeam Backup Alert Response Guide	For understanding backup health before disaster strikes
New Workstation Deployment SOP	For rebuilding workstations after total loss
Network Assessment Guide & Checklist	For understanding client network architecture during infrastructure recovery
Vendor Escalation Quick Reference	For contacting dental software vendors during application recovery
M365 Tenant Administration Transition SOP	If M365 services are affected and need recovery
HALO Ticket Documentation Standard	For documenting the disaster and recovery in the ticket

13. Document Control

Version	Date	Author	Changes
1.0	February 2026	IT Support Engineering	Initial release. Four-level disaster classification, recovery decision trees, Veeam restore procedures (file-level, full VM, bare metal, instant VM recovery), infrastructure restoration priority order, ransomware response, post-disaster review template, preventive measures checklist.

Confidential — Internal Use Only
