Veeam Troubleshooting Playbook

Location: Backup & Disaster Recovery → Troubleshooting (Chapter 1088)
Version: 1.0
Last Updated: March 2026
Applies To: Veeam Backup & Replication v12.x on DTC Custom BDR Appliances
Audience: T1 / T2 / T3

How to Use This Playbook

Each scenario follows the same structure: Symptoms → Quick Check → Tiered Response → Root Cause Notes. Start at T1. If the T1 steps don't resolve the issue, escalate to T2, then T3. If T3 cannot resolve it, open a case with Veeam Support.

Tier definitions for this document:

Tier	Scope	Expected Resolution
T1	Retry, verify, collect info	15 minutes or less
T2	Investigate, targeted fix, service-level changes	30–60 minutes
T3	Rebuild, architectural fix, vendor escalation	1+ hours, may require maintenance window

Scenario 1: VSS Writer Failures

Symptoms: Backup job fails or hangs mid-progress (often on C: drive). Veeam logs show "VSS writer failed" or "Writer [name] is in Failed state." Job may stall at a fixed percentage for hours with throughput dropping to 0 KB/s.

Most Common Writer: NTDS (Active Directory) — causes State: [11] Failed, Last error: Non-retryable error. This blocks ALL VSS-based operations on that volume because other writers queue behind it in "Waiting for completion."

HALO Reference: Ticket 1117116 — NTDS writer failure caused C: drive backup to hang at 68% for 1:38+ with 0 KB/s throughput. D: and F: completed fine because NTDS writer only affects the system volume.

T1 — Retry

Stop the failed/hung backup job in the Veeam console (right-click → Stop).
On the source server, check writer status:
```
vssadmin list writers
```
Look for any writer showing State: Failed, Waiting for completion, or Timed out.
If all writers show Stable — retry the job. Sometimes a transient lock causes a one-time failure.
If any writer shows Failed — escalate to T2.
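
When the vssadmin output is long, the writer check above can be scripted. This is a minimal PowerShell sketch, not a DTC-standard tool: Get-UnstableVssWriter is a hypothetical helper name, and the parsing assumes the English-language vssadmin output layout (Writer name / State / Last error lines).

```powershell
# Hypothetical helper: lists every VSS writer whose State is not "Stable".
# Assumes English vssadmin output; localized systems use different labels.
function Get-UnstableVssWriter {
    param([string[]]$Lines)   # raw lines from "vssadmin list writers"

    $writers = [System.Collections.Generic.List[object]]::new()
    $current = $null
    foreach ($line in $Lines) {
        if ($line -match "^\s*Writer name:\s*'(.+)'") {
            # A new writer section starts here
            $current = [pscustomobject]@{ Name = $Matches[1]; State = $null; LastError = $null }
            $writers.Add($current)
        }
        elseif ($current -and $line -match '^\s*State:\s*(.+?)\s*$') {
            $current.State = $Matches[1]
        }
        elseif ($current -and $line -match '^\s*Last error:\s*(.+?)\s*$') {
            $current.LastError = $Matches[1]
        }
    }
    # Anything not reporting Stable (Failed, Waiting for completion, ...) is returned
    $writers | Where-Object { $_.State -notmatch 'Stable' }
}
```

Run it from an elevated PowerShell prompt on the source server with Get-UnstableVssWriter -Lines (vssadmin list writers). No output means every writer reported Stable.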

T2 — Investigate

Reset VSS services (does NOT affect AD or server availability):

```
net stop vss /y
net stop swprv /y
net start swprv
net start vss
```

Re-check writers: vssadmin list writers | findstr /i "Failed Waiting Timed" (do not filter on "Error": every healthy writer prints "Last error: No error" and would match).
If the NTDS writer is still Failed, restart the AD service. This briefly interrupts domain authentication (seconds):
```
net stop ntds /y
net start ntds
```
Verify NTDS recovered (run in PowerShell; Select-String is not available in cmd.exe):
```
vssadmin list writers | Select-String -Pattern "NTDS" -Context 0,4
```
Expected: State: [1] Stable, Last error: No error
If NTDS is stable — retry the backup job.
If NTDS is still Failed after service restart — escalate to T3.

T3 — Rebuild

If the writer remains in a Failed state after the service restarts, a full server reboot is required.
Before rebooting: Confirm no active backup jobs, no VSS snapshots, and no competing backup products running (MSP360, Acronis, etc. — two backup products taking VSS snapshots simultaneously will deadlock):
```
Get-Service Veeam* | Select Name, Status
vssadmin list shadows
```
If the Veeam BDR still shows the job as "Running" — reboot the BDR first to release the remote VSS session, then reboot the source server.
After reboot, confirm all writers are Stable before restarting the job.
If the problem recurs across multiple job runs — investigate application-level corruption. Run sfc /scannow and check Event Viewer > Application for recurring VSS errors. Component store corruption may require a repair install.

Root Cause Notes: VSS writer failures are most commonly caused by stale VSS sessions from previous failed/cancelled backup jobs, competing backup software holding VSS locks, or NTDS database inconsistency after unclean shutdowns. On physical servers without out-of-band management (iDRAC/IPMI), rebooting carries risk if the server hangs during POST — confirm physical console access is available or scheduled.

Scenario 2: Repository Full / Insufficient Disk Space on BDR

Symptoms: Job fails with "Insufficient disk space on repository" or "Failed to create snapshot." BDR local storage is at or near capacity.

T1 — Retry

Check BDR disk space:

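```
# Fixed disks only (DriveType=3); FreePct is the value the 10% threshold refers to
Get-CimInstance Win32_LogicalDisk -Filter "DriveType=3" |
    Select-Object DeviceID, @{N='FreeGB';E={[math]::Round($_.FreeSpace/1GB,2)}}, @{N='TotalGB';E={[math]::Round($_.Size/1GB,2)}}, @{N='FreePct';E={[math]::Round(100*$_.FreeSpace/$_.Size,1)}}
```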

If free space is above 10% — this may be a transient issue (large incremental + merge). Retry the job.
If free space is below 10% — escalate to T2.

T2 — Investigate

Open the Veeam console → Backup Infrastructure → Backup Repositories. Check the "Free" column.
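Review retention policy: Right-click the job → Edit → Storage → Retention policy. If retention is set higher than necessary (e.g., 30 days on a small repo), reduce it to the DTC standard. The new retention is applied at the end of the next successful job run. An Active Full writes a complete new .vbk before older restore points age out, so run one only if the repository has room for it.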
Check for orphaned backups: Backups → Disk — look for backup chains that are no longer tied to an active job. Right-click → Delete from disk if confirmed orphaned.
Check for deleted VMs still consuming space: Backups → Disk → [Job name] — expand and look for VMs that no longer exist but still have restore points.
Verify no stale Veeam .vbk/.vib/.vrb merge files in the repository path — these can accumulate after interrupted merge operations.

T3 — Rebuild

If the repo is legitimately undersized for the workload — this is a capacity planning conversation with the Account Manager. Document current usage, growth rate, and retention requirements.
If orphaned merge files are consuming significant space and can't be cleaned through the Veeam console — manually remove from the repository directory after confirming they're not part of an active chain (check the .vbm metadata file).
Consider restructuring: move large/static volumes (D: data drives) to a separate job with lower retention, keep C: system volumes on standard retention.
If recurrent — evaluate adding storage to the BDR or deploying a secondary repository.

Root Cause Notes: Repository full is often a retention misconfiguration issue at initial deployment. DTC standard BDR specs should be sized for the client's data footprint plus 2x growth headroom. This scenario also occurs when stale backup chains from decommissioned servers aren't cleaned up during migrations.

Scenario 3: Network Timeout Between Subnets

Symptoms: Backup jobs from servers or workstations on a different subnet than the BDR fail with "Connection timed out," "Failed to connect to host," or throughput drops to 0 and stalls. Jobs on the same subnet as the BDR work fine.

HALO Reference: Ticket 1117116 — Cross-subnet Veeam transport between 192.168.1.x (source server) and 192.168.2.x (BDR) was fundamentally unstable. Throughput would start at 44 MB/s then flatline.

T1 — Retry

From the BDR, test basic connectivity to the source:

```
Test-NetConnection -ComputerName [SOURCE_IP] -Port 445
ping [SOURCE_IP]
```

If ping works but TCP fails — likely a firewall issue. Check Windows Firewall on both sides.
If both work — retry the job. Transient network blips on cross-subnet routes can cause one-time failures.

T2 — Investigate

Verify routing between subnets — the UDM should handle inter-VLAN routing. Confirm both subnets can reach each other's gateway.
Check for MTU issues on the cross-subnet path:
```
ping [SOURCE_IP] -f -l 1472
```
If this fails but normal ping works — MTU mismatch causing fragmentation. Check UDM VLAN interface MTU settings.
Check if the BDR is on the correct VLAN/subnet. DTC standard is BDR on the server VLAN.
Verify no bandwidth throttling is configured in Veeam: Main Menu → General Options → Network Traffic.
If the source server has multiple NICs — confirm Veeam is binding to the correct interface (see Scenario 7).
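
If the 1472-byte test fails, it helps to know how large a packet the path actually carries. The sketch below binary-searches for the largest payload that passes with the Don't Fragment flag set. Find-MaxUnfragmentedPayload and the probe script block are hypothetical names introduced for illustration; adding 28 bytes (20-byte IP header plus 8-byte ICMP header) to the result gives the path MTU, which is why 1472 corresponds to a 1500-byte MTU.

```powershell
# Hypothetical helper: binary search for the largest payload the probe accepts.
# $Probe is any script block that takes a payload size and returns $true/$false.
function Find-MaxUnfragmentedPayload {
    param(
        [scriptblock]$Probe,
        [int]$Low = 0,
        [int]$High = 1472
    )
    # If even the smallest payload fails, the host is unreachable, not an MTU issue
    if (-not (& $Probe $Low)) { return $null }

    while ($Low -lt $High) {
        $mid = [int][math]::Ceiling(($Low + $High) / 2)
        if (& $Probe $mid) { $Low = $mid } else { $High = $mid - 1 }
    }
    $Low
}

# Example probe for Windows (assumption: an echo reply line contains "TTL="):
#   $target = '[SOURCE_IP]'
#   $probe  = { param($size) [bool](ping.exe -n 1 -f -l $size $target | Select-String 'TTL=') }
#   $payload = Find-MaxUnfragmentedPayload -Probe $probe
#   "Path MTU is approximately $($payload + 28) bytes"
```

A result below 1472 points at a smaller MTU somewhere on the inter-VLAN path; compare it with the UDM VLAN interface settings.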

T3 — Rebuild

If cross-subnet transport is fundamentally unstable (recurring stalls/timeouts across multiple job runs), deploy a Veeam proxy on the same subnet as the source. This keeps data transport local and only metadata crosses subnets.
If proxy deployment isn't feasible — consider Disk2VHD as a fallback for one-time P2V migrations (not for ongoing backup).
Document the subnet layout and routing path in the HALO ticket for future reference.

Root Cause Notes: Cross-subnet Veeam transport relies on Windows networking stack and inter-VLAN routing quality. Asymmetric routing, MTU mismatches, and firewall inspection on inter-VLAN traffic are the most common culprits. The UDM's IPS/DPI engine can also interfere with large sustained data streams — check IPS logs for blocked traffic if other causes are ruled out.

Scenario 4: Agent Deployment Failures (Intune Policy Conflicts)

Symptoms: Pushing the Veeam Agent from the BDR console fails with "Failed to resolve host name," "Failed to connect to [hostname] on port 6160/11731," or "Access is denied." Disabling Windows Firewall locally doesn't help — it re-enables itself.

HALO Reference: Ticket 1117116 — DVA-ST-DOC01 was managed by the client's Intune tenant. Intune firewall policies overrode local Set-NetFirewallProfile -Enabled False. Microsoft Defender for Endpoint's Web Threat Defense Service silently dropped inbound connections independently of Windows Firewall.

T1 — Retry

Verify DNS resolution from the BDR:
```
nslookup [TARGET_HOSTNAME]
```
If resolution fails — the DNS record is missing. Add an A record or use the IP address directly in the Veeam job.

Test Veeam deployment ports from the BDR:

```
Test-NetConnection -ComputerName [TARGET_IP] -Port 6160
Test-NetConnection -ComputerName [TARGET_IP] -Port 11731
```

If port tests fail — check Windows Firewall on the target. If it's domain-joined and Intune-managed, local firewall changes will be overwritten by policy. Escalate to T2.

T2 — Investigate

On the target machine, check if Intune/MDM is managing the firewall:
```
Get-Service mpssvc | Select Status
Get-NetFirewallProfile | Select Name, Enabled
dsregcmd /status
```
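If dsregcmd shows AzureAdJoined : YES, the device is Entra ID (Azure AD) joined and is most likely Intune-managed. A populated MdmUrl value in the same output indicates that MDM enrollment is configured for the tenant.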
Check for Defender for Endpoint network protection:
```
Get-Service webthreatdefsvc | Select Status
```
If running — this filters traffic independently of Windows Firewall and can silently block Veeam deployment.
Workaround (immediate): Install the Veeam Agent manually on the target instead of pushing from the BDR. Download the agent installer from the Veeam console or copy from the BDR's installation share. Run locally, point to the BDR. This bypasses the deployment kit entirely.
Proper fix: If DTC has access to the client's Intune portal — create firewall exceptions for Veeam ports (TCP 6160, 11731) under Endpoint Security → Firewall. Also check Attack Surface Reduction → Network Protection.
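
The dsregcmd check in this tier can be reduced to the few fields that matter. This is a rough sketch only: ConvertFrom-DsregStatus is a hypothetical helper name, the parsing assumes the usual "Key : Value" layout of dsregcmd /status, and a populated MdmUrl is treated as a hint of MDM management rather than proof of Intune enrollment.

```powershell
# Hypothetical helper: pulls the join/MDM indicators out of "dsregcmd /status" output.
function ConvertFrom-DsregStatus {
    param([string[]]$Lines)   # raw lines from dsregcmd /status

    $map = @{}
    foreach ($line in $Lines) {
        # Matches lines such as "   AzureAdJoined : YES"
        if ($line -match '^\s*(\w+)\s*:\s*(.*?)\s*$') {
            $map[$Matches[1]] = $Matches[2]
        }
    }
    [pscustomobject]@{
        AzureAdJoined = $map['AzureAdJoined'] -eq 'YES'
        DomainJoined  = $map['DomainJoined'] -eq 'YES'
        # Assumption: a non-empty MdmUrl suggests MDM (Intune) is configured
        MdmUrlPresent = -not [string]::IsNullOrWhiteSpace($map['MdmUrl'])
    }
}
```

On the target machine: ConvertFrom-DsregStatus -Lines (dsregcmd /status). If AzureAdJoined and MdmUrlPresent are both True, expect local firewall changes to be reverted by policy and go straight to the manual agent install.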

T3 — Rebuild

If the client or previous MSP manages Intune and won't grant DTC access — document the required firewall rules and submit as a change request to whoever manages the tenant.
Required Intune firewall exceptions for Veeam Agent deployment:
- TCP 6160 inbound (Veeam Installer Service)
- TCP 11731 inbound (Veeam Data Mover)
- Source: BDR IP address
Until Intune policy is updated, use manual agent installation as the standard deployment method for that client's Intune-managed endpoints.
Document in HALO which clients have Intune-managed endpoints so future techs know to expect this.

Root Cause Notes: This is increasingly common as dental practices adopt Microsoft 365 Business Premium (includes Intune and Defender for Endpoint). The key tell is that disabling Windows Firewall locally doesn't stick — Intune pushes the policy back. Defender for Endpoint's network protection layer (webthreatdefsvc) is a separate blocker that most techs don't know about. Always check dsregcmd /status first on any deployment failure.

Scenario 5: SQL Backup Consistency Errors

Symptoms: Application-aware processing fails on servers running SQL Server (common with Dentrix, Eaglesoft, Open Dental). Veeam logs show "SQL Writer failed," "Database consistency check failed," or "Transaction log backup failed."

T1 — Retry

Check if the dental software's own database maintenance ran recently — Dentrix and Eaglesoft both have scheduled database optimization tasks that lock SQL during execution.
Verify the SQL Server service is running:
```
Get-Service MSSQL* | Select Name, Status
```
If SQL is running and no maintenance tasks are active — retry the job.

T2 — Investigate

Check the SQL Writer status:

```
vssadmin list writers | Select-String -Pattern "SqlServerWriter" -Context 0,4
```

If the SQL Writer is Failed — restart the SQL Server service (schedule with the practice if during business hours, as this briefly interrupts the PMS).
Check for database corruption — run a consistency check:
```
DBCC CHECKDB ('DatabaseName') WITH NO_INFOMSGS
```
Replace DatabaseName with the actual dental software database name (check SQL Server Management Studio or the dental software documentation for the DB name).
Review Veeam's Guest Processing settings for the job: Right-click job → Edit → Guest Processing. Verify that application-aware processing is using the correct credentials (domain admin or a SQL-privileged service account).
Check whether transaction logs are accumulating: if the database uses the Full recovery model and no log backups are configured, the transaction log grows until it fills the disk.

T3 — Rebuild

If database corruption is confirmed — this is a dental software vendor escalation (Patterson for Eaglesoft, Henry Schein for Dentrix). Document the DBCC output and engage the vendor.
If transaction log growth is the issue — evaluate switching the database recovery model to Simple (discuss with the dental software vendor first, as some explicitly require Full recovery).
If application-aware processing consistently fails — as a workaround, disable guest processing on the job and rely on crash-consistent backups until the SQL issue is resolved. Document this trade-off in the HALO ticket.

Root Cause Notes: SQL consistency failures most often stem from the dental PMS's own maintenance tasks conflicting with the backup window, or from databases that have accumulated corruption over time. Dentrix in particular runs a "Database Maintenance" utility that holds exclusive locks. Schedule Veeam jobs to avoid overlap with PMS maintenance windows.

Scenario 6: Backup Chain Integrity Breaks

Symptoms: Veeam reports "Backup chain is broken" or "Required restore point is missing." Incremental jobs fail because they can't find the parent .vbk or previous .vib. Health check shows chain as Unhealthy.

T1 — Retry

Open the Veeam console → Backups → Disk → right-click the affected backup → Properties. Check if any restore points show as missing or corrupt.
If the chain shows a gap but the most recent full backup exists — run an Active Full backup (right-click job → Active Full). This creates a new base .vbk and resets the chain.
If the Active Full completes successfully — the chain is repaired. Subsequent incrementals will build from the new base.

T2 — Investigate

Check the repository path for the affected job — look for orphaned .vib files without a corresponding .vbk, or .vbm (metadata) files that reference missing restore points.
If files were manually deleted or moved from the repository — the chain is permanently broken for those points. Run Active Full to establish a new base.
Check if the repository is on a drive with errors:
```
chkdsk [DRIVE_LETTER]: /scan
```
If CBT (Changed Block Tracking) data is corrupted — the incremental can't determine what changed. Reset CBT by running an Active Full. This was a documented issue on Ticket 1117116 where corrupted CBT/digest data on a physical server caused incrementals to stall repeatedly at the same block.
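
The repository check for .vib files without a matching .vbk can be given a coarse first pass with a script. The sketch below is an assumption-laden illustration: Find-IncrementWithoutFull is a hypothetical helper name, and it only flags folders that contain increments but no full backup file at all. It does not read the .vbm metadata, which remains the authoritative record of the chain.

```powershell
# Hypothetical helper: lists folders that hold .vib increments but no .vbk full.
# Coarse check only; the .vbm metadata is still the source of truth for the chain.
function Find-IncrementWithoutFull {
    param([string]$RepositoryPath)

    Get-ChildItem -LiteralPath $RepositoryPath -Recurse -File |
        Group-Object DirectoryName |
        Where-Object {
            ($_.Group.Extension -contains '.vib') -and
            ($_.Group.Extension -notcontains '.vbk')
        } |
        ForEach-Object {
            [pscustomobject]@{
                Folder     = $_.Name
                Increments = @($_.Group | Where-Object Extension -eq '.vib').Count
            }
        }
}
```

Example: Find-IncrementWithoutFull -RepositoryPath "[REPO_PATH]". Any folder it returns is a candidate for an Active Full; nothing here deletes or moves files.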

T3 — Rebuild

If the chain is unrecoverable and multiple restore points are missing — delete the backup chain from disk (after confirming no critical restore points are needed) and start fresh with a new Full.
If chain breaks recur — check for underlying storage issues on the BDR (SMART status, disk errors in Event Viewer).
For physical-to-virtual migrations where chain integrity fails repeatedly — pivot to Disk2VHD as a direct P2V method rather than continuing to troubleshoot backup-based migration.
Review the job schedule — chain breaks often happen when an Active Full is interrupted (power loss, manual stop, network disconnect during the full backup window).

Root Cause Notes: Chain integrity is most commonly broken by interrupted full backups, manual deletion of files from the repository path, or storage-level corruption. CBT corruption on the source (especially physical servers) causes incrementals to fail silently. When in doubt, Active Full resets everything.

Scenario 7: Dual-NIC / Defender for Endpoint Interference

Symptoms: BDR or Hyper-V host has two NICs on the same subnet, both with IP addresses. Backup jobs intermittently fail with timeouts or stalls. Ping may work but TCP connections fail. ARP table shows duplicate or inconsistent entries for the target.

HALO Reference: Ticket 1117116 — BDR (DTCBSURE-5001) had Ethernet at .2.166 and Ethernet 2 at .2.215 on the same subnet with the same gateway. This caused asymmetric routing — Windows sent traffic out one NIC, the response came back to the other, and TCP sessions broke. ICMP (ping) worked because it's stateless.

T1 — Retry

On the BDR or Hyper-V host, check NIC configuration:

```
Get-NetAdapter | Select Name, Status, LinkSpeed
Get-NetIPAddress -AddressFamily IPv4 | Select InterfaceAlias, IPAddress
```

If two NICs have IPs on the same subnet — this is the problem. Escalate to T2.
Flush ARP on both sides and retry as a quick test:
```
arp -d *
```
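
Spotting two NICs on the same subnet by eye is easy to get wrong when prefix lengths differ. The following sketch groups IPv4 addresses by their computed network and reports any network that more than one adapter sits on. Get-Ipv4NetworkAddress and Find-SameSubnetAdapter are hypothetical helper names; the input is assumed to carry the InterfaceAlias, IPAddress, and PrefixLength properties that Get-NetIPAddress returns.

```powershell
# Hypothetical helper: returns the network (e.g. 192.168.2.0/24) an IPv4 address belongs to.
function Get-Ipv4NetworkAddress {
    param([string]$IPAddress, [int]$PrefixLength)

    $bytes = [System.Net.IPAddress]::Parse($IPAddress).GetAddressBytes()
    $bits = $PrefixLength
    for ($i = 0; $i -lt 4; $i++) {
        # Number of mask bits that fall inside this octet (0..8)
        $keep = [math]::Max(0, [math]::Min(8, $bits))
        $mask = (0xFF -shl (8 - $keep)) -band 0xFF
        $bytes[$i] = $bytes[$i] -band $mask
        $bits -= 8
    }
    '{0}/{1}' -f ($bytes -join '.'), $PrefixLength
}

# Hypothetical helper: reports networks that more than one adapter has an address on.
function Find-SameSubnetAdapter {
    param([object[]]$Addresses)

    $Addresses |
        Where-Object { $_.IPAddress -notlike '127.*' -and $_.IPAddress -notlike '169.254.*' } |
        Group-Object { Get-Ipv4NetworkAddress $_.IPAddress $_.PrefixLength } |
        Where-Object { @($_.Group.InterfaceAlias | Select-Object -Unique).Count -gt 1 } |
        ForEach-Object {
            [pscustomobject]@{
                Network  = $_.Name
                Adapters = @($_.Group.InterfaceAlias | Select-Object -Unique)
            }
        }
}
```

On the BDR or Hyper-V host: Find-SameSubnetAdapter -Addresses (Get-NetIPAddress -AddressFamily IPv4). Any row returned is the dual-NIC condition described in this scenario.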

T2 — Investigate

Determine which NIC Veeam is bound to — check the Veeam console server management IP.

Immediate fix: Disable the NIC Veeam isn't using:

```
Disable-NetAdapter -Name "[UNUSED_NIC_NAME]" -Confirm:$false
```

Flush ARP on both BDR and source, re-enable any disabled firewalls, and retry.
If this is a Hyper-V host — check if the SET (Switch Embedded Teaming) vSwitch management vNIC is disabled:
```
Get-VMSwitch | Select Name, SwitchType, EmbeddedTeamingEnabled
Get-VMNetworkAdapter -ManagementOS | Select Name, SwitchName, IPAddresses, Status
```
If the management vNIC is disabled and physical NICs have individual IPs — the SET team was either never completed or was torn down.

T3 — Rebuild

For Hyper-V hosts: rebuild the SET team properly — both physical NICs teamed into a single vSwitch with one management vNIC holding a single IP. Remove IPs from individual physical NICs.
For BDR appliances: if dual NICs are by design (production + backup network), ensure they're on different subnets. Two NICs on the same subnet with the same gateway will always cause asymmetric routing.
Check for Defender for Endpoint interference if the target is Intune-managed:
```
Get-Service webthreatdefsvc | Select Status
```
Defender for Endpoint's network protection filters traffic independently of Windows Firewall. Even with firewall disabled, Defender can silently drop inbound connections.
If Defender is the blocker — see Scenario 4 for the Intune policy resolution path.

Root Cause Notes: Dual-NIC same-subnet is the most common "invisible" networking issue in DTC environments. It produces symptoms that look like firewall problems (TCP fails, ICMP works) because the asymmetric routing breaks stateful TCP connections but not stateless ICMP. Always check Get-NetIPAddress early in troubleshooting when connectivity is inconsistent. For Hyper-V hosts, the management vNIC being disabled is a common post-migration issue.

Scenario 8: DNS Resolution Failures

Symptoms: Veeam job fails with "Failed to resolve host name [hostname] from [BDR_NAME]." The BDR can't find the source server by hostname. May also manifest as agent deployment failures (see Scenario 4).

T1 — Retry

From the BDR, test name resolution:
```
nslookup [TARGET_HOSTNAME]
```
If resolution fails — check what DNS server the BDR is using:
```
Get-DnsClientServerAddress -AddressFamily IPv4
```
DTC standard: DNS should point to the domain controller (if domain-joined) or the UDM gateway. If the BDR is pointing at an external DNS (8.8.8.8, 1.1.1.1) — it won't resolve internal hostnames.
Quick fix — use the target's IP address directly in the Veeam job instead of hostname. This gets the backup running while DNS is fixed.

T2 — Investigate

Check if the target server has a DNS A record:

```
# On the domain controller
Get-DnsServerResourceRecord -ZoneName "[DOMAIN.COM]" -Name "[TARGET_HOSTNAME]"
```

If the record is missing — add it manually or force a DNS registration from the target:
```
# On the target server
ipconfig /registerdns
```
If the BDR is not domain-joined (common for DTC BDR appliances) — it relies on the DNS server configured on its NIC to resolve internal names. Verify it's pointing at the client's DC or a DNS server that hosts the internal zone.
Check for DNS scavenging — stale records may have been cleaned up if the target hasn't refreshed its registration.

T3 — Rebuild

If DNS infrastructure is unreliable — configure Veeam jobs to use IP addresses instead of hostnames as a permanent workaround. Document this in the job notes.
For environments transitioning DNS to the UDM per DTC standard — ensure the UDM's DNS is configured to resolve internal domain names (either as a conditional forwarder to the DC or hosting the zone).
If the BDR consistently can't resolve names — add static host entries as a last resort:
```
notepad C:\Windows\System32\drivers\etc\hosts
```
Add: [IP_ADDRESS] [HOSTNAME]

Root Cause Notes: DNS failures are the most overlooked Veeam issue. BDR appliances that aren't domain-joined often default to DHCP-assigned DNS (usually the UDM gateway), which may not resolve internal Active Directory hostnames. During client onboardings, verify the BDR's DNS points to a server that can resolve all backup targets.

Scenario 9: Storage Filled by Incorrect Backup Settings or Stale Backups

Symptoms: BDR or source server disk fills up. Backups fail with disk space errors. Investigation reveals excessive retention, forgotten backup jobs for decommissioned servers, or backup jobs writing to unintended locations.

T1 — Retry

Identify what's consuming space — on the BDR:

```
# -File skips directories, which have no Length and would skew or break the per-extension totals
Get-ChildItem -Path "[REPO_PATH]" -Recurse -File |
    Group-Object Extension |
    Select-Object Name, Count, @{N='SizeGB';E={[math]::Round(($_.Group | Measure-Object Length -Sum).Sum/1GB,2)}} |
    Sort-Object SizeGB -Descending
```

Check for obvious issues in the Veeam console: Backups → Disk — look for backup chains belonging to servers that no longer exist.
If the issue is the source server's disk (not the BDR) — check C:\Windows\Installer size (see Scenario 10).

T2 — Investigate

Audit all backup jobs — compare the list of active Veeam jobs against the client's current server/workstation inventory. Flag any jobs targeting decommissioned or migrated systems.
Check retention settings on every job — DTC standard retention should be documented per client. Common over-provisioning: 30-day retention on a repo sized for 14 days.
Look for duplicate jobs — techs sometimes create new jobs after migrations without disabling or removing the old ones, resulting in double backup storage consumption.
Check if any jobs are writing to non-standard locations (C: drive instead of the dedicated repo volume, for example).
Check for GFS (Grandfather-Father-Son) retention that's accumulating monthly/yearly fulls nobody intended: Right-click job → Edit → Storage → "Keep certain full backups longer for archival purposes" → Configure.
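
The job-versus-inventory audit in this tier comes down to comparing two name lists. A minimal sketch follows, assuming both lists have already been exported as plain hostnames (from the Veeam console and from the client's inventory); Compare-JobTargetToInventory and its property names are hypothetical.

```powershell
# Hypothetical helper: compares backup job targets with the current inventory.
# Both inputs are plain hostname lists; comparison is case-insensitive.
function Compare-JobTargetToInventory {
    param(
        [string[]]$JobTargets,   # machines that Veeam jobs currently point at
        [string[]]$Inventory     # machines that still exist at the client
    )
    [pscustomobject]@{
        # Targeted by a job but no longer in inventory: likely decommissioned
        StaleJobTargets = @($JobTargets | Where-Object { $Inventory -notcontains $_ } | Sort-Object -Unique)
        # In inventory but not covered by any job
        NotBackedUp     = @($Inventory | Where-Object { $JobTargets -notcontains $_ } | Sort-Object -Unique)
    }
}
```

StaleJobTargets is the list to review for jobs left over from migrations; NotBackedUp is a useful by-product of the same audit. Confirm each entry manually before disabling or deleting anything.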

T3 — Rebuild

Remove stale backup chains: right-click in Backups → Disk → Delete from disk. Verify with the team lead before deleting.
Restructure retention if the repo can't support the current settings — reduce retention or add storage.
Implement a quarterly backup audit practice — review all jobs per client, validate targets still exist, confirm retention is appropriate for repo size.
If the problem is at the source server level — see Scenario 10 for the Windows Installer pattern and HALO 1125653.

Root Cause Notes: "Storage filled" is usually a process problem, not a technical one. It accumulates over months: a server gets migrated, the old job stays active, retention builds, nobody notices until the repo is full. Post-migration cleanup checklists should include a Veeam job audit step. DTC's Automation System Prompts document includes plans for a NinjaRMM-based Orphaned Installer Patch Monitor that addresses the source-side disk exhaustion pattern.

Scenario 10: Disk Space Exhaustion from Windows Installer (Orphaned Patches)

Symptoms: Source server or workstation C: drive fills up over time with no obvious cause. Investigation reveals C:\Windows\Installer is consuming 20–100+ GB. Backups fail because VSS can't create snapshots on a full volume.

HALO Reference: Ticket 1125653 — Dental workstation accumulated 128 GB of orphaned .msp/.msi patches in C:\Windows\Installer (57% of total disk). This caused cascading backup failures, application instability, and near-total disk exhaustion.

T1 — Retry

Check disk space and the Installer folder size:

```
# Free and total space on C:
Get-CimInstance Win32_LogicalDisk -Filter "DeviceID='C:'" | Select-Object @{N='FreeGB';E={[math]::Round($_.FreeSpace/1GB,2)}}, @{N='TotalGB';E={[math]::Round($_.Size/1GB,2)}}

# Size of C:\Windows\Installer in GB
(Get-ChildItem -Path "C:\Windows\Installer" -Recurse -Force -File -ErrorAction SilentlyContinue | Measure-Object Length -Sum).Sum / 1GB
```

DTC thresholds: Warning if Installer > 20 GB, Critical if > 50 GB.
If Installer is under 20 GB and disk has > 20% free — the backup failure is likely something else. Retry the job.
If Installer is over 20 GB — escalate to T2.
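
The DTC thresholds above can be expressed as a small classifier so the check gives the same answer every time. This is a sketch only; Get-InstallerFolderSeverity is a hypothetical helper name, and the 20 GB and 50 GB cut-offs are the ones stated in this scenario.

```powershell
# Hypothetical helper: maps the Installer folder size (GB) to the DTC threshold levels.
function Get-InstallerFolderSeverity {
    param([double]$SizeGB)

    if ($SizeGB -gt 50) { 'Critical' }       # over 50 GB
    elseif ($SizeGB -gt 20) { 'Warning' }    # over 20 GB
    else { 'OK' }
}

# Example on the affected machine:
#   $bytes = (Get-ChildItem -LiteralPath 'C:\Windows\Installer' -Recurse -Force -File -ErrorAction SilentlyContinue |
#             Measure-Object -Property Length -Sum).Sum
#   Get-InstallerFolderSeverity -SizeGB ($bytes / 1GB)
```

Warning or Critical means escalate to T2; OK with more than 20% free disk means the backup failure probably has another cause.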

T2 — Investigate

Run DISM component cleanup (safe, no reboot required, does NOT use /ResetBase):
```
DISM /Online /Cleanup-Image /StartComponentCleanup
```
This cleans superseded WinSxS components — won't touch C:\Windows\Installer directly, but frees related space.
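Run Windows Disk Cleanup with the system file option. The first command only saves the category selection (select all categories, especially Windows Update Cleanup and Previous Windows Installations); the second runs the cleanup with that saved selection:
```
cleanmgr /sageset:1
cleanmgr /sagerun:1
```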

Check for other space consumers:

```
# Check WinSxS size
DISM /Online /Cleanup-Image /AnalyzeComponentStore

# Check Windows Update cache
(Get-ChildItem "C:\Windows\SoftwareDistribution\Download" -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object Length -Sum).Sum / 1GB
```

If C:\Windows\Installer is the primary consumer and is > 50 GB — escalate to T3 for orphaned patch cleanup.

T3 — Rebuild

⛔ WARNING: Do NOT blindly delete files from C:\Windows\Installer. Referenced files are required by installed applications. Deleting referenced files breaks MSI repair, uninstall, and update operations.

Identify orphaned (unreferenced) files using registry queries — enumerate the Installer registry database to determine which .msp/.msi files are still referenced by installed products:
- Referenced .msi files: HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Products\[GUID]\InstallProperties\ → LocalPackage value
- Referenced .msp files: HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\S-1-5-18\Patches\[GUID]\ → LocalPackage value
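- Any .msi/.msp file in C:\Windows\Installer NOT in the referenced list = orphaned. Per-user installs register under their own SID key beneath UserData, so enumerate every SID (not only S-1-5-18) before treating a file as orphaned.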
Do NOT use Win32_Product WMI class — it triggers a consistency check/reconfigure on every installed MSI and can take 20+ minutes while destabilizing applications.
Quarantine orphaned files before deleting — move to C:\DTC\InstallerCleanup\Quarantine[date]\ and monitor for 30 days.
If disk is critically full (< 5% free) and quarantining would make it worse — delete orphans directly with detailed logging of every file removed.
DTC is developing an automated NinjaRMM monitoring and cleanup solution for this pattern (ref: Orphaned Installer Patch Monitor project in Automation System Prompts). Until that's deployed, this is a manual T3 operation.
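
Until the automated solution exists, the registry-based detection described above might be sketched as follows. This is an illustration under stated assumptions, not the planned NinjaRMM tooling: Get-ReferencedInstallerPackage and Get-OrphanedInstallerFile are hypothetical helper names, the SID defaults to a wildcard so per-user installs are included alongside S-1-5-18, and the script only lists candidates. It never deletes or moves anything.

```powershell
# Hypothetical helper (Windows only): collects every LocalPackage path that the
# Installer registry database still references, for products and patches.
function Get-ReferencedInstallerPackage {
    param([string]$Sid = '*')   # '*' = all SIDs; use 'S-1-5-18' for per-machine installs only

    $root  = "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Installer\UserData\$Sid"
    $paths = @("$root\Products\*\InstallProperties", "$root\Patches\*")
    Get-ItemProperty -Path $paths -Name LocalPackage -ErrorAction SilentlyContinue |
        ForEach-Object { $_.LocalPackage } |
        Where-Object { $_ }
}

# Hypothetical helper: returns the .msi/.msp files that nothing references.
# Pure list comparison (case-insensitive), so it can be checked against sample data.
function Get-OrphanedInstallerFile {
    param(
        [string[]]$InstallerFiles,    # full paths found in C:\Windows\Installer
        [string[]]$ReferencedPaths    # LocalPackage values from the registry
    )
    $InstallerFiles |
        Where-Object { $_ -match '\.(msi|msp)$' } |
        Where-Object { $ReferencedPaths -notcontains $_ }
}

# Example on the affected machine (elevated prompt):
#   $files = (Get-ChildItem -LiteralPath 'C:\Windows\Installer' -File -Force).FullName
#   $refs  = Get-ReferencedInstallerPackage
#   Get-OrphanedInstallerFile -InstallerFiles $files -ReferencedPaths $refs
```

Treat the output as a candidate list for the quarantine step, and spot-check a few entries against the registry by hand before moving any file. Reading the registry directly avoids the Win32_Product side effects noted above.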

Root Cause Notes: Windows never cleans up C:\Windows\Installer on its own. Every MSI install, update, and patch caches files here. Over years, especially on dental workstations with frequent PMS updates (Dentrix, Eaglesoft, etc.), orphaned patches accumulate silently. This is a ticking time bomb — monitor proactively via NinjaRMM custom fields once the automated solution is deployed.

Quick Reference — Escalation Summary

#	Scenario	T1 Action	T2 Action	T3 Action
1	VSS Writer Failure	Stop job, check writers, retry	Reset VSS/NTDS services	Full server reboot
2	Repository Full	Check disk, retry if > 10% free	Review retention, clean orphaned chains	Capacity planning / add storage
3	Network Timeout (Cross-Subnet)	Test connectivity, retry	Check MTU, routing, throttling	Deploy same-subnet proxy
4	Agent Deployment (Intune)	Test DNS and ports	Manual agent install workaround	Intune policy change request
5	SQL Consistency	Check SQL service, retry	Reset SQL Writer, run DBCC	Vendor escalation / recovery model change
6	Chain Integrity Break	Run Active Full	Check repo storage, reset CBT	Delete chain and rebuild / Disk2VHD fallback
7	Dual-NIC / Defender	Check NICs, flush ARP	Disable unused NIC	Rebuild SET team / Intune policy fix
8	DNS Resolution	nslookup, use IP as workaround	Fix DNS record, verify BDR DNS config	Static hosts / DNS infrastructure fix
9	Stale Backups / Settings	Audit jobs vs. inventory	Fix retention, remove stale jobs	Quarterly audit process
10	Windows Installer Bloat	Check folder size vs. thresholds	DISM cleanup, Disk Cleanup	Registry-based orphan detection and cleanup

Escalation to Veeam Support

When T3 cannot resolve the issue:

Collect Veeam logs before calling — in the Veeam console: Main Menu → Help → Support Information → Export. This generates a .zip of all logs.
Note the exact error message and job session ID from the Veeam console.
Open a case with Veeam Support directly.
Document the Veeam case number in the HALO ticket for tracking.

Related HALO Tickets

Ticket	Relevance
1125653	Windows Installer disk exhaustion pattern — 128 GB orphaned patches, cascading backup failures
1117116	Server migration: VSS NTDS writer failure, dual-NIC asymmetric routing, cross-subnet transport instability, Disk2VHD fallback, Intune/Defender agent deployment block

Document History

Version	Date	Author	Changes
1.0	March 2026	[Author]	Initial creation — 10 scenarios from HALO ticket analysis
