I have a virtualized infrastructure running on Hyper-V with a separate backup/DRP server. The entire stack runs on 19 VMs: two Active Directory domains linked by a trust, Linux DNS forwarders (BIND9), RADIUS, monitoring, application services… in short, not something we can afford to reboot haphazardly.

The project’s objective was simple to state, but complex to achieve:

> Switch the entire infrastructure over to the DRP server with minimal effort, ensuring service continuity (at the very least, not cutting off internet access), then return to production cleanly.

Spoiler: It took me several iterations, a few cold sweats, and a good dose of PowerShell to pull it off. A few liters of coffee, too…


Architecture

HYPERV1 (Hyper-V prod)
├── 2 AD domains with trust
│   ├── Domain1: SRV-PDC1 (PDC + AD-integrated DNS)
│   │             SRV-DC1  (Secondary DC + AD-integrated DNS)
│   └── Domain2: SRV-PDC2 (PDC + AD-integrated DNS + DHCP)
│                 SRV-DC2  (Secondary DC + AD-integrated DNS + DHCP)
├── 2 Linux BIND9 DNS forwarders + NTP
│   ├── SRV-DNS1 (Primary DNS forwarder + NTP)
│   └── SRV-DNS2 (Secondary DNS forwarder + NTP)
├── RADIUS, Proxy, SMTP, SIEM, Monitoring...
└── 3 virtual workstations

HYPERV2 (Hyper-V DRP)
└── Veeam B&R 12 + NVMe replicas of all VMs

Replication runs 4 times a week with a retention of 2 restore points — sufficient to guarantee a maximum RPO of 24 hours.


The two scenarios

From the outset, I identified two radically different use cases:

FULL DRP (actual crash): Production is down. We switch everything over immediately. The PDCs start first, followed by the rest. There is no continuity to manage since everything is already down.

MCO DRP (planned maintenance): Production is still running. We must switch over without service interruption. Here the startup order becomes critical: there must always be at least one active DC per domain and one active DNS forwarder somewhere on the network.

This distinction called for two separate failover plans in Veeam, each with its own startup order.


The MCO Failover Plan: The Order That Changes Everything

For planned maintenance, the sequence relies on two groups that switch over one after the other, so that Domain1, Domain2, and DNS each keep at least one active server somewhere on the network at all times.

Group 1 — PDC Domain1 + Secondary DC Domain2 + Primary DNS forwarder: SRV-PDC1 · SRV-DC2 · SRV-DNS1

Group 2 — Secondary DC Domain1 + PDC Domain2 + Secondary DNS forwarder: SRV-DC1 · SRV-PDC2 · SRV-DNS2

┌────────────────────────────────────────────────────────────────────────────────────────┐
│                     HYPERV1 (prod)                     HYPERV2 (DRP)                   │
│                  Domain1  Domain2  DNS              Domain1  Domain2  DNS              │
├────────────────────────────────────────────────────────────────────────────────────────┤
│ Start     SRV-PDC1 ✅  SRV-PDC2 ✅  DNS1 ✅             —        —        —          │
│            SRV-DC1  ✅  SRV-DC2  ✅  DNS2 ✅                                          │
├────────────────────────────────────────────────────────────────────────────────────────┤
│ Step 1    SRV-PDC1 ⬇️  SRV-PDC2 ✅  DNS1 ⬇️             —        —        —          │
│ Group 1   SRV-DC1  ✅  SRV-DC2  ⬇️  DNS2 ✅                                          │
│ H1 down    ↑DC1+PDC2+DNS2 still UP on HYPERV1 → continuity guaranteed ✅               │
├────────────────────────────────────────────────────────────────────────────────────────┤
│ Step 2    SRV-DC1  ✅  SRV-PDC2 ✅  DNS2 ✅     SRV-PDC1 🔄  SRV-DC2 🔄  DNS1 🔄   │
│ Group 1   ↑Group 2 remains UP on HYPERV1            ↑Group 1 starts on HYPERV2     │
│ up H2      ↑MCO-DRP Wave 1 Failover Plan launched                                         │
├────────────────────────────────────────────────────────────────────────────────────────┤
│ Step 3    SRV-DC1  ✅  SRV-PDC2 ✅  DNS2 ✅   SRV-PDC1 ✅  SRV-DC2 ✅  DNS1 ✅     │
│ Group 1   ↑Group 2 still UP on HYPERV1          ↑Group 1 confirmed Running H2      │
│ confirmed                                                                               │
├────────────────────────────────────────────────────────────────────────────────────────┤
│ Step 4    SRV-DC1  ⬇️  SRV-PDC2 ⬇️  DNS2 ⬇️   SRV-PDC1 ✅  SRV-DC2 ✅  DNS1 ✅     │ 
│ Group 2   ↑Group 2 down on HYPERV1            ↑PDC1+DC2+DNS1 up on HYPERV2       │
│ down H1                                             → continuity guaranteed ✅          │
├────────────────────────────────────────────────────────────────────────────────────────┤
│ Step 5    —                                    SRV-PDC1 ✅  SRV-DC2 ✅  DNS1 ✅      │
│ Group 2                                        SRV-DC1  ✅  SRV-PDC2 ✅  DNS2 ✅     │
│ up H2                                                  ↑All on HYPERV2 ✅           │
└────────────────────────────────────────────────────────────────────────────────────────┘

At each step, Domain1 always has an active DC, Domain2 does too, and the DNS forwarder always responds. No interruption in AD, DNS, or DHCP service throughout the entire failover process.
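
This invariant can be checked mechanically. The sketch below is a minimal, illustrative model and not part of the actual scripts: it replays the steps of the table, using the group membership given above, and verifies that every role (Domain1, Domain2, DNS) keeps at least one VM up at each step. The Test-Continuity function and the Role labels are assumptions made for this example.

```powershell
# Illustrative model only: group membership follows the prose above.
$vms = @(
    [pscustomobject]@{ Name = 'SRV-PDC1'; Role = 'Domain1'; Group = 1 }
    [pscustomobject]@{ Name = 'SRV-DC1';  Role = 'Domain1'; Group = 2 }
    [pscustomobject]@{ Name = 'SRV-PDC2'; Role = 'Domain2'; Group = 2 }
    [pscustomobject]@{ Name = 'SRV-DC2';  Role = 'Domain2'; Group = 1 }
    [pscustomobject]@{ Name = 'SRV-DNS1'; Role = 'DNS';     Group = 1 }
    [pscustomobject]@{ Name = 'SRV-DNS2'; Role = 'DNS';     Group = 2 }
)

# For each step, the groups that are up on either host.
$steps = [ordered]@{
    'Start'                        = @(1, 2)
    'Step 1 (Group 1 down on H1)'  = @(2)
    'Step 2 (Group 1 starting H2)' = @(2)
    'Step 3 (Group 1 confirmed)'   = @(1, 2)
    'Step 4 (Group 2 down on H1)'  = @(1)
    'Step 5 (Group 2 up on H2)'    = @(1, 2)
}

function Test-Continuity {
    # Hypothetical helper: returns $true when every role has at least one VM up.
    param([object[]]$Vms, [int[]]$GroupsUp)
    foreach ($role in ($Vms.Role | Select-Object -Unique)) {
        $alive = @($Vms | Where-Object { $_.Role -eq $role -and $GroupsUp -contains $_.Group })
        if ($alive.Count -eq 0) { return $false }
    }
    return $true
}

$results = foreach ($step in $steps.Keys) {
    [pscustomobject]@{
        Step       = $step
        Continuity = Test-Continuity -Vms $vms -GroupsUp $steps[$step]
    }
}
$results | Format-Table -AutoSize
```

Putting both DCs of one domain in the same group is the kind of mistake this check catches: the corresponding step would report Continuity as False.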


The Start-DRP.ps1 script

Everything is automated in PowerShell and driven from HYPERV2. The script manages both modes via an interactive menu:

.\Start-DRP.ps1
============================================================
  DRP PROCEDURE - Select failover mode
============================================================

  [1] CRASH  - Production down, immediate startup
               Uses the Failover Plan: FULL DRP

  [2] MCO    - Scheduled maintenance, continuity guaranteed
               Uses the Failover Plan: MCO DRP

  [3] MCO + Skip  - MCO without prior replication
============================================================
Your choice (1/2/3):

After confirmation, the script proceeds automatically:

  1. Anti-shutdown flag: blocks the post-script that would otherwise shut down HYPERV1 after replication
  2. Manual replication: final sync before failover
  3. Ordered shutdown of non-critical VMs (waves 5→3)
  4. In MCO mode: shutdown of Group 1, launch of the plan, wait until Group 1 is Running on HYPERV2, shutdown of Group 2
  5. In CRASH mode: shut down everything, launch the plan

Replica startup is verified through the local Hyper-V on HYPERV2 (Get-VM SRV-DC1_VeeamReplica) rather than over the network. This avoids false positives when the same IP is still answering from HYPERV1.


Return to production: Start-FailbackToProd.ps1

This is where things got complicated. Several versions were needed to achieve a clean solution.

Veeam 12 pitfalls

Pitfall 1: Get-VBRJobSession no longer exists. It is replaced by Get-VBRSession with a filter on JobName.

Pitfall 2: committing to the wrong restore point. After a Start-VBRHvReplicaFailback, Veeam creates a new restore point (index 0). If you pass this RP to Stop-VBRHvReplicaFailback to commit, the VM gets stuck in LockedItem. You must commit index 1, the RP from before the failback.

# Index 0 = new RP created by the failback → DO NOT commit this
# Index 1 = old Failover RP → this is the one to commit
$rpCommit = Get-VBRRestorePoint |
    Where-Object { $_.IsReplica() -and $_.VmName -eq $vmName } |
    Sort-Object CreationTime -Descending |
    Select-Object -Skip 1 -First 1

Stop-VBRHvReplicaFailback -RestorePoint $rpCommit
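
To see what the Sort-Object / Select-Object -Skip 1 -First 1 pipeline picks, here is a minimal sketch that runs without Veeam. The restore points are mock objects with made-up labels and timestamps; only the selection logic mirrors the snippet above.

```powershell
# Mock restore points standing in for Get-VBRRestorePoint output (illustrative data).
$base = [datetime]'2026-03-28T10:00:00'
$restorePoints = @(
    [pscustomobject]@{ VmName = 'SRV-DC1'; Label = 'older replica'; CreationTime = $base.AddHours(-24) }
    [pscustomobject]@{ VmName = 'SRV-DC1'; Label = 'failover';      CreationTime = $base.AddHours(-2) }
    [pscustomobject]@{ VmName = 'SRV-DC1'; Label = 'failback';      CreationTime = $base }
    [pscustomobject]@{ VmName = 'SRV-DC2'; Label = 'failover';      CreationTime = $base.AddHours(-1) }
)

$sorted = $restorePoints |
    Where-Object { $_.VmName -eq 'SRV-DC1' } |
    Sort-Object CreationTime -Descending

$newest   = $sorted | Select-Object -First 1           # index 0: created by the failback
$rpCommit = $sorted | Select-Object -Skip 1 -First 1   # index 1: the one to commit

# Guard: with fewer than two restore points there is no index 1 to commit.
if (-not $rpCommit) { throw "No pre-failback restore point found for SRV-DC1" }

"Newest : $($newest.Label)"
"Commit : $($rpCommit.Label)"
```

The guard matters in a real script: if the pipeline returns nothing, passing an empty value to the commit cmdlet would fail in a less obvious way.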

Pitfall 3: the Failover Plan restarts the replicas. As long as the Failover Plan is active, it automatically restarts the replicas after each commit. The solution: use Stop-VBRReplicaFailover individually on each VM before initiating its failback. This shuts down the replica cleanly without affecting the others.

# Individual undo — does not affect other VMs in the plan
Stop-VBRReplicaFailover -RestorePoint $rpFailover

Pitfall 4: the VHDX is still locked. Right after the commit, Hyper-V has not yet released the VHDX file, and an immediate Start-VM fails with "The process cannot access the file". Solution: wait 15 seconds between the commit and starting the VM.
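
A fixed pause is the fix used here. As a possible alternative (a different technique, not what the scripts do), the start could be retried until the lock is released. The sketch below assumes a hypothetical Invoke-WithRetry helper and uses a stand-in action that fails twice, so it runs without Hyper-V.

```powershell
function Invoke-WithRetry {
    # Hypothetical helper: runs $Action, retrying on any terminating error.
    param(
        [Parameter(Mandatory)][scriptblock]$Action,
        [int]$MaxAttempts  = 6,
        [int]$DelaySeconds = 5
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return (& $Action)
        } catch {
            # Give up and rethrow once the last attempt has failed.
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Host "Attempt $attempt failed: $($_.Exception.Message). Retrying in $DelaySeconds s..."
            Start-Sleep -Seconds $DelaySeconds
        }
    }
}

# Demo: a stand-in action that fails twice, as a still-locked VHDX would.
$script:calls = 0
$result = Invoke-WithRetry -DelaySeconds 0 -Action {
    $script:calls++
    if ($script:calls -lt 3) { throw "The process cannot access the file" }
    "started after $script:calls attempts"
}
$result
```

In a real failback the action would be something like { Start-VM -Name $vmName -ErrorAction Stop }; the -ErrorAction Stop is what turns the lock error into an exception the retry loop can catch.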

The final sequence for each VM

1. Stop-VBRReplicaFailover (index 0)
   → Cleanly shuts down the replica on HYPERV2

2. Start-VBRHvReplicaFailback (WITHOUT RunAsync = blocking)
   → Complete resync to HYPERV1
   → Veeam creates a new RP

3. Stop-VBRHvReplicaFailback (index 1)
   → Commit

4. Start-Sleep 15
   → VHDX release

5. Start-VM on HYPERV1
   → Wave delay → Next VM

Rollback order for AD/DNS continuity

SRV-PDC1  (PDC Domain1 + AD DNS)                   120s
SRV-PDC2  (PDC Domain2 + AD DNS + DHCP)             90s
SRV-DNS1  (DNS forwarder + NTP)                     60s
SRV-DC1   (Secondary DC Domain1 + AD DNS)           60s
SRV-DC2   (Secondary DC Domain2 + AD DNS + DHCP)    60s
SRV-DNS2  (DNS forwarder + NTP)                     30s
... application services ...
... workstations ...

The logic is identical to the MCO DRP failover: at any given moment, at least one DC per domain and one DNS server are active somewhere on the network.
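
Because each VM is processed one after another, the fixed waits add up. The sketch below is a rough illustration that takes the delays from the list above and the 15-second VHDX pause from Pitfall 4, and totals them for the six DC/DNS VMs. It models only the waits; the variable names are assumptions and the resync time, which depends on the amount of changed data, is not included.

```powershell
# Rollback order and post-start delays, copied from the list above.
$failbackOrder = @(
    [pscustomobject]@{ Name = 'SRV-PDC1'; DelaySec = 120 }
    [pscustomobject]@{ Name = 'SRV-PDC2'; DelaySec = 90 }
    [pscustomobject]@{ Name = 'SRV-DNS1'; DelaySec = 60 }
    [pscustomobject]@{ Name = 'SRV-DC1';  DelaySec = 60 }
    [pscustomobject]@{ Name = 'SRV-DC2';  DelaySec = 60 }
    [pscustomobject]@{ Name = 'SRV-DNS2'; DelaySec = 30 }
)
$vhdxWaitSec = 15   # pause between commit and Start-VM (Pitfall 4)

$elapsed = 0
$plan = foreach ($vm in $failbackOrder) {
    $fixedWait = $vhdxWaitSec + $vm.DelaySec
    $elapsed  += $fixedWait
    [pscustomobject]@{
        VM            = $vm.Name
        FixedWaitSec  = $fixedWait
        CumulativeSec = $elapsed
    }
}
$plan | Format-Table -AutoSize

$minutes = [math]::Round($elapsed / 60, 1)
"Fixed waits for the DC/DNS VMs: $elapsed s (about $minutes min), resync not included"
```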


Bugs that cost me time

Hyper-V module not loaded in script context

Get-VM worked perfectly in an interactive session but returned null in the script. Cause: the Hyper-V module is not automatically loaded in a non-interactive PowerShell context. Fix:

Import-Module Hyper-V -ErrorAction Stop -WarningAction SilentlyContinue

The .State property returned as a PSObject via WinRM

When querying a VM’s state via Invoke-Command, .State returns a PSObject rather than a string. The comparison $state -eq "Off" fails silently. Fix:

(Get-VM -Name $name).State.ToString()

Get-VBRSession requires a mandatory parameter

Contrary to what the documentation suggests, Get-VBRSession without parameters opens an interactive prompt. You must pass -ErrorAction SilentlyContinue to prevent a hang in an automated script.

The MCO deadlock

Initial implementation: the script waited for the replicas to be Running on HYPERV2 before shutting down the source VMs on HYPERV1. Problem: Veeam does not start a replica while the source VM is running. Result: a deadlock. The fix was to shut the source VMs down before launching the Failover Plan, then wait for the replicas to start.


Results

After a day of testing and iterations, both scripts are working in production:

Scenario              Total time   AD/DNS downtime
MCO DRP (failover)    ~8 min       0 seconds
MCO failback          ~25 min      0 seconds
FULL DRP (crash)      ~3 min       N/A

The failback time is longer because each VM is processed sequentially with its own resync—that’s the price of service continuity.


What I Would Have Done Differently

Test on an isolated VM before automating everything. Each Veeam pitfall (commit index 1, RunAsync vs. blocking, Stop-VBRReplicaFailover) could have been discovered on a single VM before integrating it into the full script.

Document Veeam PowerShell commands as you go. The official documentation is incomplete on certain points (behavior of Start-VBRHvReplicaFailback with -RunAsync, RP management after failback). The Veeam forums are more reliable.


The Scripts

Both scripts are designed to run from HYPERV2 (the DRP server). They include a versioned changelog, a -WhatIf mode for simulation without execution, and complete timestamped logs in C:\Scripts\DRP\Logs\.
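
For readers who want to add a similar simulation mode to their own scripts, one conventional PowerShell approach is SupportsShouldProcess. The sketch below is an assumption about how such a mode can be wired, not an extract from these scripts; Stop-DemoVM is a hypothetical function that records the action instead of calling Stop-VM.

```powershell
function Stop-DemoVM {
    # SupportsShouldProcess gives the function the -WhatIf and -Confirm switches.
    [CmdletBinding(SupportsShouldProcess)]
    param([Parameter(Mandatory)][string]$VMName)

    if ($PSCmdlet.ShouldProcess($VMName, 'Shut down VM')) {
        # A real script would call Stop-VM here; the demo only records the action.
        $script:stopped += $VMName
    }
}

$script:stopped = @()
Stop-DemoVM -VMName 'WS-03' -WhatIf   # prints a "What if:" line and does nothing
Stop-DemoVM -VMName 'WS-02'           # actually runs the action
$script:stopped
```

Every destructive call (shutdown, failover, commit) would sit behind such a ShouldProcess check, so a -WhatIf run logs the planned sequence without touching any VM.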

Start-DRP.ps1

Switches from HYPERV1 (prod) to HYPERV2 (DRP). Interactive menu on launch.

# Usage
.\Start-DRP.ps1                           # Interactive menu
.\Start-DRP.ps1 -Mode MCO                 # Direct MCO
.\Start-DRP.ps1 -Mode CRASH               # Direct crash
.\Start-DRP.ps1 -Mode MCO -SkipReplication # MCO without replication
# =============================================================================
# Start-DRP.ps1
# Complete DRP failover script, run from GIEDI PRIME (the DRP host, HYPERV2 in the article)
#
# Two failover modes:
#
#   CRASH MODE:
#     Use when production is down or inaccessible.
#     Starts all VMs on HYPERV2 via the "FULL-DRP" Failover Plan
#     without worrying about service continuity.
#     Steps:
#       1. Veeam replication
#       2. Shut down all VMs on HYPERV1
#       3. Launch the "FULL-DRP" Failover Plan
#
#   MCO MODE (scheduled maintenance):
#     Use for scheduled maintenance with service continuity.
#     Ensures that one DC per domain and one DNS server remain up during the switchover.
#     Steps:
#       1. Veeam replication
#       2. Shut down waves 5/4/3 on HYPERV1
#       3. Shut down Group 1 on HYPERV1 (SRV-DC1+SRV-DC2+SRV-DNS1)
#          (Group 2 stays up on HYPERV1, so AD/DNS keep answering)
#       4. Launch the "MCO-DRP" failover plan
#       5. Wait for Group 1 to be running on HYPERV2
#       6. Shut down Group 2 on HYPERV1 (SRV-PDC1+SRV-PDC2+SRV-DNS2)
#
# Prerequisites:
#   - WinRM enabled on HYPERV1
#   - Admin rights on HYPERV1
#   - Veeam Backup & Replication console installed on GIEDI PRIME
#   - Script to be run in PowerShell Administrator on GIEDI PRIME
#   - Failover Plans "FULL-DRP" and "MCO-DRP" created in Veeam
#
# Usage:
#   .\Start-DRP.ps1                              -> interactive menu
#   .\Start-DRP.ps1 -Mode CRASH                  -> CRASH mode + replication
#   .\Start-DRP.ps1 -Mode CRASH -SkipReplication -> CRASH mode without replication
#   .\Start-DRP.ps1 -Mode MCO                    -> MCO mode + replication
#   .\Start-DRP.ps1 -Mode MCO -SkipReplication   -> MCO mode without replication
#
# =============================================================================
# WARNING: SCRIPT MAINTENANCE - READ BEFORE MAKING ANY CHANGES
# =============================================================================
#
# Whenever a VM is added or removed from the infrastructure:
#
#   1. Update the "FULL-DRP" and "MCO-DRP" Failover Plans in Veeam
#   2. Update $VMShutdownNonCritical, $VMShutdownCrash, $MCOGroup1, and $MCOGroup2
#   3. Update the CHANGELOG
#
# FULL-DRP sequence reminder:
#   Wave 1: SRV-PDC1 (120s) > SRV-PDC2 (90s) > SRV-DNS1 (60s)
#   Wave 2: SRV-DC1 (60s)   > SRV-DC2 (60s)         > SRV-DNS2 (30s)
#   Wave 3: SRV-RADIUS (45s)  > SRV-PROXY (30s)        > SRV-SMTP (30s)  > SRV-PASSBOLT (30s)
#   Wave 4: SRV-SIEM (45s)    > SRV-MONITORING (45s)         > SRV-WSUS (30s)
#            > SRV-PRINT (30s)    > SRV-PKI (20s)
#   Wave 5: SRV-PXE (20s)  > WS-01 (20s)            > WS-02 (20s)
#            > WS-03 (20s)
#
# MCO-DRP order reminder:
#   Wave 1: SRV-DC1 (90s)   > SRV-DC2 (90s)         > SRV-DNS1 (60s)
#   Wave 2: SRV-PDC1 (120s) > SRV-PDC2 (90s) > SRV-DNS2 (30s)
#   Wave 3: SRV-RADIUS (45s)  > SRV-PROXY (30s)        > SRV-SMTP (30s)  > SRV-PASSBOLT (30s)
#   Wave 4: SRV-SIEM (45s)    > SRV-MONITORING (45s)         > SRV-WSUS (30s)
#            > SRV-PRINT (30s)    > SRV-PKI (20s)
#   Wave 5: SRV-PXE (20s)  > WS-01 (20s)            > WS-02 (20s)
#            > WS-03 (20s)
#
# =============================================================================
# CHANGELOG
# =============================================================================
# 2026-03-25 - v1.0 - Initial version - 18 VMs
# 2026-03-25 - v1.1 - Job name correction: ReplicaVM-HYPERV1_Dayly
# 2026-03-25 - v1.2 - Added DRP flag
# 2026-03-28 - v1.3 - Import-Module instead of Add-PSSnapin
# 2026-03-28 - v1.4 - Fixed job end detection + -SkipReplication
# 2026-03-28 - v1.5 - Fixed Get-VBRSession
# 2026-03-28 - v1.6 - Fixed .ToString() on State via WinRM
# 2026-03-28 - v1.7 - Added verification of critical NETLOGON DCs (incorrect logic)
# 2026-03-28 - v1.8 - Fixed Get-VBRSession -ErrorAction SilentlyContinue
# 2026-03-28 - v1.9 - Refactored DC logic (incorrect order)
# 2026-03-28 - v2.0 - Completely refactored DC/DNS logic in pairs
# 2026-03-28 - v2.1 - Added MCO-DRP mode with guaranteed service continuity
#                     Group 1 (SRV-DC1+SRV-DC2+SRV-DNS1) starts on HYPERV2
#                     then shuts down on HYPERV1 before Group 2
#                     Verification via local Hyper-V (Get-VM _VeeamReplica)
#                     to avoid any name/IP conflicts during failover
# 2026-03-28 - v2.2 - Added interactive menu if -Mode is not specified
#                     Confirmation before launch
# 2026-03-28 - v2.3 - Fixed null state on Get-VM
# 2026-03-28 - v2.4 - Fixed MCO deadlock:
#                     Group 1 shuts down on HYPERV1 BEFORE the Failover Plan
#                     The Failover Plan starts Group 1 on HYPERV2
#                     Then Group 2 shuts down on HYPERV1
# =============================================================================

param(
    [ValidateSet("CRASH","MCO")]
    [string]$Mode = "",
    [switch]$SkipReplication
)

# --- INTERACTIVE MENU ---------------------------------------------------------
# If Mode is not specified in the parameter, display the selection menu

if ($Mode -eq "") {
    Write-Host ""
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host "  DRP PROCEDURE - Select failover mode" -ForegroundColor Cyan
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host ""
    Write-Host "  [1] CRASH        " -ForegroundColor Red -NoNewline
    Write-Host "- Production down, immediate startup on GIEDI PRIME"
    Write-Host "               Uses the Failover Plan: FULL-DRP"
    Write-Host ""
    Write-Host "  [2] MCO          " -ForegroundColor Yellow -NoNewline
    Write-Host "- Scheduled maintenance, service continuity guaranteed"
    Write-Host "               Uses the Failover Plan: MCO-DRP"
    Write-Host ""
    Write-Host "  [3] MCO + Skip   " -ForegroundColor Yellow -NoNewline
    Write-Host "- MCO without replication (replicas already up to date)"
    Write-Host "               Uses the Failover Plan: MCO-DRP"
    Write-Host ""
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host ""
    $choice = Read-Host "Your choice (1/2/3)"

    switch ($choice) {
        "1" {
            $Mode = "CRASH"
            Write-Host ""
            Write-Host "CRASH mode selected." -ForegroundColor Red
        }
        "2" {
            $Mode = "MCO"
            Write-Host ""
            Write-Host "MCO mode selected." -ForegroundColor Yellow
        }
        "3" {
            $Mode = "MCO"
            $SkipReplication = $true
            Write-Host ""
            Write-Host "MCO + SkipReplication mode selected." -ForegroundColor Yellow
        }
        default {
            Write-Host ""
            Write-Host "Invalid choice. Stopping script." -ForegroundColor Red
            exit 1
        }
    }

    # Confirmation before running
    Write-Host ""
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host "  CONFIRMATION" -ForegroundColor Cyan
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host "  Mode         : $Mode" -ForegroundColor White
    Write-Host "  Failover Plan: $(if ($Mode -eq 'MCO') { 'MCO-DRP' } else { 'FULL-DRP' })" -ForegroundColor White
    Write-Host "  Replication  : $(if ($SkipReplication) { 'IGNORED' } else { 'YES' })" -ForegroundColor White
    Write-Host "  Prod Host    : HYPERV1" -ForegroundColor White
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host ""
    $confirm = Read-Host "Confirm launch? (Y/N)"
    if ($confirm -notmatch "^[Y]$") {
        Write-Host "Cancelled by user." -ForegroundColor Yellow
        exit 0
    }
}

# --- CONFIGURATION -----------------------------------------------------------
$ScriptVersion        = "2.8"
$VeeamReplicaJobName  = "ReplicaVM-HYPERV1_Daily"
$VeeamFailoverPlan    = if ($Mode -eq "MCO") { "MCO-DRP" } else { "FULL-DRP" }
$ProdHost             = "HYPERV1"
$LogFile              = "C:\Scripts\DRP\Logs\DRP_${Mode}_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"
$ShutdownTimeout      = 300   # Max seconds to wait for a VM to shut down (5 min)
$ReplicationTimeout   = 7200  # Max seconds to wait for replication to complete (2h)
$VMReadyTimeout       = 600   # Max seconds to wait for a VM to be Running on HYPERV2 (10 min)
$VeeamModule          = "C:\Program Files\Veeam\Backup and Replication\Console\Veeam.Backup.PowerShell.dll"

# Non-critical VMs - shutdown in both modes (waves 5, 4, 3)
$VMShutdownNonCritical = @(
    # Wave 5 - Workstations + PXE (same wave as in the sequence reminder above)
    "WS-03", "WS-02", "WS-01", "SRV-PXE",
    # Wave 4 - Application services
    "SRV-PKI", "SRV-PRINT", "SRV-WSUS", "SRV-MONITORING", "SRV-SIEM",
    # Wave 3 - Network services
    "SRV-PASSBOLT", "SRV-SMTP", "SRV-PROXY", "SRV-RADIUS"
)

# CRASH Mode - complete shutdown of waves 2 and 1 (critical DC/DNS VMs)
$VMShutdownCrash = @(
    "SRV-DNS2", "SRV-DC2", "SRV-DC1",
    "SRV-DNS1", "SRV-PDC2", "SRV-PDC1"
)

# MCO Mode - Group 1: Secondary DCs + Primary DNS forwarder
# Shut down on HYPERV1 first, then started on HYPERV2 by Wave 1 of the MCO-DRP plan
# (their counterparts SRV-PDC1, SRV-PDC2, SRV-DNS2 remain up on HYPERV1 meanwhile)
$MCOGroup1 = @("SRV-DC1", "SRV-DC2", "SRV-DNS1")

# MCO Mode - Group 2: Primary DCs + Secondary DNS forwarder
# Shut down on HYPERV1 once Group 1 is Running on HYPERV2; started on HYPERV2 by Wave 2 of the plan
$MCOGroup2 = @("SRV-PDC1", "SRV-PDC2", "SRV-DNS2")

# --- FUNCTIONS ---------------------------------------------------------------

function Write-Log {
    param([string]$Message, [string]$Level = "INFO")
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $line = "[$timestamp] [$Level] $Message"
    Write-Host $line -ForegroundColor $(switch ($Level) {
        "INFO"    { "Cyan" }
        "OK"      { "Green" }
        "WARN"    { "Yellow" }
        "ERROR"   { "Red" }
        default   { "White" }
    })
    Add-Content -Path $LogFile -Value $line
}

function Wait-VMOff {
    param([string]$VMName, [int]$TimeoutSec = $ShutdownTimeout)
    $elapsed = 0
    while ($elapsed -lt $TimeoutSec) {
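        $state = Invoke-Command -ComputerName $ProdHost -ScriptBlock {
            param($name)
            # Guard against a missing VM: calling .ToString() on $null would throw remotely
            $v = Get-VM -Name $name -ErrorAction SilentlyContinue
            if ($v) { $v.State.ToString() } else { "NotFound" }
        } -ArgumentList $VMName
        if ($state -eq "Off") { return $true }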
        Start-Sleep -Seconds 5
        $elapsed += 5
    }
    return $false
}

function Stop-VMProprement {
    param([string]$VMName)
    $vmState = Invoke-Command -ComputerName $ProdHost -ScriptBlock {
        param($name)
        $v = Get-VM -Name $name -ErrorAction SilentlyContinue
        if ($v) { $v.State.ToString() } else { "NotFound" }
    } -ArgumentList $VMName

    if ($vmState -eq "NotFound") {
        Write-Log "VM '$VMName' not found on $ProdHost, skipped" "WARN"
        return
    }
    if ($vmState -eq "Off") {
        Write-Log "VM '$VMName' already powered off, skipped" "OK"
        return
    }

    Write-Log "Shutting down '$VMName' (state: $vmState)..."
    Invoke-Command -ComputerName $ProdHost -ScriptBlock {
        param($name)
        Stop-VM -Name $name -Force -ErrorAction SilentlyContinue
    } -ArgumentList $VMName

    $isOff = Wait-VMOff -VMName $VMName
    if ($isOff) {
        Write-Log "VM '$VMName' shut down properly" "OK"
    } else {
        Write-Log "VM '$VMName' did not respond, forced power off..." "WARN"
        Invoke-Command -ComputerName $ProdHost -ScriptBlock {
            param($name)
            Stop-VM -Name $name -TurnOff -Force -ErrorAction SilentlyContinue
        } -ArgumentList $VMName
        Start-Sleep -Seconds 10
        $finalState = Invoke-Command -ComputerName $ProdHost -ScriptBlock {
            param($name)
            (Get-VM -Name $name).State.ToString()
        } -ArgumentList $VMName
        if ($finalState -eq "Off") {
            Write-Log "VM '$VMName' forcefully shut down" "OK"
        } else {
            Write-Log "VM '$VMName' could not be shut down! (state: $finalState)" "ERROR"
        }
    }
}

function Wait-VMRunningLocal {
    # Verifies that the VM replica is running on HYPERV2 via local Hyper-V
    # Checks the _VeeamReplica name directly on the local hypervisor
    # No network dependency — avoids any conflict with the source VM on HYPERV1
    param([string]$VMName, [int]$TimeoutSec = $VMReadyTimeout)
    $replicaName = "${VMName}_VeeamReplica"
    $elapsed = 0
    Write-Log "Waiting for '$replicaName' to be running on GIEDI PRIME (local Hyper-V)..." "WARN"
    while ($elapsed -lt $TimeoutSec) {
        $vm = Get-VM -Name $replicaName -ErrorAction SilentlyContinue
        $state = if ($vm) { $vm.State.ToString() } else { "NotFound" }
        if ($state -eq "Running") {
            Write-Log "'$VMName' confirmed as Running on GIEDI PRIME" "OK"
            return $true
        }
        Write-Log "'$VMName' not yet Running (state: $state) — $([math]::Round($elapsed/60,1)) min" "WARN"
        Start-Sleep -Seconds 15
        $elapsed += 15
    }
    Write-Log "'$VMName' not Running after $($TimeoutSec/60) min — continuing anyway" "ERROR"
    return $false
}

function Wait-GroupRunning {
    param([string[]]$VMNames)
    Write-Log "Waiting for all VMs in the group to be Running on GIEDI PRIME..."
    $allReady = $true
    foreach ($vmName in $VMNames) {
        $ready = Wait-VMRunningLocal -VMName $vmName
        if (-not $ready) { $allReady = $false }
    }
    return $allReady
}

# --- INITIALIZATION ----------------------------------------------------------

$logDir = Split-Path $LogFile
if (-not (Test-Path $logDir)) { New-Item -ItemType Directory -Path $logDir -Force | Out-Null }

Write-Log "============================================================"
Write-Log "START OF DRP PROCEDURE"
Write-Log "Version      : $ScriptVersion"
Write-Log "Mode         : $Mode"
Write-Log "Failover Plan: $VeeamFailoverPlan"
Write-Log "Veeam Job    : $VeeamReplicaJobName"
Write-Log "Prod Host    : $ProdHost"
if ($SkipReplication) { Write-Log "Replication  : skipped (-SkipReplication)" "WARN" }
Write-Log "============================================================"

# Load the Hyper-V module (required for Get-VM in the script context)
Write-Log "Loading Hyper-V module..."
try {
    Import-Module Hyper-V -ErrorAction Stop -WarningAction SilentlyContinue
    Write-Log "Hyper-V module loaded" "OK"
} catch {
    Write-Log "Unable to load Hyper-V module: $_" "ERROR"
    exit 1
}

# Load the Veeam module
Write-Log "Loading Veeam PowerShell module..."
try {
    Import-Module $VeeamModule -ErrorAction Stop -WarningAction SilentlyContinue
    Write-Log "Veeam module loaded" "OK"
} catch {
    Write-Log "Unable to load the Veeam module: $_" "ERROR"
    exit 1
}

# Check WinRM on HYPERV1
try {
    Invoke-Command -ComputerName $ProdHost -ScriptBlock { $env:COMPUTERNAME } -ErrorAction Stop | Out-Null
    Write-Log "WinRM connection to $ProdHost OK" "OK"
} catch {
    Write-Log "Unable to connect to $ProdHost via WinRM: $_" "ERROR"
    exit 1
}

# --- STEP 1: REPLICATION ---------------------------------------------------

if ($SkipReplication) {
    Write-Log "------------------------------------------------------------"
    Write-Log "STEP 1: Replication skipped (-SkipReplication)" "WARN"
} else {
    Write-Log "------------------------------------------------------------"
    Write-Log "STEP 1: Launching replication job '$VeeamReplicaJobName'"

    $flagFile = "C:\Scripts\DRP\DRP_MODE.flag"
    try {
        Invoke-Command -ComputerName $ProdHost -ScriptBlock {
            param($f) New-Item -Path $f -ItemType File -Force | Out-Null
        } -ArgumentList $flagFile
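        # ${ProdHost}: the braces are required, "$ProdHost:" would be parsed as a scoped variable reference
        Write-Log "DRP flag created on ${ProdHost}: $flagFile" "OK"
    } catch {
        Write-Log "Unable to create DRP flag on ${ProdHost}: $_" "ERROR"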
        exit 1
    }

    try {
        $job = Get-VBRJob -Name $VeeamReplicaJobName -ErrorAction Stop
    } catch {
        Write-Log "Replication job not found: $_" "ERROR"
        Invoke-Command -ComputerName $ProdHost -ScriptBlock {
            param($f) Remove-Item $f -Force -ErrorAction SilentlyContinue
        } -ArgumentList $flagFile
        exit 1
    }

    if ($job.IsRunning) {
        Write-Log "Job is already running, waiting for completion..." "WARN"
    } else {
        Start-VBRJob -Job $job | Out-Null
        Write-Log "Replication job starting" "OK"
    }

    Write-Log "Waiting for the job to actually start..."
    Start-Sleep -Seconds 20

    Write-Log "Waiting for replication to finish (timeout $($ReplicationTimeout/60) min)..."
    $elapsed = 0
    $success = $false
    while ($elapsed -lt $ReplicationTimeout) {
        $job = Get-VBRJob -Name $VeeamReplicaJobName
        if (-not $job.IsRunning) {
            $lastSession = Get-VBRSession -ErrorAction SilentlyContinue |
                Where-Object { $_.JobName -eq $VeeamReplicaJobName } |
                Sort-Object CreationTime -Descending |
                Select-Object -First 1
            if ($lastSession -and ($lastSession.Result -eq "Success" -or $lastSession.Result -eq "Warning")) {
                Write-Log "Replication completed successfully (Result: $($lastSession.Result))" "OK"
                $success = $true
                break
            } elseif ($lastSession -and $lastSession.Result -ne "" -and $lastSession.Result -ne "None") {
                Write-Log "Replication completed with an ERROR (Result: $($lastSession.Result))" "ERROR"
                break
            } else {
                Write-Log "Job starting, waiting for result..." "INFO"
                Start-Sleep -Seconds 30
                $elapsed += 30
                continue
            }
        }
        Start-Sleep -Seconds 30
        $elapsed += 30
        Write-Log "Replication in progress... ($([math]::Round($elapsed/60,1)) minutes elapsed)"
    }

    if (-not $success) {
        Write-Log "Replication failed or timed out. Stopping script." "ERROR"
        Write-Log "Run again with -SkipReplication if replicas are up to date." "ERROR"
        exit 1
    }
}

# --- STEP 2: SHUT DOWN NON-CRITICAL VMs ------------------------------------

Write-Log "------------------------------------------------------------"
Write-Log "STEP 2: Shut down non-critical VMs on $ProdHost (waves 5/4/3)"

foreach ($vmName in $VMShutdownNonCritical) {
    Stop-VMProprement -VMName $vmName
}

# --- STEP 3: SHUT DOWN DC/DNS + LAUNCH FAILOVER PLAN --------------------

Write-Log "------------------------------------------------------------"

if ($Mode -eq "CRASH") {
    # CRASH Mode: Shut down everything, then launch the Failover Plan
    Write-Log "STEP 3: CRASH Mode - Shut down DC/DNS on $ProdHost"
    foreach ($vmName in $VMShutdownCrash) {
        Stop-VMProprement -VMName $vmName
    }

    Write-Log "------------------------------------------------------------"
    Write-Log "STEP 4: Launching the Failover Plan '$VeeamFailoverPlan'"
    try {
        $fp = Get-VBRFailoverPlan -Name $VeeamFailoverPlan -ErrorAction Stop
        Start-VBRFailoverPlan -FailoverPlan $fp | Out-Null
        Write-Log "Failover Plan launched - VMs starting on GIEDI PRIME" "OK"
    } catch {
        Write-Log "Error launching the Failover Plan: $_" "ERROR"
        exit 1
    }

} else {
    # MCO mode: sequence in 2 groups with service continuity
    #
    # Sequence:
    #   1. Shut down Group 1 on HYPERV1 (SRV-DC1+SRV-DC2+SRV-DNS1)
    #      SRV-PDC1+SRV-PDC2+SRV-DNS2 still up on HYPERV1 -> AD/DNS continuity
    #   2. Launch the MCO-DRP Failover Plan
    #      -> Wave 1 starts SRV-DC1+SRV-DC2+SRV-DNS1 on HYPERV2
    #   3. Wait for Group 1 to be running on HYPERV2
    #   4. Shut down Group 2 on HYPERV1 (SRV-PDC1+SRV-PDC2+SRV-DNS2)
    #      SRV-DC1+SRV-DC2+SRV-DNS1 up on HYPERV2 -> AD/DNS continuity

    Write-Log "STEP 3: MCO Mode - Shut down Group 1 on $ProdHost"
    Write-Log "Group 1: SRV-DC1 + SRV-DC2 + SRV-DNS1"
    Write-Log "SRV-PDC1 + SRV-PDC2 + SRV-DNS2 remain up on $ProdHost -> AD/DNS continuity" "WARN"

    foreach ($vmName in $MCOGroup1) {
        Stop-VMProprement -VMName $vmName
    }
    Write-Log "Group 1 shut down on $ProdHost" "OK"

    Write-Log "------------------------------------------------------------"
    Write-Log "STEP 4: Launching the '$VeeamFailoverPlan' Failover Plan"
    try {
        $fp = Get-VBRFailoverPlan -Name $VeeamFailoverPlan -ErrorAction Stop
        Start-VBRFailoverPlan -FailoverPlan $fp | Out-Null
        Write-Log "Failover Plan launched - Wave 1 starts Group 1 on GIEDI PRIME" "OK"
    } catch {
        Write-Log "Error launching Failover Plan: $_" "ERROR"
        exit 1
    }

    Write-Log "------------------------------------------------------------"
    Write-Log "STEP 5: Waiting for Group 1 to run on HYPERV2"
    Write-Log "Group 1: SRV-DC1 + SRV-DC2 + SRV-DNS1"
    Wait-GroupRunning -VMNames $MCOGroup1 | Out-Null

    Write-Log "------------------------------------------------------------"
    Write-Log "STEP 6: Shutting down Group 2 on $ProdHost"
    Write-Log "Group 2: SRV-PDC1 + SRV-PDC2 + SRV-DNS2"
    Write-Log "Group 1 up on HYPERV2 -> AD/DNS continuity guaranteed" "WARN"

    foreach ($vmName in $MCOGroup2) {
        Stop-VMProprement -VMName $vmName
    }
    Write-Log "Group 2 shut down on $ProdHost" "OK"
}

# --- FINAL VERIFICATION ------------------------------------------------------

Write-Log "------------------------------------------------------------"
Write-Log "FINAL STEP: Verification - all VMs off on $ProdHost"

$allOff = $true
$vmStates = Invoke-Command -ComputerName $ProdHost -ScriptBlock {
    Get-VM | Select-Object Name, @{N="State";E={$_.State.ToString()}}
}

foreach ($vm in $vmStates) {
    if ($vm.State -ne "Off") {
        Write-Log "VM '$($vm.Name)' still in state '$($vm.State)'" "ERROR"
        $allOff = $false
    } else {
        Write-Log "VM '$($vm.Name)': Off" "OK"
    }
}

if (-not $allOff) {
    Write-Log "Some VMs not powered off - check manually" "WARN"
} else {
    Write-Log "All VMs are Off on $ProdHost" "OK"
}

# --- END ---------------------------------------------------------------------

Write-Log "============================================================"
Write-Log "DRP PROCEDURE $Mode COMPLETED"
Write-Log "Monitor VM startup in the Veeam console"
Write-Log "Full log: $LogFile"
Write-Log "============================================================"
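
The MCO sequence above relies on one invariant: after every shutdown or startup, each domain still has a running DC and one DNS forwarder is still answering. Here is a minimal sketch for checking a planned sequence on paper before running it. It is not part of the script: `Test-ServiceContinuity`, `$ServiceGroups` and the event format are hypothetical names used for illustration, and the group contents are copied from the log messages above.

```powershell
# Hypothetical role map, derived from the architecture described in this article.
$ServiceGroups = @{
    'Domain1-DC' = @('SRV-PDC1', 'SRV-DC1')
    'Domain2-DC' = @('SRV-PDC2', 'SRV-DC2')
    'DNS'        = @('SRV-DNS1', 'SRV-DNS2')
}

function Test-ServiceContinuity {
    # Replays a list of events and returns one message per service group
    # left without any running member (empty result = continuity holds).
    param(
        [hashtable]$Groups,
        [object[]]$Events   # each event: @{ Action = 'Stop' or 'Start'; VMs = @(...) }
    )

    # Initial state: every VM is running somewhere.
    $up = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($members in $Groups.Values) {
        foreach ($m in $members) { [void]$up.Add($m) }
    }

    $violations = @()
    foreach ($e in $Events) {
        foreach ($vm in $e.VMs) {
            if ($e.Action -eq 'Stop') { [void]$up.Remove($vm) } else { [void]$up.Add($vm) }
        }
        # After each event, every group must still have at least one member up.
        foreach ($name in $Groups.Keys) {
            $alive = @($Groups[$name] | Where-Object { $up.Contains($_) })
            if ($alive.Count -eq 0) {
                $violations += "$name down after $($e.Action) $($e.VMs -join '+')"
            }
        }
    }
    return ,$violations
}

# Group contents as stated in the log messages of the failover script.
$MCOGroup1 = @('SRV-DC1', 'SRV-DC2', 'SRV-DNS1')
$MCOGroup2 = @('SRV-PDC1', 'SRV-PDC2', 'SRV-DNS2')

$mcoPlan = @(
    @{ Action = 'Stop';  VMs = $MCOGroup1 },   # Group 1 off on HYPERV1
    @{ Action = 'Start'; VMs = $MCOGroup1 },   # Group 1 running on HYPERV2
    @{ Action = 'Stop';  VMs = $MCOGroup2 }    # Group 2 off on HYPERV1
)

$result = Test-ServiceContinuity -Groups $ServiceGroups -Events $mcoPlan
```

An empty `$result` means the sequence never leaves a domain without a DC or the network without a DNS forwarder. Removing the middle `Start` event makes all three groups show up as violations, which is exactly what would happen if Group 2 were shut down before Group 1 is actually running on HYPERV2.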

Start-FailbackToProd.ps1

Failback from HYPERV2 (DRP) to HYPERV1 (prod). Interactive menu upon launch.

# Usage
.\Start-FailbackToProd.ps1          # Interactive menu
.\Start-FailbackToProd.ps1 -WhatIf  # Dry run
# =============================================================================
# Start-FailbackToProd.ps1
# Script for returning to production from HYPERV2
#
# Service continuity principle:
#   VMs are processed in PAIRS to ensure that one DC per domain
#   and one DNS are always up throughout the entire failback process.
#
#   For each VM in order:
#   1. Stop-VBRReplicaFailover   → Individual undo, shuts down the replica on HYPERV2
#                                  without affecting other VMs in the Failover Plan
#   2. Start-VBRHvReplicaFailback (blocking) → resync to HYPERV1
#   3. Stop-VBRHvReplicaFailback (index 1)   → commit
#   4. Wait 15s for VHDX release
#   5. Start-VM on HYPERV1
#   6. Wave delay → Next VM
#
#   Example Domain1 (without interruption):
#   SRV-PDC1  undo → failback → commit → start HYPERV1 (120s)
#                SRV-DC1 still in failover on HYPERV2 → D1 covered ✅
#   SRV-DC1   undo → failback → commit → start HYPERV1 (60s)
#                SRV-PDC1 up on HYPERV1 → D1 covered ✅
#
# Prerequisites:
#   - WinRM enabled on HYPERV1
#   - Admin rights on HYPERV1
#   - Veeam Backup & Replication console installed on HYPERV2
#   - Run the script in an elevated (Administrator) PowerShell session on HYPERV2
#   - DRP VMs must be in Failover state in Veeam
#
# Usage:
#   .\Start-FailbackToProd.ps1          -> actual execution (interactive menu)
#   .\Start-FailbackToProd.ps1 -WhatIf  -> dry run (simulation without action)
#
# =============================================================================
# WARNING: SCRIPT MAINTENANCE - READ BEFORE MAKING ANY CHANGES
# =============================================================================
#
# Whenever a VM is added or removed from the infrastructure:
#
#   1. Update $VMStartOrder below
#      Follow the order in PAIRS for DC/DNS continuity:
#      - Domain X PDC first, then Domain X secondary DC
#      - Primary DNS first, then secondary DNS
#
#   2. Update the CHANGELOG
#
# Reminder of the order (pairs for service continuity):
#   Wave 1: SRV-PDC1 (120s) > SRV-PDC2 (90s) > SRV-DNS1 (60s)
#   Wave 2: SRV-DC1 (60s) > SRV-DC2 (60s) > SRV-DNS2 (30s)
#   Wave 3: SRV-RADIUS (45s) > SRV-PROXY (30s) > SRV-SMTP (30s) > SRV-PASSBOLT (30s)
#   Wave 4: SRV-SIEM (45s) > SRV-MONITORING (45s) > SRV-WSUS (30s)
#           > SRV-PRINT (30s) > SRV-PKI (20s)
#   Wave 5: SRV-PXE (20s) > WS-01 (20s) > WS-02 (20s) > WS-03 (20s)
#
# =============================================================================
# CHANGELOG
# =============================================================================
# 2026-03-28 - v1.0 - Initial release
# 2026-03-28 - v1.1 - Complete refactoring: VM-by-VM logic
# 2026-03-28 - v1.2 - Added DC verification before commit
# 2026-04-03 - v1.3 - Added SRV-PASSBOLT (Passbolt) in Wave 3 after SRV-SMTP (30s)
# 2026-04-03 - v1.4 - Removed RunAsync, commit index 1, global Undo, 15s delay
# 2026-04-03 - v1.5 - Fixed active plan detection via restore points
# 2026-04-03 - v1.6 - Simplified Undo logic
# 2026-04-03 - v1.7 - Added interactive MCO/CRASH menu
# 2026-04-03 - v1.8 - Order by pairs, removed global Undo
# 2026-04-03 - v1.9 - Individual Stop-VBRReplicaFailover before each failback
# 2026-04-03 - v2.0 - Final order: SRV-PDC1 > SRV-PDC2 > SRV-DNS1 > SRV-DC1
#                     > SRV-DC2 > SRV-DNS2 > services > workstations.
#                     Replaced global Undo with individual Stop-VBRReplicaFailover
#                     before each failback: ensures that the Failover Plan does
#                     not restart the replica after commit.
#                     Zero service interruption.
# =============================================================================

param(
    [ValidateSet("MCO","CRASH")]
    [string]$Mode = "",
    [switch]$WhatIf
)

# --- INTERACTIVE MENU ---------------------------------------------------------
if ($Mode -eq "") {
    Write-Host ""
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host "  FAILBACK PROCEDURE TO PRODUCTION" -ForegroundColor Cyan
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host ""
    Write-Host "  [1] MCO FAILBACK    " -ForegroundColor Yellow -NoNewline
    Write-Host "- Return after scheduled maintenance"
    Write-Host "               Failover Plan: MCO-DRP"
    Write-Host ""
    Write-Host "  [2] CRASH FAILBACK  " -ForegroundColor Red -NoNewline
    Write-Host "- Return after disaster"
    Write-Host "               Failover Plan: FULL-DRP"
    Write-Host ""
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host ""
    $choice = Read-Host "Your choice (1/2)"

    switch ($choice) {
        "1" { $Mode = "MCO";   Write-Host "`nMCO FAILBACK mode selected."   -ForegroundColor Yellow }
        "2" { $Mode = "CRASH"; Write-Host "`nCRASH FAILBACK mode selected." -ForegroundColor Red }
        default {
            Write-Host "`nInvalid choice. Script terminated." -ForegroundColor Red
            exit 1
        }
    }

    Write-Host ""
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host "  CONFIRMATION" -ForegroundColor Cyan
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host "  Mode         : FAILBACK $Mode" -ForegroundColor White
    Write-Host "  Prod host    : HYPERV1" -ForegroundColor White
    Write-Host "============================================================" -ForegroundColor Cyan
    Write-Host ""
    $confirm = Read-Host "Confirm launch? (Y/N)"
    # -notmatch is case-insensitive, so both "Y" and "y" confirm
    if ($confirm -notmatch '^Y$') {
        Write-Host "Cancelled by user." -ForegroundColor Yellow
        exit 0
    }
}

# --- CONFIGURATION -----------------------------------------------------------
$ScriptVersion   = "2.0"
$ProdHost        = "HYPERV1"
$LogFile         = "C:\Scripts\DRP\Logs\FAILBACK_${Mode}_$(Get-Date -Format 'yyyyMMdd_HHmmss').log"
$VeeamModule     = "C:\Program Files\Veeam\Backup and Replication\Console\Veeam.Backup.PowerShell.dll"

# Production startup order
$VMStartOrder = @(
    # Wave 1 - Primary DCs + Primary DNS
    @{ Name = "SRV-PDC1";       Delay = 120 },
    @{ Name = "SRV-PDC2";       Delay = 90  },
    @{ Name = "SRV-DNS1";       Delay = 60  },
    # Wave 2 - Secondary DCs + Secondary DNS
    @{ Name = "SRV-DC1";        Delay = 60  },
    @{ Name = "SRV-DC2";        Delay = 60  },
    @{ Name = "SRV-DNS2";       Delay = 30  },
    # Wave 3 - Network services
    @{ Name = "SRV-RADIUS";     Delay = 45  },
    @{ Name = "SRV-PROXY";      Delay = 30  },
    @{ Name = "SRV-SMTP";       Delay = 30  },
    @{ Name = "SRV-PASSBOLT";   Delay = 30  },
    # Wave 4 - Application Services
    @{ Name = "SRV-SIEM";       Delay = 45  },
    @{ Name = "SRV-MONITORING"; Delay = 45  },
    @{ Name = "SRV-WSUS";       Delay = 30  },
    @{ Name = "SRV-PRINT";      Delay = 30  },
    @{ Name = "SRV-PKI";        Delay = 20  },
    # Wave 5 - Workstations
    @{ Name = "SRV-PXE";        Delay = 20  },
    @{ Name = "WS-01";          Delay = 20  },
    @{ Name = "WS-02";          Delay = 20  },
    @{ Name = "WS-03";          Delay = 20  }
)

# --- FUNCTIONS ---------------------------------------------------------------

function Write-Log {
    param([string]$Message, [string]$Level = "INFO")
    $timestamp = Get-Date -Format "yyyy-MM-dd HH:mm:ss"
    $prefix = if ($WhatIf) { "[WHATIF] " } else { "" }
    $line = "[$timestamp] [$Level] $prefix$Message"
    Write-Host $line -ForegroundColor $(switch ($Level) {
        "INFO"   { "Cyan" }
        "OK"     { "Green" }
        "WARN"   { "Yellow" }
        "ERROR"  { "Red" }
        default  { "White" }
    })
    Add-Content -Path $LogFile -Value $line
}

function Get-FailoverRestorePoint {
    # Index 0: most recent restore point in Failover state, passed to
    # Stop-VBRReplicaFailover before Start-VBRHvReplicaFailback runs
    param([string]$VmName)
    return Get-VBRRestorePoint |
        Where-Object { $_.IsReplica() -and $_.VmName -eq $VmName -and $_.State.ToString() -eq "Failover" } |
        Sort-Object CreationTime -Descending |
        Select-Object -First 1
}

function Get-CommitRestorePoint {
    # Index 1: second most recent RP after failback
    # Start-VBRHvReplicaFailback creates a new RP (index 0)
    # We commit index 1 (the older RP) so the VM does not stay stuck in the LockedItem state
    param([string]$VmName)
    return Get-VBRRestorePoint |
        Where-Object { $_.IsReplica() -and $_.VmName -eq $VmName } |
        Sort-Object CreationTime -Descending |
        Select-Object -Skip 1 -First 1
}

# --- INITIALIZATION ----------------------------------------------------------

$logDir = Split-Path $LogFile
if (-not (Test-Path $logDir)) { New-Item -ItemType Directory -Path $logDir -Force | Out-Null }

Write-Log "============================================================"
Write-Log "START OF THE FAILBACK TO PRODUCTION PROCEDURE"
Write-Log "Version      : $ScriptVersion"
Write-Log "Mode         : FAILBACK $Mode"
Write-Log "Prod Host    : $ProdHost"
if ($WhatIf) { Write-Log "DRY RUN MODE - NO ACTUAL ACTION" "WARN" }
Write-Log "============================================================"

# Load the Veeam module
Write-Log "Loading the Veeam PowerShell module..."
try {
    Import-Module $VeeamModule -ErrorAction Stop -WarningAction SilentlyContinue
    Write-Log "Veeam module loaded" "OK"
} catch {
    Write-Log "Unable to load the Veeam module: $_" "ERROR"
    exit 1
}

# Check WinRM on HYPERV1
try {
    Invoke-Command -ComputerName $ProdHost -ScriptBlock { $env:COMPUTERNAME } -ErrorAction Stop | Out-Null
    Write-Log "WinRM connection to $ProdHost OK" "OK"
} catch {
    Write-Log "Unable to connect to $ProdHost via WinRM: $_" "ERROR"
    exit 1
}

# --- PRE-FLIGHT VERIFICATION -------------------------------------------------

Write-Log "------------------------------------------------------------"
Write-Log "VERIFICATION: Restore points in Failover state"

$missingVMs = @()
foreach ($vm in $VMStartOrder) {
    $rp = Get-FailoverRestorePoint -VmName $vm.Name
    if (-not $rp) {
        Write-Log "WARNING: No failover restore point for '$($vm.Name)'" "WARN"
        $missingVMs += $vm.Name
    } else {
        Write-Log "OK: '$($vm.Name)' → restore point from $($rp.CreationTime)" "OK"
    }
}

if ($missingVMs.Count -gt 0) {
    Write-Log "$($missingVMs.Count) VM(s) without a failover restore point: $($missingVMs -join ', ')" "WARN"
    Write-Log "These VMs will be ignored for failback" "WARN"
}

# --- DRY RUN -----------------------------------------------------------------

if ($WhatIf) {
    Write-Log "------------------------------------------------------------"
    Write-Log "DRY RUN - Simulation of the paired procedure:" "WARN"
    Write-Log "Continuity guaranteed: 1 DC per domain + 1 DNS always up" "WARN"
    Write-Log "------------------------------------------------------------" "WARN"
    foreach ($vm in $VMStartOrder) {
        $rp = Get-FailoverRestorePoint -VmName $vm.Name
        if ($rp) {
            Write-Log "   Stop-VBRReplicaFailover '$($vm.Name)' — HYPERV2 replica shut down" "WARN"
            Write-Log "   Failback '$($vm.Name)' (blocking — resync to HYPERV1)" "WARN"
            Write-Log "   Commit failback '$($vm.Name)' (index 1)" "WARN"
            Write-Log "   Waiting 15s for VHDX release" "WARN"
        } else {
            Write-Log "   '$($vm.Name)' not in Failover — direct startup if present" "WARN"
        }
        Write-Log "   Starting '$($vm.Name)' on $ProdHost — delay $($vm.Delay)s" "WARN"
        Write-Log "  ---" "WARN"
    }
    Write-Log "------------------------------------------------------------"
    Write-Log "DRY RUN COMPLETED - No action taken" "WARN"
    Write-Log "Run again without -WhatIf to execute" "WARN"
    exit 0
}

# --- VM-BY-VM PROCESSING ----------------------------------------------------

Write-Log "------------------------------------------------------------"
Write-Log "STARTING VM-BY-VM FAILBACK (pair-wise order)"
Write-Log "Continuity guaranteed: 1 DC per domain + 1 DNS always up" "OK"

foreach ($vm in $VMStartOrder) {
    $vmName = $vm.Name
    $delay  = $vm.Delay

    Write-Log "============ $vmName ============"

    $rp = Get-FailoverRestorePoint -VmName $vmName

    if ($rp) {
        # STEP A - Individual undo via Stop-VBRReplicaFailover
        # Cleanly shuts down the replica on HYPERV2 without affecting the others
        # Prevents the Failover Plan from restarting the replica after commit
        Write-Log "[$vmName] Individual undo (Stop-VBRReplicaFailover)..."
        try {
            Stop-VBRReplicaFailover -RestorePoint $rp -ErrorAction Stop | Out-Null
            Write-Log "[$vmName] Replica shut down on HYPERV2" "OK"
        } catch {
            Write-Log "[$vmName] Stop-VBRReplicaFailover error: $_" "ERROR"
        }

        # STEP B - Blocking failback to HYPERV1
        Write-Log "[$vmName] Starting failback (blocking - resync to HYPERV1)..."
        $rpFresh = Get-VBRRestorePoint |
            Where-Object { $_.IsReplica() -and $_.VmName -eq $vmName } |
            Sort-Object CreationTime -Descending |
            Select-Object -First 1
        if ($rpFresh) {
            try {
                Start-VBRHvReplicaFailback `
                    -RestorePoint $rpFresh `
                    -QuickRollback `
                    -PowerOn:$false `
                    -ErrorAction Stop | Out-Null
                Write-Log "[$vmName] Failback completed" "OK"
            } catch {
                Write-Log "[$vmName] Failback error: $_" "ERROR"
            }
        } else {
            Write-Log "[$vmName] No restore point available for failback" "ERROR"
        }

        # STEP C - Commit on index 1
        Write-Log "[$vmName] Commit failback (index 1)..."
        $rpCommit = Get-CommitRestorePoint -VmName $vmName
        if ($rpCommit) {
            try {
                Stop-VBRHvReplicaFailback -RestorePoint $rpCommit -ErrorAction Stop | Out-Null
                Write-Log "[$vmName] Commit OK (RP from $($rpCommit.CreationTime))" "OK"
            } catch {
                Write-Log "[$vmName] Commit error: $_" "WARN"
            }
        } else {
            Write-Log "[$vmName] No RP index 1 found" "WARN"
        }

        # STEP D - Waiting for VHDX release
        Write-Log "[$vmName] Waiting 15s for VHDX release..."
        Start-Sleep -Seconds 15

    } else {
        Write-Log "[$vmName] No Failover restore point - VM skipped for failback" "WARN"
    }

    # STEP E - Start the VM on HYPERV1
    $vmState = Invoke-Command -ComputerName $ProdHost -ScriptBlock {
        param($name)
        $v = Get-VM -Name $name -ErrorAction SilentlyContinue
        if ($v) { $v.State.ToString() } else { "NotFound" }
    } -ArgumentList $vmName

    if ($vmState -eq "NotFound") {
        Write-Log "[$vmName] VM not found on $ProdHost" "WARN"
    } elseif ($vmState -eq "Running") {
        Write-Log "[$vmName] VM already running on $ProdHost" "OK"
    } else {
        try {
            Invoke-Command -ComputerName $ProdHost -ScriptBlock {
                param($name)
                Start-VM -Name $name -ErrorAction Stop
            } -ArgumentList $vmName
            Write-Log "[$vmName] VM started on $ProdHost" "OK"
        } catch {
            Write-Log "[$vmName] Startup error on $ProdHost: $_" "ERROR"
        }
    }

    # STEP F - Delay before next VM
    Write-Log "[$vmName] Waiting $delay seconds before the next VM..."
    Start-Sleep -Seconds $delay
}

# --- END ---------------------------------------------------------------------

Write-Log "============================================================"
Write-Log "FAILBACK COMPLETE - PRODUCTION RESTORED ON $ProdHost"
Write-Log "Check the services (AD, DNS, DHCP, auth...)"
Write-Log "Don't forget to re-enable VM autostart on $ProdHost:"
Write-Log "  Get-VM | Set-VM -AutomaticStartAction StartIfRunning"
Write-Log "Full log: $LogFile"
Write-Log "============================================================"
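
The delays in `$VMStartOrder` add up to a fixed floor for the whole failback, before counting any resync time. Here is a quick sketch of that arithmetic, assuming every VM goes through the failback branch (so the 15 s VHDX pause applies to all 19 of them). `Get-MinimumFailbackWait` is a hypothetical helper for illustration, not part of the script.

```powershell
# Same order and delays as in Start-FailbackToProd.ps1
$VMStartOrder = @(
    @{ Name = 'SRV-PDC1';       Delay = 120 },
    @{ Name = 'SRV-PDC2';       Delay = 90  },
    @{ Name = 'SRV-DNS1';       Delay = 60  },
    @{ Name = 'SRV-DC1';        Delay = 60  },
    @{ Name = 'SRV-DC2';        Delay = 60  },
    @{ Name = 'SRV-DNS2';       Delay = 30  },
    @{ Name = 'SRV-RADIUS';     Delay = 45  },
    @{ Name = 'SRV-PROXY';      Delay = 30  },
    @{ Name = 'SRV-SMTP';       Delay = 30  },
    @{ Name = 'SRV-PASSBOLT';   Delay = 30  },
    @{ Name = 'SRV-SIEM';       Delay = 45  },
    @{ Name = 'SRV-MONITORING'; Delay = 45  },
    @{ Name = 'SRV-WSUS';       Delay = 30  },
    @{ Name = 'SRV-PRINT';      Delay = 30  },
    @{ Name = 'SRV-PKI';        Delay = 20  },
    @{ Name = 'SRV-PXE';        Delay = 20  },
    @{ Name = 'WS-01';          Delay = 20  },
    @{ Name = 'WS-02';          Delay = 20  },
    @{ Name = 'WS-03';          Delay = 20  }
)

function Get-MinimumFailbackWait {
    # Sums the per-VM wave delays plus one VHDX pause per VM.
    # Resync and commit durations are NOT included: this is only a lower bound.
    param(
        [object[]]$Order,
        [int]$VhdxPauseSeconds = 15
    )
    $delaySum = ($Order | ForEach-Object { $_.Delay } | Measure-Object -Sum).Sum
    $seconds  = $delaySum + ($Order.Count * $VhdxPauseSeconds)
    return [timespan]::FromSeconds($seconds)
}

$floor = Get-MinimumFailbackWait -Order $VMStartOrder
# 805 s of wave delays + 19 x 15 s of VHDX pauses = 1090 s, i.e. 18 min 10 s
Write-Host "Minimum fixed wait: $($floor.ToString())"
```

So even with instantaneous resyncs, a full failback spends a little over 18 minutes just waiting. The real duration depends on how much data each `Start-VBRHvReplicaFailback` call has to send back to HYPERV1.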