Cyber Intelligence
Cloud Security | 15 min read

Cloud Incident Response Playbook 2026: Azure Sentinel, Defender XDR, and KQL

Responding to a security incident in the cloud is fundamentally different from on-premises IR. There is no physical access to affected machines, resources spin up and disappear in minutes, and the blast radius of a compromised identity can span an entire tenant in seconds. This playbook walks through the full NIST incident response lifecycle applied to Azure environments, with concrete KQL triage queries for Microsoft Sentinel, Defender XDR containment actions, evidence collection from Azure-native forensics sources, and a post-incident review framework. Whether you are handling a compromised service principal, an insider data exfiltration event, or a mass resource deletion, this guide gives you the exact commands, queries, and decision points to work through each phase systematically.

Microsoft Cloud Solution Architect
Incident Response, Microsoft Sentinel, Defender XDR, KQL, Azure Security, Cloud Security, Threat Detection

Why Cloud Incident Response Is Different from On-Premises IR

Cloud incident response is not on-premises IR with a VPN. The differences are architectural, operational, and legal, and treating them the same is a reliable way to lose evidence, miss lateral movement, and miscontain an active attacker.

Four structural differences define cloud IR:

  • Shared responsibility changes your forensic surface. In Azure, Microsoft manages the physical infrastructure, the hypervisor layer, and the network fabric, so you do not get kernel-level memory dumps of the underlying host. Apart from what you can capture from your own VMs (such as disk snapshots), your forensic evidence lives in logs: Azure Activity Logs, Microsoft Entra sign-in logs, Defender for Cloud alerts, and Sentinel incidents. If a diagnostic log was not enabled before the incident, the events it would have captured were never recorded.
  • Ephemeral resources evaporate evidence. A compromised Azure Container Instance or serverless Function App may be deleted seconds after a malicious payload executes, leaving no disk image to collect. Your only forensic record is what diagnostic logs and network flow logs captured before the resource terminated. This makes pre-incident log coverage a hard prerequisite, not a nice-to-have.
  • Identities, not endpoints, are the primary attack surface. On-premises attackers pivot by compromising machines. Cloud attackers pivot by stealing tokens, escalating permissions on service principals, and assigning themselves roles across subscriptions. A single compromised Entra ID service principal with Contributor access on a subscription can exfiltrate data from every storage account, spin up coin miners in every region, and delete every resource group, all without touching a single VM.
  • Containment is API-driven and fast. The good news: you can revoke all refresh tokens for a compromised account with a single PowerShell command, block public access to a storage account in under 30 seconds, and lock down a VM by modifying a Network Security Group rule, all without physical access. Cloud containment is faster than on-premises containment, but only if your runbooks are ready before the incident starts.

---

The NIST IR Framework Applied to Azure

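NIST SP 800-61 (Rev. 2) describes a four-phase incident response lifecycle: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. The table below splits the combined phases into seven stages and maps each to Azure-specific tooling and actions.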

NIST Phase | Azure Tooling | Key Actions
Preparation | Sentinel, Defender for Cloud, Entra ID | Enable diagnostic logs, deploy analytics rules, define runbooks
Detection | Sentinel incidents, Defender XDR alerts, UEBA | Triage high-fidelity alerts, correlate identity anomalies
Analysis | KQL in Sentinel, Activity Logs, Purview | Scope blast radius, identify compromised identities and resources
Containment | PowerShell, Azure Portal, NSGs | Disable accounts, revoke tokens, isolate VMs, block storage
Eradication | Entra ID, Key Vault, Azure Policy | Remove backdoors, rotate secrets, remediate misconfigurations
Recovery | Backup/Restore, Defender for Cloud | Restore from clean backups, re-enable hardened resources
Post-Incident | Sentinel workbooks, IR review | Document timeline, update analytics rules, close gaps
---

[Diagram: Full IR process flowchart]

---

Phase 1: Preparation

Preparation is the phase that determines whether every other phase succeeds or fails. By the time an alert fires, it is too late to enable the logs you needed.

Diagnostic log coverage checklist:

  • Microsoft Entra ID sign-in and audit logs, including service principal sign-ins: routed to Sentinel (retention minimum 90 days)
  • Azure Activity Logs: all subscriptions, all regions, routed to Log Analytics workspace
  • Defender for Cloud: all plans enabled (Servers, Storage, Key Vault, Databases, Containers)
  • Microsoft Defender for Identity: deployed for hybrid identity environments
  • Azure Network Watcher: flow logs enabled on critical subnets (use VNet flow logs for new deployments; Microsoft is retiring NSG flow logs)
  • Microsoft Purview: audit logging enabled for SharePoint, Exchange, OneDrive
Sentinel analytics rules to enable on day one:
  1. Successful brute force attack on Microsoft Entra ID (built-in)
  2. Sign-in from an IP address matching threat intelligence (built-in)
  3. Rare subscription-level operations by a user (built-in)
  4. MFA disabled for a user (custom rule)
  5. Privileged role assigned outside working hours (custom rule)
Runbook prerequisites: Pre-approved break-glass accounts with documentation, IR team contact list, legal and privacy counsel on retainer, defined escalation thresholds (P1/P2/P3).
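A quick way to verify the checklist is to confirm that every table the triage queries in this playbook depend on is actually receiving data. The Python sketch below assumes you have exported the newest TimeGenerated per table (for example with a `union withsource=TableName ... | summarize max(TimeGenerated) by TableName` query) into a dictionary. The table list and the 24-hour staleness threshold are illustrative assumptions, not Microsoft guidance.

```python
from datetime import datetime, timedelta, timezone

# Tables the triage queries in this playbook read from (assumed list; extend it
# to match the connectors you actually rely on).
REQUIRED_TABLES = [
    "SigninLogs",
    "AuditLogs",
    "AADServicePrincipalSignInLogs",
    "AzureActivity",
    "StorageBlobLogs",
]


def coverage_gaps(last_seen, now, max_age=timedelta(hours=24)):
    """Return {table: reason} for required tables that are missing or stale.

    last_seen maps a table name to the datetime of its newest record.
    """
    gaps = {}
    for table in REQUIRED_TABLES:
        newest = last_seen.get(table)
        if newest is None:
            gaps[table] = "no data"
        elif now - newest > max_age:
            gaps[table] = f"stale: newest record is {now - newest} old"
    return gaps


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "SigninLogs": now - timedelta(minutes=5),
    "AuditLogs": now - timedelta(minutes=7),
    "AzureActivity": now - timedelta(days=3),
}
for table, reason in coverage_gaps(last_seen, now).items():
    print(f"{table}: {reason}")
```

Running a check like this on a schedule turns the checklist into something you can alert on, instead of discovering a gap during an incident.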

---

Phase 2: Detection with Sentinel and Defender XDR

Detection in Azure comes from two primary sources that operate differently and complement each other.

Microsoft Sentinel: SIEM for Correlation

Sentinel ingests logs from across your Azure environment and applies analytics rules to generate incidents. High-fidelity alert categories to prioritize during initial triage:

  • Identity-based: impossible travel, sign-in from anonymous IP, mass account lockout
  • Privilege escalation: role assignment to new accounts, PIM activation outside normal hours
  • Data access anomalies: UEBA-flagged bulk download, unusual storage account access from new IP
  • Resource manipulation: mass resource deletion, unusual deployment activity

Defender XDR: XDR for Endpoint and Cloud App Correlation

Defender XDR (formerly Microsoft 365 Defender) correlates signals across endpoints, identities, email, and cloud apps into a unified incident graph. When Defender XDR and Sentinel are connected through the unified SOC portal, incidents from both platforms merge into a single investigation view.

Key Defender XDR features for cloud IR:

  • Attack story graph: visualizes the full kill chain across affected entities
  • Automatic investigation: AI-driven playbooks that auto-remediate low-complexity threats
  • Threat intelligence overlay: maps observed TTPs to MITRE ATT&CK techniques
  • Advanced hunting: cross-product KQL queries spanning endpoints, identities, and cloud apps

UEBA Anomalies in Sentinel

Microsoft Sentinel's User and Entity Behavior Analytics (UEBA) builds baselines of normal behavior for users and resources, then surfaces anomalies without requiring rule authors to define thresholds manually. Anomalies to treat as high-priority during an investigation:

  • User logged in from a new country never seen in their history
  • Service principal calling an API it has never called before
  • Storage account accessed by a new IP address with high data volume
  • User performing first-time privileged operation
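UEBA's models belong to Microsoft and are not public, but the first-seen idea behind the "new country" anomaly is easy to illustrate. This Python sketch is a simplified stand-in, not how Sentinel computes it: it walks sign-ins in time order and flags a country the user has never used before, once the user has a minimum history (`min_history` is an assumed parameter).

```python
from collections import defaultdict


def first_seen_countries(signins, min_history=5):
    """Flag sign-ins from a country the user has never signed in from before.

    signins: iterable of (timestamp, user, country) tuples sorted by time.
    min_history: skip users with fewer prior sign-ins than this, because a
    thin baseline makes every country look new.
    """
    seen = defaultdict(set)    # user -> countries observed so far
    count = defaultdict(int)   # user -> number of prior sign-ins
    flagged = []
    for ts, user, country in signins:
        if count[user] >= min_history and country not in seen[user]:
            flagged.append((ts, user, country))
        seen[user].add(country)
        count[user] += 1
    return flagged


history = [(hour, "alice@contoso.com", "DE") for hour in range(6)]
history.append((6, "alice@contoso.com", "BR"))  # first sign-in from a new country
history.append((7, "bob@contoso.com", "US"))    # no baseline yet, so not flagged
print(first_seen_countries(history))
```

The `min_history` guard is the design choice that matters: without a baseline, every first sign-in looks anomalous and the signal drowns in noise.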

---

Phase 3: Analysis with KQL Triage Queries

Analysis begins the moment an incident is confirmed. The goal is to answer four questions as fast as possible: What happened? When did it start? What was accessed or modified? What identities and resources are in scope?

The following KQL queries run against Azure Log Analytics workspaces connected to Sentinel. Treat them as starting points: tune the thresholds and time windows to your environment before relying on them.

Query 1: Suspicious Sign-Ins in the Last Hour

SigninLogs
| where RiskLevelDuringSignIn in ('high', 'medium')
| where TimeGenerated > ago(1h)
| project TimeGenerated, UserPrincipalName, AppDisplayName,
          IPAddress, Location, RiskLevelDuringSignIn,
          RiskDetail, ConditionalAccessStatus
| sort by TimeGenerated desc

This query surfaces risky sign-ins flagged by Entra ID Identity Protection. Focus on users with RiskLevelDuringSignIn == 'high' who also have Contributor or Owner roles on production subscriptions.
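To apply that focus without comparing two lists by eye, join the risky sign-ins with an export of privileged role holders. In this Python sketch only `UserPrincipalName` and `RiskLevelDuringSignIn` come from Query 1; the role-assignment export and its field names (`principal`, `role`) are hypothetical.

```python
# Field names for sign-ins match Query 1's output; the role-assignment
# export format (principal, role) is a hypothetical example.
PRIVILEGED_ROLES = {"Owner", "Contributor"}


def high_risk_privileged(signins, role_assignments):
    """Return sorted UPNs with a high-risk sign-in that also hold a privileged role."""
    privileged = {
        a["principal"].lower()
        for a in role_assignments
        if a["role"] in PRIVILEGED_ROLES
    }
    return sorted({
        s["UserPrincipalName"].lower()
        for s in signins
        if s["RiskLevelDuringSignIn"] == "high"
        and s["UserPrincipalName"].lower() in privileged
    })


signins = [
    {"UserPrincipalName": "Ops.Admin@contoso.com", "RiskLevelDuringSignIn": "high"},
    {"UserPrincipalName": "analyst@contoso.com", "RiskLevelDuringSignIn": "high"},
    {"UserPrincipalName": "dev@contoso.com", "RiskLevelDuringSignIn": "medium"},
]
assignments = [
    {"principal": "ops.admin@contoso.com", "role": "Owner"},
    {"principal": "dev@contoso.com", "role": "Contributor"},
    {"principal": "analyst@contoso.com", "role": "Reader"},
]
print(high_risk_privileged(signins, assignments))
```

UPNs are compared case-insensitively because the same identity can appear with different casing across log sources.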

Query 2: Privileged Role Assignments in the Last 24 Hours

AuditLogs
| where OperationName contains 'Add member to role'
| where TimeGenerated > ago(24h)
| extend TargetUser = tostring(TargetResources[0].userPrincipalName)
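// The role name is stored in modifiedProperties ('Role.DisplayName'), not in TargetResources[0].displayName
| mv-apply Prop = TargetResources[0].modifiedProperties on (
    where tostring(Prop.displayName) == 'Role.DisplayName'
    | extend RoleName = trim('"', tostring(Prop.newValue)))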
| extend InitiatedBy = tostring(InitiatedBy.user.userPrincipalName)
| project TimeGenerated, OperationName, TargetUser, RoleName,
          InitiatedBy, Result
| sort by TimeGenerated desc

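Any role assignment not matching an approved change ticket is a containment trigger. Pay particular attention to Global Administrator and Privileged Role Administrator assignments. Subscription Owner is an Azure RBAC role, so those assignments appear in AzureActivity (operation Microsoft.Authorization/roleAssignments/write) rather than in AuditLogs.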
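Matching assignments to change tickets can be sketched in Python as below. The ticket format (`user`, `role`, `start`, `end`) and the `HIGH_IMPACT_ROLES` set are hypothetical; adapt them to whatever your change-management system exports. Unmatched assignments come back with a `critical` flag so the highest-impact roles are handled first.

```python
from datetime import datetime

# Roles that should jump the containment queue (illustrative; extend as needed).
HIGH_IMPACT_ROLES = {"Global Administrator", "Privileged Role Administrator"}


def unapproved_assignments(assignments, tickets):
    """Return role assignments that no approved change ticket covers.

    assignments: dicts with TimeGenerated (datetime), TargetUser and RoleName,
    as produced by Query 2. tickets: hypothetical dicts with user, role,
    start and end describing an approved change window.
    """
    def covered(a):
        return any(
            t["user"].lower() == a["TargetUser"].lower()
            and t["role"] == a["RoleName"]
            and t["start"] <= a["TimeGenerated"] <= t["end"]
            for t in tickets
        )

    flagged = [
        {**a, "critical": a["RoleName"] in HIGH_IMPACT_ROLES}
        for a in assignments
        if not covered(a)
    ]
    flagged.sort(key=lambda a: a["TimeGenerated"], reverse=True)  # newest first
    flagged.sort(key=lambda a: not a["critical"])  # stable: critical roles first
    return flagged


assignments = [
    {"TimeGenerated": datetime(2026, 6, 2, 9, 15), "TargetUser": "helpdesk@contoso.com", "RoleName": "User Administrator"},
    {"TimeGenerated": datetime(2026, 6, 2, 3, 40), "TargetUser": "svc-temp@contoso.com", "RoleName": "Global Administrator"},
    {"TimeGenerated": datetime(2026, 6, 2, 10, 5), "TargetUser": "newhire@contoso.com", "RoleName": "User Administrator"},
]
tickets = [
    {"user": "newhire@contoso.com", "role": "User Administrator",
     "start": datetime(2026, 6, 2, 9), "end": datetime(2026, 6, 2, 17)},
]
for a in unapproved_assignments(assignments, tickets):
    print(a["TimeGenerated"], a["TargetUser"], a["RoleName"], "CRITICAL" if a["critical"] else "")
```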

Query 3: Unusual Resource Deletions in the Last Hour

AzureActivity
| where OperationNameValue contains 'delete'
| where ActivityStatusValue in~ ('Success', 'Succeeded') // the status value differs across AzureActivity schema versions
| where TimeGenerated > ago(1h)
| project TimeGenerated, OperationNameValue, ResourceGroup,
          _ResourceId, Caller, CallerIpAddress, SubscriptionId
| sort by TimeGenerated desc

Mass resource deletion is a leading indicator of ransomware or a destructive attack in cloud environments. If this query returns more deletions in a short window than your normal change volume explains, escalate to P1 immediately.
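A fixed one-hour query can split a burst across two hours or bury it among routine deletions. This Python sketch counts each caller's successful deletions in a sliding window instead. The 10-minute window and threshold of 5 are assumptions to tune against your normal change volume.

```python
from collections import deque
from datetime import datetime, timedelta


def deletion_bursts(events, window=timedelta(minutes=10), threshold=5):
    """Return (caller, first_delete, count) for each burst of deletions.

    events: (timestamp, caller) tuples for successful deletes, sorted by time,
    e.g. exported from Query 3. A burst is `threshold` deletes by one caller
    within `window`. After a burst is reported the caller's window is cleared,
    so continued deletions produce a new report rather than one per event.
    """
    recent = {}
    bursts = []
    for ts, caller in events:
        q = recent.setdefault(caller, deque())
        q.append(ts)
        while ts - q[0] > window:  # drop deletes that fell out of the window
            q.popleft()
        if len(q) >= threshold:
            bursts.append((caller, q[0], len(q)))
            q.clear()
    return bursts


start = datetime(2026, 6, 3, 2, 0)
events = [(start + timedelta(minutes=m), "attacker-sp") for m in range(6)]
events.append((start + timedelta(minutes=30), "ops@contoso.com"))
print(deletion_bursts(events))
```

Grouping by `Caller` matters: a burst from one identity is far more suspicious than the same number of deletions spread across a team.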

Query 4: Data Exfiltration Indicators from Storage

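StorageBlobLogs
| where TimeGenerated > ago(24h)
| where OperationName == 'GetBlob'
// CallerIpAddress includes the client port (IP:port); strip it so bytes aggregate per IPv4 address
| extend CallerIpAddress = tostring(split(CallerIpAddress, ':')[0])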
| summarize totalBytes = sum(ResponseBodySize) by CallerIpAddress, bin(TimeGenerated, 1h)
| where totalBytes > 1000000000
| sort by totalBytes desc

One gigabyte of blob reads from a single IP in a one-hour window is a strong exfiltration signal. Cross-reference the CallerIpAddress against your approved IP list and threat intelligence feeds.
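The same aggregation can be reproduced offline on an exported StorageBlobLogs extract, which makes the approved-IP cross-check explicit. In this Python sketch `approved_ips` is a hypothetical allow list. The port stripping reflects that storage logs record the caller as IP:port, and it handles IPv4 only.

```python
from collections import defaultdict
from datetime import datetime

ONE_GB = 1_000_000_000  # decimal gigabyte, same threshold as the KQL query


def exfil_candidates(blob_reads, approved_ips, threshold=ONE_GB):
    """Sum GetBlob bytes per (client IP, hour); return unapproved IPs over threshold.

    blob_reads: dicts with TimeGenerated (datetime), CallerIpAddress and
    ResponseBodySize, as exported from StorageBlobLogs. The port suffix on
    CallerIpAddress is stripped first (IPv4 only in this sketch).
    """
    totals = defaultdict(int)
    for r in blob_reads:
        ip = r["CallerIpAddress"].rsplit(":", 1)[0]
        hour = r["TimeGenerated"].replace(minute=0, second=0, microsecond=0)
        totals[(ip, hour)] += r["ResponseBodySize"]
    hits = [
        (ip, hour, total)
        for (ip, hour), total in totals.items()
        if total > threshold and ip not in approved_ips
    ]
    return sorted(hits, key=lambda h: h[2], reverse=True)


t = datetime(2026, 6, 3, 14, 5)
reads = [
    {"TimeGenerated": t, "CallerIpAddress": "203.0.113.7:50432", "ResponseBodySize": 600_000_000},
    {"TimeGenerated": t.replace(minute=20), "CallerIpAddress": "203.0.113.7:50433", "ResponseBodySize": 600_000_000},
    {"TimeGenerated": t.replace(minute=40), "CallerIpAddress": "10.0.0.4:61000", "ResponseBodySize": 2_000_000_000},
]
print(exfil_candidates(reads, approved_ips={"10.0.0.4"}))
```

Note that the two reads from 203.0.113.7 stay under the threshold individually and only cross it once the differing ports are stripped and the bytes are summed.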

Query 5: Service Principal Activity Spike

AADServicePrincipalSignInLogs
| where TimeGenerated > ago(6h)
| summarize callCount = count() by ServicePrincipalName, IPAddress, ResourceDisplayName
| where callCount > 100
| sort by callCount desc

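Service principals whose sign-in volume suddenly spikes, especially from new IP addresses or against resources they have not accessed before, are high-confidence indicators of compromised credentials. Note that AADServicePrincipalSignInLogs records token requests, not individual API calls, so a spike here means the principal is authenticating far more often than usual.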
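Query 5 uses an absolute threshold; comparing each principal against its own history also catches low-volume principals. This Python sketch takes two dictionaries of sign-in counts keyed by (ServicePrincipalName, IPAddress), one for a baseline window and one for a current window of the same length. The 5x factor and 100-sign-in floor are assumptions. Pairs absent from the baseline are always reported, because a new source IP is the stronger signal.

```python
def sp_spikes(baseline, current, factor=5.0, min_signins=100):
    """Flag service principal sign-in spikes against a baseline.

    baseline, current: dicts mapping (ServicePrincipalName, IPAddress) to a
    sign-in count over windows of equal length, e.g. the average 6-hour
    count over the previous week versus the last 6 hours.
    """
    findings = []
    for (sp, ip), count in current.items():
        before = baseline.get((sp, ip), 0)
        if before == 0:
            # A principal/IP pair never seen before is reported at any volume
            findings.append((sp, ip, count, "not seen in baseline"))
        elif count >= min_signins and count / before >= factor:
            findings.append((sp, ip, count, f"{count / before:.1f}x baseline"))
    return sorted(findings, key=lambda f: f[2], reverse=True)


baseline = {("app-sync", "10.0.0.5"): 40, ("app-report", "10.0.0.6"): 100}
current = {
    ("app-sync", "10.0.0.5"): 400,
    ("app-sync", "198.51.100.9"): 20,
    ("app-report", "10.0.0.6"): 120,
}
for finding in sp_spikes(baseline, current):
    print(finding)
```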

---

Phase 4: Containment in Azure

Containment is the most time-critical phase. Every minute of delay is additional lateral movement, additional data access, and additional forensic evidence potentially overwritten.

Step 1: Disable the Compromised Entra ID Account

# Connect to Microsoft Graph
Connect-MgGraph -Scopes 'User.ReadWrite.All'

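# Disable the user account
$userId = 'compromised.user@contoso.com'
Update-MgUser -UserId $userId -AccountEnabled:$false

# Confirm (AccountEnabled is only returned when requested with -Property)
Get-MgUser -UserId $userId -Property DisplayName, AccountEnabled | Select-Object DisplayName, AccountEnabled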

Step 2: Revoke All Active Sessions and Refresh Tokens

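Disabling the account is not enough. Active sessions and refresh tokens remain valid until they expire (up to 90 days for refresh tokens), and even revocation cannot recall access tokens that were already issued; those stay usable until they expire (typically 60 to 90 minutes). Revoke sessions and refresh tokens immediately, then keep monitoring the account through that window: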

# Revoke all refresh tokens for a user (Microsoft Graph PowerShell)
Revoke-MgUserSignInSession -UserId 'compromised.user@contoso.com'

# Legacy AzureAD module equivalent (the AzureAD module is deprecated; prefer the Graph cmdlet above)
Revoke-AzureADUserAllRefreshToken -ObjectId '<user-object-id>'
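When these steps run from an automated playbook rather than an interactive session, the same two actions are Microsoft Graph REST calls: PATCH on /users/{id} with accountEnabled set to false, and POST to /users/{id}/revokeSignInSessions. This Python sketch only builds the requests with the standard library. Acquiring the bearer token, and granting the calling app the required Graph permissions, is assumed to happen elsewhere.

```python
import json
import urllib.parse
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"


def containment_requests(user_id, access_token):
    """Build (but do not send) the Graph calls that mirror Steps 1 and 2.

    user_id: object ID or UPN of the compromised user.
    access_token: a Graph bearer token obtained elsewhere (assumption).
    """
    user = urllib.parse.quote(user_id, safe="@")  # encodes '#' in guest UPNs
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    disable = urllib.request.Request(
        f"{GRAPH}/users/{user}",
        data=json.dumps({"accountEnabled": False}).encode("utf-8"),
        headers=headers,
        method="PATCH",
    )
    revoke = urllib.request.Request(
        f"{GRAPH}/users/{user}/revokeSignInSessions",
        data=b"",
        headers=headers,
        method="POST",
    )
    return [disable, revoke]


for req in containment_requests("compromised.user@contoso.com", "<token>"):
    print(req.get_method(), req.full_url)
    # To execute: urllib.request.urlopen(req), keeping this order
```

Keeping the disable call first mirrors the runbook order: block new sign-ins, then invalidate what already exists.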

Step 3: Isolate a Compromised VM with NSG Lockdown

# Get the existing NSG
$nsg = Get-AzNetworkSecurityGroup -Name 'vm-nsg' -ResourceGroupName 'prod-rg'

# Add deny-all inbound and outbound rules at priority 100 (lower numbers are evaluated
# first, and 100 is the highest custom priority). The update fails if another rule
# already uses priority 100 in the same direction. If this NSG is also associated with
# a subnet or other NICs, the lockdown applies to all of them.
$nsg | Add-AzNetworkSecurityRuleConfig `
    -Name 'IR-DenyAll-Inbound' `
    -Priority 100 `
    -Protocol '*' `
    -SourceAddressPrefix '*' `
    -SourcePortRange '*' `
    -DestinationAddressPrefix '*' `
    -DestinationPortRange '*' `
    -Access Deny `
    -Direction Inbound | Out-Null

$nsg | Add-AzNetworkSecurityRuleConfig `
    -Name 'IR-DenyAll-Outbound' `
    -Priority 100 `
    -Protocol '*' `
    -SourceAddressPrefix '*' `
    -SourcePortRange '*' `
    -DestinationAddressPrefix '*' `
    -DestinationPortRange '*' `
    -Access Deny `
    -Direction Outbound | Out-Null

# Persist both rules with a single update
$nsg | Set-AzNetworkSecurityGroup
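NSG rules are evaluated in priority order, lowest number first, and evaluation stops at the first match, which is why a deny-all rule at priority 100 overrides existing allow rules. This Python sketch is a deliberately simplified model: it matches only direction and destination port and ignores addresses, protocols, service tags, and Azure's default rules. It shows both the override and a duplicate-priority check worth running before applying the lockdown, since Azure does not allow two rules in the same direction to share a priority.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str
    priority: int     # 100-4096 for custom rules; lower numbers win
    direction: str    # "Inbound" or "Outbound"
    access: str       # "Allow" or "Deny"
    port: str = "*"   # destination port, or "*" for any


def effective_access(rules, direction, port):
    """Return (access, rule name) of the first matching rule, or None.

    None means no custom rule matched and Azure's default rules decide.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction == direction and rule.port in ("*", str(port)):
            return rule.access, rule.name
    return None


def priority_conflicts(rules):
    """Return pairs of rule names that share a direction and priority."""
    first = {}
    conflicts = []
    for rule in rules:
        key = (rule.direction, rule.priority)
        if key in first:
            conflicts.append((first[key], rule.name))
        else:
            first[key] = rule.name
    return conflicts


rules = [
    Rule("Allow-RDP", 300, "Inbound", "Allow", "3389"),
    Rule("IR-DenyAll-Inbound", 100, "Inbound", "Deny"),
    Rule("IR-DenyAll-Outbound", 100, "Outbound", "Deny"),
]
print(effective_access(rules, "Inbound", 3389))  # the IR rule is evaluated before Allow-RDP
print(priority_conflicts(rules + [Rule("Legacy-Allow", 100, "Inbound", "Allow")]))
```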

Step 4: Block Storage Account Public Access

# Disable anonymous (public) blob access on a storage account.
# This does not block requests authorized with account keys, SAS tokens, or Entra ID.
Set-AzStorageAccount `
  -ResourceGroupName 'prod-rg' `
  -Name 'prodstorage001' `
  -AllowBlobPublicAccess $false

# Optionally disable all public network access (clients outside private endpoints lose access)
Set-AzStorageAccount `
  -ResourceGroupName 'prod-rg' `
  -Name 'prodstorage001' `
  -PublicNetworkAccess Disabled

Step 5: Rotate Key Vault Secrets

# Set a new version of a secret (old versions remain accessible unless explicitly disabled).
# Double quotes are required so $(New-Guid) expands; also update the database with the new value.
$secretValue = ConvertTo-SecureString "NewSecretValue-$(New-Guid)" -AsPlainText -Force
$newSecret = Set-AzKeyVaultSecret -VaultName 'prod-keyvault' -Name 'DatabasePassword' -SecretValue $secretValue

# Disable every version except the one just created
# (compare against $newSecret.Version rather than relying on list order)
$secretVersions = Get-AzKeyVaultSecret -VaultName 'prod-keyvault' -Name 'DatabasePassword' -IncludeVersions
foreach ($version in $secretVersions | Where-Object { $_.Version -ne $newSecret.Version }) {
    Update-AzKeyVaultSecret -VaultName 'prod-keyvault' -Name 'DatabasePassword' `
        -Version $version.Version -Enable $false
}
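Choosing which versions to disable is easy to get wrong if you rely on list order. This Python sketch models the selection offline: the dictionaries loosely mirror the Version, Created, and Enabled fields the cmdlet reports, the current version is passed in explicitly, and the function refuses to proceed if a version newer than yours exists, since an unexpected newer version may itself have been written by the attacker.

```python
from datetime import datetime


def versions_to_disable(versions, current_version):
    """Return IDs of enabled versions to disable, oldest first.

    versions: dicts with 'version', 'created' (datetime) and 'enabled' (bool),
    modeled on the Version, Created and Enabled fields of the cmdlet output.
    current_version: the version just written during rotation.
    """
    known = {v["version"] for v in versions}
    if current_version not in known:
        raise ValueError(f"current version {current_version} not found")
    newest = max(versions, key=lambda v: v["created"])
    if newest["version"] != current_version:
        # Someone wrote a newer version after rotation; investigate first
        raise ValueError(f"unexpected newer version {newest['version']}")
    return [
        v["version"]
        for v in sorted(versions, key=lambda v: v["created"])
        if v["version"] != current_version and v["enabled"]
    ]


versions = [
    {"version": "a1", "created": datetime(2026, 1, 10), "enabled": True},
    {"version": "d4", "created": datetime(2026, 6, 3), "enabled": True},  # just rotated
    {"version": "b2", "created": datetime(2026, 3, 2), "enabled": True},
    {"version": "c3", "created": datetime(2026, 5, 20), "enabled": False},
]
print(versions_to_disable(versions, "d4"))
```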

---

Phase 5: Evidence Collection (Azure-Native Forensics)

Collect evidence immediately after containment, before automated retention policies or resource cleanup can remove it.

1. Export the Sentinel Incident Timeline

In the Sentinel portal, open the incident and select "Export incident" to download a JSON file containing all associated alerts, entities, bookmarks, and investigation graph data. This is your primary chain-of-custody document.

For programmatic export:

# Export incident details via Azure CLI (requires the sentinel extension: az extension add --name sentinel)
az sentinel incident show \
  --resource-group sentinel-rg \
  --workspace-name sentinel-workspace \
  --incident-id '<incident-guid>' \
  --output json > incident-export.json

2. Azure Activity Logs (90-Day Retention)

Azure keeps Activity Log events for 90 days by default; copies routed to a Log Analytics workspace follow that workspace's retention setting. Export the relevant time window before it ages out:

AzureActivity
| where TimeGenerated between (datetime('2026-06-01') .. datetime('2026-06-05'))
| where Caller == 'compromised.user@contoso.com'
| project TimeGenerated, OperationNameValue, ActivityStatusValue,
          ResourceGroup, _ResourceId, CallerIpAddress
| sort by TimeGenerated asc

Export the results as CSV from Sentinel or via the Log Analytics API for legal hold.
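For legal hold, record a cryptographic hash of each exported file at collection time so you can later show it has not changed. This Python sketch serializes rows to CSV in memory and returns a SHA-256 manifest entry; the manifest fields are an assumed minimal set, not a formal chain-of-custody standard.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone


def export_with_manifest(rows, fieldnames, collected_by):
    """Serialize rows to CSV bytes and build a SHA-256 manifest entry.

    Store the manifest next to the file; re-hashing the file later and
    comparing digests shows whether it changed after collection.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    data = buf.getvalue().encode("utf-8")
    manifest = {
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
        "row_count": len(rows),
        "collected_by": collected_by,
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
    }
    return data, manifest


rows = [
    {"TimeGenerated": "2026-06-01T03:12:00Z", "OperationNameValue": "MICROSOFT.STORAGE/STORAGEACCOUNTS/LISTKEYS/ACTION", "Caller": "compromised.user@contoso.com"},
    {"TimeGenerated": "2026-06-01T03:15:00Z", "OperationNameValue": "MICROSOFT.RESOURCES/SUBSCRIPTIONS/RESOURCEGROUPS/DELETE", "Caller": "compromised.user@contoso.com"},
]
data, manifest = export_with_manifest(rows, ["TimeGenerated", "OperationNameValue", "Caller"], "ir-analyst@contoso.com")
print(manifest["sha256"], manifest["row_count"])
```

Hashing the exact bytes written, rather than re-serializing later, is what makes the digest meaningful: any re-export with different quoting or line endings would produce a different hash.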

3. Defender for Cloud Security Alerts Export

Navigate to Defender for Cloud, filter alerts by the incident timeframe, and export to CSV. Alternatively, use the Defender for Cloud REST API to pull alerts programmatically and store them in a secure, immutable storage account (configure blob immutability policies before an incident).

4. Microsoft Purview Audit Logs for Data Access

For incidents involving potential data exfiltration from Microsoft 365 services (SharePoint, OneDrive, Exchange), Purview audit logs capture file access, download, and sharing events:

# Search Purview audit logs (run Connect-ExchangeOnline first; the cmdlet ships in the Exchange Online PowerShell module)
Search-UnifiedAuditLog `
  -StartDate '2026-06-01' `
  -EndDate '2026-06-05' `
  -UserIds 'compromised.user@contoso.com' `
  -Operations FileAccessed, FileDownloaded, AnonymousLinkCreated `
  -ResultSize 5000

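Purview audit log retention depends on your Microsoft 365 license. Audit (Standard) retains records for 180 days. Audit (Premium), included with E5, retains Entra ID, Exchange, SharePoint, and OneDrive records for one year by default, and the 10-Year Audit Log Retention add-on license extends retention to ten years.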

Evidence Chain of Custody Checklist

Evidence Source | Retention | Export Method | Stored To
Sentinel incident JSON | Manual export | Portal / CLI | Immutable blob storage
Azure Activity Logs | 90 days | KQL + export | Legal hold storage account
Entra ID sign-in logs | 7 days (Free), 30 days (P1/P2); longer when routed to Sentinel | KQL + export | Legal hold storage account
Defender for Cloud alerts | 90 days | REST API / Portal export | SIEM archive
Purview audit logs | 180 days (Audit Standard) to 10 years (add-on license) | PowerShell / Compliance portal | Legal hold
NSG flow logs | Configurable (7 days default) | Network Watcher | Storage account
---

Phase 6: Eradication

Eradication removes the attacker's foothold. Until this phase is complete, restored resources risk immediate re-compromise.

Eradication checklist for Azure environments:

  1. Remove all backdoor accounts created during the compromise (query AuditLogs for accounts created in the incident window)
  2. Audit and remove unauthorized Entra ID application registrations and OAuth app permissions
  3. Rotate all service principal client secrets and certificates for affected applications
  4. Review and remove unauthorized federated identity credentials
  5. Remove any unauthorized Azure Policy exemptions or RBAC role assignments created during the incident
  6. Scan all container images deployed from the incident window using Defender for Containers
  7. Review GitHub Actions or Azure DevOps pipeline secrets that may have been exposed

// Find accounts created during the incident window
AuditLogs
| where OperationName == 'Add user'
| where TimeGenerated between (datetime('2026-06-01') .. datetime('2026-06-05'))
| extend NewUser = tostring(TargetResources[0].userPrincipalName)
| extend CreatedBy = tostring(InitiatedBy.user.userPrincipalName)
| project TimeGenerated, NewUser, CreatedBy, Result

---

Phase 7: Recovery and Hardening

Recovery is not just restoring what was damaged. It is restoring to a hardened state that closes the vector the attacker used.

Recovery steps:

  1. Restore affected resources from the most recent clean backup (Azure Backup, Azure Site Recovery)
  2. Verify restored resources against a known-good configuration baseline (Azure Policy compliance scan)
  3. Re-enable accounts and services only after eradication is confirmed
  4. Apply Conditional Access policies to enforce MFA for all privileged operations
  5. Deploy Privileged Identity Management (PIM) for just-in-time access to sensitive roles
  6. Enable Defender for Cloud enhanced security features that were previously inactive
Hardening actions triggered by common incident types:
Incident Type | Hardening Action
Compromised Entra ID account | Enforce MFA, enable Entra ID Protection risk policies, deploy SSPR
Compromised service principal | Migrate to managed identities where possible, enforce short-lived credentials
Storage exfiltration | Enable Defender for Storage, enforce private endpoints, audit SAS token usage
Resource deletion attack | Enable Azure Resource Locks on critical resources, enforce delete locks via Policy
Key Vault compromise | Enable purge protection (soft-delete is on by default), restrict access to private endpoints
---

Post-Incident: 5 Questions Every IR Review Must Answer

Every closed incident should generate a post-incident report, and that report must answer five questions honestly. If the answers are not documented, the same incident is likely to happen again.

  1. How did the attacker gain initial access? Trace the earliest indicator of compromise in your logs. Was it a phishing email that stole a session cookie? A leaked service principal credential in a public GitHub repository? A misconfigured storage account with public access? The initial access vector determines which preventive controls to prioritize.
  2. What did we miss, and why? Review every alert that fired in the 72 hours before the incident was detected. Were there low-fidelity signals that, in hindsight, indicated the compromise? If so, update your detection rules and alert thresholds. If no alerts fired at all, identify the log coverage gap.
  3. How long did the attacker have access before containment? Calculate the dwell time: the gap between the earliest confirmed compromise indicator and the moment containment actions were completed. Industry averages for cloud dwell time are measured in days to weeks. If yours is longer, your detection and response process needs acceleration.
  4. What data or resources were affected? Produce a definitive scope statement: which user accounts, service principals, subscriptions, resource groups, storage accounts, and data sets were accessed, modified, or deleted. This drives breach notification decisions and regulatory reporting timelines.
  5. What would have stopped this attack at each phase? For each NIST phase, identify the control that, had it been in place, would have prevented progression to the next phase. This exercise produces a prioritized remediation roadmap rather than a vague "improve security posture" recommendation.
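The dwell-time calculation from question 3 is simple arithmetic, but defining it once in code keeps every review consistent. This Python sketch uses illustrative timestamps; deciding which events count as confirmed indicators remains a judgment your team has to make.

```python
from datetime import datetime, timezone


def dwell_time(indicator_times, containment_completed):
    """Time from the earliest confirmed indicator of compromise to completed containment."""
    earliest = min(indicator_times)
    if containment_completed < earliest:
        raise ValueError("containment cannot precede the earliest indicator")
    return containment_completed - earliest


iocs = [
    datetime(2026, 6, 2, 8, 30, tzinfo=timezone.utc),  # anomalous role assignment
    datetime(2026, 6, 1, 3, 12, tzinfo=timezone.utc),  # first risky sign-in
]
contained = datetime(2026, 6, 4, 15, 42, tzinfo=timezone.utc)
print(dwell_time(iocs, contained))
```

Using the minimum over all indicators, rather than the first alert that fired, is the point: dwell time starts when the attacker got in, not when you noticed.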

---

Frequently Asked Questions

How long should cloud incident response take?

Cloud incident response timelines vary by severity. A P1 incident involving active exfiltration or data destruction should reach containment within 1 to 4 hours of initial detection. A P2 incident, such as a compromised non-privileged account, should be contained within 24 hours. Total resolution including eradication and recovery typically takes 3 to 10 business days depending on the scope of damage and the maturity of your recovery processes.

What is the difference between Microsoft Sentinel and Defender XDR for incident response?

Microsoft Sentinel is a cloud-native SIEM that ingests logs from any source, applies analytics rules, and generates incidents from correlated signals. Defender XDR is an extended detection and response platform that correlates threats specifically across Microsoft security products: Defender for Endpoint, Defender for Identity, Defender for Office 365, and Defender for Cloud Apps. In the unified SOC portal, both platforms feed incidents into the same investigation interface. Use Sentinel for cross-environment log analysis and KQL hunting queries, and use Defender XDR for automated investigation, the attack story graph, and endpoint-level response actions.

What logs should be collected first during a cloud incident?

Collect logs in this priority order: first, Microsoft Entra ID sign-in logs and audit logs to identify the compromised identity and the scope of access; second, Azure Activity Logs to identify what resources were created, modified, or deleted; third, Sentinel incident data including all associated alerts and entities; fourth, Purview audit logs if Microsoft 365 data access is suspected; fifth, NSG flow logs and storage blob logs if network exfiltration is suspected. Prioritize sources with the shortest retention windows.

How do you contain a compromised Azure service principal?

Containing a compromised service principal requires four actions in this order: disable the service principal in Entra ID so it cannot obtain new tokens; rotate or delete all client secrets and certificates associated with it; treat access tokens already issued as live until they expire (typically 60 to 90 minutes), because they cannot be recalled, and monitor the principal's activity through that window; and audit all resources the principal could reach through its role assignments. After containment, evaluate whether the service principal can be replaced with a managed identity, which eliminates the credential rotation problem entirely.

What is the NIST incident response framework?

The NIST Computer Security Incident Handling Guide (SP 800-61 Rev. 2) describes a cyclical incident response lifecycle with four phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity. Preparation establishes the tools, processes, and authority structures needed before an incident occurs. Detection and Analysis identifies and scopes the incident. Containment stops the bleeding, eradication removes the attacker's access, and recovery restores normal operations. Post-Incident Activity documents lessons learned and improves defenses. The framework is vendor-agnostic and applies equally to cloud environments when mapped to cloud-native tooling.



Cloud Solution Architect with deep expertise in Microsoft Azure and a strong background in systems and IT infrastructure. Passionate about cloud technologies, security best practices, and helping organizations modernize their infrastructure.
