NHI (Non-Human Identity) Governance: Beyond the Basics
A leaked service principal secret with Contributor access caused a lateral movement chain that took three days to contain. Most organizations have more non-human identities than users, but fewer than 10% have a credential rotation policy that actually runs. This guide covers credential lifecycle automation, ownership attribution, workload identity federation, and the KQL queries that surface your riskiest NHIs before they become incidents.
The Service Principal Secret That Stayed Valid for 847 Days
A financial services firm ran a post-incident review after an attacker moved laterally from a compromised CI/CD pipeline into production Azure resources. The entry point: a service principal with Contributor role on the production subscription. The credential had been created during the initial cloud migration, assigned to a Jenkins pipeline that was decommissioned 18 months earlier, and never rotated. The secret was still valid. No one owned it. No alert fired when it was used from an IP address in a country where the company has no operations.
This is not an edge case. Microsoft's own telemetry shows that the median enterprise Entra ID tenant contains more application registrations and service principals than human user accounts, and the majority of those NHIs have credentials that exceed recommended rotation intervals. The foundational NHI security guide covers the threat model. This article is the implementation playbook: how to build the automation, policies, and detection queries that turn NHI governance from a spreadsheet exercise into an enforceable program.
The NHI Lifecycle: Four Phases, Four Failure Points
Every non-human identity passes through four phases, and each phase has a characteristic failure mode:
| Phase | What Happens | Common Failure |
|---|---|---|
| Creation | App registration created, credentials issued, RBAC assigned | No owner tagged, overprivileged role, client secret instead of certificate or federation |
| Operation | NHI authenticates to APIs and resources | No monitoring of sign-in anomalies, no conditional access policy applied |
| Rotation | Credentials renewed before expiry | Manual process, no automation, rotation window missed |
| Decommission | NHI disabled/deleted when no longer needed | No trigger for removal, orphaned identity persists indefinitely |
Ownership Attribution at Scale
The single highest-impact governance control for NHIs is ownership attribution. Every service principal and app registration must have an identifiable human owner: the person accountable for its lifecycle, its privilege level, and its continued business justification.
Tagging Strategy
Entra ID app registrations support notes and tags fields. Use a structured tagging convention:
# Tag an existing app registration with ownership metadata
az ad app update --id <app-id> \
  --set "notes=owner:john.doe@contoso.com;team:platform-engineering;cost-center:CC-4420;created:2026-01-15;review-date:2026-07-15"

# Query all app registrations missing an owner tag (an empty notes field counts as missing)
az ad app list --all --query "[?!(notes && contains(notes, 'owner:'))].{AppId:appId, DisplayName:displayName, Created:createdDateTime}" -o table
The notes field is a free-text string, so enforce the schema through automation rather than documentation. A nightly Azure Function can scan all app registrations, parse the notes field, and flag any that are missing required tags or have an owner whose Entra ID account is disabled (meaning the owner has left the organization).
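The parsing step of that nightly scan could look roughly like the sketch below. It assumes the semicolon-separated key:value convention shown above; the choice of which tags are mandatory (REQUIRED_TAGS) and both function names are illustrative assumptions, not part of any Entra ID API.

```python
# Hypothetical policy: which tags every app registration must carry.
REQUIRED_TAGS = ("owner", "team", "review-date")


def parse_notes(notes):
    """Parse a 'key:value;key:value' notes string into a dict.

    Pairs without a key, a colon, or a value are skipped, so a
    free-text notes field simply yields an empty dict.
    """
    tags = {}
    for pair in (notes or "").split(";"):
        key, sep, value = pair.partition(":")
        if sep and key.strip() and value.strip():
            tags[key.strip().lower()] = value.strip()
    return tags


def missing_tags(notes, required=REQUIRED_TAGS):
    """Return the required tag names that are absent from the notes field."""
    tags = parse_notes(notes)
    return [name for name in required if name not in tags]
```

A scan would call missing_tags on each registration's notes value and raise a finding whenever the returned list is non-empty. Checking whether the owner account is disabled needs a separate directory lookup and is not covered here.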
Automated Discovery and Reconciliation
For organizations with hundreds or thousands of NHIs, manual ownership assignment is not feasible retroactively. Use this approach:
- Export all app registrations and service principals with their createdDateTime, appOwnerOrganizationId, and any existing owners collection from Graph API
- Cross-reference against Azure DevOps / GitHub commit history to identify which CI/CD pipelines use each service principal
- Map pipelines to teams using repository ownership in your DevOps platform
- Assign ownership to the team lead and create ServiceNow CIs (configuration items) for each NHI
- Set a 90-day deadline: any NHI without a confirmed owner after 90 days gets disabled
This one-time reconciliation is painful but necessary. Once ownership is established, enforce it at creation time through an approval workflow that requires an owner before credentials are issued.
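The 90-day deadline from the reconciliation steps can be expressed as a small check. This is only a sketch: the inventory record shape (appId, owner_confirmed) and the function name are assumptions made for illustration, and the actual disable call is left out.

```python
from datetime import date, timedelta

# Grace period from the reconciliation plan: 90 days to confirm an owner.
GRACE_PERIOD = timedelta(days=90)


def overdue_for_disable(inventory, reconciliation_start, today):
    """Return appIds that still lack a confirmed owner after the deadline.

    inventory is a list of dicts with hypothetical keys 'appId' and
    'owner_confirmed'. Nothing is returned while the grace period is
    still running.
    """
    deadline = reconciliation_start + GRACE_PERIOD
    if today <= deadline:
        return []
    return [item["appId"] for item in inventory if not item.get("owner_confirmed")]
```

Running this daily from the reconciliation start date gives a list that stays empty for 90 days and then contains exactly the identities that nobody claimed.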
Entra Workload ID Premium: What It Adds
Entra Workload ID Premium (licensed per workload identity) extends identity protection and conditional access to non-human identities. The free tier gives you basic service principal authentication. Premium adds three capabilities that matter for governance:
- Identity Protection for workload identities. Entra ID Identity Protection applies risk-based detection to service principal sign-ins, flagging anomalous authentication patterns such as sign-ins from unusual locations, unfamiliar client applications using the credential, or credential usage patterns that deviate from the baseline. Without Premium, service principal sign-ins generate audit log entries but receive no risk scoring.
- Conditional access for workload identities. This is the control that closes the gap between human and non-human identity governance. You can create CA policies that target service principals registered in your tenant, restricting them to specific IP ranges or blocking access entirely when risk is elevated. Managed identities are not covered by these policies.
- Access reviews for workload identities. Extends the access review framework to cover app role assignments and service principal RBAC assignments with the same review/approve/deny workflow used for human accounts.
Conditional Access for Workload Identities
Workload identity CA policies differ from user CA policies in important ways:
| Aspect | User CA Policy | Workload Identity CA Policy |
|---|---|---|
| Supported conditions | Location, device compliance, risk level, app, user group | Location (named locations / IP ranges), service principal risk level |
| Grant controls | MFA, compliant device, terms of use, app protection | Block only (no MFA for service principals) |
| Session controls | Sign-in frequency, persistent browser, CASB | Not supported |
| Scope | Users and groups | Service principals (by specific ID or "all workload identities") |
| License requirement | Entra ID P1 | Entra Workload ID Premium |
# Create a conditional access policy for workload identities
# Restrict service principals to named locations only
az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies" \
  --body '{
    "displayName": "Restrict workload identities to trusted locations",
    "state": "enabled",
    "conditions": {
      "applications": {
        "includeApplications": ["All"]
      },
      "clientApplications": {
        "includeServicePrincipals": ["ServicePrincipalsInMyTenant"],
        "excludeServicePrincipals": ["<break-glass-sp-id>"]
      },
      "locations": {
        "includeLocations": ["All"],
        "excludeLocations": ["<named-location-id-corporate>", "<named-location-id-azure-datacenter>"]
      }
    },
    "grantControls": {
      "operator": "OR",
      "builtInControls": ["block"]
    }
  }'
Exclude your break-glass service principal from this policy, just as you exclude break-glass user accounts from user CA policies. Document the break-glass SP separately and monitor its usage with a dedicated alert.
Secret Rotation Pipelines: The Azure Key Vault Pattern
For service principals that still require client secrets (because workload identity federation is not yet available for their use case), automate rotation using Azure Key Vault, Event Grid, and Azure Functions.
The pattern works as follows:
- Store the service principal secret in Azure Key Vault with an expiration date
- Key Vault fires a SecretNearExpiry event via Event Grid 30 days before expiration
- An Azure Function receives the event, generates a new credential via Graph API, stores it in Key Vault, and removes the old credential
- The consuming application reads the secret from Key Vault at runtime (never from config files or environment variables)
# Event Grid subscription for Key Vault secret near-expiry events
resource "azurerm_eventgrid_system_topic" "keyvault" {
  name                   = "keyvault-events"
  resource_group_name    = azurerm_resource_group.main.name
  location               = azurerm_resource_group.main.location
  source_arm_resource_id = azurerm_key_vault.main.id
  topic_type             = "Microsoft.KeyVault.vaults"
}

resource "azurerm_eventgrid_system_topic_event_subscription" "secret_rotation" {
  name                = "secret-near-expiry"
  system_topic        = azurerm_eventgrid_system_topic.keyvault.name
  resource_group_name = azurerm_resource_group.main.name

  azure_function_endpoint {
    function_id = "${azurerm_linux_function_app.rotation.id}/functions/RotateSecret"
  }

  included_event_types = [
    "Microsoft.KeyVault.SecretNearExpiry"
  ]

  # Only react to secrets whose names follow the "sp-" naming convention
  subject_filter {
    subject_begins_with = "sp-"
  }
}

# Azure Function App for rotation logic
resource "azurerm_linux_function_app" "rotation" {
  name                       = "func-secret-rotation"
  resource_group_name        = azurerm_resource_group.main.name
  location                   = azurerm_resource_group.main.location
  storage_account_name       = azurerm_storage_account.func.name
  storage_account_access_key = azurerm_storage_account.func.primary_access_key
  service_plan_id            = azurerm_service_plan.func.id

  identity {
    type = "SystemAssigned"
  }

  site_config {
    application_stack {
      python_version = "3.11"
    }
  }
}

# Grant the Function App permission to manage Key Vault secrets
resource "azurerm_key_vault_access_policy" "rotation_func" {
  key_vault_id       = azurerm_key_vault.main.id
  tenant_id          = data.azurerm_client_config.current.tenant_id
  object_id          = azurerm_linux_function_app.rotation.identity[0].principal_id
  secret_permissions = ["Get", "Set", "Delete", "List"]
}
The rotation function itself needs Application.ReadWrite.All on Microsoft Graph to create and remove credentials on the target app registration. Grant this via a managed identity with an app role assignment rather than storing another secret.
Critical detail: the rotation function must add the new credential before removing the old one. There must be a brief overlap period where both credentials are valid. If the function removes the old credential first, any application currently using it will fail authentication until it reads the new secret from Key Vault.
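The ordering constraint is easier to see in code. The sketch below is not the rotation function itself: CredentialStore and SecretVault are hypothetical interfaces standing in for whatever wrappers you write around the Graph addPassword/removePassword actions and the Key Vault client, and wait_for_overlap is an assumed hook for the overlap period.

```python
from typing import Callable, Protocol


class CredentialStore(Protocol):
    """Hypothetical wrapper around Graph addPassword / removePassword."""

    def add_secret(self, app_id: str) -> tuple[str, str]: ...  # (key_id, secret_text)

    def remove_secret(self, app_id: str, key_id: str) -> None: ...


class SecretVault(Protocol):
    """Hypothetical wrapper around the Key Vault secrets client."""

    def set_secret(self, name: str, value: str) -> None: ...


def rotate(
    app_id: str,
    old_key_id: str,
    secret_name: str,
    graph: CredentialStore,
    vault: SecretVault,
    wait_for_overlap: Callable[[], None],
) -> str:
    """Rotate a client secret with an overlap window; returns the new key id."""
    new_key_id, new_secret = graph.add_secret(app_id)  # 1. new credential is valid
    vault.set_secret(secret_name, new_secret)          # 2. consumers can fetch it
    wait_for_overlap()                                 # 3. both credentials valid
    graph.remove_secret(app_id, old_key_id)            # 4. only now retire the old one
    return new_key_id
```

If any step before the last one raises, the old credential is still in place, so a failed rotation degrades to "nothing changed" rather than an outage. How long the overlap should last depends on how often consumers re-read Key Vault, which this sketch leaves to the caller.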
Workload Identity Federation: Eliminating Secrets Entirely
Workload identity federation is the architectural answer to the rotation problem. Instead of issuing a client secret or certificate to an external workload, you configure a trust relationship between the Entra ID app registration and the external identity provider (GitHub Actions, Terraform Cloud, AWS, GCP, Kubernetes). The external workload presents a token from its own identity provider, and Entra ID validates it against the configured trust without any shared secret.
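Conceptually, the trust check compares three claims in the external token (iss, sub, aud) against the issuer, subject, and audiences configured on the federated credential. The sketch below illustrates only that comparison. It is not how Entra ID is implemented, it deliberately skips signature and expiry validation, and the repository name in the usage note is a made-up example.

```python
def matches_trust(claims, credential):
    """Return True if token claims match a federated credential's trust config.

    claims:     decoded token payload with 'iss', 'sub', and 'aud'
    credential: dict with 'issuer', 'subject', and 'audiences' keys
    Signature and lifetime checks are intentionally out of scope here.
    """
    audiences = claims.get("aud", [])
    if isinstance(audiences, str):  # 'aud' may be a single string or a list
        audiences = [audiences]
    return (
        claims.get("iss") == credential["issuer"]
        and claims.get("sub") == credential["subject"]
        and any(aud in credential["audiences"] for aud in audiences)
    )
```

For a GitHub Actions workflow, the credential would typically carry the issuer https://token.actions.githubusercontent.com, a subject such as repo:contoso/app:ref:refs/heads/main, and the audience api://AzureADTokenExchange. A token from a different branch or repository produces a different sub claim and fails the match, which is why the subject string is the part worth reviewing most carefully.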
The federated credential deep-dive covers the setup mechanics. Here, the focus is on enforcement at scale: how to make federated credentials the default and block new secret creation.
Azure Policy for Enforcing Federated Credentials
Azure Policy operates at the ARM (Azure Resource Manager) layer, which means it cannot directly govern Entra ID app registrations (those are Microsoft Graph objects, not ARM resources). However, you can enforce federation requirements through complementary controls:
- Custom Entra ID governance rule via Microsoft Graph: use a nightly automation that scans all app registrations created in the last 24 hours and flags any that have passwordCredentials (client secrets) without an approved exception
- Azure DevOps / GitHub pipeline policy: enforce that all new service connections use workload identity federation. In Azure DevOps, the "Workload Identity federation (automatic)" service connection type is now the default
- Conditional access policy: block service principal sign-ins that use client secret authentication from non-approved IP ranges, making secrets operationally painful to use compared to federation
# Find all app registrations with client secrets (passwordCredentials)
az ad app list --all \
  --query '[?length(passwordCredentials) > `0`].{AppId:appId, Name:displayName, SecretCount:length(passwordCredentials), OldestExpiry:min(passwordCredentials[].endDateTime)}' \
  -o table

# Find app registrations that have secrets but NO federated credentials.
# Federated credentials are not part of the list output, so each app is checked individually.
for id in $(az ad app list --all --query '[?length(passwordCredentials) > `0`].id' -o tsv); do
  count=$(az ad app federated-credential list --id "$id" --query 'length(@)' -o tsv)
  [ "$count" -eq 0 ] && az ad app show --id "$id" --query '{AppId:appId, Name:displayName}' -o tsv
done
The goal state: every app registration uses federated credentials where technically possible, client certificates where federation is not supported, and client secrets only for legacy systems with an approved exception and a mandatory 90-day rotation enforced by the Key Vault pattern above.
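A nightly scan can check that goal state mechanically. The following sketch assumes app records shaped like Microsoft Graph application objects (appId plus passwordCredentials entries with startDateTime and endDateTime); the exception list, the function name, and the finding strings are illustrative assumptions.

```python
from datetime import datetime, timedelta

# Policy from the goal state: secrets only by exception, and at most 90 days.
MAX_SECRET_LIFETIME = timedelta(days=90)


def _parse(ts):
    """Parse an ISO 8601 timestamp such as '2026-01-15T00:00:00Z'.

    Assumes at most microsecond precision in the fractional part.
    """
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def secret_violations(apps, approved_exceptions):
    """Return (appId, reason) pairs for apps that break the secret policy."""
    findings = []
    for app in apps:
        secrets = app.get("passwordCredentials") or []
        if not secrets:
            continue  # no client secrets, nothing to check
        if app["appId"] not in approved_exceptions:
            findings.append((app["appId"], "secret without approved exception"))
            continue
        for cred in secrets:
            lifetime = _parse(cred["endDateTime"]) - _parse(cred["startDateTime"])
            if lifetime > MAX_SECRET_LIFETIME:
                findings.append((app["appId"], "secret lifetime exceeds 90 days"))
                break  # one finding per app is enough
    return findings
```

Feeding it the JSON output of az ad app list --all and a set of excepted appIds yields a short list that can be turned into tickets or alerts.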
Detecting NHI Risk with KQL
The following queries run in Microsoft Sentinel or Log Analytics workspaces connected to Entra ID diagnostic logs. They surface the NHI risks that governance programs are designed to prevent.
Expired or Soon-to-Expire Credentials
Credential expiry dates live on the app registration (the passwordCredentials and keyCredentials properties in Microsoft Graph), not in the sign-in logs, so the list of expired or soon-to-expire credentials comes from the az ad app list queries shown earlier. This query supplies the other half of the picture: it finds service principals that signed in during the past 90 days but have been silent for more than 60, which lets you separate expiring credentials on active NHIs from those on dormant ones:
// Find service principals that have gone quiet (dormant NHIs)
// Note: principals with no sign-in at all in the lookback window do not appear here
let Lookback = 90d;
let DormantAfterDays = 60;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(Lookback)
| summarize LastSignIn = max(TimeGenerated), SignInCount = count() by ServicePrincipalId, ServicePrincipalName, AppId
| extend DaysSinceLastSignIn = datetime_diff('day', now(), LastSignIn)
| where DaysSinceLastSignIn > DormantAfterDays
| project ServicePrincipalName, AppId, LastSignIn, DaysSinceLastSignIn, SignInCount
| order by DaysSinceLastSignIn desc
Anomalous NHI Sign-In Patterns
This query detects service principals authenticating from IP addresses they have not used in the previous 30 days, which can indicate credential theft or unauthorized usage. A service principal with no sign-in history in the lookback period is flagged on every sign-in:
// Detect service principal sign-ins from new IP addresses
let LookbackPeriod = 30d;
let DetectionWindow = 1d;
let HistoricalIPs = AADServicePrincipalSignInLogs
    | where TimeGenerated between (ago(LookbackPeriod) .. ago(DetectionWindow))
    | summarize KnownIPs = make_set(IPAddress) by ServicePrincipalId;
// Branch 1: principals with no history at all in the lookback period
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(DetectionWindow)
| join kind=leftanti HistoricalIPs on ServicePrincipalId
| union (
    // Branch 2: principals with history, signing in from an IP not in their known set
    AADServicePrincipalSignInLogs
    | where TimeGenerated > ago(DetectionWindow)
    | join kind=inner HistoricalIPs on ServicePrincipalId
    | where not(set_has_element(KnownIPs, IPAddress))
)
| project TimeGenerated, ServicePrincipalName, ServicePrincipalId, IPAddress, ResourceDisplayName, ResultType
| where ResultType == "0" // Successful sign-ins only (ResultType is a string column)
| order by TimeGenerated desc
Overprivileged NHIs: High-Privilege Role Assignments
// Find service principals with high-privilege directory role assignments
AuditLogs
| where TimeGenerated > ago(90d)
| where OperationName == "Add member to role"
| extend TargetId = tostring(TargetResources[0].id)
| extend TargetType = tostring(TargetResources[0].type)
| extend RoleName = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)
| where TargetType == "ServicePrincipal"
| where RoleName has_any ("Global Administrator", "Application Administrator", "Cloud Application Administrator", "Privileged Role Administrator")
| project TimeGenerated, TargetId, RoleName, OperationName,
InitiatedBy = tostring(InitiatedBy.user.userPrincipalName)
| order by TimeGenerated desc
Any service principal with Global Administrator or Application Administrator should be reviewed immediately. These roles allow the NHI to create new credentials on other app registrations, which is the NHI equivalent of privilege escalation.
Comparison: Secrets vs. Certificates vs. Federated Credentials
| Characteristic | Client Secrets | Client Certificates | Federated Credentials |
|---|---|---|---|
| Credential stored in Entra ID | Yes (hashed) | Yes (public key only) | No (trust policy only) |
| Credential stored by consumer | Yes (plaintext secret) | Yes (private key + cert) | No credential to store |
| Rotation required | Yes (max 2 years, recommended 90 days) | Yes (typically 1 year) | No (no credential to rotate) |
| Vulnerable to credential theft | High (secret in config/env vars) | Medium (private key on disk) | Not applicable |
| Supported identity providers | Any OAuth client | Any OAuth client | GitHub Actions, Terraform Cloud, AWS, GCP, Kubernetes, custom OIDC |
| Setup complexity | Low | Medium | Medium (one-time federation config) |
| Operational overhead | High (rotation + monitoring) | Medium (cert renewal) | Low (no lifecycle management) |
| Conditional access support | Yes (with Workload ID Premium) | Yes (with Workload ID Premium) | Yes (with Workload ID Premium) |
| Recommended for | Legacy systems with approved exceptions | Systems that cannot use federation | All new workloads, CI/CD, cloud-to-cloud |
The Decommission Phase Nobody Automates
Decommission is the lifecycle phase most organizations skip entirely, and it is the one that produces the largest attack surface over time. A service principal created for a proof of concept in 2023, never used since Q1 2024, and still holding Contributor on a resource group is exactly the credential an attacker will find and use. It has no active consumers who would notice anomalous behavior, no owner who would flag a suspicious sign-in, and no monitoring that distinguishes its traffic from legitimate automation.
The safe decommission pattern is a three-step sequence: disable, soak, delete.
When the unused-SP KQL query flags an identity with no sign-in activity for 90+ days, do not delete it immediately. Set accountEnabled to false on the service principal. This instantly breaks any authentication attempt using that identity and surfaces any hidden consumer you missed during discovery. If a pipeline or application was silently depending on that identity, you will know within hours.
Hold the identity in disabled state for a defined soak period, typically 30 days. Monitor for any support tickets, pipeline failures, or application errors that reference the disabled identity. If nothing breaks and no owner comes forward during the soak period, proceed with deletion: remove the service principal, delete the app registration, and close the corresponding ServiceNow CI record.
This soak-before-delete pattern is what makes decommission safe enough to automate at scale. Without it, every deletion is a gamble, and teams stop deleting anything. The result is a tenant that accumulates hundreds of orphaned identities over years, each one a potential lateral movement path that no one monitors.
Automate the full sequence: the KQL query feeds a Logic App or Azure Function that disables flagged identities, opens a tracking ticket, waits the soak period, and deletes if no objection is raised. Human intervention is only required when someone objects during the soak window.
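The decision logic inside that automation is a small state machine. The sketch below uses the 90-day inactivity threshold and 30-day soak period from the text; the function name, the action strings, and the choice to treat a never-used identity as inactive are assumptions made for illustration.

```python
from datetime import date, timedelta

INACTIVITY_THRESHOLD = timedelta(days=90)  # no sign-in for 90+ days
SOAK_PERIOD = timedelta(days=30)           # time spent disabled before deletion


def next_action(today, last_sign_in, disabled_on=None, objection=False):
    """Decide the next decommission step for one service principal.

    Returns one of: 'keep', 'disable', 'wait', 'escalate', 'delete'.
    last_sign_in may be None for an identity that has never signed in
    (assumed here to count as inactive).
    """
    if disabled_on is not None:
        if objection:
            return "escalate"  # someone objected: hand over to a human
        if today - disabled_on >= SOAK_PERIOD:
            return "delete"    # soak period passed quietly
        return "wait"          # still soaking
    if last_sign_in is None or today - last_sign_in >= INACTIVITY_THRESHOLD:
        return "disable"
    return "keep"
```

Keeping the decision pure, with dates passed in rather than read from the clock, makes the policy easy to test; the Logic App or Function then only has to execute whatever action comes back.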
Integrating NHI Governance with Broader Security Architecture
NHI governance does not exist in isolation. The permissions and access patterns of non-human identities interact with every other security control in your environment:
Azure AI services: the service principals used by Azure AI Foundry and custom AI workloads often require broad data access for RAG pipelines and model training. Apply the same NHI governance controls: federated credentials for CI/CD deployment, managed identities for runtime access, and conditional access policies restricting authentication to trusted networks.
Infrastructure as Code pipelines: Terraform and Bicep deployments use service principals or managed identities to provision resources. These NHIs often have Owner or Contributor at the subscription level. Use federated credentials for GitHub and Terraform workflows to eliminate stored secrets, and scope RBAC assignments to the narrowest resource group possible.
Monitoring and SIEM integration: ensure that AADServicePrincipalSignInLogs and AuditLogs are flowing to your Microsoft Sentinel workspace. Without these log sources, the KQL queries above return nothing and NHI anomalies are invisible.
Managed identities as the default for Azure-native workloads: for any workload running inside Azure (VMs, App Services, Functions, AKS pods), managed identities should be the default authentication mechanism. A system-assigned managed identity is automatically provisioned and deleted with its parent resource, which eliminates the creation and decommission lifecycle phases entirely. User-assigned managed identities require manual lifecycle management but can be shared across resources. The key governance rule: never create an app registration for a workload that could use a managed identity instead.
NHI Governance Hardening Checklist
- [ ] Inventory all app registrations and service principals with az ad app list --all and az ad sp list --all
- [ ] Assign a human owner to every NHI and enforce ownership tagging in the notes field
- [ ] Identify and disable all NHIs with no sign-in activity in the past 90 days
- [ ] Migrate all CI/CD pipelines to workload identity federation (eliminate client secrets)
- [ ] Deploy the Key Vault + Event Grid + Azure Function rotation pipeline for NHIs that still require secrets
- [ ] Set maximum credential lifetime to 90 days for client secrets via organizational policy
- [ ] Enable Entra Workload ID Premium for production service principals
- [ ] Create conditional access policies restricting workload identity sign-ins to trusted IP ranges
- [ ] Remove Global Administrator and Application Administrator roles from all service principals (use scoped roles)
- [ ] Deploy KQL queries for expired credentials, anomalous sign-in IPs, and overprivileged NHIs as Sentinel analytics rules
- [ ] Configure access reviews for app role assignments on a quarterly cycle
- [ ] Build a nightly automation that scans for new app registrations with passwordCredentials and alerts the security team
- [ ] Document the break-glass service principal and monitor its usage with a dedicated alert
- [ ] Integrate NHI lifecycle events with ServiceNow CMDB for change tracking
- [ ] Review and update this checklist quarterly as Entra ID capabilities evolve
Microsoft Cloud Solution Architect
Cloud Solution Architect with deep expertise in Microsoft Azure and a strong background in systems and IT infrastructure. Passionate about cloud technologies, security best practices, and helping organizations modernize their infrastructure.