What is the biggest mistake organizations make with non-human identity governance?

Most security programs focus governance effort on the creation phase (approval workflows for new service principals) and assume rotation and decommissioning take care of themselves. They usually don't: many enterprise tenants have more app registrations and service principals than human user accounts, and a lot of those credentials sit unrotated with no assigned owner to notice.

How do you rotate Azure service principal secrets without breaking pipelines?

Store the secret in Azure Key Vault with an expiration date, and trigger rotation ahead of expiry so a new credential is generated via the Graph API before the old one is removed. Adding the new credential first, then removing the old one, is the critical ordering detail: removing the old secret first breaks any application still using it until it picks up the new one.

What is the difference between client secrets, certificates, and workload identity federation for Azure service principals?

Client secrets are plaintext credentials that must be rotated and are vulnerable to theft from config files or environment variables. Certificates store only the public key in Entra ID and are somewhat safer but still require periodic renewal. Workload identity federation eliminates the credential entirely by establishing a trust relationship with an external identity provider like GitHub Actions or Terraform Cloud, so there's nothing to rotate or steal, making it the recommended default for new workloads.

What is the "soak" period when decommissioning an unused service principal?

Instead of deleting a flagged, inactive identity immediately, disable it first and hold it in that disabled state for a defined period. This surfaces any hidden consumer that was silently depending on it, since disabling blocks new token requests right away, and it gives an owner or a failing pipeline time to come forward before the identity is permanently deleted.

Can Conditional Access policies restrict service principals the same way they restrict user sign-ins?

Partially, and only with an Entra Workload ID Premium license. Workload identity Conditional Access policies support location and service principal risk level as conditions and can only block or allow, unlike user policies which also support MFA, compliant device, and session controls. A common workload identity policy restricts service principals to authenticate only from corporate and known cloud provider IP ranges, which prevents a leaked credential from being used from attacker infrastructure.

NHI Governance: Credential Lifecycle & Workload...

The Service Principal Secret That Stayed Valid for 847 Days

A financial services firm ran a post-incident review after an attacker moved laterally from a compromised CI/CD pipeline into production Azure resources. The entry point: a service principal with Contributor role on the production subscription. The credential had been created during the initial cloud migration, assigned to a Jenkins pipeline that was decommissioned 18 months earlier, and never rotated. The secret was still valid. No one owned it. No alert fired when it was used from an IP address in a country where the company has no operations.

This is not an edge case. Microsoft's own telemetry shows that the median enterprise Entra ID tenant contains more application registrations and service principals than human user accounts, and the majority of those NHIs have credentials that exceed recommended rotation intervals. The [foundational NHI security guide](/blog/non-human-identities-nhi-security-guide) covers the threat model. This article is the implementation playbook: how to build the automation, policies, and detection queries that turn NHI governance from a spreadsheet exercise into an enforceable program.

The NHI Lifecycle: Four Phases, Four Failure Points

Every non-human identity passes through four phases, and each phase has a characteristic failure mode:

Phase	What Happens	Common Failure
Creation	App registration created, credentials issued, RBAC assigned	No owner tagged, overprivileged role, client secret instead of certificate or federation
Operation	NHI authenticates to APIs and resources	No monitoring of sign-in anomalies, no conditional access policy applied
Rotation	Credentials renewed before expiry	Manual process, no automation, rotation window missed
Decommission	NHI disabled/deleted when no longer needed	No trigger for removal, orphaned identity persists indefinitely

Most security programs focus on creation (approval workflows) and hope the other three phases take care of themselves. They do not.

Ownership Attribution at Scale

The single highest-impact governance control for NHIs is ownership attribution. Every service principal and app registration must have an identifiable human owner: the person accountable for its lifecycle, its privilege level, and its continued business justification.

Tagging Strategy

Entra ID app registrations support notes and tags fields. Use a structured tagging convention:

# Tag an existing app registration with ownership metadata
az ad app update --id <app-id> \
  --set "notes=owner:john.doe@contoso.com;team:platform-engineering;cost-center:CC-4420;created:2026-01-15;review-date:2026-07-15"

# Query all app registrations missing an owner tag
# (notes can be null, so guard it before calling contains)
az ad app list --all --query "[?!(notes && contains(notes, 'owner:'))].{AppId:appId, DisplayName:displayName, Created:createdDateTime}" -o table

The notes field is a free-text string, so enforce the schema through automation rather than documentation. A nightly Azure Function can scan all app registrations, parse the notes field, and flag any that are missing required tags or have an owner whose Entra ID account is disabled (meaning the owner has left the organization).
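The schema check at the heart of that nightly scan can be sketched in a few lines. This is a minimal illustration, assuming the semicolon-separated `key:value` convention shown above; the function names, the `REQUIRED_TAGS` tuple, and the `disabled_owners` set (which a real job would build from a directory lookup) are hypothetical.

```python
# Required keys follow the tagging convention shown above; adjust to your schema.
REQUIRED_TAGS = ("owner", "team", "cost-center", "created", "review-date")


def parse_notes(notes):
    """Parse 'key:value;key:value' into a dict, ignoring malformed segments."""
    tags = {}
    for segment in (notes or "").split(";"):
        # Split on the first colon only, so values may themselves contain colons.
        key, sep, value = segment.partition(":")
        if sep and key.strip() and value.strip():
            tags[key.strip().lower()] = value.strip()
    return tags


def audit_notes(notes, disabled_owners=frozenset()):
    """Return a list of findings for one app registration's notes field."""
    tags = parse_notes(notes)
    findings = [f"missing tag: {name}" for name in REQUIRED_TAGS if name not in tags]
    owner = tags.get("owner", "").lower()
    if owner and owner in disabled_owners:
        findings.append(f"owner account disabled: {owner}")
    return findings
```

An empty result means the registration passes; anything else is what the job would raise as a ticket or alert.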

Automated Discovery and Reconciliation

For organizations with hundreds or thousands of NHIs, manual ownership assignment is not feasible retroactively. Use this approach:

Export all app registrations and service principals with their createdDateTime, appOwnerOrganizationId, and any existing owners collection from Graph API
Cross-reference against Azure DevOps / GitHub commit history to identify which CI/CD pipelines use each service principal
Map pipelines to teams using repository ownership in your DevOps platform
Assign ownership to the team lead and create ServiceNow CIs (configuration items) for each NHI
Set a 90-day deadline: any NHI without a confirmed owner after 90 days gets disabled

This one-time reconciliation is painful but necessary. Once ownership is established, enforce it at creation time through an approval workflow that requires an owner before credentials are issued.
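The cross-referencing steps above amount to two lookups chained together: service principal to pipeline, then pipeline to team. The sketch below assumes those mappings have already been exported into plain dictionaries; the function names and input shapes are hypothetical, and the 90-day default mirrors the deadline in the last step.

```python
from datetime import date, timedelta


def propose_owners(sp_to_pipelines, pipeline_to_team, team_leads):
    """Map each service principal to candidate owners via pipeline -> team -> lead."""
    proposals, unresolved = {}, []
    for sp_id, pipelines in sp_to_pipelines.items():
        teams = {pipeline_to_team[p] for p in pipelines if p in pipeline_to_team}
        leads = sorted(team_leads[t] for t in teams if t in team_leads)
        if leads:
            proposals[sp_id] = leads
        else:
            # No pipeline or team could be traced: needs manual follow-up.
            unresolved.append(sp_id)
    return proposals, sorted(unresolved)


def past_deadline(flagged_on, today, deadline_days=90):
    """True once an unowned identity has exceeded the confirmation deadline."""
    return today - flagged_on > timedelta(days=deadline_days)
```

Identities in the unresolved list are the ones that start the 90-day clock.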

Entra Workload ID Premium: What It Adds

Entra Workload ID Premium (licensed per workload identity) extends identity protection and conditional access to non-human identities. The free tier gives you basic service principal authentication. Premium adds three capabilities that matter for governance:

Identity Protection for workload identities. Entra ID Identity Protection applies risk-based detection to service principal sign-ins, flagging anomalous authentication patterns such as sign-ins from unusual locations, unfamiliar client applications using the credential, or credential usage patterns that deviate from the baseline. Without Premium, service principal sign-ins generate audit log entries but receive no risk scoring.

Conditional access for workload identities. This is the control that closes the gap between human and non-human identity governance. You can create CA policies that target single-tenant service principals registered in your tenant, restricting them to specific IP ranges or blocking access entirely when risk is elevated. Managed identities and multi-tenant or third-party applications are outside the scope of these policies.

Access reviews for workload identities. Extends the access review framework to cover app role assignments and service principal RBAC assignments with the same review/approve/deny workflow used for human accounts.

Conditional Access for Workload Identities

Workload identity CA policies differ from user CA policies in important ways:

Aspect	User CA Policy	Workload Identity CA Policy
Supported conditions	Location, device compliance, risk level, app, user group	Location (named locations / IP ranges), service principal risk level
Grant controls	MFA, compliant device, terms of use, app protection	Block or Allow only (no MFA for service principals)
Session controls	Sign-in frequency, persistent browser, CASB	Not supported
Scope	Users and groups	Service principals (by specific ID or "all workload identities")
License requirement	Entra ID P1	Entra Workload ID Premium

The most common workload identity CA policy: restrict all service principals to authenticate only from your corporate IP ranges and your cloud provider's IP ranges (Azure datacenter IPs, GitHub Actions runner IPs). This prevents a leaked credential from being used from an attacker's infrastructure.

# Create a conditional access policy for workload identities
# Block service principal sign-ins from anywhere except the named locations
# "ServicePrincipalsInMyTenant" targets all single-tenant service principals
az rest --method POST \
  --uri "https://graph.microsoft.com/v1.0/identity/conditionalAccess/policies" \
  --body '{
    "displayName": "Restrict workload identities to trusted locations",
    "state": "enabled",
    "conditions": {
      "applications": {
        "includeApplications": ["All"]
      },
      "clientApplications": {
        "includeServicePrincipals": ["ServicePrincipalsInMyTenant"],
        "excludeServicePrincipals": ["<break-glass-sp-id>"]
      },
      "locations": {
        "includeLocations": ["All"],
        "excludeLocations": ["<named-location-id-corporate>", "<named-location-id-azure-datacenter>"]
      }
    },
    "grantControls": {
      "operator": "OR",
      "builtInControls": ["block"]
    }
  }'

Exclude your break-glass service principal from this policy, just as you exclude break-glass user accounts from user CA policies. Document the break-glass SP separately and monitor its usage with a dedicated alert.
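Before enforcing a location policy, it helps to replay recent service principal sign-in IPs against the ranges you plan to trust, so legitimate automation is not blocked on day one. The sketch below uses Python's standard `ipaddress` module; the function name, the input shape, and the sample ranges (reserved documentation ranges) are hypothetical.

```python
import ipaddress


def untrusted_sign_ins(sign_ins, trusted_cidrs):
    """Return the (name, ip) sign-ins whose source IP is outside every trusted range."""
    networks = [ipaddress.ip_network(cidr) for cidr in trusted_cidrs]
    flagged = []
    for sp_name, ip in sign_ins:
        addr = ipaddress.ip_address(ip)
        # Membership is False when address and network IP versions differ.
        if not any(addr in net for net in networks):
            flagged.append((sp_name, ip))
    return flagged


if __name__ == "__main__":
    trusted = ["203.0.113.0/24", "198.51.100.0/24"]
    recent = [("sp-deploy", "203.0.113.10"), ("sp-report", "192.0.2.55")]
    for name, ip in untrusted_sign_ins(recent, trusted):
        print(f"{name} would be blocked from {ip}")
```

Any service principal that shows up here either needs its source range added to a named location or is a sign-in worth investigating.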

Secret Rotation Pipelines: The Azure Key Vault Pattern

For service principals that still require client secrets (because workload identity federation is not yet available for their use case), automate rotation using Azure Key Vault, Event Grid, and Azure Functions.

The pattern works as follows:

Store the service principal secret in Azure Key Vault with an expiration date
Key Vault fires a SecretNearExpiry event via Event Grid 30 days before expiration
An Azure Function receives the event, generates a new credential via Graph API, stores it in Key Vault, and removes the old credential
The consuming application reads the secret from Key Vault at runtime (never from config files or environment variables)

# Event Grid subscription for Key Vault secret near-expiry events
resource "azurerm_eventgrid_system_topic" "keyvault" {
  name                   = "keyvault-events"
  resource_group_name    = azurerm_resource_group.main.name
  location               = azurerm_resource_group.main.location
  source_arm_resource_id = azurerm_key_vault.main.id
  topic_type             = "Microsoft.KeyVault.vaults"
}

resource "azurerm_eventgrid_system_topic_event_subscription" "secret_rotation" {
  name                = "secret-near-expiry"
  system_topic        = azurerm_eventgrid_system_topic.keyvault.name
  resource_group_name = azurerm_resource_group.main.name

  azure_function_endpoint {
    function_id = "${azurerm_linux_function_app.rotation.id}/functions/RotateSecret"
  }

  included_event_types = [
    "Microsoft.KeyVault.SecretNearExpiry"
  ]

  subject_filter {
    subject_begins_with = "sp-"
  }
}

# Azure Function App for rotation logic
resource "azurerm_linux_function_app" "rotation" {
  name                       = "func-secret-rotation"
  resource_group_name        = azurerm_resource_group.main.name
  location                   = azurerm_resource_group.main.location
  storage_account_name       = azurerm_storage_account.func.name
  storage_account_access_key = azurerm_storage_account.func.primary_access_key
  service_plan_id            = azurerm_service_plan.func.id

  identity {
    type = "SystemAssigned"
  }

  site_config {
    application_stack {
      python_version = "3.11"
    }
  }
}

# Grant the Function App permission to manage Key Vault secrets
resource "azurerm_key_vault_access_policy" "rotation_func" {
  key_vault_id = azurerm_key_vault.main.id
  tenant_id    = data.azurerm_client_config.current.tenant_id
  object_id    = azurerm_linux_function_app.rotation.identity[0].principal_id

  secret_permissions = ["Get", "Set", "Delete", "List"]
}

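The rotation function itself needs Application.ReadWrite.All on Microsoft Graph to create and remove credentials on the target app registration. Grant this via a managed identity with an app role assignment rather than storing another secret. If the function's identity can be made an owner of the app registrations it rotates, the narrower Application.ReadWrite.OwnedBy permission is enough and keeps the rotation function from becoming an overprivileged NHI itself.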

Critical detail: the rotation function must add the new credential before removing the old one. There must be a brief overlap period where both credentials are valid. If the function removes the old credential first, any application currently using it will fail authentication until it reads the new secret from Key Vault.
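The ordering can be made explicit in the function body. The sketch below is not the production function: `InMemoryCredentialBackend` is a hypothetical stand-in for the Graph and Key Vault calls so the sequence can be exercised locally. In a real implementation the add and remove steps would map to the Graph `addPassword` and `removePassword` actions on the application.

```python
class InMemoryCredentialBackend:
    """Hypothetical stand-in for Graph + Key Vault, used to exercise the ordering."""

    def __init__(self, initial_key_id):
        self.active_key_ids = [initial_key_id]
        self.vault_value = f"secret-for-{initial_key_id}"
        self.log = []
        self._counter = 0

    def add_password(self):
        self._counter += 1
        key_id = f"key-{self._counter}"
        self.active_key_ids.append(key_id)
        self.log.append(("add", key_id))
        return key_id, f"secret-for-{key_id}"

    def store_in_vault(self, secret_text):
        self.vault_value = secret_text
        self.log.append(("store", secret_text))

    def remove_password(self, key_id):
        self.active_key_ids.remove(key_id)
        self.log.append(("remove", key_id))


def rotate(backend, old_key_id, wait_for_overlap=lambda: None):
    """Add the new credential, publish it, wait, and only then remove the old one."""
    new_key_id, secret_text = backend.add_password()  # 1. both credentials are valid
    backend.store_in_vault(secret_text)               # 2. consumers can read the new one
    wait_for_overlap()                                # 3. overlap window for consumers to refresh
    backend.remove_password(old_key_id)               # 4. retire the old credential last
    return new_key_id
```

How long the overlap lasts is a design choice: it should be at least as long as the slowest consumer takes to re-read the secret from Key Vault, which in practice often means removing the old credential in a separate, delayed run rather than in the same invocation.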

Workload Identity Federation: Eliminating Secrets Entirely

Workload identity federation is the architectural answer to the rotation problem. Instead of issuing a client secret or certificate to an external workload, you configure a trust relationship between the Entra ID app registration and the external identity provider (GitHub Actions, Terraform Cloud, AWS, GCP, Kubernetes). The external workload presents a token from its own identity provider, and Entra ID validates it against the configured trust without any shared secret.
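Conceptually, the trust check compares three claims in the external token (issuer, subject, audience) against the values stored on the federated credential. The sketch below illustrates only that comparison and assumes the token signature has already been verified against the issuer's published keys; the class and function names are hypothetical, and the sample values follow the GitHub Actions convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FederatedCredential:
    issuer: str
    subject: str
    audiences: tuple


def token_matches_trust(claims, credential):
    """Compare already-verified token claims against one federated credential."""
    aud = claims.get("aud")
    # The aud claim may be a single string or a list of strings.
    token_audiences = [aud] if isinstance(aud, str) else list(aud or [])
    return (
        claims.get("iss") == credential.issuer
        and claims.get("sub") == credential.subject
        and any(a in credential.audiences for a in token_audiences)
    )


if __name__ == "__main__":
    trust = FederatedCredential(
        issuer="https://token.actions.githubusercontent.com",
        subject="repo:contoso/infra:ref:refs/heads/main",
        audiences=("api://AzureADTokenExchange",),
    )
    claims = {
        "iss": "https://token.actions.githubusercontent.com",
        "sub": "repo:contoso/infra:ref:refs/heads/main",
        "aud": "api://AzureADTokenExchange",
    }
    print(token_matches_trust(claims, trust))
```

Because the subject encodes the repository and branch, a workflow running from a different branch or fork presents a different subject and is rejected, which is what lets the trust policy replace a shared secret.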

The [federated credential deep-dive](/blog/flexible-federated-identity-credentials-entra-github-terraform) covers the setup mechanics. Here, the focus is on enforcement at scale: how to make federated credentials the default and block new secret creation.

Azure Policy for Enforcing Federated Credentials

Azure Policy operates at the ARM (Azure Resource Manager) layer, which means it cannot directly govern Entra ID app registrations (those are Microsoft Graph objects, not ARM resources). However, you can enforce federation requirements through complementary controls:

Custom Entra ID governance rule via Microsoft Graph: use a nightly automation that scans all app registrations created in the last 24 hours and flags any that have passwordCredentials (client secrets) without an approved exception
Azure DevOps / GitHub pipeline policy: enforce that all new service connections use workload identity federation. In Azure DevOps, the "Workload Identity federation (automatic)" service connection type is now the default
Conditional access policy: workload identity CA cannot filter on credential type, so apply a location-based block to the specific service principals that still hold client secrets. Restricting them to approved IP ranges makes secrets operationally painful to use compared to federation

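# Find all app registrations with client secrets (passwordCredentials)
# Single quotes stop the shell from treating the JMESPath backtick literals as command substitution
az ad app list --all \
  --query '[?length(passwordCredentials) > `0`].{AppId:appId, Name:displayName, SecretCount:length(passwordCredentials), OldestExpiry:min(passwordCredentials[].endDateTime)}' \
  -o table

# Find app registrations that have secrets but NO federated credentials
# "az ad app list" does not return federated credentials, so look them up per app
for id in $(az ad app list --all --query '[?length(passwordCredentials) > `0`].id' -o tsv); do
  fic_count=$(az ad app federated-credential list --id "$id" --query 'length(@)' -o tsv)
  if [ "$fic_count" -eq 0 ]; then
    az ad app show --id "$id" --query '{AppId:appId, Name:displayName}' -o tsv
  fi
done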
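The nightly scan from the first control above reduces to a filter over the application inventory. This is a minimal sketch, assuming the applications have already been fetched from Graph as dictionaries with `appId`, `createdDateTime`, and `passwordCredentials`; the function name and the `approved_exceptions` set are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def new_apps_with_secrets(apps, approved_exceptions, now, window=timedelta(hours=24)):
    """Return appIds created within the window that hold client secrets without an exception."""
    flagged = []
    for app in apps:
        # Graph timestamps end in 'Z'; normalise so fromisoformat accepts them.
        created = datetime.fromisoformat(app["createdDateTime"].replace("Z", "+00:00"))
        is_recent = now - created <= window
        has_secret = bool(app.get("passwordCredentials"))
        if is_recent and has_secret and app["appId"] not in approved_exceptions:
            flagged.append(app["appId"])
    return flagged


if __name__ == "__main__":
    inventory = [
        {"appId": "a1", "createdDateTime": "2026-01-16T02:00:00Z",
         "passwordCredentials": [{"keyId": "k1"}]},
    ]
    now = datetime(2026, 1, 16, 9, 0, tzinfo=timezone.utc)
    print(new_apps_with_secrets(inventory, approved_exceptions=set(), now=now))
```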

The goal state: every app registration uses federated credentials where technically possible, client certificates where federation is not supported, and client secrets only for legacy systems with an approved exception and a mandatory 90-day rotation enforced by the Key Vault pattern above.

Detecting NHI Risk with KQL

The following queries run in Microsoft Sentinel or Log Analytics workspaces connected to Entra ID diagnostic logs. They surface the NHI risks that governance programs are designed to prevent.

Expired or Soon-to-Expire Credentials

This query identifies service principals that signed in during the past 90 days but have been silent for more than 60, which separates dormant NHIs from active ones. Sign-in logs do not record credential expiry dates, so pair the result with a credential inventory (for example, passwordCredentials and keyCredentials end dates exported from Graph) to see which dormant identities also hold expired or soon-to-expire credentials. Identities with no sign-in anywhere in the window never appear in the log and have to be found by comparing the inventory against this table:

// Find service principals that were active earlier in the window but have gone dormant
// Sign-in logs carry no credential expiry dates; join on an inventory export
// (for example via externaldata or a watchlist) to add them
let DormancyThresholdDays = 60;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(90d)
| summarize LastSignIn = max(TimeGenerated), SignInCount = count() by ServicePrincipalId, ServicePrincipalName, AppId
| extend DaysSinceLastSignIn = datetime_diff('day', now(), LastSignIn)
| where DaysSinceLastSignIn > DormancyThresholdDays
| project ServicePrincipalName, AppId, LastSignIn, DaysSinceLastSignIn, SignInCount
| order by DaysSinceLastSignIn desc
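Sign-in logs do not include credential end dates, so the expiry side of this check has to come from an inventory export. The sketch below assumes application objects fetched from Graph as dictionaries with `passwordCredentials` entries carrying `keyId` and `endDateTime`; the function name is hypothetical and the 30-day window is an illustrative default.

```python
from datetime import datetime, timedelta, timezone


def classify_credentials(apps, now, window=timedelta(days=30)):
    """Return (appId, keyId, status) for expired or soon-to-expire client secrets."""
    results = []
    for app in apps:
        for cred in app.get("passwordCredentials", []):
            # Graph timestamps end in 'Z'; normalise so fromisoformat accepts them.
            end = datetime.fromisoformat(cred["endDateTime"].replace("Z", "+00:00"))
            if end <= now:
                results.append((app["appId"], cred["keyId"], "expired"))
            elif end - now <= window:
                results.append((app["appId"], cred["keyId"], "expiring"))
    return results
```

Joining this output with the sign-in summary on AppId shows which expiring credentials belong to identities that are still in use and which belong to dormant ones.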

Anomalous NHI Sign-In Patterns

This query detects service principals authenticating from IP addresses they have not used during the lookback period, which can indicate credential theft or unauthorized usage. Service principals with no history at all are included, since every IP is new for them:

// Detect successful service principal sign-ins from IP addresses not seen in the lookback period
let LookbackPeriod = 30d;
let DetectionWindow = 1d;
let HistoricalIPs = AADServicePrincipalSignInLogs
| where TimeGenerated between (ago(LookbackPeriod) .. ago(DetectionWindow))
| summarize KnownIPs = make_set(IPAddress) by ServicePrincipalId;
AADServicePrincipalSignInLogs
| where TimeGenerated > ago(DetectionWindow)
| where ResultType == "0" // Successful sign-ins only (ResultType is a string column)
| join kind=leftouter HistoricalIPs on ServicePrincipalId
| where isnull(KnownIPs) or not(set_has_element(KnownIPs, IPAddress))
| project TimeGenerated, ServicePrincipalName, ServicePrincipalId, IPAddress, ResourceDisplayName, ResultType
| order by TimeGenerated desc

Overprivileged NHIs: High-Privilege Role Assignments

// Find service principals with high-privilege directory role assignments
AuditLogs
| where TimeGenerated > ago(90d)
| where OperationName == "Add member to role"
| extend TargetId = tostring(TargetResources[0].id)
| extend TargetType = tostring(TargetResources[0].type)
| extend RoleName = tostring(parse_json(tostring(TargetResources[0].modifiedProperties))[1].newValue)
| where TargetType == "ServicePrincipal"
| where RoleName has_any ("Global Administrator", "Application Administrator", "Cloud Application Administrator", "Privileged Role Administrator")
| project TimeGenerated, TargetId, RoleName, OperationName,
    InitiatedBy = tostring(InitiatedBy.user.userPrincipalName)
| order by TimeGenerated desc

Any service principal with Global Administrator or Application Administrator should be reviewed immediately. These roles allow the NHI to create new credentials on other app registrations, which is the NHI equivalent of privilege escalation.

Comparison: Secrets vs. Certificates vs. Federated Credentials

Characteristic	Client Secrets	Client Certificates	Federated Credentials
Credential stored in Entra ID	Yes (hashed)	Yes (public key only)	No (trust policy only)
Credential stored by consumer	Yes (plaintext secret)	Yes (private key + cert)	No credential to store
Rotation required	Yes (max 2 years, recommended 90 days)	Yes (typically 1 year)	No (no credential to rotate)
Vulnerable to credential theft	High (secret in config/env vars)	Medium (private key on disk)	Not applicable
Supported identity providers	Any OAuth client	Any OAuth client	GitHub Actions, Terraform Cloud, AWS, GCP, Kubernetes, custom OIDC
Setup complexity	Low	Medium	Medium (one-time federation config)
Operational overhead	High (rotation + monitoring)	Medium (cert renewal)	Low (no lifecycle management)
Conditional access support	Yes (with Workload ID Premium)	Yes (with Workload ID Premium)	Yes (with Workload ID Premium)
Recommended for	Legacy systems with approved exceptions	Systems that cannot use federation	All new workloads, CI/CD, cloud-to-cloud

The migration path is clear: federated credentials for everything that supports OIDC federation, certificates for legacy systems that require a local credential, and client secrets only with an approved exception, a 90-day rotation enforced by automation, and Key Vault as the secret store.

The Decommission Phase Nobody Automates

Decommission is the lifecycle phase most organizations skip entirely, and it is the one that produces the largest attack surface over time. A service principal created for a proof of concept in 2023, never used since Q1 2024, and still holding Contributor on a resource group is exactly the credential an attacker will find and use. It has no active consumers who would notice anomalous behavior, no owner who would flag a suspicious sign-in, and no monitoring that distinguishes its traffic from legitimate automation.

The safe decommission pattern is a three-step sequence: disable, soak, delete.

When the unused-SP KQL query flags an identity with no sign-in activity for 90+ days, do not delete it immediately. Set accountEnabled to false on the service principal. This blocks new token requests for that identity right away (access tokens already issued stay valid until they expire) and surfaces any hidden consumer you missed during discovery. If a pipeline or application was silently depending on that identity, you will usually know within hours.

Hold the identity in disabled state for a defined soak period, typically 30 days. Monitor for any support tickets, pipeline failures, or application errors that reference the disabled identity. If nothing breaks and no owner comes forward during the soak period, proceed with deletion: remove the service principal, delete the app registration, and close the corresponding ServiceNow CI record.

This soak-before-delete pattern is what makes decommission safe enough to automate at scale. Without it, every deletion is a gamble, and teams stop deleting anything. The result is a tenant that accumulates hundreds of orphaned identities over years, each one a potential lateral movement path that no one monitors.

Automate the full sequence: the KQL query feeds a Logic App or Azure Function that disables flagged identities, opens a tracking ticket, waits the soak period, and deletes if no objection is raised. Human intervention is only required when someone objects during the soak window.
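The decision logic behind that automation is a small state machine. The sketch below assumes the 90-day inactivity threshold and 30-day soak period described above; the function name, the action labels, and the way objections are recorded are hypothetical, and the actual disable and delete calls are left to the surrounding Logic App or Function.

```python
from datetime import date, timedelta

INACTIVITY_THRESHOLD = timedelta(days=90)
SOAK_PERIOD = timedelta(days=30)


def next_action(today, last_sign_in, disabled_on=None, objection_raised=False):
    """Return 'keep', 'disable', 'wait', 'escalate', or 'delete' for one identity."""
    if disabled_on is None:
        # Not yet disabled: decide purely on inactivity (None means never seen).
        inactive = last_sign_in is None or today - last_sign_in >= INACTIVITY_THRESHOLD
        return "disable" if inactive else "keep"
    if objection_raised:
        # Someone came forward during the soak: hand the decision to a human.
        return "escalate"
    if today - disabled_on >= SOAK_PERIOD:
        return "delete"
    return "wait"
```

Running this once a day per identity keeps the automation stateless apart from two stored values, the disable date and the objection flag, both of which can live on the tracking ticket.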

Integrating NHI Governance with Broader Security Architecture

NHI governance does not exist in isolation. The permissions and access patterns of non-human identities interact with every other security control in your environment:

Azure AI services: the service principals used by [Azure AI Foundry](/blog/azure-ai-foundry-security-threat-model-rbac-governance) and custom AI workloads often require broad data access for RAG pipelines and model training. Apply the same NHI governance controls: federated credentials for CI/CD deployment, managed identities for runtime access, and conditional access policies restricting authentication to trusted networks.

Infrastructure as Code pipelines: Terraform and Bicep deployments use service principals or managed identities to provision resources. These NHIs often have Owner or Contributor at the subscription level. Use [federated credentials for GitHub and Terraform workflows](/blog/flexible-federated-identity-credentials-entra-github-terraform) to eliminate stored secrets, and scope RBAC assignments to the narrowest resource group possible.

Monitoring and SIEM integration: ensure that AADServicePrincipalSignInLogs and AuditLogs are flowing to your Microsoft Sentinel workspace. Without these log sources, the KQL queries above return nothing and NHI anomalies are invisible.

Managed identities as the default for Azure-native workloads: for any workload running inside Azure (VMs, App Services, Functions, AKS pods), managed identities should be the default authentication mechanism. A system-assigned managed identity is automatically provisioned and deleted with its parent resource, which eliminates the creation and decommission lifecycle phases entirely. User-assigned managed identities require manual lifecycle management but can be shared across resources. The key governance rule: never create an app registration for a workload that could use a managed identity instead.

NHI Governance Hardening Checklist

[ ] Inventory all app registrations and service principals with az ad app list --all and az ad sp list --all
[ ] Assign a human owner to every NHI and enforce ownership tagging in the notes field
[ ] Identify and disable all NHIs with no sign-in activity in the past 90 days
[ ] Migrate all CI/CD pipelines to workload identity federation (eliminate client secrets)
[ ] Deploy the Key Vault + Event Grid + Azure Function rotation pipeline for NHIs that still require secrets
[ ] Set maximum credential lifetime to 90 days for client secrets via organizational policy
[ ] Enable Entra Workload ID Premium for production service principals
[ ] Create conditional access policies restricting workload identity sign-ins to trusted IP ranges
[ ] Remove Global Administrator and Application Administrator roles from all service principals (use scoped roles)
[ ] Deploy KQL queries for expired credentials, anomalous sign-in IPs, and overprivileged NHIs as Sentinel analytics rules
[ ] Configure access reviews for app role assignments on a quarterly cycle
[ ] Build a nightly automation that scans for new app registrations with passwordCredentials and alerts the security team
[ ] Document the break-glass service principal and monitor its usage with a dedicated alert
[ ] Integrate NHI lifecycle events with ServiceNow CMDB for change tracking
[ ] Review and update this checklist quarterly as Entra ID capabilities evolve


Idan Ohayon
