Cyber Intelligence
AI Security19 min read

OWASP LLM Top 10 2025: What Changed and What It Means for Azure AI Deployments

The OWASP LLM Top 10 2025 revision reshuffled the risk landscape: prompt injection dropped to second place, unbounded consumption is new, and system prompt leakage got its own category. If you run Azure OpenAI or AI Foundry workloads, every change maps to specific controls you either have or are missing. This guide breaks down each updated risk with Azure-native mitigations, detection queries, and the controls that actually close the gaps.

I
Microsoft Cloud Solution Architect
OWASP LLM Top 10Azure AI SecurityPrompt InjectionAI GovernanceAzure OpenAI

The 2025 Reshuffle Nobody Expected

The first OWASP LLM Top 10 (2023) put prompt injection at the top spot and treated everything else as secondary. The 2025 revision tells a different story. Prompt injection moved to LLM02. The new number one is LLM01: Excessive Agency. Unbounded Consumption is entirely new at LLM10. System Prompt Leakage got its own dedicated category instead of being folded into prompt injection. Vector and Embedding Weaknesses replaced Insecure Output Handling.

These are not cosmetic shuffles. Each change reflects real-world incidents that the original list did not adequately address. The 2023 list was built on theoretical threat models. The 2025 list is built on production failures.

If you operate Azure OpenAI, AI Foundry, or any LLM-backed application on Azure, every risk in this updated list maps to specific controls. Some you already have. Some you are probably missing. This guide maps each LLM risk to the Azure control plane and gives you the detection queries and policy configurations to close the gaps.

What Changed: 2023 vs. 2025 Side-by-Side

2025 Rank2025 Risk2023 Rank2023 RiskWhat Changed
LLM01Excessive AgencyLLM08Excessive AgencyPromoted: agentic AI made this the top real-world failure
LLM02Prompt InjectionLLM01Prompt InjectionDemoted: still critical, but agency failures cause more damage
LLM03Supply Chain VulnerabilitiesLLM05Supply Chain VulnerabilitiesPromoted: model poisoning incidents increased 300%
LLM04Data and Model PoisoningLLM03Training Data PoisoningExpanded: now includes inference-time data poisoning (RAG)
LLM05Improper Output HandlingLLM02Insecure Output HandlingRenamed and slightly demoted
LLM06Excessive DisclosureLLM06Sensitive Information DisclosureRenamed: broader scope including system prompt leakage
LLM07System Prompt LeakageNewN/ANew category: extracted from prompt injection
LLM08Vector and Embedding WeaknessesNewN/ANew category: RAG-specific attacks
LLM09MisinformationLLM09OverrelianceRenamed: focus shifted from user behavior to model output
LLM10Unbounded ConsumptionNewN/ANew category: resource exhaustion and denial of wallet
The three entirely new categories (LLM07, LLM08, LLM10) reflect attack patterns that barely existed in production when the 2023 list was written. Agentic AI, enterprise RAG pipelines, and consumption-based pricing have fundamentally changed the threat landscape.

LLM01: Excessive Agency

Why It Is Number One Now

In 2023, most LLM deployments were chat interfaces with no tool access. In 2025, agentic architectures are in production: LLMs that call APIs, execute code, query databases, and trigger workflows. When an agent has more permissions than it needs, a single manipulated prompt can cause the agent to take real-world actions the user never intended.

The risk is not that the LLM is "hacked." The risk is that the LLM is doing exactly what it was designed to do, but with permissions that make innocent mistakes catastrophic.

Azure Mitigations

Principle of least privilege for tool-calling identities. Every Azure OpenAI or AI Foundry agent that calls external tools should authenticate with a dedicated managed identity scoped to the minimum required permissions. Do not reuse the hub managed identity for agent tool calls.
// Dedicated managed identity for an agent that only needs to read from a specific storage container
resource agentIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'agent-tool-caller-identity'
  location: location
}

// Scope: single container, read-only var storageBlobReaderRoleId = '2a2b9908-6ea1-4ae2-8e65-a410df84e7d1' resource agentStorageRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = { name: guid(storageAccount.id, agentIdentity.id, storageBlobReaderRoleId) scope: storageContainer properties: { roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', storageBlobReaderRoleId) principalId: agentIdentity.properties.principalId principalType: 'ServicePrincipal' } }

Human-in-the-loop for destructive operations. Any agent action that creates, modifies, or deletes resources should require explicit user confirmation. Implement this at the application layer, not the model layer. The LLM should output a proposed action; a separate approval service should gate execution. Rate-limit tool invocations per session. Set a maximum number of tool calls per conversation turn and per session. Azure API Management can enforce this at the API gateway level in front of your agent's tool endpoints.

Detection

// Detect agent tool calls exceeding expected volume
AzureDiagnostics
| where ResourceType == "WORKSPACES"
| where Category == "OnlineEndpointTraffic"
| extend RequestPath = extract("path=([^,]+)", 1, properties_s)
| where RequestPath contains "/tool" or RequestPath contains "/function"
| summarize ToolCalls = count() by bin(TimeGenerated, 5m), RequestPath
| where ToolCalls > 50
| order by ToolCalls desc

LLM02: Prompt Injection

What Changed from 2023

Prompt injection is now split into direct (user-supplied) and indirect (data-supplied). Indirect prompt injection through RAG grounding documents and MCP tool results is now the higher-risk variant because it bypasses user-facing input filters entirely.

Azure Mitigations

Azure AI Content Safety Prompt Shields. Enable Prompt Shields on every RAG and agentic deployment. They analyze both user input and grounding document content for injection patterns.
# Enable Prompt Shields on an Azure OpenAI deployment
az cognitiveservices account deployment update \
  --name <account-name> \
  --resource-group <rg> \
  --deployment-name <deployment-name> \
  --content-filter prompt-shield-enabled
Input/output boundary enforcement. Treat the system prompt, user input, and grounding data as separate trust zones. Use XML delimiters or structured message formatting to make boundaries explicit to the model. Azure OpenAI's chat completions API naturally separates system, user, and assistant roles: use them correctly instead of concatenating everything into a single user message. Grounding data sanitization. Before indexing documents into Azure AI Search for RAG, scan them for injection patterns. A simple regex pass for patterns like "ignore previous instructions" or "system: you are now" catches the lowest-effort attacks. For sophisticated attacks, use a secondary LLM call to classify each document chunk as benign or potentially malicious before indexing.

LLM03: Supply Chain Vulnerabilities

This risk has been covered comprehensively in the Secure AI Supply Chain guide. The key controls: internal model registry, automated scanning with ModelScan, Azure Policy gates for deployment, and model SBOM generation.

The 2025 update specifically calls out fine-tuning service providers as a supply chain vector. If you use a third-party fine-tuning service, the model weights returned to you could contain embedded behaviors that were not in your training data. The mitigation is to fine-tune only on infrastructure you control (Azure AI Foundry compute) or to re-validate fine-tuned model outputs against a held-out test set that probes for unexpected behaviors.

LLM04: Data and Model Poisoning

The RAG Expansion

The 2023 list focused on training data poisoning. The 2025 list adds inference-time poisoning through RAG: an attacker who can write to your grounding data store can manipulate model outputs without touching the model itself. This is the threat covered in depth in the AI Foundry threat model under Threat 3.

Azure-Specific Controls

# Enable blob versioning and soft delete on RAG grounding data
az storage account blob-service-properties update \
  --account-name <storage-account> \
  --resource-group <rg> \
  --enable-versioning true \
  --enable-delete-retention true \
  --delete-retention-days 30

# Set immutability policy on the grounding data container az storage container immutability-policy create \ --account-name <storage-account> \ --container-name rag-grounding-data \ --period 7 \ --allow-protected-append-writes true

Combine with Azure Monitor alerts for any PutBlob or PutBlock operation on the grounding container from an identity other than your approved indexing pipeline service principal.

LLM05: Improper Output Handling

LLM outputs should never be trusted as safe for downstream consumption. If your application passes LLM-generated text to a SQL query, shell command, API call, or web page without sanitization, you have a classic injection vulnerability with an LLM as the attack surface.

Practical Controls

  • Never use LLM output in eval(), exec(), or string-interpolated SQL
  • Apply output encoding appropriate to the rendering context (HTML encoding for web, parameterized queries for SQL)
  • Use Azure API Management response transformation policies to strip or encode potentially dangerous characters from LLM API responses before they reach downstream consumers
<!-- APIM policy: sanitize LLM output before passing to downstream API -->
<outbound>
  <set-body>@{
    var response = context.Response.Body.As<string>();
    // Strip potential script injection from LLM output
    response = System.Text.RegularExpressions.Regex.Replace(
      response, @"<script[^>]*>.*?</script>", "",
      System.Text.RegularExpressions.RegexOptions.Singleline);
    return response;
  }</set-body>
</outbound>

LLM06: Excessive Disclosure

Beyond PII Leakage

The 2025 update broadens this category beyond PII. It now includes: leaking internal business logic, exposing training data through extraction attacks, and revealing architectural details through error messages.

Azure Controls

Azure OpenAI content filters with custom blocklists. Add company-specific terms, project codenames, and internal system names to a custom blocklist:
# Create a custom blocklist for sensitive terms
az cognitiveservices account content-filter blocklist create \
  --name internal-terms-blocklist \
  --resource-group <rg> \
  --account-name <aoai-account> \
  --description "Block internal project names and sensitive identifiers"

# Add terms to the blocklist az cognitiveservices account content-filter blocklist item add \ --name internal-terms-blocklist \ --resource-group <rg> \ --account-name <aoai-account> \ --text "ProjectPhoenix" \ --is-regex false

Output token limits. Set max_tokens on every deployment to prevent extraction attacks that rely on generating large volumes of output. A chatbot that should respond in 500 tokens does not need a 4096 token limit.

LLM07: System Prompt Leakage (New)

This was previously a subset of prompt injection. The 2025 list gives it a dedicated category because system prompts in production frequently contain: API keys, internal URLs, business logic rules, content policy workarounds, and role-based access control instructions. Leaking the system prompt gives an attacker a roadmap for every other attack on the list.

Why This Gets Its Own Category

A system prompt like "You have access to the internal HR database at hr-api.internal.corp.com. Use the API key HRKEY-abc123 to authenticate." gives an attacker three things: the existence of the API, the endpoint URL, and a valid credential. This is not hypothetical: security researchers have extracted system prompts from production deployments of major enterprise applications.

Azure Mitigations

  • Never put credentials, internal URLs, or API keys in system prompts. Use Azure Key Vault references resolved at runtime by the application layer.
  • Use Azure OpenAI's system message with "role": "system" and explicitly instruct the model not to reveal system instructions. This is a defense-in-depth measure, not a reliable control.
  • Implement output monitoring that detects when a response contains patterns matching the system prompt structure.

Detection Query

// Detect responses that may contain leaked system prompt content
// Requires request logging enabled on the endpoint
AzureDiagnostics
| where ResourceType == "ACCOUNTS"
| where Category == "RequestResponse"
| extend ResponseText = tostring(parse_json(properties_s).response)
| where ResponseText contains "You are" and ResponseText contains "system"
    and (ResponseText contains "API" or ResponseText contains "key"
    or ResponseText contains "internal" or ResponseText contains "endpoint")
| project TimeGenerated, ResourceId, ResponseText
| take 50

LLM08: Vector and Embedding Weaknesses (New)

RAG pipelines rely on vector embeddings to retrieve relevant context. The 2025 list recognizes that the embedding layer itself is an attack surface.

Attack Patterns

  1. Embedding inversion: reconstructing original text from embedding vectors, which can expose PII from the training or indexing corpus
  2. Adversarial document crafting: creating documents that produce embedding vectors deliberately close to target queries, ensuring the malicious document is always retrieved
  3. Index poisoning: injecting documents into the vector store that manipulate retrieval results for specific query patterns

Azure AI Search Hardening

# Enable customer-managed encryption on Azure AI Search
az search service update \
  --name <search-service> \
  --resource-group <rg> \
  --encryption-key-uri <key-vault-key-uri> \
  --identity-type SystemAssigned

# Restrict index write access to the indexing pipeline identity only az role assignment create \ --role "Search Index Data Contributor" \ --assignee <indexing-pipeline-principal-id> \ --scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Search/searchServices/<search-service>

Separate read and write access on your search indexes. Application identities that query the index should have Search Index Data Reader only. The indexing pipeline identity gets Search Index Data Contributor. No human account should have write access to production indexes.

LLM09: Misinformation

The renamed category shifts focus from "user overreliance" (a human behavior problem) to "model-generated misinformation" (a system output problem). The practical implication: you are responsible for implementing guardrails against hallucination, not just training users to be skeptical.

Controls

  • Grounding with citations. Azure AI Foundry supports grounding with Azure AI Search, and the API returns citation metadata. Surface these citations to users so outputs are verifiable.
  • Confidence scoring. Use the logprobs parameter in Azure OpenAI completions to get token-level confidence scores. Flag responses where average confidence falls below a threshold.
  • Automated fact-checking pipelines. For high-stakes applications (medical, financial, legal), route LLM outputs through a secondary verification model or rules engine before presenting to users.

LLM10: Unbounded Consumption (New)

The Denial-of-Wallet Attack

This is the cloud-native LLM risk. An attacker sends crafted prompts designed to maximize token consumption: long context windows, recursive tool calls, or prompts that trigger maximum-length outputs. On consumption-based pricing, this translates directly to financial damage.

Azure Token and Cost Controls

# Set token-per-minute rate limits on Azure OpenAI deployment
az cognitiveservices account deployment create \
  --name <account-name> \
  --resource-group <rg> \
  --deployment-name <deployment-name> \
  --model-name gpt-4o \
  --model-version "2024-11-20" \
  --sku-capacity 80 \
  --sku-name Standard

# Set Azure budget alert for AI services resource group az consumption budget create \ --budget-name ai-services-monthly-cap \ --amount 5000 \ --category cost \ --resource-group rg-ai-services \ --time-grain monthly \ --start-date 2026-06-01 \ --end-date 2027-06-01 \ --notifications '[{"contactEmails":["security-team@company.com"],"threshold":80,"operator":"GreaterThan","enabled":true}]'

Azure API Management quotas. Place APIM in front of Azure OpenAI and enforce per-user, per-application, and per-IP quotas:
<!-- APIM policy: rate limit per subscription key -->
<inbound>
  <rate-limit-by-key
    calls="100"
    renewal-period="60"
    counter-key="@(context.Subscription.Key)"
    increment-condition="@(context.Response.StatusCode >= 200)" />
  <quota-by-key
    calls="10000"
    renewal-period="86400"
    counter-key="@(context.Subscription.Key)" />
</inbound>

Detection

// Detect token consumption spikes per caller
AzureMetrics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where MetricName == "TokenTransaction"
| summarize TotalTokens = sum(Total) by bin(TimeGenerated, 1h), Resource
| where TotalTokens > 100000
| order by TotalTokens desc

Mapping All 10 Risks to Azure Controls

OWASP RiskPrimary Azure ControlSecondary ControlDetection
LLM01 Excessive AgencyLeast-privilege MI per agentAPIM rate limits on toolsKQL tool call volume
LLM02 Prompt InjectionPrompt ShieldsInput/output trust zonesContent Safety alerts
LLM03 Supply ChainAzure Policy + internal ACRModelScan in CI/CDDeployment source alerts
LLM04 Data PoisoningBlob versioning + immutabilityPurview sensitivity labelsStorage write alerts
LLM05 Output HandlingAPIM response transformationOutput encoding at app layerN/A (app-level)
LLM06 Excessive DisclosureCustom blocklistsmax_tokens limitsResponse content monitoring
LLM07 System Prompt LeakageKey Vault for secretsOutput pattern monitoringKQL response analysis
LLM08 Vector WeaknessesCMK encryption + RBACIndex write separationIndex modification alerts
LLM09 MisinformationGrounding with citationslogprobs confidence scoringN/A (app-level)
LLM10 Unbounded ConsumptionAPIM quotas + TPM limitsAzure budget alertsToken consumption KQL

Hardening Checklist

  • [ ] Prompt Shields enabled on all RAG and agentic Azure OpenAI deployments
  • [ ] Dedicated managed identities per agent with minimum required permissions: no shared hub MI for tool calls
  • [ ] Human-in-the-loop gates implemented for all agent actions that create, modify, or delete resources
  • [ ] Internal model registry (ACR) with Azure Policy denying deployments from external sources
  • [ ] Blob versioning and immutability policies on all RAG grounding data containers
  • [ ] Custom content filter blocklists configured with internal project names and sensitive identifiers
  • [ ] No credentials or internal URLs in system prompts: all secrets resolved from Key Vault at runtime
  • [ ] Azure AI Search index RBAC separated: read-only for applications, write for indexing pipeline only
  • [ ] APIM deployed in front of Azure OpenAI with per-user rate limits and daily quotas
  • [ ] Azure budget alerts configured for AI services resource groups with 80% threshold notifications
  • [ ] max_tokens set explicitly on every deployment to prevent token extraction attacks
  • [ ] Request logging enabled on all Azure OpenAI deployments for post-incident response content analysis
  • [ ] KQL alerts deployed for token consumption spikes, tool call volume anomalies, and system prompt leakage patterns

Frequently Asked Questions

Why did prompt injection move from the number one spot to number two in the 2025 OWASP LLM Top 10?

Prompt injection remains a critical vulnerability, but real-world incident data from 2024 and 2025 showed that Excessive Agency caused more actual damage in production deployments. Prompt injection exploits typically require a secondary vulnerability (like excessive permissions or improper output handling) to cause significant harm. Excessive Agency, where an LLM agent has overly broad permissions and autonomously takes destructive actions, causes direct damage without needing another vulnerability in the chain. The OWASP team reordered the list based on observed impact severity rather than theoretical exploitability.

What is the "denial of wallet" attack described under LLM10 Unbounded Consumption?

Denial of wallet is a resource exhaustion attack specifically targeting consumption-based cloud services. An attacker sends crafted prompts designed to maximize token usage: extremely long context windows, prompts that trigger maximum-length outputs, or recursive patterns that cause repeated tool calls. Unlike traditional denial-of-service attacks that aim to make a service unavailable, denial of wallet aims to generate massive cloud bills. On Azure OpenAI with pay-per-token pricing, an unprotected endpoint can accumulate thousands of dollars in charges within hours. The mitigations are APIM rate limits per user or API key, TPM (tokens per minute) capacity limits on deployments, and Azure budget alerts with automatic notification at spending thresholds.

How do Vector and Embedding Weaknesses (LLM08) differ from Data Poisoning (LLM04)?

Data poisoning targets the content stored in your knowledge base, such as modifying documents in a RAG grounding data container. Vector and embedding weaknesses target the retrieval mechanism itself. An attacker crafting adversarial documents that produce embedding vectors close to specific target queries is manipulating which content gets retrieved, not the content itself. This means even if all your grounding documents are legitimate, an attacker can inject a new document specifically engineered to be retrieved for certain queries, effectively hijacking the RAG pipeline's relevance ranking. The defense requires both content integrity controls (data poisoning mitigations) and index access controls (vector weakness mitigations).

Why does System Prompt Leakage (LLM07) deserve its own category separate from Prompt Injection?

In the 2023 list, system prompt extraction was considered a variant of prompt injection. The 2025 update separates it because the impact profile is fundamentally different. Prompt injection aims to make the model do something unintended. System prompt leakage exposes information that enables other attacks: internal API endpoints, authentication credentials hardcoded in prompts, business logic rules, content policy workarounds, and role-based access patterns. A leaked system prompt is essentially an attacker's reconnaissance report for the entire application. Treating it as a separate risk category ensures organizations implement dedicated controls (Key Vault for secrets, output monitoring for prompt patterns) rather than relying solely on prompt injection defenses.

Pluralsight logo

Recommended tool: Pluralsight

Level up your security skills with expert-led courses. Free 10-day trial, then access thousands of courses across cloud security, networking, and certifications.

Start free trialRecommended

Get weekly security insights

Cloud security, zero trust, and identity guides — straight to your inbox.

Continue Learning

AI Security Engineer Roadmap

The fastest-growing specialty in security.

Start the Intermediate Path10h · 4 topics · 10 quiz questions
I

Microsoft Cloud Solution Architect

Cloud Solution Architect with deep expertise in Microsoft Azure and a strong background in systems and IT infrastructure. Passionate about cloud technologies, security best practices, and helping organizations modernize their infrastructure.

Share this article

Questions & Answers

Related Articles

Need Help with Your Security?

Our team of security experts can help you implement the strategies discussed in this article.

Contact Us