OWASP LLM Top 10 2025: What Changed and What It Means for Azure AI Deployments
The OWASP LLM Top 10 2025 revision reshuffled the risk landscape: prompt injection dropped to second place, unbounded consumption is new, and system prompt leakage got its own category. If you run Azure OpenAI or AI Foundry workloads, every change maps to specific controls you either have or are missing. This guide breaks down each updated risk with Azure-native mitigations, detection queries, and the controls that actually close the gaps.
The 2025 Reshuffle Nobody Expected
The first OWASP LLM Top 10 (2023) put prompt injection at the top spot and treated everything else as secondary. The 2025 revision tells a different story. Prompt injection moved to LLM02. The new number one is LLM01: Excessive Agency. Unbounded Consumption is entirely new at LLM10. System Prompt Leakage got its own dedicated category instead of being folded into prompt injection. Vector and Embedding Weaknesses replaced Insecure Output Handling.
These are not cosmetic shuffles. Each change reflects real-world incidents that the original list did not adequately address. The 2023 list was built on theoretical threat models. The 2025 list is built on production failures.
If you operate Azure OpenAI, AI Foundry, or any LLM-backed application on Azure, every risk in this updated list maps to specific controls. Some you already have. Some you are probably missing. This guide maps each LLM risk to the Azure control plane and gives you the detection queries and policy configurations to close the gaps.
What Changed: 2023 vs. 2025 Side-by-Side
| 2025 Rank | 2025 Risk | 2023 Rank | 2023 Risk | What Changed |
|---|---|---|---|---|
| LLM01 | Excessive Agency | LLM08 | Excessive Agency | Promoted: agentic AI made this the top real-world failure |
| LLM02 | Prompt Injection | LLM01 | Prompt Injection | Demoted: still critical, but agency failures cause more damage |
| LLM03 | Supply Chain Vulnerabilities | LLM05 | Supply Chain Vulnerabilities | Promoted: model poisoning incidents increased 300% |
| LLM04 | Data and Model Poisoning | LLM03 | Training Data Poisoning | Expanded: now includes inference-time data poisoning (RAG) |
| LLM05 | Improper Output Handling | LLM02 | Insecure Output Handling | Renamed and slightly demoted |
| LLM06 | Excessive Disclosure | LLM06 | Sensitive Information Disclosure | Renamed: broader scope including system prompt leakage |
| LLM07 | System Prompt Leakage | New | N/A | New category: extracted from prompt injection |
| LLM08 | Vector and Embedding Weaknesses | New | N/A | New category: RAG-specific attacks |
| LLM09 | Misinformation | LLM09 | Overreliance | Renamed: focus shifted from user behavior to model output |
| LLM10 | Unbounded Consumption | New | N/A | New category: resource exhaustion and denial of wallet |
LLM01: Excessive Agency
Why It Is Number One Now
In 2023, most LLM deployments were chat interfaces with no tool access. In 2025, agentic architectures are in production: LLMs that call APIs, execute code, query databases, and trigger workflows. When an agent has more permissions than it needs, a single manipulated prompt can cause the agent to take real-world actions the user never intended.
The risk is not that the LLM is "hacked." The risk is that the LLM is doing exactly what it was designed to do, but with permissions that make innocent mistakes catastrophic.
Azure Mitigations
Principle of least privilege for tool-calling identities. Every Azure OpenAI or AI Foundry agent that calls external tools should authenticate with a dedicated managed identity scoped to the minimum required permissions. Do not reuse the hub managed identity for agent tool calls.// Dedicated managed identity for an agent that only needs to read from a specific storage container
resource agentIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
name: 'agent-tool-caller-identity'
location: location
}// Scope: single container, read-only
var storageBlobReaderRoleId = '2a2b9908-6ea1-4ae2-8e65-a410df84e7d1'
resource agentStorageRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
name: guid(storageAccount.id, agentIdentity.id, storageBlobReaderRoleId)
scope: storageContainer
properties: {
roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', storageBlobReaderRoleId)
principalId: agentIdentity.properties.principalId
principalType: 'ServicePrincipal'
}
}
Human-in-the-loop for destructive operations. Any agent action that creates, modifies, or deletes resources should require explicit user confirmation. Implement this at the application layer, not the model layer. The LLM should output a proposed action; a separate approval service should gate execution.
Rate-limit tool invocations per session. Set a maximum number of tool calls per conversation turn and per session. Azure API Management can enforce this at the API gateway level in front of your agent's tool endpoints.
Detection
// Detect agent tool calls exceeding expected volume
AzureDiagnostics
| where ResourceType == "WORKSPACES"
| where Category == "OnlineEndpointTraffic"
| extend RequestPath = extract("path=([^,]+)", 1, properties_s)
| where RequestPath contains "/tool" or RequestPath contains "/function"
| summarize ToolCalls = count() by bin(TimeGenerated, 5m), RequestPath
| where ToolCalls > 50
| order by ToolCalls desc
LLM02: Prompt Injection
What Changed from 2023
Prompt injection is now split into direct (user-supplied) and indirect (data-supplied). Indirect prompt injection through RAG grounding documents and MCP tool results is now the higher-risk variant because it bypasses user-facing input filters entirely.
Azure Mitigations
Azure AI Content Safety Prompt Shields. Enable Prompt Shields on every RAG and agentic deployment. They analyze both user input and grounding document content for injection patterns.# Enable Prompt Shields on an Azure OpenAI deployment
az cognitiveservices account deployment update \
--name <account-name> \
--resource-group <rg> \
--deployment-name <deployment-name> \
--content-filter prompt-shield-enabled
Input/output boundary enforcement. Treat the system prompt, user input, and grounding data as separate trust zones. Use XML delimiters or structured message formatting to make boundaries explicit to the model. Azure OpenAI's chat completions API naturally separates system, user, and assistant roles: use them correctly instead of concatenating everything into a single user message.
Grounding data sanitization. Before indexing documents into Azure AI Search for RAG, scan them for injection patterns. A simple regex pass for patterns like "ignore previous instructions" or "system: you are now" catches the lowest-effort attacks. For sophisticated attacks, use a secondary LLM call to classify each document chunk as benign or potentially malicious before indexing.
LLM03: Supply Chain Vulnerabilities
This risk has been covered comprehensively in the Secure AI Supply Chain guide. The key controls: internal model registry, automated scanning with ModelScan, Azure Policy gates for deployment, and model SBOM generation.
The 2025 update specifically calls out fine-tuning service providers as a supply chain vector. If you use a third-party fine-tuning service, the model weights returned to you could contain embedded behaviors that were not in your training data. The mitigation is to fine-tune only on infrastructure you control (Azure AI Foundry compute) or to re-validate fine-tuned model outputs against a held-out test set that probes for unexpected behaviors.
LLM04: Data and Model Poisoning
The RAG Expansion
The 2023 list focused on training data poisoning. The 2025 list adds inference-time poisoning through RAG: an attacker who can write to your grounding data store can manipulate model outputs without touching the model itself. This is the threat covered in depth in the AI Foundry threat model under Threat 3.
Azure-Specific Controls
# Enable blob versioning and soft delete on RAG grounding data
az storage account blob-service-properties update \
--account-name <storage-account> \
--resource-group <rg> \
--enable-versioning true \
--enable-delete-retention true \
--delete-retention-days 30# Set immutability policy on the grounding data container
az storage container immutability-policy create \
--account-name <storage-account> \
--container-name rag-grounding-data \
--period 7 \
--allow-protected-append-writes true
Combine with Azure Monitor alerts for any PutBlob or PutBlock operation on the grounding container from an identity other than your approved indexing pipeline service principal.
LLM05: Improper Output Handling
LLM outputs should never be trusted as safe for downstream consumption. If your application passes LLM-generated text to a SQL query, shell command, API call, or web page without sanitization, you have a classic injection vulnerability with an LLM as the attack surface.
Practical Controls
- Never use LLM output in
eval(),exec(), or string-interpolated SQL - Apply output encoding appropriate to the rendering context (HTML encoding for web, parameterized queries for SQL)
- Use Azure API Management response transformation policies to strip or encode potentially dangerous characters from LLM API responses before they reach downstream consumers
<!-- APIM policy: sanitize LLM output before passing to downstream API -->
<outbound>
<set-body>@{
var response = context.Response.Body.As<string>();
// Strip potential script injection from LLM output
response = System.Text.RegularExpressions.Regex.Replace(
response, @"<script[^>]*>.*?</script>", "",
System.Text.RegularExpressions.RegexOptions.Singleline);
return response;
}</set-body>
</outbound>
LLM06: Excessive Disclosure
Beyond PII Leakage
The 2025 update broadens this category beyond PII. It now includes: leaking internal business logic, exposing training data through extraction attacks, and revealing architectural details through error messages.
Azure Controls
Azure OpenAI content filters with custom blocklists. Add company-specific terms, project codenames, and internal system names to a custom blocklist:# Create a custom blocklist for sensitive terms
az cognitiveservices account content-filter blocklist create \
--name internal-terms-blocklist \
--resource-group <rg> \
--account-name <aoai-account> \
--description "Block internal project names and sensitive identifiers"# Add terms to the blocklist
az cognitiveservices account content-filter blocklist item add \
--name internal-terms-blocklist \
--resource-group <rg> \
--account-name <aoai-account> \
--text "ProjectPhoenix" \
--is-regex false
Output token limits. Set max_tokens on every deployment to prevent extraction attacks that rely on generating large volumes of output. A chatbot that should respond in 500 tokens does not need a 4096 token limit.
LLM07: System Prompt Leakage (New)
This was previously a subset of prompt injection. The 2025 list gives it a dedicated category because system prompts in production frequently contain: API keys, internal URLs, business logic rules, content policy workarounds, and role-based access control instructions. Leaking the system prompt gives an attacker a roadmap for every other attack on the list.
Why This Gets Its Own Category
A system prompt like "You have access to the internal HR database at hr-api.internal.corp.com. Use the API key HRKEY-abc123 to authenticate." gives an attacker three things: the existence of the API, the endpoint URL, and a valid credential. This is not hypothetical: security researchers have extracted system prompts from production deployments of major enterprise applications.
Azure Mitigations
- Never put credentials, internal URLs, or API keys in system prompts. Use Azure Key Vault references resolved at runtime by the application layer.
- Use Azure OpenAI's system message with
"role": "system"and explicitly instruct the model not to reveal system instructions. This is a defense-in-depth measure, not a reliable control. - Implement output monitoring that detects when a response contains patterns matching the system prompt structure.
Detection Query
// Detect responses that may contain leaked system prompt content
// Requires request logging enabled on the endpoint
AzureDiagnostics
| where ResourceType == "ACCOUNTS"
| where Category == "RequestResponse"
| extend ResponseText = tostring(parse_json(properties_s).response)
| where ResponseText contains "You are" and ResponseText contains "system"
and (ResponseText contains "API" or ResponseText contains "key"
or ResponseText contains "internal" or ResponseText contains "endpoint")
| project TimeGenerated, ResourceId, ResponseText
| take 50
LLM08: Vector and Embedding Weaknesses (New)
RAG pipelines rely on vector embeddings to retrieve relevant context. The 2025 list recognizes that the embedding layer itself is an attack surface.
Attack Patterns
- Embedding inversion: reconstructing original text from embedding vectors, which can expose PII from the training or indexing corpus
- Adversarial document crafting: creating documents that produce embedding vectors deliberately close to target queries, ensuring the malicious document is always retrieved
- Index poisoning: injecting documents into the vector store that manipulate retrieval results for specific query patterns
Azure AI Search Hardening
# Enable customer-managed encryption on Azure AI Search
az search service update \
--name <search-service> \
--resource-group <rg> \
--encryption-key-uri <key-vault-key-uri> \
--identity-type SystemAssigned# Restrict index write access to the indexing pipeline identity only
az role assignment create \
--role "Search Index Data Contributor" \
--assignee <indexing-pipeline-principal-id> \
--scope /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Search/searchServices/<search-service>
Separate read and write access on your search indexes. Application identities that query the index should have Search Index Data Reader only. The indexing pipeline identity gets Search Index Data Contributor. No human account should have write access to production indexes.
LLM09: Misinformation
The renamed category shifts focus from "user overreliance" (a human behavior problem) to "model-generated misinformation" (a system output problem). The practical implication: you are responsible for implementing guardrails against hallucination, not just training users to be skeptical.
Controls
- Grounding with citations. Azure AI Foundry supports grounding with Azure AI Search, and the API returns citation metadata. Surface these citations to users so outputs are verifiable.
- Confidence scoring. Use the
logprobsparameter in Azure OpenAI completions to get token-level confidence scores. Flag responses where average confidence falls below a threshold. - Automated fact-checking pipelines. For high-stakes applications (medical, financial, legal), route LLM outputs through a secondary verification model or rules engine before presenting to users.
LLM10: Unbounded Consumption (New)
The Denial-of-Wallet Attack
This is the cloud-native LLM risk. An attacker sends crafted prompts designed to maximize token consumption: long context windows, recursive tool calls, or prompts that trigger maximum-length outputs. On consumption-based pricing, this translates directly to financial damage.
Azure Token and Cost Controls
# Set token-per-minute rate limits on Azure OpenAI deployment
az cognitiveservices account deployment create \
--name <account-name> \
--resource-group <rg> \
--deployment-name <deployment-name> \
--model-name gpt-4o \
--model-version "2024-11-20" \
--sku-capacity 80 \
--sku-name Standard# Set Azure budget alert for AI services resource group
az consumption budget create \
--budget-name ai-services-monthly-cap \
--amount 5000 \
--category cost \
--resource-group rg-ai-services \
--time-grain monthly \
--start-date 2026-06-01 \
--end-date 2027-06-01 \
--notifications '[{"contactEmails":["security-team@company.com"],"threshold":80,"operator":"GreaterThan","enabled":true}]'
Azure API Management quotas. Place APIM in front of Azure OpenAI and enforce per-user, per-application, and per-IP quotas:
<!-- APIM policy: rate limit per subscription key -->
<inbound>
<rate-limit-by-key
calls="100"
renewal-period="60"
counter-key="@(context.Subscription.Key)"
increment-condition="@(context.Response.StatusCode >= 200)" />
<quota-by-key
calls="10000"
renewal-period="86400"
counter-key="@(context.Subscription.Key)" />
</inbound>
Detection
// Detect token consumption spikes per caller
AzureMetrics
| where ResourceProvider == "MICROSOFT.COGNITIVESERVICES"
| where MetricName == "TokenTransaction"
| summarize TotalTokens = sum(Total) by bin(TimeGenerated, 1h), Resource
| where TotalTokens > 100000
| order by TotalTokens desc
Mapping All 10 Risks to Azure Controls
| OWASP Risk | Primary Azure Control | Secondary Control | Detection |
|---|---|---|---|
| LLM01 Excessive Agency | Least-privilege MI per agent | APIM rate limits on tools | KQL tool call volume |
| LLM02 Prompt Injection | Prompt Shields | Input/output trust zones | Content Safety alerts |
| LLM03 Supply Chain | Azure Policy + internal ACR | ModelScan in CI/CD | Deployment source alerts |
| LLM04 Data Poisoning | Blob versioning + immutability | Purview sensitivity labels | Storage write alerts |
| LLM05 Output Handling | APIM response transformation | Output encoding at app layer | N/A (app-level) |
| LLM06 Excessive Disclosure | Custom blocklists | max_tokens limits | Response content monitoring |
| LLM07 System Prompt Leakage | Key Vault for secrets | Output pattern monitoring | KQL response analysis |
| LLM08 Vector Weaknesses | CMK encryption + RBAC | Index write separation | Index modification alerts |
| LLM09 Misinformation | Grounding with citations | logprobs confidence scoring | N/A (app-level) |
| LLM10 Unbounded Consumption | APIM quotas + TPM limits | Azure budget alerts | Token consumption KQL |
Hardening Checklist
- [ ] Prompt Shields enabled on all RAG and agentic Azure OpenAI deployments
- [ ] Dedicated managed identities per agent with minimum required permissions: no shared hub MI for tool calls
- [ ] Human-in-the-loop gates implemented for all agent actions that create, modify, or delete resources
- [ ] Internal model registry (ACR) with Azure Policy denying deployments from external sources
- [ ] Blob versioning and immutability policies on all RAG grounding data containers
- [ ] Custom content filter blocklists configured with internal project names and sensitive identifiers
- [ ] No credentials or internal URLs in system prompts: all secrets resolved from Key Vault at runtime
- [ ] Azure AI Search index RBAC separated: read-only for applications, write for indexing pipeline only
- [ ] APIM deployed in front of Azure OpenAI with per-user rate limits and daily quotas
- [ ] Azure budget alerts configured for AI services resource groups with 80% threshold notifications
- [ ]
max_tokensset explicitly on every deployment to prevent token extraction attacks - [ ] Request logging enabled on all Azure OpenAI deployments for post-incident response content analysis
- [ ] KQL alerts deployed for token consumption spikes, tool call volume anomalies, and system prompt leakage patterns
Frequently Asked Questions
Why did prompt injection move from the number one spot to number two in the 2025 OWASP LLM Top 10?
Prompt injection remains a critical vulnerability, but real-world incident data from 2024 and 2025 showed that Excessive Agency caused more actual damage in production deployments. Prompt injection exploits typically require a secondary vulnerability (like excessive permissions or improper output handling) to cause significant harm. Excessive Agency, where an LLM agent has overly broad permissions and autonomously takes destructive actions, causes direct damage without needing another vulnerability in the chain. The OWASP team reordered the list based on observed impact severity rather than theoretical exploitability.
What is the "denial of wallet" attack described under LLM10 Unbounded Consumption?
Denial of wallet is a resource exhaustion attack specifically targeting consumption-based cloud services. An attacker sends crafted prompts designed to maximize token usage: extremely long context windows, prompts that trigger maximum-length outputs, or recursive patterns that cause repeated tool calls. Unlike traditional denial-of-service attacks that aim to make a service unavailable, denial of wallet aims to generate massive cloud bills. On Azure OpenAI with pay-per-token pricing, an unprotected endpoint can accumulate thousands of dollars in charges within hours. The mitigations are APIM rate limits per user or API key, TPM (tokens per minute) capacity limits on deployments, and Azure budget alerts with automatic notification at spending thresholds.
How do Vector and Embedding Weaknesses (LLM08) differ from Data Poisoning (LLM04)?
Data poisoning targets the content stored in your knowledge base, such as modifying documents in a RAG grounding data container. Vector and embedding weaknesses target the retrieval mechanism itself. An attacker crafting adversarial documents that produce embedding vectors close to specific target queries is manipulating which content gets retrieved, not the content itself. This means even if all your grounding documents are legitimate, an attacker can inject a new document specifically engineered to be retrieved for certain queries, effectively hijacking the RAG pipeline's relevance ranking. The defense requires both content integrity controls (data poisoning mitigations) and index access controls (vector weakness mitigations).
Why does System Prompt Leakage (LLM07) deserve its own category separate from Prompt Injection?
In the 2023 list, system prompt extraction was considered a variant of prompt injection. The 2025 update separates it because the impact profile is fundamentally different. Prompt injection aims to make the model do something unintended. System prompt leakage exposes information that enables other attacks: internal API endpoints, authentication credentials hardcoded in prompts, business logic rules, content policy workarounds, and role-based access patterns. A leaked system prompt is essentially an attacker's reconnaissance report for the entire application. Treating it as a separate risk category ensures organizations implement dedicated controls (Key Vault for secrets, output monitoring for prompt patterns) rather than relying solely on prompt injection defenses.
Recommended tool: Pluralsight
Level up your security skills with expert-led courses. Free 10-day trial, then access thousands of courses across cloud security, networking, and certifications.
Get weekly security insights
Cloud security, zero trust, and identity guides — straight to your inbox.
Continue Learning
AI Security Engineer Roadmap
The fastest-growing specialty in security.
Microsoft Cloud Solution Architect
Cloud Solution Architect with deep expertise in Microsoft Azure and a strong background in systems and IT infrastructure. Passionate about cloud technologies, security best practices, and helping organizations modernize their infrastructure.
Share this article
Questions & Answers
Related Articles
Need Help with Your Security?
Our team of security experts can help you implement the strategies discussed in this article.
Contact Us