MCP Server Hardening Case Study: Enterprise Security...

The Incident That Exposed the Architecture Gap

In Q1 2026, a 40-person engineering team at a financial services firm had been running Claude Code enterprise-wide for six weeks. MCP configuration was left to individual developers. Fifteen different MCP server configurations were running across developer workstations and CI/CD runners. Three of those configurations included a file system MCP server with root-level access. Two included an MCP server pointing to the internal secrets management API with no authentication beyond the developer's personal API key.

When a developer left the company, his CI/CD runner (still active, still running his MCP configuration) continued processing scheduled jobs. The runner had his credentials cached. Three weeks later, the firm's DLP system flagged an unusual pattern: 200MB of files from the internal document repository had been accessed from a CI/CD runner at 2 AM. The runner was using the file system MCP server, reading documents outside the project scope.

No malicious actor was involved. A scheduled pipeline had drifted. But the incident exposed what the [MCP server security guide](/blog/mcp-server-security-guide-2026) covers in theory: in a corporate environment, MCP servers are infrastructure, not developer tooling. They need the same controls as any privileged workload.

This article documents the architecture changes that team implemented over three weeks, including working Terraform, Azure API Management policy, Azure Policy definitions, and KQL detection queries.

---

The Target Architecture

The design goal: every MCP server in the corporate environment runs in a controlled, audited, network-isolated configuration. No developer runs an MCP server from a personal workstation with access to shared corporate resources.

The architecture has four layers:

Network isolation: MCP servers run in Azure Container Instances inside a dedicated subnet, not on developer workstations
Identity control: Each MCP server instance uses a dedicated user-assigned managed identity with minimal permissions, not developer personal credentials
Tool scope enforcement: Azure API Management authenticates every tool call with JWT validation and enforces per-client rate limits
Audit pipeline: All tool calls log to a central Log Analytics workspace, with KQL alerts for anomalous access patterns

---

Step 1: Network Isolation with Terraform

VNet Design

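The MCP server subnet is isolated from both the developer VNet (where workstations and CI/CD runners live) and the production VNet (where APIs and databases live). MCP servers reach specific internal APIs through private endpoints and get no broad network access. Outbound Internet traffic is denied outright, so a server that calls an external service such as GitHub's API needs that egress routed through a firewall with an explicit FQDN allowlist.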

resource "azurerm_virtual_network" "mcp" {
  name                = "vnet-mcp-${var.environment}"
  location            = var.location
  resource_group_name = var.resource_group_name
  address_space       = ["10.50.0.0/16"]
  tags                = var.common_tags
}

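resource "azurerm_subnet" "mcp_servers" {
  name                 = "snet-mcp-servers"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.mcp.name
  address_prefixes     = ["10.50.1.0/24"]

  delegation {
    name = "container-instances"
    service_delegation {
      name = "Microsoft.ContainerInstance/containerGroups"
      actions = ["Microsoft.Network/virtualNetworks/subnets/action"]
    }
  }
}

# Subnet for the internal-mode APIM gateway referenced in Step 2 (address range is an assumption)
resource "azurerm_subnet" "apim" {
  name                 = "snet-mcp-apim"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.mcp.name
  address_prefixes     = ["10.50.2.0/24"]
}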

resource "azurerm_network_security_group" "mcp_servers" {
  name                = "nsg-mcp-servers"
  location            = var.location
  resource_group_name = var.resource_group_name

  # Only the APIM gateway may reach MCP servers; runners and workstations go through APIM
  security_rule {
    name                       = "AllowMCPFromAPIM"
    priority                   = 100
    direction                  = "Inbound"
    access                     = "Allow"
    protocol                   = "Tcp"
    source_port_range          = "*"
    destination_port_range     = "3000"
    source_address_prefix      = azurerm_subnet.apim.address_prefixes[0]
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "DenyAllInbound"
    priority                   = 4096
    direction                  = "Inbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "*"
  }

  security_rule {
    name                       = "DenyInternetOutbound"
    priority                   = 200
    direction                  = "Outbound"
    access                     = "Deny"
    protocol                   = "*"
    source_port_range          = "*"
    destination_port_range     = "*"
    source_address_prefix      = "*"
    destination_address_prefix = "Internet"
  }
}

# An NSG has no effect until it is associated with the subnet
resource "azurerm_subnet_network_security_group_association" "mcp_servers" {
  subnet_id                 = azurerm_subnet.mcp_servers.id
  network_security_group_id = azurerm_network_security_group.mcp_servers.id
}
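NSG rules are processed in priority order, lowest number first, and processing stops at the first rule that matches. That is why the catch-all DenyAllInbound at 4096 does not shadow the allow rule at 100. The following is a minimal Python model of that evaluation, meant for reasoning about the rule set and not as a substitute for Azure's own evaluation. It makes three simplifying assumptions: rules cover a single port, any non-private address counts as "Internet" (a simplification of the service tag), and a hypothetical 10.50.2.0/24 stands in for the one subnet allowed in.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class NsgRule:
    name: str
    priority: int      # 100-4096; lower numbers are evaluated first
    direction: str     # "Inbound" or "Outbound"
    access: str        # "Allow" or "Deny"
    source: str        # CIDR or "*"
    destination: str   # CIDR, "*", or "Internet"
    port: str          # single port or "*"


def _prefix_matches(prefix: str, ip: str) -> bool:
    if prefix == "*":
        return True
    if prefix == "Internet":
        # Simplification of the Internet service tag: any non-private address.
        return not ipaddress.ip_address(ip).is_private
    return ipaddress.ip_address(ip) in ipaddress.ip_network(prefix)


def evaluate(rules, direction: str, src_ip: str, dst_ip: str, port: int):
    """Return (access, rule name) for the first matching rule in priority order."""
    for rule in sorted(rules, key=lambda r: r.priority):
        if (
            rule.direction == direction
            and _prefix_matches(rule.source, src_ip)
            and _prefix_matches(rule.destination, dst_ip)
            and rule.port in ("*", str(port))
        ):
            return rule.access, rule.name
    # No custom rule matched: Azure's built-in default rules would decide.
    return "Default", "(Azure default rules)"


ALLOWED_SOURCE = "10.50.2.0/24"  # hypothetical subnet permitted to reach MCP servers
RULES = [
    NsgRule("AllowMCPInbound", 100, "Inbound", "Allow", ALLOWED_SOURCE, "*", "3000"),
    NsgRule("DenyAllInbound", 4096, "Inbound", "Deny", "*", "*", "*"),
    NsgRule("DenyInternetOutbound", 200, "Outbound", "Deny", "*", "Internet", "*"),
]

if __name__ == "__main__":
    print(evaluate(RULES, "Inbound", "10.50.2.10", "10.50.1.4", 3000))
    print(evaluate(RULES, "Inbound", "10.20.0.5", "10.50.1.4", 3000))
```

Running the module prints the allow decision for a caller in the permitted subnet and the DenyAllInbound decision for anything else, such as a runner in another VNet.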

Container Instance Per MCP Server Type

Each MCP server type gets its own container instance with a dedicated managed identity. This prevents a compromised MCP server from using another server's credentials.

resource "azurerm_user_assigned_identity" "mcp_github" {
  name                = "id-mcp-github-${var.environment}"
  location            = var.location
  resource_group_name = var.resource_group_name
}

resource "azurerm_container_group" "mcp_github" {
  name                = "aci-mcp-github-${var.environment}"
  location            = var.location
  resource_group_name = var.resource_group_name
  ip_address_type     = "Private"
  subnet_ids          = [azurerm_subnet.mcp_servers.id]
  os_type             = "Linux"
  restart_policy      = "Always"

  identity {
    type         = "UserAssigned"
    identity_ids = [azurerm_user_assigned_identity.mcp_github.id]
  }

  container {
    name   = "mcp-github"
    image  = "${var.acr_login_server}/mcp-github:${var.mcp_github_version}"
    cpu    = "0.5"
    memory = "0.5"

    ports {
      port     = 3000
      protocol = "TCP"
    }

    environment_variables = {
      "LOG_ENDPOINT"    = var.log_analytics_endpoint
      "MCP_SERVER_NAME" = "github"
      "ALLOWED_ORGS"    = var.allowed_github_orgs
    }
  }

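  # Pull the image with the container group's managed identity instead of a registry password
  image_registry_credential {
    server                    = var.acr_login_server
    user_assigned_identity_id = azurerm_user_assigned_identity.mcp_github.id
  }

  tags = merge(var.common_tags, { MCPServer = "true" })

  # The AcrPull grant must exist before the first image pull
  depends_on = [azurerm_role_assignment.mcp_github_acr_pull]
}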

resource "azurerm_role_assignment" "mcp_github_acr_pull" {
  scope                = var.acr_resource_id
  role_definition_name = "AcrPull"
  principal_id         = azurerm_user_assigned_identity.mcp_github.principal_id
}

The MCPServer = "true" tag is what both Azure Policy definitions in Step 3 match on, so every MCP container group must carry it. A container group without the tag is not evaluated by either policy.

---

Step 2: APIM Gateway with JWT Enforcement

Azure API Management sits in front of all MCP servers. It validates Entra ID JWTs, enforces per-client rate limits, and logs every tool call request. No MCP server is reachable without going through APIM.

APIM Infrastructure

resource "azurerm_api_management" "mcp_gateway" {
  name                 = "apim-mcp-${var.environment}"
  location             = var.location
  resource_group_name  = var.resource_group_name
  publisher_name       = var.publisher_name
  publisher_email      = var.publisher_email
  sku_name             = "Developer_1"
  virtual_network_type = "Internal"

  virtual_network_configuration {
    subnet_id = azurerm_subnet.apim.id
  }

  identity {
    type = "SystemAssigned"
  }
}

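Use Premium_1 in production. Developer_1 has no SLA. Among the classic tiers, only Developer and Premium support VNet injection in internal mode, so Standard_1 does not support virtual_network_type = "Internal". Zone redundancy also requires Premium.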

APIM Inbound Policy

The APIM policy validates the Entra ID JWT, enforces rate limits, and logs all tool calls to Event Hub:

<policies>
  <inbound>
    <!-- output-token-variable-name keeps the validated token for the policies below -->
    <validate-jwt header-name="Authorization" failed-validation-httpcode="401"
                  output-token-variable-name="jwt">
      <openid-config url="https://login.microsoftonline.com/<tenant-id>/v2.0/.well-known/openid-configuration"/>
      <required-claims>
        <claim name="aud" match="any">
          <value><mcp-app-client-id></value>
        </claim>
        <!-- scp holds all delegated scopes as one space-separated string -->
        <claim name="scp" match="any" separator=" ">
          <value>mcp.tools.read</value>
          <value>mcp.tools.write</value>
        </claim>
      </required-claims>
    </validate-jwt>
    <!-- 50 calls per minute per calling principal (Entra ID object ID), not per source IP -->
    <rate-limit-by-key calls="50" renewal-period="60"
      counter-key="@(((Jwt)context.Variables["jwt"]).Claims.GetValueOrDefault("oid", "unknown"))"/>
    <!-- Caller identity comes from the validated token, never from a client-supplied header -->
    <log-to-eventhub logger-id="mcp-audit-logger">
      @{
        var jwt = (Jwt)context.Variables["jwt"];
        return new JObject(
          new JProperty("timestamp", DateTime.UtcNow),
          new JProperty("caller_oid", jwt.Claims.GetValueOrDefault("oid", "unknown")),
          new JProperty("tool", context.Request.Url.Path),
          new JProperty("method", context.Request.Method),
          new JProperty("body", context.Request.Body.As<string>(preserveContent: true))
        ).ToString();
      }
    </log-to-eventhub>
  </inbound>
</policies>

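Replace <tenant-id> and <mcp-app-client-id> with your Entra ID tenant ID and the client ID of the MCP gateway app registration.

Two caveats apply to this policy as written. First, it logs from the inbound section. Calls rejected by validate-jwt are therefore never logged, and fields such as response_status and latency_ms are not yet known at that point. Add a log-to-eventhub call to the outbound and on-error sections to capture them. Second, tokens issued to service principals (including the pipeline identity in Step 5) carry app roles in the roles claim instead of delegated scopes in scp. Pipeline traffic therefore needs its own operation or policy that checks roles.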
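The claim shapes behind validate-jwt are easy to get wrong. In Entra ID access tokens, delegated scopes arrive as one space-separated string in scp. App roles granted to a service principal arrive as a list in roles. Below is a minimal Python sketch of the difference. It assumes the claims are already decoded and the signature verified. The helper names are hypothetical, and the sketch mirrors the idea rather than APIM's implementation.

```python
from __future__ import annotations

ALLOWED = {"mcp.tools.read", "mcp.tools.write"}


def granted_permissions(claims: dict) -> set[str]:
    """Collect delegated scopes (scp) and app roles (roles) from decoded token claims."""
    # scp is ONE string holding space-separated scopes, e.g. "mcp.tools.read mcp.tools.write"
    scopes = claims.get("scp", "").split()
    # roles is a JSON array, present on app-only (service principal) tokens
    roles = claims.get("roles", [])
    return set(scopes) | set(roles)


def naive_scp_match(claims: dict, allowed: set[str]) -> bool:
    """Bug pattern: compares the whole scp string against individual values."""
    return claims.get("scp") in allowed


def split_match(claims: dict, allowed: set[str]) -> bool:
    """Correct pattern: split scp, include roles, then look for any overlap."""
    return bool(granted_permissions(claims) & allowed)


USER_TOKEN = {"oid": "user-oid", "scp": "mcp.tools.read mcp.tools.write"}
PIPELINE_TOKEN = {"oid": "sp-oid", "roles": ["mcp.tools.write"]}
UNRELATED_TOKEN = {"oid": "other-oid", "scp": "User.Read"}
```

naive_scp_match only succeeds when a token carries exactly one scope. That is why a multi-scope user token fails an unsplit comparison, and why a pipeline token with no scp fails it outright.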

Entra ID App Registration Scopes

The MCP gateway app registration in Entra ID defines scopes by tool category. Clients request only what they need:

| Scope | Tools available | Who gets it |
|---|---|---|
| `mcp.tools.read` | File read, repo view, code search | All developer clients |
| `mcp.tools.write` | File write, PR creation, issue creation | Approved developer clients |
| `mcp.tools.admin` | Repo settings, webhook management | DevOps service principals only |
| `mcp.tools.secrets` | Secrets management API access | CI/CD pipeline service principals only |

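Enforce the secrets restriction in the app registration rather than with Conditional Access. Conditional Access policies target applications, users, and workload identities, but they cannot inspect which scopes a token request includes. Define mcp.tools.admin and mcp.tools.secrets as app roles whose allowedMemberTypes contains only Application. Service principals can then be granted them, but no user can be assigned them, so no developer can ever acquire the secrets permission interactively.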
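In Entra ID, an app role whose allowedMemberTypes lists only Application can be granted to service principals but never assigned to a user. Below is a minimal Python sketch that emits appRoles manifest entries for the two privileged permissions. The field names follow the application manifest. The namespace URL used to derive stable role IDs is a made-up placeholder.

```python
from __future__ import annotations

import json
import uuid

# Hypothetical namespace used to derive stable role IDs across regenerations.
ROLE_ID_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/mcp-gateway")


def app_role(value: str, description: str, member_types: list[str]) -> dict:
    """Build one entry for the appRoles array of an Entra ID application manifest."""
    return {
        "allowedMemberTypes": member_types,  # "User", "Application", or both
        "description": description,
        "displayName": value,
        "id": str(uuid.uuid5(ROLE_ID_NAMESPACE, value)),
        "isEnabled": True,
        "value": value,  # appears in the token's roles claim
    }


APP_ROLES = [
    app_role("mcp.tools.admin", "Repo settings and webhook management", ["Application"]),
    app_role("mcp.tools.secrets", "Secrets management API access", ["Application"]),
]


def user_assignable(roles: list[dict]) -> list[str]:
    """Roles a human user could be assigned; should be empty for admin/secrets."""
    return [r["value"] for r in roles if "User" in r["allowedMemberTypes"]]


if __name__ == "__main__":
    print(json.dumps({"appRoles": APP_ROLES}, indent=2))
```

Deriving each id with uuid5 keeps role IDs stable across regenerations. This matters because changing an existing role's id orphans every assignment that references it.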

---

Step 3: Azure Policy for MCP Server Governance

Two Azure Policy definitions protect the MCP infrastructure from misconfiguration.

Policy 1: Require Managed Identity on All MCP Container Groups

resource "azurerm_policy_definition" "require_mcp_managed_identity" {
  name         = "require-mcp-managed-identity"
  policy_type  = "Custom"
  mode         = "All"
  display_name = "MCP server containers must use managed identity"

  policy_rule = jsonencode({
    if = {
      allOf = [
        {
          field  = "type"
          equals = "Microsoft.ContainerInstance/containerGroups"
        },
        {
          field  = "tags.MCPServer"
          exists = "true"
        },
        {
          anyOf = [
            {
              field  = "identity.type"
              exists = "false"
            },
            {
              field  = "identity.type"
              equals = "None"
            }
          ]
        }
      ]
    }
    then = {
      effect = "Deny"
    }
  })
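}

# A definition has no effect until it is assigned; assign at subscription scope
resource "azurerm_subscription_policy_assignment" "require_mcp_managed_identity" {
  name                 = "require-mcp-managed-identity"
  subscription_id      = "/subscriptions/${var.subscription_id}"
  policy_definition_id = azurerm_policy_definition.require_mcp_managed_identity.id
}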

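Azure Resource Manager denies any container group tagged MCPServer: true that lacks a managed identity, so the container never starts. This targets the class of incident in the case study: an MCP server reaching shared resources with a developer's personal credentials instead of its own identity. Azure Policy only sees resources deployed through ARM, so an MCP server started directly on a workstation or runner host is outside its reach.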
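Policy 1's if block is three conditions joined by allOf, and the identity check is itself an anyOf. Below is a minimal Python mirror of that logic, useful for reasoning about which resources it catches. It is a simplification: Azure Policy also matches tag names case-insensitively, which this dictionary lookup does not. The helper names are hypothetical.

```python
CONTAINER_GROUP_TYPE = "microsoft.containerinstance/containergroups"


def get_field(resource: dict, path: str):
    """Resolve a dotted path such as 'identity.type'; None when any segment is absent."""
    node = resource
    for part in path.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def require_identity_denies(resource: dict) -> bool:
    """Mirror of Policy 1's if block: True means the Deny effect applies."""
    is_container_group = str(resource.get("type", "")).lower() == CONTAINER_GROUP_TYPE
    is_tagged = get_field(resource, "tags.MCPServer") is not None
    identity_type = get_field(resource, "identity.type")
    # Azure Policy's "equals" is case-insensitive, so compare lowercased.
    lacks_identity = identity_type is None or str(identity_type).lower() == "none"
    return is_container_group and is_tagged and lacks_identity


NO_IDENTITY = {"type": "Microsoft.ContainerInstance/containerGroups", "tags": {"MCPServer": "true"}}
WITH_IDENTITY = {**NO_IDENTITY, "identity": {"type": "UserAssigned"}}
UNTAGGED = {"type": "Microsoft.ContainerInstance/containerGroups", "tags": {}}
```

The UNTAGGED case returns False even without an identity, because the rule only evaluates resources that carry the MCPServer tag.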

Policy 2: Deny MCP Servers Outside the Approved Subnet

resource "azurerm_policy_definition" "mcp_approved_subnet_only" {
  name         = "mcp-approved-subnet-only"
  policy_type  = "Custom"
  mode         = "All"
  display_name = "MCP server containers must run in approved subnet"

  policy_rule = jsonencode({
    if = {
      allOf = [
        {
          field  = "type"
          equals = "Microsoft.ContainerInstance/containerGroups"
        },
        {
          field  = "tags.MCPServer"
          exists = "true"
        },
        {
          field  = "Microsoft.ContainerInstance/containerGroups/subnetIds[*].id"
          notIn  = var.approved_mcp_subnet_ids
        }
      ]
    }
    then = {
      effect = "Deny"
    }
  })
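}

# A definition has no effect until it is assigned; assign at subscription scope
resource "azurerm_subscription_policy_assignment" "mcp_approved_subnet_only" {
  name                 = "mcp-approved-subnet-only"
  subscription_id      = "/subscriptions/${var.subscription_id}"
  policy_definition_id = azurerm_policy_definition.mcp_approved_subnet_only.id
}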

This is the policy that would have blocked the original incident. Any MCP container group launched outside the approved subnet (for example, in the developer subnet where the CI/CD runner lived) is denied by policy before the container starts.

Assign both policies to the subscription scope, not just the MCP resource group. Developers may have access to other resource groups where they could attempt to launch containers.
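Policy 2 relies on a [*] alias. As Azure Policy's array-alias documentation describes it, a field condition on a [*] alias holds only when every array element satisfies it. Below is a minimal Python mirror of the rule. It assumes the ARM shape properties.subnetIds[].id for container groups. Treating a container group with no subnet at all as outside the approved subnet is this sketch's own choice. The subscription and subnet IDs are placeholders.

```python
from __future__ import annotations

CONTAINER_GROUP_TYPE = "microsoft.containerinstance/containergroups"


def subnet_policy_denies(resource: dict, approved_subnet_ids: list[str]) -> bool:
    """Mirror of Policy 2's if block: True means the Deny effect applies."""
    if str(resource.get("type", "")).lower() != CONTAINER_GROUP_TYPE:
        return False
    if "MCPServer" not in resource.get("tags", {}):
        return False
    approved = {s.lower() for s in approved_subnet_ids}  # resource IDs compare case-insensitively
    attached = [s["id"].lower() for s in resource.get("properties", {}).get("subnetIds", [])]
    if not attached:
        # Design choice of this sketch: no VNet attachment counts as "outside".
        return True
    # [*] alias semantics: the notIn condition must hold for every attached subnet.
    return all(subnet_id not in approved for subnet_id in attached)


SUB = "/subscriptions/00000000-0000-0000-0000-000000000000"  # placeholder
APPROVED = [f"{SUB}/resourceGroups/rg-mcp/providers/Microsoft.Network/virtualNetworks/vnet-mcp-prod/subnets/snet-mcp-servers"]
DEV_SUBNET = f"{SUB}/resourceGroups/rg-dev/providers/Microsoft.Network/virtualNetworks/vnet-dev/subnets/snet-runners"


def container_group(subnet_id=None, tagged=True) -> dict:
    """Hypothetical ARM-style container group for demonstration."""
    return {
        "type": "Microsoft.ContainerInstance/containerGroups",
        "tags": {"MCPServer": "true"} if tagged else {},
        "properties": {"subnetIds": [{"id": subnet_id}] if subnet_id else []},
    }
```

As with Policy 1, an untagged container group in the runner subnet is not denied. The tag decides whether either policy looks at a resource at all.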

---

Step 4: Audit Logging and KQL Detection

Log Schema

Every tool call processed by the APIM gateway logs a structured JSON event to a Log Analytics workspace via an Event Hub connector. The schema:

{
  "timestamp": "2026-05-17T14:32:11Z",
  "caller_oid": "9f3a2b1c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
  "caller_upn": "developer@contoso.com",
  "mcp_server": "github",
  "tool_name": "create_pull_request",
  "scope_used": "mcp.tools.write",
  "resource_accessed": "repos/contoso/backend-api",
  "latency_ms": 312,
  "response_status": 200,
  "apim_request_id": "abc123def456"
}

Service principal callers have a caller_oid but an empty caller_upn. That difference is what separates pipeline activity from developer activity in the KQL queries below.
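Below is a minimal Python sketch of a typed record for the schema above, which makes the human-versus-pipeline split explicit. The class and function names are hypothetical. Treating a missing or null caller_upn as a service principal is an assumption.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MCPAuditEvent:
    timestamp: str
    caller_oid: str
    caller_upn: str
    mcp_server: str
    tool_name: str
    scope_used: str
    resource_accessed: str
    latency_ms: int
    response_status: int
    apim_request_id: str

    @property
    def is_human(self) -> bool:
        # Users have a UPN; service principals only have an object ID.
        return bool(self.caller_upn)


def parse_event(raw: str) -> MCPAuditEvent:
    """Parse one audit record, ignoring unknown keys and tolerating a missing UPN."""
    data = json.loads(raw)
    known = {f.name for f in fields(MCPAuditEvent)}
    data = {k: v for k, v in data.items() if k in known}
    data["caller_upn"] = data.get("caller_upn") or ""
    return MCPAuditEvent(**data)


SAMPLE = """{
  "timestamp": "2026-05-17T14:32:11Z",
  "caller_oid": "9f3a2b1c-4d5e-6f7a-8b9c-0d1e2f3a4b5c",
  "caller_upn": "developer@contoso.com",
  "mcp_server": "github",
  "tool_name": "create_pull_request",
  "scope_used": "mcp.tools.write",
  "resource_accessed": "repos/contoso/backend-api",
  "latency_ms": 312,
  "response_status": 200,
  "apim_request_id": "abc123def456"
}"""
PIPELINE_SAMPLE = json.dumps({**json.loads(SAMPLE), "caller_upn": ""})
```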

KQL: Write Tool Calls Outside Business Hours

MCPAuditLog_CL
| where TimeGenerated > ago(24h)
// TimeGenerated is UTC; shift it with datetime_utc_to_local() if business hours are local
| extend Hour = datetime_part("hour", TimeGenerated)
| where Hour < 7 or Hour >= 20  // before 7 AM, or from 8 PM onward
| where tool_name_s startswith "create_"
      or tool_name_s startswith "update_"
      or tool_name_s startswith "delete_"
| where isnotempty(caller_upn_s)  // human user, not service principal
| project TimeGenerated, caller_upn_s, mcp_server_s, tool_name_s,
          resource_accessed_s, Hour
| order by TimeGenerated desc

Alert threshold: any write tool call outside 7 AM to 8 PM from a human user principal. Pipeline service principals legitimately run outside business hours; the isnotempty(caller_upn_s) filter separates them.
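Two details decide what the off-hours query catches. TimeGenerated is in UTC, and "outside 7 AM to 8 PM" means treating everything from 20:00 onward as after hours. Below is a minimal Python sketch of the same filter. It assumes a fixed UTC-5 offset as the team's local time (substitute your own zone) and uses the audit fields from the schema above.

```python
from datetime import datetime, timedelta, timezone

WRITE_PREFIXES = ("create_", "update_", "delete_")
# Assumption: business hours are judged in a fixed UTC-5 offset; substitute your own zone.
LOCAL_TZ = timezone(timedelta(hours=-5))


def is_off_hours_write(event: dict, start_hour: int = 7, end_hour: int = 20,
                       tz: timezone = LOCAL_TZ) -> bool:
    """True for a write tool call by a human caller outside [start_hour, end_hour) local time."""
    ts = datetime.fromisoformat(event["timestamp"].replace("Z", "+00:00"))
    hour = ts.astimezone(tz).hour
    outside = hour < start_hour or hour >= end_hour  # 20:00-20:59 is already after hours
    is_write = event["tool_name"].startswith(WRITE_PREFIXES)
    is_human = bool(event.get("caller_upn"))
    return outside and is_write and is_human
```

For example, a create_pull_request call stamped 2026-05-18T01:30:00Z is 20:30 the previous evening at UTC-5, so it is flagged.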

KQL: High-Volume File System Access

MCPAuditLog_CL
| where mcp_server_s == "filesystem"
| where tool_name_s in ("read_file", "list_directory", "search_files")
// Count distinct parent directories, not distinct file paths
| extend Directory = tostring(parse_path(resource_accessed_s).DirectoryPath)
| summarize FilesAccessed = count(),
            UniqueDirectories = dcount(Directory)
            by caller_oid_s, caller_upn_s, bin(TimeGenerated, 1h)
| where FilesAccessed > 100 or UniqueDirectories > 20
| project TimeGenerated, caller_upn_s, caller_oid_s,
          FilesAccessed, UniqueDirectories
| order by FilesAccessed desc

This is the detection that would have flagged the incident three weeks before the DLP system did. Alert threshold: more than 100 file-system tool calls, or more than 20 unique directories, from one caller in a 1-hour window. The 2 AM pipeline had read 847 files across 34 directories by the time DLP caught it. This query fires once a caller crosses either threshold within a single hourly bin.
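Below is a minimal Python sketch of the hourly-bin logic behind the file-system query. It counts calls per caller per hour and counts distinct parent directories rather than distinct paths. It assumes list_directory events name a directory and the other tools name a file. The sample events are synthetic.

```python
import posixpath
from collections import defaultdict
from datetime import datetime

FS_TOOLS = {"read_file", "list_directory", "search_files"}


def hour_bin(timestamp: str) -> datetime:
    """Truncate an ISO-8601 UTC timestamp to its hour, like bin(TimeGenerated, 1h)."""
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return ts.replace(minute=0, second=0, microsecond=0)


def flag_bulk_access(events, max_calls: int = 100, max_dirs: int = 20):
    """Return (caller_oid, hour, calls, unique_dirs) for bins over either threshold."""
    calls = defaultdict(int)
    dirs = defaultdict(set)
    for e in events:
        if e["mcp_server"] != "filesystem" or e["tool_name"] not in FS_TOOLS:
            continue
        key = (e["caller_oid"], hour_bin(e["timestamp"]))
        calls[key] += 1
        path = e["resource_accessed"]
        # Assumption: list_directory targets a directory; the other tools target a file.
        dirs[key].add(path if e["tool_name"] == "list_directory" else posixpath.dirname(path))
    return sorted(
        (oid, hour, calls[(oid, hour)], len(dirs[(oid, hour)]))
        for (oid, hour) in calls
        if calls[(oid, hour)] > max_calls or len(dirs[(oid, hour)]) > max_dirs
    )


def read_event(i: int, directory: str) -> dict:
    """Hypothetical read_file event at 02:MM UTC, for demonstration."""
    return {
        "mcp_server": "filesystem",
        "tool_name": "read_file",
        "caller_oid": "pipeline-sp",
        "timestamp": f"2026-05-17T02:{i % 60:02d}:00Z",
        "resource_accessed": f"{directory}/file-{i}.pdf",
    }


BURST = [read_event(i, f"/docs/team-{i % 5}") for i in range(101)]  # 101 calls, 5 dirs
SPREAD = [read_event(i, f"/docs/dept-{i}") for i in range(21)]      # 21 calls, 21 dirs
```

The first 100 events of BURST stay below both thresholds. One more call in the same hour trips the alert, and SPREAD trips it on directory count alone.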

KQL: Repeated Authorization Failures (Scope Escalation Attempts)

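MCPAuditLog_CL
// validate-jwt as configured in Step 2 answers a missing scope with 401, so count 401 and 403
| where response_status_d in (401, 403)
// Group by object ID: service principals share an empty UPN
| summarize FailedAttempts = count()
            by caller_oid_s, caller_upn_s, tool_name_s, bin(TimeGenerated, 10m)
| where FailedAttempts > 5
| project TimeGenerated, caller_oid_s, caller_upn_s, tool_name_s, FailedAttempts
| order by FailedAttempts desc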

Repeated authorization failures from the same caller on restricted tools indicate a client attempting to call tools beyond its authorized scope. In practice this alert fires for one of two reasons: a misconfigured MCP client (benign; fix or tune its scopes) or a deliberate probe (needs investigation). To tell them apart, look at which tools are being probed: failures on tools behind mcp.tools.admin or mcp.tools.secrets are higher severity.
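Triage is easier when each alert already says which scope the probed tool sits behind. Below is a minimal Python sketch of the 10-minute grouping plus that ranking. The tool-to-scope mapping and tool names are hypothetical. The status codes to count are a parameter, because which code a rejection produces depends on where the scope check happens.

```python
from collections import Counter
from datetime import datetime

# Hypothetical tool-to-scope mapping; align it with your gateway's operations.
TOOL_SCOPE = {
    "read_file": "mcp.tools.read",
    "create_pull_request": "mcp.tools.write",
    "update_webhook": "mcp.tools.admin",
    "get_secret": "mcp.tools.secrets",
}
HIGH_SEVERITY_SCOPES = {"mcp.tools.admin", "mcp.tools.secrets"}


def ten_minute_bin(timestamp: str) -> datetime:
    """Truncate an ISO-8601 UTC timestamp to its 10-minute window, like bin(TimeGenerated, 10m)."""
    ts = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return ts.replace(minute=ts.minute - ts.minute % 10, second=0, microsecond=0)


def triage_failures(events, threshold: int = 5, statuses=(403,)):
    """Group failures per caller/tool/10-minute bin and rank by the scope they probe."""
    counts = Counter(
        (e["caller_oid"], e["tool_name"], ten_minute_bin(e["timestamp"]))
        for e in events
        if e["response_status"] in statuses
    )
    alerts = []
    for (caller, tool, window), n in counts.items():
        if n > threshold:
            severity = "high" if TOOL_SCOPE.get(tool) in HIGH_SEVERITY_SCOPES else "review"
            alerts.append({"caller": caller, "tool": tool, "window": window,
                           "failures": n, "severity": severity})
    # High-severity alerts first, then by failure count.
    return sorted(alerts, key=lambda a: (a["severity"] != "high", -a["failures"]))


def failure(tool: str, minute: int, status: int = 403) -> dict:
    """Hypothetical failed call at 10:MM UTC, for demonstration."""
    return {"caller_oid": "client-1", "tool_name": tool, "response_status": status,
            "timestamp": f"2026-05-17T10:{minute:02d}:00Z"}


EVENTS = ([failure("get_secret", m) for m in range(6)]
          + [failure("create_pull_request", m) for m in range(6)]
          + [failure("read_file", m) for m in range(5)])
```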

---

Step 5: CI/CD Integration Without Stored Credentials

Pipelines authenticate to the MCP gateway using workload identity federation, not stored secrets. This follows the [federated credentials pattern for GitHub Actions and Entra ID](/blog/flexible-federated-identity-credentials-entra-github-terraform).

GitHub Actions Workflow

name: Deploy via MCP Gateway

on:
  push:
    branches: [main]

permissions:
  id-token: write
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Authenticate to Azure via OIDC
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}

      - name: Get MCP gateway token
        run: |
          MCP_TOKEN=$(az account get-access-token \
            --resource ${{ secrets.MCP_APP_CLIENT_ID }} \
            --query accessToken -o tsv)
          echo "::add-mask::${MCP_TOKEN}"
          echo "MCP_TOKEN=${MCP_TOKEN}" >> $GITHUB_ENV

      - name: Call MCP tool via authenticated gateway
        env:
          MCP_ENDPOINT: ${{ secrets.MCP_GATEWAY_ENDPOINT }}
        run: |
          curl -sf \
            -H "Authorization: Bearer ${MCP_TOKEN}" \
            -H "Content-Type: application/json" \
            "${MCP_ENDPOINT}/tools/create_pull_request" \
            -d '{"repo": "contoso/backend-api", "branch": "release/v2.1"}'

No client secrets anywhere: the workflow authenticates to Entra ID through OIDC, and the resulting token carries only the permission granted to the pipeline identity, mcp.tools.write.
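When a pipeline's gateway calls fail, the quickest check is what the token actually contains. Below is a minimal Python sketch that decodes a JWT payload without verifying the signature and summarizes its grants. Use it for inspection only, never for an authorization decision. The sample token and its audience are made up. App-only tokens from service principals carry roles rather than scp.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature: for inspection only."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64 padding stripped by JWT encoding
    return json.loads(base64.urlsafe_b64decode(payload))


def describe_grants(claims: dict) -> dict:
    """Summarize what a token lets its bearer do at the gateway."""
    return {
        "audience": claims.get("aud"),
        "delegated_scopes": claims.get("scp", "").split(),
        "app_roles": list(claims.get("roles", [])),
        "app_only": "scp" not in claims,  # app-only (service principal) tokens carry no scp
    }


def _b64url(obj: dict) -> str:
    return base64.urlsafe_b64encode(json.dumps(obj).encode()).rstrip(b"=").decode()


# Made-up unsigned token for demonstration; a real one comes from az account get-access-token.
SAMPLE_TOKEN = ".".join([
    _b64url({"alg": "none", "typ": "JWT"}),
    _b64url({"aud": "api://mcp-gateway", "oid": "sp-oid", "roles": ["mcp.tools.write"]}),
    "",
])
```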

Federated Credential Terraform Configuration

resource "azurerm_user_assigned_identity" "mcp_pipeline" {
  name                = "id-mcp-pipeline-${var.environment}"
  location            = var.location
  resource_group_name = var.resource_group_name
}

resource "azurerm_federated_identity_credential" "mcp_pipeline_main" {
  name                = "github-actions-main-branch"
  resource_group_name = var.resource_group_name
  audience            = ["api://AzureADTokenExchange"]
  issuer              = "https://token.actions.githubusercontent.com"
  parent_id           = azurerm_user_assigned_identity.mcp_pipeline.id
  subject             = "repo:<github-org>/<github-repo>:ref:refs/heads/main"
}

The subject constraint limits the federated credential to tokens issued for the main branch only. A pull request branch cannot acquire this identity, which prevents feature branch pipelines from getting production-level MCP access.
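The branch restriction works because GitHub encodes the trigger in the OIDC sub claim, and a standard federated identity credential only accepts an exact match. Below is a minimal Python sketch of GitHub's default subject formats (no customized subject template) and that comparison. It uses the contoso/backend-api repo from the log sample as a placeholder. Flexible credentials with expression matching are not modeled here.

```python
def github_oidc_subject(org: str, repo: str, event: str,
                        ref: str = "", environment: str = "") -> str:
    """Default GitHub Actions OIDC 'sub' claim (no customized subject template)."""
    prefix = f"repo:{org}/{repo}"
    if environment:                      # jobs bound to an environment
        return f"{prefix}:environment:{environment}"
    if event == "pull_request":          # pull_request events, whatever the branch
        return f"{prefix}:pull_request"
    return f"{prefix}:ref:{ref}"         # e.g. push to a branch or tag


def credential_accepts(configured_subject: str, token_subject: str) -> bool:
    """A standard federated identity credential requires an exact subject match."""
    return configured_subject == token_subject


CONFIGURED = "repo:contoso/backend-api:ref:refs/heads/main"
```

One consequence worth knowing: if the deploy job later gains an environment: key, its subject switches to the environment form. The credential then stops matching until a second credential is added for that subject.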

---

What Changed After the Remediation

The rollout took three weeks: one week of Terraform, one week of APIM policy and Entra ID configuration, and one week of KQL tuning. The team then ran a 30-day before-and-after comparison:

| Metric | Before | After |
|---|---|---|
| MCP servers running with personal credentials | 14 | 0 |
| MCP tool calls logged and queryable | 0% | 100% |
| Policy violations blocked at ARM layer | N/A | 3 (rogue container attempts) |
| Mean time to detect file system anomaly | 21 days (DLP) | 38 minutes (KQL alert) |
| Developer offboarding MCP cleanup steps | 0 | 4 (documented checklist) |

Policy blocked three deployments in the first 30 days. Two came from developers who tried to deploy their own MCPServer-tagged container groups into a shared resource group. The third came from a CI/CD runner template that hadn't been updated to the new subnet configuration.

Two KQL alerts fired, and both turned out to be benign. In one, a developer was testing the file system MCP server against a personal repo (now allowlisted). In the other, a service principal with an expiring certificate was repeatedly attempting to re-authenticate; the team shortened its certificate rotation interval from 365 days to 90.

---

Developer Offboarding Checklist for MCP Access

When a developer leaves, four steps are now standard in the offboarding runbook:

1. Revoke the developer's consent to the MCP gateway app registration in Entra ID
2. Remove any federated credential subject entries referencing the developer's GitHub username from pipeline identities
3. Verify that no active CI/CD runner carries the developer's personal API token as an environment variable (scan runner configurations in GitHub Actions and Azure Pipelines)
4. Query MCPAuditLog_CL for the developer's caller_upn over the past 90 days: confirm that activity matched expected patterns before their last day and that nothing under their identity continues afterward

None of these steps were in the original offboarding checklist. The incident audit revealed that step 3 is what allowed the drift: a cached environment variable on a runner that had never been cleaned up.
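Steps 3 and 4 lend themselves to a script in the offboarding runbook. Below is a minimal Python sketch. It assumes runner environments can be exported as a name-to-value map and that audit events carry caller_upn and timestamp. Matching the ghp_ and github_pat_ prefixes only catches GitHub personal access tokens, so other credential types need their own checks.

```python
from __future__ import annotations

from datetime import date, datetime

# Prefixes of GitHub personal access tokens (classic and fine-grained).
PAT_PREFIXES = ("ghp_", "github_pat_")


def personal_tokens_in_env(env: dict[str, str]) -> list[str]:
    """Step 3: names of environment variables whose values look like GitHub PATs."""
    return sorted(name for name, value in env.items()
                  if isinstance(value, str) and value.startswith(PAT_PREFIXES))


def activity_after_departure(events: list[dict], upn: str, last_day: date) -> list[dict]:
    """Step 4: audit events attributed to `upn` dated after their last day (UTC)."""
    late = []
    for e in events:
        if (e.get("caller_upn") or "").lower() != upn.lower():
            continue
        ts = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
        if ts.date() > last_day:
            late.append(e)
    return late


# Hypothetical runner environment and audit events, for demonstration.
RUNNER_ENV = {"PATH": "/usr/bin", "GH_TOKEN": "ghp_" + "x" * 36, "LOG_LEVEL": "info"}
EVENTS = [
    {"caller_upn": "former.dev@contoso.com", "timestamp": "2026-02-27T16:00:00Z"},
    {"caller_upn": "Former.Dev@contoso.com", "timestamp": "2026-03-20T02:00:00Z"},
    {"caller_upn": "someone.else@contoso.com", "timestamp": "2026-03-20T02:00:00Z"},
]
```

The 2 AM event three weeks after the last day is exactly the pattern from the incident. It surfaces here regardless of the UPN's letter case.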

---

Hardening Checklist

[ ] No MCP servers running on developer workstations with access to shared corporate resources or production APIs
[ ] All MCP servers containerized in dedicated subnet (ACI or AKS) with user-assigned managed identity
[ ] MCPServer: true tag applied to every MCP container group resource
[ ] Azure Policy: require managed identity deployed and assigned at subscription scope
[ ] Azure Policy: approved subnet only deployed and assigned at subscription scope for MCPServer-tagged resources
[ ] APIM gateway deployed in front of all MCP servers enforcing JWT validation against Entra ID
[ ] Entra ID scopes granular: read / write / admin / secrets as separate scopes
[ ] mcp.tools.secrets restricted so that human users can never acquire it (service principals only)
[ ] APIM rate limit: 50 tool calls per minute per principal
[ ] All CI/CD pipelines use workload identity federation (OIDC) not stored client secrets
[ ] Federated credential subject constraints scoped to specific branches only (not wildcard)
[ ] All tool calls logging to Log Analytics workspace via Event Hub
[ ] KQL alert: write operations outside business hours from human user principals
[ ] KQL alert: file system access exceeding 100 reads or 20 unique directories per hour
[ ] KQL alert: repeated authorization failures from the same caller on restricted tool scopes
[ ] Developer offboarding checklist includes MCP gateway consent revocation and runner credential audit
