{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "Why Most Enterprise AI Projects Stop at the Demo",
      "slug": "why-most-enterprise-ai-projects-stop-at-the-demo",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-05-02T02:53:47.107Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Why Most Enterprise AI Projects Stop at the Demo\n",
        "\n",
        "This notebook turns the blog post into a hands-on validation workflow in Python. It focuses on the core claim: enterprise AI usually stalls not because the model is weak, but because production requires governance, data readiness, observability, cost discipline, and compliance. You will simulate an internal AI gateway, validate fallback behavior, normalize responses, and estimate workflow costs."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install -q pandas matplotlib"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "import uuid\n",
        "import time\n",
        "import random\n",
        "from dataclasses import dataclass\n",
        "from typing import Dict, Any, List\n",
        "\n",
        "import pandas as pd\n",
        "import matplotlib.pyplot as plt"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Architecture pattern: governed gateway in front of model access\n",
        "\n",
        "The blog argues that production AI should be treated as a governed interface rather than a direct model call from every application. The next cell captures the gateway architecture as a Mermaid C4 diagram definition, then parses out the main components so you can validate the design directly in the notebook."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "gateway_architecture = r'''%% Purpose: Show the enterprise AI gateway architecture that prevents demo-only AI integrations\n",
        "C4Container\n",
        "    title Enterprise AI Gateway Pattern\n",
        "\n",
        "    Person(appUser, \"Business App\", \"Thin client used by employees\")\n",
        "    System_Boundary(ent, \"Enterprise Boundary\") {\n",
        "        Container(client, \"Thin App Client\", \"Python\", \"Sends prompts with trace metadata\")\n",
        "        Container(apim, \"AI Gateway\", \"Azure API Management\", \"Auth, quotas, policies, routing\")\n",
        "        Container(policy, \"Policy Layer\", \"Gateway Policies\", \"PII checks, logging, fallback rules\")\n",
        "        Container(obs, \"Observability\", \"App Insights / SIEM\", \"Tracing, audit, cost monitoring\")\n",
        "    }\n",
        "    System_Ext(providerA, \"Primary Model Provider\", \"Hosted LLM API\")\n",
        "    System_Ext(providerB, \"Fallback Model Provider\", \"Secondary LLM API\")\n",
        "\n",
        "    Rel(appUser, client, \"Uses\")\n",
        "    Rel(client, apim, \"POST /ai/chat\")\n",
        "    Rel(apim, policy, \"Enforces\")\n",
        "    Rel(policy, providerA, \"Routes request\")\n",
        "    Rel(policy, providerB, \"Fallback on failure\")\n",
        "    Rel(apim, obs, \"Emits logs, metrics, traces\")\n",
        "'''\n",
        "\n",
        "print(gateway_architecture)\n",
        "\n",
        "components = []\n",
        "for line in gateway_architecture.splitlines():\n",
        "    line = line.strip()\n",
        "    if line.startswith((\"Person(\", \"Container(\", \"System_Ext(\")):\n",
        "        components.append(line)\n",
        "\n",
        "print(\"\\nParsed components:\")\n",
        "for c in components:\n",
        "    print(\"-\", c)\n",
        "\n",
        "print(f\"\\nTotal architecture components identified: {len(components)}\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Required variables and secrets\n",
        "\n",
        "The original blog uses an internal gateway URL and bearer token. For safe notebook validation, the next code cell uses mocked values and a local simulation instead of making a real network call.\n",
        "\n",
        "If you later connect this to a real service, you would typically need:\n",
        "- `GATEWAY_URL`\n",
        "- `INTERNAL_APP_TOKEN`\n",
        "- optional business metadata such as business unit and data classification"
      ]
    },
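    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As a minimal sketch, the cell below reads those values from environment variables instead of hardcoding them. The defaults are placeholders so the notebook still runs without real secrets; the simulation cells that follow keep using demo values either way."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import os\n",
        "\n",
        "# Read gateway settings from the environment; fall back to safe demo values.\n",
        "GATEWAY_URL = os.environ.get(\"GATEWAY_URL\", \"https://api.contoso.com/ai/chat\")\n",
        "INTERNAL_APP_TOKEN = os.environ.get(\"INTERNAL_APP_TOKEN\", \"demo-token\")\n",
        "\n",
        "missing = [name for name in (\"GATEWAY_URL\", \"INTERNAL_APP_TOKEN\") if name not in os.environ]\n",
        "if missing:\n",
        "    print(f\"Using demo defaults for: {missing} (set them before connecting to a real gateway)\")\n",
        "else:\n",
        "    print(\"Gateway configuration loaded from environment\")"
      ],
      "execution_count": null,
      "outputs": []
    },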
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Client pattern: call your internal AI API, not the provider directly\n",
        "\n",
        "This example preserves the intent of the blog's `urllib` sample but replaces the real HTTP call with a local gateway simulator. It validates the production pattern: attach correlation metadata, request a model profile, and let the gateway decide routing and policy."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "import uuid\n",
        "import time\n",
        "\n",
        "GATEWAY_URL = \"https://api.contoso.com/ai/chat\"\n",
        "INTERNAL_APP_TOKEN = \"demo-token\"\n",
        "\n",
        "telemetry_log: List[Dict[str, Any]] = []\n",
        "\n",
        "\n",
        "def simulate_gateway(request_payload: Dict[str, Any], headers: Dict[str, str]) -> Dict[str, Any]:\n",
        "    start = time.time()\n",
        "    correlation_id = headers.get(\"x-correlation-id\", str(uuid.uuid4()))\n",
        "\n",
        "    if not headers.get(\"Authorization\", \"\").startswith(\"Bearer \"):\n",
        "        raise PermissionError(\"Missing or invalid bearer token\")\n",
        "\n",
        "    if headers.get(\"x-data-classification\") not in {\"public\", \"internal\", \"confidential\"}:\n",
        "        raise ValueError(\"Unsupported data classification\")\n",
        "\n",
        "    user_message = request_payload[\"messages\"][0][\"content\"]\n",
        "    answer = f\"Summary for profile={request_payload['model_profile']}: {user_message}\"\n",
        "\n",
        "    response = {\n",
        "        \"answer\": answer,\n",
        "        \"provider\": \"primary-model\",\n",
        "        \"model\": \"gpt-sim-primary\",\n",
        "        \"correlation_id\": correlation_id,\n",
        "        \"fallback_used\": False,\n",
        "        \"latency_ms\": round((time.time() - start) * 1000, 2),\n",
        "    }\n",
        "\n",
        "    telemetry_log.append({\n",
        "        \"correlation_id\": correlation_id,\n",
        "        \"route\": \"primary\",\n",
        "        \"business_unit\": headers.get(\"x-business-unit\"),\n",
        "        \"classification\": headers.get(\"x-data-classification\"),\n",
        "        \"latency_ms\": response[\"latency_ms\"],\n",
        "    })\n",
        "    return response\n",
        "\n",
        "payload = {\n",
        "    \"messages\": [{\"role\": \"user\", \"content\": \"Summarize this contract in 3 bullets.\"}],\n",
        "    \"model_profile\": \"legal-summary\",\n",
        "    \"fallback_allowed\": True,\n",
        "}\n",
        "headers = {\n",
        "    \"Content-Type\": \"application/json\",\n",
        "    \"Authorization\": f\"Bearer {INTERNAL_APP_TOKEN}\",\n",
        "    \"x-correlation-id\": str(uuid.uuid4()),\n",
        "    \"x-business-unit\": \"legal\",\n",
        "    \"x-data-classification\": \"confidential\",\n",
        "}\n",
        "\n",
        "print(\"Target endpoint:\", GATEWAY_URL)\n",
        "print(\"Payload:\")\n",
        "print(json.dumps(payload, indent=2))\n",
        "print(\"\\nGateway response:\")\n",
        "result = simulate_gateway(payload, headers)\n",
        "print(json.dumps(result, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Request flow: policy, tracing, and fallback before the response reaches the app\n",
        "\n",
        "The blog emphasizes that teams often build the happy path first and postpone the control path. The next cell captures the request flow as a Mermaid sequence diagram definition, then runs a Python simulation covering authentication, policy checks, primary routing, fallback behavior, and centralized telemetry."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "request_flow = r'''%% Purpose: Show request flow with policy enforcement, tracing, and fallback before a response reaches the app\n",
        "sequenceDiagram\n",
        "    participant App as Thin Client\n",
        "    participant GW as AI Gateway\n",
        "    participant Obs as Observability\n",
        "    participant P1 as Primary Model\n",
        "    participant P2 as Fallback Model\n",
        "\n",
        "    App->>GW: POST /ai/chat + correlation-id + classification\n",
        "    GW->>Obs: Log request metadata and policy decision\n",
        "    GW->>GW: Validate auth, quota, content policy\n",
        "    GW->>P1: Forward approved request\n",
        "    alt Primary succeeds\n",
        "        P1-->>GW: Response\n",
        "    else Primary fails or times out\n",
        "        GW->>P2: Retry using fallback policy\n",
        "        P2-->>GW: Response\n",
        "    end\n",
        "    GW->>Obs: Emit latency, cost, and route used\n",
        "    GW-->>App: Normalized response\n",
        "'''\n",
        "\n",
        "print(request_flow)\n",
        "\n",
        "quota_counter: Dict[str, int] = {}\n",
        "\n",
        "\n",
        "def primary_provider(prompt: str) -> Dict[str, Any]:\n",
        "    if \"timeout\" in prompt.lower() or \"fail\" in prompt.lower():\n",
        "        raise TimeoutError(\"Primary provider timed out\")\n",
        "    return {\"provider\": \"primary-model\", \"model\": \"gpt-sim-primary\", \"text\": f\"Primary handled: {prompt}\"}\n",
        "\n",
        "\n",
        "def fallback_provider(prompt: str) -> Dict[str, Any]:\n",
        "    return {\"provider\": \"fallback-model\", \"model\": \"gpt-sim-fallback\", \"text\": f\"Fallback handled: {prompt}\"}\n",
        "\n",
        "\n",
        "def governed_chat(prompt: str, token: str, classification: str, fallback_allowed: bool = True) -> Dict[str, Any]:\n",
        "    correlation_id = str(uuid.uuid4())\n",
        "    start = time.time()\n",
        "\n",
        "    if not token:\n",
        "        raise PermissionError(\"Authentication required\")\n",
        "    if classification not in {\"public\", \"internal\", \"confidential\"}:\n",
        "        raise ValueError(\"Invalid classification\")\n",
        "\n",
        "    quota_counter[token] = quota_counter.get(token, 0) + 1\n",
        "    if quota_counter[token] > 5:\n",
        "        raise RuntimeError(\"Rate limit exceeded for token\")\n",
        "\n",
        "    route = \"primary\"\n",
        "    fallback_used = False\n",
        "    try:\n",
        "        provider_response = primary_provider(prompt)\n",
        "    except Exception:\n",
        "        if not fallback_allowed:\n",
        "            raise\n",
        "        provider_response = fallback_provider(prompt)\n",
        "        route = \"fallback\"\n",
        "        fallback_used = True\n",
        "\n",
        "    latency_ms = round((time.time() - start) * 1000, 2)\n",
        "    normalized = {\n",
        "        \"answer\": provider_response[\"text\"],\n",
        "        \"provider\": provider_response[\"provider\"],\n",
        "        \"model\": provider_response[\"model\"],\n",
        "        \"correlation_id\": correlation_id,\n",
        "        \"fallback_used\": fallback_used,\n",
        "        \"latency_ms\": latency_ms,\n",
        "        \"route\": route,\n",
        "    }\n",
        "    telemetry_log.append({\n",
        "        \"correlation_id\": correlation_id,\n",
        "        \"route\": route,\n",
        "        \"classification\": classification,\n",
        "        \"latency_ms\": latency_ms,\n",
        "    })\n",
        "    return normalized\n",
        "\n",
        "success_case = governed_chat(\"Summarize approved supplier terms.\", token=\"user-a\", classification=\"confidential\")\n",
        "fallback_case = governed_chat(\"Please fail over after timeout.\", token=\"user-a\", classification=\"confidential\")\n",
        "\n",
        "print(\"Success case:\")\n",
        "print(json.dumps(success_case, indent=2))\n",
        "print(\"\\nFallback case:\")\n",
        "print(json.dumps(fallback_case, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Normalize responses so provider changes do not break apps\n",
        "\n",
        "A key production practice is to keep provider-specific formats out of the application layer. The next cell uses a stable dataclass to validate the normalized result shape that applications should consume from the gateway."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from dataclasses import dataclass\n",
        "\n",
        "@dataclass\n",
        "class AiResult:\n",
        "    answer: str\n",
        "    provider: str\n",
        "    model: str\n",
        "    correlation_id: str\n",
        "    fallback_used: bool\n",
        "\n",
        "\n",
        "gateway_response = {\n",
        "    \"answer\": \"Three key obligations are payment, confidentiality, and termination notice.\",\n",
        "    \"provider\": \"azure-openai\",\n",
        "    \"model\": \"gpt-4o-mini\",\n",
        "    \"correlation_id\": \"9d2f4d8c\",\n",
        "    \"fallback_used\": False,\n",
        "}\n",
        "\n",
        "result = AiResult(**gateway_response)\n",
        "print(result.answer)\n",
        "print(f\"{result.provider}/{result.model} trace={result.correlation_id}\")\n",
        "\n",
        "provider_variant = {\n",
        "    \"answer\": \"Three key obligations are payment, confidentiality, and termination notice.\",\n",
        "    \"provider\": \"fallback-provider\",\n",
        "    \"model\": \"claude-sim\",\n",
        "    \"correlation_id\": \"alt-1234\",\n",
        "    \"fallback_used\": True,\n",
        "}\n",
        "\n",
        "result2 = AiResult(**provider_variant)\n",
        "print(result2)\n",
        "print(\"Stable fields preserved across providers:\", list(result2.__dict__.keys()))"
      ],
      "execution_count": null,
      "outputs": []
    },
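    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "One caveat: `AiResult(**payload)` raises `TypeError` the moment a provider adds an extra field. A hedged sketch of a tolerant factory that keeps only the fields the contract defines, so unknown provider fields are dropped instead of breaking the app:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from dataclasses import dataclass, fields\n",
        "from typing import Any, Dict\n",
        "\n",
        "\n",
        "@dataclass\n",
        "class AiResult:  # same contract as above; redefined so this cell stands alone\n",
        "    answer: str\n",
        "    provider: str\n",
        "    model: str\n",
        "    correlation_id: str\n",
        "    fallback_used: bool\n",
        "\n",
        "\n",
        "def to_result(payload: Dict[str, Any]) -> AiResult:\n",
        "    \"\"\"Drop unknown keys so new provider fields cannot break the app contract.\"\"\"\n",
        "    known = {f.name for f in fields(AiResult)}\n",
        "    return AiResult(**{k: v for k, v in payload.items() if k in known})\n",
        "\n",
        "\n",
        "noisy = {\n",
        "    \"answer\": \"Three key obligations are payment, confidentiality, and termination notice.\",\n",
        "    \"provider\": \"new-provider\",\n",
        "    \"model\": \"gpt-sim-next\",\n",
        "    \"correlation_id\": \"abc-1\",\n",
        "    \"fallback_used\": False,\n",
        "    \"provider_internal_debug\": {\"tokens\": 512},  # extra field: would break AiResult(**noisy)\n",
        "}\n",
        "\n",
        "print(to_result(noisy))"
      ],
      "execution_count": null,
      "outputs": []
    },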
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Production-safe client behavior: timeout and graceful fallback UX\n",
        "\n",
        "The blog notes that production AI is judged by reliability under imperfect conditions, not just answer quality. The next cell simulates a client request with timeout handling and a user-friendly fallback message."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "import time\n",
        "\n",
        "\n",
        "def simulated_client_request(prompt: str, timeout_seconds: float = 0.5, simulated_latency: float = 0.1) -> str:\n",
        "    start = time.time()\n",
        "    try:\n",
        "        time.sleep(simulated_latency)\n",
        "        elapsed = time.time() - start\n",
        "        if elapsed > timeout_seconds:\n",
        "            raise TimeoutError(\"Gateway timeout\")\n",
        "        response = {\"answer\": f\"Generated response for: {prompt}\"}\n",
        "        # JSON round-trip mimics the serialization a real gateway response goes through\n",
        "        return json.loads(json.dumps(response))[\"answer\"]\n",
        "    except TimeoutError:\n",
        "        return \"AI service is temporarily unavailable. Showing cached guidance instead.\"\n",
        "    except Exception:\n",
        "        return \"Gateway rejected request. Please retry later.\"\n",
        "\n",
        "print(\"Fast response:\")\n",
        "print(simulated_client_request(\"Draft a release note.\", timeout_seconds=0.5, simulated_latency=0.1))\n",
        "\n",
        "print(\"\\nSlow response:\")\n",
        "print(simulated_client_request(\"Draft a release note.\", timeout_seconds=0.2, simulated_latency=0.4))"
      ],
      "execution_count": null,
      "outputs": []
    },
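    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The blog also warns about hidden retries. If the client retries at all, the retry budget should be explicit and bounded. A minimal sketch of bounded retries with exponential backoff and jitter; the attempt count and delays are illustrative, not recommendations:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import random\n",
        "import time\n",
        "\n",
        "\n",
        "def call_with_retries(fn, max_attempts: int = 3, base_delay: float = 0.05) -> str:\n",
        "    \"\"\"Retry a flaky call with exponential backoff and jitter, then degrade gracefully.\"\"\"\n",
        "    for attempt in range(1, max_attempts + 1):\n",
        "        try:\n",
        "            return fn()\n",
        "        except TimeoutError:\n",
        "            if attempt == max_attempts:\n",
        "                return \"AI service is temporarily unavailable. Showing cached guidance instead.\"\n",
        "            # Jittered exponential backoff keeps concurrent clients from retrying in lockstep\n",
        "            time.sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))\n",
        "\n",
        "\n",
        "attempts = {\"n\": 0}\n",
        "\n",
        "\n",
        "def flaky_gateway() -> str:\n",
        "    attempts[\"n\"] += 1\n",
        "    if attempts[\"n\"] < 3:\n",
        "        raise TimeoutError(\"Gateway timeout\")\n",
        "    return \"Generated response on attempt 3\"\n",
        "\n",
        "\n",
        "print(call_with_retries(flaky_gateway))"
      ],
      "execution_count": null,
      "outputs": []
    },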
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Validate the demo trap with a simple enterprise readiness scorecard\n",
        "\n",
        "The article argues that demos avoid the hardest production questions: ownership, identity, quotas, traceability, legal region, cost per workflow, and fallback behavior. This cell turns those concerns into a checklist so you can score whether a project is still in demo mode."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "readiness_checks = {\n",
        "    \"system_owner_defined\": True,\n",
        "    \"authorized_data_scope\": False,\n",
        "    \"identity_enforced\": True,\n",
        "    \"timeout_strategy\": True,\n",
        "    \"usage_caps\": False,\n",
        "    \"audit_traceability\": True,\n",
        "    \"legal_region_defined\": False,\n",
        "    \"cost_per_workflow_known\": False,\n",
        "    \"fallback_path_defined\": True,\n",
        "}\n",
        "\n",
        "score_df = pd.DataFrame([\n",
        "    {\"control\": k, \"implemented\": v} for k, v in readiness_checks.items()\n",
        "])\n",
        "score_df[\"score\"] = score_df[\"implemented\"].astype(int)\n",
        "\n",
        "print(score_df)\n",
        "print(\"\\nEnterprise readiness score:\", f\"{score_df['score'].sum()}/{len(score_df)}\")\n",
        "\n",
        "ax = score_df.set_index(\"control\")[\"score\"].plot(kind=\"bar\", figsize=(10, 4), title=\"Enterprise AI Readiness Controls\")\n",
        "ax.set_ylabel(\"Implemented (1=yes, 0=no)\")\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Concrete example: procurement and operations use case under production constraints\n",
        "\n",
        "The blog describes a common pattern: a supplier-document demo looks impressive, then production reveals region restrictions, stale ERP data, incomplete metadata, legal traceability needs, finance caps, and security controls. The next cell simulates those constraints and shows why the model is not the real blocker."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "supplier_records = pd.DataFrame([\n",
        "    {\"supplier\": \"Northwind\", \"region\": \"EU\", \"erp_fresh\": True, \"metadata_complete\": True},\n",
        "    {\"supplier\": \"Contoso Parts\", \"region\": \"US\", \"erp_fresh\": False, \"metadata_complete\": True},\n",
        "    {\"supplier\": \"Fabrikam Metals\", \"region\": \"EU\", \"erp_fresh\": True, \"metadata_complete\": False},\n",
        "    {\"supplier\": \"Litware Goods\", \"region\": \"APAC\", \"erp_fresh\": True, \"metadata_complete\": True},\n",
        "])\n",
        "\n",
        "buyer_region = \"EU\"\n",
        "visible = supplier_records[supplier_records[\"region\"] == buyer_region].copy()\n",
        "visible[\"production_ready\"] = visible[\"erp_fresh\"] & visible[\"metadata_complete\"]\n",
        "\n",
        "print(\"Visible suppliers for buyer region:\")\n",
        "print(visible)\n",
        "\n",
        "print(\"\\nProduction blockers among visible suppliers:\")\n",
        "print(visible[~visible[\"production_ready\"]])\n",
        "\n",
        "summary = {\n",
        "    \"visible_suppliers\": int(len(visible)),\n",
        "    \"ready_for_ai_workflow\": int(visible[\"production_ready\"].sum()),\n",
        "    \"blocked_by_data_or_metadata\": int((~visible[\"production_ready\"]).sum()),\n",
        "}\n",
        "print(\"\\nSummary:\")\n",
        "print(summary)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Govern usage before it explodes: model the gateway policy in Python\n",
        "\n",
        "The blog includes PowerShell for Azure API Management named values, policy, and product-level governance. Since this notebook is Python-first, the next cell models the same ideas as Python configuration objects: backend indirection, header validation, rate limiting, correlation propagation, and subscription-based access."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "apim_config = {\n",
        "    \"named_values\": {\n",
        "        \"primary-llm-endpoint\": \"https://primary-llm.openai.azure.com\"\n",
        "    },\n",
        "    \"policy\": {\n",
        "        \"required_headers\": [\"Authorization\"],\n",
        "        \"rate_limit_calls\": 60,\n",
        "        \"renewal_period_seconds\": 60,\n",
        "        \"propagate_correlation_id\": True,\n",
        "        \"backend_named_value\": \"primary-llm-endpoint\",\n",
        "    },\n",
        "    \"product\": {\n",
        "        \"product_id\": \"internal-ai\",\n",
        "        \"title\": \"Internal AI Gateway\",\n",
        "        \"subscription_required\": True,\n",
        "        \"state\": \"published\",\n",
        "    }\n",
        "}\n",
        "\n",
        "print(json.dumps(apim_config, indent=2))\n",
        "\n",
        "\n",
        "def validate_policy(headers: Dict[str, str], config: Dict[str, Any]) -> Dict[str, Any]:\n",
        "    missing = [h for h in config[\"policy\"][\"required_headers\"] if h not in headers]\n",
        "    if missing:\n",
        "        return {\"allowed\": False, \"reason\": f\"Missing headers: {missing}\"}\n",
        "\n",
        "    correlation_id = headers.get(\"x-correlation-id\", str(uuid.uuid4()))\n",
        "    backend = config[\"named_values\"][config[\"policy\"][\"backend_named_value\"]]\n",
        "    return {\n",
        "        \"allowed\": True,\n",
        "        \"correlation_id\": correlation_id,\n",
        "        \"backend\": backend,\n",
        "        \"subscription_required\": config[\"product\"][\"subscription_required\"],\n",
        "    }\n",
        "\n",
        "print(\"\\nPolicy validation:\")\n",
        "print(validate_policy({\"Authorization\": \"Bearer demo\"}, apim_config))"
      ],
      "execution_count": null,
      "outputs": []
    },
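    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "`validate_policy` checks headers and backend routing but does not yet enforce `rate_limit_calls`. A hedged sketch of a fixed-window counter that models that policy in notebook form (real API Management enforces this server-side; the small limit here is only so the rejection is visible):"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import time\n",
        "from typing import Any, Dict, Optional\n",
        "\n",
        "# Small limit for demonstration; production limits live in the gateway policy\n",
        "rate_policy = {\"rate_limit_calls\": 3, \"renewal_period_seconds\": 60}\n",
        "_windows: Dict[str, Dict[str, Any]] = {}\n",
        "\n",
        "\n",
        "def check_rate_limit(subscription_key: str, policy: Dict[str, Any], now: Optional[float] = None) -> bool:\n",
        "    \"\"\"Fixed-window limiter: allow up to rate_limit_calls per renewal period.\"\"\"\n",
        "    now = time.time() if now is None else now\n",
        "    window = _windows.get(subscription_key)\n",
        "    if window is None or now - window[\"start\"] >= policy[\"renewal_period_seconds\"]:\n",
        "        window = {\"start\": now, \"count\": 0}\n",
        "        _windows[subscription_key] = window\n",
        "    if window[\"count\"] >= policy[\"rate_limit_calls\"]:\n",
        "        return False\n",
        "    window[\"count\"] += 1\n",
        "    return True\n",
        "\n",
        "\n",
        "results = [check_rate_limit(\"sub-legal\", rate_policy, now=100.0) for _ in range(4)]\n",
        "print(results)  # fourth call in the same window is rejected\n",
        "print(check_rate_limit(\"sub-legal\", rate_policy, now=161.0))  # new window: allowed"
      ],
      "execution_count": null,
      "outputs": []
    },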
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Treat data readiness as a prerequisite\n",
        "\n",
        "The article stresses that curated demo data hides the real problems: stale records, conflicting schemas, permissions boundaries, inconsistent metadata, and unclear lineage. The next cell creates a small data-readiness review across the five dimensions called out in the post."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "data_readiness = pd.DataFrame([\n",
        "    {\"dimension\": \"Source quality\", \"status\": \"partial\", \"risk\": \"ERP has stale records\"},\n",
        "    {\"dimension\": \"Access patterns\", \"status\": \"weak\", \"risk\": \"Region-based access not consistently enforced\"},\n",
        "    {\"dimension\": \"Freshness\", \"status\": \"partial\", \"risk\": \"Daily sync misses urgent updates\"},\n",
        "    {\"dimension\": \"Authorization boundaries\", \"status\": \"weak\", \"risk\": \"Supplier visibility rules unclear\"},\n",
        "    {\"dimension\": \"Lineage\", \"status\": \"partial\", \"risk\": \"Contract metadata origin not fully tracked\"},\n",
        "])\n",
        "\n",
        "status_score = {\"strong\": 3, \"partial\": 2, \"weak\": 1}\n",
        "data_readiness[\"score\"] = data_readiness[\"status\"].map(status_score)\n",
        "print(data_readiness)\n",
        "print(\"\\nAverage readiness score:\", round(data_readiness[\"score\"].mean(), 2), \"/ 3\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Build cost discipline into the design\n",
        "\n",
        "A major theme of the blog is that teams often cannot explain unit economics per workflow. The next cell builds a simple worksheet covering request volume, token usage, fallback frequency, observability, and human review, then reports the total monthly cost and an estimated cost per request."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "cost_inputs = {\n",
        "    \"requests_per_month\": 10000,\n",
        "    \"avg_prompt_tokens\": 1200,\n",
        "    \"avg_completion_tokens\": 400,\n",
        "    \"token_cost_per_1k\": 0.004,\n",
        "    \"fallback_rate\": 0.08,\n",
        "    \"fallback_multiplier\": 1.4,\n",
        "    \"observability_cost_per_request\": 0.001,\n",
        "    \"human_review_rate\": 0.05,\n",
        "    \"human_review_cost\": 1.25,\n",
        "}\n",
        "\n",
        "base_token_units = cost_inputs[\"requests_per_month\"] * (cost_inputs[\"avg_prompt_tokens\"] + cost_inputs[\"avg_completion_tokens\"]) / 1000\n",
        "base_token_cost = base_token_units * cost_inputs[\"token_cost_per_1k\"]\n",
        "fallback_cost = base_token_cost * cost_inputs[\"fallback_rate\"] * (cost_inputs[\"fallback_multiplier\"] - 1)\n",
        "observability_cost = cost_inputs[\"requests_per_month\"] * cost_inputs[\"observability_cost_per_request\"]\n",
        "human_review_cost = cost_inputs[\"requests_per_month\"] * cost_inputs[\"human_review_rate\"] * cost_inputs[\"human_review_cost\"]\n",
        "total_cost = base_token_cost + fallback_cost + observability_cost + human_review_cost\n",
        "cost_per_request = total_cost / cost_inputs[\"requests_per_month\"]\n",
        "\n",
        "cost_breakdown = pd.DataFrame([\n",
        "    {\"component\": \"Base token cost\", \"monthly_cost\": round(base_token_cost, 2)},\n",
        "    {\"component\": \"Fallback overhead\", \"monthly_cost\": round(fallback_cost, 2)},\n",
        "    {\"component\": \"Observability\", \"monthly_cost\": round(observability_cost, 2)},\n",
        "    {\"component\": \"Human review\", \"monthly_cost\": round(human_review_cost, 2)},\n",
        "    {\"component\": \"Total\", \"monthly_cost\": round(total_cost, 2)},\n",
        "])\n",
        "\n",
        "print(cost_breakdown)\n",
        "print(\"\\nEstimated cost per request:\", round(cost_per_request, 4))\n",
        "\n",
        "# Plot components only; including the Total bar would double-count the breakdown\n",
        "ax = cost_breakdown[cost_breakdown[\"component\"] != \"Total\"].set_index(\"component\")[\"monthly_cost\"].plot(kind=\"bar\", figsize=(8, 4), title=\"Monthly AI Workflow Cost Breakdown\")\n",
        "ax.set_ylabel(\"USD\")\n",
        "plt.tight_layout()\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
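    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The worksheet stops at cost per request, but the blog frames unit economics as cost per successful business outcome. A sketch that extends the estimate with an assumed success rate; both the success rate and the total below are illustrative placeholders taken from the worksheet's inputs:"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Illustrative extension: from cost per request to cost per successful outcome.\n",
        "requests_per_month = 10_000    # same volume as the worksheet above\n",
        "total_monthly_cost = 701.05    # rounded total from the worksheet's illustrative inputs\n",
        "success_rate = 0.82            # assumed share of requests that complete the business task\n",
        "\n",
        "cost_per_request = total_monthly_cost / requests_per_month\n",
        "successful_outcomes = requests_per_month * success_rate\n",
        "cost_per_outcome = total_monthly_cost / successful_outcomes\n",
        "\n",
        "print(f\"Cost per request: ${cost_per_request:.4f}\")\n",
        "print(f\"Cost per successful outcome: ${cost_per_outcome:.4f} ({successful_outcomes:,.0f} outcomes)\")"
      ],
      "execution_count": null,
      "outputs": []
    },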
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Design for compliance and deployment constraints early\n",
        "\n",
        "The blog warns that many AI projects are blocked less by technical feasibility than by where the system is legally and operationally allowed to run. The next cell evaluates a few example workloads against region, sovereignty, and cross-border movement constraints."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "deployment_constraints = pd.DataFrame([\n",
        "    {\"workload\": \"Legal contract summary\", \"allowed_region\": \"EU\", \"requested_region\": \"EU\", \"cross_border_allowed\": False, \"isolated_env_required\": False},\n",
        "    {\"workload\": \"Public marketing draft\", \"allowed_region\": \"US\", \"requested_region\": \"US\", \"cross_border_allowed\": True, \"isolated_env_required\": False},\n",
        "    {\"workload\": \"Defense maintenance assistant\", \"allowed_region\": \"LOCAL\", \"requested_region\": \"EU\", \"cross_border_allowed\": False, \"isolated_env_required\": True},\n",
        "])\n",
        "\n",
        "def compliance_status(row):\n",
        "    if row[\"isolated_env_required\"] and row[\"requested_region\"] != \"LOCAL\":\n",
        "        return \"blocked\"\n",
        "    if not row[\"cross_border_allowed\"] and row[\"requested_region\"] != row[\"allowed_region\"]:\n",
        "        return \"blocked\"\n",
        "    return \"allowed\"\n",
        "\n",
        "deployment_constraints[\"status\"] = deployment_constraints.apply(compliance_status, axis=1)\n",
        "print(deployment_constraints)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Be more careful with agents than with chatbots\n",
        "\n",
        "The article argues that agents widen the gap between demo and production because they introduce new failure modes: tool misuse, runaway loops, permission sprawl, hidden retries, and unintended side effects. The next cell compares a bounded agent design with an over-permissioned one using a simple risk scoring model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "agent_patterns = pd.DataFrame([\n",
        "    {\"pattern\": \"Bounded procurement agent\", \"tools\": 2, \"approval_required\": True, \"objective_scope\": \"narrow\", \"observable\": True},\n",
        "    {\"pattern\": \"Open-ended enterprise agent\", \"tools\": 8, \"approval_required\": False, \"objective_scope\": \"broad\", \"observable\": False},\n",
        "])\n",
        "\n",
        "\n",
        "def risk_score(row):\n",
        "    score = 0\n",
        "    score += row[\"tools\"]\n",
        "    score += 0 if row[\"approval_required\"] else 3\n",
        "    score += 1 if row[\"objective_scope\"] == \"narrow\" else 4\n",
        "    score += 0 if row[\"observable\"] else 3\n",
        "    return score\n",
        "\n",
        "agent_patterns[\"risk_score\"] = agent_patterns.apply(risk_score, axis=1)\n",
        "agent_patterns[\"risk_level\"] = pd.cut(agent_patterns[\"risk_score\"], bins=[0, 5, 10, 20], labels=[\"low\", \"medium\", \"high\"])\n",
        "print(agent_patterns)"
      ],
      "execution_count": null,
      "outputs": []
    },
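    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The risk table flags runaway loops in particular. A minimal sketch of a step-budget guard for an agent loop: the agent stops when the objective is met or the budget is exhausted, whichever comes first. The tool and objective here are hypothetical."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from typing import Callable, List\n",
        "\n",
        "\n",
        "def run_bounded_agent(step: Callable[[int], str], done: Callable[[str], bool], max_steps: int = 5) -> List[str]:\n",
        "    \"\"\"Run an agent loop under a hard step budget so it can never loop unbounded.\"\"\"\n",
        "    history: List[str] = []\n",
        "    for i in range(max_steps):\n",
        "        observation = step(i)\n",
        "        history.append(observation)\n",
        "        if done(observation):\n",
        "            return history\n",
        "    history.append(f\"budget exhausted after {max_steps} steps; escalating to a human\")\n",
        "    return history\n",
        "\n",
        "\n",
        "# Hypothetical procurement tool: the objective is met on the third step\n",
        "trace = run_bounded_agent(\n",
        "    step=lambda i: \"quote approved\" if i == 2 else f\"searching suppliers (step {i})\",\n",
        "    done=lambda obs: obs == \"quote approved\",\n",
        ")\n",
        "print(\"\\n\".join(trace))"
      ],
      "execution_count": null,
      "outputs": []
    },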
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## What senior engineers should do differently\n",
        "\n",
        "The post closes with a practical standard: put API management in front of model access, treat prompts as code plus configuration, review data readiness before agent design, require unit economics, design for constrained environments, normalize interfaces, plan fallback, and instrument everything. The next cell turns that guidance into an actionable checklist."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "engineering_standard = pd.DataFrame([\n",
        "    {\"practice\": \"API gateway in front of model access\", \"priority\": \"high\"},\n",
        "    {\"practice\": \"Prompts treated as code + configuration\", \"priority\": \"high\"},\n",
        "    {\"practice\": \"Data readiness review before agent design\", \"priority\": \"high\"},\n",
        "    {\"practice\": \"Unit economics per scenario\", \"priority\": \"high\"},\n",
        "    {\"practice\": \"Constrained environment design early\", \"priority\": \"medium\"},\n",
        "    {\"practice\": \"Normalized interfaces and fallback\", \"priority\": \"high\"},\n",
        "    {\"practice\": \"Full instrumentation\", \"priority\": \"high\"},\n",
        "])\n",
        "print(engineering_standard)\n",
        "print(\"\\nHigh-priority practices:\")\n",
        "print(engineering_standard[engineering_standard['priority'] == 'high']['practice'].tolist())"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "This notebook validated the blog's central point: the demo is usually the easy part, while production AI depends on governance, data quality, cost control, observability, fallback design, and compliance-aware deployment. The hands-on examples showed how an internal gateway, normalized response contract, readiness scoring, and cost worksheet make those concerns explicit.\n",
        "\n",
        "## Next Steps\n",
        "\n",
        "- Replace the simulated gateway with a real internal API.\n",
        "- Externalize configuration and secrets with environment variables or a secret manager.\n",
        "- Add real telemetry, quotas, and policy enforcement.\n",
        "- Extend the cost worksheet with your actual token, storage, and review costs.\n",
        "- Run a formal data-readiness and compliance review before scaling any pilot."
      ]
    }
  ]
}