{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "From experimentation to operations: what weekend-built AI data platforms teach us about production readiness",
      "slug": "from-experimentation-to-operations-what-weekend-built-ai-dat",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-05-05T21:54:12.722Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# From experimentation to operations: what weekend-built AI data platforms teach us about production readiness\n",
        "\n",
        "This notebook turns the blog post into hands-on validation steps you can run and inspect. It focuses on the core production-readiness gap between a convincing prototype and a promotable AI/data workload: identity, least privilege, telemetry, governance gates, and cost-aware operational discipline.\n",
        "\n",
        "The examples are intentionally simple. They are designed to help you validate control intent and failure modes, not to represent a complete enterprise implementation."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install pandas"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "import time\n",
        "import uuid\n",
        "import logging\n",
        "from dataclasses import dataclass, asdict\n",
        "from typing import Dict, List, Any\n",
        "\n",
        "import pandas as pd"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Prototype vs production architecture map\n",
        "\n",
        "The blog contrasts a weekend prototype with a production-ready platform. This Python cell converts that contrast into a simple graph structure you can inspect as data, making the control gaps explicit: secrets, shared roles, and print debugging on one side; managed identity, scoped access, telemetry, and policy gates on the other."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "prototype_edges = [\n",
        "    (\"Weekend prototype\", \"Hardcoded secrets\"),\n",
        "    (\"Weekend prototype\", \"Single shared database role\"),\n",
        "    (\"Weekend prototype\", \"Print-based debugging\"),\n",
        "    (\"Hardcoded secrets\", \"Security review fails\"),\n",
        "    (\"Single shared database role\", \"No least-privilege boundary\"),\n",
        "    (\"Print-based debugging\", \"No operational telemetry\"),\n",
        "]\n",
        "\n",
        "production_edges = [\n",
        "    (\"Production-ready platform\", \"Managed identity\"),\n",
        "    (\"Production-ready platform\", \"Scoped RBAC and data-plane roles\"),\n",
        "    (\"Production-ready platform\", \"Structured logs, traces, metrics\"),\n",
        "    (\"Production-ready platform\", \"Policy gates before promotion\"),\n",
        "    (\"Managed identity\", \"Secretless auth\"),\n",
        "    (\"Scoped RBAC and data-plane roles\", \"Controlled blast radius\"),\n",
        "    (\"Structured logs, traces, metrics\", \"Faster incident response\"),\n",
        "    (\"Policy gates before promotion\", \"Repeatable governance\"),\n",
        "]\n",
        "\n",
        "edges_df = pd.DataFrame(prototype_edges + production_edges, columns=[\"from\", \"to\"])\n",
        "edges_df"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Anti-pattern: prototype code that looks useful but fails production review\n",
        "\n",
        "This example demonstrates the exact combination the blog warns about: hardcoded secrets, broad scope, and ad-hoc logging. Run it to see how easily sensitive values leak into logs and how little operational structure exists for tracing or governance."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Anti-pattern: a prototype couples secrets, broad access, and ad-hoc logging\n",
        "import json\n",
        "import time\n",
        "\n",
        "SQL_CONNECTION = \"Server=tcp:demo.database;User Id=admin;Password=SuperSecret!\"\n",
        "AGENT_API_KEY = \"sk-dev-token\"\n",
        "TOOL_SCOPE = \"all-datasets\"\n",
        "\n",
        "def run_prototype(question: str) -> str:\n",
        "    print(f\"[debug] connecting with {SQL_CONNECTION}\")\n",
        "    print(f\"[debug] calling agent with key={AGENT_API_KEY} scope={TOOL_SCOPE}\")\n",
        "    time.sleep(0.1)\n",
        "    result = {\"question\": question, \"answer\": \"prototype output\", \"source\": \"shared-prod-db\"}\n",
        "    print(\"[debug] result=\", result)\n",
        "    return json.dumps(result)\n",
        "\n",
        "print(run_prototype(\"Which customers are at risk?\"))"
      ],
      "execution_count": null,
      "outputs": []
    },
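    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Quick mitigation: redact secret values before they reach logs\n",
        "\n",
        "Even before the full production rewrite, a logging filter can stop known secret shapes from reaching log output. The next cell is an illustrative sketch, not a complete secret-scanning solution: the regex patterns and placeholder values are assumptions for this demo."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Illustrative mitigation: redact known secret shapes before log lines are emitted.\n",
        "# The patterns and placeholder values below are demo assumptions, not real credentials.\n",
        "import logging\n",
        "import re\n",
        "\n",
        "SECRET_PATTERNS = [re.compile(p) for p in [r\"Password=[^;]+\", r\"sk-[A-Za-z0-9-]+\"]]\n",
        "\n",
        "class RedactSecretsFilter(logging.Filter):\n",
        "    def filter(self, record: logging.LogRecord) -> bool:\n",
        "        # Rewrite the record's message in place, replacing secret-shaped substrings.\n",
        "        message = record.getMessage()\n",
        "        for pattern in SECRET_PATTERNS:\n",
        "            message = pattern.sub(\"[REDACTED]\", message)\n",
        "        record.msg = message\n",
        "        record.args = ()\n",
        "        return True\n",
        "\n",
        "redaction_logger = logging.getLogger(\"redaction-demo\")\n",
        "redaction_logger.handlers.clear()\n",
        "redaction_handler = logging.StreamHandler()\n",
        "redaction_handler.addFilter(RedactSecretsFilter())\n",
        "redaction_logger.addHandler(redaction_handler)\n",
        "redaction_logger.setLevel(logging.INFO)\n",
        "redaction_logger.propagate = False\n",
        "\n",
        "# Both lines below should print with the secret values replaced by [REDACTED].\n",
        "redaction_logger.info(\"connecting with Server=tcp:demo.database;Password=SuperSecret!\")\n",
        "redaction_logger.info(\"calling agent with key=sk-dev-token\")"
      ],
      "execution_count": null,
      "outputs": []
    },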
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Production pattern: managed identity, scoped access, and structured telemetry\n",
        "\n",
        "This version replaces embedded credentials with an explicit request context and structured logging. It still simplifies real enforcement, but it demonstrates the production intent the blog argues for: secretless identity, scoped roles, and traceable execution."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Production pattern: managed identity, scoped access, and structured telemetry\n",
        "import json\n",
        "import logging\n",
        "from dataclasses import dataclass\n",
        "\n",
        "logging.basicConfig(level=logging.INFO, format=\"%(message)s\")\n",
        "logger = logging.getLogger(\"ai-platform\")\n",
        "logger.handlers.clear()\n",
        "stream_handler = logging.StreamHandler()\n",
        "stream_handler.setFormatter(logging.Formatter(\"%(message)s\"))\n",
        "logger.addHandler(stream_handler)\n",
        "logger.setLevel(logging.INFO)\n",
        "\n",
        "@dataclass\n",
        "class RequestContext:\n",
        "    principal_id: str\n",
        "    db_role: str\n",
        "    tool_scope: str\n",
        "    trace_id: str\n",
        "\n",
        "def run_production(question: str, ctx: RequestContext) -> str:\n",
        "    logger.info(json.dumps({\"event\": \"auth\", \"mode\": \"managed_identity\", \"principal\": ctx.principal_id, \"trace_id\": ctx.trace_id}))\n",
        "    logger.info(json.dumps({\"event\": \"authorize\", \"db_role\": ctx.db_role, \"tool_scope\": ctx.tool_scope, \"trace_id\": ctx.trace_id}))\n",
        "    result = {\"question\": question, \"answer\": \"production output\", \"source\": \"curated-view\", \"trace_id\": ctx.trace_id}\n",
        "    logger.info(json.dumps({\"event\": \"completed\", \"status\": \"ok\", \"trace_id\": ctx.trace_id}))\n",
        "    return json.dumps(result)\n",
        "\n",
        "context = RequestContext(\"mi://analytics-app\", \"db_datareader_curated\", \"customer-risk-read\", \"trace-001\")\n",
        "print(run_production(\"Which customers are at risk?\", context))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Validate the difference between unsafe and safer execution\n",
        "\n",
        "The next cell runs both patterns and compares their outputs in a tabular form. This makes the production-readiness gap visible as observable behavior: secret leakage, broad scope, and missing traceability versus scoped context and structured events."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "prototype_output = json.loads(run_prototype(\"Which customers are at risk?\"))\n",
        "production_output = json.loads(run_production(\"Which customers are at risk?\", context))\n",
        "\n",
        "comparison = pd.DataFrame([\n",
        "    {\n",
        "        \"mode\": \"prototype\",\n",
        "        \"source\": prototype_output[\"source\"],\n",
        "        \"has_trace_id\": \"trace_id\" in prototype_output,\n",
        "        \"scope\": TOOL_SCOPE,\n",
        "        \"secret_handling\": \"hardcoded\",\n",
        "        \"logging_style\": \"print\"\n",
        "    },\n",
        "    {\n",
        "        \"mode\": \"production\",\n",
        "        \"source\": production_output[\"source\"],\n",
        "        \"has_trace_id\": \"trace_id\" in production_output,\n",
        "        \"scope\": context.tool_scope,\n",
        "        \"secret_handling\": \"managed identity pattern\",\n",
        "        \"logging_style\": \"structured\"\n",
        "    }\n",
        "])\n",
        "comparison"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Promotion workflow as executable data\n",
        "\n",
        "The blog describes a promotion sequence where CI/CD submits a workload, policy validates identity and diagnostics, and promotion either proceeds or returns remediation steps. This cell models that sequence as structured events so you can inspect the operational flow without relying on Mermaid rendering."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "promotion_sequence = [\n",
        "    {\"step\": 1, \"actor\": \"Builder\", \"action\": \"Submit workload definition\", \"target\": \"CI/CD\"},\n",
        "    {\"step\": 2, \"actor\": \"CI/CD\", \"action\": \"Validate identity, diagnostics, API registration\", \"target\": \"Policy Gate\"},\n",
        "    {\"step\": 3, \"actor\": \"Policy Gate\", \"action\": \"Pass or fail with reasons\", \"target\": \"CI/CD\"},\n",
        "    {\"step\": 4, \"actor\": \"CI/CD\", \"action\": \"Request promotion if passed\", \"target\": \"Platform Team\"},\n",
        "    {\"step\": 5, \"actor\": \"Platform Team\", \"action\": \"Deploy with managed identity and telemetry\", \"target\": \"AI/Data Runtime\"},\n",
        "    {\"step\": 6, \"actor\": \"AI/Data Runtime\", \"action\": \"Emit health and audit signals\", \"target\": \"Platform Team\"},\n",
        "    {\"step\": 7, \"actor\": \"CI/CD\", \"action\": \"Block promotion and return remediation if failed\", \"target\": \"Builder\"},\n",
        "]\n",
        "\n",
        "pd.DataFrame(promotion_sequence)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Governance gate: require managed identity before promotion\n",
        "\n",
        "The original post used PowerShell. Here, the same policy intent is implemented in Python so it can run directly in this notebook. The gate blocks promotion unless the workload identity type is system-assigned or user-assigned."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def check_managed_identity(workload: Dict[str, Any]) -> Dict[str, Any]:\n",
        "    allowed = {\"SystemAssigned\", \"UserAssigned\"}\n",
        "    if workload.get(\"IdentityType\") not in allowed:\n",
        "        raise ValueError(f\"Promotion blocked: workload '{workload.get('Name')}' must use managed identity.\")\n",
        "    return {\n",
        "        \"Workload\": workload.get(\"Name\"),\n",
        "        \"Check\": \"ManagedIdentity\",\n",
        "        \"Result\": \"Passed\",\n",
        "        \"Target\": workload.get(\"Environment\")\n",
        "    }\n",
        "\n",
        "workload_ok = {\n",
        "    \"Name\": \"ai-risk-service\",\n",
        "    \"IdentityType\": \"SystemAssigned\",\n",
        "    \"Environment\": \"preprod\"\n",
        "}\n",
        "\n",
        "workload_bad = {\n",
        "    \"Name\": \"ai-risk-service\",\n",
        "    \"IdentityType\": \"None\",\n",
        "    \"Environment\": \"preprod\"\n",
        "}\n",
        "\n",
        "print(check_managed_identity(workload_ok))\n",
        "try:\n",
        "    check_managed_identity(workload_bad)\n",
        "except Exception as e:\n",
        "    print(str(e))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Governance gate: require diagnostics for logs, metrics, and traces\n",
        "\n",
        "Observability is a core production boundary in the blog. This Python gate enforces the same rule as the PowerShell example: if logs, metrics, or trace export are missing, promotion is blocked with explicit remediation feedback."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def check_diagnostics(diagnostics: Dict[str, Any]) -> str:\n",
        "    missing: List[str] = []\n",
        "    if not diagnostics.get(\"LogsEnabled\"):\n",
        "        missing.append(\"logs\")\n",
        "    if not diagnostics.get(\"MetricsEnabled\"):\n",
        "        missing.append(\"metrics\")\n",
        "    if not str(diagnostics.get(\"TraceExport\", \"\")).strip():\n",
        "        missing.append(\"trace export\")\n",
        "    if missing:\n",
        "        raise ValueError(f\"Promotion blocked: missing diagnostic controls: {', '.join(missing)}.\")\n",
        "    return f\"Diagnostics check passed for {diagnostics.get('ResourceName')}\"\n",
        "\n",
        "diagnostics_ok = {\n",
        "    \"ResourceName\": \"ai-risk-service\",\n",
        "    \"LogsEnabled\": True,\n",
        "    \"MetricsEnabled\": True,\n",
        "    \"TraceExport\": \"OpenTelemetry\"\n",
        "}\n",
        "\n",
        "diagnostics_bad = {\n",
        "    \"ResourceName\": \"ai-risk-service\",\n",
        "    \"LogsEnabled\": True,\n",
        "    \"MetricsEnabled\": False,\n",
        "    \"TraceExport\": \"\"\n",
        "}\n",
        "\n",
        "print(check_diagnostics(diagnostics_ok))\n",
        "try:\n",
        "    check_diagnostics(diagnostics_bad)\n",
        "except Exception as e:\n",
        "    print(str(e))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Governance gate: require approved API surface registration for tools and agents\n",
        "\n",
        "The blog argues that APIs, models, tools, and agents must be governed together. This Python version of the API-surface gate blocks promotion when a workload registers callable interfaces that are not in the approved catalog."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def check_api_surface(api_surface: Dict[str, Any]) -> Dict[str, Any]:\n",
        "    registered = api_surface.get(\"RegisteredApis\", [])\n",
        "    approved = set(api_surface.get(\"ApprovedCatalog\", []))\n",
        "    unapproved = [api for api in registered if api not in approved]\n",
        "    if unapproved:\n",
        "        raise ValueError(f\"Promotion blocked: unapproved API surface detected: {', '.join(unapproved)}.\")\n",
        "    return {\n",
        "        \"Workload\": api_surface.get(\"WorkloadName\"),\n",
        "        \"Check\": \"ApiSurfaceRegistration\",\n",
        "        \"Result\": \"Passed\",\n",
        "        \"Count\": len(registered)\n",
        "    }\n",
        "\n",
        "api_ok = {\n",
        "    \"WorkloadName\": \"ai-risk-service\",\n",
        "    \"RegisteredApis\": [\"customer-risk-read\", \"curated-sql-read\"],\n",
        "    \"ApprovedCatalog\": [\"customer-risk-read\", \"curated-sql-read\", \"feature-store-read\"]\n",
        "}\n",
        "\n",
        "api_bad = {\n",
        "    \"WorkloadName\": \"ai-risk-service\",\n",
        "    \"RegisteredApis\": [\"customer-risk-read\", \"delete-customer-record\"],\n",
        "    \"ApprovedCatalog\": [\"customer-risk-read\", \"curated-sql-read\", \"feature-store-read\"]\n",
        "}\n",
        "\n",
        "print(check_api_surface(api_ok))\n",
        "try:\n",
        "    check_api_surface(api_bad)\n",
        "except Exception as e:\n",
        "    print(str(e))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Readiness scorecard: small set of controls that are hard to waive\n",
        "\n",
        "This scorecard translates the blog's production bar into a compact readiness signal. It is intentionally simple: identity, least privilege, telemetry, and policy gate coverage. The goal is to make promotion criteria easy to understand and difficult to bypass."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Readiness scorecard: translate prototype lessons into production criteria\n",
        "from dataclasses import dataclass, asdict\n",
        "import json\n",
        "\n",
        "@dataclass\n",
        "class ReadinessScore:\n",
        "    identity: bool\n",
        "    least_privilege: bool\n",
        "    telemetry: bool\n",
        "    policy_gate: bool\n",
        "\n",
        "    def summary(self) -> str:\n",
        "        passed = sum(asdict(self).values())\n",
        "        return json.dumps({\"passed_controls\": passed, \"total_controls\": 4, \"ready\": passed == 4})\n",
        "\n",
        "prototype = ReadinessScore(False, False, False, False)\n",
        "production = ReadinessScore(True, True, True, True)\n",
        "partial = ReadinessScore(True, True, False, True)\n",
        "\n",
        "print(\"prototype:\", prototype.summary())\n",
        "print(\"partial:\", partial.summary())\n",
        "print(\"production:\", production.summary())"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Promotion checklist as a decision graph\n",
        "\n",
        "The blog ends with a promotion checklist: identity present, access scoped, telemetry enabled, governance passed. This cell represents that checklist as edges in a simple graph-like table so you can validate the dependency chain from prototype to production-ready workload."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "checklist_edges = [\n",
        "    (\"Prototype notebook or script\", \"Promotion checklist\"),\n",
        "    (\"Promotion checklist\", \"Managed identity\"),\n",
        "    (\"Promotion checklist\", \"Curated roles and approved tools\"),\n",
        "    (\"Promotion checklist\", \"Logs, metrics, traces\"),\n",
        "    (\"Promotion checklist\", \"Policy-compliant deployment\"),\n",
        "    (\"Managed identity\", \"Production-ready AI/data workload\"),\n",
        "    (\"Curated roles and approved tools\", \"Production-ready AI/data workload\"),\n",
        "    (\"Logs, metrics, traces\", \"Production-ready AI/data workload\"),\n",
        "    (\"Policy-compliant deployment\", \"Production-ready AI/data workload\"),\n",
        "]\n",
        "\n",
        "pd.DataFrame(checklist_edges, columns=[\"from\", \"to\"])"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## End-to-end validation: combine gates into a promotability decision\n",
        "\n",
        "This final executable example turns the blog's argument into a single promotion function. A workload is promotable only if identity, diagnostics, API surface, and readiness score all pass. This is a practical way to validate the difference between a demo that works and a workload that is governable."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def evaluate_promotion(workload: Dict[str, Any], diagnostics: Dict[str, Any], api_surface: Dict[str, Any], score: ReadinessScore) -> Dict[str, Any]:\n",
        "    results = []\n",
        "    try:\n",
        "        results.append(check_managed_identity(workload))\n",
        "        results.append({\"Check\": \"Diagnostics\", \"Result\": check_diagnostics(diagnostics)})\n",
        "        results.append(check_api_surface(api_surface))\n",
        "        score_summary = json.loads(score.summary())\n",
        "        if not score_summary[\"ready\"]:\n",
        "            raise ValueError(f\"Promotion blocked: readiness score incomplete: {score_summary}\")\n",
        "        results.append({\"Check\": \"ReadinessScore\", \"Result\": \"Passed\", \"Details\": score_summary})\n",
        "        return {\"promotable\": True, \"results\": results}\n",
        "    except Exception as e:\n",
        "        return {\"promotable\": False, \"error\": str(e), \"results\": results}\n",
        "\n",
        "candidate_good = evaluate_promotion(workload_ok, diagnostics_ok, api_ok, production)\n",
        "candidate_bad = evaluate_promotion(workload_bad, diagnostics_bad, api_bad, partial)\n",
        "\n",
        "print(json.dumps(candidate_good, indent=2))\n",
        "print(json.dumps(candidate_bad, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
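    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Cost guardrail: make spend a blocking control before scale-up\n",
        "\n",
        "The production bar in this notebook includes cost-aware operational discipline. This gate sketches one way to enforce it: scale-up is blocked when projected spend exceeds the approved budget or when no cost owner is named. The budget figures, field names, and thresholds here are illustrative assumptions, not a real pricing model."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "# Illustrative cost guardrail: block scale-up on budget overrun or missing ownership.\n",
        "# Budget figures and field names are demo assumptions, not real pricing data.\n",
        "def check_cost_guardrail(cost_plan: Dict[str, Any]) -> Dict[str, Any]:\n",
        "    projected = float(cost_plan.get(\"ProjectedMonthlyCost\", 0.0))\n",
        "    budget = float(cost_plan.get(\"ApprovedMonthlyBudget\", 0.0))\n",
        "    owner = str(cost_plan.get(\"CostOwner\", \"\")).strip()\n",
        "    problems: List[str] = []\n",
        "    if not owner:\n",
        "        problems.append(\"no named cost owner\")\n",
        "    if projected > budget:\n",
        "        problems.append(f\"projected monthly cost {projected:.2f} exceeds approved budget {budget:.2f}\")\n",
        "    if problems:\n",
        "        raise ValueError(f\"Scale-up blocked: {'; '.join(problems)}.\")\n",
        "    return {\"Workload\": cost_plan.get(\"WorkloadName\"), \"Check\": \"CostGuardrail\", \"Result\": \"Passed\"}\n",
        "\n",
        "cost_ok = {\"WorkloadName\": \"ai-risk-service\", \"ProjectedMonthlyCost\": 1800.0, \"ApprovedMonthlyBudget\": 2500.0, \"CostOwner\": \"analytics-platform-team\"}\n",
        "cost_bad = {\"WorkloadName\": \"ai-risk-service\", \"ProjectedMonthlyCost\": 4200.0, \"ApprovedMonthlyBudget\": 2500.0, \"CostOwner\": \"\"}\n",
        "\n",
        "print(check_cost_guardrail(cost_ok))\n",
        "try:\n",
        "    check_cost_guardrail(cost_bad)\n",
        "except Exception as e:\n",
        "    print(str(e))"
      ],
      "execution_count": null,
      "outputs": []
    },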
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "A weekend build can validate demand, simplify workflows, and reveal useful integration points. It cannot, by itself, prove supportability, bounded risk, cost discipline, or governance readiness.\n",
        "\n",
        "The practical production bar from this notebook is clear: use managed identity or equivalent secretless patterns, enforce least privilege, require logs/metrics/traces, govern APIs and tools together, and block promotion when those controls are missing.\n",
        "\n",
        "## Next Steps\n",
        "\n",
        "- Replace any hardcoded credentials in prototype code with managed identity or a secretless equivalent.\n",
        "- Add structured telemetry with trace continuity across prompts, tools, and downstream actions.\n",
        "- Implement promotion gates for identity, diagnostics, and approved API/tool surfaces in CI/CD.\n",
        "- Define a lightweight readiness scorecard that leadership can understand and teams cannot casually waive.\n",
        "- Add cost guardrails and ownership expectations before scaling user volume."
      ]
    }
  ]
}