{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "name": "python",
      "version": "3.13.0"
    },
    "blog_metadata": {
      "topic": "The next wave of AI productization: from search augmentation to domain-specific copilots",
      "slug": "the-next-wave-of-ai-productization-from-search-augmentation-",
      "generated_by": "LinkedIn Post Generator + Azure OpenAI",
      "generated_at": "2026-05-11T12:43:56.279Z"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# The next wave of AI productization: from search augmentation to domain-specific copilots\n",
        "\n",
        "This notebook turns the blog post into a hands-on validation workflow. It focuses on the shift from generic chat toward embedded, domain-specific copilots that operate inside governed workflows with identity-aware access, citations, risk scoring, approvals, and observability.\n",
        "\n",
        "You will validate core architectural patterns in Python: structured copilot responses, evaluation harnesses, approval gates, workflow routing, and governance logging."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "%pip install -q pandas networkx matplotlib"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from dataclasses import dataclass, asdict\n",
        "from typing import List, Dict, Any\n",
        "from datetime import datetime\n",
        "import json\n",
        "import pandas as pd\n",
        "import networkx as nx\n",
        "import matplotlib.pyplot as plt"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Workflow-first mental model\n",
        "\n",
        "The blog argues that enterprise AI advantage is no longer about a smarter chat box. The durable moat comes from copilots embedded in workflows, constrained by domain rules, and trusted enough to support bounded action.\n",
        "\n",
        "The progression is:\n",
        "- Search augmentation helps people find answers.\n",
        "- Embedded copilots help people complete work.\n",
        "- Orchestrated agent patterns help systems coordinate complex work across domains.\n",
        "\n",
        "The diagram below recreates the search-to-action pattern in executable Python so you can inspect the control points."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import networkx as nx\n",
        "import matplotlib.pyplot as plt\n",
        "\n",
        "G = nx.DiGraph()\n",
        "edges = [\n",
        "    (\"User asks domain question\", \"Retriever pulls policy, CRM, ticket, or SOP context\"),\n",
        "    (\"Retriever pulls policy, CRM, ticket, or SOP context\", \"Domain copilot composes answer with citations\"),\n",
        "    (\"Domain copilot composes answer with citations\", \"Evaluation harness scores groundedness and citation coverage\"),\n",
        "    (\"Evaluation harness scores groundedness and citation coverage\", \"High-risk action requested?\"),\n",
        "    (\"High-risk action requested?\", \"Surface answer to user\"),\n",
        "    (\"High-risk action requested?\", \"Require human approval\"),\n",
        "    (\"Require human approval\", \"Approved action execution\"),\n",
        "]\n",
        "G.add_edges_from(edges)\n",
        "\n",
        "plt.figure(figsize=(14, 7))\n",
        "pos = nx.spring_layout(G, seed=42, k=1.2)\n",
        "nx.draw(\n",
        "    G,\n",
        "    pos,\n",
        "    with_labels=True,\n",
        "    node_size=5000,\n",
        "    node_color=\"#DCEEFF\",\n",
        "    font_size=9,\n",
        "    arrows=True,\n",
        "    arrowstyle=\"->\",\n",
        "    arrowsize=18,\n",
        ")\n",
        "plt.title(\"Search Augmentation to Governed Action Flow\")\n",
        "plt.axis(\"off\")\n",
        "plt.show()"
      ],
      "execution_count": null,
      "outputs": []
    },
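    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The retriever node above is also where identity-aware access belongs: the copilot should only ever see context the requesting user is entitled to. The sketch below is illustrative, assuming a hypothetical in-memory corpus whose documents carry an `allowed_roles` field."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def retrieve_for_user(corpus, user_roles):\n",
        "    # Keep only documents whose allowed_roles intersect the user's roles.\n",
        "    return [doc for doc in corpus if set(doc[\"allowed_roles\"]) & set(user_roles)]\n",
        "\n",
        "corpus = [\n",
        "    {\"source_id\": \"SOP-42\", \"allowed_roles\": [\"finance_ops\", \"finance_manager\"]},\n",
        "    {\"source_id\": \"HR-7\", \"allowed_roles\": [\"hr\"]},\n",
        "]\n",
        "\n",
        "for doc in retrieve_for_user(corpus, [\"finance_ops\"]):\n",
        "    print(doc[\"source_id\"])"
      ],
      "execution_count": null,
      "outputs": []
    },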
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Minimal domain copilot response object\n",
        "\n",
        "A domain copilot should return more than free-form text. It should carry structured evidence and action metadata so downstream systems can evaluate groundedness, risk, and whether an approval is required.\n",
        "\n",
        "This example creates a response object with citations and a requested action."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from dataclasses import dataclass, asdict\n",
        "from typing import List\n",
        "\n",
        "@dataclass\n",
        "class Citation:\n",
        "    source_id: str\n",
        "    snippet: str\n",
        "\n",
        "@dataclass\n",
        "class CopilotResponse:\n",
        "    answer: str\n",
        "    citations: List[Citation]\n",
        "    requested_action: str\n",
        "    risk_score: float\n",
        "\n",
        "response = CopilotResponse(\n",
        "    answer=\"Reset the vendor account after confirming invoice mismatch policy.\",\n",
        "    citations=[Citation(\"SOP-42\", \"Finance ops requires manager confirmation for vendor resets.\")],\n",
        "    requested_action=\"reset_vendor_account\",\n",
        "    risk_score=0.82,\n",
        ")\n",
        "\n",
        "print(response)\n",
        "print(\"\\nAs dict:\")\n",
        "print(asdict(response))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Evaluation harness for groundedness, citation coverage, and approval thresholds\n",
        "\n",
        "The blog emphasizes that governance must be designed into the product. That means measuring whether an answer is grounded in trusted sources, whether citations are present, and whether a high-risk action should be routed to approval.\n",
        "\n",
        "This simplified evaluation harness scores those signals."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from typing import Dict, List\n",
        "\n",
        "def evaluate_response(answer: str, citations: List[Dict[str, str]], risk_score: float) -> Dict[str, float]:\n",
        "    # Groundedness: fraction of citations whose snippet text actually appears in the answer.\n",
        "    grounded_citations = sum(1 for c in citations if c[\"snippet\"].lower() in answer.lower())\n",
        "    groundedness = grounded_citations / max(len(citations), 1)\n",
        "    # Coverage: citations per sentence (a rough proxy), capped at 1.0.\n",
        "    citation_coverage = min(len(citations) / max(len(answer.split(\".\")), 1), 1.0)\n",
        "    # Any action at or above the 0.7 risk threshold must route to human approval.\n",
        "    approval_required = 1.0 if risk_score >= 0.7 else 0.0\n",
        "    return {\n",
        "        \"groundedness\": round(groundedness, 2),\n",
        "        \"citation_coverage\": round(citation_coverage, 2),\n",
        "        \"approval_required\": approval_required,\n",
        "    }\n",
        "\n",
        "result = evaluate_response(\n",
        "    answer=\"Finance ops requires manager confirmation for vendor resets.\",\n",
        "    citations=[{\"source_id\": \"SOP-42\", \"snippet\": \"Finance ops requires manager confirmation for vendor resets\"}],\n",
        "    risk_score=0.82,\n",
        ")\n",
        "print(result)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Surface-or-hold gate\n",
        "\n",
        "A workflow-first copilot should not automatically surface every answer. The blog's core point is that evidence, risk, and action boundaries matter more than fluent language.\n",
        "\n",
        "This gate only surfaces low-risk, well-grounded responses automatically."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def should_surface(metrics: dict) -> bool:\n",
        "    return (\n",
        "        metrics[\"groundedness\"] >= 0.8\n",
        "        and metrics[\"citation_coverage\"] >= 0.5\n",
        "        and metrics[\"approval_required\"] == 0.0\n",
        "    )\n",
        "\n",
        "metrics = {\"groundedness\": 1.0, \"citation_coverage\": 1.0, \"approval_required\": 0.0}\n",
        "if should_surface(metrics):\n",
        "    print(\"Surface answer to user\")\n",
        "else:\n",
        "    print(\"Hold for human review\")\n",
        "\n",
        "metrics_high_risk = {\"groundedness\": 1.0, \"citation_coverage\": 1.0, \"approval_required\": 1.0}\n",
        "if should_surface(metrics_high_risk):\n",
        "    print(\"Surface high-risk answer\")\n",
        "else:\n",
        "    print(\"Hold high-risk answer for human review\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Policy gate for bounded action routing\n",
        "\n",
        "Embedded copilots create value when they can recommend or initiate bounded actions safely. The blog's example distinguishes between advisory responses, preparatory steps, and executable actions.\n",
        "\n",
        "This routing function blocks actions without citations, sends high-risk actions to approval, and only executes low-risk actions directly."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def route_action(requested_action: str, risk_score: float, has_citation: bool) -> str:\n",
        "    if not has_citation:\n",
        "        return \"block_and_request_more_context\"\n",
        "    if risk_score >= 0.7:\n",
        "        return \"send_to_approval_flow\"\n",
        "    return f\"execute_{requested_action}\"\n",
        "\n",
        "decision = route_action(\n",
        "    requested_action=\"update_case_status\",\n",
        "    risk_score=0.82,\n",
        "    has_citation=True,\n",
        ")\n",
        "print(\"Decision:\", decision)\n",
        "\n",
        "scenarios = [\n",
        "    {\"requested_action\": \"update_case_status\", \"risk_score\": 0.82, \"has_citation\": True},\n",
        "    {\"requested_action\": \"draft_email\", \"risk_score\": 0.15, \"has_citation\": True},\n",
        "    {\"requested_action\": \"approve_claim_payout\", \"risk_score\": 0.91, \"has_citation\": False},\n",
        "]\n",
        "\n",
        "for s in scenarios:\n",
        "    print(s, \"->\", route_action(**s))"
      ],
      "execution_count": null,
      "outputs": []
    },
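    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "When routing returns `send_to_approval_flow`, the request should land in a durable queue with enough context for a human reviewer to decide. This is a minimal sketch of such a record; the field names and the `resolve` helper are illustrative, not a prescribed schema."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "from dataclasses import dataclass\n",
        "\n",
        "@dataclass\n",
        "class ApprovalRequest:\n",
        "    action: str\n",
        "    risk_score: float\n",
        "    requested_by: str\n",
        "    status: str = \"pending\"\n",
        "\n",
        "    def resolve(self, approver: str, approved: bool) -> str:\n",
        "        # Record the decision; a real system would also persist approver identity and timestamp.\n",
        "        self.status = \"approved\" if approved else \"rejected\"\n",
        "        return self.status\n",
        "\n",
        "req = ApprovalRequest(action=\"update_case_status\", risk_score=0.82, requested_by=\"analyst-17\")\n",
        "print(\"Before review:\", req.status)\n",
        "print(\"After review:\", req.resolve(approver=\"manager-03\", approved=True))"
      ],
      "execution_count": null,
      "outputs": []
    },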
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Sequence of a domain-specific copilot interaction\n",
        "\n",
        "The blog separates retrieval, evaluation, approval, and execution. This is important because a copilot should not collapse all of those responsibilities into one opaque step.\n",
        "\n",
        "The code below simulates the sequence of events in a governed interaction."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def simulate_copilot_sequence(question: str, risk_score: float):\n",
        "    events = []\n",
        "    events.append((\"User\", f\"Ask domain-specific question: {question}\"))\n",
        "    events.append((\"Domain Copilot\", \"Fetch trusted context from retriever\"))\n",
        "    events.append((\"Retriever\", \"Return policies, records, SOPs\"))\n",
        "    events.append((\"Domain Copilot\", \"Submit draft answer + citations + action to evaluation harness\"))\n",
        "    approval_needed = risk_score >= 0.7\n",
        "    events.append((\"Eval Harness\", f\"Return scores + approval decision = {approval_needed}\"))\n",
        "    if approval_needed:\n",
        "        events.append((\"Approver\", \"Approve or reject action\"))\n",
        "    events.append((\"Domain Copilot\", \"Return final answer with citations\"))\n",
        "    return events\n",
        "\n",
        "sequence = simulate_copilot_sequence(\"Can I reset this vendor account?\", 0.82)\n",
        "for actor, event in sequence:\n",
        "    print(f\"[{actor}] {event}\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Structured logging for quality and governance signals\n",
        "\n",
        "The blog stresses observability: if you cannot measure groundedness, action requests, approval rates, and what users actually saw, you cannot improve quality or defend the system to risk teams.\n",
        "\n",
        "This example emits a structured event that could be sent to a log pipeline or analytics store."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import json\n",
        "from datetime import datetime, timezone\n",
        "\n",
        "event = {\n",
        "    # datetime.utcnow() is deprecated since Python 3.12; use an aware UTC timestamp instead.\n",
        "    \"timestamp\": datetime.now(timezone.utc).isoformat().replace(\"+00:00\", \"Z\"),\n",
        "    \"copilot\": \"claims-assistant\",\n",
        "    \"groundedness\": 0.91,\n",
        "    \"citation_coverage\": 0.67,\n",
        "    \"approval_required\": True,\n",
        "    \"requested_action\": \"approve_claim_payout\",\n",
        "    \"user_visible\": False,\n",
        "}\n",
        "print(json.dumps(event, indent=2))"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Portfolio view: search augmentation vs embedded copilot vs orchestrated agents\n",
        "\n",
        "A practical takeaway from the post is to separate three patterns clearly:\n",
        "1. Search augmentation\n",
        "2. Embedded copilots\n",
        "3. Multi-agent orchestration\n",
        "\n",
        "The table below compares them across workflow integration, bounded actions, governance complexity, and economic impact profile."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "patterns = pd.DataFrame([\n",
        "    {\n",
        "        \"pattern\": \"search_augmentation\",\n",
        "        \"primary_value\": \"find and synthesize information\",\n",
        "        \"workflow_integration\": \"low\",\n",
        "        \"bounded_actions\": \"none or minimal\",\n",
        "        \"governance_complexity\": \"low\",\n",
        "        \"process_economic_impact\": \"moderate\",\n",
        "    },\n",
        "    {\n",
        "        \"pattern\": \"embedded_copilot\",\n",
        "        \"primary_value\": \"complete work inside application context\",\n",
        "        \"workflow_integration\": \"high\",\n",
        "        \"bounded_actions\": \"yes\",\n",
        "        \"governance_complexity\": \"medium\",\n",
        "        \"process_economic_impact\": \"high\",\n",
        "    },\n",
        "    {\n",
        "        \"pattern\": \"orchestrated_agents\",\n",
        "        \"primary_value\": \"coordinate complex work across domains\",\n",
        "        \"workflow_integration\": \"very high\",\n",
        "        \"bounded_actions\": \"yes, across systems\",\n",
        "        \"governance_complexity\": \"high\",\n",
        "        \"process_economic_impact\": \"selectively very high\",\n",
        "    },\n",
        "])\n",
        "patterns"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Simulate measurable outcomes for a copilot pilot\n",
        "\n",
        "The blog recommends prioritizing measurable use cases and instrumenting outcomes from day one. This example creates a small synthetic dataset to show how approval rates, groundedness, and task completion can be tracked.\n",
        "\n",
        "You can adapt this pattern to real pilot telemetry."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "pilot_runs = pd.DataFrame([\n",
        "    {\"case_id\": 1, \"pattern\": \"embedded_copilot\", \"groundedness\": 0.95, \"citation_coverage\": 0.80, \"risk_score\": 0.20, \"approval_required\": 0, \"completed\": 1, \"cycle_time_min\": 6},\n",
        "    {\"case_id\": 2, \"pattern\": \"embedded_copilot\", \"groundedness\": 0.88, \"citation_coverage\": 0.60, \"risk_score\": 0.75, \"approval_required\": 1, \"completed\": 1, \"cycle_time_min\": 14},\n",
        "    {\"case_id\": 3, \"pattern\": \"search_augmentation\", \"groundedness\": 0.90, \"citation_coverage\": 0.50, \"risk_score\": 0.10, \"approval_required\": 0, \"completed\": 0, \"cycle_time_min\": 18},\n",
        "    {\"case_id\": 4, \"pattern\": \"embedded_copilot\", \"groundedness\": 0.70, \"citation_coverage\": 0.40, \"risk_score\": 0.65, \"approval_required\": 0, \"completed\": 0, \"cycle_time_min\": 20},\n",
        "    {\"case_id\": 5, \"pattern\": \"orchestrated_agents\", \"groundedness\": 0.92, \"citation_coverage\": 0.75, \"risk_score\": 0.85, \"approval_required\": 1, \"completed\": 1, \"cycle_time_min\": 12},\n",
        "])\n",
        "\n",
        "summary = pilot_runs.groupby(\"pattern\").agg(\n",
        "    avg_groundedness=(\"groundedness\", \"mean\"),\n",
        "    avg_citation_coverage=(\"citation_coverage\", \"mean\"),\n",
        "    approval_rate=(\"approval_required\", \"mean\"),\n",
        "    completion_rate=(\"completed\", \"mean\"),\n",
        "    avg_cycle_time_min=(\"cycle_time_min\", \"mean\"),\n",
        ").round(2)\n",
        "\n",
        "print(\"Pilot run data:\")\n",
        "display(pilot_runs)\n",
        "print(\"\\nSummary by pattern:\")\n",
        "display(summary)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Governance scorecard for enterprise readiness\n",
        "\n",
        "The post recommends a reference architecture that includes identity inheritance, retrieval and grounding, action boundaries, observability, evaluation, approval controls, and provenance.\n",
        "\n",
        "This scorecard helps assess whether a use case is ready to move from experimentation to operationalization."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "readiness = pd.DataFrame([\n",
        "    {\"control\": \"identity_and_access_inheritance\", \"status\": \"partial\", \"score\": 2},\n",
        "    {\"control\": \"retrieval_and_grounding\", \"status\": \"implemented\", \"score\": 3},\n",
        "    {\"control\": \"action_boundaries\", \"status\": \"implemented\", \"score\": 3},\n",
        "    {\"control\": \"observability\", \"status\": \"partial\", \"score\": 2},\n",
        "    {\"control\": \"evaluation_harness\", \"status\": \"implemented\", \"score\": 3},\n",
        "    {\"control\": \"approval_controls\", \"status\": \"implemented\", \"score\": 3},\n",
        "    {\"control\": \"provenance\", \"status\": \"partial\", \"score\": 2},\n",
        "])\n",
        "\n",
        "max_score = len(readiness) * 3\n",
        "current_score = readiness[\"score\"].sum()\n",
        "print(f\"Readiness score: {current_score}/{max_score} ({current_score / max_score:.0%})\")\n",
        "display(readiness)"
      ],
      "execution_count": null,
      "outputs": []
    },
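    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Provenance is scored as partial above. One lightweight way to strengthen it is a content hash over the answer and its cited source IDs, so a logged response can later be verified against what the user actually saw. This sketch uses only the standard library; the payload shape is an assumption, not a standard."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "import hashlib\n",
        "import json\n",
        "\n",
        "def provenance_hash(answer: str, source_ids: list) -> str:\n",
        "    # Canonical JSON (sorted keys, sorted sources) keeps the hash stable across runs.\n",
        "    payload = json.dumps({\"answer\": answer, \"sources\": sorted(source_ids)}, sort_keys=True)\n",
        "    return hashlib.sha256(payload.encode(\"utf-8\")).hexdigest()\n",
        "\n",
        "h = provenance_hash(\"Reset the vendor account after confirming invoice mismatch policy.\", [\"SOP-42\"])\n",
        "print(\"Provenance hash:\", h)"
      ],
      "execution_count": null,
      "outputs": []
    },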
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Decision helper: which pattern fits a workflow?\n",
        "\n",
        "The blog suggests starting with retrieval and summarization, moving to role-specific copilots with bounded actions, and adopting orchestrated agents only when process complexity justifies the extra control plane.\n",
        "\n",
        "This helper encodes that progression into a simple recommendation function."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {},
      "source": [
        "def recommend_pattern(needs_bounded_actions: bool, cross_system_complexity: int, governance_maturity: int) -> str:\n",
        "    # cross_system_complexity and governance_maturity are 0-10 self-assessment scores.\n",
        "    if not needs_bounded_actions:\n",
        "        return \"search_augmentation\"\n",
        "    if cross_system_complexity >= 8 and governance_maturity >= 7:\n",
        "        return \"orchestrated_agents\"\n",
        "    return \"embedded_copilot\"\n",
        "\n",
        "examples = [\n",
        "    {\"workflow\": \"policy_lookup\", \"needs_bounded_actions\": False, \"cross_system_complexity\": 2, \"governance_maturity\": 5},\n",
        "    {\"workflow\": \"finance_exception_handling\", \"needs_bounded_actions\": True, \"cross_system_complexity\": 5, \"governance_maturity\": 6},\n",
        "    {\"workflow\": \"cross_function_incident_response\", \"needs_bounded_actions\": True, \"cross_system_complexity\": 9, \"governance_maturity\": 8},\n",
        "]\n",
        "\n",
        "for ex in examples:\n",
        "    pattern = recommend_pattern(ex[\"needs_bounded_actions\"], ex[\"cross_system_complexity\"], ex[\"governance_maturity\"])\n",
        "    print(f\"{ex['workflow']}: {pattern}\")"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Summary\n",
        "\n",
        "This notebook validated the blog's central claim: enterprise AI value is shifting from generic chat and search augmentation toward embedded, domain-specific copilots that can operate safely inside governed workflows. The key architectural ingredients are structured responses, identity-aware grounding, risk scoring, approval routing, provenance, and observability.\n",
        "\n",
        "## Next Steps\n",
        "\n",
        "- Map one high-friction workflow in your organization from question to bounded action.\n",
        "- Define the minimum governance controls required before any action can execute.\n",
        "- Instrument groundedness, citation coverage, approval rate, completion rate, and cycle time.\n",
        "- Decide whether your current use case is best served by search augmentation, an embedded copilot, or orchestrated agents.\n",
        "- Extend these notebook patterns into real connectors, policy engines, and approval workflows in your production environment."
      ]
    }
  ]
}